LLM Training Dashboard - (AI-Augmented Software Engineering)

Manage the training workflow, 3D visualizations, Ollama integration, and model artifact cleanup
Architecture, Design and Development by Franz Ayestaran / Enhanced Pair Programming with Claude Code & OpenAI GPT

📊 LLM Training Pipeline Presentation

Download the complete PowerPoint presentation covering the LLM creation and Ollama deployment pipeline

⬇️ Download PPTX

🚀 Train Your Model

📂 Upload Training Data

Upload a .txt, .json, or .jsonl training file. JSON datasets can use records like {"instruction": "...", "output": "..."}.
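The record shape mentioned above can be sketched as follows. This is a minimal, hedged example of what a valid .jsonl upload might look like; only the `instruction` and `output` field names come from the dashboard text, everything else is illustrative.

```python
import json

# Illustrative training records in the {"instruction": ..., "output": ...} shape.
records = [
    {"instruction": "What is LoRA?", "output": "A low-rank adapter fine-tuning method."},
    {"instruction": "Which tool converts models to GGUF?", "output": "llama.cpp"},
]

# One JSON object per line is the usual .jsonl convention.
jsonl_text = "\n".join(json.dumps(r) for r in records)

def validate(line: str) -> bool:
    """Check that a .jsonl line carries both required string fields."""
    row = json.loads(line)
    return isinstance(row.get("instruction"), str) and isinstance(row.get("output"), str)

assert all(validate(line) for line in jsonl_text.splitlines())
```

A plain .txt upload skips this structure entirely and is treated as raw text.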

⚙️ Training Configuration

e.g. 0.00001 (i.e. 1e-5)

LoRA Configuration

QLoRA uses bitsandbytes on NVIDIA CUDA and switches to an Apple-native MLX-LM path on Apple Silicon for real quantized adapter training. Recommended defaults: 5 epochs, batch size 4, QLoRA, native profile.
Native keeps each backend's preferred defaults. Comparable aligns sequence length, dataset packing, and adapter scope more closely so Apple and CUDA runs are easier to compare.
Keep this enabled for easiest inference and export. Disable it to save only the LoRA adapter and reduce disk usage.
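The recommended defaults above can be collected into a single config sketch. The key names here are assumptions for illustration; only the values (5 epochs, batch size 4, QLoRA, native profile, merged adapter) come from the dashboard text.

```python
# Hedged sketch of the recommended training defaults; key names are illustrative.
recommended_config = {
    "epochs": 5,
    "batch_size": 4,
    "method": "qlora",       # bitsandbytes on NVIDIA CUDA, MLX-LM on Apple Silicon
    "profile": "native",     # "comparable" aligns settings across backends instead
    "merge_adapter": True,   # disable to save only the LoRA adapter and reduce disk usage
}

assert recommended_config["method"] in {"lora", "qlora"}
```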

🔄 Resume from Checkpoint (Optional)

Continue training from a saved checkpoint instead of starting from scratch.

Recommended resets the form to the dashboard baseline. Export saves the current dashboard settings as a versioned JSON config. Load applies a previously saved config back into the form and restores the referenced training file when available.
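The export/load round trip described above might look like the following stdlib-only sketch. The payload layout, file naming, and `training_file` key are assumptions; the dashboard's actual schema may differ.

```python
import json
import pathlib
import time

def export_config(settings: dict, out_dir: str = "configs") -> pathlib.Path:
    """Save the current settings as a versioned JSON config (layout is illustrative)."""
    path = pathlib.Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    payload = {
        "version": 1,  # bump when the config schema changes
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "settings": settings,
        "training_file": settings.get("training_file"),  # restored on load when available
    }
    target = path / f"config-v{payload['version']}-{int(time.time())}.json"
    target.write_text(json.dumps(payload, indent=2))
    return target

def load_config(path: pathlib.Path) -> dict:
    """Apply a previously saved config back into the form."""
    payload = json.loads(path.read_text())
    return payload["settings"]
```

Loading simply replays the stored `settings` dict into the form; if the referenced training file still exists, it is re-selected as well.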

⚠️ Please upload a training data file first

💻 System Status

⚙️ CPU — core count, usage %, and clock speed (MHz)

🧠 Memory (RAM) — total GB, used %, and available / total GB

🎮 GPU — detection status ("No GPU detected or PyTorch not installed" when unavailable)

🖥️ Device Information — Platform, Architecture, Processor, and Python version

📊 Utilization Details — per-core CPU load, Disk I/O (📖 read / ✏️ write, MB), and Network I/O (📤 sent / 📥 received, MB)

Metrics refresh every 2 seconds; "Last updated" shows the most recent poll.
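A stdlib-only sketch of the values behind the Device Information panel is shown below. The live CPU, memory, and I/O figures in the other tiles come from psutil (listed in the technical stack); this example sticks to the standard library so it runs anywhere.

```python
import os
import platform

# Values mirroring the Device Information panel; the dashboard's live
# utilization tiles use psutil rather than these stdlib calls.
info = {
    "Platform": platform.platform(),
    "Architecture": platform.machine(),
    "Processor": platform.processor() or "unknown",
    "Python": platform.python_version(),
    "CPU cores": os.cpu_count(),
}
for key, value in info.items():
    print(f"{key}: {value}")
```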

🤖 Chat with Your Model

💬 Generate Text

Use a model loaded from this workspace or switch to the remote Ollama server.
Select the exact local model directory or remote Ollama model tag the chat tab should use.
Run the same prompt against two local artifacts or two cloud model tags and inspect both responses together.
Exact mode returns the saved training answer verbatim for exact prompt matches. Context mode still queries the model, but includes the matched dataset entry as supporting context.
Higher values = longer responses (50-2000 tokens)
Fact ↔ Fiction slider (shown at 0.20: mostly factual)
Lower values reduce creative drift. Set this near 0 for deterministic factual answers, especially on NVIDIA.
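The two grounding modes described above can be sketched as a simple dispatch. Function and variable names here are assumptions; the real chat backend queries a local artifact or remote Ollama tag rather than the stand-in below.

```python
def answer(prompt: str, dataset: dict, mode: str = "exact") -> str:
    """Sketch of Exact vs. Context grounding (names are illustrative).

    exact   -> return the saved training answer verbatim on an exact prompt match
    context -> still query the model, but prepend the matched entry as context
    """
    match = dataset.get(prompt.strip())
    if mode == "exact" and match is not None:
        return match                       # verbatim dataset answer
    if mode == "context" and match is not None:
        augmented = f"Context: {match}\n\nQuestion: {prompt}"
        return run_model(augmented)        # model still generates the reply
    return run_model(prompt)               # no dataset hit: plain generation

def run_model(prompt: str) -> str:
    # Stand-in for the real local/Ollama inference call.
    return f"[model reply to: {prompt[:40]}...]"
```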

🦙 Import Model to Ollama

📦 Import GGUF Model

ℹ️ About: This will import your trained model into Ollama, allowing you to run it locally using the ollama run command.

⚠️ No GGUF file? If you don't see any GGUF files below, you need to convert your trained model first.

This will be the name you use with ollama run <name>
Controls randomness (0.0 = deterministic, 1.0+ = creative)

☁️ Cloud deploy: Use the second button to upload the selected GGUF and create the model on ollama.ayestaran.dev. This requires SSH access from the dashboard host to the server.
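Under the hood, an Ollama import is driven by a Modelfile. The sketch below generates a minimal one using Ollama's documented `FROM` and `PARAMETER` directives; the GGUF path and output location are illustrative, and the actual dashboard may add further directives.

```python
import pathlib
import textwrap

def write_modelfile(gguf_path: str, temperature: float = 0.2,
                    out: str = "Modelfile") -> str:
    """Write a minimal Ollama Modelfile for the selected GGUF artifact."""
    content = textwrap.dedent(f"""\
        FROM {gguf_path}
        PARAMETER temperature {temperature}
    """)
    pathlib.Path(out).write_text(content)
    return content

# After writing the Modelfile, registration is a single CLI call:
#   ollama create <name> -f Modelfile
#   ollama run <name>
```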

☁️ Ollama Cloud Server Admin

ℹ️ About: Manage models on ollama.ayestaran.dev. Listing uses the remote tags endpoint, and deletion uses SSH access from the dashboard host.
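Listing via the remote tags endpoint maps onto Ollama's `/api/tags` route, which returns a JSON body with a `models` array. The sketch below separates the network call from response parsing; the base URL is illustrative, and deletion (as noted above) goes over SSH rather than this API.

```python
import json
from urllib.request import urlopen

def parse_tags(payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_remote_models(base_url: str) -> list:
    """List models on a remote Ollama server via its tags endpoint."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return parse_tags(json.load(resp))

# Example shape of an /api/tags response (names are illustrative):
sample = {"models": [{"name": "mymodel:latest"}, {"name": "llama3:8b"}]}
```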

📊 Training Artifacts

📊 3D Model Visualizations

Interactive 3D visualizations of your model's training process, checkpoints, embeddings, layer structures, and internal LLM topology.

Interactive Attention Flow

🧠 3D LLM Attention Explorer

A dedicated scene for inspecting how query, key, and value paths move through a transformer block. The layout is intentionally closer to a cinematic architecture diagram than the existing topology plot, with Q/K/V weight slabs, vector stages, layer norm, attention scoring, and animated token flow.

What You Can Inspect

Separate Q, K, and V stages, token score routing, residual handoff, and downstream attention aggregation in one navigable 3D scene.

Interaction Model

Orbit, pan, and zoom the scene. Click a subsystem to pin metadata and use the scene controls to jump straight to Q, K, V, or the attention matrix.

Visual Direction

The page uses a hand-built scene rather than Plotly so the composition can read more like a technical explainer, matching the reference style more closely.

Immersive Neural Atlas

🧬 3D Neural Brain Journey

A dedicated brain-shaped scene for travelling through your trained model. The page samples real learned tensor rows from each transformer layer, places them in an outlined brain shell, and lets you orbit between embeddings, attention projections, feed-forward blocks, and the output head.

How To Read It

The center spine is the residual stream. Branches represent Q, K, V, and feed-forward structures. Smaller glowing nodes are strong sampled rows taken from the real checkpoint weights.

Journey Model

Use the built-in waypoints to jump from input cortex to mid-layer reasoning and then into the output crown. Click any node to pin its layer, tensor path, and sampled row metrics.

Practical Constraint

This atlas is sampled rather than exhaustive: rendering every trained connection in a browser is not feasible at LLM scale, so the scene keeps only the strongest learned pathways to stay explorable.

🛠️ Technical Stack

Comprehensive overview of the technologies, frameworks, and tools powering this LLM training dashboard.

🧠 Core Deep Learning Framework

PyTorch 2.0+

Primary deep learning framework powering the transformer architecture, training loops, and model optimization.

Transformers 4.30+

Hugging Face Transformers library for model architecture, tokenization, and pre-trained model integration.

CUDA Toolkit

GPU acceleration support for high-performance model training with NVIDIA graphics cards.

Apple Silicon (MPS): Native GPU acceleration for Apple's M-series ARM processors via Metal Performance Shaders; PyTorch uses the MPS backend for GPU support on Apple Silicon.

🏗️ Custom Transformer Implementation

Core Components (transformer.py)

  • Multi-Head Attention - Scaled dot-product attention with multiple heads for parallel processing
  • Positional Encoding - Sinusoidal position embeddings for sequence ordering
  • Feed-Forward Networks - Position-wise fully connected layers with GELU activation
  • Transformer Layers - Encoder and decoder layers with residual connections and layer normalization
  • Model Variants - Support for both encoder-decoder and decoder-only architectures
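The core of the multi-head attention component listed above is scaled dot-product attention. The NumPy sketch below shows a single unmasked head without the learned Q/K/V projections; it illustrates the formula, not the dashboard's actual transformer.py implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Multiple heads run this in parallel on projected slices of the input and concatenate the results.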

Training Utilities (train_utils.py)

  • Gradient Accumulation - Memory-efficient training with large batch sizes
  • Mixed Precision Training - FP16 automatic mixed precision for faster training
  • Learning Rate Scheduling - Cosine annealing and warmup strategies
  • Checkpoint Management - Save/load model states with SafeTensors format
  • Text Generation - Temperature, top-k, and top-p sampling strategies
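Warmup and cosine annealing, both named above, are commonly combined into one schedule. This stdlib sketch ramps linearly during warmup and then decays to zero; the exact shapes used in train_utils.py may differ.

```python
import math

def lr_schedule(step: int, total_steps: int, warmup: int,
                base_lr: float = 1e-5) -> float:
    """Linear warmup followed by cosine annealing toward zero (illustrative)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup                  # linear ramp-up
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress)) # cosine decay
```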

🌐 Web Dashboard & Visualization

Flask 3.0+

Lightweight web framework serving the training dashboard API and real-time status updates.

Plotly 5.20+

Interactive 3D visualizations for model checkpoints, embeddings, and layer structures.

Marked.js

Markdown parsing for rendering documentation and workflow guides directly in the dashboard.

📊 Data Processing & System Monitoring

NumPy 1.24+

Numerical computing library for efficient array operations and data manipulation.

PSUtil 5.9+

Real-time system and process monitoring for CPU, memory, GPU, and disk usage tracking.

TQDM 4.65+

Progress bars for training loops and batch processing with ETA estimates.

🦙 Model Export & Deployment

SafeTensors 0.4+

Fast and secure model serialization format for checkpoint saving and model conversion.

Ollama Integration

Export trained models to GGUF format and import directly into Ollama for local inference.

llama.cpp

Convert PyTorch models to GGUF format for efficient CPU/GPU inference with quantization.

🔧 Optional Tools & Extensions

TensorBoard 2.12+

Training visualization and metrics tracking for loss curves, learning rates, and gradients.

Weights & Biases

Experiment tracking and collaboration platform for machine learning projects.

Scikit-learn 1.2+

Data preprocessing utilities and evaluation metrics for model performance analysis.

🎯 System Architecture

Data Pipeline: Text/JSONL Input → Tokenization → Dataset Creation → DataLoader
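The data pipeline stages above can be sketched end-to-end in a few lines. Whitespace splitting stands in for the real tokenizer, and the list slicing stands in for a DataLoader; both are deliberate simplifications.

```python
def build_batches(lines, vocab=None, batch_size=2):
    """Toy pipeline: text -> token ids -> dataset -> fixed-size batches."""
    vocab = vocab if vocab is not None else {}

    def encode(text):
        ids = []
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))   # grow the vocab on first sight
            ids.append(vocab[tok])
        return ids

    dataset = [encode(line) for line in lines]
    # DataLoader stand-in: yield fixed-size groups of encoded examples.
    batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]
    return batches, vocab

batches, vocab = build_batches(["hello world", "hello there", "train the model"])
```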

Training Loop: Forward Pass → Loss Calculation → Backpropagation → Optimizer Step → Checkpoint Save

Model Export: PyTorch Model → SafeTensors → GGUF Conversion → Ollama Import

Dashboard: Flask API ↔ Real-time Updates ↔ 3D Visualizations ↔ System Monitoring

🚀 Built for Production-Ready LLM Training
