LLM Training Dashboard (AI-Augmented Software Engineering)


Manage Training Workflows, 3D Visualizations, Ollama Integration, and Model Artifact Cleanup
Architecture, Design and Development by Franz Ayestaran / Enhanced Pair Programming with Claude Code

Loading workflow documentation...

📊 LLM Training Pipeline Presentation

Download the complete PowerPoint presentation covering the LLM creation and Ollama deployment pipeline

⬇️ Download PPTX

🚀 Train Your Model

📂 Upload Training Data

Upload a .txt, .json, or .jsonl training file. JSON datasets can use records like {"instruction": "...", "output": "..."}.
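As an illustrative sketch (the dashboard's actual loader is not shown here; the instruction/output field names come from the example above), a .jsonl file can be validated like this:

```python
import json

def load_jsonl_records(text):
    """Parse JSONL training data, keeping only records that carry the
    expected 'instruction' and 'output' string fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if not (isinstance(obj.get("instruction"), str)
                and isinstance(obj.get("output"), str)):
            raise ValueError(f"line {line_no}: missing 'instruction'/'output'")
        records.append(obj)
    return records
```

Rejecting malformed records at upload time surfaces data problems before a multi-epoch training run starts.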

⚙️ Training Configuration

e.g. enter 0.00001 for a learning rate of 1e-5

LoRA Configuration

QLoRA uses bitsandbytes on NVIDIA CUDA and switches to an Apple-native MLX-LM path on Apple Silicon for real quantized adapter training. Recommended defaults: 5 epochs, batch size 4, QLoRA, native profile.
Native keeps each backend's preferred defaults. Comparable aligns sequence length, dataset packing, and adapter scope more closely so Apple and CUDA runs are easier to compare.
Keep this enabled for easiest inference and export. Disable it to save only the LoRA adapter and reduce disk usage.
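A minimal sketch of the platform switch described above (bitsandbytes and MLX-LM are the real projects named; the selector function itself is hypothetical):

```python
import platform

def pick_qlora_backend(cuda_available: bool,
                       system: str = platform.system(),
                       machine: str = platform.machine()) -> str:
    """Choose the quantized-adapter training path: bitsandbytes on
    NVIDIA CUDA, MLX-LM on Apple Silicon, plain fp16 LoRA otherwise."""
    if cuda_available:
        return "bitsandbytes"
    if system == "Darwin" and machine == "arm64":
        return "mlx-lm"
    return "lora-fp16"  # no quantized path available on this platform
```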

🔄 Resume from Checkpoint (Optional)

Continue training from a saved checkpoint instead of starting from scratch.
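To illustrate the idea (the dashboard itself saves checkpoints in SafeTensors, per the stack section below; this sketch uses pickle purely for brevity, and all names are hypothetical):

```python
import os
import pickle
import tempfile

def save_checkpoint(path, epoch, model_state, optim_state):
    """Persist everything needed to continue training later."""
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "model": model_state,
                     "optimizer": optim_state}, f)

def resume_from_checkpoint(path):
    """Return (start_epoch, model_state, optim_state); training
    resumes at the epoch after the one that was saved."""
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["epoch"] + 1, ckpt["model"], ckpt["optimizer"]
```

Restoring the optimizer state alongside the weights matters: resuming with a fresh optimizer discards momentum and learning-rate history and can briefly spike the loss.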

Recommended resets the form to the dashboard baseline. Export saves the current dashboard settings as a versioned JSON config. Load applies a previously saved config back into the form and restores the referenced training file when available.
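A rough sketch of what exporting and loading a versioned JSON config might look like (the version/settings field names are assumptions, not the dashboard's actual schema):

```python
import json

def export_config(settings: dict, version: int = 1) -> str:
    """Serialize the current dashboard form as a versioned JSON config."""
    return json.dumps({"version": version, "settings": settings}, indent=2)

def load_config(blob: str, baseline: dict) -> dict:
    """Apply a saved config on top of the dashboard baseline, so any
    field missing from the file keeps its recommended default."""
    saved = json.loads(blob)
    merged = dict(baseline)
    merged.update(saved.get("settings", {}))
    return merged
```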

⚠️ Please upload a training data file first

💻 System Status

⚙️

CPU

0 cores
Usage 0%
0 MHz
🧠

Memory (RAM)

0 GB total
Used 0%
0 GB / 0 GB available
🎮

GPU

Checking...
No GPU detected or PyTorch not installed
🖥️

Device Information

Platform: -
Architecture: -
Processor: -
Python: -
📊

Utilization Details

CPU Cores
Disk I/O
📖 Read
0 MB
✏️ Write
0 MB
Network I/O
📤 Sent
0 MB
📥 Received
0 MB
Last updated: Never · Updates every 2 seconds
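As a rough illustration of the 2-second polling cycle (real CPU, RAM, disk, and network figures come from psutil; this stdlib-only sketch fills in just the fields Python exposes directly, and the helper names are hypothetical):

```python
import os
import time

def mb(n_bytes: int) -> str:
    """Format a byte counter the way the panel displays it (whole MB)."""
    return f"{n_bytes / (1024 * 1024):.0f} MB"

def snapshot() -> dict:
    """One status sample; the dashboard would collect this every 2 s.
    Only stdlib-visible fields are filled in here -- CPU usage, memory,
    disk I/O, and network I/O come from psutil in the real stack."""
    return {"cpu_cores": os.cpu_count(), "taken_at": time.time()}
```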

🤖 Chat with Your Model

💬 Generate Text

Use a model loaded from this workspace or switch to the remote Ollama server.
Select the exact local model directory or remote Ollama model tag the chat tab should use.
Exact mode returns the saved training answer verbatim for exact prompt matches. Context mode still queries the model, but includes the matched dataset entry as supporting context.
Higher values = longer responses (50-2000 tokens)
Fact • 0.20 (mostly factual) • Fiction
Lower values reduce creative drift. Set this near 0 for deterministic, factual answers, especially on NVIDIA GPUs.
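The exact/context behavior described above can be sketched as follows (function and parameter names are illustrative, not the dashboard's API):

```python
def answer(prompt, dataset, mode, model_fn):
    """Exact mode returns the stored answer verbatim on an exact
    prompt match; context mode always queries the model but prepends
    the matched dataset entry as supporting context."""
    hit = dataset.get(prompt.strip())
    if mode == "exact" and hit is not None:
        return hit
    if mode == "context" and hit is not None:
        return model_fn(f"Context: {hit}\n\n{prompt}")
    return model_fn(prompt)  # no match: plain model query either way
```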

🦙 Import Model to Ollama

📦 Import GGUF Model

ℹ️ About: This will import your trained model into Ollama, allowing you to run it locally using the ollama run command.

⚠️ No GGUF file? If you don't see any GGUF files below, you need to convert your trained model first.

This will be the name you use with ollama run <name>
Controls randomness (0.0 = deterministic, 1.0+ = creative)
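For illustration, the Modelfile that Ollama consumes during import can be composed like this (FROM and PARAMETER temperature are standard Modelfile directives; the helper itself is hypothetical, and the dashboard's real Modelfile may set more parameters):

```python
def build_modelfile(gguf_path: str, temperature: float = 0.2) -> str:
    """Compose the Modelfile that `ollama create <name> -f Modelfile`
    consumes: FROM points at the GGUF file and PARAMETER sets the
    sampling temperature (0.0 = deterministic, 1.0+ = creative)."""
    return f"FROM {gguf_path}\nPARAMETER temperature {temperature}\n"
```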

☁️ Cloud deploy: Use the second button to upload the selected GGUF and create the model on ollama.ayestaran.dev. This requires SSH access from the dashboard host to the server.

☁️ Ollama Cloud Server Admin

ℹ️ About: Manage models on ollama.ayestaran.dev. Listing uses the remote tags endpoint, and deletion uses SSH access from the dashboard host.

📊 Training Artifacts

Loading artifacts...

📊 3D Model Visualizations

Interactive 3D visualizations of your model's training process, checkpoints, embeddings, and layer structures.

Checking for visualizations...

🛠️ Technical Stack

Comprehensive overview of the technologies, frameworks, and tools powering this LLM training dashboard.

🧠 Core Deep Learning Framework

PyTorch 2.0+

Primary deep learning framework powering the transformer architecture, training loops, and model optimization.

Transformers 4.30+

Hugging Face Transformers library for model architecture, tokenization, and pre-trained model integration.

CUDA Toolkit

GPU acceleration support for high-performance model training with NVIDIA graphics cards.

Apple Silicon (MPS): Native GPU acceleration for Apple's M-series ARM processors via Metal Performance Shaders; PyTorch uses the MPS backend for GPU support on Apple Silicon.

🏗️ Custom Transformer Implementation

Core Components (transformer.py)

  • Multi-Head Attention - Scaled dot-product attention with multiple heads for parallel processing
  • Positional Encoding - Sinusoidal position embeddings for sequence ordering
  • Feed-Forward Networks - Position-wise fully connected layers with GELU activation
  • Transformer Layers - Encoder and decoder layers with residual connections and layer normalization
  • Model Variants - Support for both encoder-decoder and decoder-only architectures
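The scaled dot-product attention listed above reduces, for a single query vector, to softmax(q·Kᵀ/√d)·V; a dependency-free sketch (transformer.py's real implementation is batched and multi-headed):

```python
import math

def attention(q, keys, values):
    """Single-query scaled dot-product attention:
    softmax(q . K^T / sqrt(d)) . V."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    peak = max(scores)                   # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # attention distribution over keys
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```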

Training Utilities (train_utils.py)

  • Gradient Accumulation - Memory-efficient training with large batch sizes
  • Mixed Precision Training - FP16 automatic mixed precision for faster training
  • Learning Rate Scheduling - Cosine annealing and warmup strategies
  • Checkpoint Management - Save/load model states with SafeTensors format
  • Text Generation - Temperature, top-k, and top-p sampling strategies
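A sketch of combined temperature, top-k, and top-p (nucleus) filtering as listed above (the default thresholds are common conventions, not necessarily the dashboard's):

```python
import math

def top_k_top_p_probs(logits, k=50, p=0.9, temperature=1.0):
    """Sampling distribution after temperature scaling, top-k
    truncation, and nucleus (top-p) truncation, as a {token: prob} map."""
    scaled = [l / temperature for l in logits]
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    peak = max(scaled[i] for i in order)
    exps = {i: math.exp(scaled[i] - peak) for i in order}
    z = sum(exps.values())
    probs = {i: e / z for i, e in exps.items()}
    kept, cum = {}, 0.0
    for i in order:                      # highest probability first
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= p:                     # nucleus reached
            break
    z = sum(kept.values())               # renormalize the kept tokens
    return {i: v / z for i, v in kept.items()}
```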

🌐 Web Dashboard & Visualization

Flask 3.0+

Lightweight web framework serving the training dashboard API and real-time status updates.

Plotly 5.20+

Interactive 3D visualizations for model checkpoints, embeddings, and layer structures.

Marked.js

Markdown parsing for rendering documentation and workflow guides directly in the dashboard.

📊 Data Processing & System Monitoring

NumPy 1.24+

Numerical computing library for efficient array operations and data manipulation.

psutil 5.9+

Real-time system and process monitoring for CPU, memory, disk, and network usage tracking.

tqdm 4.65+

Progress bars for training loops and batch processing with ETA estimates.

🦙 Model Export & Deployment

SafeTensors 0.4+

Fast and secure model serialization format for checkpoint saving and model conversion.

Ollama Integration

Export trained models to GGUF format and import directly into Ollama for local inference.

llama.cpp

Convert PyTorch models to GGUF format for efficient CPU/GPU inference with quantization.

🔧 Optional Tools & Extensions

TensorBoard 2.12+

Training visualization and metrics tracking for loss curves, learning rates, and gradients.

Weights & Biases

Experiment tracking and collaboration platform for machine learning projects.

Scikit-learn 1.2+

Data preprocessing utilities and evaluation metrics for model performance analysis.

🎯 System Architecture

Data Pipeline: Text/JSONL Input → Tokenization → Dataset Creation → DataLoader

Training Loop: Forward Pass → Loss Calculation → Backpropagation → Optimizer Step → Checkpoint Save

Model Export: PyTorch Model → SafeTensors → GGUF Conversion → Ollama Import

Dashboard: Flask API ↔ Real-time Updates ↔ 3D Visualizations ↔ System Monitoring

🚀 Built for Production-Ready LLM Training
