LLM Training Dashboard - (AI-Augmented Software Engineering)


Manage Training Workflow, 3D Visualizations, Ollama Integration and Model Artifact Cleanup
Architecture, Design and Development by Franz Ayestaran / Enhanced Pair Programming with Claude Code & OpenAI GPT


📊 LLM Training Pipeline Presentation

Download the complete PowerPoint presentation covering the LLM creation and Ollama deployment pipeline.

⬇️ Download PPTX

🚀 Train Your Model

📂 Upload Training Data

Upload a .txt, .json, or .jsonl training file. JSON datasets can use records like {"instruction": "...", "output": "..."}.
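A minimal sketch of preparing a JSONL file in the instruction/output shape mentioned above. The validation rule is an assumption for illustration: each line must be a JSON object whose `instruction` and `output` fields are non-empty strings. The dashboard's own validator may accept other fields.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (the JSONL layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")


def load_instruction_records(path):
    """Return records whose 'instruction' and 'output' fields are non-empty strings."""
    valid = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            fields_ok = isinstance(rec, dict) and all(
                isinstance(rec.get(key), str) and rec[key].strip()
                for key in ("instruction", "output")
            )
            if fields_ok:
                valid.append(rec)
            else:
                print(f"skipping line {line_no}: missing instruction/output")
    return valid


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "train.jsonl"
    write_jsonl(
        [
            {"instruction": "What does LoRA train?", "output": "Small low-rank adapter matrices."},
            {"instruction": "", "output": "This record is skipped."},
        ],
        path,
    )
    print(len(load_instruction_records(path)))
```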

⚙️ Training Configuration

Learning rate in decimal notation, e.g. 0.00001 for 1e-5.

LoRA Configuration

QLoRA uses bitsandbytes on NVIDIA CUDA and switches to an Apple-native MLX-LM path on Apple Silicon for real quantized adapter training. Recommended defaults: 5 epochs, batch size 4, QLoRA, native profile.
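The backend switch described above can be sketched as a small host probe. The function name and the `None` fallback for hosts that have neither CUDA nor Apple Silicon are hypothetical. Only the CUDA→bitsandbytes and Apple Silicon→MLX-LM mapping and the recommended defaults come from the text.

```python
import platform

# Recommended defaults quoted from the dashboard text above.
RECOMMENDED_DEFAULTS = {"epochs": 5, "batch_size": 4, "method": "qlora", "profile": "native"}


def pick_qlora_backend(system=None, machine=None, cuda_available=None):
    """Return the quantized-adapter backend for this host.

    'bitsandbytes' when CUDA is available, 'mlx-lm' on Apple Silicon macOS,
    None otherwise (a hypothetical fallback to non-quantized LoRA).
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if cuda_available is None:
        try:
            import torch  # optional dependency, only used to probe for CUDA

            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    if cuda_available:
        return "bitsandbytes"
    if system == "Darwin" and machine == "arm64":
        return "mlx-lm"
    return None


if __name__ == "__main__":
    print(pick_qlora_backend(), RECOMMENDED_DEFAULTS)
```

Passing `system`, `machine` and `cuda_available` explicitly keeps the function testable without a GPU.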
Native keeps each backend's preferred defaults. Comparable aligns sequence length, dataset packing, and adapter scope more closely so Apple and CUDA runs are easier to compare.
Keep this option enabled to save a merged model for the easiest inference and export. Disable it to save only the LoRA adapter and reduce disk usage.
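Why merging helps inference and why adapter-only saves disk can be shown with the standard LoRA update W' = W + (alpha / r) · B A. This NumPy sketch uses toy sizes chosen for illustration. The merged weight gives the same output as base plus adapter, while the adapter alone stores only r · (d_in + d_out) values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 32, 8, 16  # toy sizes, not the dashboard's settings

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01  # LoRA up-projection (zero at init in LoRA; random here so the update is visible)
scale = alpha / r

x = rng.standard_normal(d_in)
y_adapter = W @ x + scale * (B @ (A @ x))   # base model plus separate adapter at inference time
W_merged = W + scale * (B @ A)              # fold the adapter into the base weight
y_merged = W_merged @ x                     # single matmul, no adapter code needed

adapter_params = A.size + B.size            # what an adapter-only save stores
full_params = W_merged.size                 # what a merged save stores for this layer
print(np.allclose(y_adapter, y_merged), adapter_params, full_params)
```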

🔄 Resume from Checkpoint (Optional)

Continue training from a saved checkpoint instead of starting from scratch.
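Resuming requires locating the most recent checkpoint. This sketch assumes the `checkpoint-<step>` directory naming used by the Hugging Face Trainer, but the dashboard's actual layout is not stated here. Steps are compared numerically, so `checkpoint-1000` wins over `checkpoint-999`.

```python
import re
import tempfile
from pathlib import Path

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare steps numerically so checkpoint-1000 beats checkpoint-999.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


if __name__ == "__main__":
    run_dir = Path(tempfile.mkdtemp())
    for name in ("checkpoint-999", "checkpoint-1000"):
        (run_dir / name).mkdir()
    print(latest_checkpoint(run_dir).name)  # checkpoint-1000
```

With the Hugging Face Trainer, the resulting path can be passed as `trainer.train(resume_from_checkpoint=...)`. `transformers.trainer_utils.get_last_checkpoint` offers a similar lookup.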

Recommended resets the form to the dashboard baseline. Export saves the current dashboard settings as a versioned JSON config. Load applies a previously saved config back into the form and restores the referenced training file when available.
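A sketch of the Recommended / Export / Load round trip. The `config_version` key, the version number and the JSON layout are hypothetical; only the recommended values (5 epochs, batch size 4, QLoRA, native profile) come from the text. Loading rejects unknown versions and fills missing keys from the baseline.

```python
import json

CONFIG_VERSION = 1  # hypothetical schema version number
RECOMMENDED = {"epochs": 5, "batch_size": 4, "method": "qlora", "profile": "native"}


def reset_to_recommended():
    """Fresh copy of the baseline settings (the Recommended action)."""
    return dict(RECOMMENDED)


def export_config(settings, training_file=None):
    """Serialise the current form settings as a versioned JSON document."""
    payload = {
        "config_version": CONFIG_VERSION,
        "settings": settings,
        "training_file": training_file,
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def load_config(text):
    """Parse a saved config, rejecting unknown versions; returns (settings, training_file)."""
    data = json.loads(text)
    version = data.get("config_version")
    if version != CONFIG_VERSION:
        raise ValueError(f"unsupported config_version: {version!r}")
    # Start from the baseline so keys missing from the file still get a value.
    settings = {**RECOMMENDED, **data.get("settings", {})}
    return settings, data.get("training_file")
```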

⚠️ Please upload a training data file first

💻 System Status

  • ⚙️ CPU - core count, usage percentage, and clock speed (MHz)
  • 🧠 Memory (RAM) - total memory, percentage used, and available memory (GB)
  • 🎮 GPU - the detected GPU, or "No GPU detected or PyTorch not installed" when none is found
  • 🖥️ Device Information - platform, architecture, processor, and Python version
  • 🟩 NVIDIA - NVIDIA CUDA status
  • 📊 Utilization Details - CPU cores, disk I/O (📖 read / ✏️ write, MB), and network I/O (📤 sent / 📥 received, MB)

The panel shows when it was last updated and refreshes every 2 seconds.
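The System Status panel's fields map naturally onto psutil, which the Technical Stack lists. This is a sketch, not the dashboard's code. The snapshot tuple and the use of two-snapshot deltas for the disk and network figures are assumptions. psutil is third-party, so it is imported lazily, and only the pure `delta_mb` helper runs without it.

```python
from collections import namedtuple

IoSnapshot = namedtuple("IoSnapshot", "read_bytes write_bytes bytes_sent bytes_recv")
MB = 1024 * 1024  # binary megabyte


def take_snapshot():
    """Cumulative disk and network byte counters since boot."""
    import psutil

    disk = psutil.disk_io_counters()  # can be None on hosts without disk counters
    net = psutil.net_io_counters()
    return IoSnapshot(
        disk.read_bytes if disk else 0,
        disk.write_bytes if disk else 0,
        net.bytes_sent,
        net.bytes_recv,
    )


def system_summary():
    """CPU and RAM fields similar to the System Status panel."""
    import psutil

    vm = psutil.virtual_memory()
    freq = psutil.cpu_freq()  # may be None on some platforms
    return {
        "cpu_cores": psutil.cpu_count(logical=True),
        "cpu_percent": psutil.cpu_percent(interval=None),  # first call returns 0.0 as a baseline
        "cpu_mhz": freq.current if freq else 0.0,
        "ram_total_gb": vm.total / 1024**3,
        "ram_used_percent": vm.percent,
        "ram_available_gb": vm.available / 1024**3,
    }


def delta_mb(prev, curr):
    """Megabytes moved in each counter between two snapshots."""
    return {field: (getattr(curr, field) - getattr(prev, field)) / MB for field in IoSnapshot._fields}
```

Taking one snapshot every 2 seconds and passing consecutive pairs to `delta_mb` gives per-interval traffic. Whether the panel shows per-interval or cumulative totals is not stated.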

🧪 Phase-1 ML/DL and Classical ML Training

Train Phase-1 deep learning models, Phase-2 classical ML models, and the new Phase-3 Vision Transformer and small text transformer models from one dashboard page. The sections stay stacked in order so Phase 3 appears below the existing classical workflow.

A. Model Selector


B. Dataset Upload Panel

Choose a model to see its supported dataset format.


C. Hyperparameter Panel

Training Actions

Upload a compatible dataset to enable training.

📈 Phase-2 Classical ML Models

Train Random Forest, SVM, and Logistic Regression models on CSV datasets with schema preview, target-column control, optional ONNX export, telemetry, and downloadable `.pkl` artifacts.
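Schema preview and target-column control can be sketched with the standard library alone. This type inference is a simple assumption: a column is `int` if every non-empty value parses as an integer, `float` if every value parses as a float, and `str` otherwise. The dashboard's real preview may be richer.

```python
import csv
import io


def _parses(cast, value):
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Classify a column as 'int', 'float', or 'str' from its non-empty values."""
    present = [v for v in values if v not in ("", None)]
    if present and all(_parses(int, v) for v in present):
        return "int"
    if present and all(_parses(float, v) for v in present):
        return "float"
    return "str"


def preview_schema(csv_text, sample_rows=100):
    """Infer a {column: type} schema from the first sample_rows data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = reader.fieldnames or []
    return {col: infer_type([row.get(col) for row in rows]) for col in columns}


def split_target(csv_text, target):
    """Separate feature columns from the chosen target column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if target not in (reader.fieldnames or []):
        raise KeyError(f"target column {target!r} not in CSV header")
    features, labels = [], []
    for row in reader:
        labels.append(row[target])
        features.append({k: v for k, v in row.items() if k != target})
    return features, labels
```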

A. Classical Model Selector


B. CSV Dataset Upload

Choose a classical model to see its supported dataset format.


C. Hyperparameter Panel

Training Actions

Upload a compatible CSV dataset to enable classical ML training.

🛰️ Phase-3 Vision Transformers and Small Transformers

Train Vision Transformer plus tiny causal LLaMA text models with image-folder or JSONL datasets, live telemetry, ONNX export, ONNX Runtime inference support, and GGUF/Ollama export for the Phase-3 text models.
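For the Vision Transformer path, the key preprocessing step is cutting each image into fixed-size patches that become tokens. This NumPy sketch shows the standard patchify step. It uses a 224×224 image with 16×16 patches, giving 196 tokens of 768 values. The projection matrix stands in for a learned patch embedding, and the class token and position embeddings are omitted. The sizes are illustrative, not the dashboard's defaults.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches, shape (N, patch*patch*C)."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (patch rows, patch cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img, 16)                     # (196, 768): a 14 x 14 grid of patches
proj = rng.standard_normal((768, 192)) * 0.02  # stand-in for a learned patch embedding
embeddings = tokens @ proj                     # (196, 192) token sequence for the encoder
```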

A. Transformer Model Selector


B. Dataset Upload Panel

Choose a Phase-3 model to see its supported dataset format.


C. Hyperparameter Panel

Training Actions

Upload a compatible dataset to enable training.

🤖 Chat with Your Model

💬 Generate Text

Use a model loaded from this workspace or switch to the remote Ollama server.
Select the exact local model directory or remote Ollama model tag the chat tab should use.
Run the same prompt against two local artifacts or two cloud model tags and inspect both responses together.
Exact mode returns the saved training answer verbatim for exact prompt matches. Context mode still queries the model, but includes the matched dataset entry as supporting context.
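The difference between Exact and Context mode can be sketched as one routing function. The function name, the context-prompt template, and matching after trimming surrounding whitespace are assumptions. The text only says that exact matches return the saved answer verbatim, while context mode still queries the model with the matched entry attached.

```python
def build_chat_request(prompt, dataset, mode):
    """Return ('answer', text) to reply directly, or ('prompt', text) to send to the model.

    mode is 'exact', 'context', or anything else (plain generation).
    """
    index = {rec["instruction"].strip(): rec["output"] for rec in dataset}
    match = index.get(prompt.strip())
    if match is not None and mode == "exact":
        return "answer", match  # verbatim training answer, no model call
    if match is not None and mode == "context":
        augmented = f"Reference answer from the training data:\n{match}\n\nQuestion: {prompt.strip()}"
        return "prompt", augmented  # model still generates, guided by the matched entry
    return "prompt", prompt
```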
Max length: higher values allow longer responses (50-2000 tokens).
Timeout: Auto scales with max length and backend. Pick a longer timeout for slower local models.
Fact ↔ Fiction slider (shown at 0.20, mostly factual): lower values reduce creative drift. Set this near 0 for deterministic factual answers, especially on NVIDIA.
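Why a value near 0 gives deterministic answers can be seen with temperature-scaled softmax sampling. This sketch assumes the Fact/Fiction slider acts as a sampling temperature, which the text implies but does not state. Dividing logits by a small temperature concentrates nearly all probability on the top token, and a temperature of 0 is treated as greedy argmax.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Token probabilities after dividing logits by the temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Greedy (argmax) when temperature <= 0, otherwise sample from the tempered softmax."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]
print(softmax_with_temperature(logits, 0.2)[1])  # ≈ 0.993: almost always the top token
print(softmax_with_temperature(logits, 1.0)[1])  # ≈ 0.665: noticeably more variety
```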

🦙 Import Model to Ollama

📦 Import GGUF Model

ℹ️ About: This will import your trained model into Ollama, allowing you to run it locally using the ollama run command.

⚠️ No GGUF file? If you don't see any GGUF files below, convert your trained model to GGUF first.

Model name: the name you use with ollama run <name>.
Temperature: controls randomness (0.0 = deterministic, 1.0+ = creative).
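Under the hood, importing a GGUF into Ollama amounts to writing a Modelfile and running `ollama create`. This sketch uses only the `FROM` and `PARAMETER temperature` Modelfile directives. The GGUF file name and model name are hypothetical, and how the dashboard builds its own Modelfile is not stated.

```python
import tempfile
from pathlib import Path


def write_modelfile(gguf_path, temperature, out_dir):
    """Write a minimal Ollama Modelfile pointing at a local GGUF file."""
    text = (
        f"FROM {Path(gguf_path).resolve()}\n"
        f"PARAMETER temperature {temperature}\n"
    )
    modelfile = Path(out_dir) / "Modelfile"
    modelfile.write_text(text, encoding="utf-8")
    return modelfile


def create_command(name, modelfile):
    """Argument list for `ollama create`; run it with subprocess.run(..., check=True)."""
    return ["ollama", "create", name, "-f", str(modelfile)]


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    mf = write_modelfile("model-q4_k_m.gguf", 0.2, out)  # hypothetical GGUF file name
    print(mf.read_text(encoding="utf-8"))
    print(" ".join(create_command("my-model", mf)))
    # Afterwards: ollama run my-model
```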

☁️ Cloud deploy: Use the second button to upload the selected GGUF and create the model on ollama.ayestaran.dev. This requires SSH access from the dashboard host to the server.

☁️ Ollama Cloud Server Admin

ℹ️ About: Manage models on ollama.ayestaran.dev. Listing uses the remote tags endpoint, and deletion uses SSH access from the dashboard host.
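A sketch of the two remote operations described above. Listing reads Ollama's `GET /api/tags` endpoint, whose JSON body contains a `models` array with `name` fields. Deletion runs `ollama rm` over SSH. The `ssh_target` value (e.g. `user@host`) and the base URL are placeholders, not the server's real settings. Because ssh passes the remote command through a shell, the model name is quoted.

```python
import json
import shlex
import urllib.request


def parse_tags(payload):
    """Extract model names from an Ollama /api/tags response body."""
    data = json.loads(payload)
    return sorted(model["name"] for model in data.get("models", []))


def list_remote_models(base_url, timeout=10):
    """GET <base_url>/api/tags on the remote server and return its model names."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_tags(resp.read())


def ssh_delete_command(ssh_target, model):
    """Argument list that runs `ollama rm <model>` on the server over SSH."""
    return ["ssh", ssh_target, "ollama rm " + shlex.quote(model)]
```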

📊 Training Artifacts


📊 3D Model Visualizations

Interactive 3D visualizations of your model's training process, checkpoints, embeddings, layer structures, and internal LLM topology.


Interactive Attention Flow

🧠 3D LLM Attention Explorer

A dedicated scene for inspecting how query, key, and value paths move through a transformer block. The layout is intentionally closer to a cinematic architecture diagram than the existing topology plot, with Q/K/V weight slabs, vector stages, layer norm, attention scoring, and animated token flow.
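The Q/K/V stages, attention scoring and aggregation that the scene visualizes correspond to scaled dot-product attention. This single-head NumPy sketch uses toy sizes and omits the causal mask, multiple heads, layer norm and the residual path for brevity.

```python
import numpy as np


def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (no causal mask) over token vectors x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                      # separate Q, K and V projections
    scores = q @ k.T / np.sqrt(q.shape[-1])                   # query-key scores, scaled by sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys for each query
    return weights @ v, weights                               # weighted sum of values


rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 8))                          # 5 tokens, model width 8
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
output, attn_weights = attention(tokens, w_q, w_k, w_v)
```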

What You Can Inspect

Separate Q, K, and V stages, token score routing, residual handoff, and downstream attention aggregation in one navigable 3D scene.

Interaction Model

Orbit, pan, and zoom the scene. Click a subsystem to pin metadata and use the scene controls to jump straight to Q, K, V, or the attention matrix.

Visual Direction

The page uses a hand-built scene rather than Plotly so the composition can read more like a technical explainer, matching the reference style more closely.

Immersive Neural Atlas

🧬 3D Neural Brain Journey

A dedicated brain-shaped scene for travelling through your trained model, now with a semantic Embedding Galaxy mode. Switch between the structural 3D brain and a token-star universe where clusters become constellations, dense regions become nebulae, and model-specific concepts form their own colored territories.

Choose which local model checkpoint powers the Brain Journey and Semantic Universe views.
How To Read It

Brain mode shows the residual spine, attention branches, and sampled learned pathways from the real checkpoint. Galaxy mode remaps sampled token embeddings into a 3D star field.
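One common way to remap sampled token embeddings into a 3D star field is to project them onto their top three principal components. This NumPy sketch uses PCA via SVD as an illustrative technique. The text does not say which projection Galaxy mode uses, which could be PCA, UMAP, t-SNE or something else. The function returns the sampled row indices so each point can be mapped back to its token.

```python
import numpy as np


def embeddings_to_3d(emb, sample=None, seed=0):
    """Project (optionally sampled) embedding rows onto their top-3 principal components.

    Returns (row_indices, coords) so each 3D point can be mapped back to its token id.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(len(emb))
    if sample is not None and sample < len(emb):
        idx = np.sort(rng.choice(len(emb), size=sample, replace=False))
    chosen = emb[idx]
    centered = chosen - chosen.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return idx, centered @ vt[:3].T
```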

Journey Model

Use the built-in waypoints to jump from input cortex to mid-layer reasoning and then into the output crown. Click any node to pin its layer, tensor path, and sampled row metrics.

Practical Constraint

This atlas is sampled; it does not draw every edge. Rendering every trained connection in a browser is not feasible at LLM scale, so the scene shows only the strongest learned pathways to stay explorable.
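Keeping only the strongest pathways can be sketched as top-k selection by absolute weight. Ranking by magnitude is an assumption here; the atlas may score pathways differently. `np.argpartition` avoids fully sorting a large weight matrix before the small top-k slice is ordered.

```python
import numpy as np


def strongest_edges(weight, k):
    """Return (row, col, value) for the k largest-magnitude entries of a weight matrix."""
    flat = np.abs(weight).ravel()
    k = min(k, flat.size)
    if k <= 0:
        return []
    top = np.argpartition(flat, -k)[-k:]    # unordered indices of the k largest magnitudes
    top = top[np.argsort(flat[top])[::-1]]  # order them from strongest to weakest
    rows, cols = np.unravel_index(top, weight.shape)
    return [(int(r), int(c), float(weight[r, c])) for r, c in zip(rows, cols)]
```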

🛰️ Visit Logs

Recent server-side page visits captured in SQLite. Timestamps are shown in UTC.
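A minimal sketch of logging visits to SQLite with UTC timestamps and reading them back for the counters. The `visits` table layout is hypothetical; the dashboard's real schema is not documented here.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table layout for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS visits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    path TEXT NOT NULL,
    visited_at TEXT NOT NULL
)
"""


def log_visit(conn, path):
    """Record one page visit with an ISO-8601 UTC timestamp."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute("INSERT INTO visits (path, visited_at) VALUES (?, ?)", (path, stamp))
    conn.commit()


def recent_visits(conn, limit=50):
    """Return (total_visits, newest rows up to limit) for the Total Visits / Showing counters."""
    total = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
    rows = conn.execute(
        "SELECT path, visited_at FROM visits ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return total, rows


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for page in ("/", "/chat", "/artifacts"):
    log_visit(conn, page)
total, shown = recent_visits(conn, limit=2)
```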

Total Visits and Showing: the number of recorded visits and the number currently listed.

🛠️ Technical Stack

Overview of the Phase-1 ML/DL, Phase-2 classical ML, and Phase-3 transformer workflows that power this dashboard across NVIDIA CUDA and Apple Silicon MLX/MPS environments.

🧠 Core Deep Learning Framework

PyTorch 2.0+

Primary runtime for Phase-1 CNN/MLP/RNN training, custom transformer experiments, Phase-3 Vision Transformer pipelines, and shared optimization and export flows.

Transformers 4.30+

Hugging Face stack for tokenization, pre-trained checkpoints, compact text transformers, and PEFT-based LoRA fine-tuning workflows.

CUDA / MPS / MLX

CUDA accelerates NVIDIA training and quantized fine-tuning workloads.

Apple Silicon (MPS / MLX): Native GPU acceleration for M-series processors via Metal Performance Shaders, with MLX-LM enabling Apple-native quantized LoRA and QLoRA training and fused export flows.

🏗️ Model Coverage & Training Workflows

Model Coverage (transformer.py + backend/ml_models)

  • Phase 1 ML/DL Models - CNN, MLP, and RNN training flows with CSV, JSONL, and image dataset support
  • Custom Transformer Blocks - Attention, positional encoding, feed-forward layers, and decoder-oriented experiments
  • Phase 2 Classical ML - Random Forest, SVM, and Logistic Regression registries with CSV schema inspection
  • Phase 3 Models - Vision Transformer, MicroLLaMA, and MiniLLaMA workflows with optional ONNX export plus GGUF conversion for the text-model path
  • Shared Dataset Pipeline - Built-in upload, sample-dataset generation, preview, validation, and run history flows

Training Workflows (finetune.py / finetune_mlx.py / backend/ml_models)

  • Gradient Accumulation & MPS Safety - Automatic batch-size and sequence-length adjustments for constrained Apple Silicon memory
  • LoRA / QLoRA Backends - PEFT plus bitsandbytes on CUDA and MLX-native quantized fine-tuning on Apple Silicon
  • Resume, Validation & Scheduling - Cosine scheduling, gradient checkpointing, resume-from-checkpoint, and epoch validation samples
  • Export Paths - SafeTensors, GGUF, ONNX, and artifact bundles for local deployment and comparison
  • Telemetry & History - Training status, logs, history, and downloadable artifacts surfaced directly in the dashboard
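The automatic batch-size and sequence-length adjustment can be illustrated with gradient accumulation. The memory caps are hypothetical inputs, not the dashboard's real MPS limits. The sketch picks the largest micro-batch that divides the requested batch, so micro_batch × grad_accum_steps equals the requested effective batch exactly. In the training loop, each micro-batch loss is divided by grad_accum_steps and the optimizer steps once per grad_accum_steps micro-batches.

```python
def plan_batches(requested_batch, requested_seq_len, max_micro_batch, max_seq_len):
    """Fit a requested batch into memory caps while keeping the effective batch size.

    The micro-batch is the largest divisor of requested_batch that fits the cap, so
    micro_batch * grad_accum_steps == requested_batch.
    """
    if requested_batch < 1:
        raise ValueError("requested_batch must be >= 1")
    cap = max(1, min(requested_batch, max_micro_batch))
    micro = max(d for d in range(1, cap + 1) if requested_batch % d == 0)
    return {
        "micro_batch": micro,
        "grad_accum_steps": requested_batch // micro,
        "seq_len": min(requested_seq_len, max_seq_len),  # truncate long sequences to the cap
    }
```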

🌐 Web Dashboard & Visualization

Flask 3.0+

Flask serves the unified Phase-1, Phase-2, and Phase-3 dashboard, training APIs, dataset upload routes, telemetry, and artifact export endpoints.

Plotly 5.20+

Interactive 3D visualizations for checkpoints, embeddings, layer structures, neural atlases, and interpretability scenes.

Marked.js

Markdown parsing for rendering documentation and workflow guides directly in the dashboard.

📊 Data Processing & System Monitoring

NumPy 1.24+

Array-centric processing for tabular, text, image, and geometry workflows across ML and deep-learning training.

PSUtil 5.9+

Real-time system and process monitoring for CPU, memory, GPU, and disk usage tracking.

TQDM 4.65+

Progress reporting for fine-tuning, classical ML jobs, dataset conversion, and export pipelines with ETA estimates.

🦙 Model Export & Deployment

SafeTensors 0.4+

Checkpoint and adapter serialization for transformer fine-tuning, resume bundles, and model conversion pipelines.

Ollama Integration

GGUF export path for PyTorch and Apple-native MLX outputs, with direct Ollama import for local inference.

GGUF / ONNX Runtime

llama.cpp quantization supports GGUF deployment. ONNX and ONNX Runtime cover Phase-2 exports and Phase-3 ONNX exports, and the Phase-3 text models can additionally be converted to GGUF.

🔧 ML, Tracking & Inference Tooling

TensorBoard 2.12+

Loss, learning-rate, and gradient tracking for deep-learning runs and checkpoint inspection.

Weights & Biases

Experiment tracking and collaboration platform for machine learning projects.

Scikit-learn 1.2+

Phase-2 classical ML stack for Random Forest, SVM, and Logistic Regression training, metrics, and optional skl2onnx export.
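A sketch of the Phase-2 flow with scikit-learn: fit the three named model families, score them on a held-out split, and pickle each model into the bytes a `.pkl` download would contain. The bundled Iris dataset and the hyperparameters are illustrative choices, not the dashboard's. SVM and logistic regression are wrapped with feature scaling, a common choice. For ONNX, skl2onnx provides converters such as `to_onnx`, not shown here.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

scores, artifacts = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # held-out accuracy
    artifacts[name] = pickle.dumps(model)       # serialised model bytes

restored = pickle.loads(artifacts["svm"])        # round-trip check of one artifact
print(scores)
```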

🔬 Transformer Interpretability Stack

The LLM Training Dashboard progressively exposes the internal logic of large language models through twelve hierarchical layers, moving from high-level semantic geometry through neuron-level concept discovery to mechanistic circuits.

  1. Embedding Galaxy - Constructs the top-level semantic space, mapping tokens into a unified representational geometry.
  2. Brain Atlas - Defines the macro-architecture of the model, showing how major regions such as attention, MLP, and embedding blocks interconnect.
  3. Tensor Microarchitecture - Visualizes tensor statistics and heatmaps for fine-grained inspection of weight distributions and activation patterns.
  4. Head-Aware Q/K/V Decomposition - Separates attention heads to analyze norms, sparsity, and per-head behavior.
  5. Activation-Path Visualization - Traces query, key, and value activations through attention weights to reveal how information flows within a layer.
  6. Multi-Head Interaction Map - Examines horizontal structure through head-to-head similarity, clustering, redundancy, and specialization, including syntax, induction, and negation heads.
  7. Layer-to-Layer Activation Flow - Explores vertical structure by showing how outputs propagate across layers to form emergent circuits and conceptual hierarchies.
  8. MLP Neuron Concept Discovery - Discovers concept neurons and feature detectors in MLP layers that respond to interpretable patterns such as numbers, names, and emotions.
  9. Feature-Space Geometry - Visualizes principal components, concept directions, neuron clusters, and subspaces for specific behaviors.
  10. Time Drift Visualization - Tracks how embeddings, attention patterns, and neuron behaviors shift across checkpoints to reveal when representations diverge or capabilities emerge.
  11. Gradient Flow & Influence Maps - Helps explain why a model chose its output by tracing token-level gradients, attribution signals, and causal influence pathways.
  12. Mechanistic Circuits & Subgraph Extraction - A multi-level framework that reveals how transformer models represent, transform, and reason through their internal mechanisms.
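As one concrete example of the head-to-head similarity and redundancy analysis in the Multi-Head Interaction Map, this sketch compares heads by the cosine similarity of their flattened attention maps and flags near-duplicate pairs. The metric and the 0.95 threshold are illustrative assumptions, not the dashboard's method. The toy data deliberately duplicates one head.

```python
import numpy as np


def head_similarity(attn):
    """attn: (heads, seq, seq) attention maps for one layer -> (heads, heads) cosine similarity."""
    flat = attn.reshape(attn.shape[0], -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return flat @ flat.T


def redundant_pairs(sim, threshold=0.95):
    """Head pairs (i < j) whose attention patterns are nearly identical."""
    h = sim.shape[0]
    return [(i, j, float(sim[i, j])) for i in range(h) for j in range(i + 1, h) if sim[i, j] >= threshold]


rng = np.random.default_rng(0)
attn = rng.random((4, 6, 6))
attn /= attn.sum(axis=-1, keepdims=True)  # each row becomes a probability distribution
attn[3] = attn[0]                         # make head 3 a duplicate of head 0
sim = head_similarity(attn)
pairs = redundant_pairs(sim)
```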

🎯 System Architecture

Data Pipeline: CSV / JSONL / image-folder input → preview and validation → dataset builders → training loaders

Training Loop: Phase-1 deep learning loops / Phase-2 classical fit / Phase-3 transformer fine-tuning → telemetry → checkpoints and artifacts

Model Export: SafeTensors / .pkl / ONNX / GGUF → Ollama / ONNX Runtime / local artifact downloads

Dashboard: Flask API ↔ Phase-1/2/3 controls ↔ real-time updates ↔ 3D visualizations ↔ system monitoring

🚀 Built for Production-Ready LLM Training
