Each card uses downsampled Q, K, and V head partitions from the selected attention layer.
Per-head Q, K, and V slices stay read-only and downsampled for browser-safe inspection.
Provide a short sequence, select the query token index, and compute a compact read-only activation path for the current head.
Shows the selected token's query vector plus compact key and value examples for the current sequence.
Displays the softmax profile across the sequence and the strongest attended tokens.
Summarizes the selected token, the top attended targets, and the resulting head output vector.
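The activation path described above (query vector, softmax profile, strongest attended tokens, head output) can be sketched for a single head as follows. This is a minimal illustration in plain Python, not the tool's actual implementation: the function name `head_attention`, the `top_k` parameter, and the absence of a causal mask are all assumptions made for the example.

```python
import math

def head_attention(query, keys, values, top_k=3):
    """Scaled dot-product attention for one query token in one head.

    Hypothetical helper: assumes no causal mask and plain Python lists.
    """
    d = len(query)
    # Scaled dot product between the selected query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the sequence.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Indices of the most strongly attended tokens, strongest first.
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    # Head output: attention-weighted sum of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return weights, top, output

weights, top, output = head_attention(
    query=[2.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    values=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    top_k=2,
)
```

Here `weights` corresponds to the softmax profile, `top` to the strongest attended tokens, and `output` to the resulting head output vector.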
Compare head similarity, clusters, redundancies, specialization, and anomalies for the current layer.
Cosine similarity across lightweight per-head Q/K/V feature vectors for this layer.
PCA projection of head features. Nearby points indicate similar query/key/value behaviour.
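The cosine-similarity comparison between heads can be illustrated with a short sketch. It assumes each head has already been reduced to one flat feature vector. How the tool builds those vectors from Q/K/V is not specified here, and the names `cosine_similarity` and `similarity_matrix` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(features):
    """Pairwise cosine similarity; entry [i][j] compares head i with head j."""
    return [[cosine_similarity(a, b) for b in features] for a in features]

# Three illustrative head feature vectors (made-up values).
head_features = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
matrix = similarity_matrix(head_features)
```

Heads whose vectors point in the same direction score near 1 regardless of magnitude, which is why cosine similarity is a natural choice for spotting redundant heads.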
Track how one selected head contributes to the residual stream and which heads respond most strongly in the next layers.
Use a short sequence and selected token to compute a compact, read-only cross-layer flow from the current head.
Shows the selected source head and the strongest responding heads across the configured depth.
Each row represents one hop and each column shows the influence score for a target head in the next layer.
Trace per-token saliency, attribution, and causal influence for the current head without computing full Jacobians.
Choose an output token, attribution method, and optional causal patch strategy to explain why this head influenced the selected output.
Each token shows gradient saliency, attribution magnitude, and combined influence toward the selected output token.
Top input tokens are ranked by combined saliency, attribution, and local attention influence.
Patch individual token embeddings with lightweight baselines and compare the patched residual objective.
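The patching comparison can be sketched as a loop that swaps one token embedding at a time for a baseline and records how much the objective changes. Everything named here is an assumption for illustration: `patch_effects`, the `"mean"` and `"zero"` baseline choices, and the `objective` callable standing in for whatever residual objective the tool evaluates.

```python
def patch_effects(embeddings, objective, baseline="mean"):
    """Return, per token, the drop in the objective when that token is patched.

    `objective` is a hypothetical callable mapping a list of embeddings to a float.
    """
    dim = len(embeddings[0])
    if baseline == "mean":
        # Average embedding over the sequence.
        base = [sum(e[j] for e in embeddings) / len(embeddings) for j in range(dim)]
    elif baseline == "zero":
        base = [0.0] * dim
    else:
        raise ValueError(f"unknown baseline: {baseline}")

    clean = objective(embeddings)
    effects = []
    for i in range(len(embeddings)):
        # Copy the sequence so the original embeddings are left untouched.
        patched = [list(e) for e in embeddings]
        patched[i] = list(base)
        effects.append(clean - objective(patched))
    return effects

# Toy objective: sum of the first component of every embedding.
toy_objective = lambda embs: sum(e[0] for e in embs)
toy_embeddings = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
zero_effects = patch_effects(toy_embeddings, toy_objective, baseline="zero")
mean_effects = patch_effects(toy_embeddings, toy_objective, baseline="mean")
```

A large positive effect means the objective depended heavily on that token. A negative effect means the baseline was more favourable than the original embedding.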
Extract lightweight local circuits around the current behaviour, then jump straight into the exact head or activation-path target.
Choose the behaviour token, local depth, and behaviour label to rank nearby circuits without computing a full model-wide graph.
Ranked feature → head → layer → feature pathways near the current head and selected output token.
Compact directed subgraph linking selected features, heads, layers, and downstream behaviour nodes.
Inspect concept-like neurons, sparse detectors, sampled token correlations, and neuron directions for the selected feed-forward layer.
Sort and filter neurons by activation, sparsity, and concept-like scores.
Choose a neuron to inspect current-input activation, token correlations, and direction alignment.
Project neuron features or token embeddings into PCA space, overlay concept directions, cluster the geometry, and render behaviour subspaces.
Cluster counts, concept-basis support, and projected subspace metadata update as each computation finishes.
This page turns the trained checkpoint into a browser-safe 3D brain atlas. The scene uses real learned tensors, samples the strongest learned rows from each layer, and places them inside an outlined brain shell so you can orbit, inspect, and travel across the network.
Press Start Atlas to begin building the neural journey from the current trained checkpoint.