See what happens inside a large language model as it thinks. Vanilla visualizes every step of inference in real time — token probabilities, attention patterns, layer activations, residual flows, and parameter space — with four modes from deep research to kid-friendly animations. v1.0.4: VLM image input, model discovery, Metal-accelerated heatmaps, animated transitions, and error recovery.

Expert Mode — Token probability analysis, attention insights, and layer-by-layer breakdown
Four modes. One vision.
From deep research to a child's first encounter with AI, each mode is designed for its audience.
Full analysis dashboard: Top-K probability bars, attention heatmaps, Logit Lens layer-by-layer predictions, residual flow magnitudes, GDN trends, and parameter space statistics.
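
For readers curious what sits behind a Top-K probability bar, here is a minimal Swift sketch of the standard computation: softmax over a logit vector, then keep the k most likely tokens. The function name `topK` and the sample logits are illustrative assumptions, not Vanilla's actual code.

```swift
import Foundation

/// Hypothetical helper (not Vanilla's API): softmax over raw logits,
/// then return the k highest-probability entries, most likely first.
func topK(logits: [Double], k: Int) -> [(index: Int, probability: Double)] {
    guard let maxLogit = logits.max(), k > 0 else { return [] }
    // Subtract the maximum logit before exponentiating for numerical stability.
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    let ranked = exps.enumerated()
        .map { (index: $0.offset, probability: $0.element / total) }
        .sorted { $0.probability > $1.probability }
    return Array(ranked.prefix(k))
}

// Made-up logits for a four-token vocabulary.
let top = topK(logits: [2.0, 1.0, 0.1, -1.0], k: 2)
for entry in top {
    print("token \(entry.index): \(entry.probability)")
}
```

Each returned probability would map to the length of one bar. A Logit Lens view applies the same idea to intermediate layers instead of only the final one.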

Step-by-step educational walkthrough: from tokenization to attention to final prediction. Perfect for learning how transformers work.

An animated AI sprite character guides children through the inference journey with colorful token cards and playful interactions.

True pause with DispatchSemaphore — freeze inference at any token, step forward one at a time, rewind and inspect any layer.
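
One way a semaphore-based pause can work, sketched in Swift: the generation loop checks a gate before each token and blocks on a `DispatchSemaphore` while paused. The names `PauseGate` and `TokenLog` are hypothetical, a single worker thread is assumed, and this is not necessarily how Vanilla implements it.

```swift
import Foundation

/// Hypothetical pause gate for a single generation thread (not Vanilla's API).
final class PauseGate: @unchecked Sendable {
    private let semaphore = DispatchSemaphore(value: 0)
    private let lock = NSLock()
    private var paused = false
    private var waiting = false

    /// Ask the worker to stop before its next token.
    func pause() {
        lock.lock(); paused = true; lock.unlock()
    }

    /// Let a blocked worker produce exactly one token, staying paused.
    func step() { releaseIfWaiting() }

    /// Leave paused mode and release the worker if it is blocked.
    func resume() {
        lock.lock(); paused = false; lock.unlock()
        releaseIfWaiting()
    }

    /// Called by the generation loop before each token.
    func waitIfPaused() {
        lock.lock()
        let shouldWait = paused
        if shouldWait { waiting = true }
        lock.unlock()
        if shouldWait { semaphore.wait() } // blocks the thread; no busy-waiting
    }

    private func releaseIfWaiting() {
        lock.lock()
        let wasWaiting = waiting
        waiting = false
        lock.unlock()
        if wasWaiting { semaphore.signal() }
    }
}

/// Thread-safe record of generated tokens, used only for this demo.
final class TokenLog: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [String] = []
    func append(_ token: String) {
        lock.lock(); storage.append(token); lock.unlock()
    }
    var tokens: [String] {
        lock.lock()
        defer { lock.unlock() }
        return storage
    }
}

let gate = PauseGate()
let log = TokenLog()
let finished = DispatchSemaphore(value: 0)

gate.pause()
DispatchQueue.global().async {
    for token in ["The", "cat", "sat"] { // stand-in for real decoding
        gate.waitIfPaused()
        log.append(token)
    }
    finished.signal()
}
gate.step()   // releases one token if the worker is already blocked
gate.resume() // lets the remaining tokens through
finished.wait()
print(log.tokens)
```

Because the worker blocks inside `semaphore.wait()`, inference genuinely stops rather than running ahead behind a frozen display. Rewinding would additionally require storing per-token activations, which this sketch leaves out.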

Sequence summary with key findings: confidence distribution, most decisive layers, and generation statistics at a glance.

Built with Swift + Metal on Apple Silicon. Metal-accelerated attention heatmaps, real-time particle rendering, and smooth 60fps animated transitions.

Built-in recommended model panel with local scanning. First-time users see suggestions immediately: pick a model and start visualizing in 30 seconds.

Actionable error states with retry, model switching, and timeout protection. No more app restarts when things go wrong.

Understand model behavior at every layer
Debug and validate model outputs visually
Teach AI concepts to anyone — even kids
Runs 100% locally, no cloud dependency
Interactive exploration, not static charts
Built for Apple Silicon with Metal GPU acceleration
Vanilla processes everything on your device. Zero data collection. No account required.