LLM Inference Visualizer

Vanilla

See what happens inside a large language model as it thinks. Vanilla visualizes every step of inference in real time — token probabilities, attention patterns, layer activations, residual flows, and parameter space — with four modes from deep research to kid-friendly animations. v1.0.4: VLM image input, model discovery, Metal-accelerated heatmaps, animated transitions, and error recovery.

Download DMG

Expert Mode — Token probability analysis, attention insights, and layer-by-layer breakdown

Features

Four modes. One vision.

From deep research to a child's first encounter with AI — every perspective is designed.

Expert Mode

Full analysis dashboard: Top-K probability bars, attention heatmaps, Logit Lens layer-by-layer predictions, residual flow magnitudes, GDN trends, and parameter space statistics.
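As a rough illustration of what the Top-K probability bars represent, here is a minimal Python sketch that turns raw logits into probabilities and ranks the top candidates. It is not Vanilla's code: the vocabulary, the logit values, and the `top_k` helper are all made up for the example.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, vocab, k=3):
    """Return the k most likely (token, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical four-token vocabulary and logits, for illustration only.
vocab = ["cat", "dog", "the", "sat"]
logits = [2.0, 1.0, 0.5, -1.0]

# Draw each candidate as a text bar scaled to its probability.
for token, p in top_k(logits, vocab, k=3):
    print(f"{token:>4} {'#' * round(p * 40)} {p:.3f}")
```

The longest bar belongs to the token the model would most likely emit next; a flat set of bars signals low confidence.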

Guided Mode

Step-by-step educational walkthrough: from tokenization to attention to final prediction. Perfect for learning how transformers work.

Kid Mode

An animated AI sprite character guides children through the inference journey with colorful token cards and playful interactions.

Pause & Step

True pause with DispatchSemaphore: freeze inference at any token, step forward one token at a time, or rewind and inspect any layer.
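The idea behind semaphore-based stepping can be sketched in a few lines. This is a Python analog that uses `threading.Semaphore` in place of Swift's `DispatchSemaphore`; the `generate` function and the token list are hypothetical stand-ins for a real decode loop, and Vanilla's actual implementation may differ. Rewind is not covered here.

```python
import queue
import threading

def generate(tokens, gate, out):
    """Stand-in for a decode loop: needs one permit before emitting each token."""
    for tok in tokens:
        gate.acquire()   # blocks here while paused (no permits available)
        out.put(tok)

gate = threading.Semaphore(0)    # zero permits, so generation starts paused
out = queue.Queue()              # emitted tokens, read by the "UI" side
tokens = ["The", "cat", "sat"]   # hypothetical output sequence

worker = threading.Thread(target=generate, args=(tokens, gate, out), daemon=True)
worker.start()

gate.release()                   # step: allow exactly one token
first = out.get(timeout=1)
gate.release()                   # step again
second = out.get(timeout=1)

# The worker is now blocked before the third token, so nothing else arrives.
print(first, second, "| still paused:", out.empty())

gate.release()                   # let the final token through
worker.join(timeout=1)
```

Because the worker blocks inside `acquire()` rather than polling a flag, it does no work while paused, which is what makes the pause "true".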

Review Mode

Sequence summary with key findings: confidence distribution, most decisive layers, and generation statistics at a glance.

Native Performance

Built with Swift + Metal on Apple Silicon. Metal-accelerated attention heatmaps, real-time particle rendering, and smooth 60 fps animated transitions.

Model Discovery

Built-in recommended model panel with local scanning. First-time users see suggestions immediately: pick a model and start visualizing in 30 seconds.

Error Recovery

Actionable error states with retry, model switching, and timeout protection. No more app restarts when things go wrong.

Why Vanilla

Understand model behavior at every layer

Debug and validate model outputs visually

Teach AI concepts to anyone — even kids

Runs 100% locally, no cloud dependency

Interactive exploration, not static charts

Built for Apple Silicon with Metal GPU acceleration

Vanilla processes everything on your device. Zero data collection. No account required.