Interactive visualizations that break down core LLM concepts — from tokens to inference engines.
Understand the core building blocks of LLMs from scratch — tokens, embeddings, attention, quantization, training & inference.
Why does AI charge by tokens? How many tokens is one Chinese character?
How does AI turn words into numbers? From one-hot to semantic space
How does AI actually "understand" your words?
Why can phones run large models? What are INT4/INT8?
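One way to make the INT8 idea concrete is symmetric per-tensor quantization: map each float weight onto the integer range [-127, 127] with a single scale factor. A minimal NumPy sketch (the function names are illustrative, not a real library API):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: floats -> [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale for the whole tensor
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floats; error is at most half a quantization step."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, -0.07], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Storing `int8` instead of `float32` cuts memory 4x, which is what lets phones fit models that would otherwise exceed RAM; INT4 pushes the same idea further with a coarser grid.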
Why does training need hundreds of GPUs but inference just one?
Why can some models read entire books while others only read a page?
Explore inference acceleration, model compression, hardware architecture and AI Agents — from principles to AtomGradient research.
The two phases of LLM text generation: "understanding" and "outputting"
Why are tokens cheaper in long conversations? The secret of cache hits
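The "cache hit" intuition behind cheaper long-conversation tokens can be sketched with a toy prefix cache: work already done for a shared prefix is reused, so only the new suffix is computed. (Real engines cache per-layer attention key/value tensors; the `kv()` strings below are stand-ins.)

```python
class PrefixKVCache:
    """Toy prefix cache: reuse per-token work for the longest cached prefix."""

    def __init__(self):
        self.cache = {}  # tuple(token_ids) -> list of computed per-token states

    def process(self, tokens):
        # find the longest previously computed prefix of this request
        hit, states = 0, []
        for i in range(len(tokens), 0, -1):
            key = tuple(tokens[:i])
            if key in self.cache:
                hit, states = i, list(self.cache[key])
                break
        # only the suffix after the cache hit needs fresh computation
        for t in tokens[hit:]:
            states.append(f"kv({t})")  # stand-in for real attention K/V tensors
        self.cache[tuple(tokens)] = list(states)
        return len(tokens) - hit  # number of tokens actually computed

cache = PrefixKVCache()
cache.process([1, 2, 3, 4])        # cold: all 4 tokens computed
cache.process([1, 2, 3, 4, 5, 6])  # warm: only the 2 new tokens computed
```

Providers charge cached prefix tokens less precisely because the expensive per-token computation is skipped on a hit.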
Why is DeepSeek fast despite its 671B parameters?
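The short answer is Mixture-of-Experts: a router sends each token to only a few experts, so most of the 671B parameters sit idle on any given token. A toy sketch (the shapes, random weights, and gating are made up for illustration):

```python
import numpy as np

def moe_layer(x, experts, router_w, k=2):
    """Toy Mixture-of-Experts layer: route a token to its top-k experts only.
    With 8 experts and k=2, just 25% of expert parameters run per token."""
    logits = x @ router_w                # (n_experts,) routing scores
    topk = np.argsort(logits)[-k:]       # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                 # softmax over the selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, topk))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
experts = [(lambda W: (lambda x: x @ W))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts))
y = moe_layer(rng.standard_normal(d), experts, router_w, k=2)
```

Total parameter count sets memory cost, but active parameters per token set compute cost, which is why a sparse 671B model can decode quickly.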
How do large models "teach" small models? The art of knowledge transfer
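The classic knowledge-distillation objective softens both the teacher's and the student's output distributions with a temperature T, then minimizes the KL divergence between them. A minimal NumPy sketch:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard distillation practice."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * np.log(p / q))) * T * T
```

The softened targets carry the teacher's "dark knowledge" (relative probabilities of wrong answers), which is richer training signal than hard labels alone.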
How does AI learn to "speak human"? From pre-training to alignment
Apple Silicon's secret weapon: ANE+GPU co-inference
How does "draft-then-verify" double inference speed?
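The draft-then-verify loop of speculative decoding can be sketched with two stand-in models: a cheap drafter proposes several tokens, and the expensive target accepts the longest agreeing prefix, replacing the first mismatch with its own token. (Both models here are hypothetical toys; a real verifier checks all draft tokens in one parallel forward pass and uses probabilistic acceptance.)

```python
import random

def draft_model(prefix, k):
    """Hypothetical cheap draft model: proposes k next tokens."""
    random.seed(sum(prefix))  # deterministic toy behavior
    return [random.randint(0, 9) for _ in range(k)]

def target_model_next(prefix):
    """Hypothetical expensive target model: the 'correct' next token."""
    return sum(prefix) % 10

def speculative_step(prefix, k=4):
    """Draft k tokens, then verify them against the target model.
    Accept the agreeing prefix; the target's token replaces the first miss."""
    draft = draft_model(prefix, k)
    accepted = []
    for t in draft:
        expected = target_model_next(prefix + accepted)
        if t == expected:
            accepted.append(t)
        else:
            accepted.append(expected)
            break
    return accepted
```

Every accepted token matches what the target model would have produced, so the output is unchanged; the speedup comes from emitting several tokens per expensive verification step instead of one.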
Why is Apple Silicon uniquely suited for AI?
How to load models 20x faster? The magic of mmap
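The mmap trick: instead of `read()`-ing gigabytes of weights up front, map the file into the process's address space and let the OS page in only the bytes actually touched. A self-contained sketch using a small fake weights file (the filename and layout are invented for the demo):

```python
import mmap
import os
import struct
import tempfile

# create a fake "weights" file: 100k float32 values
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
n = 100_000
with open(path, "wb") as f:
    f.write(struct.pack(f"<{n}f", *([0.5] * n)))

# mmap maps the file into virtual memory; no data is read from disk
# until a page is touched, so "loading" is near-instant at any file size
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (w,) = struct.unpack_from("<f", mm, 4 * 123)  # touch one weight only
    mm.close()
```

A second benefit: mapped pages live in the OS page cache, so relaunching a model (or sharing weights across processes) skips the disk entirely.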
How can a phone speak in real time after 67% model compression?
How many steps to make your phone "see" photos?
Personalized AI without uploading data? Emergent intelligence from cross-domain integration
Why does "thinking before answering" make AI smarter?
How did AI evolve from "chatting" to "doing work"?
How does AI write code, run tests, and fix bugs? From Copilot to Claude Code