Interactive visualizations that break down core LLM concepts — from tokens to inference engines.
Understand the core building blocks of LLMs from scratch — tokens, embeddings, attention, quantization, training & inference.
Why does AI charge by tokens? How many tokens is one Chinese character?
How does AI turn words into numbers? From one-hot to semantic space
How does AI actually "understand" your words?
Why can phones run large models? What are INT4/INT8?
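One way to make the INT8 idea concrete is symmetric per-tensor quantization: map each float weight onto the integer range [-127, 127] with a single scale factor. A minimal NumPy sketch (the function names are illustrative, not a real library API):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: floats -> [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale for the whole tensor
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floats; error is at most half a quantization step."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, -0.07], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Storing `int8` instead of `float32` cuts memory 4x, which is what lets phones fit models that would otherwise exceed RAM; INT4 pushes the same idea further with a coarser grid.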
Why does training need hundreds of GPUs but inference just one?
Why can some models read entire books while others only read a page?
Explore inference acceleration, model compression, hardware architecture and AI Agents — from principles to AtomGradient research.
The two phases of LLM text generation: "understanding" and "outputting"
Why are tokens cheaper in long conversations? The secret of cache hits
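The "cache hit" intuition behind cheaper long-conversation tokens can be sketched with a toy prefix cache: work already done for a shared prefix is reused, so only the new suffix is computed. (Real engines cache per-layer attention key/value tensors; the `kv()` strings below are stand-ins.)

```python
class PrefixKVCache:
    """Toy prefix cache: reuse per-token work for the longest cached prefix."""

    def __init__(self):
        self.cache = {}  # tuple(token_ids) -> list of computed per-token states

    def process(self, tokens):
        # find the longest previously computed prefix of this request
        hit, states = 0, []
        for i in range(len(tokens), 0, -1):
            key = tuple(tokens[:i])
            if key in self.cache:
                hit, states = i, list(self.cache[key])
                break
        # only the suffix after the cache hit needs fresh computation
        for t in tokens[hit:]:
            states.append(f"kv({t})")  # stand-in for real attention K/V tensors
        self.cache[tuple(tokens)] = list(states)
        return len(tokens) - hit  # number of tokens actually computed

cache = PrefixKVCache()
cache.process([1, 2, 3, 4])        # cold: all 4 tokens computed
cache.process([1, 2, 3, 4, 5, 6])  # warm: only the 2 new tokens computed
```

Providers charge cached prefix tokens less precisely because the expensive per-token computation is skipped on a hit.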
Why is DeepSeek fast despite its 671B parameters?
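The short answer is Mixture-of-Experts: a router sends each token to only a few experts, so most of the 671B parameters sit idle on any given token. A toy sketch (the shapes, random weights, and gating are made up for illustration):

```python
import numpy as np

def moe_layer(x, experts, router_w, k=2):
    """Toy Mixture-of-Experts layer: route a token to its top-k experts only.
    With 8 experts and k=2, just 25% of expert parameters run per token."""
    logits = x @ router_w                # (n_experts,) routing scores
    topk = np.argsort(logits)[-k:]       # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                 # softmax over the selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, topk))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
experts = [(lambda W: (lambda x: x @ W))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts))
y = moe_layer(rng.standard_normal(d), experts, router_w, k=2)
```

Total parameter count sets memory cost, but active parameters per token set compute cost, which is why a sparse 671B model can decode quickly.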
How do large models "teach" small models? The art of knowledge transfer
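The classic knowledge-distillation objective softens both the teacher's and the student's output distributions with a temperature T, then minimizes the KL divergence between them. A minimal NumPy sketch:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard distillation practice."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * np.log(p / q))) * T * T
```

The softened targets carry the teacher's "dark knowledge" (relative probabilities of wrong answers), which is richer training signal than hard labels alone.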
How does AI learn to "speak human"? From pre-training to alignment
Apple Silicon's secret weapon: ANE+GPU co-inference
How does "draft-then-verify" double inference speed?
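The draft-then-verify loop of speculative decoding can be sketched with two stand-in models: a cheap drafter proposes several tokens, and the expensive target accepts the longest agreeing prefix, replacing the first mismatch with its own token. (Both models here are hypothetical toys; a real verifier checks all draft tokens in one parallel forward pass and uses probabilistic acceptance.)

```python
import random

def draft_model(prefix, k):
    """Hypothetical cheap draft model: proposes k next tokens."""
    random.seed(sum(prefix))  # deterministic toy behavior
    return [random.randint(0, 9) for _ in range(k)]

def target_model_next(prefix):
    """Hypothetical expensive target model: the 'correct' next token."""
    return sum(prefix) % 10

def speculative_step(prefix, k=4):
    """Draft k tokens, then verify them against the target model.
    Accept the agreeing prefix; the target's token replaces the first miss."""
    draft = draft_model(prefix, k)
    accepted = []
    for t in draft:
        expected = target_model_next(prefix + accepted)
        if t == expected:
            accepted.append(t)
        else:
            accepted.append(expected)
            break
    return accepted
```

Every accepted token matches what the target model would have produced, so the output is unchanged; the speedup comes from emitting several tokens per expensive verification step instead of one.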
Why is Apple Silicon uniquely suited for AI?
How to load models 20x faster? The magic of mmap
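The mmap trick: instead of `read()`-ing gigabytes of weights up front, map the file into the process's address space and let the OS page in only the bytes actually touched. A self-contained sketch using a small fake weights file (the filename and layout are invented for the demo):

```python
import mmap
import os
import struct
import tempfile

# create a fake "weights" file: 100k float32 values
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
n = 100_000
with open(path, "wb") as f:
    f.write(struct.pack(f"<{n}f", *([0.5] * n)))

# mmap maps the file into virtual memory; no data is read from disk
# until a page is touched, so "loading" is near-instant at any file size
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (w,) = struct.unpack_from("<f", mm, 4 * 123)  # touch one weight only
    mm.close()
```

A second benefit: mapped pages live in the OS page cache, so relaunching a model (or sharing weights across processes) skips the disk entirely.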
How can a phone speak in real time after 67% model compression?
How many steps to make your phone "see" photos?
Personalized AI without uploading data? Emergent intelligence from cross-domain integration
Why does "thinking before answering" make AI smarter?
How did AI evolve from "chatting" to "doing work"?
How does AI write code, run tests, and fix bugs? From Copilot to Claude Code