We're working around the clock to bring you the ultimate on-device AI experience. Below are our latest real-device benchmarks — every number measured on actual hardware, not simulators.
9B parameter model, 20-turn continuous conversation, 20,000+ token context — equivalent to a short book.
tok/s = tokens generated per second, higher is smoother. Standard = stock open-source MLX.
| Turn | Context (tokens) | Edge Runtime (tok/s) | Standard (tok/s) | Gain |
|---|---|---|---|---|
| T1 | 1K | 16.5 | 17.8 | -7% |
| T5 | 5K | 9.6 | 6.8 | +41% |
| T10 | 10K | 6.9 | 5.4 | +28% |
| T15 | 16K | 4.8 | 3.6 | +33% |
| T20 | 21K | 3.5 | 1.8 | +94% |
Qwen3.5-9B · 20-turn deep technical discussion · Edge Runtime
Standard implementation crashes on turn 2 on iPhones
Without Edge Runtime's core inference algorithms, the stock implementation crashes on turn 2 on both the iPhone 17 Pro and the iPhone 17e; with them, all 20 turns complete smoothly on the same devices.
We tested 20 turns totaling 21,000+ tokens. With our current algorithms we can sustain roughly 26,000 tokens of continuous conversation, the equivalent of reading and remembering a short book, all on your phone.
Note: iPhone Air, iPhone 17 Pro, and Pro Max all have 12GB of physical RAM, yet iOS limits each app to roughly 6GB. We're not sure why Apple imposes this ceiling. If Apple unlocks more memory for apps in the future, we believe on-device AI performance will be even more impressive.
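The budget is something an app can observe directly. Below is a minimal sketch using Apple's public `os_proc_available_memory` API (iOS 13+) to check how much of the per-app allotment remains before loading a model; the gigabyte formatting is ours, not part of any SDK.

```swift
import Foundation
import os

// Ask iOS how much memory this process may still allocate.
// On a 12GB iPhone 17 Pro, this tops out near the ~6GB
// per-app ceiling described in the note above.
let remainingBytes = os_proc_available_memory()
let remainingGB = Double(remainingBytes) / 1_073_741_824
print(String(format: "Remaining app memory budget: %.2f GB", remainingGB))
```

Apps that need more headroom can also request Apple's `com.apple.developer.kernel.increased-memory-limit` entitlement, which raises the ceiling somewhat on supported devices.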
AI-powered model analysis & surgical optimization
Intelligent analysis engine that automatically detects redundant layers, inefficient neurons, and optimization opportunities. Our proprietary 7-step progressive pipeline performs neuron-level surgical pruning — not coarse-grained compression — with real-time perplexity monitoring to guarantee output quality.
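As an illustration of that quality gate, here is a minimal sketch of perplexity-based acceptance: perplexity is the exponential of the mean negative log-likelihood over held-out tokens, and a pruning step is kept only while it stays within a small tolerance of the unpruned baseline. The function names and threshold are illustrative, not the pipeline's actual API.

```swift
import Foundation

/// Perplexity = exp(mean negative log-likelihood) over held-out tokens.
func perplexity(tokenLogProbs: [Double]) -> Double {
    let meanNLL = -tokenLogProbs.reduce(0, +) / Double(tokenLogProbs.count)
    return exp(meanNLL)
}

/// Keep a pruning step only if perplexity regresses by at most
/// `tolerance` (relative) versus the unpruned baseline.
func acceptPruningStep(baseline: Double, pruned: Double,
                       tolerance: Double = 0.02) -> Bool {
    pruned <= baseline * (1 + tolerance)
}

// Example: baseline PPL 8.41, PPL 8.52 after pruning one neuron group.
// 8.52 / 8.41 ≈ 1.013, within the 2% budget, so the step is kept.
let keep = acceptPruningStep(baseline: 8.41, pruned: 8.52)  // true
```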
Proprietary inference algorithms for Apple Silicon
Purpose-built inference engine with proprietary ANE-GPU co-scheduling, disaggregated inference architecture, and zero-copy model loading. Not a wrapper — original algorithms that achieve 11.3x prefill speedup and 79% GPU power reduction through ANE batch dispatch and concurrent pipeline execution.
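While the engine itself is proprietary, the idea behind zero-copy loading can be shown in a few lines: memory-map the weight file so pages are faulted in on demand rather than copied into the app heap. This is a conceptual sketch using Foundation's mapped-read option, not Edge Runtime's actual loader.

```swift
import Foundation

/// Memory-map a weight file instead of copying it into RAM.
/// With `.mappedIfSafe`, Foundation backs the Data with mmap'd
/// pages where possible, so loading a multi-gigabyte model
/// does not double its memory footprint.
func mapWeights(at url: URL) throws -> Data {
    try Data(contentsOf: url, options: .mappedIfSafe)
}
```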
The definitive edge AI deployment solution
The only end-to-end pipeline from optimized model to published App Store app. Integrates Edge Runtime's proprietary inference, On-Demand Resources for intelligent model delivery, and built-in ESG carbon tracking — a complete deployment solution that no other platform offers.
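Model delivery builds on Apple's public On-Demand Resources API. A minimal sketch, assuming the weights are bundled under a hypothetical "qwen3.5-0.8b" tag:

```swift
import Foundation

// Fetch model weights tagged in the app bundle as On-Demand Resources.
// NSBundleResourceRequest is Apple's public ODR API; the tag is ours.
let request = NSBundleResourceRequest(tags: ["qwen3.5-0.8b"])
try await request.beginAccessingResources()
// The tagged files are now accessible via Bundle.main; call
// request.endAccessingResources() once the model is unloaded.
```

This keeps the initial App Store download small and lets the model arrive only when the user first needs it.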
```swift
import EdgeInference

// Create the engine and load a model from the local registry.
let engine = LLMEngine()
try await engine.load(config: .find(modelID: "qwen3.5-0.8b")!)

// Stream generated tokens to the console as they arrive.
for try await chunk in engine.generate(
    messages: [.user("What is edge AI?")]
) {
    print(chunk.text, terminator: "")
}
```

5 lines of Swift: load a model, stream tokens. That's it.
Sign up to be notified when the AtomGradient Edge suite is publicly available. We'll send setup guides and invite you to our developer preview.
Hundreds of developers already on the waitlist
AtomGradient is bringing intelligence to every edge, *not just Apple*. Stay tuned.