For Developers

Intelligence for every device. Defined by you.

Coming Soon

We're working around the clock to bring you the ultimate on-device AI experience. Below are our latest real-device benchmarks — every number measured on actual hardware, not simulators.

1. Optimize · Edge Studio: compress & optimize models
2. Integrate · Edge Runtime: on-device inference SDK
3. Ship · Edge Scaffold: generate a publishable app
Real-Device Benchmarks

20-turn deep conversations on your iPhone, no slowdown

9B parameter model, 20-turn continuous conversation, 20,000+ token context — equivalent to a short book.

20 turns · Continuous deep chat: no reset, no interruption
26K+ · Context capacity: a short book in memory
7.5 tok/s · T20 sustained speed: iPhone Air at turn 20
0 · Crashes: 8GB devices, fully stable

Edge Runtime vs Standard Implementation — iPad Air M3, Qwen3.5-9B

tok/s = tokens generated per second, higher is smoother. Standard = stock open-source MLX.

Turn   Context   Edge Runtime (tok/s)   Standard (tok/s)   Gain
T1     1K        16.5                   17.8               -7%
T5     5K        9.6                    6.8                +41%
T10    10K       6.9                    5.4                +28%
T15    16K       4.8                    3.6                +33%
T20    21K       3.5                    1.8                +94%

Generation Speed Curve (tok/s)

Qwen3.5-9B · 20-turn deep technical discussion · Edge Runtime

[Chart: generation speed in tok/s across turns T1–T20 for iPad Air M3, iPhone 17 Pro, iPhone Air, iPhone 17e, iPhone 15 PM, and Standard (iPad)]
iPad Air M3 (M3 · 8GB): 20/20 turns · T1 16.5 → T20 3.5 tok/s · Fastest first turn
iPhone 17 Pro (A19 Pro · 12GB): 20/20 turns · T1 12.6 → T20 10.8 tok/s · Flagship experience
iPhone Air (A19 · 12GB): 20/20 turns · T1 9.5 → T20 7.5 tok/s · Near-zero degradation
iPhone 15 Pro Max (A17 Pro · 8GB): 17/20 turns · T1 7.4 → T17 2.9 tok/s · Previous-gen flagship
iPhone 17e (A19 · 8GB): 20/20 turns · T1 9.2 → T20 2.7 tok/s · Entry-level capable

Standard implementation crashes on turn 2 on iPhones

Without our core inference algorithms, the stock implementation crashes on turn 2 on both iPhone 17 Pro and iPhone 17e; Edge Runtime completes all 20 turns smoothly on the same devices.

We tested 20 turns at 21,000+ tokens. Our current algorithms can sustain roughly 26,000 tokens of continuous conversation — the equivalent of reading and remembering a short book, all on your phone.

Note: iPhone Air, iPhone 17 Pro, and Pro Max all have 12GB of physical RAM, yet iOS limits each app to roughly 6GB. We're not sure why Apple imposes this ceiling. If Apple unlocks more memory for apps in the future, we believe on-device AI performance will be even more impressive.

The Edge Suite

One brand, three tools, complete loop

Edge Studio

Preview

AI-powered model analysis & surgical optimization

Intelligent analysis engine that automatically detects redundant layers, inefficient neurons, and optimization opportunities. Our proprietary 7-step progressive pipeline performs neuron-level surgical pruning — not coarse-grained compression — with real-time perplexity monitoring to guarantee output quality.

  • Intelligent model analysis: auto-detect redundant layers & low-efficiency neurons
  • Neuron-level surgical pruning with perplexity-guided quality control
  • 7-step progressive optimization: vocab → neuron → layer → quantization
  • Built-in benchmark at every step (disk, RAM, tok/s, PPL)

Edge Runtime

Preview

Proprietary inference algorithms for Apple Silicon

Purpose-built inference engine with proprietary ANE-GPU co-scheduling, disaggregated inference architecture, and zero-copy model loading. Not a wrapper — original algorithms that achieve 11.3x prefill speedup and 79% GPU power reduction through ANE batch dispatch and concurrent pipeline execution.

  • Proprietary ANE-GPU co-scheduling: 11.3x prefill acceleration
  • Disaggregated inference: ANE prefill + GPU decode in parallel
  • Zero-copy model loading: 20x faster with no memory spike
  • Native support for Edge Studio's non-standard optimized architectures

Edge Scaffold

Preview

The definitive edge AI deployment solution

The only end-to-end pipeline from optimized model to published App Store app. Integrates Edge Runtime's proprietary inference, On-Demand Resources for intelligent model delivery, and built-in ESG carbon tracking — a complete deployment solution that no other platform offers.

  • End-to-end: optimized model → App Store app in one pipeline
  • ODR-powered intelligent model delivery (no bundling large files)
  • One config file: app name, model, system prompt — ship in minutes
  • Built-in ESG carbon savings tracking for sustainability compliance
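As a rough illustration of the "one config file" flow, the Edge Scaffold config might look like this sketch. All field names here are illustrative assumptions, not the shipped schema:

```yaml
# Hypothetical Edge Scaffold config; every key below is illustrative,
# not the published format.
app_name: PocketChat
model: qwen3.5-0.8b        # an Edge Studio-optimized model ID
system_prompt: |
  You are a concise on-device assistant.
```

From a file like this, the pipeline would generate and publish the app without the model bundled in the binary, fetching it via On-Demand Resources on first launch.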

More from AtomGradient

Vanilla

Available

LLM inference visualization & interactive exploration

Quick Start

import EdgeInference

let engine = LLMEngine()
try await engine.load(config: .find(modelID: "qwen3.5-0.8b")!)

for try await chunk in engine.generate(
    messages: [.user("What is edge AI?")]
) {
    print(chunk.text, terminator: "")
}

5 lines of Swift — load a model, stream tokens. That's it.
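The Quick Start is single-turn. A continuous multi-turn chat, like the 20-turn benchmark above, would reuse the same `LLMEngine` API from the snippet; the `Message` type and `.assistant` case here are assumptions pending the published SDK:

```swift
import EdgeInference

// Sketch of a continuous multi-turn conversation, assuming the
// LLMEngine API shown in the Quick Start. Exact types may differ
// in the released SDK.
let engine = LLMEngine()
try await engine.load(config: .find(modelID: "qwen3.5-0.8b")!)

var messages: [Message] = []
for question in ["What is edge AI?", "How does it differ from cloud AI?"] {
    messages.append(.user(question))

    var reply = ""
    for try await chunk in engine.generate(messages: messages) {
        reply += chunk.text
    }
    // Keep the assistant turn in context so the conversation
    // accumulates across turns instead of resetting.
    messages.append(.assistant(reply))
}
```

Because the full `messages` history is passed on every turn, context grows with the conversation — which is exactly the regime the T1–T20 benchmarks measure.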

Get early access

Sign up to be notified when the AtomGradient Edge suite is publicly available. We'll send setup guides and invite you to our developer preview.

Hundreds of developers already on the waitlist

AtomGradient is bringing intelligence to every edge — not just Apple. Stay tuned.