Developer Preview

Make AI grow on every device

On-device model optimization, inference, personalization, cross-device mesh, and one-click publishing — full-stack, zero cloud dependency. Currently available on Apple platforms. More coming soon.

Docs coming soon · GitHub
main.swift
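import EdgeInference

// Create the engine and load a model by ID.
// `.find` returns an optional, so the force-unwrap traps if the ID is unknown.
let engine = LLMEngine()
try await engine.load(config: .find(
    modelID: "qwen3.5-4b"
)!)

// Stream the reply, printing each chunk as it arrives.
for try await chunk in engine.generate(
    messages: [.user("What is edge AI?")]
) {
    print(chunk.text, terminator: "")
}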

5 lines — load a model, stream tokens

Full-Stack Ownership

From inference kernel to agent shipping — every layer is ours

Cloud AI providers can't offer this, because they sell compute. Open-source projects can't either, because no single project controls all five layers. Data never leaves the user's device. We are shipping on Apple platforms first, and the architecture is ready for other platforms.

Your Agent
Edge Kit: Inference SDK
Edge Halo: Evolution · Patented
Edge Mesh: Device Mesh
Edge Engine: Native Runtime · DSR Attention
Edge Studio: Optimization
Edge Scaffold: Ship Apps

Get early access

Sign up to be notified when the Edge developer products are publicly available. We'll send setup guides and invite you to the developer preview.

Continuous Learning

Patented HALO Algorithm

The industry is exploring how to make models learn continuously. We shipped it on-device.

Google, OpenAI, and Anthropic are all researching how to make models grow with users, but their approaches require cloud infrastructure and data uploads. Built on our patented HALO algorithm system, Edge Halo runs the entire evolution loop on the user's own devices: behavior collection, profile extraction, adapter training, and real-time steering. Data never physically leaves those devices.

Collect

User interactions stored locally, encrypted on device

Profile

Extract the user's preference geometry — not keywords, but directions in activation space

Train

Train a lightweight adapter on the user's own Mac, transfer back via device mesh

Steer

Inject into inference in real-time, no model reload needed

Rollback

User can revert to the base model at any time
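
The Profile and Steer steps describe preferences as directions in activation space that are injected at inference time. One generic way to picture that idea is difference-of-means activation steering. The sketch below illustrates only that general technique; it is not the HALO algorithm, whose details this page does not give, and every name in it (`preferenceDirection`, `steer`, `strength`) is hypothetical.

```swift
// Hypothetical sketch of difference-of-means activation steering.
// This is NOT the HALO algorithm; names and sample values are illustrative.

typealias Vector = [Double]

/// Element-wise mean of a non-empty set of equally sized vectors.
func mean(_ vectors: [Vector]) -> Vector {
    precondition(!vectors.isEmpty, "need at least one vector")
    var sum = Vector(repeating: 0, count: vectors[0].count)
    for v in vectors {
        for i in v.indices { sum[i] += v[i] }
    }
    return sum.map { $0 / Double(vectors.count) }
}

/// Unit-length direction pointing from "disliked" toward "liked" activations.
func preferenceDirection(liked: [Vector], disliked: [Vector]) -> Vector {
    let diff = zip(mean(liked), mean(disliked)).map { $0 - $1 }
    let norm = diff.map { $0 * $0 }.reduce(0, +).squareRoot()
    return norm > 0 ? diff.map { $0 / norm } : diff
}

/// Shift a hidden state along the direction.
/// A strength of 0 returns the state unchanged, which models a rollback.
func steer(_ hidden: Vector, along direction: Vector, strength: Double) -> Vector {
    zip(hidden, direction).map { $0 + strength * $1 }
}

// Made-up activations for two liked and two disliked responses.
let liked: [Vector] = [[1.0, 0.0], [3.0, 0.0]]
let disliked: [Vector] = [[0.0, 0.0], [0.0, 0.0]]

let direction = preferenceDirection(liked: liked, disliked: disliked)
let steered = steer([0.5, 0.5], along: direction, strength: 2.0)
let rolledBack = steer([0.5, 0.5], along: direction, strength: 0.0)
print(direction, steered, rolledBack)
```

Because the steering term is added at inference time, the base weights stay untouched, which is why setting the strength to zero is enough to return to base-model behavior in this sketch.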

Cloud vs On-Device

| Cloud | On-device |
|---|---|
| Data uploaded to cloud servers | Data never leaves the device |
| Platform owns user profiles | Users own their profiles |
| Requires constant connectivity | Works fully offline |
| All users share one model | Each user gets their own adapter |
| Platform decides when to update | Users decide when to train and rollback |
Edge Halo docs coming soon

The Edge Suite

01

Edge Engine

Native Metal runtime for Apple Silicon

Purpose-built inference runtime. Metal command scheduling, tensor abstractions, model-family implementations. Foundation for Edge Kit.

02

Edge Kit

Swift SDK for on-device AI

Complete inference SDK supporting LLM, VLM, ASR, and TTS. Streaming output, multi-turn conversation, automatic memory management, LoRA adapter support.

03

Edge Halo

Model self-evolution

Models that grow with users. User profiling, adapter lifecycle management, real-time activation steering. All data stays on-device.

04

Edge Scaffold

From model to App Store

iOS app template. One config file, automatic device detection, four-tier model delivery. From optimized model to published app in minutes.

05

Edge Studio

Model optimization workbench

Analyze, optimize, benchmark, and export models. 117+ architectures, device-specific recommendations, one-click app generation.
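
Edge Kit lists LoRA adapter support, and Edge Halo manages an adapter lifecycle. As a rough illustration of what a LoRA adapter does to a single linear layer, the sketch below applies the standard low-rank update `y = Wx + (alpha / r) · B(Ax)` on plain arrays. It is not Edge Kit API: `LoRAAdapter`, `forward`, and the sample matrices are hypothetical, and whether Edge Halo's adapters are LoRA-style is an assumption.

```swift
// Minimal LoRA forward pass on plain arrays. Illustrative only; not Edge Kit API.
typealias Matrix = [[Double]]

/// Matrix-vector product; each row of `m` must have the same length as `v`.
func matVec(_ m: Matrix, _ v: [Double]) -> [Double] {
    m.map { row in zip(row, v).map { $0 * $1 }.reduce(0, +) }
}

/// Hypothetical adapter: two small matrices plus a scaling factor.
struct LoRAAdapter {
    let a: Matrix      // r × inputDim (down-projection)
    let b: Matrix      // outputDim × r (up-projection)
    let alpha: Double  // scaling numerator
    var rank: Int { a.count }
}

/// Base layer output, plus the scaled low-rank correction when an adapter is attached.
func forward(base w: Matrix, adapter: LoRAAdapter?, input x: [Double]) -> [Double] {
    let baseOut = matVec(w, x)
    guard let adapter = adapter else { return baseOut }  // no adapter: base model
    let scale = adapter.alpha / Double(adapter.rank)
    let delta = matVec(adapter.b, matVec(adapter.a, x))
    return zip(baseOut, delta).map { $0 + scale * $1 }
}

// Made-up 2×2 base weights and a rank-1 adapter.
let baseWeights: Matrix = [[1, 0], [0, 1]]
let userAdapter = LoRAAdapter(a: [[1, 1]], b: [[1], [0]], alpha: 2)

let adapted = forward(base: baseWeights, adapter: userAdapter, input: [1, 2])
let baseOnly = forward(base: baseWeights, adapter: nil, input: [1, 2])
print(adapted, baseOnly)
```

Keeping the adapter separate from the base weights, as here, means it can be swapped or dropped without touching the model itself.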

Benchmarks

Not slides — real devices, real models, real numbers

Qwen3.5-9B-4bit, 200-turn continuous conversation stress test, metrics recorded per turn.

9B sustained throughput: 11.5 TPS (iPhone 17 Pro · stable across 200 turns)
Time to first token: <1 s (496 ms avg for 9B on iPhone 17 Pro)
Prefill acceleration: custom operators vs generic framework
VLM image processing: 2.1× (1803 vs 851 TPS)

9B Model · 200-Turn Conversation · Per-Device

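| Device | Chip | RAM | First (TPS) | Avg (TPS) | Last (TPS) | TTFT |
|---|---|---|---|---|---|---|
| iPhone 17 Pro | A19 Pro | 12 GB | 13.6 | 11.5 | 11.4 | 496 ms |
| iPhone Air | A19 Pro | 12 GB | 9.1 | 8.2 | 8.75 | 868 ms |
| iPad Air M3 | M3 | 8 GB | ~9 | 9.1 | 9.06 | 2192 ms |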

TPS = tokens/sec · TTFT = mean time to first token · Qwen3.5-9B-4bit · iPad Air M3 ran 2048 tokens/turn with context compaction; others 1024 tokens/turn
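
The footnote defines TPS and TTFT but not how they are derived from raw timings. As a rough sketch, assuming each turn records when the request was sent, when the first token arrived, and when the last token arrived, the two metrics could be computed as follows. The `TurnTiming` type, its field names, and the sample timings are hypothetical, and the actual benchmark harness may define the measurement window differently.

```swift
// Hypothetical per-turn timing record; field names are assumptions.
struct TurnTiming {
    let requestSent: Double      // seconds since some reference point
    let firstToken: Double       // seconds
    let lastToken: Double        // seconds
    let generatedTokens: Int

    /// Time to first token, in milliseconds.
    var ttftMilliseconds: Double { (firstToken - requestSent) * 1000 }

    /// Decode throughput: tokens generated per second after the first token.
    var tokensPerSecond: Double {
        let decodeTime = lastToken - firstToken
        return decodeTime > 0 ? Double(generatedTokens) / decodeTime : 0
    }
}

// Made-up timings for two turns; these are not measurements from this page.
let turns = [
    TurnTiming(requestSent: 0.0, firstToken: 0.5, lastToken: 100.5, generatedTokens: 1000),
    TurnTiming(requestSent: 0.0, firstToken: 0.25, lastToken: 80.25, generatedTokens: 1000),
]

// Averages across turns, matching the "Avg" and "mean TTFT" style of reporting.
let meanTTFT = turns.map(\.ttftMilliseconds).reduce(0, +) / Double(turns.count)
let meanTPS = turns.map(\.tokensPerSecond).reduce(0, +) / Double(turns.count)
print("mean TTFT: \(meanTTFT) ms, mean TPS: \(meanTPS)")
```

Recording one such entry per turn is what makes a First / Avg / Last comparison possible over a 200-turn run.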

Custom Engine vs Generic Framework

Text Prefill (4B): 1305 vs 187 TPS
Text Prefill (9B): 843 vs 122 TPS (6.9×)
VLM Prefill (4B): 1803 vs 851 TPS (2.1×)
VLM Prefill (9B): 1234 vs 511 TPS (2.4×)

M2 Ultra 192GB · Baseline: mlx_vlm (Python MLX)
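
The speedup figures follow directly from the throughput pairs listed above. A small sketch that recomputes them from the quoted numbers (the `PrefillResult` type and `roundedToTenth` helper exist only for this illustration):

```swift
// Recompute speedups from the throughput pairs quoted on this page.
struct PrefillResult {
    let workload: String
    let customTPS: Double    // custom engine
    let baselineTPS: Double  // mlx_vlm baseline
    var speedup: Double { customTPS / baselineTPS }
}

let results = [
    PrefillResult(workload: "Text Prefill (4B)", customTPS: 1305, baselineTPS: 187),
    PrefillResult(workload: "Text Prefill (9B)", customTPS: 843, baselineTPS: 122),
    PrefillResult(workload: "VLM Prefill (4B)", customTPS: 1803, baselineTPS: 851),
    PrefillResult(workload: "VLM Prefill (9B)", customTPS: 1234, baselineTPS: 511),
]

/// Round to one decimal place, the precision used for the ratios on this page.
func roundedToTenth(_ x: Double) -> Double { (x * 10).rounded() / 10 }

for r in results {
    print("\(r.workload): \(roundedToTenth(r.speedup))×")
}
```

The 4B text prefill ratio, which the page does not print, works out to roughly 7.0× by the same arithmetic.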

Shipped — Powered by Edge

Vanilla

Available

Real-time visualization of large language model inference for researchers, learners, and educators. v1.0.4 adds VLM image input, model discovery, faster visual rendering, animated transitions, and error recovery.