On-device model optimization, inference, personalization, cross-device mesh, and one-click publishing — full-stack, zero cloud dependency. Currently available on Apple platforms. More coming soon.
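    import EdgeInference

    let engine = LLMEngine()
    // The force unwrap traps at runtime if no config matches the model ID.
    try await engine.load(config: .find(
        modelID: "qwen3.5-4b"
    )!)

    // Print each chunk as it arrives, without a trailing newline.
    for try await chunk in engine.generate(
        messages: [.user("What is edge AI?")]
    ) {
        print(chunk.text, terminator: "")
    }

5 lines: load a model, stream tokens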
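For readers new to Swift's async sequences, the streaming loop above can be imitated without the SDK. The sketch below is an illustration only: `mockTokenStream` and `collect` are hypothetical helpers built on the standard library's `AsyncThrowingStream`, and they say nothing about how EdgeInference produces its chunks.

```swift
// Hypothetical stand-in for a token stream; this is not the EdgeInference API.
func mockTokenStream(_ tokens: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for token in tokens {
            continuation.yield(token)   // hand one chunk to the consumer
        }
        continuation.finish()           // signal the normal end of generation
    }
}

// Consume the stream chunk by chunk with `for try await`
// and return the concatenated text.
func collect(_ stream: AsyncThrowingStream<String, Error>) async throws -> String {
    var text = ""
    for try await chunk in stream {
        text += chunk
    }
    return text
}
```

Calling `try await collect(mockTokenStream(["Edge ", "AI"]))` from an async context gathers the chunks into one string. A UI would typically append each chunk to the display as it arrives instead of waiting for the end.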
Full-Stack Ownership
Cloud AI providers can't offer this: they sell compute. Open-source projects can't either: none of them controls all five layers. Data never leaves the user's device. Shipping on Apple platforms first, with a cross-platform architecture ready.
Sign up to be notified when the Edge developer products are publicly available. We'll send setup guides and invite you to the developer preview.
Continuous Learning
Patented HALO Algorithm
Google, OpenAI, and Anthropic are all researching how to make models grow with users, but their approaches require cloud infrastructure and data uploads. Built on our patented HALO algorithm system, Edge Halo puts the entire evolution loop on the user's device: behavior collection, profile extraction, adapter training, and real-time steering. Data never physically leaves the device.
User interactions stored locally, encrypted on device
Extract the user's preference geometry — not keywords, but directions in activation space
Train a lightweight adapter on the user's own Mac, transfer back via device mesh
Inject into inference in real-time, no model reload needed
User can revert to the base model at any time
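To make "directions in activation space" and real-time steering more concrete, here is a minimal sketch of a generic difference-of-means steering technique. It is not the HALO algorithm, whose details are not described here. All names (`meanVector`, `steeringDirection`, `steer`) are hypothetical, and activations are modeled as plain `[Float]` arrays rather than model tensors.

```swift
// Mean of a set of equally sized activation vectors.
func meanVector(_ vectors: [[Float]]) -> [Float] {
    precondition(!vectors.isEmpty, "need at least one vector")
    var sum = [Float](repeating: 0, count: vectors[0].count)
    for v in vectors {
        precondition(v.count == sum.count, "vectors must share one dimension")
        for i in v.indices { sum[i] += v[i] }
    }
    return sum.map { $0 / Float(vectors.count) }
}

// Unit-length direction pointing from the "rejected" mean toward the
// "preferred" mean. Assumption: preferences are captured by this difference.
func steeringDirection(preferred: [[Float]], rejected: [[Float]]) -> [Float] {
    let p = meanVector(preferred)
    let r = meanVector(rejected)
    let diff = zip(p, r).map { $0 - $1 }
    let norm = diff.map { $0 * $0 }.reduce(0, +).squareRoot()
    return norm > 0 ? diff.map { $0 / norm } : diff
}

// Shift one hidden state along the direction. A strength of 0 returns the
// hidden state unchanged, which corresponds to base-model behavior.
func steer(_ hidden: [Float], along direction: [Float], strength: Float) -> [Float] {
    precondition(hidden.count == direction.count, "dimension mismatch")
    return zip(hidden, direction).map { $0 + strength * $1 }
}
```

Because the shift is applied to activations at inference time, the model weights stay untouched in this sketch. That is one way steering could take effect without a model reload, and setting the strength to 0 mirrors the revert step.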
Cloud vs On-Device
The Edge Suite
Purpose-built inference runtime. Metal command scheduling, tensor abstractions, model-family implementations. Foundation for Edge Kit.
Complete inference SDK supporting LLM, VLM, ASR, and TTS. Streaming output, multi-turn conversation, automatic memory management, LoRA adapter support.
Models that grow with users. User profiling, adapter lifecycle management, real-time activation steering. All data stays on-device.
iOS app template. One config file, automatic device detection, four-tier model delivery. From optimized model to published app in minutes.
Analyze, optimize, benchmark, and export models. 117+ architectures, device-specific recommendations, one-click app generation.
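As a rough illustration of what a device-specific recommendation might involve, the sketch below estimates the weight footprint of a quantized model (parameters × bits per weight ÷ 8 bytes) and picks the largest model that fits a RAM budget. `ModelOption`, `recommend`, the catalog entries, and the 0.6 budget fraction are all assumptions made for this example, not the suite's actual logic. The estimate also ignores quantization scales, the KV cache, and activations, so real usage is higher.

```swift
import Foundation

// Hypothetical catalog entry; the names and fields are illustrative only.
struct ModelOption {
    let id: String
    let parameters: Double      // parameter count, e.g. 9e9
    let bitsPerWeight: Double   // e.g. 4 for 4-bit quantization
    // Weight storage only: parameters * bits / 8 bytes.
    var weightBytes: Double { parameters * bitsPerWeight / 8 }
}

// Physical memory of the current device, in bytes.
func deviceRAMBytes() -> Double {
    Double(ProcessInfo.processInfo.physicalMemory)
}

// Pick the largest option whose weights fit in `budgetFraction` of RAM.
// Returns nil when nothing fits. The 0.6 default is an arbitrary assumption.
func recommend(from options: [ModelOption],
               ramBytes: Double,
               budgetFraction: Double = 0.6) -> ModelOption? {
    options
        .filter { $0.weightBytes <= ramBytes * budgetFraction }
        .max { $0.weightBytes < $1.weightBytes }
}

// Illustrative catalog: a 4B and a 9B model, both quantized to 4 bits.
let catalog = [
    ModelOption(id: "qwen3.5-4b", parameters: 4e9, bitsPerWeight: 4),
    ModelOption(id: "qwen3.5-9b", parameters: 9e9, bitsPerWeight: 4),
]
```

With this catalog, a 9B model at 4 bits needs about 4.5 GB for weights alone, so `recommend(from: catalog, ramBytes: 8e9)` selects it under the 0.6 budget, while a 4 GB device falls back to the 4B model. On a real device, `recommend(from: catalog, ramBytes: deviceRAMBytes())` would use the detected memory.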
Benchmarks
Qwen3.5-9B-4bit, 200-turn continuous conversation stress test, metrics recorded per turn.
| Device | Chip | RAM | First (TPS) | Avg (TPS) | Last (TPS) | TTFT (mean) |
|---|---|---|---|---|---|---|
| iPhone 17 Pro | A19 Pro | 12 GB | 13.6 | 11.5 | 11.4 | 496 ms |
| iPhone Air | A19 Pro | 12 GB | 9.1 | 8.2 | 8.75 | 868 ms |
| iPad Air M3 | M3 | 8 GB | ~9 | 9.1 | 9.06 | 2192 ms |
TPS = tokens/sec · TTFT = mean time to first token · Qwen3.5-9B-4bit · iPad Air M3 ran 2048 tokens/turn with context compaction; others 1024 tokens/turn
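For anyone reproducing numbers like these, the sketch below shows one way per-turn records could be reduced to first, average, and last tokens/sec plus mean TTFT. The types and field names are hypothetical. It also assumes the average is the mean of per-turn TPS values; the benchmark may instead divide total tokens by total time, which gives a different figure when turns vary in length.

```swift
// One record per conversation turn; field names are illustrative.
struct TurnMetrics {
    let generatedTokens: Int
    let timeToFirstToken: Double   // seconds from request to first token
    let decodeSeconds: Double      // seconds spent generating the tokens
    var tokensPerSecond: Double { Double(generatedTokens) / decodeSeconds }
}

// Summary figures for one run.
struct RunSummary {
    let firstTPS: Double
    let avgTPS: Double
    let lastTPS: Double
    let meanTTFTMillis: Double
}

// Reduce per-turn records to summary figures. Returns nil for an empty run.
func summarize(_ turns: [TurnMetrics]) -> RunSummary? {
    guard let first = turns.first, let last = turns.last else { return nil }
    let n = Double(turns.count)
    // Assumption: "Avg" is the unweighted mean of per-turn TPS.
    let avgTPS = turns.map(\.tokensPerSecond).reduce(0, +) / n
    let meanTTFT = turns.map(\.timeToFirstToken).reduce(0, +) / n
    return RunSummary(firstTPS: first.tokensPerSecond,
                      avgTPS: avgTPS,
                      lastTPS: last.tokensPerSecond,
                      meanTTFTMillis: meanTTFT * 1000)
}
```

Comparing the first and last turn values is what reveals throughput drift over a long conversation, which is why a stress test records metrics per turn rather than only an overall average.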
M2 Ultra 192GB · Baseline: mlx_vlm (Python MLX)
Shipped — Powered by Edge
Real-time visualization of large language model inference for researchers, learners, and educators. v1.0.4 adds VLM image input, model discovery, faster visual rendering, animated transitions, and error recovery.