Open Source AI Infrastructure
The AI Runtime for Every Browser
Every device is different. Your code shouldn't be.
One API that auto-routes between local WebGPU and cloud, with zero code changes.
WebGPU · Local SDK · Cloud API
The Solution
One API. Adaptive Runtime.
Write your code once. The SDK detects device capability and routes inference automatically.
app.ts

import { createClient } from '@webllm-io/sdk';

const client = createClient({
  local: 'auto',                                   // auto-detect device capability
  cloud: { baseURL: 'https://api.openai.com/v1' }, // fallback target
});

// Same API, same types, whether local or cloud
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

Tier        Runtime                   Notes
S / A       WebGPU Local (8B)         MacBook Pro, RTX GPUs
B / C       WebGPU Local (smaller)    iPad, mobile devices
No WebGPU   Cloud Fallback            Same API, zero change
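How 'auto' tier detection might work can be sketched with the standard browser WebGPU API. This is illustrative, not the SDK's actual logic: the tier names mirror the table above, and the 2 GiB threshold is an assumption made for the sketch.

detect-tier.ts

// Rough device-tier probe mirroring the routing table above.
// Not @webllm-io/sdk's real detection code; the tier names and the
// 2 GiB threshold are assumptions. Uses the standard WebGPU API
// (add @webgpu/types for proper navigator.gpu typings).
async function detectTier(): Promise<'S/A' | 'B/C' | 'none'> {
  const gpu = (navigator as any).gpu;
  if (!gpu) return 'none';                     // no WebGPU: cloud fallback

  const adapter = await gpu.requestAdapter();  // may be null on blocklisted GPUs
  if (!adapter) return 'none';

  // Crude proxy: a large max buffer size suggests room for an 8B model.
  const limit: number = adapter.limits.maxBufferSize;
  return limit >= 2 * 1024 ** 3 ? 'S/A' : 'B/C';
}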
WebWorker Isolation · OPFS Cache · Zero Config · Smart Routing
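The OPFS cache refers to the Origin Private File System, the browser's per-origin private storage exposed via navigator.storage.getDirectory(). A minimal sketch of how a downloaded weight shard could be cached there; cacheShard and the file-name scheme are hypothetical, not the SDK's real cache layout:

cache-shard.ts

// Streams a remote weight shard into OPFS so later sessions skip the download.
// Hypothetical helper for illustration; the SDK's actual layout may differ.
async function cacheShard(url: string, fileName: string): Promise<void> {
  const root = await navigator.storage.getDirectory();               // OPFS root
  const handle = await root.getFileHandle(fileName, { create: true });
  const writable = await handle.createWritable();
  const response = await fetch(url);
  await response.body!.pipeTo(writable);  // streams to disk and closes the file
}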
How It Feels
Cloud-First, Then Local. Seamlessly.
1. User asks a question
2. Cloud responds instantly (zero wait)
3. Model downloads silently (background)
4. Next request runs locally (automatic switch)
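In application code, steps 2 and 4 are indistinguishable: the same call is served by the cloud at first and locally once the weights are in place. A minimal sketch of consuming the stream, reusing the client from app.ts and assuming the SDK mirrors OpenAI's streaming chunk shape (as the create() call suggests):

consume.ts

// Identical whether the tokens come from the cloud (step 2) or the
// local model (step 4); routing happens inside the SDK.
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

let answer = '';
for await (const chunk of stream) {
  answer += chunk.choices[0]?.delta?.content ?? '';  // OpenAI-style delta (assumption)
}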
Get Started
Add intelligent, adaptive AI to your web app in minutes.
npm install @webllm-io/sdk
Coming soon: @webllm-io/ui components · @webllm-io/rag private document search