Open Source AI Infrastructure

The AI Runtime for Every Browser

Every device is different. Your code shouldn't be.
One API that automatically routes between local WebGPU and the cloud, with zero code changes.

WebGPU · Local SDK · Cloud API

One API. Adaptive Runtime.

Write your code once. The SDK detects device capability and routes inference automatically.

app.ts
import { createClient } from '@webllm-io/sdk';

const client = createClient({
  local: 'auto',                        // auto-detect device capability
  cloud: { baseURL: 'https://api.openai.com/v1' }, // fallback target
});

// Same API, same types — whether local or cloud
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});
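
Consuming the stream is identical on either path. A minimal sketch, assuming the chunk shape follows the OpenAI delta-based streaming format, which the cloud fallback target implies:

// Minimal sketch: chunk shape assumed to follow the OpenAI
// delta-based streaming format, matching the cloud fallback target.
let answer = '';
for await (const chunk of stream) {
  answer += chunk.choices[0]?.delta?.content ?? '';
}
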
Tier        Runtime                  Notes
S / A       WebGPU Local (8B)        MacBook Pro, RTX GPUs
B / C       WebGPU Local (smaller)   iPad, mobile devices
No WebGPU   Cloud Fallback           Same API, zero code changes
WebWorker Isolation · OPFS Cache · Zero Config · Smart Routing
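
The tiering above can be approximated in userland with the standard WebGPU adapter probe. A minimal sketch, assuming the @webgpu/types ambient declarations are installed; the SDK's actual thresholds are internal, and the cutoffs below are illustrative placeholders:

// Illustrative tier probe using the standard WebGPU API.
// Requires @webgpu/types for the `navigator.gpu` declarations.
// The buffer-size cutoffs are made-up placeholders, not the SDK's.
async function probeTier(): Promise<'S' | 'A' | 'B' | 'C' | 'none'> {
  if (!('gpu' in navigator)) return 'none';      // no WebGPU: cloud fallback
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return 'none';
  const maxBuf = adapter.limits.maxBufferSize;   // rough proxy for model fit
  if (maxBuf >= 2 ** 31) return 'S';             // desktop-class GPU
  if (maxBuf >= 2 ** 30) return 'A';
  if (maxBuf >= 2 ** 28) return 'B';
  return 'C';
}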

Cloud-First, Then Local. Seamlessly.

1. User asks a question.
2. Cloud responds instantly (zero wait).
3. Model downloads silently (background).
4. Next request runs locally (automatic switch).
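
The switch in step 4 reduces to a readiness flag. A minimal sketch of the routing pattern, with hypothetical names that are not the SDK's public API:

// Illustrative cloud-first routing pattern; all names are hypothetical.
interface Backend {
  complete(prompt: string): Promise<string>;
}

class CloudFirstRouter {
  private localReady = false;

  constructor(
    private cloud: Backend,
    private local: Backend,
    warmup: Promise<void>, // resolves when the local model has downloaded
  ) {
    // Steps 2-3: serve from the cloud while the model downloads in the
    // background; flip the flag once the local runtime is warm.
    warmup.then(() => { this.localReady = true; });
  }

  complete(prompt: string): Promise<string> {
    // Step 4: once local is ready, subsequent requests run on-device.
    return this.localReady
      ? this.local.complete(prompt)
      : this.cloud.complete(prompt);
  }
}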

Get Started

Add intelligent, adaptive AI to your web app in minutes.

npm install @webllm-io/sdk

Coming soon: @webllm-io/ui components · @webllm-io/rag private document search