Open Source AI Infrastructure

The AI Runtime for Every Browser

Every device is different. Your code shouldn't be.
One API that automatically routes between local WebGPU and the cloud, with zero code changes.

WebGPU · Local SDK · Cloud API

One API. Adaptive Runtime.

Write your code once. The SDK detects device capability and routes inference automatically.

app.ts
import { createClient } from '@webllm-io/sdk';

const client = createClient({
  local: 'auto',                        // auto-detect device capability
  cloud: { baseURL: 'https://api.openai.com/v1' }, // fallback target
});

// Same API, same types — whether local or cloud
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});
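
Consuming the stream is identical on either path. A minimal sketch, assuming the chunk shape follows the OpenAI delta-based streaming format, which the cloud fallback target implies:

// Minimal sketch: chunk shape assumed to follow the OpenAI
// delta-based streaming format, matching the cloud fallback target.
let answer = '';
for await (const chunk of stream) {
  answer += chunk.choices[0]?.delta?.content ?? '';
}
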
Tier        Runtime                  Notes
S / A       WebGPU Local (8B)        MacBook Pro, RTX GPUs
B / C       WebGPU Local (smaller)   iPad, mobile devices
No WebGPU   Cloud Fallback           Same API, zero code changes
WebWorker Isolation · OPFS Cache · Zero Config · Smart Routing
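
The tiering above can be approximated in userland with the standard WebGPU adapter probe. A minimal sketch, assuming the @webgpu/types ambient declarations are installed; the SDK's actual thresholds are internal, and the cutoffs below are illustrative placeholders:

// Illustrative tier probe using the standard WebGPU API.
// Requires @webgpu/types for the `navigator.gpu` declarations.
// The buffer-size cutoffs are made-up placeholders, not the SDK's.
async function probeTier(): Promise<'S' | 'A' | 'B' | 'C' | 'none'> {
  if (!('gpu' in navigator)) return 'none';      // no WebGPU: cloud fallback
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return 'none';
  const maxBuf = adapter.limits.maxBufferSize;   // rough proxy for model fit
  if (maxBuf >= 2 ** 31) return 'S';             // desktop-class GPU
  if (maxBuf >= 2 ** 30) return 'A';
  if (maxBuf >= 2 ** 28) return 'B';
  return 'C';
}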

Cloud-First, Then Local. Seamlessly.

1. User asks a question.
2. Cloud responds instantly (zero wait).
3. Model downloads silently (background).
4. Next request runs locally (automatic switch).
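
The switch in step 4 reduces to a readiness flag. A minimal sketch of the routing pattern, with hypothetical names that are not the SDK's public API:

// Illustrative cloud-first routing pattern; all names are hypothetical.
interface Backend {
  complete(prompt: string): Promise<string>;
}

class CloudFirstRouter {
  private localReady = false;

  constructor(
    private cloud: Backend,
    private local: Backend,
    warmup: Promise<void>, // resolves when the local model has downloaded
  ) {
    // Steps 2-3: serve from the cloud while the model downloads in the
    // background; flip the flag once the local runtime is warm.
    warmup.then(() => { this.localReady = true; });
  }

  complete(prompt: string): Promise<string> {
    // Step 4: once local is ready, subsequent requests run on-device.
    return this.localReady
      ? this.local.complete(prompt)
      : this.cloud.complete(prompt);
  }
}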

Get Started

Add intelligent, adaptive AI to your web app in minutes.

npm install @webllm-io/sdk

Coming soon: @webllm-io/ui components · @webllm-io/rag private document search