Welcome to WebLLM.io
The AI Runtime for Every Browser — One unified API that intelligently routes between local WebGPU inference and cloud providers.
Why WebLLM.io?
WebLLM.io brings AI inference directly to the browser with smart fallback capabilities. It automatically chooses the best execution strategy based on your user’s device capabilities.
Key Features
- Smart Routing — Automatically selects local or cloud inference based on device capability
- Zero Configuration — Works out of the box with sensible defaults
- WebWorker Isolation — Runs inference in Web Workers to keep your UI responsive
- OPFS Caching — Efficient model storage using the Origin Private File System
- OpenAI Compatible — Familiar API interface for seamless integration
- Progressive Enhancement — Gracefully falls back to cloud when local inference isn’t available
How It Works
WebLLM.io assigns your user’s device a grade (S/A/B/C, based on available VRAM) and automatically picks an execution strategy:
- High-end devices — Run powerful models locally via WebGPU
- Mid-range devices — Use lightweight models or cloud fallback
- Low-end devices — Seamlessly route to cloud providers
All with the same simple API call.
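The exact scoring heuristic is internal to WebLLM.io, but a minimal sketch of the idea, assuming the grade is derived from WebGPU adapter limits, might look like this. The thresholds and the `gradeDevice` helper below are illustrative assumptions, not the library’s actual values:

```ts
// Illustrative sketch only — WebLLM.io's real scoring logic may differ.
// Requires WebGPU type definitions (e.g. @webgpu/types) in TypeScript.
type DeviceGrade = 'S' | 'A' | 'B' | 'C';

async function gradeDevice(): Promise<DeviceGrade> {
  // No WebGPU at all: lowest grade, cloud-only.
  if (!('gpu' in navigator)) return 'C';

  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return 'C';

  // WebGPU does not expose VRAM directly; the maximum buffer size is a rough proxy.
  const gib = adapter.limits.maxBufferSize / 2 ** 30;

  if (gib >= 4) return 'S'; // run powerful models locally
  if (gib >= 2) return 'A'; // mid-size local models
  if (gib >= 1) return 'B'; // lightweight local models or cloud
  return 'C';               // route to cloud providers
}
```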
Quick Links
- Installation — Get started in 2 minutes
- Quick Start — Your first completion in 5 lines of code
- Playground — Try it live in your browser
Example
```ts
import { createClient } from '@webllm-io/sdk';

const client = createClient({
  local: 'auto',
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-...',
  },
});

const result = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello, world!' }],
});

console.log(result.choices[0].message.content);
```

That’s it. WebLLM.io handles device detection, model selection, WebWorker orchestration, and cloud fallback automatically.
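Because the API is OpenAI-compatible, streaming should follow the familiar pattern. The snippet below is a sketch under that assumption; the `stream: true` option and the async-iterator chunk shape are assumed here, not confirmed by this page:

```ts
// Assumed to mirror the OpenAI streaming API — verify against the SDK reference.
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a haiku about browsers.' }],
  stream: true,
});

let text = '';
for await (const chunk of stream) {
  // Each chunk is assumed to carry an incremental delta, OpenAI-style.
  text += chunk.choices[0]?.delta?.content ?? '';
}
console.log(text);
```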
Browser Support
- Local Inference — Chrome 113+, Edge 113+ (WebGPU required)
- Cloud Mode — All modern browsers
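If you want to surface the execution mode to users yourself, a simple capability check before creating the client could look like the following. This is plain WebGPU feature detection, not a WebLLM.io API:

```ts
// Plain feature detection; WebLLM.io performs an equivalent check internally.
const supportsLocal =
  'gpu' in navigator && (await navigator.gpu.requestAdapter()) !== null;

console.log(
  supportsLocal
    ? 'WebGPU available: local inference is possible'
    : 'No WebGPU: cloud mode only'
);
```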
Ready to get started? Head over to the Installation Guide.