# Custom Providers
The WebLLM SDK uses a provider system to abstract inference backends. You can use the built-in `mlc()` and `fetchSSE()` factories, or provide a completely custom function.
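For orientation, here is a minimal sketch that wires a local and a cloud backend into one client, using the factories described in the sections below:

```ts
import { createClient } from '@webllm-io/sdk';
import { mlc } from '@webllm-io/sdk/providers/mlc';
import { fetchSSE } from '@webllm-io/sdk/providers/fetch';

// One client, two backends: a local MLC provider and a cloud SSE provider.
const client = createClient({
  local: mlc({ useWebWorker: true }),
  cloud: fetchSSE({ baseURL: 'https://api.openai.com/v1', apiKey: 'sk-...' }),
});
```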
## Built-in Providers
### mlc() — Local Provider

The `mlc()` function creates a local inference provider using the MLC engine:
```ts
import { createClient } from '@webllm-io/sdk';
import { mlc } from '@webllm-io/sdk/providers/mlc';

const client = createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWebWorker: true,
    useCache: true,
    tiers: {
      high: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
      medium: 'Llama-3.2-3B-Instruct-q4f16_1-MLC',
      low: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
    },
  }),
});
```

#### Options
| Option | Type | Default | Description |
|---|---|---|---|
| `model` | `string` | — | Fixed model ID. Overrides tier selection. |
| `tiers` | `TiersConfig` | — | Model per device grade (`high`/`medium`/`low`). |
| `useCache` | `boolean` | `true` | Enable OPFS model caching. |
| `useWebWorker` | `boolean` | `true` | Run in a Web Worker. |
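Since a fixed `model` overrides tier selection, omitting it and supplying only `tiers` lets the provider pick a model by device grade. A rough sketch reusing the model IDs from the example above:

```ts
import { createClient } from '@webllm-io/sdk';
import { mlc } from '@webllm-io/sdk/providers/mlc';

// No fixed `model`: the provider selects a model ID based on the detected device grade.
const client = createClient({
  local: mlc({
    tiers: {
      high: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
      medium: 'Llama-3.2-3B-Instruct-q4f16_1-MLC',
      low: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC',
    },
  }),
});
```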
### fetchSSE() — Cloud Provider

The `fetchSSE()` function creates a cloud inference provider that streams responses over Server-Sent Events (SSE):
```ts
import { createClient } from '@webllm-io/sdk';
import { fetchSSE } from '@webllm-io/sdk/providers/fetch';

const client = createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-...',
    model: 'gpt-4o-mini',
    timeout: 30000,
    retries: 2,
  }),
});
```

You can also pass just a URL string:
```ts
const client = createClient({
  cloud: fetchSSE('https://api.openai.com/v1'),
});
```

#### Options
| Option | Type | Default | Description |
|---|---|---|---|
| `baseURL` | `string` | — | API endpoint URL (required). |
| `apiKey` | `string` | — | Authorization bearer token. |
| `model` | `string` | — | Model identifier. |
| `headers` | `Record<string, string>` | — | Additional request headers. |
| `timeout` | `number` | — | Request timeout in milliseconds. |
| `retries` | `number` | — | Number of retry attempts. |
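The `headers` option is useful when your endpoint sits behind a gateway that expects extra request metadata. A rough sketch against a self-hosted, OpenAI-compatible endpoint; the URL and header names are illustrative, not required by the SDK:

```ts
import { createClient } from '@webllm-io/sdk';
import { fetchSSE } from '@webllm-io/sdk/providers/fetch';

const client = createClient({
  cloud: fetchSSE({
    baseURL: 'https://llm-gateway.internal.example.com/v1',
    apiKey: 'sk-...',
    model: 'gpt-4o-mini',
    // Additional headers sent with every request; the names here are examples only.
    headers: { 'X-Tenant-Id': 'acme', 'X-Request-Source': 'web-app' },
    timeout: 30000,
    retries: 2,
  }),
});
```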
## Custom Cloud Function

For full control, pass a function as the `cloud` config. It receives the messages and a route context, and must return either a `ChatCompletion` or an `AsyncIterable<ChatCompletionChunk>`:
```ts
import { createClient } from '@webllm-io/sdk';
import type { CloudFn } from '@webllm-io/sdk';

const myProvider: CloudFn = async (messages, context) => {
  const response = await fetch('https://my-api.example.com/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  });

  return response.json();
};

const client = createClient({
  cloud: myProvider,
});
```
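The example above returns a complete `ChatCompletion`. To stream instead, a `CloudFn` can return an `AsyncIterable<ChatCompletionChunk>`; an async generator satisfies that shape. Below is a rough sketch that forwards OpenAI-style SSE chunks from a hypothetical backend. The endpoint, the `data: {...}` / `data: [DONE]` wire format, and the `ChatCompletionChunk` import path are assumptions, not part of the SDK surface documented here:

```ts
import { createClient } from '@webllm-io/sdk';
import type { CloudFn, ChatCompletionChunk } from '@webllm-io/sdk';

// Streaming variant: an async generator is an AsyncIterable, so it satisfies
// the CloudFn return type. The backend and its SSE framing are hypothetical.
const myStreamingProvider: CloudFn = async function* (messages, _context) {
  const response = await fetch('https://my-api.example.com/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, stream: true }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events arrive newline-delimited; keep any trailing partial line in the buffer.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      const data = line.replace(/^data:\s*/, '').trim();
      if (!data || data === '[DONE]') continue;
      yield JSON.parse(data) as ChatCompletionChunk;
    }
  }
};

const client = createClient({ cloud: myStreamingProvider });
```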
### CloudFn Type

```ts
type CloudFn = (
  messages: Message[],
  context: RouteContext,
) => Promise<ChatCompletion> | AsyncIterable<ChatCompletionChunk>;
```
## Config vs Explicit Provider

There are two ways to configure each backend:
**Plain config** — the SDK wraps it into the default provider automatically:
```ts
createClient({
  local: 'auto',
  cloud: { baseURL: '...', apiKey: '...' },
});
```

**Explicit provider** — you call the factory yourself for more control:
```ts
createClient({
  local: mlc({ useWebWorker: false }),
  cloud: fetchSSE({ baseURL: '...', retries: 5 }),
});
```

Both approaches produce the same result. Use explicit providers when you need options that aren’t exposed in the plain config shorthand.
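For instance, the plain `local: 'auto'` shorthand and an explicit factory call with no overrides should be interchangeable, assuming `mlc()` can be called without options and that its defaults correspond to the `'auto'` shorthand (neither of which is stated above):

```ts
import { createClient } from '@webllm-io/sdk';
import { mlc } from '@webllm-io/sdk/providers/mlc';

// Plain config: the SDK wraps the shorthand into the default local provider.
const a = createClient({ local: 'auto' });

// Explicit provider: the same backend spelled out (assumed equivalent to 'auto').
const b = createClient({ local: mlc() });
```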