Provider Composition
WebLLM.io uses a flexible provider composition system that allows you to configure backends using simple strings or objects, while internally converting them to standardized provider functions. This design enables progressive disclosure: start with simple configs and opt into explicit providers when you need more control.
Configuration to Provider Pipeline
Every backend configuration goes through a resolution pipeline:
User Input                 Resolution        Internal Provider
──────────────────────────────────────────────────────────────
'auto'                   →  auto-wrap      →  mlc()
{ model: 'Llama-3.1' }   →  auto-wrap      →  mlc({ model: '...' })
mlc({ ... })             →  pass-through   →  mlc({ ... })
CustomFunction           →  pass-through   →  CustomFunction

string/object (Plain Config) → normalize → ResolvedProvider (Provider Function)
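Conceptually, the local side of this normalization looks something like the sketch below. resolveLocal is a hypothetical helper shown only to illustrate the mapping; it is not part of the public API. The cloud side is analogous, wrapping plain configs with fetchSSE().

import { mlc } from '@webllm-io/sdk/providers/mlc'

// Hypothetical internal helper, shown only to illustrate the mapping above.
function resolveLocal(input: unknown) {
  if (input === false) return null                            // local: false → no local backend
  if (typeof input === 'function') return input               // explicit provider → pass-through
  if (input === 'auto') return mlc()                          // 'auto' → device-based selection
  if (typeof input === 'string') return mlc({ model: input }) // model id → mlc({ model })
  return mlc(input as Parameters<typeof mlc>[0])              // plain object → auto-wrap
}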
Local Provider Resolution

String Inputs
String inputs are shorthand for common configurations:
import { createClient } from '@webllm-io/sdk'
// Input: 'auto'
const client1 = createClient({
  local: 'auto'
})
// Resolves to: mlc() with automatic device-based model selection
// Input: 'Llama-3.1-8B-Instruct-q4f16_1-MLC'
const client2 = createClient({
  local: 'Llama-3.1-8B-Instruct-q4f16_1-MLC'
})
// Resolves to: mlc({ model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC' })

Object Inputs
Object inputs provide structured configuration:
// Input: LocalObjectConfig
const client = createClient({
  local: {
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWorker: true,
    useCache: true
  }
})
// Resolves to: mlc({ model: '...', useWorker: true, useCache: true })

Tiered Object Inputs
The responsive API uses a tiers object:
// Input: LocalTierConfig
const client = createClient({
  local: {
    tiers: {
      high: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
      medium: 'Phi-3.5-mini-instruct-q4f16_1-MLC',
      low: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC'
    }
  }
})
// Resolves to: mlc() with device-grade-based tier selection

Function Inputs
Explicit provider functions pass through unchanged:
import { mlc } from '@webllm-io/sdk/providers/mlc'
// Input: mlc() provider function
const client = createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWorker: false,
    logLevel: 'DEBUG'
  })
})
// Resolves to: mlc({ ... }) (no transformation)

Disabling Local
Set local: false to disable local inference:
const client = createClient({
  local: false,
  cloud: { /* ... */ }
})
// Resolves to: null (no local backend)

Cloud Provider Resolution
String Inputs
String inputs are treated as API keys and use the default OpenAI endpoint:
// Input: API key string
const client = createClient({
  cloud: 'sk-...'
})
// Resolves to: fetchSSE({
//   baseURL: 'https://api.openai.com/v1',
//   apiKey: 'sk-...',
//   model: 'gpt-4o-mini'
// })

Object Inputs
Object inputs provide full configuration:
// Input: CloudObjectConfig
const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini',
    timeout: 30000,
    maxRetries: 2
  }
})
// Resolves to: fetchSSE({ ... })

Function Inputs
Explicit provider functions pass through:
import { fetchSSE } from '@webllm-io/sdk/providers/fetch'
// Input: fetchSSE() provider function
const client = createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
    timeout: 60000
  })
})
// Resolves to: fetchSSE({ ... }) (no transformation)

Custom Cloud Functions
You can provide a custom CloudFn implementation:
import type { CloudFn } from '@webllm-io/sdk'
const customCloudProvider: CloudFn = async ({ messages, options }) => {
  const response = await fetch('https://my-custom-api.com/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, ...options })
  })

  // Return AsyncIterable<ChatCompletionChunk>
  return parseSSEStream(response.body)
}
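parseSSEStream here is your own helper, not an SDK export. A minimal sketch, assuming OpenAI-style 'data:' lines terminated by 'data: [DONE]':

// Hypothetical helper (not an SDK export). Parses OpenAI-style SSE into chunk objects.
async function* parseSSEStream(body: ReadableStream<Uint8Array> | null) {
  if (!body) return
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? '' // keep any trailing partial line for the next read
    for (const line of lines) {
      if (!line.startsWith('data:')) continue
      const data = line.slice(5).trim()
      if (!data || data === '[DONE]') continue
      yield JSON.parse(data) // assumed to be an OpenAI ChatCompletionChunk
    }
  }
}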
const client = createClient({
  cloud: customCloudProvider
})
// Resolves to: customCloudProvider (no transformation)

Disabling Cloud
Set cloud: false to disable cloud fallback:
const client = createClient({
  local: 'auto',
  cloud: false
})
// Resolves to: null (no cloud backend)

Provider Types
ResolvedLocalBackend
After resolution, local configs become ResolvedLocalBackend:
type ResolvedLocalBackend = {
  type: 'mlc'
  model: string
  useWorker: boolean
  useCache: boolean
  workerUrl?: string
  initProgressCallback?: (report: InitProgressReport) => void
  logLevel?: 'DEBUG' | 'INFO' | 'WARN' | 'ERROR'
}
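For example, local: 'auto' on a high-grade device might resolve to a value like the following; the exact model id comes from device scoring and is only illustrative here:

// Illustrative resolved value for local: 'auto'; model id depends on device scoring.
const resolvedLocal: ResolvedLocalBackend = {
  type: 'mlc',
  model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  useWorker: true,
  useCache: true
}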
ResolvedCloudBackend

After resolution, cloud configs become ResolvedCloudBackend:
type ResolvedCloudBackend = {
  type: 'fetchSSE' | 'custom'
  baseURL: string
  apiKey?: string
  model?: string
  timeout?: number
  maxRetries?: number
  headers?: Record<string, string>
  fetch?: typeof fetch
}
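Likewise, a bare API key string resolves to something along these lines (defaults match the auto-wrapping examples below):

// Illustrative resolved value for cloud: 'sk-...'
const resolvedCloud: ResolvedCloudBackend = {
  type: 'fetchSSE',
  baseURL: 'https://api.openai.com/v1',
  apiKey: 'sk-...',      // the string you passed
  model: 'gpt-4o-mini',  // default
  timeout: 30000,        // default
  maxRetries: 1          // default
}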
Resolution Flow Diagram

createClient({ local, cloud })
                     ↓
┌──────────────────────────────────────────┐
│ Configuration Resolution                  │
├──────────────────────────────────────────┤
│ Local Input → Normalize → mlc()           │
│ Cloud Input → Normalize → fetchSSE()      │
└──────────────────────────────────────────┘
                     ↓
┌──────────────────────────────────────────┐
│ Provider Initialization                   │
├──────────────────────────────────────────┤
│ mlc() creates MLCEngine instance          │
│ fetchSSE() creates fetch wrapper          │
└──────────────────────────────────────────┘
                     ↓
┌──────────────────────────────────────────┐
│ Backend Registration                      │
├──────────────────────────────────────────┤
│ Register with InferenceBackend manager    │
│ Register with Router for decision logic   │
└──────────────────────────────────────────┘
                     ↓
          WebLLMClient ready

Auto-Wrapping Examples
Example 1: Zero Config to mlc()
// User writes:
createClient({ local: 'auto' })

// SDK transforms to:
createClient({
  local: mlc({
    model: detectDeviceGrade() === 'S' ? 'Llama-3.1-8B-...' :
           detectDeviceGrade() === 'A' ? 'Llama-3.1-8B-...' :
           detectDeviceGrade() === 'B' ? 'Phi-3.5-mini-...' :
           'Qwen2.5-1.5B-...',
    useWorker: true,
    useCache: true
  })
})

Example 2: Object Config to mlc()
// User writes:
createClient({
  local: {
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useCache: false
  }
})

// SDK transforms to:
createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWorker: true, // Default
    useCache: false
  })
})

Example 3: String API Key to fetchSSE()
// User writes:
createClient({
  cloud: process.env.OPENAI_API_KEY
})

// SDK transforms to:
createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini',
    timeout: 30000,
    maxRetries: 1
  })
})

Example 4: Cloud Object to fetchSSE()
// User writes:
createClient({
  cloud: {
    baseURL: 'https://api.together.xyz/v1',
    apiKey: process.env.TOGETHER_API_KEY,
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'
  }
})

// SDK transforms to:
createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.together.xyz/v1',
    apiKey: process.env.TOGETHER_API_KEY,
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
    timeout: 30000, // Default
    maxRetries: 1   // Default
  })
})

Custom Provider Implementation
Local Provider Example
Implementing a custom local provider (hypothetical):
import type { LocalFn } from '@webllm-io/sdk'
const customLocalProvider: LocalFn = async ({ messages, options }) => {
  // Initialize custom WebGPU inference engine
  const engine = await initCustomEngine()

  // Generate response
  const stream = await engine.generate(messages, options)

  // Return AsyncIterable<ChatCompletionChunk>
  return {
    async *[Symbol.asyncIterator]() {
      for await (const token of stream) {
        yield {
          id: generateId(),
          object: 'chat.completion.chunk',
          created: Math.floor(Date.now() / 1000), // Unix timestamp in seconds
          model: 'custom-model',
          choices: [{
            index: 0,
            delta: { content: token },
            finish_reason: null
          }]
        }
      }
    }
  }
}
const client = createClient({
  local: customLocalProvider
})

Cloud Provider Example
Implementing a custom cloud provider for Anthropic (a custom provider is needed because Anthropic's /v1/messages endpoint does not follow the OpenAI-compatible /v1/chat/completions format):
import type { CloudFn } from '@webllm-io/sdk'
const anthropicProvider: CloudFn = async ({ messages, options }) => {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01'
    },
    body: JSON.stringify({
      model: 'claude-3-5-sonnet-20241022',
      messages: messages,
      stream: true,
      max_tokens: options.max_tokens || 4096
    })
  })

  // Parse Anthropic's SSE format (different from OpenAI)
  return parseAnthropicSSE(response.body)
}
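parseAnthropicSSE is likewise your own code. A rough sketch that maps Anthropic content_block_delta events onto OpenAI-style chunks (text deltas and message_stop only; tool use, usage, and error events are ignored):

// Hypothetical adapter (not an SDK export): Anthropic streaming events → OpenAI-style chunks.
async function* parseAnthropicSSE(body: ReadableStream<Uint8Array> | null) {
  if (!body) return
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''
    for (const line of lines) {
      if (!line.startsWith('data:')) continue
      const event = JSON.parse(line.slice(5))
      if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
        yield {
          id: 'anthropic-chunk', // synthetic id, for illustration only
          object: 'chat.completion.chunk',
          created: Math.floor(Date.now() / 1000),
          model: 'claude-3-5-sonnet-20241022',
          choices: [{ index: 0, delta: { content: event.delta.text }, finish_reason: null }]
        }
      } else if (event.type === 'message_stop') {
        yield {
          id: 'anthropic-chunk',
          object: 'chat.completion.chunk',
          created: Math.floor(Date.now() / 1000),
          model: 'claude-3-5-sonnet-20241022',
          choices: [{ index: 0, delta: {}, finish_reason: 'stop' }]
        }
      }
    }
  }
}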
const client = createClient({
  cloud: anthropicProvider
})

Benefits of Provider Composition
1. Progressive Disclosure
Start simple, add complexity only when needed:
// Beginner: simple string
createClient({ local: 'auto' })
// Intermediate: object config
createClient({ local: { model: '...', useCache: false } })
// Advanced: explicit provider
createClient({ local: mlc({ model: '...', logLevel: 'DEBUG' }) })

2. Type Safety
TypeScript ensures valid configurations at compile time:
// ✅ Valid
createClient({ local: 'auto' })
createClient({ local: { model: 'Llama-3.1-8B' } })
createClient({ local: mlc({ model: 'Llama-3.1-8B' }) })

// ❌ Type error
createClient({ local: 123 })
createClient({ local: { invalidKey: true } })

3. Extensibility
Custom providers integrate seamlessly:
const client = createClient({
  local: myCustomLocalProvider,
  cloud: myCustomCloudProvider
})

4. Default Optimizations
Auto-wrapped providers get sensible defaults:
- useWorker: true prevents UI freezing
- useCache: true speeds up subsequent loads
- timeout: 30000 prevents hung requests
- maxRetries: 1 handles transient network errors
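As a mental model, auto-wrapping is roughly equivalent to spreading your config over these defaults. The snippet below is illustrative only; the real merge happens inside the SDK.

// Illustrative defaults, matching the auto-wrapping examples above.
const LOCAL_DEFAULTS = { useWorker: true, useCache: true }
const CLOUD_DEFAULTS = {
  baseURL: 'https://api.openai.com/v1',
  model: 'gpt-4o-mini',
  timeout: 30000,
  maxRetries: 1
}

// User-supplied values win; anything unspecified falls back to the defaults.
const resolvedLocalConfig = { ...LOCAL_DEFAULTS, model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC', useCache: false }
// → { useWorker: true, useCache: false, model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC' }
const resolvedCloudConfig = { ...CLOUD_DEFAULTS, apiKey: 'sk-...' }
// → OpenAI endpoint, gpt-4o-mini, 30s timeout, 1 retry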
Best Practices
1. Use Auto-Wrapping for Simplicity
Let the SDK handle provider creation:
// ✅ Recommended for most use cases
createClient({
  local: 'auto',
  cloud: { baseURL: '...', apiKey: '...' }
})

2. Use Explicit Providers for Advanced Control
When you need debugging or non-standard configs:
// ✅ Use explicit providers for advanced scenarios
import { mlc } from '@webllm-io/sdk/providers/mlc'

createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    logLevel: 'DEBUG',
    initProgressCallback: (report) => {
      console.log(`Loading: ${report.text}`)
    }
  })
})

3. Validate Custom Providers
Ensure custom providers match the expected interface:
import type { CloudFn } from '@webllm-io/sdk'
const myProvider: CloudFn = async ({ messages, options }) => {
  // Implementation must return AsyncIterable<ChatCompletionChunk>
  // ...
}

4. Document Provider Assumptions
If you’re wrapping a non-standard API, document the assumptions:
/**
 * Custom provider for XYZ API
 * Assumes:
 * - OpenAI-compatible message format
 * - SSE streaming with 'data:' prefix
 * - Authentication via 'X-API-Key' header
 */
const xyzProvider = async ({ messages, options }) => {
  // ...
}

Next Steps
- Learn about Architecture for overall system design
- Understand Three-Level API for configuration patterns
- Explore Device Scoring for automatic model selection