
Provider Composition

WebLLM.io uses a flexible provider composition system that allows you to configure backends using simple strings or objects, while internally converting them to standardized provider functions. This design enables progressive disclosure: start with simple configs and opt into explicit providers when you need more control.

Configuration to Provider Pipeline

Every backend configuration goes through a resolution pipeline:

User Input                  Resolution       Internal Provider
──────────────────────────────────────────────────────────────────
'auto'                   →  auto-wrap     →  mlc()
{ model: 'Llama-3.1' }   →  auto-wrap     →  mlc({ model: '...' })
mlc({ ... })             →  pass-through  →  mlc({ ... })
CustomFunction           →  pass-through  →  CustomFunction

string / object  ──normalize──▶  ResolvedProvider
(plain config)                   (provider function)
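
Conceptually, normalization is a small dispatch on the input type. The sketch below is illustrative only, not the SDK's actual source: the resolveLocal name, the LocalInput union, and the assumption that mlc() returns a LocalFn are all hypothetical.

// Hypothetical sketch of local-config normalization (not SDK source).
import { mlc } from '@webllm-io/sdk/providers/mlc'
import type { LocalFn } from '@webllm-io/sdk'

type LocalInput =
  | 'auto'
  | string                                                      // model ID shorthand
  | { model?: string; useWorker?: boolean; useCache?: boolean } // plain object config
  | LocalFn                                                     // explicit provider function
  | false                                                       // local disabled

function resolveLocal(input: LocalInput): LocalFn | null {
  if (input === false) return null                 // no local backend
  if (typeof input === 'function') return input    // explicit provider: pass through
  if (input === 'auto') return mlc()               // device-based model selection
  if (typeof input === 'string') return mlc({ model: input })
  return mlc(input)                                // plain object config
}

Cloud inputs follow the same pattern, wrapping strings and objects in fetchSSE() instead of mlc().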

Local Provider Resolution

String Inputs

String inputs are shorthand for common configurations:

import { createClient } from '@webllm-io/sdk'

// Input: 'auto'
const client1 = createClient({
  local: 'auto'
})
// Resolves to: mlc() with automatic device-based model selection

// Input: 'Llama-3.1-8B-Instruct-q4f16_1-MLC'
const client2 = createClient({
  local: 'Llama-3.1-8B-Instruct-q4f16_1-MLC'
})
// Resolves to: mlc({ model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC' })

Object Inputs

Object inputs provide structured configuration:

// Input: LocalObjectConfig
const client = createClient({
  local: {
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWorker: true,
    useCache: true
  }
})
// Resolves to: mlc({ model: '...', useWorker: true, useCache: true })

Tiered Object Inputs

The responsive API uses a tiers object that maps device grades to models:

// Input: LocalTierConfig
const client = createClient({
  local: {
    tiers: {
      high: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
      medium: 'Phi-3.5-mini-instruct-q4f16_1-MLC',
      low: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC'
    }
  }
})
// Resolves to: mlc() with device-grade-based tier selection

Function Inputs

Explicit provider functions pass through unchanged:

import { mlc } from '@webllm-io/sdk/providers/mlc'

// Input: mlc() provider function
const client = createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWorker: false,
    logLevel: 'DEBUG'
  })
})
// Resolves to: mlc({ ... }) (no transformation)

Disabling Local

Set local: false to disable local inference:

const client = createClient({
  local: false,
  cloud: { /* ... */ }
})
// Resolves to: null (no local backend)

Cloud Provider Resolution

String Inputs

String inputs are treated as API keys and paired with the default OpenAI endpoint:

// Input: API key string
const client = createClient({
  cloud: 'sk-...'
})
// Resolves to: fetchSSE({
//   baseURL: 'https://api.openai.com/v1',
//   apiKey: 'sk-...',
//   model: 'gpt-4o-mini'
// })

Object Inputs

Object inputs provide full configuration:

// Input: CloudObjectConfig
const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini',
    timeout: 30000,
    maxRetries: 2
  }
})
// Resolves to: fetchSSE({ ... })

Function Inputs

Explicit provider functions pass through:

import { fetchSSE } from '@webllm-io/sdk/providers/fetch'

// Input: fetchSSE() provider function
const client = createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
    timeout: 60000
  })
})
// Resolves to: fetchSSE({ ... }) (no transformation)

Custom Cloud Functions

You can provide a custom CloudFn implementation:

import type { CloudFn } from '@webllm-io/sdk'

const customCloudProvider: CloudFn = async ({ messages, options }) => {
  const response = await fetch('https://my-custom-api.com/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, ...options })
  })
  // Return AsyncIterable<ChatCompletionChunk>
  return parseSSEStream(response.body)
}

const client = createClient({
  cloud: customCloudProvider
})
// Resolves to: customCloudProvider (no transformation)
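
parseSSEStream above is not part of the SDK; it stands in for whatever parser your API needs. A minimal sketch, assuming the endpoint streams OpenAI-style 'data: {...}' lines and terminates with 'data: [DONE]':

// Minimal SSE parser sketch (assumes OpenAI-style streaming format).
async function* parseSSEStream(body: ReadableStream<Uint8Array> | null) {
  if (!body) throw new Error('Response has no body')
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''                  // keep the trailing partial line
    for (const line of lines) {
      const trimmed = line.trim()
      if (!trimmed.startsWith('data:')) continue
      const payload = trimmed.slice(5).trim()
      if (payload === '[DONE]') return
      yield JSON.parse(payload)                 // ChatCompletionChunk-shaped object
    }
  }
}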

Disabling Cloud

Set cloud: false to disable cloud fallback:

const client = createClient({
  local: 'auto',
  cloud: false
})
// Resolves to: null (no cloud backend)

Provider Types

ResolvedLocalBackend

After resolution, local configs become ResolvedLocalBackend:

type ResolvedLocalBackend = {
  type: 'mlc'
  model: string
  useWorker: boolean
  useCache: boolean
  workerUrl?: string
  initProgressCallback?: (report: InitProgressReport) => void
  logLevel?: 'DEBUG' | 'INFO' | 'WARN' | 'ERROR'
}

ResolvedCloudBackend

After resolution, cloud configs become ResolvedCloudBackend:

type ResolvedCloudBackend = {
  type: 'fetchSSE' | 'custom'
  baseURL: string
  apiKey?: string
  model?: string
  timeout?: number
  maxRetries?: number
  headers?: Record<string, string>
  fetch?: typeof fetch
}
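
For illustration, these are roughly the resolved values you might see for local: 'auto' on a high-grade device and cloud: 'sk-...'. The field values mirror the defaults shown elsewhere in this page; whether the SDK exports these types for direct use is an assumption.

// Illustrative resolved values (defaults shown are those documented above).
const resolvedLocal: ResolvedLocalBackend = {
  type: 'mlc',
  model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  useWorker: true,
  useCache: true
}

const resolvedCloud: ResolvedCloudBackend = {
  type: 'fetchSSE',
  baseURL: 'https://api.openai.com/v1',
  apiKey: 'sk-...',
  model: 'gpt-4o-mini',
  timeout: 30000,
  maxRetries: 1
}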

Resolution Flow Diagram

       createClient({ local, cloud })
                     │
                     ▼
┌───────────────────────────────────────────┐
│         Configuration Resolution          │
├───────────────────────────────────────────┤
│  Local Input → Normalize → mlc()          │
│  Cloud Input → Normalize → fetchSSE()     │
└───────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────┐
│          Provider Initialization          │
├───────────────────────────────────────────┤
│  mlc() creates MLCEngine instance         │
│  fetchSSE() creates fetch wrapper         │
└───────────────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────┐
│            Backend Registration           │
├───────────────────────────────────────────┤
│  Register with InferenceBackend manager   │
│  Register with Router for decision logic  │
└───────────────────────────────────────────┘
                     │
                     ▼
            WebLLMClient ready

Auto-Wrapping Examples

Example 1: Zero Config to mlc()

// User writes:
createClient({ local: 'auto' })

// SDK transforms to:
createClient({
  local: mlc({
    model: detectDeviceGrade() === 'S' ? 'Llama-3.1-8B-...' :
           detectDeviceGrade() === 'A' ? 'Llama-3.1-8B-...' :
           detectDeviceGrade() === 'B' ? 'Phi-3.5-mini-...' :
           'Qwen2.5-1.5B-...',
    useWorker: true,
    useCache: true
  })
})

Example 2: Object Config to mlc()

// User writes:
createClient({
  local: {
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useCache: false
  }
})

// SDK transforms to:
createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWorker: true, // Default
    useCache: false
  })
})

Example 3: String API Key to fetchSSE()

// User writes:
createClient({
  cloud: process.env.OPENAI_API_KEY
})

// SDK transforms to:
createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini',
    timeout: 30000,
    maxRetries: 1
  })
})

Example 4: Cloud Object to fetchSSE()

// User writes:
createClient({
  cloud: {
    baseURL: 'https://api.together.xyz/v1',
    apiKey: process.env.TOGETHER_API_KEY,
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'
  }
})

// SDK transforms to:
createClient({
  cloud: fetchSSE({
    baseURL: 'https://api.together.xyz/v1',
    apiKey: process.env.TOGETHER_API_KEY,
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
    timeout: 30000, // Default
    maxRetries: 1   // Default
  })
})

Custom Provider Implementation

Local Provider Example

Implementing a custom local provider (hypothetical):

import type { LocalFn } from '@webllm-io/sdk'

// initCustomEngine() and generateId() are placeholders for your own engine
// setup and ID generation.
const customLocalProvider: LocalFn = async ({ messages, options }) => {
  // Initialize custom WebGPU inference engine
  const engine = await initCustomEngine()

  // Generate response
  const stream = await engine.generate(messages, options)

  // Return AsyncIterable<ChatCompletionChunk>
  return {
    async *[Symbol.asyncIterator]() {
      for await (const token of stream) {
        yield {
          id: generateId(),
          object: 'chat.completion.chunk',
          created: Math.floor(Date.now() / 1000), // Unix timestamp in seconds
          model: 'custom-model',
          choices: [{
            index: 0,
            delta: { content: token },
            finish_reason: null
          }]
        }
      }
    }
  }
}

const client = createClient({
  local: customLocalProvider
})

Cloud Provider Example

Implementing a custom cloud provider for Anthropic. A custom provider is needed here because Anthropic exposes /v1/messages rather than the OpenAI-compatible /v1/chat/completions format:

import type { CloudFn } from '@webllm-io/sdk'
const anthropicProvider: CloudFn = async ({ messages, options }) => {
const response = await fetch('https://api.anthropic.com/v1/messages', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-api-key': process.env.ANTHROPIC_API_KEY,
'anthropic-version': '2023-06-01'
},
body: JSON.stringify({
model: 'claude-3-5-sonnet-20241022',
messages: messages,
stream: true,
max_tokens: options.max_tokens || 4096
})
})
// Parse Anthropic's SSE format (different from OpenAI)
return parseAnthropicSSE(response.body)
}
const client = createClient({
cloud: anthropicProvider
})
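
parseAnthropicSSE is likewise not provided by the SDK. A minimal sketch that converts Anthropic content_block_delta events into OpenAI-style chunks; the event shapes follow Anthropic's documented streaming format, and the chunk id/model values are placeholders:

// Sketch: convert Anthropic streaming events to OpenAI-style chunks.
async function* parseAnthropicSSE(body: ReadableStream<Uint8Array> | null) {
  if (!body) throw new Error('Response has no body')
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''
    for (const line of lines) {
      if (!line.startsWith('data:')) continue
      const event = JSON.parse(line.slice(5).trim())
      // Only text deltas carry content; other event types are skipped
      if (event.type !== 'content_block_delta' || event.delta?.type !== 'text_delta') continue
      yield {
        id: 'anthropic-chunk',                    // placeholder id
        object: 'chat.completion.chunk',
        created: Math.floor(Date.now() / 1000),
        model: 'claude-3-5-sonnet-20241022',
        choices: [{ index: 0, delta: { content: event.delta.text }, finish_reason: null }]
      }
    }
  }
}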

Benefits of Provider Composition

1. Progressive Disclosure

Start simple, add complexity only when needed:

// Beginner: simple string
createClient({ local: 'auto' })

// Intermediate: object config
createClient({ local: { model: '...', useCache: false } })

// Advanced: explicit provider
createClient({ local: mlc({ model: '...', logLevel: 'DEBUG' }) })

2. Type Safety

TypeScript ensures valid configurations at compile time:

// ✅ Valid
createClient({ local: 'auto' })
createClient({ local: { model: 'Llama-3.1-8B' } })
createClient({ local: mlc({ model: 'Llama-3.1-8B' }) })

// ❌ Type error
createClient({ local: 123 })
createClient({ local: { invalidKey: true } })

3. Extensibility

Custom providers integrate seamlessly:

const client = createClient({
  local: myCustomLocalProvider,
  cloud: myCustomCloudProvider
})

4. Default Optimizations

Auto-wrapped providers get sensible defaults:

  • useWorker: true prevents UI freezing
  • useCache: true speeds up subsequent loads
  • timeout: 30000 prevents hung requests
  • maxRetries: 1 handles transient network errors

Best Practices

1. Use Auto-Wrapping for Simplicity

Let the SDK handle provider creation:

// ✅ Recommended for most use cases
createClient({
  local: 'auto',
  cloud: { baseURL: '...', apiKey: '...' }
})

2. Use Explicit Providers for Advanced Control

When you need debug logging, progress callbacks, or other non-standard configuration:

// ✅ Use explicit providers for advanced scenarios
import { mlc } from '@webllm-io/sdk/providers/mlc'

createClient({
  local: mlc({
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    logLevel: 'DEBUG',
    initProgressCallback: (report) => {
      console.log(`Loading: ${report.text}`)
    }
  })
})

3. Validate Custom Providers

Ensure custom providers match the expected interface:

import type { CloudFn } from '@webllm-io/sdk'

const myProvider: CloudFn = async ({ messages, options }) => {
  // Implementation must return AsyncIterable<ChatCompletionChunk>
  // ...
}
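
Beyond the type annotation, a quick runtime smoke test can catch providers that resolve to something other than an async iterable before they are wired into createClient. A hedged sketch; the message and options shapes passed to the provider are assumptions for illustration:

// Sketch: verify a custom provider streams at least one chunk.
async function smokeTest(provider: CloudFn) {
  const result = await provider({
    messages: [{ role: 'user', content: 'ping' }], // assumed message shape
    options: { max_tokens: 8 }                     // assumed options shape
  })
  if (typeof (result as any)[Symbol.asyncIterator] !== 'function') {
    throw new Error('Provider must return an AsyncIterable of chat completion chunks')
  }
  for await (const chunk of result) {
    console.log('first chunk:', chunk)
    break
  }
}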

4. Document Provider Assumptions

If you’re wrapping a non-standard API, document the assumptions:

/**
 * Custom provider for XYZ API
 * Assumes:
 * - OpenAI-compatible message format
 * - SSE streaming with 'data:' prefix
 * - Authentication via 'X-API-Key' header
 */
const xyzProvider = async ({ messages, options }) => {
  // ...
}

Next Steps