# Cloud Fallback
WebLLM.io supports cloud-based inference through OpenAI-compatible APIs. This gives you responses without waiting for a local model download, server-side model execution, and zero local dependencies when running in pure cloud mode.
## Zero Dependencies for Cloud-Only Mode
When using cloud-only configuration (no local inference), WebLLM.io has zero external dependencies. The SDK includes a self-implemented SSE (Server-Sent Events) parser, eliminating the need for the OpenAI SDK or any other HTTP client library.
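For context, SSE is a simple line-oriented protocol: each event arrives as a `data: <payload>` line, and OpenAI-style streams end with a `data: [DONE]` sentinel. The sketch below illustrates the general idea of such a parser; it is not WebLLM.io's actual implementation, and `parseSSE` is a hypothetical name:

```ts
// Illustrative sketch of an SSE parser, NOT the SDK's internal code.
// parseSSE is a hypothetical helper; it yields the raw JSON payload of each event.
async function* parseSSE(response: Response): AsyncGenerator<string> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Complete lines end with '\n'; keep any trailing partial line buffered.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue; // skip comments and other SSE fields
      const payload = line.slice(6).trim();
      if (payload === '[DONE]') return;         // OpenAI-style end-of-stream marker
      yield payload;                            // e.g. pass to JSON.parse()
    }
  }
}
```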
```bash
# Cloud-only mode requires NO peer dependencies
npm install @webllm-io/sdk
```

## Basic Usage
Configure cloud inference with an API endpoint and key:
```ts
import { createClient } from '@webllm-io/sdk';

const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-your-api-key-here',
    model: 'gpt-4o-mini'
  }
});

const completion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }]
});

console.log(completion.choices[0].message.content);
```

## Shorthand String Configuration
For quick setup, pass the API key as a string; the OpenAI endpoint is assumed, and the model is specified per request:
```ts
const client = createClient({ cloud: 'sk-your-api-key-here' });

// Specify the model in each request
const completion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello!' }],
  model: 'gpt-4o-mini'
});
```

## OpenAI-Compatible APIs
WebLLM.io works with any OpenAI-compatible API endpoint. Examples include:
### OpenAI
```ts
const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});
```

### Azure OpenAI
```ts
const client = createClient({
  cloud: {
    baseURL: 'https://your-resource.openai.azure.com/openai/deployments/your-deployment',
    apiKey: process.env.AZURE_OPENAI_KEY,
    headers: { 'api-version': '2024-02-01' },
    model: 'gpt-4o'
  }
});
```

### Groq
```ts
const client = createClient({
  cloud: {
    baseURL: 'https://api.groq.com/openai/v1',
    apiKey: process.env.GROQ_API_KEY,
    model: 'llama-3.1-70b-versatile'
  }
});
```

### Local OpenAI-Compatible Server
```ts
const client = createClient({
  cloud: {
    baseURL: 'http://localhost:1234/v1',
    model: 'local-model' // No API key required for local servers
  }
});
```

## Custom Headers
Add custom headers for authentication or API versioning:
```ts
const client = createClient({
  cloud: {
    baseURL: 'https://api.example.com/v1',
    apiKey: 'your-key',
    model: 'custom-model',
    headers: {
      'X-Custom-Header': 'value',
      'Authorization': 'Bearer custom-token' // Override default auth
    }
  }
});
```

## Timeout and Retries
Configure request timeout and automatic retries:
```ts
const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini',
    timeout: 30000, // 30 seconds (default: 60000)
    retries: 3      // Retry failed requests 3 times (default: 2)
  }
});
```

Timeouts apply to the entire request, including streaming. Retries trigger only on network errors, not on API errors (4xx/5xx status codes).
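Because built-in retries skip API errors, rate-limit (429) responses surface immediately. If you want backoff for those as well, a wrapper along these lines can work; `withBackoff` is a hypothetical helper, and the `status` field matches the error shape used in the error-handling example below:

```ts
// Hypothetical helper (not part of the SDK): retry 429 responses with
// exponential backoff. Other errors are rethrown immediately.
async function withBackoff<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.status !== 429 || i >= attempts - 1) throw error;
      await new Promise((resolve) => setTimeout(resolve, 2 ** i * 1000)); // 1s, 2s, 4s…
    }
  }
}

const completion = await withBackoff(() =>
  client.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello!' }]
  })
);
```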
## Using the fetchSSE Provider

For advanced use cases, you can create a cloud provider explicitly using `fetchSSE()`:
```ts
import { createClient } from '@webllm-io/sdk';
import { fetchSSE } from '@webllm-io/sdk/providers/fetch';

const cloudProvider = fetchSSE({
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o-mini',
  timeout: 45000,
  retries: 5
});

const client = createClient({ cloud: cloudProvider });
```

This is equivalent to passing the configuration object directly, but gives you a reusable provider instance.
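For example, the same provider instance can back multiple clients, keeping the endpoint configuration in one place. A small sketch, assuming `createClient` accepts a shared provider instance:

```ts
// Sketch: define the provider once and share it across clients.
// Assumes a shared provider instance is supported by createClient.
const sharedProvider = fetchSSE({
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-4o-mini'
});

const chatClient = createClient({ cloud: sharedProvider });
const summaryClient = createClient({ cloud: sharedProvider });
```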
## Environment Variables
Store API keys securely using environment variables:
```bash
# .env
OPENAI_API_KEY=sk-your-api-key-here
GROQ_API_KEY=gsk_your-groq-key-here
```

```ts
const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: import.meta.env.OPENAI_API_KEY, // Vite
    // apiKey: process.env.OPENAI_API_KEY,  // Node.js
    model: 'gpt-4o-mini'
  }
});
```

Note that by default Vite only exposes variables prefixed with `VITE_` (or a custom `envPrefix`) to client code. Never commit API keys to version control.
## Model Override per Request
Override the default model for individual requests:
```ts
const client = createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini' // default
  }
});

// Use the default model
await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Quick question' }]
});

// Override with a different model
await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Complex reasoning task' }],
  model: 'gpt-4o'
});
```

## Error Handling
Handle API errors gracefully:
```ts
try {
  const completion = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello!' }]
  });
  console.log(completion.choices[0].message.content);
} catch (error) {
  if (error.status === 401) {
    console.error('Invalid API key');
  } else if (error.status === 429) {
    console.error('Rate limit exceeded');
  } else if (error.status === 500) {
    console.error('Server error');
  } else {
    console.error('Request failed:', error.message);
  }
}
```

## Next Steps
- Learn about Hybrid Routing to combine local and cloud inference
- Explore Streaming for real-time token-by-token responses
- See Abort Requests to cancel in-flight API calls