
Cloud-Only Mode

Cloud-only mode lets you use WebLLM.io’s unified API with cloud providers like OpenAI, without installing the local inference engine (@mlc-ai/web-llm).

Basic Cloud Setup

import { createClient } from '@webllm-io/sdk';

const client = await createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-...',
    model: 'gpt-4o-mini'
  }
});

const response = await client.chat.completions.create({
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' }
  ]
});

console.log(response.choices[0].message.content);

No Local Dependencies Required

When using cloud-only mode, you do not need to install @mlc-ai/web-llm:

{
  "dependencies": {
    "@webllm-io/sdk": "^1.0.0"
    // @mlc-ai/web-llm NOT needed for cloud-only
  }
}

The SDK has zero external dependencies for cloud mode — even Server-Sent Events (SSE) parsing is implemented internally.

Streaming Responses

Cloud streaming works identically to local streaming:

const client = await createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});

const stream = await client.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Write a haiku about coding.' }
  ],
  stream: true
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    process.stdout.write(delta); // Node-only; in a browser, append to the DOM instead
  }
}
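Since `process.stdout` only exists in Node, a browser app would accumulate the deltas instead. A sketch, assuming the stream yields chunks shaped like the ones above:

```javascript
// Collect streamed deltas into one string — a browser-friendly alternative
// to writing to process.stdout. Assumes OpenAI-style chunk objects.
async function collectStream(stream) {
  let text = '';
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) text += delta; // in a UI, append to a DOM node here instead
  }
  return text;
}
```

Usage: `const text = await collectStream(stream);` with the `stream` from the example above.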

Cloud Configuration Options

import { createClient } from '@webllm-io/sdk';

const client = await createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-...',
    model: 'gpt-4o-mini',

    // Optional: request timeout (default: 60000ms)
    timeout: 30000,

    // Optional: retry attempts (default: 2)
    retries: 3,

    // Optional: custom headers
    headers: {
      'X-Custom-Header': 'value'
    }
  }
});

Using Alternative Cloud Providers

Any OpenAI-compatible API works:

Azure OpenAI

const client = await createClient({
  cloud: {
    baseURL: 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT',
    apiKey: 'YOUR_AZURE_KEY',
    model: 'gpt-4',
    headers: {
      'api-key': 'YOUR_AZURE_KEY' // Azure OpenAI authenticates via the api-key header
    }
  }
});

Self-Hosted (Ollama, LocalAI, etc.)

const client = await createClient({
  cloud: {
    baseURL: 'http://localhost:11434/v1', // Ollama
    apiKey: 'not-required', // Some local servers don't need keys
    model: 'llama3.1'
  }
});

Environment Variables

Store sensitive keys in environment variables:

const client = await createClient({
  cloud: {
    baseURL: import.meta.env.VITE_OPENAI_BASE_URL,
    apiKey: import.meta.env.VITE_OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});

.env.local:

VITE_OPENAI_BASE_URL=https://api.openai.com/v1
VITE_OPENAI_API_KEY=sk-...

Error Handling

try {
  const response = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello!' }]
  });
  console.log(response.choices[0].message.content);
} catch (error) {
  if (error.message.includes('401')) {
    console.error('Invalid API key');
  } else if (error.message.includes('timeout')) {
    console.error('Request timed out');
  } else {
    console.error('Request failed:', error);
  }
}
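The `retries` option above implies the SDK retries failed requests internally. For cases where you want your own policy on top, the general pattern can be sketched as follows; `withRetries` is illustrative, not an SDK API:

```javascript
// Generic retry-with-backoff wrapper — illustrative only, not part of the SDK.
// Calls fn() up to `retries` + 1 times, doubling the delay between attempts.
async function withRetries(fn, { retries = 2, baseDelayMs = 250 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Exponential backoff: 250ms, 500ms, 1000ms, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

Usage: `await withRetries(() => client.chat.completions.create({ messages }))`. A production version would also distinguish retryable errors (timeouts, 429, 5xx) from permanent ones such as a 401.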

Cloud-Only Use Cases

Cloud-only mode is ideal when:

  • ✅ You need access to the latest frontier models (GPT-4, etc.)
  • ✅ User devices don’t support WebGPU
  • ✅ You want to minimize bundle size (no local inference engine)
  • ✅ Your application already has cloud API infrastructure
  • ✅ You need guaranteed consistency across all devices
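One way to act on the WebGPU point above is to feature-detect support and configure cloud-only mode when it is absent. The `supportsWebGPU` helper below is a sketch covering only the detection step; the shape of a local or hybrid config is not shown on this page, so none is assumed:

```javascript
// Sketch: detect WebGPU so the app can fall back to cloud-only mode.
// Accepts the navigator object (or undefined, e.g. in a worker or test).
function supportsWebGPU(nav) {
  return typeof nav !== 'undefined' && nav !== null && 'gpu' in nav;
}

// In a browser: choose cloud-only when WebGPU is unavailable.
// if (!supportsWebGPU(navigator)) {
//   const client = await createClient({
//     cloud: { baseURL: '...', apiKey: '...', model: 'gpt-4o-mini' }
//   });
// }
```

Note that `navigator.gpu` being present only indicates API support; requesting an adapter can still fail on unsupported hardware.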

Next Steps