# FAQ

## Browser Support

### Which browsers support WebGPU?
WebGPU is required for local inference. Supported browsers:
- Chrome/Edge 113+ (Windows, macOS, Linux, ChromeOS)
- Safari 18+ (macOS, iOS)
You can check browser support at caniuse.com/webgpu.
### How do I enable WebGPU in my browser?
WebGPU is enabled by default in supported versions. To verify:
```ts
import { checkCapability } from '@webllm-io/sdk';

const cap = await checkCapability();
console.log('WebGPU available:', cap.webgpu);
```

If `false`, ensure you’re using a supported browser version.
## Dependencies

### Do I need @mlc-ai/web-llm for cloud-only mode?
No! When using cloud-only mode, you don’t need to install @mlc-ai/web-llm:
{ "dependencies": { "@webllm-io/sdk": "^1.0.0" // No @mlc-ai/web-llm needed! }}Only install @mlc-ai/web-llm if you want local inference:
{ "dependencies": { "@webllm-io/sdk": "^1.0.0" }, "peerDependencies": { "@mlc-ai/web-llm": "^0.2.x" }}Hardware Requirements
### How much VRAM does my device need?
WebLLM.io supports devices across all grades:
| Grade | VRAM | Model Size | Minimum GPU |
|---|---|---|---|
| S | ≥8GB | ~5.5GB | RTX 3080+, M2 Max+ |
| A | ≥4GB | ~5.5GB | RTX 3060, M1 Pro |
| B | ≥2GB | ~2.2GB | Integrated GPUs, M1 |
| C | <2GB | ~1.0GB | Mobile GPUs, older laptops |
Even Grade C devices (entry-level GPUs) can run local inference using lightweight models like Qwen2.5-1.5B-Instruct.
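If you want to pin a specific model per grade instead of relying on `'auto'`, a minimal sketch might look like the following. Only the Llama model ID appears elsewhere in these docs; the Phi and Qwen IDs, and passing an explicit ID to `local.model`, are assumptions to verify against your SDK version's model list.

```ts
import { checkCapability, createClient } from '@webllm-io/sdk';

// Pick a model tier based on the reported device grade.
const cap = await checkCapability();

const modelByGrade: Record<string, string> = {
  S: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  A: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  B: 'Phi-3.5-mini-instruct-q4f16_1-MLC', // assumed ID for the Grade B model
  C: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC', // assumed ID for the Grade C model
};

const client = await createClient({
  local: { model: modelByGrade[cap.grade] ?? 'auto' },
});
```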
### How do I check my device’s VRAM?
```ts
import { checkCapability } from '@webllm-io/sdk';

const cap = await checkCapability();
console.log('GPU VRAM:', cap.gpu?.vram, 'MB');
console.log('Device Grade:', cap.grade);
```

## COOP/COEP Headers
### Why do I need COOP/COEP headers?
Local inference uses SharedArrayBuffer for performance, which requires Cross-Origin-Opener-Policy (COOP) and Cross-Origin-Embedder-Policy (COEP) headers.
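You can verify at runtime whether the headers took effect by checking the standard `crossOriginIsolated` browser global (this is a platform API, not part of the SDK):

```ts
// crossOriginIsolated is true only when COOP/COEP are set correctly,
// which is what makes SharedArrayBuffer available.
if (globalThis.crossOriginIsolated) {
  console.log('Cross-origin isolated: SharedArrayBuffer is available');
} else {
  console.warn('Not cross-origin isolated: local inference may fail; consider cloud mode');
}
```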
### How do I set COOP/COEP headers?
Vite:
```ts
import { defineConfig } from 'vite';

export default defineConfig({
  plugins: [
    {
      name: 'coop-coep',
      configureServer(server) {
        server.middlewares.use((req, res, next) => {
          res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
          res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
          next();
        });
      }
    }
  ]
});
```

Next.js:
```js
module.exports = {
  async headers() {
    return [
      {
        source: '/(.*)',
        headers: [
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' }
        ]
      }
    ];
  }
};
```

Express:
```js
app.use((req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  next();
});
```

### What if I can’t set COOP/COEP headers?
Use cloud-only mode, which doesn’t require SharedArrayBuffer:
```ts
const client = await createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});
```

## Cloud Providers
### Can I use a non-OpenAI cloud provider?
Yes! Any OpenAI-compatible API works:
Azure OpenAI:
```ts
const client = await createClient({
  cloud: {
    baseURL: 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT',
    apiKey: 'YOUR_AZURE_KEY',
    model: 'gpt-4'
  }
});
```

Self-hosted (Ollama, LocalAI):
```ts
const client = await createClient({
  cloud: {
    baseURL: 'http://localhost:11434/v1',
    apiKey: 'not-required',
    model: 'llama3.1'
  }
});
```

## Model Downloads
### How large are the model downloads?
Model sizes vary by device grade:
- Grade S/A: ~4.5GB (Llama-3.1-8B)
- Grade B: ~2.2GB (Phi-3.5-mini-instruct)
- Grade C: ~1.5GB (Qwen2.5-1.5B)
Models are downloaded once and cached in OPFS (Origin Private File System) for future use.
### How do I check if a model is cached?
```ts
import { hasModelInCache } from '@webllm-io/sdk';

const modelId = 'Llama-3.1-8B-Instruct-q4f16_1-MLC';
const isCached = await hasModelInCache(modelId);

if (isCached) {
  console.log('Model is cached, initialization will be fast!');
} else {
  console.log('Model will be downloaded (~4.5GB)');
}
```

### Can I delete cached models?
```ts
import { deleteModelFromCache } from '@webllm-io/sdk';

const modelId = 'Llama-3.1-8B-Instruct-q4f16_1-MLC';
await deleteModelFromCache(modelId);
console.log('Model removed from cache');
```

## Privacy & Security
### Is my data sent to the cloud in local mode?
No! When using local-only mode, all inference happens on your device. No data is transmitted to external servers.
```ts
// 100% local, zero network requests for inference
const client = await createClient({
  local: 'auto'
});
```

### Where are models stored?
Models are cached in OPFS (Origin Private File System), a secure browser storage API:
- Isolated per origin (domain)
- Not accessible to other websites
- Persists across sessions
- Can be cleared via browser settings
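To see how much of the origin's storage budget cached models are using, you can query the standard Storage API (a browser API, not part of the SDK):

```ts
// navigator.storage.estimate() reports approximate usage and quota for this
// origin, which includes models cached in OPFS.
const { usage, quota } = await navigator.storage.estimate();
console.log(`Storage used:  ${((usage ?? 0) / 1e9).toFixed(2)} GB`);
console.log(`Storage quota: ${((quota ?? 0) / 1e9).toFixed(2)} GB`);
```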
## Performance

### Why is first-time initialization slow?
The first initialization downloads the model (1.5GB - 4.5GB). Subsequent initializations are fast because the model is cached.
Show progress to users:
```ts
const client = await createClient({
  local: {
    model: 'auto',
    onProgress: (report) => {
      console.log(`${report.text} - ${Math.round(report.progress * 100)}%`);
    }
  }
});
```

### Does local inference run on the GPU?
Yes! Local inference uses WebGPU to run on your GPU, providing hardware-accelerated performance.
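If you want to inspect the raw WebGPU adapter the browser exposes (independent of the SDK), a minimal check looks like this:

```ts
// navigator.gpu is the WebGPU entry point; requestAdapter() resolves to null
// when no suitable GPU is available.
const adapter = await navigator.gpu?.requestAdapter();
if (adapter) {
  console.log('Max buffer size:', adapter.limits.maxBufferSize);
  console.log('Supports shader-f16:', adapter.features.has('shader-f16'));
} else {
  console.warn('No WebGPU adapter available');
}
```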
### Why does inference block my UI?
By default, inference runs in a Web Worker to avoid blocking the main thread. If you disabled Web Workers:
```ts
// DON'T do this unless debugging
const client = await createClient({
  local: mlc({
    model: 'auto',
    useWebWorker: false // ❌ Blocks main thread
  })
});
```

Always keep Web Workers enabled for production:
```ts
// ✅ Default: runs in Web Worker
const client = await createClient({
  local: 'auto'
});
```

## Troubleshooting
### “Failed to create GPUAdapter” error
This means WebGPU is not available. Possible causes:
- Unsupported browser — Use Chrome 113+, Edge 113+, or Safari 18+
- GPU drivers outdated — Update your graphics drivers
- WebGPU disabled — Check browser flags (should be enabled by default)
Solution: Fall back to cloud mode:
```ts
import { checkCapability, createClient } from '@webllm-io/sdk';

const cap = await checkCapability();

const client = await createClient({
  local: cap.webgpu ? 'auto' : undefined,
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});
```

### “SharedArrayBuffer is not defined” error
The COOP/COEP headers are not set. See the COOP/COEP Headers section above.
### Model download fails or times out
- Check network connection — Large downloads require stable internet
- Check storage space — Ensure sufficient disk space (5GB+ recommended)
- Try a smaller model — Use Grade C models if limited storage
- Clear cache and retry — Delete partial downloads
```ts
import { deleteModelFromCache } from '@webllm-io/sdk';

await deleteModelFromCache('Llama-3.1-8B-Instruct-q4f16_1-MLC');
// Retry initialization
```

## Getting Help
- GitHub Issues: github.com/WebLLM-io/webllm.io
- X (Twitter): @webllm_io
- Documentation: webllm.io/docs
For bug reports, include:
- Browser version
- Device grade (from `checkCapability()`)
- Error messages and stack traces
- Minimal reproduction code