FAQ

Browser Support

Which browsers support WebGPU?

WebGPU is required for local inference. Supported browsers:

  • Chrome/Edge 113+ (Windows, macOS, Linux, ChromeOS)
  • Safari 18+ (macOS, iOS)

You can check browser support at caniuse.com/webgpu.
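
If you'd rather check without the SDK, a minimal feature-detect against the standard WebGPU API looks like this:

// Quick feature detection using the standard WebGPU API (no SDK required)
const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
// Requesting an adapter confirms the browser can actually reach a usable GPU
const adapter = hasWebGPU ? await navigator.gpu.requestAdapter() : null;
console.log('WebGPU usable:', adapter !== null);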

How do I enable WebGPU in my browser?

WebGPU is enabled by default in supported versions. To verify:

import { checkCapability } from '@webllm-io/sdk';
const cap = await checkCapability();
console.log('WebGPU available:', cap.webgpu);

If false, ensure you’re using a supported browser version.


Dependencies

Do I need @mlc-ai/web-llm for cloud-only mode?

No! When using cloud-only mode, you don’t need to install @mlc-ai/web-llm:

{
  "dependencies": {
    "@webllm-io/sdk": "^1.0.0"
    // No @mlc-ai/web-llm needed!
  }
}

Only install @mlc-ai/web-llm if you want local inference:

{
  "dependencies": {
    "@webllm-io/sdk": "^1.0.0",
    "@mlc-ai/web-llm": "^0.2.x"
  }
}
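
Equivalently, from the command line (npm shown; any package manager works the same way):

npm install @webllm-io/sdk @mlc-ai/web-llm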

Hardware Requirements

How much VRAM does my device need?

WebLLM.io supports devices across all grades:

Grade | VRAM | Model Size | Minimum GPU
S     | ≥8GB | ~5.5GB     | RTX 3080+, M2 Max+
A     | ≥4GB | ~5.5GB     | RTX 3060, M1 Pro
B     | ≥2GB | ~2.2GB     | Integrated GPUs, M1
C     | <2GB | ~1.0GB     | Mobile GPUs, older laptops

Even Grade C devices (entry-level GPUs) can run local inference using lightweight models like Qwen2.5-1.5B-Instruct.
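
As a sketch, you can pair the grade reported by checkCapability() with a model that fits its VRAM budget. The Llama model ID below appears elsewhere in this FAQ; the Phi and Qwen IDs are assumptions patterned on the same MLC naming convention:

import { checkCapability, createClient } from '@webllm-io/sdk';

// Illustrative grade-to-model mapping; verify the IDs against your model catalog
const modelByGrade: Record<string, string> = {
  S: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  A: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
  B: 'Phi-3.5-mini-instruct-q4f16_1-MLC',  // assumed ID
  C: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC'   // assumed ID
};

const cap = await checkCapability();
const client = await createClient({
  local: { model: modelByGrade[cap.grade] ?? 'auto' }
});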

How do I check my device’s VRAM?

import { checkCapability } from '@webllm-io/sdk';
const cap = await checkCapability();
console.log('GPU VRAM:', cap.gpu?.vram, 'MB');
console.log('Device Grade:', cap.grade);

COOP/COEP Headers

Why do I need COOP/COEP headers?

Local inference uses SharedArrayBuffer for performance, which requires Cross-Origin-Opener-Policy (COOP) and Cross-Origin-Embedder-Policy (COEP) headers.
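
You can verify at runtime that the headers took effect: the standard crossOriginIsolated global is true only when COOP/COEP are set correctly.

// true only in a cross-origin isolated context, i.e. SharedArrayBuffer is available
console.log('crossOriginIsolated:', self.crossOriginIsolated);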

How do I set COOP/COEP headers?

Vite:

vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  plugins: [
    {
      name: 'coop-coep',
      configureServer(server) {
        server.middlewares.use((req, res, next) => {
          res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
          res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
          next();
        });
      }
    }
  ]
});
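
Recent Vite versions also accept headers directly via the built-in server.headers option, which achieves the same effect for the dev server without a custom plugin:

// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp'
    }
  }
});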

Next.js:

next.config.js
module.exports = {
  async headers() {
    return [
      {
        source: '/(.*)',
        headers: [
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' }
        ]
      }
    ];
  }
};

Express:

app.use((req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  next();
});

What if I can’t set COOP/COEP headers?

Use cloud-only mode, which doesn’t require SharedArrayBuffer:

import { createClient } from '@webllm-io/sdk';

const client = await createClient({
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});

Cloud Providers

Can I use a non-OpenAI cloud provider?

Yes! Any OpenAI-compatible API works:

Azure OpenAI:

import { createClient } from '@webllm-io/sdk';

const client = await createClient({
  cloud: {
    baseURL: 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT',
    apiKey: 'YOUR_AZURE_KEY',
    model: 'gpt-4'
  }
});

Self-hosted (Ollama, LocalAI):

const client = await createClient({
  cloud: {
    baseURL: 'http://localhost:11434/v1',
    apiKey: 'not-required',
    model: 'llama3.1'
  }
});

Model Downloads

How large are the model downloads?

Model sizes vary by device grade:

  • Grade S/A: ~4.5GB (Llama-3.1-8B)
  • Grade B: ~2.2GB (Phi-3.5-mini-instruct)
  • Grade C: ~1.5GB (Qwen2.5-1.5B)

Models are downloaded once and cached in OPFS (Origin Private File System) for future use.

How do I check if a model is cached?

import { hasModelInCache } from '@webllm-io/sdk';

const modelId = 'Llama-3.1-8B-Instruct-q4f16_1-MLC';
const isCached = await hasModelInCache(modelId);

if (isCached) {
  console.log('Model is cached, initialization will be fast!');
} else {
  console.log('Model will be downloaded (~4.5GB)');
}

Can I delete cached models?

import { deleteModelFromCache } from '@webllm-io/sdk';
const modelId = 'Llama-3.1-8B-Instruct-q4f16_1-MLC';
await deleteModelFromCache(modelId);
console.log('Model removed from cache');

Privacy & Security

Is my data sent to the cloud in local mode?

No! When using local-only mode, all inference happens on your device. No data is transmitted to external servers.

import { createClient } from '@webllm-io/sdk';

// 100% local, zero network requests for inference
const client = await createClient({
  local: 'auto'
});

Where are models stored?

Models are cached in OPFS (Origin Private File System), a secure browser storage API:

  • Isolated per origin (domain)
  • Not accessible to other websites
  • Persists across sessions
  • Can be cleared via browser settings
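
To see how much of the origin's storage quota the cached models occupy, the standard Storage API provides a browser-estimated answer:

// Rough usage/quota for this origin, which includes OPFS contents
const { usage = 0, quota = 0 } = await navigator.storage.estimate();
console.log(`Using ${(usage / 1e9).toFixed(2)}GB of ~${(quota / 1e9).toFixed(2)}GB quota`);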

Performance

Why is first-time initialization slow?

The first initialization downloads the model (1.5-4.5GB, depending on device grade). Subsequent initializations are fast because the model is already cached.

Show progress to users:

import { createClient } from '@webllm-io/sdk';

const client = await createClient({
  local: {
    model: 'auto',
    onProgress: (report) => {
      console.log(`${report.text} - ${Math.round(report.progress * 100)}%`);
    }
  }
});

Does local inference run on the GPU?

Yes! Local inference uses WebGPU to run on your GPU, providing hardware-accelerated performance.
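
To inspect which adapter WebGPU selected, you can query it directly with the standard API; the reported limits hint at how large a model the device can hold:

const adapter = await navigator.gpu.requestAdapter();
// Per-adapter limits; useful context for performance questions and bug reports
console.log('maxBufferSize:', adapter?.limits.maxBufferSize);
console.log('maxStorageBufferBindingSize:', adapter?.limits.maxStorageBufferBindingSize);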

Why does inference block my UI?

By default, inference runs in a Web Worker to avoid blocking the main thread. If you disabled Web Workers:

// DON'T do this unless debugging
const client = await createClient({
  local: mlc({
    model: 'auto',
    useWebWorker: false // ❌ Blocks main thread
  })
});

Always keep Web Workers enabled for production:

// ✅ Default: runs in Web Worker
// ✅ Default: runs in Web Worker
const client = await createClient({
  local: 'auto'
});

Troubleshooting

“Failed to create GPUAdapter” error

This means WebGPU is not available. Possible causes:

  1. Unsupported browser — Use Chrome 113+, Edge 113+, or Safari 18+
  2. GPU drivers outdated — Update your graphics drivers
  3. WebGPU disabled — Check browser flags (should be enabled by default)

Solution: Fall back to cloud mode:

import { checkCapability, createClient } from '@webllm-io/sdk';

const cap = await checkCapability();

const client = await createClient({
  local: cap.webgpu ? 'auto' : undefined,
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o-mini'
  }
});

“SharedArrayBuffer is not defined” error

This means the COOP/COEP headers are not set. See the COOP/COEP Headers section above.

Model download fails or times out

  1. Check network connection — Large downloads require stable internet
  2. Check storage space — Ensure sufficient disk space (5GB+ recommended)
  3. Try a smaller model — Use Grade C models if limited storage
  4. Clear cache and retry — Delete partial downloads:

import { deleteModelFromCache } from '@webllm-io/sdk';
await deleteModelFromCache('Llama-3.1-8B-Instruct-q4f16_1-MLC');
// Retry initialization

Getting Help

For bug reports, include the following (the snippet after this list gathers most of it):

  • Browser version
  • Device grade (from checkCapability())
  • Error messages and stack traces
  • Minimal reproduction code
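
A small sketch that gathers most of this in one go, using checkCapability() and the field names shown in the examples above:

import { checkCapability } from '@webllm-io/sdk';

const cap = await checkCapability();
// Paste this JSON into your bug report along with any error stack traces
console.log(JSON.stringify({
  userAgent: navigator.userAgent,
  webgpu: cap.webgpu,
  grade: cap.grade,
  vramMB: cap.gpu?.vram
}, null, 2));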