# Web Worker
By default, WebLLM SDK runs local MLC inference in a Web Worker. This prevents model loading and token generation from blocking the main thread.
## Default Behavior
Worker mode is enabled by default — no configuration needed:
```js
const client = createClient({ local: 'auto' });
// Inference runs in a Web Worker automatically
```
## Disabling Workers

For debugging, or in environments without Worker support:
```js
const client = createClient({
  local: {
    model: 'Llama-3.1-8B-Instruct-q4f16_1-MLC',
    useWebWorker: false, // Run on the main thread
  },
});
```
## Required HTTP Headers

WebGPU in Workers requires `SharedArrayBuffer`, which needs these HTTP headers:
```http
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
### Vite Configuration

```js
import { defineConfig } from 'vite';

export default defineConfig({
  plugins: [
    {
      name: 'configure-response-headers',
      configureServer(server) {
        server.middlewares.use((_req, res, next) => {
          res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
          res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
          next();
        });
      },
    },
  ],
});
```
### Next.js Configuration

```js
module.exports = {
  async headers() {
    return [
      {
        source: '/(.*)',
        headers: [
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
        ],
      },
    ];
  },
};
```
### Nginx

```nginx
add_header Cross-Origin-Opener-Policy "same-origin" always;
add_header Cross-Origin-Embedder-Policy "require-corp" always;
```
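Whichever server you use, you can confirm a deployed origin is actually sending both headers by inspecting a response from the command line (the URL below is a placeholder — substitute your own deployment):

```shell
# Print only the response headers and keep the COOP/COEP lines.
# https://your-app.example.com is a placeholder, not a real endpoint.
curl -sI https://your-app.example.com | grep -i '^cross-origin'
```

If neither header appears in the output, the browser will not grant cross-origin isolation and `SharedArrayBuffer` will be unavailable.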
## Worker Entry Point

The SDK provides a dedicated worker entry at `@webllm-io/sdk/worker`. If you need to customize the worker setup, you can reference this entry point directly.
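As a sketch of what referencing the entry point directly could look like under Vite (this assumes Vite's `?worker` import suffix, which bundles a module as a Web Worker and exports a constructor for it — other bundlers have their own mechanisms):

```js
// Vite-specific sketch: `?worker` turns the SDK's worker entry into a
// Worker constructor resolved and bundled by Vite.
import SdkWorker from '@webllm-io/sdk/worker?worker';

const worker = new SdkWorker();
// How this instance is handed back to the SDK depends on createClient's
// options and is not covered here.
```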
## Troubleshooting
### "SharedArrayBuffer is not defined"
The COOP/COEP headers are missing or incorrect. Check your server configuration.
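A quick way to confirm this from the browser console is to check the `crossOriginIsolated` flag, which is `true` only when both headers were served correctly. A minimal sketch (the `globals` parameter is injectable purely so the function can be exercised outside a browser):

```javascript
// Returns true when the environment is cross-origin isolated, i.e. the
// COOP/COEP headers took effect and SharedArrayBuffer is usable.
function hasCrossOriginIsolation(globals = globalThis) {
  return globals.crossOriginIsolated === true &&
    typeof globals.SharedArrayBuffer !== 'undefined';
}

// On a correctly configured page this logs `true`.
console.log(hasCrossOriginIsolation());
```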
### "Failed to construct Worker"
Ensure your bundler is configured to handle worker imports. With Vite, workers are supported out of the box. Webpack 5 supports the `new Worker(new URL(...), { type: 'module' })` pattern natively; on webpack 4 you may need worker-loader.
## Performance
Worker communication adds minimal overhead (typically under 1 ms per message). The benefits of a non-blocking UI far outweigh this cost, especially during multi-second model-loading phases.