Basic Chat
This example demonstrates the simplest way to get started with WebLLM.io. The client will automatically detect your device capabilities and select an appropriate local model.
Complete Example
```typescript
import { createClient } from '@webllm-io/sdk';

// Create a client with automatic local model selection
const client = await createClient({ local: 'auto' });

// Send a chat message
const response = await client.chat.completions.create({
  messages: [
    { role: 'user', content: 'What is WebGPU?' }
  ]
});

// Log the response
console.log(response.choices[0].message.content);
```

How It Works
- Client Creation: `createClient({ local: 'auto' })` automatically:
  - Detects your device’s WebGPU capabilities
  - Assigns a device grade (S/A/B/C)
  - Selects an appropriate model based on available VRAM
  - Downloads and caches the model in OPFS (if not already cached)
- Inference: The `chat.completions.create()` call runs entirely in your browser using WebGPU. No data is sent to external servers.
- Response: You receive a complete response object following the OpenAI Chat Completions API format.
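Because the response follows the OpenAI Chat Completions shape, it can be typed roughly as in the sketch below. Only `choices[0].message.content` is used in the example above; the other field names (`id`, `finish_reason`, `usage`) are assumptions carried over from the OpenAI format rather than confirmed SDK fields.

```typescript
// Rough shape of the response, assuming it mirrors the OpenAI
// Chat Completions format; fields beyond `choices[0].message`
// are assumptions, not confirmed SDK fields.
interface ChatCompletionResponse {
  id?: string;                                  // completion identifier (assumed)
  choices: {
    index: number;
    message: { role: 'assistant'; content: string };
    finish_reason?: 'stop' | 'length';          // why generation ended (assumed)
  }[];
  usage?: {                                     // token counts (assumed)
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

// Reading the generated text, as in the example above:
// const text: string = response.choices[0].message.content;
```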
Device Grades and Model Selection
WebLLM.io automatically selects models based on your device grade:
- Grade S (≥8GB VRAM): Llama-3.1-8B (high tier)
- Grade A (≥4GB VRAM): Llama-3.1-8B (high tier)
- Grade B (≥2GB VRAM): Phi-3.5-mini-instruct (medium tier)
- Grade C (<2GB VRAM): Qwen2.5-1.5B (low tier)
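As a rough illustration, the selection rule above can be written as a small lookup. This sketch is not the SDK's internal implementation, and the `vramGB` input is a hypothetical value you would supply yourself; with `local: 'auto'`, the SDK detects it for you.

```typescript
// Illustrative only: maps available VRAM (in GB) to the device grade and
// default model tier described above. Not the SDK's actual implementation.
type Grade = 'S' | 'A' | 'B' | 'C';

function gradeForVram(vramGB: number): { grade: Grade; model: string } {
  if (vramGB >= 8) return { grade: 'S', model: 'Llama-3.1-8B' };          // high tier
  if (vramGB >= 4) return { grade: 'A', model: 'Llama-3.1-8B' };          // high tier
  if (vramGB >= 2) return { grade: 'B', model: 'Phi-3.5-mini-instruct' }; // medium tier
  return { grade: 'C', model: 'Qwen2.5-1.5B' };                           // low tier
}

console.log(gradeForVram(6)); // { grade: 'A', model: 'Llama-3.1-8B' }
```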
Next Steps
- Streaming Chat — Stream responses token by token
- Loading Progress — Display model download progress
- Device Detection — Check device capabilities before initialization