Basic Chat
This example demonstrates the simplest way to get started with WebLLM.io. The client will automatically detect your device capabilities and select an appropriate local model.
Complete Example
```typescript
import { createClient } from '@webllm-io/sdk';

// Create a client with automatic local model selection
const client = await createClient({ local: 'auto' });

// Send a chat message
const response = await client.chat.completions.create({
  messages: [
    { role: 'user', content: 'What is WebGPU?' }
  ]
});

// Log the response
console.log(response.choices[0].message.content);
```

How It Works
- Client Creation: `createClient({ local: 'auto' })` automatically:
  - Detects your device’s WebGPU capabilities
  - Assigns a device grade (S/A/B/C)
  - Selects an appropriate model based on available VRAM
  - Downloads and caches the model in OPFS (if not already cached)
- Inference: The `chat.completions.create()` call runs entirely in your browser using WebGPU. No data is sent to external servers.
- Response: You receive a complete response object following the OpenAI Chat Completions API format.
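Because the response follows the OpenAI Chat Completions shape, it can be typed roughly as in the sketch below. Only `choices[0].message.content` is used in the example above; the other field names (`id`, `finish_reason`, `usage`) are assumptions carried over from the OpenAI format rather than confirmed SDK fields.

```typescript
// Rough shape of the response, assuming it mirrors the OpenAI
// Chat Completions format; fields beyond `choices[0].message`
// are assumptions, not confirmed SDK fields.
interface ChatCompletionResponse {
  id?: string;                                  // completion identifier (assumed)
  choices: {
    index: number;
    message: { role: 'assistant'; content: string };
    finish_reason?: 'stop' | 'length';          // why generation ended (assumed)
  }[];
  usage?: {                                     // token counts (assumed)
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

// Reading the generated text, as in the example above:
// const text: string = response.choices[0].message.content;
```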
Device Grades and Model Selection
WebLLM.io automatically selects models based on your device grade:
- Grade S (≥8GB VRAM): Llama-3.1-8B (high tier)
- Grade A (≥4GB VRAM): Llama-3.1-8B (high tier)
- Grade B (≥2GB VRAM): Phi-3.5-mini-instruct (medium tier)
- Grade C (<2GB VRAM): Qwen2.5-1.5B (low tier)
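As a rough illustration, the selection rule above can be written as a small lookup. This sketch is not the SDK's internal implementation, and the `vramGB` input is a hypothetical value you would supply yourself; with `local: 'auto'`, the SDK detects it for you.

```typescript
// Illustrative only: maps available VRAM (in GB) to the device grade and
// default model tier described above. Not the SDK's actual implementation.
type Grade = 'S' | 'A' | 'B' | 'C';

function gradeForVram(vramGB: number): { grade: Grade; model: string } {
  if (vramGB >= 8) return { grade: 'S', model: 'Llama-3.1-8B' };          // high tier
  if (vramGB >= 4) return { grade: 'A', model: 'Llama-3.1-8B' };          // high tier
  if (vramGB >= 2) return { grade: 'B', model: 'Phi-3.5-mini-instruct' }; // medium tier
  return { grade: 'C', model: 'Qwen2.5-1.5B' };                           // low tier
}

console.log(gradeForVram(6)); // { grade: 'A', model: 'Llama-3.1-8B' }
```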
Next Steps
- Streaming Chat — Stream responses token by token
- Loading Progress — Display model download progress
- Device Detection — Check device capabilities before initialization