Basic Chat

This example demonstrates the simplest way to get started with WebLLM.io. The client will automatically detect your device capabilities and select an appropriate local model.

Complete Example

import { createClient } from '@webllm-io/sdk';

// Create a client with automatic local model selection
const client = await createClient({
  local: 'auto'
});

// Send a chat message
const response = await client.chat.completions.create({
  messages: [
    { role: 'user', content: 'What is WebGPU?' }
  ]
});

// Log the response
console.log(response.choices[0].message.content);
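
The response object follows the OpenAI Chat Completions format. A minimal sketch of the fields used above is shown here; the field names come from the OpenAI format, and the SDK's exact type names may differ:

// Sketch of the response shape, based on the OpenAI Chat Completions format.
// Illustrative only; the SDK's actual typings may differ.
interface ChatCompletionResponse {
  choices: Array<{
    index: number;
    message: { role: 'assistant'; content: string };
    finish_reason: string;
  }>;
}

// The content logged above comes from the first choice's message.
const text: string = response.choices[0].message.content;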

How It Works

  1. Client Creation: createClient({ local: 'auto' }) automatically:

    • Detects your device’s WebGPU capabilities (a standalone availability check is sketched after this list)
    • Assigns a device grade (S/A/B/C)
    • Selects an appropriate model based on available VRAM
    • Downloads the model and caches it in OPFS (the Origin Private File System), if not already cached
  2. Inference: The chat.completions.create() call runs entirely in your browser using WebGPU. No data is sent to external servers.

  3. Response: You receive a complete response object following the OpenAI Chat Completions API format.
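
If you want to confirm WebGPU support yourself before creating a client, you can query the standard browser API directly. This sketch uses only the built-in navigator.gpu interface (you may need the @webgpu/types package for TypeScript typings); nothing here is SDK-specific:

import { createClient } from '@webllm-io/sdk';

// Check for WebGPU support using the standard browser API (navigator.gpu).
// Not SDK-specific; createClient({ local: 'auto' }) performs its own detection.
async function hasWebGPU(): Promise<boolean> {
  if (!('gpu' in navigator)) return false;
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

if (await hasWebGPU()) {
  const client = await createClient({ local: 'auto' });
}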

Device Grades and Model Selection

WebLLM.io automatically selects a model based on your device grade (see the sketch after this list):

  • Grade S (≥8GB VRAM): Llama-3.1-8B (high tier)
  • Grade A (≥4GB VRAM): Llama-3.1-8B (high tier)
  • Grade B (≥2GB VRAM): Phi-3.5-mini-instruct (medium tier)
  • Grade C (<2GB VRAM): Qwen2.5-1.5B (low tier)
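
The thresholds above can be expressed as a simple mapping from available VRAM to a grade and default model. This is an illustrative restatement of the table, not the SDK's internal selection code; the cutoffs and model names are taken directly from the list above:

// Illustrative only: mirrors the documented VRAM-to-grade table.
// This is not the SDK's actual selection logic.
type Grade = 'S' | 'A' | 'B' | 'C';

function selectByVram(vramGB: number): { grade: Grade; model: string } {
  if (vramGB >= 8) return { grade: 'S', model: 'Llama-3.1-8B' };
  if (vramGB >= 4) return { grade: 'A', model: 'Llama-3.1-8B' };
  if (vramGB >= 2) return { grade: 'B', model: 'Phi-3.5-mini-instruct' };
  return { grade: 'C', model: 'Qwen2.5-1.5B' };
}

console.log(selectByVram(6)); // { grade: 'A', model: 'Llama-3.1-8B' }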

Next Steps