# Quick Start
This guide walks you through making your first AI completions with WebLLM.io.
## Step 1: Import and Create Client
First, import the SDK and create a client instance:
```ts
import { createClient } from '@webllm-io/sdk';

const client = createClient({
  local: 'auto', // Auto-detect device capability and select model
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-your-api-key-here',
  },
});
```

### Configuration Options

- `local: 'auto'` — Automatically selects the best local model based on device capability
- `cloud: { ... }` — Cloud provider configuration (OpenAI-compatible)
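Hardcoding an API key is fine for a quick experiment, but in practice you would usually read it from the environment. The sketch below builds the same options object shown above; only `local`, `cloud.baseURL`, and `cloud.apiKey` come from this guide, and the `OPENAI_API_KEY` variable name is a common convention, not something the SDK requires.

```ts
// Build createClient options from the environment rather than a literal key.
// OPENAI_API_KEY is a conventional name, not an SDK requirement.
function clientOptions() {
  const apiKey = process.env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error('Set OPENAI_API_KEY before creating the client');
  }
  return {
    local: 'auto' as const,
    cloud: {
      baseURL: 'https://api.openai.com/v1',
      apiKey,
    },
  };
}

// Usage: const client = createClient(clientOptions());
```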
## Step 2: Make Your First Completion
Use the OpenAI-compatible Chat Completions API:
```ts
const result = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello, world!' }],
});

console.log(result.choices[0].message.content);
// Output: "Hello! How can I help you today?"
```

### Request Options

```ts
const result = await client.chat.completions.create({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  temperature: 0.7,
  max_tokens: 100,
});
```

## Step 3: Streaming Completions
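Consuming a stream does not depend on anything SDK-specific: a streamed completion is just an async iterable of chunks carrying `choices[0].delta.content`. As a sketch of that pattern, here is a small helper that collects a stream into a single string; the mock generator stands in for a real SDK stream.

```ts
// Shape of a streaming chunk, as used by the examples in this guide.
type Chunk = { choices: { delta?: { content?: string } }[] };

// Concatenate all delta content from an OpenAI-style chunk stream.
async function collectStream(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}

// Mock standing in for `client.chat.completions.create({ ..., stream: true })`.
async function* mockStream(): AsyncGenerator<Chunk> {
  for (const piece of ['Hello', ', ', 'world!']) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

collectStream(mockStream()).then((text) => console.log(text)); // prints "Hello, world!"
```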
For real-time responses, use streaming mode:
```ts
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a short poem about coding.' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(content);
}
```

### Browser Example (Streaming)
```ts
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing.' }],
  stream: true,
});

const container = document.getElementById('output');

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? '';
  container.textContent += content;
}
```

## Step 4: Abort/Interrupt Requests
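Cancellation builds on the standard `AbortController`/`AbortSignal` API (available in browsers and Node 18+), not on anything SDK-specific. Before the SDK example, here is an SDK-independent sketch of the abort-after-timeout pattern; `slowTask` is a stand-in for a real request, not part of the SDK.

```ts
// Create a signal that aborts after `ms` milliseconds.
function timeoutSignal(ms: number): AbortSignal {
  const controller = new AbortController();
  setTimeout(() => controller.abort(new Error('timed out')), ms);
  return controller.signal;
}

// Stand-in for a long-running request: resolves after `ms`,
// rejects early if the signal fires first.
function slowTask(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve('done'), ms);
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(signal.reason);
    });
  });
}

slowTask(1000, timeoutSignal(50)).catch((err) => console.log('aborted:', err.message));
```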
Cancel ongoing completions:
```ts
const controller = new AbortController();

// Start completion
const promise = client.chat.completions.create({
  messages: [{ role: 'user', content: 'Count to 1000.' }],
  signal: controller.signal,
});

// Abort after 2 seconds
setTimeout(() => controller.abort(), 2000);

try {
  await promise;
} catch (error) {
  console.log('Request aborted');
}
```

## Step 5: Clean Up
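Requests can throw or be aborted, so it is safest to guarantee cleanup in a `finally` block. This generic sketch works for any object with a `dispose()` method; the mock client stands in for a real `createClient()` result, and `withDisposed` is an illustrative helper, not part of the SDK.

```ts
// Run `use` with a disposable resource and always dispose it afterwards.
async function withDisposed<T extends { dispose(): Promise<void> }, R>(
  resource: T,
  use: (r: T) => Promise<R>,
): Promise<R> {
  try {
    return await use(resource);
  } finally {
    await resource.dispose();
  }
}

// Mock standing in for a real client.
const mockClient = {
  disposed: false,
  async dispose() {
    this.disposed = true;
  },
};

withDisposed(mockClient, async () => 'ok').then((result) => {
  console.log(result, '/ disposed:', mockClient.disposed); // prints "ok / disposed: true"
});
```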
When you’re done, dispose of the client to free resources:
```ts
await client.dispose();
```

## Complete Example
Putting it all together:
```ts
import { createClient } from '@webllm-io/sdk';

async function main() {
  // Create client
  const client = createClient({
    local: 'auto',
    cloud: {
      baseURL: 'https://api.openai.com/v1',
      apiKey: process.env.OPENAI_API_KEY,
    },
  });

  // Non-streaming completion
  const result = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  console.log('Non-streaming:', result.choices[0].message.content);

  // Streaming completion
  const stream = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Count to 5.' }],
    stream: true,
  });

  process.stdout.write('Streaming: ');
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
  console.log('\n');

  // Clean up
  await client.dispose();
}

main().catch(console.error);
```

## Next Steps
- Playground — Try the interactive demo
- Configuration — Learn about advanced options
- Local Inference — Deep dive into WebGPU models
- Cloud Providers — Configure cloud fallback