Browser Usage

Run LLMs directly in the browser with WebGPU acceleration. No server required.

100-150 tok/s with WebGPU · Models cached in IndexedDB · Fully private, runs locally

React Hooks

The easiest way to use Gerbil in React. Follows the same patterns as the Vercel AI SDK.

useChat

Full chat with message history, thinking mode, and streaming:

Chat.tsx
import { useChat } from "@tryhamster/gerbil/browser";

function Chat() {
  const {
    messages,        // Message[] with id, role, content, thinking?
    input,           // Current input value
    setInput,        // Update input
    handleSubmit,    // Form submit handler
    isLoading,       // Model loading
    loadingProgress, // { status, file?, progress? }
    isGenerating,    // Currently generating
    thinking,        // Current thinking content (streaming)
    stop,            // Stop generation
    clear,           // Clear messages
    tps,             // Tokens per second
    error,           // Error message
  } = useChat({
    model: "qwen3-0.6b",
    thinking: true,
    system: "You are a helpful assistant.",
    maxTokens: 512,
  });

  if (isLoading) {
    return <div>Loading model: {loadingProgress?.progress}%</div>;
  }

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.thinking && (
            <details>
              <summary>Thinking...</summary>
              <pre>{m.thinking}</pre>
            </details>
          )}
          <p><strong>{m.role}:</strong> {m.content}</p>
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          disabled={isGenerating}
          placeholder="Ask anything..."
        />
        <button type="submit" disabled={isGenerating}>
          {isGenerating ? `${tps.toFixed(0)} tok/s` : "Send"}
        </button>
        {isGenerating && <button type="button" onClick={stop}>Stop</button>}
      </form>
    </div>
  );
}

useCompletion

One-off text generation without message history:

Generator.tsx
import { useCompletion } from "@tryhamster/gerbil/browser";

function Generator() {
  const {
    complete,     // Function to generate text
    completion,   // Generated text (streaming)
    thinking,     // Thinking content (if enabled)
    isLoading,    // Model loading
    isGenerating, // Currently generating
    tps,          // Tokens per second
    stop,         // Stop generation
    error,        // Error message
  } = useCompletion({
    model: "qwen3-0.6b",
    thinking: true,
    maxTokens: 256,
  });

  if (isLoading) return <div>Loading...</div>;

  return (
    <div>
      <button
        onClick={() => complete("Write a haiku about coding")}
        disabled={isGenerating}
      >
        Generate
      </button>

      {thinking && (
        <details open>
          <summary>Thinking...</summary>
          <pre>{thinking}</pre>
        </details>
      )}

      <p>{completion}</p>

      {isGenerating && (
        <>
          <span>{tps.toFixed(0)} tok/s</span>
          <button onClick={stop}>Stop</button>
        </>
      )}
    </div>
  );
}

Hook Options

types.ts
interface UseChatOptions {
  model?: string;       // Model ID (default: "qwen3-0.6b")
  autoLoad?: boolean;   // Load model on mount (default: false)
  thinking?: boolean;   // Enable thinking mode (Qwen3)
  system?: string;      // System prompt
  maxTokens?: number;   // Max tokens (default: 256)
  temperature?: number; // Temperature (default: 0.7)
  topP?: number;        // Top-p sampling
  topK?: number;        // Top-k sampling
}

// useCompletion uses the same options

Lazy Loading (Default)

By default, models load on the first generation, not on page load. This prevents surprise downloads:

loading.tsx
// Default: model loads when user first submits
const { handleSubmit, isLoading } = useChat();

// Preload on mount (downloads immediately)
const { handleSubmit, isLoading } = useChat({ autoLoad: true });

// Manual control with load()
const { load, isLoading, isReady } = useChat();

return (
  <button onClick={load} disabled={isLoading || isReady}>
    {isLoading ? "Loading..." : isReady ? "Ready" : "Load Model"}
  </button>
);

Loading Progress States

The loadingProgress object tells you exactly what's happening during model load:

loading-states.ts
// loadingProgress.status values:
"downloading" // Fetching from network (first time)
              // Has: file, progress (0-100)
"loading"     // Loading from IndexedDB cache (fast)
              // No additional properties
"ready"       // Model ready for inference
              // No additional properties
"error"       // Load failed
              // Has: error message

// Example usage:
if (loadingProgress?.status === "downloading") {
  return <div>Downloading {loadingProgress.file}: {loadingProgress.progress}%</div>;
}
if (loadingProgress?.status === "loading") {
  return <div>Loading from cache...</div>;
}

Message Type

types.ts
interface Message {
  id: string;                 // Unique ID (e.g., "msg-1")
  role: "user" | "assistant"; // Message role
  content: string;            // The message content
  thinking?: string;          // Thinking content (if enabled)
}

Low-Level API

For non-React apps or custom implementations, use createGerbilWorker directly:

vanilla.ts
import { createGerbilWorker, isWebGPUSupported } from "@tryhamster/gerbil/browser";

// Check WebGPU support
if (!isWebGPUSupported()) {
  throw new Error("WebGPU not supported - use Chrome/Edge 113+");
}

// Stream output into the page (assumes an element like <pre id="output"> exists)
const output = document.getElementById("output")!;

// Create worker (loads model automatically)
const gerbil = await createGerbilWorker({
  modelId: "qwen3-0.6b",
  onProgress: (p) => {
    if (p.status === "downloading") {
      console.log(`Downloading ${p.file}: ${p.progress}%`);
    } else if (p.status === "loading") {
      console.log("Loading from cache...");
    }
  },
  onToken: (token) => {
    // token.text  - the token text
    // token.state - "thinking" or "answering"
    // token.tps   - tokens per second
    output.append(token.text);
  },
  onComplete: (result) => {
    console.log(`Done: ${result.tps.toFixed(1)} tok/s`);
  },
});

// Generate
await gerbil.generate("Write a haiku", { thinking: true });

// Interrupt
gerbil.interrupt();

// Reset conversation
gerbil.reset();

// Clean up
gerbil.terminate();

Utilities

Helper functions to check compatibility, optimize for device capabilities, and debug issues.

imports.ts
import {
  // Basic checks
  isWebGPUSupported,       // Quick boolean check
  getWebGPUInfo,           // GPU adapter info

  // Production-ready checks
  checkWebGPUReady,        // Full WebGPU verification
  getRecommendedModels,    // Memory-aware model selection
  checkStorageQuota,       // Verify disk space
  checkWebGPUCapabilities, // GPU buffer limits

  // Debugging
  getBrowserDiagnostics,   // Full diagnostic report
} from "@tryhamster/gerbil/browser";

isWebGPUSupported()

Check if the browser supports WebGPU:

check.ts
import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

if (!isWebGPUSupported()) {
  // Show fallback UI or error message
  alert("Please use Chrome or Edge 113+ for WebGPU support");
}

getWebGPUInfo()

Get GPU adapter information for debugging:

info.ts
import { getWebGPUInfo } from "@tryhamster/gerbil/browser";

const info = await getWebGPUInfo();
console.log(info);
// { supported: true, adapter: "Apple", device: "Apple M4 Max" }

checkWebGPUReady()

Full WebGPU verification — checks not just if the API exists, but if it actually works:

check-ready.ts
import { checkWebGPUReady } from "@tryhamster/gerbil/browser";

const result = await checkWebGPUReady();
// {
//   ok: true,
//   webgpu: true,
//   adapter: { vendor: "apple", architecture: "common-3", device: "", description: "" },
//   reason: "WebGPU is ready"
// }

if (!result.ok) {
  console.warn(result.reason); // Human-readable explanation
  // Falls back to WASM automatically
}

getRecommendedModels()

Memory-aware model selection based on navigator.deviceMemory:

recommended.ts
import { getRecommendedModels } from "@tryhamster/gerbil/browser";

const models = getRecommendedModels();
// {
//   chat: "qwen3-0.6b",      // or "smollm2-360m" on low-memory devices
//   vision: "ministral-3b",  // or null if not enough memory
//   tts: "kokoro-82m",
//   stt: "whisper-base.en",  // or "whisper-tiny.en" on low-memory
//   embedding: "Xenova/all-MiniLM-L6-v2",
//   reason: "8GB+ detected, using full models"
// }

// Use for smart defaults
const { messages } = useChat({ model: models.chat });

Mobile: On mobile devices, Gerbil automatically uses q4 quantization (CPU-optimized) instead of q4f16 (GPU-optimized) for better compatibility and performance.

checkStorageQuota()

Verify available storage before downloading a large model:

storage.ts
import { checkStorageQuota } from "@tryhamster/gerbil/browser";

// Check if we have 700MB available (for qwen3-0.6b)
const storage = await checkStorageQuota(700);
// {
//   ok: true,
//   available: 4500, // MB available
//   required: 700,   // MB requested
//   message: "4.5GB available, 700MB required"
// }

if (!storage.ok) {
  alert(storage.message); // "Only 200MB available, need 700MB"
  return;
}

checkWebGPUCapabilities()

Check if the GPU can run a specific model (buffer size limits, etc.):

capabilities.ts
import { checkWebGPUCapabilities } from "@tryhamster/gerbil/browser";

const caps = await checkWebGPUCapabilities("qwen3-0.6b");
// {
//   canRunModel: true,
//   maxBufferSize: 2147483648,     // 2GB
//   requiredBufferSize: 500000000, // ~500MB for qwen3
//   reason: "GPU buffer size sufficient"
// }

if (!caps.canRunModel) {
  console.warn(caps.reason);
  // Use smaller model or fall back to WASM
}

getBrowserDiagnostics()

Comprehensive diagnostic info for debugging compatibility issues:

diagnostics.ts
import { getBrowserDiagnostics } from "@tryhamster/gerbil/browser";

const diag = await getBrowserDiagnostics();
// {
//   browser: "Chrome",
//   version: "120.0.0",
//   platform: "macOS",
//   mobile: false,
//   webgpu: {
//     supported: true,
//     adapter: { vendor: "apple", ... }
//   },
//   memory: {
//     deviceMemory: 8,   // GB (from navigator.deviceMemory)
//     jsHeapLimit: 4096  // MB
//   },
//   storage: {
//     available: 4500,   // MB
//     persistent: true
//   }
// }

// Useful for error reporting
console.log("Diagnostics:", JSON.stringify(diag, null, 2));

Model Preloading

Download models ahead of time during app initialization, so users don't wait when they first use AI. These functions work outside React hooks — perfect for app startup.

preload.ts
import {
  preloadChatModel,
  preloadEmbeddingModel,
  preloadTTSModel,
  preloadSTTModel,
} from "@tryhamster/gerbil/browser";

// During app initialization (before React mounts)
async function initApp() {
  // Preload LLM with progress tracking
  await preloadChatModel("qwen3-0.6b", {
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      }
    },
  });

  // Preload other models (all run in parallel)
  await Promise.all([
    preloadEmbeddingModel(),            // default: Xenova/all-MiniLM-L6-v2
    preloadTTSModel("kokoro-82m"),      // or "supertonic-66m"
    preloadSTTModel("whisper-tiny.en"),
  ]);

  console.log("All models ready!");
}

// Call during app startup
initApp();

Preload Functions

Function                               | Default Model           | Description
preloadChatModel(modelId, opts?)       | -                       | Preload LLM to IndexedDB
preloadEmbeddingModel(modelId?, opts?) | Xenova/all-MiniLM-L6-v2 | Preload embedding model
preloadTTSModel(modelId?, opts?)       | kokoro-82m              | Preload text-to-speech model
preloadSTTModel(modelId?, opts?)       | whisper-tiny.en         | Preload speech-to-text model

Preload Options

types.ts
interface PreloadOptions {
  // Track download progress
  onProgress?: (p: PreloadProgress) => void;

  // Keep model loaded in memory after preload (default: false)
  // false = download, then dispose to free RAM
  // true  = download and keep in memory for instant use
  keepLoaded?: boolean;
}

type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;     // Current file being downloaded
  progress?: number; // 0-100 percentage
  message?: string;  // Status message
};

keepLoaded Option

Control whether the model stays in memory after preloading:

Value | Behavior                         | Use Case
false | Download → Dispose → Free memory | Preload for later, save RAM
true  | Download → Keep in memory        | Instant use, no disk I/O delay
keep-loaded.ts
import { preloadChatModel } from "@tryhamster/gerbil/browser";

// Download only - frees RAM after preload (~400MB saved)
await preloadChatModel("qwen3-0.6b");
// Later: loads from IndexedDB cache (~1-2s)

// Keep in memory - uses RAM but instant inference
await preloadChatModel("qwen3-0.6b", { keepLoaded: true });
// Later: model already loaded, no wait

Browser Models

Models optimized for browser use. Automatically cached in IndexedDB after first download.

Model        | Size   | Speed         | Best For
qwen3-0.6b   | ~400MB | 100-150 tok/s | General use, thinking mode, reasoning
smollm2-360m | ~250MB | 150-200 tok/s | Faster responses, good quality
smollm2-135m | ~100MB | 200-300 tok/s | Fastest, basic tasks
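
If you want to choose one of these models at runtime, one option is to key off available storage. A minimal sketch (not a library helper; pickBrowserModel is a hypothetical name), using checkStorageQuota from the utilities above and the approximate sizes from this table:

pick-model.ts
import { checkStorageQuota } from "@tryhamster/gerbil/browser";

// Choose the largest model whose download fits the available storage.
// Sizes are the rough figures from the table above, in MB.
async function pickBrowserModel(): Promise<string> {
  if ((await checkStorageQuota(400)).ok) return "qwen3-0.6b";
  if ((await checkStorageQuota(250)).ok) return "smollm2-360m";
  return "smollm2-135m";
}

Resolve this before rendering and pass the result to useChat's model option, or simply use getRecommendedModels, which also accounts for device memory.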

Browser Support

Browser       | Version | Status
Chrome / Edge | 113+    | ✓ Full support
Safari        | 18+     | ⚠ May have quirks
Firefox       | -       | ✗ Behind flag, not recommended

iOS Memory Guards

iOS Safari and iOS Chrome have strict memory limits (~300-400MB effective for WKWebView). All React hooks automatically protect against iOS crashes — no code changes required.

Automatic Protection: All hooks block large models on iOS, detect crashes, and use chunked resumable downloads.

What Happens Automatically

Hook          | iOS Guard      | Crash Detect | Chunked DL
useChat       | ✓ Blocks large | ✓            | ✓ Resumable
useCompletion | ✓ Blocks large | ✓            | ✓ Resumable
useSpeech     | -              | -            | ✓ Resumable
useVoiceInput | -              | -            | ✓ Resumable
useEmbedding  | -              | -            | ✓ Resumable

iOS Compatibility Matrix

Model        | Size   | iOS Safe  | Notes
smollm2-135m | ~150MB | ✓ Yes     | Best for iOS
smollm2-360m | ~400MB | ✓ Yes     | Recommended for iOS
qwen3-0.6b   | ~700MB | ⚠ Risky   | Only on iPhone 14+/iPad Pro
qwen3-1.7b   | ~1.8GB | ✗ Blocked | Desktop only
kokoro-82m   | ~350MB | ✓ Yes     | TTS
whisper-tiny | ~150MB | ✓ Yes     | STT

Manual Utilities (Advanced)

For custom implementations or advanced control, these utilities are available:

ios-utilities.ts
import {
  isModelSafeForDevice,    // Check if model is safe for current device
  detectMemoryCrash,       // Check if previous session crashed
  setDownloadPhase,        // Track download phase for crash detection
  clearDownloadPhase,      // Clear phase on success
  downloadModelChunked,    // Resumable chunked downloads
  hasIncompleteDownload,   // Check for interrupted downloads
  clearIncompleteDownload, // Clear partial download
} from "@tryhamster/gerbil/browser";

// Check model safety before loading
const check = isModelSafeForDevice("qwen3-1.7b");
if (!check.safe) {
  console.log(check.reason);         // "Model is too large for iOS..."
  console.log(check.recommendation); // "Use smollm2-360m or qwen3-0.6b"
  console.log(check.maxSafeModel);   // "qwen3-0.6b"
}

// Detect if page crashed during previous model load
const crash = detectMemoryCrash();
if (crash.crashed) {
  console.log(crash.recommendation); // "The model was too large..."
  console.log(crash.phase);          // "downloading" | "initializing"
  console.log(crash.modelId);        // which model caused it
}
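
As a usage sketch (the hooks already apply this guard automatically; pickSafeModel and IOSAwareChat are hypothetical names), the safety check can feed a hook's model option, falling back to maxSafeModel on constrained devices:

ios-safe-chat.tsx
import { isModelSafeForDevice, useChat } from "@tryhamster/gerbil/browser";

// Prefer a model, but drop to the largest device-safe one if the check fails.
function pickSafeModel(preferred = "qwen3-0.6b"): string {
  const check = isModelSafeForDevice(preferred);
  return check.safe ? preferred : check.maxSafeModel;
}

function IOSAwareChat() {
  const { messages, input, setInput, handleSubmit } = useChat({
    model: pickSafeModel(),
  });
  // ...render messages and the form as in the Chat example above
  return null;
}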

Chunked Resumable Downloads

Model downloads automatically use chunked downloading with resume support. If a download is interrupted (page refresh, crash, network error), it resumes from where it left off. For manual control:

chunked.ts
import {
  downloadModelChunked,
  hasIncompleteDownload,
  clearIncompleteDownload,
} from "@tryhamster/gerbil/browser";

// Check for interrupted downloads
const incomplete = await hasIncompleteDownload("qwen3-0.6b");
if (incomplete.incomplete) {
  console.log(`Resuming: ${incomplete.percent}% complete`);
}

// Download with progress and resume support
const abortController = new AbortController();
const buffer = await downloadModelChunked(
  "https://huggingface.co/...",
  "qwen3-0.6b",
  {
    onProgress: (info) => {
      console.log(`${info.phase}: ${info.percent}%`);
    },
    signal: abortController.signal,
  }
);

Feature             | Description
HTTP Range requests | Downloads in 1.5MB chunks using Range: bytes=start-end
IndexedDB storage   | Each chunk stored separately to avoid large transaction spikes
Automatic resume    | Tracks completed chunks in manifest, resumes from last position
ETag validation     | Clears cached chunks if model version changes
Abort support       | Cancel downloads gracefully with AbortController
Fallback            | Falls back to regular download if server doesn't support Range
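
For reference, the Range-request pattern in the table looks roughly like this. It is an illustration of the general technique, not Gerbil's internal code; fetchInChunks and the chunk-size constant are just for the sketch:

range-sketch.ts
// Illustration of the chunked Range-request pattern (not Gerbil's implementation).
const CHUNK_SIZE = 1.5 * 1024 * 1024; // 1.5MB, matching the chunk size noted above

async function fetchInChunks(url: string, totalBytes: number, signal?: AbortSignal) {
  const chunks: ArrayBuffer[] = [];
  for (let start = 0; start < totalBytes; start += CHUNK_SIZE) {
    const end = Math.min(start + CHUNK_SIZE, totalBytes) - 1;
    const res = await fetch(url, {
      headers: { Range: `bytes=${start}-${end}` }, // one HTTP Range request per chunk
      signal,                                      // allows AbortController cancellation
    });
    if (res.status !== 206) throw new Error("Server does not support Range requests");
    chunks.push(await res.arrayBuffer());
    // A real implementation persists each chunk (e.g. to IndexedDB) and records it in a
    // manifest keyed by ETag, so an interrupted download can resume from this point.
  }
  return new Blob(chunks).arrayBuffer(); // reassemble into a single buffer
}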

Troubleshooting

"WebGPU not supported"

  • Update to Chrome/Edge 113+
  • Check chrome://gpu for WebGPU status
  • Try enabling chrome://flags/#enable-unsafe-webgpu

Slow first load

First load downloads the model (~400MB for qwen3-0.6b) and compiles WebGPU shaders. Subsequent loads use IndexedDB cache and are much faster (~2-5s).
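
One way to soften that first load is to start the download while the browser is idle, using the preload API shown earlier (a sketch; adjust the model ID and timing to your app):

idle-preload.ts
import { preloadChatModel } from "@tryhamster/gerbil/browser";

// Kick off the download before the user opens the chat UI.
if ("requestIdleCallback" in window) {
  requestIdleCallback(() => {
    preloadChatModel("qwen3-0.6b").catch(console.error);
  });
} else {
  // Safari has no requestIdleCallback; a timeout is a rough substitute.
  setTimeout(() => preloadChatModel("qwen3-0.6b").catch(console.error), 2000);
}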

Out of memory

Smaller models like smollm2-135m use less GPU memory. Close other GPU-intensive tabs.

CORS / Header issues

Your server needs these headers for SharedArrayBuffer (required for threading):

Terminal
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
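
The Next.js setup is shown below; if you serve the app with Vite instead, a minimal sketch using Vite's server.headers and preview.headers options looks like this:

vite.config.ts
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
  preview: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
});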

Next.js Configuration

Add the required headers and webpack config for Next.js:

next.config.js
// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        ],
      },
    ];
  },
  webpack: (config, { isServer }) => {
    config.experiments = {
      ...config.experiments,
      asyncWebAssembly: true,
    };

    if (isServer) {
      config.externals.push("@huggingface/transformers");
    } else {
      // Exclude Node.js polyfills from browser bundle
      config.resolve.alias = {
        ...config.resolve.alias,
        webgpu: false,
      };
      config.resolve.fallback = {
        ...config.resolve.fallback,
        path: false,
        fs: false,
        os: false,
      };
    }

    return config;
  },
};

module.exports = nextConfig;

Next Steps