Browser Usage

Run LLMs directly in the browser with WebGPU acceleration. No server required.

100-150 tok/s with WebGPU · Models cached in IndexedDB · Fully private, runs locally

React Hooks

The easiest way to use Gerbil in React. Follows the same patterns as the Vercel AI SDK.

useChat

Full chat with message history, thinking mode, and streaming:

Chat.tsx
import { useChat } from "@tryhamster/gerbil/browser";

function Chat() {
  const {
    messages,        // Message[] with id, role, content, thinking?
    input,           // Current input value
    setInput,        // Update input
    handleSubmit,    // Form submit handler
    isLoading,       // Model loading
    loadingProgress, // { status, file?, progress? }
    isGenerating,    // Currently generating
    thinking,        // Current thinking content (streaming)
    stop,            // Stop generation
    clear,           // Clear messages
    tps,             // Tokens per second
    error,           // Error message
  } = useChat({
    model: "qwen3-0.6b",
    thinking: true,
    system: "You are a helpful assistant.",
    maxTokens: 512,
  });

  if (isLoading) {
    return <div>Loading model: {loadingProgress?.progress}%</div>;
  }

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.thinking && (
            <details>
              <summary>Thinking...</summary>
              <pre>{m.thinking}</pre>
            </details>
          )}
          <p><strong>{m.role}:</strong> {m.content}</p>
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          disabled={isGenerating}
          placeholder="Ask anything..."
        />
        <button type="submit" disabled={isGenerating}>
          {isGenerating ? `${tps.toFixed(0)} tok/s` : "Send"}
        </button>
        {isGenerating && <button type="button" onClick={stop}>Stop</button>}
      </form>
    </div>
  );
}

useCompletion

One-off text generation without message history:

Generator.tsx
import { useCompletion } from "@tryhamster/gerbil/browser";

function Generator() {
  const {
    complete,     // Function to generate text
    completion,   // Generated text (streaming)
    thinking,     // Thinking content (if enabled)
    isLoading,    // Model loading
    isGenerating, // Currently generating
    tps,          // Tokens per second
    stop,         // Stop generation
    error,        // Error message
  } = useCompletion({
    model: "qwen3-0.6b",
    thinking: true,
    maxTokens: 256,
  });

  if (isLoading) return <div>Loading...</div>;

  return (
    <div>
      <button
        onClick={() => complete("Write a haiku about coding")}
        disabled={isGenerating}
      >
        Generate
      </button>

      {thinking && (
        <details open>
          <summary>Thinking...</summary>
          <pre>{thinking}</pre>
        </details>
      )}

      <p>{completion}</p>

      {isGenerating && (
        <>
          <span>{tps.toFixed(0)} tok/s</span>
          <button onClick={stop}>Stop</button>
        </>
      )}
    </div>
  );
}

Hook Options

types.ts
interface UseChatOptions {
  model?: string;       // Model ID (default: "qwen3-0.6b")
  autoLoad?: boolean;   // Load model on mount (default: false)
  thinking?: boolean;   // Enable thinking mode (Qwen3)
  system?: string;      // System prompt
  maxTokens?: number;   // Max tokens (default: 256)
  temperature?: number; // Temperature (default: 0.7)
  topP?: number;        // Top-p sampling
  topK?: number;        // Top-k sampling
}

// useCompletion uses the same options

Lazy Loading (Default)

By default, models load on the first generation, not on page load. This prevents surprise downloads:

loading.tsx
// Default: model loads when user first submits
const { handleSubmit, isLoading } = useChat();

// Preload on mount (downloads immediately)
const { handleSubmit, isLoading } = useChat({ autoLoad: true });

// Manual control with load()
const { load, isLoading, isReady } = useChat();

return (
  <button onClick={load} disabled={isLoading || isReady}>
    {isLoading ? "Loading..." : isReady ? "Ready" : "Load Model"}
  </button>
);

Loading Progress States

The loadingProgress object tells you exactly what's happening during model load:

loading-states.ts
// loadingProgress.status values:
"downloading" // Fetching from network (first time)
              // Has: file, progress (0-100)
"loading"     // Loading from IndexedDB cache (fast)
              // No additional properties
"ready"       // Model ready for inference
              // No additional properties
"error"       // Load failed
              // Has: error message

// Example usage:
if (loadingProgress?.status === "downloading") {
  return <div>Downloading {loadingProgress.file}: {loadingProgress.progress}%</div>;
}
if (loadingProgress?.status === "loading") {
  return <div>Loading from cache...</div>;
}

Message Type

types.ts
interface Message {
  id: string;                 // Unique ID (e.g., "msg-1")
  role: "user" | "assistant"; // Message role
  content: string;            // The message content
  thinking?: string;          // Thinking content (if enabled)
}

Low-Level API

For non-React apps or custom implementations, use createGerbilWorker directly:

vanilla.ts
import { createGerbilWorker, isWebGPUSupported } from "@tryhamster/gerbil/browser";

// Check WebGPU support
if (!isWebGPUSupported()) {
  throw new Error("WebGPU not supported - use Chrome/Edge 113+");
}

// Stream output into the page (assumes an element like <pre id="output"> exists)
const output = document.getElementById("output")!;

// Create worker (loads model automatically)
const gerbil = await createGerbilWorker({
  modelId: "qwen3-0.6b",
  onProgress: (p) => {
    if (p.status === "downloading") {
      console.log(`Downloading ${p.file}: ${p.progress}%`);
    } else if (p.status === "loading") {
      console.log("Loading from cache...");
    }
  },
  onToken: (token) => {
    // token.text  - the token text
    // token.state - "thinking" or "answering"
    // token.tps   - tokens per second
    output.append(token.text);
  },
  onComplete: (result) => {
    console.log(`Done: ${result.tps.toFixed(1)} tok/s`);
  },
});

// Generate
await gerbil.generate("Write a haiku", { thinking: true });

// Interrupt
gerbil.interrupt();

// Reset conversation
gerbil.reset();

// Clean up
gerbil.terminate();

Utilities

Helper functions to check compatibility, optimize for device capabilities, and debug issues.

imports.ts
import {
  // Basic checks
  isWebGPUSupported,       // Quick boolean check
  getWebGPUInfo,           // GPU adapter info

  // Production-ready checks
  checkWebGPUReady,        // Full WebGPU verification
  getRecommendedModels,    // Memory-aware model selection
  checkStorageQuota,       // Verify disk space
  checkWebGPUCapabilities, // GPU buffer limits

  // Debugging
  getBrowserDiagnostics,   // Full diagnostic report
} from "@tryhamster/gerbil/browser";

isWebGPUSupported()

Check if the browser supports WebGPU:

check.ts
import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

if (!isWebGPUSupported()) {
  // Show fallback UI or error message
  alert("Please use Chrome or Edge 113+ for WebGPU support");
}

getWebGPUInfo()

Get GPU adapter information for debugging:

info.ts
import { getWebGPUInfo } from "@tryhamster/gerbil/browser";

const info = await getWebGPUInfo();
console.log(info);
// { supported: true, adapter: "Apple", device: "Apple M4 Max" }

checkWebGPUReady()

Full WebGPU verification — checks not just if the API exists, but if it actually works:

check-ready.ts
import { checkWebGPUReady } from "@tryhamster/gerbil/browser";

const result = await checkWebGPUReady();
// {
//   ok: true,
//   webgpu: true,
//   adapter: { vendor: "apple", architecture: "common-3", device: "", description: "" },
//   reason: "WebGPU is ready"
// }

if (!result.ok) {
  console.warn(result.reason); // Human-readable explanation
  // Falls back to WASM automatically
}

getRecommendedModels()

Memory-aware model selection based on navigator.deviceMemory:

recommended.ts
import { getRecommendedModels } from "@tryhamster/gerbil/browser";

const models = getRecommendedModels();
// {
//   chat: "qwen3-0.6b",      // or "smollm2-360m" on low-memory devices
//   vision: "ministral-3b",  // or null if not enough memory
//   tts: "kokoro-82m",
//   stt: "whisper-base.en",  // or "whisper-tiny.en" on low-memory
//   embedding: "Xenova/all-MiniLM-L6-v2",
//   reason: "8GB+ detected, using full models"
// }

// Use for smart defaults
const { messages } = useChat({ model: models.chat });

Mobile: On mobile devices, Gerbil automatically uses q4 quantization (CPU-optimized) instead of q4f16 (GPU-optimized) for better compatibility and performance.

checkStorageQuota()

Verify available storage before downloading a large model:

storage.ts
import { checkStorageQuota } from "@tryhamster/gerbil/browser";

// Check if we have 700MB available (for qwen3-0.6b)
const storage = await checkStorageQuota(700);
// {
//   ok: true,
//   available: 4500, // MB available
//   required: 700,   // MB requested
//   message: "4.5GB available, 700MB required"
// }

if (!storage.ok) {
  alert(storage.message); // "Only 200MB available, need 700MB"
  return;
}

checkWebGPUCapabilities()

Check if the GPU can run a specific model (buffer size limits, etc.):

capabilities.ts
import { checkWebGPUCapabilities } from "@tryhamster/gerbil/browser";

const caps = await checkWebGPUCapabilities("qwen3-0.6b");
// {
//   canRunModel: true,
//   maxBufferSize: 2147483648,     // 2GB
//   requiredBufferSize: 500000000, // ~500MB for qwen3
//   reason: "GPU buffer size sufficient"
// }

if (!caps.canRunModel) {
  console.warn(caps.reason);
  // Use smaller model or fall back to WASM
}

getBrowserDiagnostics()

Comprehensive diagnostic info for debugging compatibility issues:

diagnostics.ts
import { getBrowserDiagnostics } from "@tryhamster/gerbil/browser";

const diag = await getBrowserDiagnostics();
// {
//   browser: "Chrome",
//   version: "120.0.0",
//   platform: "macOS",
//   mobile: false,
//   webgpu: {
//     supported: true,
//     adapter: { vendor: "apple", ... }
//   },
//   memory: {
//     deviceMemory: 8,   // GB (from navigator.deviceMemory)
//     jsHeapLimit: 4096  // MB
//   },
//   storage: {
//     available: 4500,   // MB
//     persistent: true
//   }
// }

// Useful for error reporting
console.log("Diagnostics:", JSON.stringify(diag, null, 2));

Model Preloading

Download models ahead of time during app initialization, so users don't wait when they first use AI. These functions work outside React hooks — perfect for app startup.

preload.ts
import {
  preloadChatModel,
  preloadEmbeddingModel,
  preloadTTSModel,
  preloadSTTModel,
} from "@tryhamster/gerbil/browser";

// During app initialization (before React mounts)
async function initApp() {
  // Preload LLM with progress tracking
  await preloadChatModel("qwen3-0.6b", {
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      }
    },
  });

  // Preload other models (all run in parallel)
  await Promise.all([
    preloadEmbeddingModel(),            // default: Xenova/all-MiniLM-L6-v2
    preloadTTSModel("kokoro-82m"),      // or "supertonic-66m"
    preloadSTTModel("whisper-tiny.en"),
  ]);

  console.log("All models ready!");
}

// Call during app startup
initApp();

Preload Functions

Function                               | Default Model           | Description
preloadChatModel(modelId, opts?)       | -                       | Preload LLM to IndexedDB
preloadEmbeddingModel(modelId?, opts?) | Xenova/all-MiniLM-L6-v2 | Preload embedding model
preloadTTSModel(modelId?, opts?)       | kokoro-82m              | Preload text-to-speech model
preloadSTTModel(modelId?, opts?)       | whisper-tiny.en         | Preload speech-to-text model

Preload Options

types.ts
interface PreloadOptions {
  // Track download progress
  onProgress?: (p: PreloadProgress) => void;

  // Keep model loaded in memory after preload (default: false)
  // false = download, then dispose to free RAM
  // true  = download and keep in memory for instant use
  keepLoaded?: boolean;
}

type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;     // Current file being downloaded
  progress?: number; // 0-100 percentage
  message?: string;  // Status message
};

keepLoaded Option

Control whether the model stays in memory after preloading:

Value | Behavior                         | Use Case
false | Download → Dispose → Free memory | Preload for later, save RAM
true  | Download → Keep in memory        | Instant use, no disk I/O delay
keep-loaded.ts
import { preloadChatModel } from "@tryhamster/gerbil/browser";

// Download only - frees RAM after preload (~400MB saved)
await preloadChatModel("qwen3-0.6b");
// Later: loads from IndexedDB cache (~1-2s)

// Keep in memory - uses RAM but instant inference
await preloadChatModel("qwen3-0.6b", { keepLoaded: true });
// Later: model already loaded, no wait

Browser Models

Models optimized for browser use. Automatically cached in IndexedDB after first download.

Model        | Size   | Speed         | Best For
qwen3-0.6b   | ~400MB | 100-150 tok/s | General use, thinking mode, reasoning
smollm2-360m | ~250MB | 150-200 tok/s | Faster responses, good quality
smollm2-135m | ~100MB | 200-300 tok/s | Fastest, basic tasks
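
If you want to choose one of these models at runtime, one option is to key off available storage. A minimal sketch (not a library helper; pickBrowserModel is a hypothetical name), using checkStorageQuota from the utilities above and the approximate sizes from this table:

pick-model.ts
import { checkStorageQuota } from "@tryhamster/gerbil/browser";

// Choose the largest model whose download fits the available storage.
// Sizes are the rough figures from the table above, in MB.
async function pickBrowserModel(): Promise<string> {
  if ((await checkStorageQuota(400)).ok) return "qwen3-0.6b";
  if ((await checkStorageQuota(250)).ok) return "smollm2-360m";
  return "smollm2-135m";
}

Resolve this before rendering and pass the result to useChat's model option, or simply use getRecommendedModels, which also accounts for device memory.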

Browser Support

Browser       | Version | Status
Chrome / Edge | 113+    | ✓ Full support
Safari        | 18+     | ⚠ May have quirks
Firefox       | -       | ✗ Behind flag, not recommended

iOS Memory Guards

iOS Safari and iOS Chrome have strict memory limits (~300-400MB effective for WKWebView). All React hooks automatically protect against iOS crashes — no code changes required.

Automatic Protection: All hooks block large models on iOS, detect crashes, and use chunked resumable downloads.

What Happens Automatically

Hook          | iOS Guard      | Crash Detect | Chunked DL
useChat       | ✓ Blocks large | ✓            | ✓ Resumable
useCompletion | ✓ Blocks large | ✓            | ✓ Resumable
useSpeech     | -              | -            | ✓ Resumable
useVoiceInput | -              | -            | ✓ Resumable
useEmbedding  | -              | -            | ✓ Resumable

iOS Compatibility Matrix

Model        | Size   | iOS Safe  | Notes
smollm2-135m | ~150MB | ✓ Yes     | Best for iOS
smollm2-360m | ~400MB | ✓ Yes     | Recommended for iOS
qwen3-0.6b   | ~700MB | ⚠ Risky   | Only on iPhone 14+/iPad Pro
qwen3-1.7b   | ~1.8GB | ✗ Blocked | Desktop only
kokoro-82m   | ~350MB | ✓ Yes     | TTS
whisper-tiny | ~150MB | ✓ Yes     | STT

Manual Utilities (Advanced)

For custom implementations or advanced control, these utilities are available:

ios-utilities.ts
import {
  isModelSafeForDevice,    // Check if model is safe for current device
  detectMemoryCrash,       // Check if previous session crashed
  setDownloadPhase,        // Track download phase for crash detection
  clearDownloadPhase,      // Clear phase on success
  downloadModelChunked,    // Resumable chunked downloads
  hasIncompleteDownload,   // Check for interrupted downloads
  clearIncompleteDownload, // Clear partial download
} from "@tryhamster/gerbil/browser";

// Check model safety before loading
const check = isModelSafeForDevice("qwen3-1.7b");
if (!check.safe) {
  console.log(check.reason);         // "Model is too large for iOS..."
  console.log(check.recommendation); // "Use smollm2-360m or qwen3-0.6b"
  console.log(check.maxSafeModel);   // "qwen3-0.6b"
}

// Detect if page crashed during previous model load
const crash = detectMemoryCrash();
if (crash.crashed) {
  console.log(crash.recommendation); // "The model was too large..."
  console.log(crash.phase);          // "downloading" | "initializing"
  console.log(crash.modelId);        // which model caused it
}
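
As a usage sketch (the hooks already apply this guard automatically; pickSafeModel and IOSAwareChat are hypothetical names), the safety check can feed a hook's model option, falling back to maxSafeModel on constrained devices:

ios-safe-chat.tsx
import { isModelSafeForDevice, useChat } from "@tryhamster/gerbil/browser";

// Prefer a model, but drop to the largest device-safe one if the check fails.
function pickSafeModel(preferred = "qwen3-0.6b"): string {
  const check = isModelSafeForDevice(preferred);
  return check.safe ? preferred : check.maxSafeModel;
}

function IOSAwareChat() {
  const { messages, input, setInput, handleSubmit } = useChat({
    model: pickSafeModel(),
  });
  // ...render messages and the form as in the Chat example above
  return null;
}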

Chunked Resumable Downloads

Model downloads automatically use chunked downloading with resume support. If a download is interrupted (page refresh, crash, network error), it resumes from where it left off. For manual control:

chunked.ts
import {
  downloadModelChunked,
  hasIncompleteDownload,
  clearIncompleteDownload,
} from "@tryhamster/gerbil/browser";

// Check for interrupted downloads
const incomplete = await hasIncompleteDownload("qwen3-0.6b");
if (incomplete.incomplete) {
  console.log(`Resuming: ${incomplete.percent}% complete`);
}

// Download with progress and resume support
const abortController = new AbortController();
const buffer = await downloadModelChunked(
  "https://huggingface.co/...",
  "qwen3-0.6b",
  {
    onProgress: (info) => {
      console.log(`${info.phase}: ${info.percent}%`);
    },
    signal: abortController.signal,
  }
);

Feature             | Description
HTTP Range requests | Downloads in 1.5MB chunks using Range: bytes=start-end
IndexedDB storage   | Each chunk stored separately to avoid large transaction spikes
Automatic resume    | Tracks completed chunks in manifest, resumes from last position
ETag validation     | Clears cached chunks if model version changes
Abort support       | Cancel downloads gracefully with AbortController
Fallback            | Falls back to regular download if server doesn't support Range
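
For reference, the Range-request pattern in the table looks roughly like this. It is an illustration of the general technique, not Gerbil's internal code; fetchInChunks and the chunk-size constant are just for the sketch:

range-sketch.ts
// Illustration of the chunked Range-request pattern (not Gerbil's implementation).
const CHUNK_SIZE = 1.5 * 1024 * 1024; // 1.5MB, matching the chunk size noted above

async function fetchInChunks(url: string, totalBytes: number, signal?: AbortSignal) {
  const chunks: ArrayBuffer[] = [];
  for (let start = 0; start < totalBytes; start += CHUNK_SIZE) {
    const end = Math.min(start + CHUNK_SIZE, totalBytes) - 1;
    const res = await fetch(url, {
      headers: { Range: `bytes=${start}-${end}` }, // one HTTP Range request per chunk
      signal,                                      // allows AbortController cancellation
    });
    if (res.status !== 206) throw new Error("Server does not support Range requests");
    chunks.push(await res.arrayBuffer());
    // A real implementation persists each chunk (e.g. to IndexedDB) and records it in a
    // manifest keyed by ETag, so an interrupted download can resume from this point.
  }
  return new Blob(chunks).arrayBuffer(); // reassemble into a single buffer
}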

Troubleshooting

"WebGPU not supported"

  • Update to Chrome/Edge 113+
  • Check chrome://gpu for WebGPU status
  • Try enabling chrome://flags/#enable-unsafe-webgpu

Slow first load

First load downloads the model (~400MB for qwen3-0.6b) and compiles WebGPU shaders. Subsequent loads use IndexedDB cache and are much faster (~2-5s).
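
One way to soften that first load is to start the download while the browser is idle, using the preload API shown earlier (a sketch; adjust the model ID and timing to your app):

idle-preload.ts
import { preloadChatModel } from "@tryhamster/gerbil/browser";

// Kick off the download before the user opens the chat UI.
if ("requestIdleCallback" in window) {
  requestIdleCallback(() => {
    preloadChatModel("qwen3-0.6b").catch(console.error);
  });
} else {
  // Safari has no requestIdleCallback; a timeout is a rough substitute.
  setTimeout(() => preloadChatModel("qwen3-0.6b").catch(console.error), 2000);
}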

Out of memory

Smaller models like smollm2-135m use less GPU memory. Close other GPU-intensive tabs.

CORS / Header issues

Your server needs these headers for SharedArrayBuffer (required for threading):

Terminal
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
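
The Next.js setup is shown below; if you serve the app with Vite instead, a minimal sketch using Vite's server.headers and preview.headers options looks like this:

vite.config.ts
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
  preview: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
});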

Next.js Configuration

Add the required headers and webpack config for Next.js:

next.config.js
// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        ],
      },
    ];
  },
  webpack: (config, { isServer }) => {
    config.experiments = {
      ...config.experiments,
      asyncWebAssembly: true,
    };

    if (isServer) {
      config.externals.push("@huggingface/transformers");
    } else {
      // Exclude Node.js polyfills from browser bundle
      config.resolve.alias = {
        ...config.resolve.alias,
        webgpu: false,
      };
      config.resolve.fallback = {
        ...config.resolve.fallback,
        path: false,
        fs: false,
        os: false,
      };
    }

    return config;
  },
};

module.exports = nextConfig;

Next Steps