WebGPU
v1.0.0 / OSS MIT
Built with Hamster

On-device LLMs & AI SDK Provider

Run in the browser, on Node.js, or anywhere JavaScript runs. Native WebGPU compute (WGSL). Zero API keys.

Text, vision, TTS, transcription, tools & skills. Works with generateText, streamText, and structured output.

$ npm install @tryhamster/gerbil
WebGPU Accelerated

Run LLMs on the User's GPU

40-200+ tok/s on the native WebGPU engine — pure WGSL compute, no ONNX runtime. Text, vision, embeddings, speech-to-text & text-to-speech all run natively. Cached in IndexedDB.

Native WebGPU (WGSL)
Runs on iPad Safari 26+
~200MB-1GB on-device models
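
To put the quoted throughput range in perspective, generation time is roughly output tokens divided by tokens per second. The helper below is a back-of-the-envelope sketch: `estimateSeconds` is a hypothetical name, not part of Gerbil, and it ignores model download, load time, and prompt processing.

```typescript
// Hypothetical helper (not part of Gerbil): rough generation time for a reply.
// Ignores model download, load time, and prompt processing.
function estimateSeconds(outputTokens: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return outputTokens / tokensPerSecond;
}

// A 100-token reply at both ends of the quoted 40-200 tok/s range.
const slowestSeconds = estimateSeconds(100, 40); // 2.5
const fastestSeconds = estimateSeconds(100, 200); // 0.5
console.log(`100 tokens: ${fastestSeconds}s to ${slowestSeconds}s`);
```

A short reply therefore lands somewhere between half a second and a few seconds, depending on the device.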
AI SDK Provider

Works with AI SDK

Drop-in provider for generateText, streamText, and structured output. Also works with ai-sdk-tools for agents and state management.

  • Streaming responses
  • Zod schema validation
  • Tool calling
  • Thinking mode (CoT)
AI SDK integration docs
Use Cases

What You Can Build

Micro AI interactions that run locally. No API calls, no latency, no cost per request.

Smart Autocomplete

Context-aware suggestions that understand what users actually want, not just pattern matching

await gerbil.complete(input, { context })

Type 'meeting' → suggests 'Schedule meeting with Sarah about Q4 planning'

Instant Classification

Route tickets, tag content, detect spam, all in real time without server calls

await gerbil.classify(text, categories)

Support ticket → 'billing' (98% confidence)
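
A confidence score like the one above can decide whether a ticket is routed automatically or handed to a person. This is a minimal sketch under assumptions: the `Classification` shape, the `pickQueue` helper, the `"manual-review"` queue, and the 0.8 threshold are all hypothetical, since this page does not show what `gerbil.classify` returns.

```typescript
// Assumed result shape; this page only shows a label with a confidence.
interface Classification {
  label: string;
  confidence: number; // 0..1
}

// Hypothetical helper: trust the label only above a confidence threshold.
function pickQueue(result: Classification, minConfidence: number = 0.8): string {
  return result.confidence >= minConfidence ? result.label : "manual-review";
}

const confidentTicket = pickQueue({ label: "billing", confidence: 0.98 });
const uncertainTicket = pickQueue({ label: "billing", confidence: 0.55 });
console.log(confidentTicket, uncertainTicket);
```

Keeping the threshold as a parameter lets each app trade automation rate against misrouted tickets.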

One-Click Summaries

TL;DR any content on demand. Long emails, docs, articles — instantly digestible

await gerbil.summarize(content)

10-page report → 3 key takeaways in 200ms

Smart Search

Understand queries semantically, not just keywords. Find what users mean

await gerbil.search(query, documents)

'stuff from last week' → finds relevant items
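
Semantic search is commonly built by embedding the query and each document as vectors, then ranking by cosine similarity. How `gerbil.search` works internally is not stated here, so treat this as a sketch of the general technique. `cosineSimilarity`, `rankBySimilarity`, and the toy 2-dimensional vectors are illustrative; a real embedding model would supply the vectors.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  id: string;
  vector: number[];
}

// Rank documents by similarity to the query vector, best match first.
function rankBySimilarity(queryVector: number[], docs: EmbeddedDoc[]): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors stand in for real embeddings.
const rankedDocs = rankBySimilarity([1, 0], [
  { id: "unrelated", vector: [0, 1] },
  { id: "exact", vector: [1, 0] },
  { id: "partial", vector: [1, 1] },
]);
console.log(rankedDocs.map((r) => r.id));
```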

Writing Assistance

Grammar, tone, clarity — help users write better without leaving the input

await gerbil.improve(text, { style })

Suggests clearer phrasing as you type

Smart Defaults

Pre-fill forms intelligently based on context and user patterns

await gerbil.suggest(field, context)

Auto-suggests project name from description

Content Extraction

Pull structured data from unstructured text. Names, dates, entities

await gerbil.extract(text, schema)

'Call John at 3pm' → { person: 'John', time: '3pm' }
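
Structured extraction only helps if the result is checked before use, because a small model can return malformed or incomplete JSON. The sketch below validates the shape from the example above with a hand-written type guard. `CallDetails`, `isCallDetails`, and `parseCallDetails` are hypothetical names; in practice a schema library such as Zod would usually do this job.

```typescript
// Hypothetical target shape, taken from the example on this page.
interface CallDetails {
  person: string;
  time: string;
}

// Type guard: checks that an unknown value has the fields we asked for.
function isCallDetails(value: unknown): value is CallDetails {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["person"] === "string" && typeof record["time"] === "string";
}

// Parse raw model output and reject anything that does not match the shape.
function parseCallDetails(rawModelOutput: string): CallDetails | null {
  try {
    const parsed: unknown = JSON.parse(rawModelOutput);
    return isCallDetails(parsed) ? parsed : null;
  } catch (err) {
    return null; // not valid JSON
  }
}

const validExtraction = parseCallDetails('{"person":"John","time":"3pm"}');
const invalidExtraction = parseCallDetails("Sure! John at 3pm.");
console.log(validExtraction, invalidExtraction);
```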

Sentiment Analysis

Understand tone in real time. Flag angry customers, celebrate happy ones

await gerbil.sentiment(text)

Customer message → 'frustrated' (prioritize)

Explain Anything

Let users highlight any text and get instant explanations, definitions, or context

await gerbil.explain(selection)

Highlight 'WebGPU' → explains in plain English

Image Understanding

Describe photos, analyze screenshots, extract text from images — all locally

await engine.describeImage(img, prompt)

Upload receipt → extracts items, totals, dates

Visual QA

Let users ask questions about images in your app

await engine.describeImage(img, "What is this?")

'What color is the car?' → 'The car is blue'

Alt Text Generation

Auto-generate accessible image descriptions for your content

await captionImage({ image, style })

Photo → 'A sunset over the ocean with orange clouds'

Voice Narration

Read content aloud with natural-sounding voices. On-device TTS with native Kani-TTS-2

await engine.speak(text, { voice: "en_us" })

Blog post → Natural audio narration in real time

Voice Input

Let users speak instead of type. Transcribe locally with native Moonshine

await stt.transcribe(pcm16kMono)

🎤 'Schedule meeting tomorrow' → typed text

Voice Chat

Full voice-to-voice conversations. STT → LLM → TTS, all on-device

useVoiceChat({ llmModel, voice })

Speak question → AI responds with voice
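
The STT → LLM → TTS chain above is three async stages run in sequence. The sketch below shows only that composition, with stub stages so it runs without a model. The `VoiceStages` interface, `voiceTurn`, and the stubs are hypothetical and do not reflect how `useVoiceChat` is implemented.

```typescript
// Hypothetical stage signatures; the real Gerbil APIs may differ.
interface VoiceStages {
  transcribe: (audio: Float32Array) => Promise<string>;
  generate: (prompt: string) => Promise<string>;
  speak: (text: string) => Promise<Float32Array>;
}

interface VoiceTurnResult {
  question: string;
  answer: string;
  audio: Float32Array;
}

// One conversational turn: speech in, text through the LLM, speech out.
async function voiceTurn(stages: VoiceStages, micAudio: Float32Array): Promise<VoiceTurnResult> {
  const question = await stages.transcribe(micAudio);
  const answer = await stages.generate(question);
  const audio = await stages.speak(answer);
  return { question, answer, audio };
}

// Stub stages so the sketch runs without any model loaded.
const stubStages: VoiceStages = {
  transcribe: async () => "what time is it",
  generate: async (prompt) => `You asked: ${prompt}`,
  speak: async (text) => new Float32Array(text.length),
};

voiceTurn(stubStages, new Float32Array(16000)).then((turn) => {
  console.log(turn.question, "->", turn.answer);
});
```

Passing the stages in as functions keeps the pipeline testable and lets each stage be swapped independently.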

// All client-side. No server. No API costs.

Explore all skills
Features

Why Gerbil?

AI that runs where your code runs.

01

Runs Anywhere

Browser. Server. Edge. Same API everywhere JavaScript runs.

02

Feels Instant

40-200 tok/s on WebGPU. Fast enough to feel like magic.

03

Nothing to Manage

No API keys. No model servers. No billing dashboards. No ops.

04

Private by Default

Data never leaves the device. Ship AI in healthcare, finance, anywhere.

05

Downloads Once

100MB-2.5GB models. Cached in IndexedDB. Instant after first load.

06

Production Ready

Vision, tool calling, thinking mode, skills. With one line of code.
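
The "Downloads Once" behavior above is a cache-first load: look in local storage, and hit the network only on a miss. Gerbil's actual caching code is not shown on this page, so the following is a sketch of the pattern. `WeightStore`, `MemoryStore`, and `loadWeights` are hypothetical; in a browser the store could be backed by IndexedDB, and an in-memory `Map` stands in here so the example runs anywhere.

```typescript
// Minimal async key-value store. In a browser this could be backed by IndexedDB.
interface WeightStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStore implements WeightStore {
  private entries = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.entries.get(key);
  }

  async put(key: string, bytes: Uint8Array): Promise<void> {
    this.entries.set(key, bytes);
  }
}

// Cache-first load: download only when the store has no copy.
async function loadWeights(
  store: WeightStore,
  key: string,
  download: () => Promise<Uint8Array>
): Promise<{ bytes: Uint8Array; fromCache: boolean }> {
  const cached = await store.get(key);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download();
  await store.put(key, bytes);
  return { bytes, fromCache: false };
}
```

The first call pays the download cost; later calls with the same key return the stored bytes.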

Usage

Browser & Server

Same native API, different environments. The WebGPU engine runs in the browser and on Node.js (WebGPU via Dawn).
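
In the browser, WebGPU is exposed as `navigator.gpu`, so an app can check for it before loading a model. The helper below is a minimal sketch: `hasWebGPU` and `MaybeGpuHost` are hypothetical names, and the host object is passed in so the check can be exercised outside a browser. A present `navigator.gpu` does not guarantee a usable adapter, since `requestAdapter()` can still resolve to `null`.

```typescript
// Minimal structural type so the sketch compiles without DOM or WebGPU typings.
interface MaybeGpuHost {
  navigator?: { gpu?: unknown };
}

// Hypothetical helper: true when the host exposes a WebGPU entry point.
function hasWebGPU(host: MaybeGpuHost): boolean {
  return host.navigator !== undefined && host.navigator.gpu !== undefined;
}

// Check the current environment (a browser tab, a worker, or Node.js).
const webgpuAvailable = hasWebGPU(globalThis as unknown as MaybeGpuHost);
console.log(webgpuAvailable ? "WebGPU entry point found" : "navigator.gpu not found");
```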

Browser

Chat.tsx
import { useEngine } from "@tryhamster/gerbil/gpu/hooks";

function Chat() {
  // Native WebGPU engine, cached after first download.
  // No model arg → a device-aware default. Or pass { model: "..." }.
  const { complete, completion, isGenerating } = useEngine();

  return (
    <button onClick={() => complete(userInput)} disabled={isGenerating}>
      {completion || "Generate"}
    </button>
  );
}
React hooks docs

Node.js

server.ts
import { WebGPUEngine } from "@tryhamster/gerbil/gpu";

// Same native engine on Node.js (WebGPU via Dawn)
const engine = await WebGPUEngine.create({
  repo: "Qwen/Qwen3.5-0.8B", // thinking-capable, 262k context
});

const result = await engine.generate("Write a haiku", {
  maxTokens: 100,
});

console.log(result.thinking); // reasoning steps
console.log(result.text); // final response
Getting started

Vision AI

Vision.tsx
import { useEngine } from "@tryhamster/gerbil/gpu/hooks";

function Vision() {
  // Qwen3.5's own ViT tower (+~192 MB), bit-exact vs HF.
  const { describeImage, completion } = useEngine({ enableVision: true });

  // image-in -> text-out (URL, File, or decoded RGB pixels)
  return (
    <button onClick={() => describeImage(photoUrl, "What's in this photo?")}>
      {completion || "Describe"}
    </button>
  );
}
Vision docs

Tool Calling

tools.ts
import { defineTool } from "@tryhamster/gerbil";
import { z } from "zod";

const weather = defineTool({
  name: "get_weather",
  description: "Get weather for a city",
  parameters: z.object({
    city: z.string(),
  }),
  execute: async ({ city }) => {
    return `Weather in ${city}: 72°F, sunny`;
  },
});

// LLM can now call this tool during generation
Tools docs
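
When the model decides to call a tool, it emits a tool name plus arguments, and the runtime looks up the matching tool and runs it. The sketch below shows that dispatch step on its own, without Gerbil or Zod. `ToolLike`, `ToolCall`, and `runToolCall` are hypothetical and only illustrate the idea; Gerbil's real dispatch and argument validation are not shown on this page.

```typescript
// Hypothetical, simplified tool shape (no schema validation).
interface ToolLike {
  name: string;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// What a model-issued tool call might look like after parsing.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Find the requested tool and run it; report unknown names instead of throwing.
async function runToolCall(tools: ToolLike[], call: ToolCall): Promise<string> {
  const tool = tools.find((candidate) => candidate.name === call.name);
  if (tool === undefined) {
    return `Unknown tool: ${call.name}`;
  }
  return tool.execute(call.args);
}

// Same canned behavior as the get_weather tool defined above.
const weatherTool: ToolLike = {
  name: "get_weather",
  execute: async (args) => `Weather in ${String(args["city"])}: 72°F, sunny`,
};

runToolCall([weatherTool], { name: "get_weather", args: { city: "Paris" } }).then((toolOutput) => {
  console.log(toolOutput);
});
```

The returned string would then be fed back to the model as the tool result so it can finish its answer.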

Skills

skills.ts
import { commit, summarize, review } from "@tryhamster/gerbil/skills";
import { describeImage, captionImage } from "@tryhamster/gerbil/skills";

// Generate commit message from staged changes
const msg = await commit({ type: "conventional" });

// Summarize any content
const tldr = await summarize({ content: longDoc });

// Vision skills
const alt = await captionImage({ image: photoUrl });
const analysis = await describeImage({
  image: screenshot,
  focus: "text"
});
Skills docs

Text-to-Speech

Speaker.tsx
import { useTTS } from "@tryhamster/gerbil/gpu/hooks";

function Speaker() {
  // Native WebGPU TTS (Kani-TTS-2): pure WGSL, no ONNX.
  // Synthesizes + plays 22.05 kHz audio on-device.
  const { speak, isSynthesizing } = useTTS();

  return (
    <button onClick={() => speak("Hello, I'm Gerbil!")} disabled={isSynthesizing}>
      Speak
    </button>
  );
}
TTS docs

Speech-to-Text

VoiceInput.tsx
import { useSTT } from "@tryhamster/gerbil/gpu/hooks";

function VoiceInput() {
  // Native WebGPU ASR (Moonshine): no ONNX, no log-mel.
  // Captures the mic and resamples to 16 kHz mono for you.
  const { startRecording, stopRecording, isRecording, transcript } = useSTT();

  return (
    <div>
      <button onClick={() => (isRecording ? stopRecording() : startRecording())}>
        {isRecording ? "Stop" : "Record"}
      </button>
      <p>{transcript}</p>
    </div>
  );
}
STT docs
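
The hook resamples microphone audio to 16 kHz mono for you, but audio from other sources (a decoded file, for example) needs the same preparation before transcription. The sketch below shows one simple way to do it: average the channels, then resample by linear interpolation. `downmixToMono` and `resampleLinear` are hypothetical helpers, not Gerbil functions, and the approach is deliberately naive: it applies no low-pass filter, so downsampling can alias.

```typescript
// Average two channels into one. Uses the shorter length if they differ.
function downmixToMono(left: Float32Array, right: Float32Array): Float32Array {
  const sampleCount = Math.min(left.length, right.length);
  const mono = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    mono[i] = (left[i] + right[i]) / 2;
  }
  return mono;
}

// Naive linear-interpolation resampler (no anti-aliasing filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples per output sample
  for (let i = 0; i < outLength; i++) {
    const position = i * step;
    const lower = Math.floor(position);
    const upper = Math.min(lower + 1, input.length - 1);
    const fraction = position - lower;
    output[i] = input[lower] * (1 - fraction) + input[upper] * fraction;
  }
  return output;
}

// One second of 48 kHz stereo silence becomes one second of 16 kHz mono.
const monoAt48k = downmixToMono(new Float32Array(48000), new Float32Array(48000));
const pcm16kMono = resampleLinear(monoAt48k, 48000, 16000);
console.log(pcm16kMono.length);
```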

CLI

Terminal
$ gerbil "Write a haiku about coding"
🤖 Loading lfm2.5-350m...
✓ Model loaded (2.3s)
Silent keystrokes fall
Bugs emerge from tangled code
Coffee saves the day
⚡ 47.2 tok/s | 0.8s
$ gerbil speak "Hello world" --voice bf_emma
$ gerbil transcribe audio.wav --timestamps
$ gerbil voice question.wav # STT → LLM → TTS
CLI docs
Models

Built-in Models

Optimized for browser and Node.js. Small enough to download, powerful enough to impress.

| Model | Type | Size | Best For |
| --- | --- | --- | --- |
| qwen3.5-0.8b | LLM | ~404MB | Text + vision, 262K context, reasoning |
| lfm2.5-350m | LLM | ~199MB | Fast, lightweight text generation |
| gemma-4-e2b | LLM | ~2.5GB (PLE streams, ~1GB resident) | Text-only; larger, desktop-leaning quality (vision/audio towers not yet built) |
| embeddinggemma-300m | EMBED | ~173MB | 768-dim semantic search (asymmetric query/document) |
| kani-tts-2 | TTS | | On-device speech synthesis (coming, not yet runnable) |
| moonshine-base | STT | ~190MB | Raw-waveform English transcription (greedy) |

// Use any HuggingFace repo: await WebGPUEngine.create({ repo: "org/model" })

Get Started

$ npm install @tryhamster/gerbil