Models
Built-in models and how to use any HuggingFace model.
Built-in Models
These models are curated and tested to work great with Gerbil:
| Model | Type | Size | Best For |
|---|---|---|---|
| ministral-3b | LLM | ~2.5GB | Vision + reasoning |
| qwen3-0.6b | LLM | ~400MB | General use, reasoning |
| qwen2.5-0.5b | LLM | ~350MB | General use |
| qwen2.5-coder-0.5b | LLM | ~400MB | Code generation |
| smollm2-360m | LLM | ~250MB | Fast completions |
| smollm2-135m | LLM | ~100MB | Ultra-fast, tiny |
| smollm2-1.7b | LLM | ~1.2GB | Higher quality |
| phi-3-mini | LLM | ~2.1GB | High quality |
| llama-3.2-1b | LLM | ~800MB | General use |
| gemma-2b | LLM | ~1.4GB | Balanced |
| tinyllama-1.1b | LLM | ~700MB | Lightweight |
| **Text-to-Speech** | | | |
| kokoro-82m | TTS | ~330MB | 28 voices, 24kHz, US/UK English |
| supertonic-66m | TTS | ~250MB | 4 voices, 44.1kHz, fastest |
| **Speech-to-Text** | | | |
| whisper-tiny.en | STT | ~39MB | Fastest transcription |
| whisper-base.en | STT | ~74MB | Balanced speed/accuracy |
| whisper-small.en | STT | ~244MB | High quality |
| whisper-large-v3-turbo | STT | ~809MB | Best quality, 80+ languages |
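All of these load by name. A minimal end-to-end sketch, assuming g is an already-created Gerbil instance and that generate accepts a bare prompt without an options object:

```js
// Load a built-in model by name, then generate text.
// Assumes `g` is an existing Gerbil instance from your app setup,
// and that generate() works without an options object.
await g.loadModel("qwen2.5-0.5b"); // downloads (~350MB) on first use
const result = await g.generate("Explain quantization in one sentence.");
console.log(result.text);
```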
Choosing a Model
For general use
qwen3-0.6b is the best all-rounder: good quality, reasonable speed, and support for thinking mode.
await g.loadModel("qwen3-0.6b");For speed
smollm2-135m is the fastest option. Great for simple tasks where speed matters.
await g.loadModel("smollm2-135m");For code
qwen2.5-coder-0.5b is optimized for code generation and understanding.
await g.loadModel("qwen2.5-coder-0.5b");For reasoning
qwen3-0.6b with thinking mode shows step-by-step reasoning:
await g.loadModel("qwen3-0.6b");const result = await g.generate("What is 17 * 23?", { thinking: true });console.log(result.thinking); // Shows reasoning stepsFor vision
ministral-3b understands images and supports reasoning, with a 256K context window.
await g.loadModel("ministral-3b");const result = await g.generate("Describe this image", { images: [{ source: "https://example.com/photo.jpg" }]});console.log(result.text); // Image descriptionSee the Vision documentation for supported image formats and more examples.
Using HuggingFace Models
Load any compatible model from HuggingFace using the hf: prefix:
```js
// Short syntax
await g.loadModel("hf:microsoft/Phi-3-mini-4k-instruct-onnx");
await g.loadModel("hf:Qwen/Qwen2.5-0.5B-Instruct");

// Full URL also works
await g.loadModel("https://huggingface.co/microsoft/Phi-3-mini");
```

Note: Not all HuggingFace models are compatible. Look for models in ONNX format or that work with transformers.js.
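The hf: prefix should also compose with the load options covered below; treat this as an assumption and verify against your version:

```js
// Sketch: a HuggingFace model loaded with explicit options.
// Assumes hf: models accept the same options as built-in ones
// (device, dtype, onProgress; see Load Options below).
await g.loadModel("hf:Qwen/Qwen2.5-0.5B-Instruct", {
  device: "auto",
  dtype: "q4",
  onProgress: (info) => console.log(info.status, info.progress),
});
```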
Local Models
Load models from your local filesystem:
```js
// Relative path
await g.loadModel("file:./models/my-fine-tune");

// Absolute path
await g.loadModel("file:/home/user/models/custom");
```

Load Options
Customize how models are loaded:
await g.loadModel("qwen3-0.6b", { // Device selection device: "auto", // "auto" | "gpu" | "cpu" | "webgpu" // Quantization level dtype: "q4", // "q4" | "q8" | "fp16" | "fp32" // Progress callback onProgress: (info) => { console.log(info.status, info.progress); },});Quantization
Quantization

| Level | Size | Quality | Speed |
|---|---|---|---|
| q4 | Smallest | Good | Fastest |
| q8 | Medium | Better | Fast |
| fp16 | Large | Best | Slower |
| fp32 | Largest | Best | Slowest |
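In practice, choose q4 when memory or latency is the constraint and fp16 when output quality matters most. A sketch using the dtype option from Load Options above:

```js
// Smallest download and fastest inference (a good default)
await g.loadModel("qwen3-0.6b", { dtype: "q4" });

// Or trade size and speed for output quality
await g.loadModel("qwen3-0.6b", { dtype: "fp16" });
```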
Model Caching
Models are cached locally after first download. Default location:
```
~/.gerbil/models/
```

Manage cached models with the CLI:
```bash
# List cached models
gerbil models --installed

# Remove a model
gerbil rm smollm2-135m

# Clear all cached models
gerbil cache clear
```