Models

Name: Gerbil
Author: Gerbil

Built-in models and how to load compatible HuggingFace models.

Built-in Models

These models are curated and tested to work well with Gerbil:

Model	Type	Size	Best For
`ministral-3b`	LLM	~2.5GB	Vision + reasoning
`qwen3-0.6b`	LLM	~400MB	General use, reasoning
`qwen2.5-0.5b`	LLM	~350MB	General use
`qwen2.5-coder-0.5b`	LLM	~400MB	Code generation
`smollm2-360m`	LLM	~250MB	Fast completions
`smollm2-135m`	LLM	~100MB	Ultra-fast, tiny
`smollm2-1.7b`	LLM	~1.2GB	Higher quality
`phi-3-mini`	LLM	~2.1GB	High quality
`llama-3.2-1b`	LLM	~800MB	General use
`gemma-2b`	LLM	~1.4GB	Balanced
`tinyllama-1.1b`	LLM	~700MB	Lightweight
Text-to-Speech
`kokoro-82m`	TTS	~330MB	28 voices, 24kHz, US/UK English
`supertonic-66m`	TTS	~250MB	4 voices, 44.1kHz, fastest
Speech-to-Text
`whisper-tiny.en`	STT	~39MB	Fastest transcription
`whisper-base.en`	STT	~74MB	Balanced speed/accuracy
`whisper-small.en`	STT	~244MB	High quality
`whisper-large-v3-turbo`	STT	~809MB	Best quality, 80+ langs
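
To make the table easier to use from code, the following sketch encodes the LLM rows as data and picks the largest model that fits a download budget. `BUILT_IN_LLMS` and `pickLargestWithin` are hypothetical names used for illustration and are not part of Gerbil. The sizes are the approximate figures from the table, and size is only a rough proxy for quality.

```typescript
// Approximate download sizes in MB, copied from the LLM rows of the table above.
// This data and the helper below are illustrative, not part of Gerbil's API.
interface ModelInfo {
  id: string;
  sizeMB: number;
  bestFor: string;
}

const BUILT_IN_LLMS: ModelInfo[] = [
  { id: "ministral-3b", sizeMB: 2500, bestFor: "Vision + reasoning" },
  { id: "qwen3-0.6b", sizeMB: 400, bestFor: "General use, reasoning" },
  { id: "qwen2.5-0.5b", sizeMB: 350, bestFor: "General use" },
  { id: "qwen2.5-coder-0.5b", sizeMB: 400, bestFor: "Code generation" },
  { id: "smollm2-360m", sizeMB: 250, bestFor: "Fast completions" },
  { id: "smollm2-135m", sizeMB: 100, bestFor: "Ultra-fast, tiny" },
  { id: "smollm2-1.7b", sizeMB: 1200, bestFor: "Higher quality" },
  { id: "phi-3-mini", sizeMB: 2100, bestFor: "High quality" },
  { id: "llama-3.2-1b", sizeMB: 800, bestFor: "General use" },
  { id: "gemma-2b", sizeMB: 1400, bestFor: "Balanced" },
  { id: "tinyllama-1.1b", sizeMB: 700, bestFor: "Lightweight" },
];

// Return the largest model whose approximate size fits the budget,
// or undefined when nothing fits. filter() copies, so the input is not mutated.
function pickLargestWithin(
  budgetMB: number,
  models: ModelInfo[] = BUILT_IN_LLMS
): ModelInfo | undefined {
  return models
    .filter((m) => m.sizeMB <= budgetMB)
    .sort((a, b) => b.sizeMB - a.sizeMB)[0];
}
```

The returned `id` can be passed to `loadModel` as in the examples in this document.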

Choosing a Model

For general use

qwen3-0.6b is the best all-rounder. It offers good quality and reasonable speed, and it supports thinking mode.

await g.loadModel("qwen3-0.6b");

For speed

smollm2-135m is the fastest option and suits simple tasks where speed matters.

await g.loadModel("smollm2-135m");

For code

qwen2.5-coder-0.5b is optimized for code generation and understanding.

await g.loadModel("qwen2.5-coder-0.5b");

For reasoning

qwen3-0.6b with thinking mode shows step-by-step reasoning:

await g.loadModel("qwen3-0.6b");
const result = await g.generate("What is 17 * 23?", { thinking: true });
console.log(result.thinking); // Shows reasoning steps
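
When displaying a result, it can help to keep the reasoning separate from the final answer. The sketch below assumes only the two fields that appear in this document, `result.thinking` and `result.text`. The `GenerateResultLike` type and `formatResult` helper are hypothetical, and treating `thinking` as optional (absent when thinking mode is off) is an assumption.

```typescript
// Minimal result shape assumed for illustration. Only `text` and `thinking`
// appear in this document; the optionality of `thinking` is an assumption.
interface GenerateResultLike {
  text: string;
  thinking?: string;
}

// Render the reasoning (when present and non-empty) above the final answer.
function formatResult(result: GenerateResultLike): string {
  const parts: string[] = [];
  if (result.thinking && result.thinking.trim().length > 0) {
    parts.push(`Reasoning:\n${result.thinking.trim()}`);
  }
  parts.push(`Answer:\n${result.text.trim()}`);
  return parts.join("\n\n");
}
```

A call such as `console.log(formatResult(result))` would then print both parts when reasoning is available and only the answer otherwise.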

For vision

ministral-3b understands images and supports reasoning. It has a 256K context window.

await g.loadModel("ministral-3b");
const result = await g.generate("Describe this image", {
  images: [{ source: "https://example.com/photo.jpg" }]
});
console.log(result.text); // Image description

See the Vision documentation for supported image formats and more examples.

Using HuggingFace Models

Load any compatible model from HuggingFace using the hf: prefix:

huggingface.ts

// Short syntax
await g.loadModel("hf:microsoft/Phi-3-mini-4k-instruct-onnx");
await g.loadModel("hf:Qwen/Qwen2.5-0.5B-Instruct");

// Full URL also works
await g.loadModel("https://huggingface.co/microsoft/Phi-3-mini");

Note: Not all HuggingFace models are compatible. Look for models published in ONNX format or known to work with transformers.js.

Local Models

Load models from your local filesystem:

local.ts

// Relative path
await g.loadModel("file:./models/my-fine-tune");

// Absolute path
await g.loadModel("file:/home/user/models/custom");
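
The identifier forms shown so far (built-in name, `hf:` prefix, full URL, `file:` prefix) can be told apart with simple prefix checks. The sketch below is only an illustration of those forms. `classifyModelId` and `ModelSource` are hypothetical names, and this is not a description of how Gerbil resolves identifiers internally.

```typescript
// The identifier forms that appear in this document, as a tagged union.
// Hypothetical types for illustration; not part of Gerbil's API.
type ModelSource =
  | { kind: "builtin"; name: string }
  | { kind: "huggingface"; repo: string }
  | { kind: "url"; url: string }
  | { kind: "file"; path: string };

// Classify an identifier by its prefix. Anything without a recognized
// prefix is treated as a built-in model name.
function classifyModelId(id: string): ModelSource {
  if (id.startsWith("hf:")) {
    return { kind: "huggingface", repo: id.slice("hf:".length) };
  }
  if (id.startsWith("file:")) {
    return { kind: "file", path: id.slice("file:".length) };
  }
  if (id.startsWith("https://") || id.startsWith("http://")) {
    return { kind: "url", url: id };
  }
  return { kind: "builtin", name: id };
}
```

A check like this could be used to validate user input before calling `loadModel`, for example to reject `file:` paths in an environment with no filesystem access.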

Load Options

Customize how models are loaded:

options.ts

await g.loadModel("qwen3-0.6b", {
  // Device selection
  device: "auto",      // "auto" | "gpu" | "cpu" | "webgpu"
  
  // Quantization level
  dtype: "q4",         // "q4" | "q8" | "fp16" | "fp32"
  
  // Progress callback
  onProgress: (info) => {
    console.log(info.status, info.progress);
  },
});
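
The progress callback above logs raw values. As a sketch, the helper below turns them into a single status line. It assumes `info.status` is a string and `info.progress` is a percentage from 0 to 100 that may be missing for some events. Neither the units nor the optionality are confirmed by this document, and `ProgressInfo` and `formatProgress` are hypothetical names.

```typescript
// Assumed callback payload: `status` and `progress` are the two fields used
// in the example above. Treating `progress` as an optional 0-100 percentage
// is an assumption, not documented behavior.
interface ProgressInfo {
  status: string;
  progress?: number;
}

// Build a one-line progress message, clamping the percentage to [0, 100].
function formatProgress(info: ProgressInfo): string {
  if (typeof info.progress !== "number" || Number.isNaN(info.progress)) {
    return info.status;
  }
  const pct = Math.min(100, Math.max(0, info.progress));
  return `${info.status} ${pct.toFixed(0)}%`;
}
```

It could be wired in as `onProgress: (info) => console.log(formatProgress(info))`.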

Quantization

Level	Size	Quality	Speed
`q4`	Smallest	Good	Fastest
`q8`	Medium	Better	Fast
`fp16`	Large	Best	Slower
`fp32`	Largest	Best	Slowest
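
The size ordering in the table follows from the nominal bits stored per weight. The sketch below does that arithmetic for a given parameter count. `estimateWeightsMB` is a hypothetical helper, and the result is a lower-bound estimate: real model files add quantization metadata and may keep some layers at higher precision, so published sizes will differ.

```typescript
// Nominal bits per weight for each dtype named in the table above.
const BITS_PER_WEIGHT = { q4: 4, q8: 8, fp16: 16, fp32: 32 } as const;
type Dtype = keyof typeof BITS_PER_WEIGHT;

// Rough size of the weights alone, in MB (1 MB = 1e6 bytes).
// Ignores file-format overhead and mixed-precision layers.
function estimateWeightsMB(paramCount: number, dtype: Dtype): number {
  const bytes = (paramCount * BITS_PER_WEIGHT[dtype]) / 8;
  return bytes / 1e6;
}
```

For a model with 600 million parameters, this gives 300 MB at `q4`, 600 MB at `q8`, 1200 MB at `fp16`, and 2400 MB at `fp32`, so each step up roughly doubles the size.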

Model Caching

Models are cached locally after the first download. The default location is:

Terminal

~/.gerbil/models/
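
The `~` in that path is shell shorthand for the home directory, and Node's filesystem APIs do not expand it. If a script needs to inspect the cache directory, the path has to be expanded first. The sketch below shows one way to do that. `expandHome` is a hypothetical helper, and the home directory is passed in explicitly so the function stays pure.

```typescript
// Expand a leading "~" or "~/" using the given home directory.
// Paths that do not start with "~" are returned unchanged.
function expandHome(p: string, homeDir: string): string {
  if (p === "~") {
    return homeDir;
  }
  if (p.startsWith("~/")) {
    // Drop trailing slashes from homeDir so the result has a single separator.
    return homeDir.replace(/\/+$/, "") + p.slice(1);
  }
  return p;
}
```

In Node, the second argument would typically come from `os.homedir()`.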

Manage cached models with the CLI:

Terminal

# List cached models
gerbil models --installed

# Remove a model
gerbil rm smollm2-135m

# Clear all cached models
gerbil cache clear