How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Janeodum/tsaro-e2b-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf Janeodum/tsaro-e2b-gguf:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Janeodum/tsaro-e2b-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf Janeodum/tsaro-e2b-gguf:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Janeodum/tsaro-e2b-gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf Janeodum/tsaro-e2b-gguf:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Janeodum/tsaro-e2b-gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Janeodum/tsaro-e2b-gguf:F16
Use Docker
docker model run hf.co/Janeodum/tsaro-e2b-gguf:F16
Quick Links

Tsaro Gemma 4 E2B — GGUF

Quantized GGUF build of Janeodum/tsaro-e2b, for on-device inference via llama.cpp and llama.rn.

What this model does

Tsaro is a shared safety system for Northern Nigeria. This model is its threat extraction component: it takes an unstructured report written in Hausa, Pidgin, or English and returns a structured threat signal — threat type, location, perpetrator and vehicle counts, direction of movement, time references, and a confidence score.

Model details

  • Quantized from: Janeodum/tsaro-e2b
  • Original base model: google/gemma-4-e2b-it
  • Quantization: Q4_K_M
  • Format: GGUF, for llama.cpp / llama.rn
  • Role in Tsaro: the E2B variant is the smaller of two on-device extraction models. It is the fallback for older or low-RAM Android devices — the Tsaro app loads the largest model the hardware can run, falling back from E4B to E2B to a hosted endpoint.

Usage

With llama.cpp:

llama-cli -m tsaro-e2b-q4_k_m.gguf -p "your threat report text here"

In a React Native app via llama.rn, the model file is bundled or downloaded on first run and loaded for offline extraction when the device has no connectivity.

Intended use and limitations

Built for community safety reporting in a specific regional context. Not a general-purpose model. Outputs are extraction assistance, not verified intelligence.

Downloads last month
715
GGUF
Model size
5B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Janeodum/tsaro-e2b-gguf

Quantized
(1)
this model