Use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

# Filename matches the Q4_K_M quantization shipped in this repo.
llm = Llama.from_pretrained(
	repo_id="Janeodum/tsaro-e2b-gguf",
	filename="tsaro-e2b-q4_k_m.gguf",
)
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])

Tsaro Gemma 4 E2B — GGUF

Quantized GGUF build of Janeodum/tsaro-e2b, for on-device inference via llama.cpp and llama.rn.

What this model does

Tsaro is a shared safety system for Northern Nigeria. This model is its threat extraction component: it takes an unstructured report written in Hausa, Pidgin, or English and returns a structured threat signal — threat type, location, perpetrator and vehicle counts, direction of movement, time references, and a confidence score.
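The structured signal can be thought of as a small schema. A minimal sketch in Python (the field names and the example values are assumptions inferred from the description above, not the model's documented output contract):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ThreatSignal:
    # Field names are illustrative, inferred from the model description.
    threat_type: str                    # e.g. "kidnapping", "armed_robbery"
    location: str                       # free-text place name from the report
    perpetrator_count: Optional[int]    # number of attackers, if mentioned
    vehicle_count: Optional[int]        # number of vehicles, if mentioned
    direction: Optional[str]            # direction of movement
    time_reference: Optional[str]       # e.g. "last night", "this morning"
    confidence: float                   # extraction confidence, 0.0-1.0

    def is_valid(self) -> bool:
        """Basic sanity check before a signal is acted on."""
        return bool(self.threat_type) and 0.0 <= self.confidence <= 1.0

# Hypothetical signal extracted from a report.
signal = ThreatSignal(
    threat_type="kidnapping",
    location="Birnin Gwari",
    perpetrator_count=6,
    vehicle_count=2,
    direction="north",
    time_reference="last night",
    confidence=0.82,
)
```

Downstream consumers can then treat every report uniformly, regardless of which of the three input languages it arrived in.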

Model details

  • Quantized from: Janeodum/tsaro-e2b
  • Original base model: google/gemma-4-e2b-it
  • Quantization: Q4_K_M (4-bit)
  • Parameters: 5B
  • Architecture: gemma4
  • Format: GGUF, for llama.cpp / llama.rn
  • Role in Tsaro: the E2B variant is the smaller of two on-device extraction models. It is the fallback for older or low-RAM Android devices — the Tsaro app loads the largest model the hardware can run, falling back from E4B to E2B to a hosted endpoint.
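The E4B → E2B → hosted fallback chain can be sketched as a simple capability check. A minimal sketch, assuming RAM is the deciding factor; the thresholds here are hypothetical, not the values the Tsaro app actually uses:

```python
def pick_model(available_ram_mb: int, has_network: bool) -> str:
    """Choose the largest extraction model the device can run.

    Thresholds are illustrative, not the app's real heuristics.
    """
    if available_ram_mb >= 6000:
        return "tsaro-e4b"          # larger on-device model
    if available_ram_mb >= 3000:
        return "tsaro-e2b"          # this model: low-RAM fallback
    if has_network:
        return "hosted-endpoint"    # final fallback when neither fits
    raise RuntimeError("device cannot run extraction offline")
```

The point of the chain is that extraction degrades gracefully: only a device that can run neither on-device model *and* has no connectivity fails outright.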

Usage

With llama.cpp:

llama-cli -m tsaro-e2b-q4_k_m.gguf -p "your threat report text here"

In a React Native app via llama.rn, the model file is bundled or downloaded on first run and loaded for offline extraction when the device has no connectivity.
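Because the model returns its structured signal as text, the app has to pull a JSON object out of the completion before using it. A minimal parsing sketch in Python (the JSON output format and field names are assumptions, not a documented contract):

```python
import json
import re

def extract_signal(completion_text: str) -> dict:
    """Pull the first JSON object out of a model completion.

    Assumes the model emits its structured signal as a JSON blob,
    possibly surrounded by prose; raises ValueError otherwise.
    """
    match = re.search(r"\{.*\}", completion_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in completion")
    signal = json.loads(match.group(0))
    conf = signal.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("missing or out-of-range confidence score")
    return signal

# Hypothetical completion text with prose around the JSON payload.
raw = ('Here is the extracted signal: '
       '{"threat_type": "cattle_rustling", "location": "Zamfara", '
       '"confidence": 0.74}')
parsed = extract_signal(raw)
```

Validating the confidence field at the parsing boundary keeps malformed completions from propagating into the alerting pipeline.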

Intended use and limitations

Built for community safety reporting in a specific regional context. Not a general-purpose model. Outputs are extraction assistance, not verified intelligence.
