Bronsn committed
Commit 48d0499 · verified · Parent(s): dbd0762

Upload README.md with huggingface_hub

Files changed (1): README.md (+171 −0)
---
license: cc-by-nc-4.0
language:
- en
- sw
- lg
- multilingual
tags:
- tiny-aya
- tool-calling
- function-calling
- multilingual
- on-device
- gguf
- ollama
- tiny-facade
- cohere
- african-languages
base_model: CohereLabs/tiny-aya-fire-GGUF
pipeline_tag: text-generation
library_name: llama.cpp
---

# Tiny Aya Fire — Tool-Calling GGUF

**A corrected, tool-calling-ready GGUF of [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire-GGUF) for Ollama and llama.cpp.**

Part of the **[Tiny Facade](https://huggingface.co/collections/Bronsn/tiny-facade-multilingual-tool-calling-models)** collection — an open-source effort to bring reliable multilingual tool calling to on-device AI.

## What This Fixes

The official Tiny Aya GGUFs on Ollama ship with the **wrong chat template** (Command-R's template instead of Tiny Aya's own). This causes:

- **End-token leakage** — `<|END_OF_TURN_TOKEN|>` and `<|END_RESPONSE|>` printed as visible text in responses
- **No tool-calling support** — the default template has no provisions for function calling
- **Broken conversation flow** — responses don't terminate cleanly

This GGUF ships with a **corrected Modelfile** that uses Tiny Aya's actual template, adds proper stop tokens, and injects structured tool-calling support.

## Quick Start (Ollama)

```bash
# Download the Modelfile
# Then create the model pointing to the GGUF
ollama create tiny-aya-fire-tools -f tiny-aya-fire-tools.Modelfile
```

Or, if you've downloaded the GGUF directly, update the `FROM` line in the Modelfile to point to your local file:

```
FROM ./tiny-aya-fire-tools.GGUF
```

Then:

```bash
ollama create tiny-aya-fire-tools -f tiny-aya-fire-tools.Modelfile
ollama run tiny-aya-fire-tools
```

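One quick way to verify the fix after `ollama run` is to check that the special end tokens never surface in visible output. A small illustrative check — the sample strings below are made up, not captured model output:

```python
# Illustrative check that end-of-turn tokens no longer leak into visible text.
# The example strings are made up; they are not real model output.
LEAK_TOKENS = ("<|END_OF_TURN_TOKEN|>", "<|END_RESPONSE|>")

def has_token_leak(text: str) -> bool:
    """True if any special end token appears verbatim in the text."""
    return any(tok in text for tok in LEAK_TOKENS)

print(has_token_leak("The weather in Kampala is sunny."))  # False: clean output
print(has_token_leak("Hello!<|END_OF_TURN_TOKEN|>"))       # True: leaked token
```

With the broken upstream template, the second case is what you see in practice; with this Modelfile both stop tokens end generation before they are emitted.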
## Tool Calling

The corrected template supports Ollama's native tool calling. Define tools in your API call and the model will respond with structured `<tool_call>` blocks.

### Example (Python + Ollama)

```python
import ollama

response = ollama.chat(
    model='tiny-aya-fire-tools',
    messages=[
        {'role': 'user', 'content': 'What is the weather in Kampala?'}
    ],
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'get_weather',
                'description': 'Get current weather for a location',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'location': {
                            'type': 'string',
                            'description': 'City name'
                        }
                    },
                    'required': ['location']
                }
            }
        }
    ]
)

print(response['message'])
```

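When the model decides to call a tool, the returned message carries a list of tool calls rather than plain text. A minimal sketch of dispatching such a call to a local Python function — `get_weather` here is a hypothetical stand-in implementation, and the `call` dict mimics the assumed shape of one entry in `response['message']['tool_calls']`:

```python
import json

# Hypothetical local implementation of the get_weather tool defined above.
def get_weather(location):
    # A real app would query a weather API here; this returns canned data.
    return json.dumps({"location": location, "temp_c": 24})

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call):
    """Run the local function named in a tool-call dict."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = tool_call["function"]["arguments"]  # Ollama parses this to a dict
    return fn(**args)

# Assumed shape of one entry in response['message']['tool_calls']:
call = {"function": {"name": "get_weather", "arguments": {"location": "Kampala"}}}
result = dispatch(call)
print(result)  # {"location": "Kampala", "temp_c": 24}
```

The returned string can then be sent back as a `{'role': 'tool', 'content': result}` message in a second `ollama.chat` call so the model can phrase the final answer.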
### Multilingual Tool Calling

The model handles tool calls from prompts in 70+ languages. Examples:

| Language | Prompt | Expected Tool Call |
|----------|--------|--------------------|
| English | "What's the weather in Nairobi?" | `get_weather(location="Nairobi")` |
| Swahili | "Hali ya hewa Dar es Salaam ikoje?" | `get_weather(location="Dar es Salaam")` |
| Luganda | "Embeera y'obudde mu Kampala eri etya?" | `get_weather(location="Kampala")` |

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire-GGUF) |
| Parameters | 3.35B |
| Quantization | Q4_K_M |
| File Size | ~2.0 GB |
| Languages | 70+ (optimized for English, Swahili, Luganda) |
| License | CC-BY-NC-4.0 (inherited from Tiny Aya) |

## What's in This Repo

- `tiny-aya-fire-tools.GGUF` — The quantized model weights (Q4_K_M)
- `tiny-aya-fire-tools.Modelfile` — Corrected Ollama Modelfile with tool-calling template

125
+ ## The Corrected Template
126
+
127
+ The key fix is using Tiny Aya's native chat format with proper token boundaries:
128
+
129
+ ```
130
+ <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>...system prompt...<|END_OF_TURN_TOKEN|>
131
+ <|START_OF_TURN_TOKEN|><|USER_TOKEN|>...user message...<|END_OF_TURN_TOKEN|>
132
+ <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>...response...<|END_RESPONSE|><|END_OF_TURN_TOKEN|>
133
+ ```
134
+
135
+ Both `<|END_OF_TURN_TOKEN|>` and `<|END_RESPONSE|>` are registered as stop tokens, preventing leakage.
136
+
137
+ Tool definitions are injected into the system prompt inside `<tools>...</tools>` tags, and the model is instructed to respond with `<tool_call>` blocks when appropriate.
138
+
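Put together, the relevant Modelfile lines look roughly like this — a sketch, not the full shipped Modelfile; the `TEMPLATE` body (Tiny Aya's turn format shown above) is elided:

```
FROM ./tiny-aya-fire-tools.GGUF
TEMPLATE """..."""
PARAMETER stop "<|END_OF_TURN_TOKEN|>"
PARAMETER stop "<|END_RESPONSE|>"
```

Registering both tokens via `PARAMETER stop` is what keeps them out of visible output.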
## Tiny Facade Project

Tiny Facade is an open-source research project investigating whether Tiny Aya can serve as a **shared multilingual tool-calling service** on Android devices. Instead of every app bundling its own 2 GB language model, Facade loads the model once and exposes a shared interface through Android's AIDL system.

**Research Focus:**
- Multilingual tool-calling accuracy (English, Swahili, Luganda)
- Shared on-device inference architecture
- LoRA fine-tuning for structured function-call generation

**Authors:** Bronson Bakunga, Kato Steven Mubiru
**Affiliation:** Crane AI Labs / Cohere Labs Community
**Part of:** [Expedition Tiny Aya](https://huggingface.co/CohereLabs) (Cohere Labs)

## All Variants

| Variant | Description | Repo |
|---------|-------------|------|
| **Global** | Broadest language coverage | [Bronsn/tiny-aya-global-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-global-tools-GGUF) |
| **Earth** | Optimized for African languages | [Bronsn/tiny-aya-earth-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-earth-tools-GGUF) |
| **Fire** | Optimized for South/Southeast Asian languages | [Bronsn/tiny-aya-fire-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-fire-tools-GGUF) |
| **Water** | Optimized for European languages | [Bronsn/tiny-aya-water-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-water-tools-GGUF) |

## Citation

If you use these models, please cite the original Tiny Aya work:

```bibtex
@article{cohere2026tinyaya,
  title={Tiny Aya: Democratizing Multilingual AI for On-Device Use},
  author={Cohere Labs},
  year={2026}
}
```