Bronsn committed
Commit 48d0499 · verified · Parent(s): dbd0762

Upload README.md with huggingface_hub

Files changed (1): README.md (+171 −0)
---
license: cc-by-nc-4.0
language:
- en
- sw
- lg
- multilingual
tags:
- tiny-aya
- tool-calling
- function-calling
- multilingual
- on-device
- gguf
- ollama
- tiny-facade
- cohere
- african-languages
base_model: CohereLabs/tiny-aya-fire-GGUF
pipeline_tag: text-generation
library_name: llama.cpp
---

# Tiny Aya Fire — Tool-Calling GGUF

**A corrected, tool-calling-ready GGUF of [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire-GGUF) for Ollama and llama.cpp.**

Part of the **[Tiny Facade](https://huggingface.co/collections/Bronsn/tiny-facade-multilingual-tool-calling-models)** collection — an open-source effort to bring reliable multilingual tool calling to on-device AI.

## What This Fixes

The official Tiny Aya GGUFs on Ollama ship with the **wrong chat template** (Command-R's template instead of Tiny Aya's own). This causes:

- **End-token leakage** — `<|END_OF_TURN_TOKEN|>` and `<|END_RESPONSE|>` printed as visible text in responses
- **No tool-calling support** — the default template has no provisions for function calling
- **Broken conversation flow** — responses don't terminate cleanly

This GGUF ships with a **corrected Modelfile** that uses Tiny Aya's actual template, adds proper stop tokens, and injects structured tool-calling support.

## Quick Start (Ollama)

```bash
# Download the Modelfile
# Then create the model pointing to the GGUF
ollama create tiny-aya-fire-tools -f tiny-aya-fire-tools.Modelfile
```

Or, if you've downloaded the GGUF directly, update the `FROM` line in the Modelfile to point to your local file:

```
FROM ./tiny-aya-fire-tools.GGUF
```

Then:

```bash
ollama create tiny-aya-fire-tools -f tiny-aya-fire-tools.Modelfile
ollama run tiny-aya-fire-tools
```

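One quick way to verify the fix after `ollama run` is to check that the special end tokens never surface in visible output. A small illustrative check — the sample strings below are made up, not captured model output:

```python
# Illustrative check that end-of-turn tokens no longer leak into visible text.
# The example strings are made up; they are not real model output.
LEAK_TOKENS = ("<|END_OF_TURN_TOKEN|>", "<|END_RESPONSE|>")

def has_token_leak(text: str) -> bool:
    """True if any special end token appears verbatim in the text."""
    return any(tok in text for tok in LEAK_TOKENS)

print(has_token_leak("The weather in Kampala is sunny."))  # False: clean output
print(has_token_leak("Hello!<|END_OF_TURN_TOKEN|>"))       # True: leaked token
```

With the broken upstream template, the second case is what you see in practice; with this Modelfile both stop tokens end generation before they are emitted.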
## Tool Calling

The corrected template supports Ollama's native tool calling. Define tools in your API call and the model will respond with structured `<tool_call>` blocks.

### Example (Python + Ollama)

```python
import ollama

response = ollama.chat(
    model='tiny-aya-fire-tools',
    messages=[
        {'role': 'user', 'content': 'What is the weather in Kampala?'}
    ],
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'get_weather',
                'description': 'Get current weather for a location',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'location': {
                            'type': 'string',
                            'description': 'City name'
                        }
                    },
                    'required': ['location']
                }
            }
        }
    ]
)

print(response['message'])
```

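When the model decides to call a tool, the returned message carries a list of tool calls rather than plain text. A minimal sketch of dispatching such a call to a local Python function — `get_weather` here is a hypothetical stand-in implementation, and the `call` dict mimics the assumed shape of one entry in `response['message']['tool_calls']`:

```python
import json

# Hypothetical local implementation of the get_weather tool defined above.
def get_weather(location):
    # A real app would query a weather API here; this returns canned data.
    return json.dumps({"location": location, "temp_c": 24})

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call):
    """Run the local function named in a tool-call dict."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = tool_call["function"]["arguments"]  # Ollama parses this to a dict
    return fn(**args)

# Assumed shape of one entry in response['message']['tool_calls']:
call = {"function": {"name": "get_weather", "arguments": {"location": "Kampala"}}}
result = dispatch(call)
print(result)  # {"location": "Kampala", "temp_c": 24}
```

The returned string can then be sent back as a `{'role': 'tool', 'content': result}` message in a second `ollama.chat` call so the model can phrase the final answer.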
### Multilingual Tool Calling

The model handles tool calls from prompts in 70+ languages. Examples:

| Language | Prompt | Expected Tool Call |
|----------|--------|--------------------|
| English | "What's the weather in Nairobi?" | `get_weather(location="Nairobi")` |
| Swahili | "Hali ya hewa Dar es Salaam ikoje?" | `get_weather(location="Dar es Salaam")` |
| Luganda | "Embeera y'obudde mu Kampala eri etya?" | `get_weather(location="Kampala")` |

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [CohereLabs/tiny-aya-fire](https://huggingface.co/CohereLabs/tiny-aya-fire-GGUF) |
| Parameters | 3.35B |
| Quantization | Q4_K_M |
| File Size | ~2.0 GB |
| Languages | 70+ (optimized for English, Swahili, Luganda) |
| License | CC-BY-NC-4.0 (inherited from Tiny Aya) |

## What's in This Repo

- `tiny-aya-fire-tools.GGUF` — The quantized model weights (Q4_K_M)
- `tiny-aya-fire-tools.Modelfile` — Corrected Ollama Modelfile with tool-calling template

125
+ ## The Corrected Template
126
+
127
+ The key fix is using Tiny Aya's native chat format with proper token boundaries:
128
+
129
+ ```
130
+ <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>...system prompt...<|END_OF_TURN_TOKEN|>
131
+ <|START_OF_TURN_TOKEN|><|USER_TOKEN|>...user message...<|END_OF_TURN_TOKEN|>
132
+ <|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>...response...<|END_RESPONSE|><|END_OF_TURN_TOKEN|>
133
+ ```
134
+
135
+ Both `<|END_OF_TURN_TOKEN|>` and `<|END_RESPONSE|>` are registered as stop tokens, preventing leakage.
136
+
137
+ Tool definitions are injected into the system prompt inside `<tools>...</tools>` tags, and the model is instructed to respond with `<tool_call>` blocks when appropriate.
138
+
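Put together, the relevant Modelfile lines look roughly like this — a sketch, not the full shipped Modelfile; the `TEMPLATE` body (Tiny Aya's turn format shown above) is elided:

```
FROM ./tiny-aya-fire-tools.GGUF
TEMPLATE """..."""
PARAMETER stop "<|END_OF_TURN_TOKEN|>"
PARAMETER stop "<|END_RESPONSE|>"
```

Registering both tokens via `PARAMETER stop` is what keeps them out of visible output.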
## Tiny Facade Project

Tiny Facade is an open-source research project investigating whether Tiny Aya can serve as a **shared multilingual tool-calling service** on Android devices. Instead of every app bundling its own 2 GB language model, Facade loads the model once and exposes a shared interface through Android's AIDL system.

**Research Focus:**
- Multilingual tool-calling accuracy (English, Swahili, Luganda)
- Shared on-device inference architecture
- LoRA fine-tuning for structured function-call generation

**Authors:** Bronson Bakunga, Kato Steven Mubiru
**Affiliation:** Crane AI Labs / Cohere Labs Community
**Part of:** [Expedition Tiny Aya](https://huggingface.co/CohereLabs) (Cohere Labs)

## All Variants

| Variant | Description | Repo |
|---------|-------------|------|
| **Global** | Broadest language coverage | [Bronsn/tiny-aya-global-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-global-tools-GGUF) |
| **Earth** | Optimized for African languages | [Bronsn/tiny-aya-earth-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-earth-tools-GGUF) |
| **Fire** | Optimized for South/Southeast Asian languages | [Bronsn/tiny-aya-fire-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-fire-tools-GGUF) |
| **Water** | Optimized for European languages | [Bronsn/tiny-aya-water-tools-GGUF](https://huggingface.co/Bronsn/tiny-aya-water-tools-GGUF) |

## Citation

If you use these models, please cite the original Tiny Aya work:

```bibtex
@article{cohere2026tinyaya,
  title={Tiny Aya: Democratizing Multilingual AI for On-Device Use},
  author={Cohere Labs},
  year={2026}
}
```