---
license: mit
base_model: Cactus-Compute/needle
library_name: jax
pipeline_tag: text-generation
language:
- fr
- en
tags:
- function-calling
- tool-use
- transit
- encoder-decoder
- edge
- on-device
- jax
- flax
---
Needle Transit
A 26M-parameter fine-tune of Cactus-Compute/needle
specialized for transit tool-calling. It translates a natural-language transit query into
one of two tool calls — or refuses when no tool applies.
The model is an extractor: it lifts the relevant slots from the query. Canonical resolution (station disambiguation, line lookup, routing) is the backend's job.
| Property | Value |
|---|---|
| Parameters | 26M |
| Base model | Cactus-Compute/needle |
| Architecture | Encoder-decoder Simple Attention Network (d_model=512, enc×12, dec×8, GQA 8H/4KV) |
| Vocab | 8192 (SentencePiece BPE) |
| Format | JAX/Flax checkpoint (.pkl) |
| Languages | French, English, FR/EN code-switching |
| License | MIT |
Supported tools
[
{
"name": "search_itinerary",
"description": "Plan a route between two points.",
"parameters": {
"origin": {"type": "string", "description": "Start location.", "required": true},
"destination": {"type": "string", "description": "End location.", "required": true},
"time_human": {"type": "string", "description": "Human time expression, e.g. 'a 14h'.", "required": false},
"time_mode": {"type": "string", "description": "'depart_at' or 'arrive_by'.", "required": false}
}
},
{
"name": "get_next_arrivals",
"description": "Next departures/arrivals at a stop.",
"parameters": {
"station": {"type": "string", "description": "Stop or station name.", "required": true},
"line": {"type": "string", "description": "Line identifier, e.g. '1', 'RER A'.", "required": false}
}
}
]
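Because the model is small, a caller may want to check each emitted call against the schema above before passing it to the backend. The following is a minimal sketch of such a check, not part of the needle package: the names `TOOL_PARAMS`, `TIME_MODES`, and `validate_call` are hypothetical, and the required/optional flags are transcribed by hand from the schema.

```python
# Required (True) / optional (False) parameters per tool,
# transcribed from the schema above.
TOOL_PARAMS = {
    "search_itinerary": {
        "origin": True,
        "destination": True,
        "time_human": False,
        "time_mode": False,
    },
    "get_next_arrivals": {"station": True, "line": False},
}
TIME_MODES = {"depart_at", "arrive_by"}


def validate_call(call: dict) -> list:
    """Return a list of problems; an empty list means the call is well-formed."""
    name = call.get("name")
    if name not in TOOL_PARAMS:
        return [f"unknown tool: {name!r}"]
    spec = TOOL_PARAMS[name]
    args = call.get("arguments", {})
    problems = []
    # Every required parameter must be present.
    for param, required in spec.items():
        if required and param not in args:
            problems.append(f"missing required argument: {param}")
    # Every supplied argument must be declared and be a string.
    for param, value in args.items():
        if param not in spec:
            problems.append(f"unexpected argument: {param}")
        elif not isinstance(value, str):
            problems.append(f"argument {param} must be a string")
    # time_mode is restricted to the two values named in the schema.
    mode = args.get("time_mode")
    if isinstance(mode, str) and mode not in TIME_MODES:
        problems.append(f"invalid time_mode: {mode!r}")
    return problems
```

A call that fails validation can be treated like a refusal or logged for later finetuning data.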
Capabilities
- Two-tool routing: picks search_itinerary vs get_next_arrivals from intent.
- Refusal: emits no tool call for off-topic / under-specified / ambiguous queries.
- Robustness: handles typos, SMS-style compression, colloquial French, and FR/EN code-switching.
- Time handling: depart_at vs arrive_by disambiguation from natural phrasing.
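On the caller's side, routing and refusal reduce to a small dispatch step. The sketch below assumes the model output is a JSON list of calls, with an empty list signalling a refusal (the format this card shows in its examples). The handler functions are hypothetical stubs standing in for a real transit backend, and `handle_output` is an illustrative name, not part of the needle package.

```python
import json


def handle_output(raw: str, handlers: dict) -> str:
    """Route a raw model output string to a backend handler, or report a refusal."""
    calls = json.loads(raw)
    if not calls:
        # Empty list: the model declined to call a tool.
        return "refused"
    call = calls[0]  # the model emits at most one call per query
    handler = handlers.get(call["name"])
    if handler is None:
        return "unknown tool"
    return handler(**call.get("arguments", {}))


# Hypothetical backend stubs; real ones would query a transit API.
def search_itinerary(origin, destination, time_human=None, time_mode=None):
    return f"route {origin} -> {destination}"


def get_next_arrivals(station, line=None):
    return f"arrivals at {station}" + (f" on line {line}" if line else "")


HANDLERS = {
    "search_itinerary": search_itinerary,
    "get_next_arrivals": get_next_arrivals,
}

raw = '[{"name":"get_next_arrivals","arguments":{"station":"Bastille","line":"1"}}]'
print(handle_output(raw, HANDLERS))   # arrivals at Bastille on line 1
print(handle_output("[]", HANDLERS))  # refused
```

Keeping the refusal branch explicit lets the application fall back to a clarifying question or a general-purpose assistant instead of failing silently.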
Examples
Example query → tool call (verified on this checkpoint):
| Query | Output |
|---|---|
| Itinéraire de Bastille à Nation | [{"name":"search_itinerary","arguments":{"origin":"Bastille","destination":"Nation"}}] |
| De Issy à Charles-de-Gaulle, départ 14h | [{"name":"search_itinerary","arguments":{"origin":"Issy","destination":"Charles-de-Gaulle","time_human":"départ 14h","time_mode":"depart_at"}}] |
| How do I get from Gare du Nord to La Défense? | [{"name":"search_itinerary","arguments":{"origin":"Gare du Nord","destination":"La Défense"}}] |
| Prochain métro à Bastille ligne 1 ? | [{"name":"get_next_arrivals","arguments":{"station":"Bastille","line":"1"}}] |
| prochains passages à Châtelet | [{"name":"get_next_arrivals","arguments":{"station":"Châtelet"}}] |
| cmt aller a chatelet depuis nation | [{"name":"search_itinerary","arguments":{"origin":"nation","destination":"chatelet"}}] |
| Quel temps fait-il ? | [] |
The model is an extractor: it returns slot values verbatim from the query (note the lowercased, unaccented nation/chatelet above); canonical resolution is the backend's job.
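To illustrate what backend resolution might involve, here is a minimal sketch that folds case and accents before looking a surface slot up in a station table. The `STATIONS` list, `fold`, and `resolve_station` are hypothetical; a production backend would more likely use its own station index with fuzzy matching and disambiguation.

```python
import unicodedata
from typing import Optional

# Hypothetical canonical station table; a real backend would use its own index.
STATIONS = ["Châtelet", "Nation", "Bastille", "La Défense", "Gare du Nord"]


def fold(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


# Folded name -> canonical name.
INDEX = {fold(name): name for name in STATIONS}


def resolve_station(surface: str) -> Optional[str]:
    """Map a surface slot to a canonical name, or None without an exact folded match."""
    return INDEX.get(fold(surface))


print(resolve_station("chatelet"))  # Châtelet
print(resolve_station("nation"))    # Nation
print(resolve_station("Issy"))      # None
```

Exact matching after folding covers the case and accent differences shown in the table; typos and partial names such as "Issy" would need a fuzzier lookup.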
Results
Dataset and evaluation: coming soon. The training dataset and a held-out real-OOD evaluation suite will be released shortly.
Usage
Install the official needle package (Cactus Compute), then:
from needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer

params, config = load_checkpoint("needle-transit.pkl")
model = SimpleAttentionNetwork(config)
tokenizer = get_tokenizer()

result = generate(
    model, params, tokenizer,
    query="Prochain métro à Bastille ligne 1 ?",
    tools='[{"name":"get_next_arrivals","description":"Next departures at a stop.","parameters":{"station":{"type":"string","description":"Stop name.","required":true},"line":{"type":"string","description":"Line id.","required":false}}}]',
    stream=False,
)
print(result)
# [{"name":"get_next_arrivals","arguments":{"station":"Bastille","line":"1"}}]
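Hand-writing the tools argument as a JSON string literal is error-prone. One option is to keep the schema as a Python structure and serialize it with the standard library. The variable names here are illustrative. Whether the model is sensitive to whitespace in the schema string is not stated in this card, so the compact separators simply reproduce the style of the call above.

```python
import json

# Schema for one tool, written as a Python structure instead of a string literal.
TOOLS = [
    {
        "name": "get_next_arrivals",
        "description": "Next departures at a stop.",
        "parameters": {
            "station": {"type": "string", "description": "Stop name.", "required": True},
            "line": {"type": "string", "description": "Line id.", "required": False},
        },
    }
]

# Compact separators match the whitespace-free string used in the call above;
# ensure_ascii=False keeps any accented characters readable.
tools_json = json.dumps(TOOLS, separators=(",", ":"), ensure_ascii=False)
print(tools_json)
```

The resulting string can then be passed as `tools=tools_json` in the `generate` call.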
Finetuning
Adapt the model to your own tools with the customized finetuning scripts in
github.com/Rasaboun/needle-transit. They expose tunable LR and Muon LR,
per-field loss weighting (--w-name/--w-value/--w-key), and metrics logging:
needle finetune data.jsonl --lr 3e-5 --w-value 4.0
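The per-field flags suggest that tokens belonging to tool names, argument keys, and argument values contribute differently to the training loss. How the scripts implement this is not described here. As a rough illustration only, a weighted token-level negative log-likelihood could look like the sketch below. The function name, the field labels, the default weight of 1.0, and the normalization by total weight are all assumptions.

```python
import math


def weighted_nll(token_probs, token_fields, weights):
    """Weighted mean negative log-likelihood over target tokens.

    token_probs:  probability the model assigned to each correct target token
    token_fields: field label per token ("name", "key", or "value")
    weights:      multiplier per field label
    """
    total = 0.0
    norm = 0.0
    for p, field in zip(token_probs, token_fields):
        w = weights[field]
        total += -w * math.log(p)
        norm += w
    return total / norm


# Assumed mapping for `--w-value 4.0`: value tokens weigh four times as much
# as name or key tokens, which are assumed to keep a weight of 1.0.
weights = {"name": 1.0, "key": 1.0, "value": 4.0}
loss = weighted_nll([0.9, 0.9, 0.5], ["name", "key", "value"], weights)
print(round(loss, 4))
```

Under this reading, raising the value weight pushes training toward exact slot extraction, which matches the card's emphasis on the model as an extractor.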
Limitations
- Single-call only — emits at most one tool call per query.
- Domain-specific — tuned for transit tool-calling; off-domain tools are out of scope.
- Extractor, not a resolver — returns surface slots; the backend resolves canonical names.
- Small model — can be finicky on adversarial phrasing; finetune on your own data to adapt.
License & attribution
MIT. Fine-tuned from Cactus-Compute/needle (© Cactus Compute, MIT). See the base model card for architecture details.
Citation
@misc{ndubuaku2026needle,
title={Needle},
author={Henry Ndubuaku and Jakub Mroz and Karen Mosoyan and Roman Shemet and Parkirat Sandhu and Satyajit Kumar and Noah Cylich and Justin H. Lee},
year={2026},
url={https://github.com/cactus-compute/needle}
}