---
license: mit
library_name: jax
tags:
- function-calling
- tool-use
- encoder-decoder
- edge
- on-device
- jax
- flax
datasets:
- Cactus-Compute/tool-calls
---

# Needle

A 26M-parameter encoder-decoder transformer for on-device function calling, built on a "Simple Attention Network" architecture (no feedforward layers).

Distilled from Gemini 3.1 Flash Lite. Runs at 6000 tok/s prefill and 1200 tok/s decode on [Cactus](https://github.com/cactus-compute/cactus).

## Model Details

| | |
|---|---|
| Parameters | 26M |
| Architecture | Encoder-decoder, pure attention (no FFN) |
| Encoder | 12 layers, GQA (8 query / 4 KV heads), RoPE, gated residuals |
| Decoder | 8 layers, self-attention + cross-attention, gated residuals |
| d_model | 512 |
| Vocab | 8192 (SentencePiece BPE) |
| Norm | ZCRMSNorm (zero-centered, init=0) |
| Precision | bfloat16 (INT4 QAT during training) |
| Pretraining | 200B tokens on 16x TPU v6e (27 hrs) |
| Post-training | 2B tokens of function call data (45 mins) |
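
For orientation, the table above maps onto a config roughly like the sketch below. The field names here are illustrative assumptions, not the actual `TransformerConfig` schema stored in the checkpoint.

```python
# Illustrative hyperparameters matching the table above.
# Field names are assumptions, not the repo's actual TransformerConfig fields.
config = {
    "d_model": 512,
    "encoder_layers": 12,
    "decoder_layers": 8,
    "num_heads": 8,       # query heads (GQA)
    "num_kv_heads": 4,    # shared key/value heads (GQA)
    "vocab_size": 8192,   # SentencePiece BPE
}
```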

## Architecture

No feedforward layers. Each encoder block is gated self-attention; each decoder block is gated self-attention + gated cross-attention. The only nonlinearities are softmax and sigmoid.

See [Simple Attention Networks](https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md) for the full architectural breakdown.
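
As a rough illustration of the idea, here is a minimal Flax sketch of one encoder block under that description: no FFN, a zero-centered RMSNorm, and a sigmoid gate on the attention output before the residual add. GQA and RoPE are omitted for brevity, and all names and shapes are assumptions rather than the repository's actual implementation.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class ZCRMSNorm(nn.Module):
    """Zero-centered RMSNorm: the scale is (1 + g) with g initialized to zero."""

    @nn.compact
    def __call__(self, x):
        g = self.param("g", nn.initializers.zeros, (x.shape[-1],))
        rms = jnp.sqrt(jnp.mean(jnp.square(x), axis=-1, keepdims=True) + 1e-6)
        return x / rms * (1.0 + g)


class GatedSelfAttentionBlock(nn.Module):
    """Encoder block sketch: gated self-attention on a residual stream, no FFN."""

    num_heads: int = 8
    d_model: int = 512

    @nn.compact
    def __call__(self, x):  # x: [batch, seq, d_model]
        h = ZCRMSNorm()(x)
        attn = nn.SelfAttention(num_heads=self.num_heads)(h)
        # Sigmoid gate on the attention output; together with the softmax
        # inside attention, these are the block's only nonlinearities.
        gate = jax.nn.sigmoid(nn.Dense(self.d_model)(h))
        return x + gate * attn
```

A decoder block would add a second, similarly gated cross-attention over the encoder output, per the description above.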

## Quickstart

```bash
git clone https://github.com/cactus-compute/needle.git
cd needle && source ./setup
needle ui
```

This opens a web UI at http://127.0.0.1:7860 where you can test the model and finetune it on your own tools. Weights are downloaded automatically.

## Usage (Python)

```python
from src.model.run import load_checkpoint, generate
from src.model.architecture import EncoderDecoderTransformer
from src.dataset.dataset import get_tokenizer

# Load weights and config, build the model, and grab the tokenizer.
params, config = load_checkpoint("checkpoints/needle.pkl")
model = EncoderDecoderTransformer(config)
tokenizer = get_tokenizer()

# Pass the user query plus a JSON tool schema; the model emits tool calls.
result = generate(
    model, params, tokenizer,
    query="What's the weather in San Francisco?",
    tools='[{"name":"get_weather","parameters":{"location":"string"}}]',
    stream=False,
)
print(result)
# [{"name":"get_weather","arguments":{"location":"San Francisco"}}]
```

## Finetuning

Finetune on your own tools via the web UI or CLI:

```bash
# Web UI (generates data via Gemini, trains, evaluates, bundles result)
needle ui

# CLI
python -m src.training.finetune data.jsonl --checkpoint checkpoints/needle.pkl
```
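
The expected shape of `data.jsonl` is not spelled out here; as a guess that mirrors the `generate()` API above, each line might pair a query and tool schema with the target calls. Treat the field names below as assumptions and check the repo for the actual schema.

```python
import json

# Hypothetical data.jsonl record mirroring generate()'s inputs and outputs;
# field names are assumptions, not the repo's confirmed schema.
record = {
    "query": "Set a timer for 10 minutes",
    "tools": [{"name": "set_timer", "parameters": {"duration_minutes": "number"}}],
    "output": [{"name": "set_timer", "arguments": {"duration_minutes": 10}}],
}

with open("data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")  # one JSON object per line
```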

## File Format

The checkpoint is a Python pickle containing:

```python
{
    "params": { ... },  # nested dict of numpy float16 arrays
    "config": { ... },  # TransformerConfig fields as dict
}
```

Load with:

```python
import pickle

with open("needle.pkl", "rb") as f:
    data = pickle.load(f)
```
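
From that raw dict you can rebuild the model by hand, assuming `TransformerConfig` lives alongside the architecture and accepts the stored fields as keyword arguments (both are assumptions; `load_checkpoint` above handles this for you):

```python
# Sketch only: in practice use load_checkpoint, which wraps these steps.
from src.model.architecture import EncoderDecoderTransformer, TransformerConfig

params = data["params"]
model = EncoderDecoderTransformer(TransformerConfig(**data["config"]))
```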

## Training Data

Post-trained on [Cactus-Compute/tool-calls](https://huggingface.co/datasets/Cactus-Compute/tool-calls), a synthesized dataset of 2M+ function calling examples spanning 15 tool categories (timers, messaging, media, navigation, smart home, fitness, etc.).
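
To inspect the data, it should load with the standard `datasets` API (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Stream a few examples from the post-training dataset without downloading it all.
ds = load_dataset("Cactus-Compute/tool-calls", split="train", streaming=True)
for example in ds.take(3):
    print(example)
```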

## License

MIT

## Citation

```bibtex
@misc{ndubuaku2026needle,
  title={Simple Attention Networks},
  author={Henry Ndubuaku},
  year={2026},
  url={https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md}
}
```