maskcude hmunachii committed on
Commit
b151a48
·
0 Parent(s):

Duplicate from Cactus-Compute/needle

Co-authored-by: Henry Ndubuaku <hmunachii@users.noreply.huggingface.co>

Files changed (6)
  1. .gitattributes +35 -0
  2. README.md +139 -0
  3. config.json +5 -0
  4. needle.pkl +3 -0
  5. tokenizer/needle.model +3 -0
  6. tokenizer/needle.vocab +0 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
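The patterns above decide which files in this commit are stored through Git LFS. The sketch below approximates that decision in Python for a handful of the slash-free patterns. It is not Git's own matcher: `is_lfs_tracked` and `LFS_PATTERNS` are hypothetical names, and `fnmatch` only imitates gitattributes matching for simple globs that are compared against a file's basename.

```python
import fnmatch
import posixpath

# A subset of the patterns added to .gitattributes in this commit.
LFS_PATTERNS = ["*.pkl", "*.model", "*.safetensors", "*.npz", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate gitattributes matching for slash-free patterns.

    Git compares a pattern that contains no slash against the basename of
    the path, so only the final path component is tested here.
    """
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


for path in ["needle.pkl", "tokenizer/needle.model", "tokenizer/needle.vocab"]:
    print(path, is_lfs_tracked(path))
```

Under these assumptions `needle.pkl` and `tokenizer/needle.model` match an LFS pattern, while `tokenizer/needle.vocab` does not, since no `*.vocab` rule is listed.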
README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ license: mit
+ library_name: jax
+ tags:
+ - function-calling
+ - tool-use
+ - encoder-decoder
+ - edge
+ - on-device
+ - jax
+ - flax
+ ---
+
+ # Needle
+
+ We distilled Gemini 3.1 into a 26M-parameter "[Simple Attention Network](docs/simple_attention_networks.md)" that you can even finetune locally on your Mac/PC.
+ In production, Needle runs on [Cactus](https://github.com/cactus-compute/cactus) at 6000 tok/s prefill and 1200 tok/s decode.
+ Weights are fully open on [Cactus-Compute/needle](https://huggingface.co/Cactus-Compute/needle), as is the dataset generation.
+
+ | | |
+ |---|---|
+ | Parameters | 26M |
+ | Architecture | Encoder-decoder, pure attention (no FFN) |
+ | Encoder | 12 layers, GQA (8H/4KV), RoPE, gated residuals |
+ | Decoder | 8 layers, self-attn + cross-attn, gated residuals |
+ | d_model | 512 |
+ | Vocab | 8192 (SentencePiece BPE) |
+ | Norm | ZCRMSNorm (zero-centered, init=0) |
+ | Precision | bfloat16 (INT4 QAT during training) |
+ | Pretraining | 200B tokens on 16x TPU v6e (27 hrs) |
+ | Post-training | 2B tokens of function call data (45 min) |
+
+ ```
+                d=512, 8H/4KV, BPE=8192
+                                      ┌──────────────┐
+                                      │  Tool Call   │
+                                      └──────┬───────┘
+                                       ┌─────┴─────┐
+                                       │  Softmax  │
+                                       └─────┬─────┘
+                                       ┌─────┴─────┐
+                                       │ Linear (T)│ <- tied
+                                       └─────┬─────┘
+                                       ┌─────┴─────┐
+                                       │ ZCRMSNorm │
+                                       └─────┬─────┘
+                                    ┌────────┴────────┐
+                                    │   Decoder x 8   │
+                                    │┌───────────────┐│
+                                    ││ ZCRMSNorm     ││
+                                    ││ Masked Self   ││
+                                    ││ Attn + RoPE   ││
+                                    ││ Gated Residual││
+                                    │├───────────────┤│
+ ┌──────────────┐                   ││ ZCRMSNorm     ││
+ │ Encoder x 12 │─────────────────────>Cross Attn    ││
+ │              │                   ││ Gated Residual││
+ │ ┌──────────┐ │                   │└───────────────┘│
+ │ │ZCRMSNorm │ │                   └────────┬────────┘
+ │ │Self Attn │ │                      ┌─────┴─────┐
+ │ │ GQA+RoPE │ │                      │ Embedding │ <- shared
+ │ │Gated Res │ │                      └─────┬─────┘
+ │ │          │ │                    ┌───────┴────────┐
+ │ │ (no FFN) │ │                    │[EOS]<tool_call>│
+ │ └──────────┘ │                    │    + answer    │
+ │              │                    └────────────────┘
+ └──────┬───────┘
+        │
+   ┌────┴──────┐
+   │ Embedding │
+   └────┬──────┘
+        │
+   ┌────┴──────┐
+   │   Text    │
+   │   query   │
+   └───────────┘
+ ```
+
+ ## Quickstart
+
+ ```bash
+ git clone https://github.com/cactus-compute/needle.git
+ cd needle && source ./setup
+ needle playground
+ ```
+
+ This opens a web UI at http://127.0.0.1:7860 where you can test the model and finetune it on your own tools. Weights are downloaded automatically.
+
+ ## Usage (Python)
+
+ ```python
+ from needle import load_checkpoint, generate, SimpleAttentionNetwork, get_tokenizer
+
+ params, config = load_checkpoint("checkpoints/needle.pkl")
+ model = SimpleAttentionNetwork(config)
+ tokenizer = get_tokenizer()
+
+ result = generate(
+     model, params, tokenizer,
+     query="What's the weather in San Francisco?",
+     tools='[{"name":"get_weather","parameters":{"location":"string"}}]',
+     stream=False,
+ )
+ print(result)
+ # [{"name":"get_weather","arguments":{"location":"San Francisco"}}]
+ ```
+
+ ## Finetuning
+
+ Finetune on your own tools via the web UI or CLI:
+
+ ```bash
+ # Web UI (generates data via Gemini, trains, evaluates, bundles result)
+ needle playground
+
+ # CLI (auto-downloads weights if not local)
+ needle finetune data.jsonl
+ ```
+
+ ## Links
+
+ - [Needle](https://github.com/cactus-compute/needle) - training, finetuning, and inference code
+ - [Cactus](https://github.com/cactus-compute/cactus) - on-device runtime (6000 tok/s prefill, 1200 tok/s decode)
+ - [Simple Attention Networks](https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md) - architecture details
+
+ ## License
+
+ MIT
+
+ ## Citation
+
+ ```
+ @misc{ndubuaku2026needle,
+   title={Needle},
+   author={Henry Ndubuaku and Jakub Mroz and Karen Mosoyan and Roman Shemet and Parkirat Sandhu and Satyajit Kumar and Noah Cylich and Justin H. Lee},
+   year={2026},
+   url={https://github.com/cactus-compute/needle}
+ }
+ ```
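The README's architecture table gives enough numbers to sanity-check the 26M parameter figure. The back-of-the-envelope count below is a sketch under several assumptions that the README does not state: head dimension equals d_model / heads, attention projections carry no biases, cross-attention uses the same 8H/4KV shapes as self-attention, and a single embedding matrix is shared by encoder, decoder and the tied output layer. Norm scales and residual gates are ignored.

```python
# Values taken from the README table.
D_MODEL = 512
N_HEADS, N_KV_HEADS = 8, 4
VOCAB = 8192
ENCODER_LAYERS, DECODER_LAYERS = 12, 8

# Assumption: head_dim = d_model / heads (64 here).
head_dim = D_MODEL // N_HEADS

# One GQA attention block: Q and output projections use all heads,
# K and V projections use only the KV heads.
q_and_out = 2 * D_MODEL * (N_HEADS * head_dim)
k_and_v = 2 * D_MODEL * (N_KV_HEADS * head_dim)
attn_block = q_and_out + k_and_v

encoder = ENCODER_LAYERS * attn_block      # self-attention only, no FFN
decoder = DECODER_LAYERS * 2 * attn_block  # self-attention + cross-attention
embedding = VOCAB * D_MODEL                # assumed shared and tied

total = embedding + encoder + decoder
print(f"{total:,}")  # 26,214,400
```

If these assumptions hold, the attention projections and the embedding alone account for roughly 26.2M parameters, which is consistent with the stated 26M.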
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "library_name": "jax",
+   "model_type": "custom",
+   "architectures": ["SimpleAttentionNetwork"]
+ }
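This `config.json` holds only repository metadata. It lists no hyperparameters such as layer counts or `d_model`; in the README's usage snippet those come from the `config` returned by `load_checkpoint`. A minimal sketch of reading the file defensively is shown below. `check_config` and `REQUIRED_KEYS` are hypothetical names, not part of the needle package.

```python
import json

# The file content added in this commit.
CONFIG_TEXT = """
{
  "library_name": "jax",
  "model_type": "custom",
  "architectures": ["SimpleAttentionNetwork"]
}
"""

REQUIRED_KEYS = {"library_name", "model_type", "architectures"}


def check_config(text):
    """Parse the JSON text and fail loudly if an expected key is absent."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    return config


config = check_config(CONFIG_TEXT)
print(config["architectures"][0])
```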
needle.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40a32e91d1d4197bf15ba559b74f6727c342dc8746918742fc7d8e2c1f18df40
+ size 52633098
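What the commit stores for `needle.pkl` is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses such a pointer and checks downloaded bytes against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib

# The pointer committed for needle.pkl.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:40a32e91d1d4197bf15ba559b74f6727c342dc8746918742fc7d8e2c1f18df40
size 52633098
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the 'algo:digest' oid field."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True when the bytes have the recorded size and digest."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
# Rough size check: millions of values if every value took two bytes (bfloat16).
print(pointer["size"] / 2 / 1e6)
```

At two bytes per value the recorded size corresponds to about 26.3 million values, which is in line with the README's 26M parameters in bfloat16. That reading assumes the pickle contains little besides the weights.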
tokenizer/needle.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0823f5b9133c68a8140addc5d7a425fa9119c4c8cb4a550363b4bffa4ba1c8c7
+ size 124960
tokenizer/needle.vocab ADDED
The diff for this file is too large to render. See raw diff