hmunachii committed on
Commit cfa3b5a · verified · 1 Parent(s): 8a204f0

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +51 -16
README.md CHANGED
@@ -9,8 +9,6 @@ tags:
  - on-device
  - jax
  - flax
- datasets:
- - Cactus-Compute/tool-calls
  ---

  # Needle
@@ -19,8 +17,6 @@ A 26M parameter encoder-decoder transformer for on-device function calling, buil

  Distilled from Gemini 3.1 Flash Lite. Runs at 6000 tok/s prefill and 1200 tok/s decode on [Cactus](https://github.com/cactus-compute/cactus).

- ## Model Details
-
  | | |
  |---|---|
  | Parameters | 26M |
@@ -34,7 +30,51 @@ Distilled from Gemini 3.1 Flash Lite. Runs at 6000 tok/s prefill and 1200 tok/s
  | Pretraining | 200B tokens on 16x TPU v6e (27hrs) |
  | Post-training | 2B tokens of function call data (45mins) |

- ## Architecture

  No feedforward layers. Each encoder block is gated self-attention; each decoder block is gated self-attention + gated cross-attention. The only nonlinearities are softmax and sigmoid.
 
@@ -83,6 +123,12 @@ needle ui
  python -m src.training.finetune data.jsonl --checkpoint checkpoints/needle.pkl
  ```

  ## File Format

  The checkpoint is a Python pickle containing:
@@ -94,17 +140,6 @@ The checkpoint is a Python pickle containing:
  }
  ```

- Load with:
- ```python
- import pickle
- with open("needle.pkl", "rb") as f:
-     data = pickle.load(f)
- ```
-
- ## Training Data
-
- Post-trained on [Cactus-Compute/tool-calls](https://huggingface.co/datasets/Cactus-Compute/tool-calls), a synthesized dataset of 2M+ function calling examples spanning 15 tool categories (timers, messaging, media, navigation, smart home, fitness, etc.).
-
  ## License

  MIT
 
  - on-device
  - jax
  - flax
  ---

  # Needle

  Distilled from Gemini 3.1 Flash Lite. Runs at 6000 tok/s prefill and 1200 tok/s decode on [Cactus](https://github.com/cactus-compute/cactus).
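
As a rough reading of those throughput figures, the sketch below estimates the latency of a single call. The function name `estimate_latency_s` and the token counts are hypothetical, and it assumes the quoted rates hold steadily for the whole request, which the README does not claim.

```python
# Rates quoted in the README for Needle on Cactus (tokens per second).
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Back-of-envelope latency: prefill time plus decode time.

    Assumes both rates are constant and ignores tokenization, model
    loading, and any other fixed overhead (an assumption, not a measurement).
    """
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s + decode_s


# Hypothetical request: 300 prompt tokens (query plus tool schemas), 60 generated tokens.
latency = estimate_latency_s(300, 60)
print(f"{latency * 1000:.0f} ms")  # 300/6000 + 60/1200 = 0.05 + 0.05 = 0.1 s
```

With these made-up token counts, prefill and decode each contribute about 50 ms; decode dominates as soon as the generated call is longer than a fifth of the prompt.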

  | | |
  |---|---|
  | Parameters | 26M |

  | Pretraining | 200B tokens on 16x TPU v6e (27hrs) |
  | Post-training | 2B tokens of function call data (45mins) |
 
+ ```
+                                 d=512, 8H/4KV, BPE=8192
+                                     ┌──────────────┐
+                                     │  Tool Call   │
+                                     └──────┬───────┘
+                                      ┌─────┴─────┐
+                                      │  Softmax  │
+                                      └─────┬─────┘
+                                      ┌─────┴─────┐
+                                      │ Linear (T)│ <- tied
+                                      └─────┬─────┘
+                                      ┌─────┴─────┐
+                                      │ ZCRMSNorm │
+                                      └─────┬─────┘
+                                   ┌────────┴────────┐
+                                   │  Decoder x 8    │
+                                   │┌───────────────┐│
+                                   ││  ZCRMSNorm    ││
+                                   ││  Masked Self  ││
+                                   ││  Attn + RoPE  ││
+                                   ││ Gated Residual││
+                                   │├───────────────┤│
+ ┌──────────────┐                  ││  ZCRMSNorm    ││
+ │ Encoder x 12 │─────────────────────>Cross Attn   ││
+ │              │                  ││ Gated Residual││
+ │ ┌──────────┐ │                  │└───────────────┘│
+ │ │ZCRMSNorm │ │                  └────────┬────────┘
+ │ │Self Attn │ │                     ┌─────┴─────┐
+ │ │ GQA+RoPE │ │                     │ Embedding │ <- shared
+ │ │Gated Res │ │                     └─────┬─────┘
+ │ │          │ │                   ┌───────┴────────┐
+ │ │ (no FFN) │ │                   │[EOS]<tool_call>│
+ │ └──────────┘ │                   │    + answer    │
+ │              │                   └────────────────┘
+ └──────┬───────┘
+        │
+   ┌────┴──────┐
+   │ Embedding │
+   └────┬──────┘
+        │
+   ┌────┴──────┐
+   │   Text    │
+   │   query   │
+   └───────────┘
+ ```
 
  No feedforward layers. Each encoder block is gated self-attention; each decoder block is gated self-attention + gated cross-attention. The only nonlinearities are softmax and sigmoid.
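
The 26M figure can be roughly reproduced from the dimensions given in the diagram. The sketch below is a back-of-envelope count, not the project's own accounting. It assumes a head size of 512 / 8 = 64, bias-free Q/K/V/output projections, grouped-query attention (4 KV heads) in cross-attention as well as self-attention, and one embedding matrix shared by encoder and decoder and tied to the output layer. It ignores norm and gate parameters, whose shapes the README does not give.

```python
D_MODEL = 512          # "d=512" in the diagram
N_HEADS = 8            # "8H": query heads
N_KV_HEADS = 4         # "4KV": key/value heads (grouped-query attention)
VOCAB = 8192           # "BPE=8192"
ENCODER_LAYERS = 12    # "Encoder x 12"
DECODER_LAYERS = 8     # "Decoder x 8"

HEAD_DIM = D_MODEL // N_HEADS  # assumed head size: 64


def attention_params() -> int:
    """Weights in one attention block: Q, K, V and output projections, no biases (assumed)."""
    q = D_MODEL * N_HEADS * HEAD_DIM
    k = D_MODEL * N_KV_HEADS * HEAD_DIM
    v = D_MODEL * N_KV_HEADS * HEAD_DIM
    out = N_HEADS * HEAD_DIM * D_MODEL
    return q + k + v + out


embedding = VOCAB * D_MODEL                        # shared and tied, so counted once
encoder = ENCODER_LAYERS * attention_params()      # self-attention only (no FFN)
decoder = DECODER_LAYERS * 2 * attention_params()  # self-attention + cross-attention
total = embedding + encoder + decoder

print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 26,214,400, in line with the 26M in the table; norm and gate weights would add a small amount on top.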
 
 
  python -m src.training.finetune data.jsonl --checkpoint checkpoints/needle.pkl
  ```
 
+ ## Links
+
+ - [Needle](https://github.com/cactus-compute/needle) - training, finetuning, and inference code
+ - [Cactus](https://github.com/cactus-compute/cactus) - on-device runtime (6000 tok/s prefill, 1200 tok/s decode)
+ - [Simple Attention Networks](https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md) - architecture details
+
  ## File Format

  The checkpoint is a Python pickle containing:
 
  }
  ```

  ## License

  MIT