---
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- ru
- ar
- hi
- ko
- zh
library_name: transformers
base_model:
- arcee-ai/Trinity-Mini
base_model_relation: quantized
---
<div align="center">
  <picture>
    <img
      src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/i-v1KyAMOW_mgVGeic9WJ.png"
      alt="Arcee Trinity Mini"
      style="max-width: 100%; height: auto;"
    >
  </picture>
</div>

# Trinity Mini W4A16

**This repository contains the W4A16 quantized weights of Trinity-Mini (INT4 weights, 16-bit activations).**

Trinity Mini is a 26B Arcee AI MoE model with 3B active parameters. It is the medium-sized model in our new Trinity family, a series of open-weight models for enterprises and tinkerers alike.
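
The 3B-active / 26B-total split comes from sparse mixture-of-experts routing: each token runs through only a handful of experts plus a shared one. The toy numpy sketch below illustrates the idea with single-matrix "experts" and a hypothetical `moe_forward` helper; it is not Trinity's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, k = 16, 128, 8  # 128 routed experts, top-8 active per token

# Toy experts: each is a single linear map instead of a full FFN block.
experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.02
shared_expert = rng.standard_normal((d_model, d_model)) * 0.02
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to the shared expert plus its top-k routed experts."""
    logits = x @ router
    topk = np.argsort(logits)[-k:]                 # indices of the k highest-scoring experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                       # softmax over the selected experts only
    out = x @ shared_expert                        # the shared expert always runs
    for w, e in zip(weights, topk):
        out += w * (x @ experts[e])
    return out, topk

x = rng.standard_normal(d_model)
y, active = moe_forward(x)
print(active.size)  # 8 — only 8 of the 128 routed experts touched this token
```

Only the selected experts' weights participate in each token's forward pass, which is why the active parameter count is so much smaller than the total.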

This model is tuned for reasoning, but in our testing it uses a total token count similar to that of competitive instruction-tuned models.

***

Trinity Mini is trained on 10T tokens gathered and curated through a key partnership with [Datology](https://www.datologyai.com/), building on the excellent dataset we used for [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) with additional math and code.

Training was performed on a cluster of 512 H200 GPUs powered by [Prime Intellect](https://www.primeintellect.ai/) using HSDP parallelism.

More details, including key architecture decisions, can be found on our blog [here](https://www.arcee.ai/blog/the-trinity-manifesto).

Try it out now at [chat.arcee.ai](http://chat.arcee.ai/).

***

## Model Details

* **Model Architecture:** AfmoeForCausalLM
* **Parameters:** 26B total, 3B active
* **Experts:** 128 total, 8 active, 1 shared
* **Context length:** 128k
* **Training Tokens:** 10T
* **License:** [Apache 2.0](https://huggingface.co/arcee-ai/Trinity-Mini#license)
* **Recommended settings:**
  * temperature: 0.15
  * top_k: 50
  * top_p: 0.75
  * min_p: 0.06
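
To make the recommended settings concrete, here is a simplified numpy sketch of how temperature, top-k, top-p, and min-p each filter a next-token distribution. `filter_logits` is a hypothetical helper for illustration; real inference engines implement the same ideas more carefully.

```python
import numpy as np

def filter_logits(logits, temperature=0.15, top_k=50, top_p=0.75, min_p=0.06):
    """Return sampling probabilities after temperature, top-k, top-p, and min-p."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                     # temperature-sharpened softmax

    order = np.argsort(probs)[::-1]          # token indices, most likely first
    keep = np.zeros_like(probs, dtype=bool)
    keep[order[:top_k]] = True               # top-k: keep the k most likely tokens

    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    nucleus = np.zeros_like(keep)
    nucleus[order[:cutoff]] = True           # top-p: smallest set covering top_p mass
    keep &= nucleus

    keep &= probs >= min_p * probs.max()     # min-p: floor relative to the best token

    probs = np.where(keep, probs, 0.0)
    return probs / probs.sum()               # renormalize over surviving tokens

vocab_logits = np.array([5.0, 4.8, 2.0, 0.5, 0.1])
p = filter_logits(vocab_logits)
```

At a low temperature like 0.15 the distribution is sharpened so strongly that the nucleus often collapses to a single candidate, which is what makes the model's output nearly deterministic.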

***

## Quantization Details

- Scheme: `W4A16` (INT4 weights, 16-bit activations)
- Intended use: quality-preserving 4-bit deployment of Trinity-Mini
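
Mechanically, W4A16 stores each weight as a 4-bit integer plus a per-group scale and dequantizes at matmul time, while activations stay in 16-bit floats. The numpy sketch below simulates this; the symmetric scheme and group size of 128 are illustrative assumptions, not the exact recipe used for this checkpoint.

```python
import numpy as np

def quantize_w4(w, group_size=128):
    """Symmetric INT4 quantization with one scale per group of weights."""
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0    # INT4 range is [-8, 7]
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_w4(q, scale, shape):
    """Recover approximate float weights from INT4 codes and group scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)   # full-precision weights
x = rng.standard_normal((1, 256)).astype(np.float16)     # 16-bit activations

q, scale = quantize_w4(w)
w_hat = dequantize_w4(q, scale, w.shape)

# The matmul runs against dequantized weights; the output error stays small.
y_ref = x.astype(np.float32) @ w
y_q = x.astype(np.float32) @ w_hat
rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
```

The payoff is storage: 4 bits per weight plus a small scale overhead, roughly a 4x reduction versus bf16, with activations untouched so no activation calibration is needed.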

## Benchmarks

![](https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/UMV0OZh_H1JfvgzBTXh6u.png)

<div align="center">
  <picture>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/sSVjGNHfrJKmQ6w8I18ek.png" style="background-color:ghostwhite;padding:5px;" width="17%" alt="Powered by Datology">
  </picture>
</div>

### Running our model

- [Transformers](https://huggingface.co/arcee-ai/Trinity-Mini#transformers)
- [vLLM](https://huggingface.co/arcee-ai/Trinity-Mini#vllm)
- [llama.cpp](https://huggingface.co/arcee-ai/Trinity-Mini#llamacpp)
- [LM Studio](https://huggingface.co/arcee-ai/Trinity-Mini#lm-studio)
- [API](https://huggingface.co/arcee-ai/Trinity-Mini#api)

## Transformers

Use the `main` branch of transformers, installed from source:

```bash
git clone https://github.com/huggingface/transformers.git
cd transformers

# pip
pip install '.[torch]'

# uv
uv pip install '.[torch]'
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

If you are using a released version of transformers instead, pass `trust_remote_code=True`:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
```

## vLLM

Supported in vLLM release 0.11.1 and later:

```bash
# pip
pip install "vllm>=0.11.1"
```

Serving the model with suggested settings:

```bash
vllm serve arcee-ai/Trinity-Mini \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_r1 \
  --tool-call-parser hermes
```

## llama.cpp

Supported in llama.cpp release b7061 and later.

Download the latest [llama.cpp release](https://github.com/ggml-org/llama.cpp/releases), then run:

```bash
llama-server -hf arcee-ai/Trinity-Mini-GGUF:q4_k_m \
  --temp 0.15 \
  --top-k 50 \
  --top-p 0.75 \
  --min-p 0.06
```

## LM Studio

Supported in the latest LM Studio runtime.

Update to the latest version available, then verify your runtime:

1. Click "Power User" at the bottom left
2. Click the green "Developer" icon at the top left
3. Select "LM Runtimes" at the top
4. Refresh the list of runtimes and verify that the latest is installed

Then go to Model Search, search for `arcee-ai/Trinity-Mini-GGUF`, download your preferred size, and load it up in the chat.

## API

Trinity Mini is available today on OpenRouter:

https://openrouter.ai/arcee-ai/trinity-mini

```bash
curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-mini",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'
```

## License

Trinity-Mini-W4A16 is released under the Apache-2.0 license.