Add Python code to code blocks

#2
.eval_results/gpqa.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: Idavidrein/gpqa
-     task_id: diamond
-   value: 76.3
-   source:
-     url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
-     name: Model Card
.eval_results/mmlu-pro.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: TIGER-Lab/MMLU-Pro
-     task_id: mmlu_pro
-   value: 83.4
-   source:
-     url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
-     name: Model Card
.eval_results/swe-bench_verified.yaml DELETED
@@ -1,7 +0,0 @@
- - dataset:
-     id: SWE-bench/SWE-bench_Verified
-     task_id: swe_bench_%_resolved
-   value: 63.2
-   source:
-     url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
-     name: Model Card
.gitattributes CHANGED
@@ -34,5 +34,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
- All[[:space:]]charts.jpg filter=lfs diff=lfs merge=lfs -text
- All_charts.jpg filter=lfs diff=lfs merge=lfs -text
 
All_charts.jpg DELETED

Git LFS Details

  • SHA256: 7780a4bc991ece46293e7ba4f5209f992efcc7052c7cd949e1676cb970d2007a
  • Pointer size: 131 Bytes
  • Size of remote file: 140 kB
README.md CHANGED
@@ -84,7 +84,6 @@ Trinity-Large-Thinking shares the same sparse MoE architecture as Trinity-Large-
  | Architecture | Sparse MoE (AfmoeForCausalLM) |
 
  ## Benchmarks
- ![Benchmark charts](https://huggingface.co/arcee-ai/Trinity-Large-Thinking/resolve/main/All_charts.jpg)
 
  | Benchmark | Trinity-Large-Thinking | Opus-4.6 | GLM-5 | MiniMax-M2.7 | Kimi-K2.5 |
  |---|---:|---:|---:|---:|---:|
@@ -107,37 +106,29 @@ Trinity-Large-Thinking produces reasoning traces inside `<think>...</think>` blo
  This means:
 
  1. **Multi-turn conversations**: When building chat applications, include the full assistant response (thinking + answer) in the conversation history for subsequent turns.
- 2. **Agentic loops**: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves reasoning in the message history between steps.
  3. **Context window management**: The 512k extended context window accommodates long reasoning chains across many agentic steps. If you must truncate history, prefer removing older turns entirely rather than stripping thinking tokens from recent turns.
 
  ### How thinking works
 
- The model reasons internally before producing its response. When served via vLLM, the reasoning is separated into a dedicated field in the API response:
-
- ```json
- // API response structure
- {
-   "message": {
-     "role": "assistant",
-     "reasoning_content": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
-     "content": "\n",
-     "tool_calls": [{
-       "function": {
-         "name": "search_flights",
-         "arguments": "{\"origin\": \"SFO\", \"destination\": \"JFK\", \"date\": \"2026-04-07\", \"max_price\": 300}"
        }
-     }]
-   }
- }
- ```
-
- ### Preserving reasoning in multi-turn conversations
-
- When building multi-turn agentic loops, you **must** pass `reasoning_content` back on assistant messages in subsequent requests. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization, maintaining the model's chain-of-thought across turns.
 
- **What happens if reasoning is omitted entirely?** The model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve `reasoning_content` and use `""` instead of `null` for content on tool-call turns.
-
- For implementation details, pitfalls (`reasoning` vs `reasoning_content`), and Python/TypeScript examples, see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces).
 
  ## Training Configuration
 
@@ -169,21 +160,18 @@ For implementation details, pitfalls (`reasoning` vs `reasoning_content`), and P
 
  Supported in vLLM 0.11.1+. For agentic use with both reasoning and tool calling:
 
- ```bash
- vllm serve arcee-ai/Trinity-Large-Thinking \
-   --dtype bfloat16 \
-   --reasoning-parser deepseek_r1 \
-   --enable-auto-tool-choice \
-   --tool-call-parser qwen3_coder
- ```
- **Recommended inference settings**: `temperature=0.45–0.6`, `top_p=0.95`, `top_k=50`
 
  This configuration:
  - `--reasoning-parser deepseek_r1` — Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning_content` field in the API response
  - `--tool-call-parser qwen3_coder` — Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array
 
-
- #### Single-turn example
 
  ```python
  from openai import OpenAI
@@ -195,18 +183,22 @@ response = client.chat.completions.create(
      messages=[
          {"role": "user", "content": "What's the weather like in Paris?"}
      ],
-     tools=[{
-         "type": "function",
-         "function": {
-             "name": "get_weather",
-             "description": "Get current weather for a location",
-             "parameters": {
-                 "type": "object",
-                 "properties": {"location": {"type": "string"}},
-                 "required": ["location"]
          }
      }
-     }],
  )
 
  # Access reasoning (thinking) content
@@ -217,87 +209,7 @@ content = response.choices[0].message.content
  tool_calls = response.choices[0].message.tool_calls
  ```
 
- #### Multi-turn agentic loop example
-
- The key pattern: after each turn, append the **full** assistant response (including reasoning) back to the message history, then append tool results, and send the updated history for the next turn.
-
- ```python
- import json
- from openai import OpenAI
-
- client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
- MODEL = "arcee-ai/Trinity-Large-Thinking"
-
- tools = [
-     {"type": "function", "function": {
-         "name": "get_customer_by_email",
-         "description": "Look up a customer by email.",
-         "parameters": {"type": "object", "properties": {"email": {"type": "string"}}, "required": ["email"]}
-     }},
-     {"type": "function", "function": {
-         "name": "cancel_subscription",
-         "description": "Cancel a subscription. Requires customer_id.",
-         "parameters": {"type": "object", "properties": {"customer_id": {"type": "string"}, "reason": {"type": "string"}}, "required": ["customer_id"]}
-     }}
- ]
-
- def execute_tool(name, arguments):
-     """Simulate tool execution — replace with real implementations."""
-     args = json.loads(arguments)
-     if name == "get_customer_by_email":
-         return json.dumps({"customer_id": "C2001", "name": "Jane Doe", "plan": "Premium", "status": "active"})
-     elif name == "cancel_subscription":
-         return json.dumps({"success": True, "message": f"Subscription cancelled for {args['customer_id']}"})
-
- messages = [
-     {"role": "system", "content": "You are a helpful customer service agent."},
-     {"role": "user", "content": "I want to cancel my subscription. My email is jane@example.com"}
- ]
-
- # Agent loop
- while True:
-     response = client.chat.completions.create(
-         model=MODEL, messages=messages, tools=tools,
-         tool_choice="auto", temperature=0, max_tokens=1000
-     )
-     msg = response.choices[0].message
-
-     # Build assistant message — PRESERVE reasoning_content
-     assistant_msg = {"role": "assistant", "content": msg.content}
-     if msg.reasoning_content:
-         assistant_msg["reasoning_content"] = msg.reasoning_content  # ← critical for multi-turn
-     if msg.tool_calls:
-         assistant_msg["tool_calls"] = [
-             {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
-             for tc in msg.tool_calls
-         ]
-     messages.append(assistant_msg)
-
-     # If no tool calls, model gave its final response — done
-     if not msg.tool_calls:
-         print(f"Final response: {msg.content}")
-         break
-
-     # Execute tool calls and append results
-     for tc in msg.tool_calls:
-         result = execute_tool(tc.function.name, tc.function.arguments)
-         print(f"  Tool: {tc.function.name}({tc.function.arguments}) → {result}")
-         messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
- ```
-
- Expected output:
- ```
- Tool: get_customer_by_email({"email": "jane@example.com"}) → {"customer_id": "C2001", ...}
- Tool: cancel_subscription({"customer_id": "C2001", ...}) → {"success": true, ...}
- Final response: Your subscription has been cancelled successfully.
- ```
-
- The critical line is:
- ```python
- assistant_msg["reasoning_content"] = msg.reasoning_content  # ← pass reasoning_content back
- ```
-
- The chat template re-wraps it in `<think>...</think>` tags automatically. See [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full details.
 
  ### Transformers
 
@@ -341,32 +253,20 @@ print(response)
 
  ### API
 
- #### OpenRouter
-
- Available on [OpenRouter](https://openrouter.ai/) with full reasoning and tool calling support:
-
- ```bash
- curl -X POST "https://openrouter.ai/v1/chat/completions" \
-   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
-   -H "Content-Type: application/json" \
-   -d '{
-     "model": "arcee-ai/trinity-large-thinking",
-     "messages": [
-       {
-         "role": "user",
-         "content": "What are some fun things to do in New York?"
-       }
-     ]
-   }'
- ```
-
- **Multi-turn with OpenRouter**: OpenRouter returns reasoning in a `reasoning_details` object (their unified reasoning shape). For multi-turn conversations, pass `reasoning_details` back as-is on assistant messages in subsequent requests — OpenRouter handles model-specific upstream translation (for Trinity, this is sent as `reasoning_content` on assistant turns upstream). For debugging, enable echo to inspect the upstream API call:
-
- ```json
- {"debug": {"echo_upstream_body": true}}
- ```
-
- See [OpenRouter debugging docs](https://openrouter.ai/docs/api/reference/errors-and-debugging#debugging) for details.
 
  ## Agentic Use Cases
 
@@ -376,8 +276,6 @@ Trinity-Large-Thinking is optimized for deployment as the reasoning backbone of
 
  Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion — from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.
 
- **Deploying for OpenClaw users**: OpenClaw preserves full assistant turns across steps. Ensure `reasoning_content` is forwarded on assistant messages in subsequent turns, and keep `content` non-null (empty string `""` is fine on tool-call turns). See [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full integration details.
-
  ### Hermes Agent
 
  Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thinking's reasoning traces pair naturally with Hermes's skill-learning loop — the model's explicit chain-of-thought makes skill extraction more reliable, and its strong tool-calling capabilities integrate directly via the Hermes tool-use protocol.
@@ -387,14 +285,12 @@ Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thi
  For custom implementations, the key integration pattern is:
 
  1. Send the user message with tool definitions
- 2. Receive the response with `reasoning_content` + `content` + `tool_calls`
  3. Execute the tool calls
- 4. Append the **full** assistant response (reasoning_content + content + tool calls) and tool results to the message history
  5. Send the updated history back for the next step
  6. Repeat until the model produces a final response without tool calls
 
- > **Important**: Step 4 must include `reasoning_content` on the assistant message. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization. Omitting it degrades multi-step performance — see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full details.
-
  ## License
 
  Trinity-Large-Thinking is released under the Apache License, Version 2.0.
@@ -403,15 +299,13 @@ Trinity-Large-Thinking is released under the Apache License, Version 2.0.
 
  If you use this model, please cite:
 
- ```bibtex
- @misc{singh2026arceetrinity,
-   title        = {Arcee Trinity Large Technical Report},
-   author       = {Varun Singh and Lucas Krauss and Sami Jaghouar and Matej Sirovatka and Charles Goddard and Fares Obied and Jack Min Ong and Jannik Straube and Fern and Aria Harley and Conner Stewart and Colin Kealty and Maziyar Panahi and Simon Kirsten and Anushka Deshpande and Anneketh Vij and Arthur Bresnu and Pranav Veldurthi and Raghav Ravishankar and Hardik Bishnoi and DatologyAI Team and Arcee AI Team and Prime Intellect Team and Mark McQuade and Johannes Hagemann and Lucas Atkins},
-   year         = {2026},
-   eprint       = {2602.17004},
-   archivePrefix= {arXiv},
-   primaryClass = {cs.LG},
-   doi          = {10.48550/arXiv.2602.17004},
-   url          = {https://arxiv.org/abs/2602.17004}
- }
- ```
  | Architecture | Sparse MoE (AfmoeForCausalLM) |
 
  ## Benchmarks
 
  | Benchmark | Trinity-Large-Thinking | Opus-4.6 | GLM-5 | MiniMax-M2.7 | Kimi-K2.5 |
  |---|---:|---:|---:|---:|---:|
  This means:
 
  1. **Multi-turn conversations**: When building chat applications, include the full assistant response (thinking + answer) in the conversation history for subsequent turns.
+ 2. **Agentic loops**: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves `<think>` blocks in the message history between steps.
  3. **Context window management**: The 512k extended context window accommodates long reasoning chains across many agentic steps. If you must truncate history, prefer removing older turns entirely rather than stripping thinking tokens from recent turns.
 
  ### How thinking works
 
+ The model reasons internally before producing its response. When served via vLLM, the reasoning is separated into a dedicated `reasoning_content` field in the API response:
+
+ ```json
+ // API response structure
+ {
+   "message": {
+     "role": "assistant",
+     "reasoning_content": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
+     "content": "\n",
+     "tool_calls": [{
+       "function": {
+         "name": "search_flights",
+         "arguments": "{\"origin\": \"SFO\", \"destination\": \"JFK\", \"date\": \"2026-04-07\", \"max_price\": 300}"
+       }
+     }]
+   }
+ }
+ ```
+
+ When building multi-turn agentic loops, include the `reasoning_content` back in the conversation history (re-wrapped in `<think>...</think>` tags within the assistant message) so the model retains its prior reasoning chain.
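The history-building step described above can be sketched as a small helper. This is an illustrative sketch, not official Arcee code: the helper name is ours, and the message shape follows the API response structure shown under "How thinking works".

```python
def assistant_turn(msg):
    """Build the assistant history entry from a response message dict,
    preserving reasoning_content so the chain-of-thought survives the next turn."""
    turn = {"role": "assistant", "content": msg.get("content") or ""}  # keep content non-null
    if msg.get("reasoning_content"):
        # The chat template re-wraps this field in <think>...</think> at tokenization time
        turn["reasoning_content"] = msg["reasoning_content"]
    if msg.get("tool_calls"):
        turn["tool_calls"] = msg["tool_calls"]
    return turn

# Example: append the full assistant response (thinking + tool calls) to history
history = [{"role": "user", "content": "Find flights SFO to JFK under $300."}]
response_msg = {
    "role": "assistant",
    "reasoning_content": "The user wants flight information. Search SFO to JFK and filter by price.",
    "content": "\n",
    "tool_calls": [{"function": {"name": "search_flights", "arguments": "{\"origin\": \"SFO\"}"}}],
}
history.append(assistant_turn(response_msg))
```

The same helper also covers final (non-tool) turns, where `tool_calls` is absent and only `content` and `reasoning_content` are copied.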
 
  ## Training Configuration
 
 
 
  Supported in vLLM 0.11.1+. For agentic use with both reasoning and tool calling:
 
+ ```bash
+ vllm serve arcee-ai/Trinity-Large-Thinking \
+   --dtype bfloat16 \
+   --reasoning-parser deepseek_r1 \
+   --enable-auto-tool-choice \
+   --tool-call-parser qwen3_coder
+ ```
 
  This configuration:
  - `--reasoning-parser deepseek_r1` — Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning_content` field in the API response
  - `--tool-call-parser qwen3_coder` — Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array
 
+ **Extracting reasoning content from the API response:**
 
  ```python
  from openai import OpenAI
 
      messages=[
          {"role": "user", "content": "What's the weather like in Paris?"}
      ],
+     tools=[  # your tool definitions here
+         {
+             "type": "function",
+             "function": {
+                 "name": "get_weather",
+                 "description": "Get current weather for a location",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "location": {"type": "string"}
+                     },
+                     "required": ["location"]
+                 }
          }
      }
+     ],
  )
 
  # Access reasoning (thinking) content
 
  tool_calls = response.choices[0].message.tool_calls
  ```
 
+ **Note on thinking-in-context with vLLM**: When building multi-turn agentic loops, include both `reasoning_content` and `content` in the conversation history you send back to the model. The reasoning content should be re-wrapped in `<think>...</think>` tags within the assistant message.
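For a server or chat template that does not read `reasoning_content` natively, the re-wrapping can be done by hand. A minimal sketch, with a helper name of our own invention (a template that understands `reasoning_content` performs this step itself during tokenization):

```python
def rewrap_thinking(reasoning_content, content):
    """Rebuild the raw assistant text: chain-of-thought inside <think>...</think>,
    followed by the visible answer."""
    if not reasoning_content:
        return content or ""
    return f"<think>\n{reasoning_content}\n</think>\n{content or ''}"

# History entry for a template lacking native reasoning_content support
turn = {
    "role": "assistant",
    "content": rewrap_thinking("Check the weather tool first.", "It's sunny in Paris."),
}
```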
 
  ### Transformers
 
 
 
  ### API
 
+ Available on OpenRouter:
+
+ ```bash
+ curl -X POST "https://openrouter.ai/v1/chat/completions" \
+   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "arcee-ai/trinity-large-thinking",
+     "messages": [
+       {
+         "role": "user",
+         "content": "What are some fun things to do in New York?"
+       }
+     ]
+   }'
+ ```
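The same request can be built as a Python payload. A sketch only: sending it still requires an HTTP client plus the `Authorization: Bearer` header shown in the curl command, and the field names simply mirror the OpenAI-compatible chat schema used above.

```python
import json

# Mirror of the curl request above, as a serializable payload
payload = {
    "model": "arcee-ai/trinity-large-thinking",
    "messages": [
        {"role": "user", "content": "What are some fun things to do in New York?"}
    ],
}
body = json.dumps(payload)  # POST this to the chat completions endpoint shown above
```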
 
  ## Agentic Use Cases
 
  Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion — from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.
 
  ### Hermes Agent
 
  Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thinking's reasoning traces pair naturally with Hermes's skill-learning loop — the model's explicit chain-of-thought makes skill extraction more reliable, and its strong tool-calling capabilities integrate directly via the Hermes tool-use protocol.
  For custom implementations, the key integration pattern is:
 
  1. Send the user message with tool definitions
+ 2. Receive the response with `<think>` reasoning + tool calls
  3. Execute the tool calls
+ 4. Append the **full** assistant response (thinking + content + tool calls) and tool results to the message history
  5. Send the updated history back for the next step
  6. Repeat until the model produces a final response without tool calls
 
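Steps 1 through 6 can be sketched as a transport-agnostic loop. This is a hedged sketch, not official integration code: `call_model` and `execute_tool` are injected stand-ins for the real API call and tool implementations, and the message shapes follow the response structure described earlier in the card.

```python
def run_agent_loop(call_model, execute_tool, messages, max_steps=8):
    """Drive the pattern: call the model, execute requested tools, feed results back.

    call_model(messages) -> dict shaped like the API response message;
    execute_tool(name, arguments_json) -> JSON string result."""
    for _ in range(max_steps):
        msg = call_model(messages)                 # steps 1-2: send history, receive response
        turn = {"role": "assistant", "content": msg.get("content") or ""}
        if msg.get("reasoning_content"):           # step 4: keep the full response, thinking included
            turn["reasoning_content"] = msg["reasoning_content"]
        if msg.get("tool_calls"):
            turn["tool_calls"] = msg["tool_calls"]
        messages.append(turn)
        if not msg.get("tool_calls"):              # step 6: final answer, no tools requested
            return msg.get("content")
        for tc in msg["tool_calls"]:               # step 3: execute each requested tool
            result = execute_tool(tc["function"]["name"], tc["function"]["arguments"])
            messages.append({"role": "tool", "tool_call_id": tc.get("id"), "content": result})
    raise RuntimeError("agent loop did not terminate within max_steps")
```

Because the model call and tool execution are injected, the whole loop can be exercised offline with stubs before wiring in a live endpoint.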
  ## License
 
  Trinity-Large-Thinking is released under the Apache License, Version 2.0.
 
  If you use this model, please cite:
 
+ ```bibtex
+ @misc{singh2026arceetrinity,
+   title        = {Arcee Trinity Large Technical Report},
+   author       = {Varun Singh and Lucas Krauss and Sami Jaghouar and Matej Sirovatka and Charles Goddard and Fares Obied and Jack Min Ong and Jannik Straube and Fern and Aria Harley and Conner Stewart and Colin Kealty and Maziyar Panahi and Simon Kirsten and Anushka Deshpande and Anneketh Vij and Arthur Bresnu and Pranav Veldurthi and Raghav Ravishankar and Hardik Bishnoi and DatologyAI Team and Arcee AI Team and Prime Intellect Team and Mark McQuade and Johannes Hagemann and Lucas Atkins},
+   year         = {2026},
+   eprint       = {2602.17004},
+   archivePrefix= {arXiv},
+   primaryClass = {cs.LG},
+   doi          = {10.48550/arXiv.2602.17004},
+   url          = {https://arxiv.org/abs/2602.17004}
+ }
+ ```