annekethvij committed on
Commit 6d6f35d · verified · 1 Parent(s): 54dba0f

Update README.md

Files changed (1)
  1. README.md +14 -13
README.md CHANGED
@@ -119,7 +119,7 @@ The model reasons internally before producing its response. When served via vLLM
  {
  "message": {
  "role": "assistant",
- "reasoning": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
  "content": "\n",
  "tool_calls": [{
  "function": {
@@ -133,11 +133,11 @@ The model reasons internally before producing its response. When served via vLLM
 
  ### Preserving reasoning in multi-turn conversations
 
- When building multi-turn agentic loops, you **must** pass the reasoning field back on assistant messages in subsequent requests. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization, maintaining the model's chain-of-thought across turns.
 
- **⚠️ Field name compatibility**: In vLLM OpenAI-compatible chat APIs, input compatibility for `reasoning_content` can vary by version, and some versions only honor `reasoning` ([related issue](https://github.com/vllm-project/vllm/issues/38488)). For maximum compatibility in multi-turn loops, send assistant reasoning back as `reasoning`. If your SDK exposes `reasoning_content` in responses, map it to `reasoning` when appending assistant turns.
 
- **What happens if reasoning is omitted entirely?** If the assistant message has no reasoning field at all (neither `reasoning` nor `reasoning_content`), or if `content` is `null`, the model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve the reasoning field and use `""` instead of `null` for content on tool-call turns.
 
  ## Training Configuration
 
@@ -176,9 +176,10 @@ vllm serve arcee-ai/Trinity-Large-Thinking \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
  ```
 
  This configuration:
- - `--reasoning-parser deepseek_r1` - Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning` field in the API response
  - `--tool-call-parser qwen3_coder` - Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array
 
 
@@ -261,10 +262,10 @@ while True:
  )
  msg = response.choices[0].message
 
- # Build assistant message - PRESERVE the reasoning field
  assistant_msg = {"role": "assistant", "content": msg.content}
  if msg.reasoning_content:
- assistant_msg["reasoning"] = msg.reasoning_content # ← critical for multi-turn
  if msg.tool_calls:
  assistant_msg["tool_calls"] = [
  {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
@@ -293,10 +294,10 @@ Expected output:
 
  The critical line is:
  ```python
- assistant_msg["reasoning"] = msg.reasoning_content # ← pass reasoning back as "reasoning"
  ```
 
- The OpenAI SDK exposes the field as `reasoning_content` on the response object, but vLLM 0.18+ expects `reasoning` on input messages. The chat template then re-wraps it in `<think>...</think>` tags automatically.
 
  ### Transformers
 
@@ -375,7 +376,7 @@ Trinity-Large-Thinking is optimized for deployment as the reasoning backbone of
 
  Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion - from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.
 
- **Deploying for OpenClaw users**: OpenClaw preserves full assistant turns across steps. For vLLM compatibility in public deployments, ensure the assistant reasoning is forwarded on the next turn as `reasoning` (not only `reasoning_content`) and keep assistant `content` non-null (empty string is fine). If your SDK emits `reasoning_content`, add a small adapter at your gateway to map it to `reasoning` before sending requests to vLLM.
 
  ### Hermes Agent
 
@@ -386,13 +387,13 @@ Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thi
  For custom implementations, the key integration pattern is:
 
  1. Send the user message with tool definitions
- 2. Receive the response with `reasoning` + `content` + `tool_calls`
  3. Execute the tool calls
- 4. Append the **full** assistant response (reasoning + content + tool calls) and tool results to the message history
  5. Send the updated history back for the next step
  6. Repeat until the model produces a final response without tool calls
 
- > **Important**: Step 4 must include the `reasoning` field on the assistant message. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization. Omitting it degrades multi-step performance - see [Preserving reasoning in multi-turn conversations](#preserving-reasoning-in-multi-turn-conversations) for details.
 
  ## License
 
 
  {
  "message": {
  "role": "assistant",
+ "reasoning_content": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
  "content": "\n",
  "tool_calls": [{
  "function": {
 
 
  ### Preserving reasoning in multi-turn conversations
 
+ When building multi-turn agentic loops, you **must** pass `reasoning_content` back on assistant messages in subsequent requests. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization, maintaining the model's chain-of-thought across turns.
 
+ **What happens if reasoning is omitted entirely?** The model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve `reasoning_content` and use `""` instead of `null` for content on tool-call turns.
 
+ For implementation details, pitfalls (`reasoning` vs `reasoning_content`), and Python/TypeScript examples, see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces).
 
  ## Training Configuration
 
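A minimal sketch of the updated multi-turn guidance: forward `reasoning_content` and never send `null` content. It assumes the response message exposes `content`, `reasoning_content`, and `tool_calls` attributes, as in the README's OpenAI SDK loop. The helper `to_history_message`, the tool name `search_flights`, and the `SimpleNamespace` stand-in are hypothetical illustrations, not part of any library API.

```python
from types import SimpleNamespace
from typing import Any


def to_history_message(msg: Any) -> dict:
    """Convert a chat-completion message object into a history entry (hypothetical helper)."""
    # Use "" rather than None for content, as recommended for tool-call turns.
    entry: dict = {"role": "assistant", "content": msg.content or ""}
    # Preserve the reasoning trace so the chat template can re-wrap it next turn.
    reasoning = getattr(msg, "reasoning_content", None)
    if reasoning:
        entry["reasoning_content"] = reasoning
    # Serialize tool calls into plain dicts in the OpenAI chat format.
    if getattr(msg, "tool_calls", None):
        entry["tool_calls"] = [
            {
                "id": tc.id,
                "type": "function",
                "function": {"name": tc.function.name, "arguments": tc.function.arguments},
            }
            for tc in msg.tool_calls
        ]
    return entry


# Usage with a stand-in object shaped like an SDK response message.
fake = SimpleNamespace(
    content=None,
    reasoning_content="Need flight prices first.",
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="search_flights", arguments='{"from": "SFO"}'),
        )
    ],
)
print(to_history_message(fake))
```

Normalizing `None` to `""` inside the helper keeps every call site consistent, rather than relying on each loop to remember the rule.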
 
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
  ```
+ **Recommended inference settings**: `temperature=0.45–0.6`, `top_p=0.95`, `top_k=50`
 
  This configuration:
+ - `--reasoning-parser deepseek_r1` - Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning_content` field in the API response
  - `--tool-call-parser qwen3_coder` - Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array
 
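To illustrate conceptually what `--reasoning-parser deepseek_r1` does, here is a simplified splitter that separates a `<think>...</think>` block from the visible content. This is not vLLM's implementation, which works on tokens and supports streaming. The name `split_reasoning` is hypothetical. Tolerating a missing opening tag is an assumption based on chat templates that emit `<think>` in the prompt.

```python
from typing import Optional, Tuple

THINK_START, THINK_END = "<think>", "</think>"


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Split raw model output into (reasoning, content) around a <think> block."""
    if THINK_END not in text:
        # No closing tag: this sketch treats everything as visible content.
        return None, text
    reasoning, _, content = text.partition(THINK_END)
    reasoning = reasoning.strip()
    # Assumption: the opening tag may be absent if the chat template put it in the prompt.
    if reasoning.startswith(THINK_START):
        reasoning = reasoning[len(THINK_START):].strip()
    return reasoning, content


print(split_reasoning("<think>Check dates.</think>\n"))
```

Here `split_reasoning("<think>Check dates.</think>\n")` returns `("Check dates.", "\n")`, which mirrors the `"content": "\n"` seen in the response example. vLLM's real parser may treat edge cases, such as output with no closing tag, differently.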
 
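The following sketch shows one way to pass the recommended sampling settings to a vLLM server through the OpenAI Python SDK. `top_k` is not a standard OpenAI parameter, so the sketch forwards it via `extra_body`, which the SDK merges into the request JSON and which vLLM's OpenAI-compatible server accepts as an extra sampling parameter. The following are assumptions: the helper name `build_request`, the default temperature of 0.5 (chosen from within the recommended 0.45 to 0.6 range), and the server URL in the comment.

```python
from typing import Any, Dict, List, Optional

# Values taken from the recommended inference settings above.
TOP_P = 0.95
TOP_K = 50


def build_request(
    model: str,
    messages: List[dict],
    tools: Optional[List[dict]] = None,
    temperature: float = 0.5,
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.chat.completions.create (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": TOP_P,
        # top_k is not an OpenAI parameter, so it travels in the extra request body.
        "extra_body": {"top_k": TOP_K},
    }
    if tools:
        kwargs["tools"] = tools
    return kwargs


# Usage against a running server (URL and key are assumptions for a local vLLM):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# response = client.chat.completions.create(
#     **build_request("arcee-ai/Trinity-Large-Thinking", messages, tools)
# )
```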
 
  )
  msg = response.choices[0].message
 
+ # Build assistant message - PRESERVE reasoning_content
  assistant_msg = {"role": "assistant", "content": msg.content}
  if msg.reasoning_content:
+ assistant_msg["reasoning_content"] = msg.reasoning_content # ← critical for multi-turn
  if msg.tool_calls:
  assistant_msg["tool_calls"] = [
  {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
 
 
  The critical line is:
  ```python
+ assistant_msg["reasoning_content"] = msg.reasoning_content # ← pass reasoning_content back
  ```
 
+ The chat template re-wraps it in `<think>...</think>` tags automatically. See [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full details.
 
  ### Transformers
 
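A toy renderer can illustrate the re-wrapping step. This is an approximation under stated assumptions: the real chat template controls the exact tags, whitespace, and tool-call serialization, and `render_assistant_turn` is a hypothetical name. It shows why dropping `reasoning_content` matters: the earlier `<think>` block simply disappears from the prompt.

```python
from typing import Optional


def render_assistant_turn(content: Optional[str], reasoning_content: Optional[str] = None) -> str:
    """Toy serialization of a prior assistant turn (not the real chat template)."""
    body = content or ""
    if reasoning_content:
        # Reasoning supplied: re-wrap it in think tags ahead of the visible content.
        return f"<think>{reasoning_content}</think>{body}"
    # No reasoning_content: the earlier chain of thought is absent from the prompt.
    return body


with_trace = render_assistant_turn("", "Search flights first.")
without_trace = render_assistant_turn("")
print(repr(with_trace))
print(repr(without_trace))
```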
 
 
  Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion - from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.
 
+ **Deploying for OpenClaw users**: OpenClaw preserves full assistant turns across steps. Ensure `reasoning_content` is forwarded on assistant messages in subsequent turns, and keep `content` non-null (empty string `""` is fine on tool-call turns). See [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full integration details.
 
  ### Hermes Agent
 
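A gateway in front of vLLM could apply the OpenClaw guidance as a small normalization pass over the history. This is a hypothetical sketch: `normalize_history` is not part of OpenClaw or vLLM. Copying a `reasoning` key into `reasoning_content` assumes some clients store the trace under that name. The `list_unread` tool is also illustrative.

```python
import copy
from typing import List


def normalize_history(messages: List[dict]) -> List[dict]:
    """Return a copy of the history that follows the OpenClaw deployment guidance."""
    normalized = copy.deepcopy(messages)  # never mutate the caller's list
    for msg in normalized:
        if msg.get("role") != "assistant":
            continue
        # Keep content non-null; "" is acceptable on tool-call turns.
        if msg.get("content") is None:
            msg["content"] = ""
        # Assumption: a client stored the trace under "reasoning"; forward it as
        # "reasoning_content" too, leaving the original key untouched.
        if not msg.get("reasoning_content") and msg.get("reasoning"):
            msg["reasoning_content"] = msg["reasoning"]
    return normalized


history = [
    {"role": "user", "content": "Triage my inbox."},
    {
        "role": "assistant",
        "content": None,
        "reasoning": "List unread mail first.",
        "tool_calls": [{"id": "call_1", "type": "function",
                        "function": {"name": "list_unread", "arguments": "{}"}}],
    },
]
print(normalize_history(history)[1])
```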
 
  For custom implementations, the key integration pattern is:
 
  1. Send the user message with tool definitions
+ 2. Receive the response with `reasoning_content` + `content` + `tool_calls`
  3. Execute the tool calls
+ 4. Append the **full** assistant response (reasoning_content + content + tool calls) and tool results to the message history
  5. Send the updated history back for the next step
  6. Repeat until the model produces a final response without tool calls
 
+ > **Important**: Step 4 must include `reasoning_content` on the assistant message. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization. Omitting it degrades multi-step performance - see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full details.
 
  ## License
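The six-step integration pattern can be sketched as a framework-agnostic loop. `call_model` stands in for whatever client sends the history plus tool definitions, for example the OpenAI SDK pointed at vLLM. It is assumed to return a dict with `reasoning_content`, `content`, and `tool_calls`. `fake_model` and `get_weather` are hypothetical stubs so the loop runs without a server.

```python
import json
from typing import Callable, Dict, List


def run_agent(
    call_model: Callable[[List[dict], List[dict]], dict],
    tools: List[dict],
    tool_impls: Dict[str, Callable[..., object]],
    user_message: str,
    max_steps: int = 10,
) -> List[dict]:
    """Run the tool-calling loop and return the full message history."""
    # Step 1: start from the user message; tools are sent on every request.
    messages: List[dict] = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        # Steps 1 and 5: send the current history with the tool definitions.
        reply = call_model(messages, tools)
        # Step 2: the reply carries reasoning_content, content, and tool_calls.
        assistant = {"role": "assistant", "content": reply.get("content") or ""}
        if reply.get("reasoning_content"):
            assistant["reasoning_content"] = reply["reasoning_content"]
        tool_calls = reply.get("tool_calls") or []
        if tool_calls:
            assistant["tool_calls"] = tool_calls
        # Step 4 (first half): append the full assistant turn, reasoning included.
        messages.append(assistant)
        # Step 6: stop once the model answers without tool calls.
        if not tool_calls:
            return messages
        # Steps 3 and 4 (second half): execute each call and append its result.
        for tc in tool_calls:
            fn = tc["function"]
            result = tool_impls[fn["name"]](**json.loads(fn["arguments"]))
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": json.dumps(result)})
    raise RuntimeError("no final response within max_steps")


def fake_model(messages: List[dict], tools: List[dict]) -> dict:
    """Scripted stand-in: request a tool on the first turn, then answer."""
    if messages[-1]["role"] == "user":
        return {
            "reasoning_content": "Need the weather first.",
            "content": "",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }],
        }
    return {"reasoning_content": "The tool says sunny.", "content": "It is sunny in Paris."}


history = run_agent(
    fake_model, [], {"get_weather": lambda city: {"city": city, "sky": "sunny"}}, "Weather in Paris?"
)
print(history[-1]["content"])  # prints: It is sunny in Paris.
```

The design choice here is to return the whole history rather than only the final answer, so callers can check that every assistant turn kept its `reasoning_content`.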
399