arcee-ai/Trinity-Large-Thinking

@@ -119,7 +119,7 @@ The model reasons internally before producing its response. When served via vLLM
 {
   "message": {
     "role": "assistant",
-    "reasoning": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
     "content": "\n",
     "tool_calls": [{
       "function": {
@@ -133,11 +133,11 @@ The model reasons internally before producing its response. When served via vLLM
 ### Preserving reasoning in multi-turn conversations
-When building multi-turn agentic loops, you **must** pass the reasoning field back on assistant messages in subsequent requests. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization, maintaining the model's chain-of-thought across turns.
-**⚠️ Field name compatibility**: In vLLM OpenAI-compatible chat APIs, input compatibility for `reasoning_content` can vary by version, and some versions only honor `reasoning` ([related issue](https://github.com/vllm-project/vllm/issues/38488)). For maximum compatibility in multi-turn loops, send assistant reasoning back as `reasoning`. If your SDK exposes `reasoning_content` in responses, map it to `reasoning` when appending assistant turns.
-**What happens if reasoning is omitted entirely?** If the assistant message has no reasoning field at all (neither `reasoning` nor `reasoning_content`), or if `content` is `null`, the model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve the reasoning field and use `""` instead of `null` for content on tool-call turns.
 ## Training Configuration
@@ -176,9 +176,10 @@ vllm serve arcee-ai/Trinity-Large-Thinking \
   --enable-auto-tool-choice \
   --tool-call-parser qwen3_coder
 ```
 This configuration:
-- `--reasoning-parser deepseek_r1` — Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning` field in the API response
 - `--tool-call-parser qwen3_coder` — Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array
@@ -261,10 +262,10 @@ while True:
     )
     msg = response.choices[0].message
-    # Build assistant message — PRESERVE the reasoning field
     assistant_msg = {"role": "assistant", "content": msg.content}
     if msg.reasoning_content:
-        assistant_msg["reasoning"] = msg.reasoning_content  # ← critical for multi-turn
     if msg.tool_calls:
         assistant_msg["tool_calls"] = [
             {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
@@ -293,10 +294,10 @@ Expected output:
 The critical line is:
 ```python
-assistant_msg["reasoning"] = msg.reasoning_content  # ← pass reasoning back as "reasoning"
 ```
-The OpenAI SDK exposes the field as `reasoning_content` on the response object, but vLLM 0.18+ expects `reasoning` on input messages. The chat template then re-wraps it in `<think>...</think>` tags automatically.
 ### Transformers
@@ -375,7 +376,7 @@ Trinity-Large-Thinking is optimized for deployment as the reasoning backbone of
 Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion — from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.
-**Deploying for OpenClaw users**: OpenClaw preserves full assistant turns across steps. For vLLM compatibility in public deployments, ensure the assistant reasoning is forwarded on the next turn as `reasoning` (not only `reasoning_content`) and keep assistant `content` non-null (empty string is fine). If your SDK emits `reasoning_content`, add a small adapter at your gateway to map it to `reasoning` before sending requests to vLLM.
 ### Hermes Agent
@@ -386,13 +387,13 @@ Compatible with the Hermes Agent framework from Nous Research. Trinity-Large-Thi
 For custom implementations, the key integration pattern is:
 1. Send the user message with tool definitions
-2. Receive the response with `reasoning` + `content` + `tool_calls`
 3. Execute the tool calls
-4. Append the **full** assistant response (reasoning + content + tool calls) and tool results to the message history
 5. Send the updated history back for the next step
 6. Repeat until the model produces a final response without tool calls
-> **Important**: Step 4 must include the `reasoning` field on the assistant message. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization. Omitting it degrades multi-step performance — see [Preserving reasoning in multi-turn conversations](#preserving-reasoning-in-multi-turn-conversations) for details.
 ## License

 {
   "message": {
     "role": "assistant",
+    "reasoning_content": "The user wants flight information. I need to determine the date for next Tuesday, search for flights SFO → JFK, and filter by price < $300.",
     "content": "\n",
     "tool_calls": [{
       "function": {
 ### Preserving reasoning in multi-turn conversations
+When building multi-turn agentic loops, you **must** pass `reasoning_content` back on assistant messages in subsequent requests. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization, maintaining the model's chain-of-thought across turns.
+**What happens if reasoning is omitted entirely?** The model can lose prior chain-of-thought context. On simple tasks this may work fine, but on complex multi-step agentic tasks, the model can produce malformed tool calls (e.g., tool call XML appearing inside the reasoning field instead of as structured `tool_calls`). For best results, always preserve `reasoning_content` and use `""` instead of `null` for content on tool-call turns.
+For implementation details, pitfalls (`reasoning` vs `reasoning_content`), and Python/TypeScript examples, see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces).
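As a minimal sketch of the guidance above, the helper below turns a response message into a history entry for the next request. The name `to_history_message` is hypothetical and not part of any SDK. It assumes the response object exposes `content`, `tool_calls`, and (optionally) `reasoning_content` attributes, as in the vLLM example in this README, and it reads `reasoning_content` with `getattr` in case a client omits that attribute.

```python
from types import SimpleNamespace


def to_history_message(msg):
    """Convert an assistant response message into a dict for the next request.

    Hypothetical helper: assumes `msg` has `content`, `tool_calls`, and
    (optionally) `reasoning_content` attributes.
    """
    history_msg = {
        "role": "assistant",
        # Use "" rather than None so content is never null on tool-call turns.
        "content": msg.content if msg.content is not None else "",
    }
    # Preserve the reasoning trace so the chat template can re-wrap it.
    reasoning = getattr(msg, "reasoning_content", None)
    if reasoning:
        history_msg["reasoning_content"] = reasoning
    tool_calls = getattr(msg, "tool_calls", None)
    if tool_calls:
        history_msg["tool_calls"] = [
            {
                "id": tc.id,
                "type": "function",
                "function": {
                    "name": tc.function.name,
                    "arguments": tc.function.arguments,
                },
            }
            for tc in tool_calls
        ]
    return history_msg


# Stand-in for an SDK response message (illustrative values only).
example = SimpleNamespace(
    content=None,
    reasoning_content="Need the date for next Tuesday first.",
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="search_flights", arguments="{}"),
        )
    ],
)
turn = to_history_message(example)
```

Coercing `None` to `""` in one place keeps the null-content rule out of the calling code, so every assistant turn that reaches the history already follows it.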
 ## Training Configuration
   --enable-auto-tool-choice \
   --tool-call-parser qwen3_coder
 ```
+**Recommended inference settings**: `temperature=0.45–0.6`, `top_p=0.95`, `top_k=50`
 This configuration:
+- `--reasoning-parser deepseek_r1` — Parses `<think>...</think>` reasoning blocks and exposes them via the `reasoning_content` field in the API response
 - `--tool-call-parser qwen3_coder` — Parses structured tool calls from the model output into the OpenAI-compatible `tool_calls` array
     )
     msg = response.choices[0].message
+    # Build assistant message — PRESERVE reasoning_content
     assistant_msg = {"role": "assistant", "content": msg.content}
     if msg.reasoning_content:
+        assistant_msg["reasoning_content"] = msg.reasoning_content  # ← critical for multi-turn
     if msg.tool_calls:
         assistant_msg["tool_calls"] = [
             {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
 The critical line is:
 ```python
+assistant_msg["reasoning_content"] = msg.reasoning_content  # ← pass reasoning_content back
 ```
+The chat template re-wraps it in `<think>...</think>` tags automatically. See [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full details.
 ### Transformers
 Trinity-Large-Thinking works as a drop-in brain for OpenClaw agents. Its native tool-calling format is compatible with OpenClaw's execution loop, and the extended reasoning enables reliable multi-step task completion — from email triage to code generation to meeting scheduling. Our 91.9% PinchBench score reflects real-world OpenClaw task performance.
+**Deploying for OpenClaw users**: OpenClaw preserves full assistant turns across steps. Ensure `reasoning_content` is forwarded on assistant messages in subsequent turns, and keep `content` non-null (empty string `""` is fine on tool-call turns). See [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full integration details.
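One way to enforce these two requirements at a gateway is a small check over the outgoing message history. The sketch below is illustrative only: `find_incomplete_assistant_turns` is a hypothetical name, not an OpenClaw or vLLM API, and it assumes messages are plain dicts in the OpenAI chat format. Checking only tool-call turns for missing reasoning is a design choice made for this example.

```python
def find_incomplete_assistant_turns(messages):
    """Return (index, problem) pairs for assistant turns that break the guidance."""
    problems = []
    for i, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue
        # content must be a string; "" is acceptable, None is not.
        if message.get("content") is None:
            problems.append((i, "content is null"))
        # Tool-call turns should carry the reasoning that led to the call.
        if message.get("tool_calls") and not message.get("reasoning_content"):
            problems.append((i, "tool-call turn without reasoning_content"))
    return problems


# Illustrative history: the first assistant turn violates both rules.
history = [
    {"role": "user", "content": "Find flights SFO to JFK"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "[]"},
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "No results; widen the date range.",
        "tool_calls": [{"id": "call_2"}],
    },
]
issues = find_incomplete_assistant_turns(history)
```

A gateway could log or reject requests for which this list is non-empty, rather than silently forwarding a history that has lost its reasoning.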
 ### Hermes Agent
 For custom implementations, the key integration pattern is:
 1. Send the user message with tool definitions
+2. Receive the response with `reasoning_content` + `content` + `tool_calls`
 3. Execute the tool calls
+4. Append the **full** assistant response (`reasoning_content` + `content` + `tool_calls`) and tool results to the message history
 5. Send the updated history back for the next step
 6. Repeat until the model produces a final response without tool calls
+> **Important**: Step 4 must include `reasoning_content` on the assistant message. The chat template reads this field and re-wraps it in `<think>...</think>` tags during tokenization. Omitting it degrades multi-step performance — see [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) for full details.
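The six steps above can be sketched as a short loop. This is a simplified illustration under stated assumptions, not a reference implementation: `run_agent_loop` and `complete` are hypothetical names, `complete(messages)` stands in for whatever client call returns an assistant message as a dict, and `tools` is assumed to map tool names to Python callables. A stub model replaces the real endpoint so the example runs offline.

```python
import json


def run_agent_loop(complete, tools, messages, max_steps=8):
    """Run the integration pattern until the model stops requesting tools."""
    for _ in range(max_steps):
        reply = complete(messages)  # steps 1-2: send history, receive response

        # Step 4 (assistant part): keep reasoning_content, content, and tool_calls.
        assistant_turn = {"role": "assistant", "content": reply.get("content") or ""}
        if reply.get("reasoning_content"):
            assistant_turn["reasoning_content"] = reply["reasoning_content"]
        if reply.get("tool_calls"):
            assistant_turn["tool_calls"] = reply["tool_calls"]
        messages.append(assistant_turn)

        # Step 6: a response without tool calls is the final answer.
        if not reply.get("tool_calls"):
            return assistant_turn["content"]

        # Step 3 and step 4 (tool part): execute each call, append its result.
        for call in reply["tool_calls"]:
            tool_fn = tools[call["function"]["name"]]
            arguments = json.loads(call["function"]["arguments"])
            result = tool_fn(**arguments)
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
        # Step 5: the next iteration sends the updated history.
    raise RuntimeError("max_steps reached without a final response")


def fake_complete(messages):
    """Stub model: requests one tool call, then answers (illustrative only)."""
    if not any(m["role"] == "tool" for m in messages):
        return {
            "content": "",
            "reasoning_content": "I should call add.",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
                }
            ],
        }
    return {"content": "The sum is 5.", "reasoning_content": "The tool returned 5."}


history = [{"role": "user", "content": "What is 2 + 3?"}]
answer = run_agent_loop(fake_complete, {"add": lambda a, b: a + b}, history)
```

The `max_steps` cap is an addition for safety in this sketch; the pattern itself only says to repeat until a response arrives without tool calls.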
 ## License