|
|
--- |
|
|
base_model: |
|
|
- Qwen/Qwen3-8B |
|
|
datasets: |
|
|
- OpenThoughts-Agent-v1-SFT |
|
|
- OpenThoughts-Agent-v1-RL |
|
|
library_name: transformers |
|
|
license: apache-2.0 |
|
|
model-index: |
|
|
- name: OpenThinker-Agent-v1 |
|
|
results: [] |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- agents |
|
|
- terminal |
|
|
- code |
|
|
- software-engineering |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%"> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://www.openthoughts.ai/blog/agent" style="margin-right: 24px;">Project</a> | |
|
|
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT dataset</a> | |
|
|
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">RL dataset</a> | |
|
|
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT model</a> | |
|
|
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">RL model</a> |
|
|
</p> |
|
|
|
|
|
|
|
|
# OpenThinker-Agent-v1 |
|
|
|
|
|
**OpenThoughts-Agent** is an open-source effort to curate the best datasets for training agents. Our first release includes [datasets](https://huggingface.co/collections/open-thoughts/openthinker-agent), [models](https://huggingface.co/collections/open-thoughts/openthinker-agent) and our [research codebase](https://github.com/open-thoughts/OpenThoughts-Agent). |
|
|
|
|
|
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) is a model trained for agentic tasks such as those in **Terminal-Bench 2.0** and **SWE-Bench**.
|
|
|
|
|
The [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is post-trained from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). |
|
|
It is SFT-ed on the [OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) dataset, then RL-ed on the [OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) dataset. |
|
|
|
|
|
This model is the final model after both SFT and RL. For the model after the SFT stage only, see [OpenThinker-Agent-v1-SFT](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT). |
|
|
|
|
|
- **Homepage:** https://www.openthoughts.ai/blog/agent |
|
|
- **Repository:** https://github.com/open-thoughts/OpenThoughts-Agent |
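The model loads like any other causal LM in `transformers`. A minimal inference sketch (the prompt and generation settings are illustrative, and the chat template is assumed to ship with the checkpoint, as with the Qwen3 base model):

```python
# Minimal inference sketch using Hugging Face transformers.
# The example prompt and max_new_tokens value are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker-Agent-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # place layers on available devices
)

messages = [
    {"role": "user", "content": "Find all files larger than 1 MB under the current directory."}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

For agent evaluation, the model is driven through a harness (e.g. Terminus-2, as in the benchmark table below) rather than raw single-turn generation.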
|
|
|
|
|
|
|
|
# OpenThinker-Agent-v1 Model Performance |
|
|
|
|
|
Our [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is the state-of-the-art model at its scale on agent benchmarks.
|
|
|
|
|
| Model | Harness | Terminal-Bench 2.0 | SWE-Bench Verified | OpenThoughts-TB-Dev | |
|
|
| ----------------------------------------------------------------------------------------------- | ------- | ------------------ | --------- | ------------------- | |
|
|
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | Terminus-2 | 0.0 | 0.7 | 5.7 | |
|
|
| **[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)** | Terminus-2 | 4.9 | 15.7 | 17.3 | |
|
|
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | Terminus-2 | 1.9 | 5.7 | 10.2 | |
|
|
| [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | OpenHands | 10.1 | 49.2 | 24.5 | |
|
|
|
|
|
|
|
|
# Data |
|
|
|
|
|
We built [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) in two stages: **supervised fine-tuning**, followed by **reinforcement learning**. |
|
|
Each stage required its own data pipeline: RL tasks (instructions, environments, and verifiers) and SFT traces from strong teacher agents completing those tasks.
|
|
|
|
|
[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) is an SFT trace dataset containing approximately **15,200 traces** drawn from two different data sources we curate: |
|
|
- **nl2bash**: Simple synthetically generated tasks where the agent has to format shell commands effectively |
|
|
- **InferredBugs**: A set of bugs in C# and Java collected by Microsoft that we turned into tasks |
|
|
|
|
|
[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) is an RL dataset containing ~720 tasks drawn from the **nl2bash verified** dataset. |
|
|
|
|
|
To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever hit the learner: |
|
|
|
|
|
1. Bad verifiers filter: drop tasks with flaky or excessively slow verifiers. |
|
|
2. Environment stability: remove tasks whose containers take too long to build or tear down. |
|
|
3. Difficulty filter (optional): discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
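The three filters above compose into a simple keep/drop decision per task. A hypothetical sketch (field names and thresholds are invented for illustration, not taken from the actual pipeline):

```python
# Illustrative sketch of the three-stage task filtration pipeline.
# All field names and thresholds here are hypothetical.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    verifier_flaky: bool            # verifier gives inconsistent pass/fail results
    verifier_seconds: float         # wall-clock time to run the verifier
    container_build_seconds: float  # time to build the task's environment
    solved_by_strong_model: bool    # solved by a strong reference model in one pass


def keep_task(task, max_verifier_s=60.0, max_build_s=300.0, require_solvable=True):
    """Return True if a task passes all three filters."""
    # 1. Bad verifiers: drop tasks with flaky or excessively slow verifiers.
    if task.verifier_flaky or task.verifier_seconds > max_verifier_s:
        return False
    # 2. Environment stability: drop tasks whose containers build too slowly.
    if task.container_build_seconds > max_build_s:
        return False
    # 3. Optional difficulty filter: drop tasks no strong model can solve.
    if require_solvable and not task.solved_by_strong_model:
        return False
    return True


tasks = [
    Task("ok", False, 5.0, 40.0, True),
    Task("flaky-verifier", True, 5.0, 40.0, True),
    Task("slow-build", False, 5.0, 900.0, True),
    Task("unsolvable", False, 5.0, 40.0, False),
]
kept = [t.name for t in tasks if keep_task(t)]
print(kept)  # prints ['ok']
```

Filtering before training keeps the learner from wasting rollouts on tasks whose reward signal is noisy (flaky verifiers), slow (heavy environments), or uninformative (unsolvable tasks).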
|
|
|
|
|
|
|
|
# Links |
|
|
- [OpenThoughts-Agent project page](https://www.openthoughts.ai/blog/agent)


- [OpenThoughts-Agent GitHub repository](https://github.com/open-thoughts/OpenThoughts-Agent)


- [OpenThoughts-Agent-v1-SFT dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)


- [OpenThoughts-Agent-v1-RL dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)


- [OpenThoughts-TB-dev dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)


- [OpenThinker-Agent-v1 model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)


- [OpenThinker-Agent-v1-SFT model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT)
|
|
|
|
|
|
|
|
# Citation |
|
|
```bibtex
@misc{openthoughts-agent,
  author       = {Team, OpenThoughts-Agent},
  month        = dec,
  title        = {{OpenThoughts-Agent}},
  howpublished = {https://open-thoughts.ai/agent},
  year         = {2025}
}
```