EtashGuha commited on
Commit
23efeaa
·
verified ·
1 Parent(s): 0e6079d

Add model card

Browse files
Files changed (1) hide show
  1. README.md +70 -0
README.md ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model:
3
+ - Qwen/Qwen3-32B
4
+ datasets:
5
+ - open-thoughts/OpenThoughts-Agent-SFT-100K
6
+ library_name: transformers
7
+ license: apache-2.0
8
+ pipeline_tag: text-generation
9
+ tags:
10
+ - agents
11
+ - terminal
12
+ - code
13
+ - software-engineering
14
+ ---
15
+
16
+ <p align="center">
17
+ <a href="https://www.openthoughts.ai/blog/agent" style="margin-right: 24px;">Project</a> |
18
+ <a href="https://github.com/open-thoughts/OpenThoughts-Agent" style="margin-right: 24px; margin-left: 24px;">Code</a> |
19
+ <a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-SFT-100K" style="margin-left: 24px;">Training data</a> |
20
+ <a href="https://huggingface.co/collections/open-thoughts/openthinker-agent" style="margin-left: 24px;">Collection</a>
21
+ </p>
22
+
23
+ # OpenThinkerAgent-32B
24
+
25
+ **OpenThoughts-Agent** is an open effort to curate the best data for training agentic
26
+ language models. **OpenThinkerAgent-32B** is the 32B model from the SFT scaling ladder, fine-tuned
27
+ from [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) on the **100,000-example**
28
+ [OpenThoughts-Agent-SFT-100K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-SFT-100K)
29
+ dataset (Top-4 task sources, GLM-4.7-AWQ teacher in the terminus-2 harness, ≥5-turn filter).
30
+
31
+ ## Performance
32
+
33
+ Evaluated in the **terminus-2** harness (pass@1, 3 stochastic re-runs):
34
+
35
+ | Benchmark | Accuracy |
36
+ | --- | --- |
37
+ | SWE-Bench-Verified-100 | 55.7 |
38
+ | OpenThoughts-TBLite | 41.3 |
39
+ | Terminal-Bench 2.0 | 26.2 |
40
+
41
+ ### Full benchmark suite (OpenThinkerAgent-32B, best harness)
42
+
43
+ | Benchmark | Acc |
44
+ | --- | --- |
45
+ | SWE-Bench-Verified | 54.0 |
46
+ | Terminal-Bench 2.0 | 26.2 |
47
+ | Aider-Polyglot | 32.4 |
48
+ | BFCL-Parity | 85.9 |
49
+ | MedAgentBench | 47.8 |
50
+ | GAIA-127 | 23.6 |
51
+ | FinanceAgent-Terminal | 44.0 |
52
+ | **Average (7)** | **44.8** |
53
+
54
+ This is the best open-data 32B model on the average of seven agentic benchmarks.
55
+
56
+ ## Links
57
+ - 🌐 [Project](https://www.openthoughts.ai/blog/agent)
58
+ - 💻 [Code](https://github.com/open-thoughts/OpenThoughts-Agent)
59
+ - 🧠 [Training dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-SFT-100K)
60
+ - 📚 [Collection](https://huggingface.co/collections/open-thoughts/openthinker-agent)
61
+
62
+ ## Citation
63
+ ```
64
+ @misc{openthoughts-agent,
65
+ author = {Team, OpenThoughts-Agent},
66
+ title = {OpenThoughts-Agent: Data Recipes for Agentic Models},
67
+ howpublished = {https://www.openthoughts.ai/blog/agent},
68
+ year = {2026}
69
+ }
70
+ ```