Add HOTE-8B model card

#1
Files changed (1)
  1. README.md +91 -1
README.md CHANGED
@@ -7,4 +7,94 @@ base_model:
  datasets:
  - rl-research/dr-tulu-sft-data
  - rl-research/dr-tulu-rl-data
- ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - deep-research
+ - agent
+ - reinforcement-learning
+ - tool-use
+ - open-ended-evolution
+ - qwen3
+ model-index:
+ - name: HOTE-8B
+   results:
+   - task:
+       type: text-generation
+       name: Long-form deep research
+     dataset:
+       name: HealthBench
+       type: HealthBench
+     metrics:
+     - type: score
+       value: 54.4
+       name: HealthBench score
+   - task:
+       type: text-generation
+       name: Long-form deep research
+     dataset:
+       name: ResearchQA
+       type: ResearchQA
+     metrics:
+     - type: score
+       value: 76.9
+       name: ResearchQA score
+   - task:
+       type: text-generation
+       name: Long-form deep research
+     dataset:
+       name: DeepResearchBench
+       type: DeepResearchBench
+     metrics:
+     - type: score
+       value: 45.9
+       name: DeepResearchBench score
+ ---
+
+ # HOTE-8B
+
+ HOTE-8B is an 8B-parameter deep research model trained with **Hybrid Open-Ended Tri-Evolution (HOTE)**, a reinforcement-learning framework for open-ended research agents. The model is introduced in [Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher](https://arxiv.org/abs/2606.13710) (arXiv:2606.13710v2, 2026-06-15).
+
+ HOTE trains a deep research system through the co-evolution of three roles:
+
+ - **Solver**: plans, searches, integrates retrieved evidence, and writes long-form research reports with citations.
+ - **Judge**: generates and updates rubrics, evaluates multiple solver responses, and provides rewards beyond deterministic-answer tasks.
+ - **Proposer**: searches for weaknesses identified by the judge and proposes challenging but learnable research tasks.
+
+ The framework uses a dual-mode strategy with both tool-use and no-tool training. According to the paper, this improves training efficiency while allowing the tool-use and no-tool modes to benefit each other.
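
The data flow between the three roles described above can be pictured with a schematic round. This is only an illustrative sketch: the function `evolution_round`, the `Scored` record, and the three callables are hypothetical names invented here, not the paper's implementation, and no policy update, rubric update, or tool use is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scored:
    """A solver response paired with the judge's reward (hypothetical record)."""
    response: str
    reward: float


def evolution_round(
    propose: Callable[[List[str]], List[str]],      # proposer: weaknesses -> tasks
    solve: Callable[[str], str],                    # solver: task -> report
    judge: Callable[[str, List[str]], List[float]], # judge: task + responses -> rewards
    weaknesses: List[str],
    samples_per_task: int = 4,
) -> Dict[str, List[Scored]]:
    """Run one schematic proposer -> solver -> judge pass.

    Returns the scored responses per task. A real training loop would use
    these rewards for a reinforcement-learning update, which is omitted here.
    """
    results: Dict[str, List[Scored]] = {}
    for task in propose(weaknesses):
        # The judge is described as evaluating multiple solver responses,
        # so several responses are sampled for each proposed task.
        responses = [solve(task) for _ in range(samples_per_task)]
        rewards = judge(task, responses)
        results[task] = [Scored(r, w) for r, w in zip(responses, rewards)]
    return results
```

With stub callables in place of real models, `evolution_round` returns one list of `Scored` entries per proposed task, which is enough to see where each role sits in the loop.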
+
+ ## Repository Contents
+
+ This repository contains the following checkpoint folders:
+
+ - `step_700/`: HOTE-8B deep research model checkpoint.
+ - `step_700_query/`: proposer checkpoint used in the HOTE framework.
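
Because the checkpoints live in subfolders rather than at the repository root, loading them with `transformers` likely needs the `subfolder` argument of `from_pretrained`. The sketch below assumes a placeholder repository ID (`your-org/HOTE-8B`, not stated in the card), assumes that tokenizer files sit alongside the weights in each folder, and assumes both checkpoints are causal language models.

```python
from typing import Dict, Tuple

# Hypothetical repository ID: the card does not state the Hub path.
REPO_ID = "your-org/HOTE-8B"

# Folder names taken from the Repository Contents list.
CHECKPOINT_SUBFOLDERS: Dict[str, str] = {
    "solver": "step_700",
    "proposer": "step_700_query",
}


def checkpoint_location(role: str, repo_id: str = REPO_ID) -> Tuple[str, str]:
    """Return (repo_id, subfolder) for a role, or raise on an unknown role."""
    if role not in CHECKPOINT_SUBFOLDERS:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(CHECKPOINT_SUBFOLDERS)}")
    return repo_id, CHECKPOINT_SUBFOLDERS[role]


def load_checkpoint(role: str, repo_id: str = REPO_ID):
    """Load tokenizer and model for a role from its subfolder.

    transformers is imported lazily so checkpoint_location() can be used
    without the library installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo, subfolder = checkpoint_location(role, repo_id)
    # Assumption: tokenizer files are stored in the same subfolder as the weights.
    tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo, subfolder=subfolder)
    return tokenizer, model
```

Calling `load_checkpoint("solver")` would download the `step_700/` weights once `REPO_ID` is replaced with the real repository path.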
+
+ ## Intended Use
+
+ HOTE-8B is intended for research on long-form deep research agents, search-augmented report generation, open-ended agent evolution, and reinforcement learning for non-verifiable tasks.
+
+ The model is most useful when integrated with a search-enabled agent runtime. In the paper, the solver operates with ReAct-style actions including thinking, tool calls, final answers, and citations. The model weights alone do not provide web search, browsing, paper search, citation validation, or tool execution.
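
Since the weights do not execute tools themselves, a runtime has to parse the model's actions and feed tool results back. The following is a minimal sketch of such a ReAct-style loop. The tag names (`<tool_call>`, `<answer>`, `<observation>`) are hypothetical, as the card does not specify the model's action syntax, and `generate` and `search` are caller-supplied stand-ins for the model and a search backend.

```python
import re
from typing import Callable, Optional

# Hypothetical action tags; the real format depends on the model's chat template.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(
    question: str,
    generate: Callable[[str], str],  # model call: transcript so far -> next step
    search: Callable[[str], str],    # tool call: query -> observation text
    max_steps: int = 8,
) -> Optional[str]:
    """Alternate model steps and tool executions until a final answer appears.

    Returns the answer text, or None if max_steps is reached without one.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(transcript)
        transcript += step + "\n"

        answer = ANSWER.search(step)
        if answer:
            return answer.group(1).strip()

        call = TOOL_CALL.search(step)
        if call:
            # The runtime, not the model, executes the tool and appends the result.
            observation = search(call.group(1).strip())
            transcript += f"<observation>{observation}</observation>\n"
    return None


def scripted_generate(transcript: str) -> str:
    """Stand-in for the model: search once, then answer."""
    if "<observation>" not in transcript:
        return "<think>I need sources.</think><tool_call>HOTE deep research</tool_call>"
    return "<answer>Report text [1]</answer>"


def stub_search(query: str) -> str:
    """Stand-in for a search backend."""
    return f"[1] stub result for: {query}"
```

A production runtime would additionally validate citations against the retrieved sources, which this loop does not attempt.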
+
+
+ ## Limitations
+
+ - The model is designed for deep research workflows and should be paired with robust tool execution, citation validation, and source-quality checks.
+ - The model may generate inaccurate, incomplete, outdated, or unsupported claims, especially without retrieval tools.
+ - The paper notes that evolution slows as training progresses and that the upper bound may still be constrained by model scale.
+ - The HOTE method still relies on initial training data; fully data-free open-ended deep research evolution is left for future work.
+ - Research outputs in sensitive domains such as healthcare, law, finance, or public policy should be reviewed by qualified experts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{piao2026hybridopenendedtrievolutionmakes,
+   title = {Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher},
+   author = {Hongming Piao and Chi Liu and Mengzhuo Chen and Yan Shu and Xidong Wang and Derek Li and Ying Wei and Bryan Dai},
+   year = {2026},
+   eprint = {2606.13710},
+   archivePrefix = {arXiv},
+   primaryClass = {cs.AI},
+   url = {https://arxiv.org/abs/2606.13710}
+ }
+ ```