Improve language tag

#3
by lbourdois - opened
Files changed (1)
  1. README.md +92 -80
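This PR only edits the `language` list in the README's YAML front matter: the single code `en` becomes thirteen three-letter ISO 639-3 codes. You can compare the declared languages on each side of the diff below without extra dependencies by parsing that list directly. The following is a minimal sketch. `front_matter_languages` is a hypothetical helper, not part of any Hugging Face library, and it assumes the flat `- item` layout this README uses. For real model cards, a proper YAML parser or the `huggingface_hub` model-card utilities are the safer choice.

```python
def front_matter_languages(readme: str) -> list[str]:
    """Return the items of the top-level `language:` list in a README's YAML front matter.

    Hypothetical helper: a line-based sketch that only understands the flat
    block-list layout used in this README, not general YAML.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at all
    languages: list[str] = []
    in_language = False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter of the front matter
        if line.startswith("language:"):
            in_language = True
            continue
        stripped = line.strip()
        if in_language and stripped.startswith("- "):
            languages.append(stripped[2:].strip())
        else:
            # Any non-item line (e.g. `base_model:`) ends, or stays outside, the language list.
            in_language = False
    return languages


# Front matter of the README before this change, copied from the diff.
OLD_FRONT_MATTER = """---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
---
"""

print(front_matter_languages(OLD_FRONT_MATTER))  # ['en']
```

Applied to the new front matter, the same helper would return the thirteen codes from `zho` through `ara`. The `base_model` entry is not included because the list ends at the next key.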
README.md CHANGED
@@ -1,80 +1,92 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- ---
- # GAIR/DeepResearcher-7b
-
- ## Introduction
-
- DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers.
-
- ## Model Details
-
- - **License:** Apache 2.0
- - **Model type:** Reinforcement learning-based LLM (Large Language Model).
- - **Language(s):** The model is designed for tasks in English.
- - **Finetuned from model:** The model is built using the Qwen2.5-7B-Instruct architecture .
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
- ### Model Sources
-
- - **Repository:** [DeepResearcher GitHub](https://github.com/GAIR-NLP/DeepResearcher) .
- - **Paper:** [DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments](https://arxiv.org/abs/2504.03160)
-
-
- ## How to Get Started with the Model
-
- To get started, you can visit the [DeepResearcher repository](https://github.com/GAIR-NLP/DeepResearcher) on GitHub, where the model's code and setup instructions are provided .
-
- ## Training Details
-
- ### Training Data
-
- The model was trained on open-domain question-answering datasets, including:
- - **NaturalQuestions (NQ)**
- - **TriviaQA (TQ)**
- - **HotpotQA**
- - **2Wiki MultiHopQA**
-
- ### Training Procedure
-
- DeepResearcher was trained using reinforcement learning (RL) with the Group Relative Policy Optimization (GRPO) algorithm. It was tested in both in-domain (NQ, TQ, HotpotQA) and out-of-domain (Musique, Bamboogle, PopQA) settings .
-
- ## Evaluation
-
- ### Testing Data
-
- The model was evaluated on several datasets, including:
- - **NQ (Natural Questions)**
- - **TQ (TriviaQA)**
- - **HotpotQA**
- - **2Wiki**
- - **Musique**
- - **Bamboogle**
- - **PopQA** .
-
-
- ### Results
-
- DeepResearcher outperforms all baseline models, achieving a substantial improvement in task completion across the datasets, particularly in out-of-domain scenarios.
-
-
- ## Citation
- ```
- @misc{zheng2025deepresearcherscalingdeepresearch,
- title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments},
- author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
- year={2025},
- eprint={2504.03160},
- archivePrefix={arXiv},
- primaryClass={cs.AI},
- url={https://arxiv.org/abs/2504.03160},
- }
- ```
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ ---
+ # GAIR/DeepResearcher-7b
+
+ ## Introduction
+
+ DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers.
+
+ ## Model Details
+
+ - **License:** Apache 2.0
+ - **Model type:** Reinforcement learning-based LLM (Large Language Model).
+ - **Language(s):** The model is designed for tasks in English.
+ - **Finetuned from model:** The model is built using the Qwen2.5-7B-Instruct architecture .
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+ ### Model Sources
+
+ - **Repository:** [DeepResearcher GitHub](https://github.com/GAIR-NLP/DeepResearcher) .
+ - **Paper:** [DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments](https://arxiv.org/abs/2504.03160)
+
+
+ ## How to Get Started with the Model
+
+ To get started, you can visit the [DeepResearcher repository](https://github.com/GAIR-NLP/DeepResearcher) on GitHub, where the model's code and setup instructions are provided .
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was trained on open-domain question-answering datasets, including:
+ - **NaturalQuestions (NQ)**
+ - **TriviaQA (TQ)**
+ - **HotpotQA**
+ - **2Wiki MultiHopQA**
+
+ ### Training Procedure
+
+ DeepResearcher was trained using reinforcement learning (RL) with the Group Relative Policy Optimization (GRPO) algorithm. It was tested in both in-domain (NQ, TQ, HotpotQA) and out-of-domain (Musique, Bamboogle, PopQA) settings .
+
+ ## Evaluation
+
+ ### Testing Data
+
+ The model was evaluated on several datasets, including:
+ - **NQ (Natural Questions)**
+ - **TQ (TriviaQA)**
+ - **HotpotQA**
+ - **2Wiki**
+ - **Musique**
+ - **Bamboogle**
+ - **PopQA** .
+
+
+ ### Results
+
+ DeepResearcher outperforms all baseline models, achieving a substantial improvement in task completion across the datasets, particularly in out-of-domain scenarios.
+
+
+ ## Citation
+ ```
+ @misc{zheng2025deepresearcherscalingdeepresearch,
+ title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments},
+ author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
+ year={2025},
+ eprint={2504.03160},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2504.03160},
+ }
+ ```
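The README's Training Procedure section names Group Relative Policy Optimization (GRPO) but does not explain it. GRPO, as it is commonly formulated, works as follows. It samples a group of responses for the same question and scores each one. It then uses each reward's deviation from the group mean, scaled by the group's standard deviation, as that response's advantage, so no learned value model is needed.

The sketch below illustrates only that advantage step and is not the DeepResearcher training code. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are illustrative assumptions. The clipped policy-ratio objective and KL penalty of full GRPO are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: GRPO-style advantages for one group of sampled responses.

    Each response's reward is compared with the other responses to the same
    prompt: A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts for one question, rewarded 1.0 if correct and 0.0 otherwise
# (a binary reward is an assumption made for this illustration only).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # roughly [1.0, -1.0, -1.0, 1.0]
```

When every rollout in a group receives the same reward, the standard deviation is zero and all advantages are zero, so that group contributes no learning signal. The `eps` term only keeps the division defined in that case.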