Gen-Verse/DemyAgent-4B

Improve model card: Add metadata, update paper link and add project page link

by nielsr HF Staff - opened Oct 14, 2025

base: refs/heads/main ← from: refs/pr/1

Files changed (1): README.md (+35 -4)

@@ -1,8 +1,35 @@
-<div align="center">
-<h1>Demystifying Reinforcement Learning in Agentic Reasoning<h1>
-<p align="center">   <a href="https://arxiv.org/abs/2510.11701">     <img src="https://img.shields.io/badge/Paper-Arxiv-red?logo=arxiv&logoColor=red" alt="Paper on arXiv"/>   </a>   <a href="https://github.com/Gen-Verse/Open-AgentRL">     <img src="https://img.shields.io/badge/Open--AgentRL-GitHub-black?logo=github&logoColor=white" alt="Open-AgentRL on GitHub"/>   </a>   <a href="https://huggingface.co/datasets/Gen-Verse/Open-AgentRL-30K">     <img src="https://img.shields.io/badge/30K_RL_Dataset-Hugging%20Face-orange?logo=huggingface&logoColor=yellow" alt="30K RL Dataset"/>   </a>   <a href="https://huggingface.co/Gen-Verse/DemyAgent-4B">     <img src="https://img.shields.io/badge/DemyAgent--4B-Hugging%20Face-FFCC00?logo=huggingface&logoColor=yellow" alt="DemyAgent-4B Model"/>   </a> </p> </div>
 ## 🎯 About This Repository
 This repository contains the **DemyAgent-4B** model weights, a 4B-sized agentic reasoning model that achieves **state-of-the-art performance** on challenging benchmarks including AIME2024/2025, GPQA-Diamond, and LiveCodeBench-v6.
@@ -60,4 +87,8 @@ We evaluate our models on challenging benchmarks spanning mathematics, science,
   journal={arXiv preprint arXiv:2510.11701},
   year={2025}
 }
-```

+---
+pipeline_tag: text-generation
+library_name: transformers
+license: cc-by-nc-4.0
+tags:
+- agentic-reasoning
+- tool-use
+- LLM
+- Qwen
+---
+# Demystifying Reinforcement Learning in Agentic Reasoning
+<div align="center">
+<p align="center">
+  <a href="https://huggingface.co/papers/2510.11701">
+    <img src="https://img.shields.io/badge/Paper-HuggingFace-red?logo=arxiv&logoColor=red" alt="Paper on Hugging Face"/>
+  </a>
+  <a href="https://github.com/Gen-Verse/Open-AgentRL">
+    <img src="https://img.shields.io/badge/Open--AgentRL-GitHub-black?logo=github&logoColor=white" alt="Open-AgentRL on GitHub"/>
+  </a>
+  <a href="https://huggingface.co/datasets/Gen-Verse/Open-AgentRL-30K">
+    <img src="https://img.shields.io/badge/30K_RL_Dataset-Hugging%20Face-orange?logo=huggingface&logoColor=yellow" alt="30K RL Dataset"/>
+  </a>
+  <a href="https://huggingface.co/Gen-Verse/DemyAgent-4B">
+    <img src="https://img.shields.io/badge/DemyAgent--4B-Hugging%20Face-FFCC00?logo=huggingface&logoColor=yellow" alt="DemyAgent-4B Model"/>
+  </a>
+</p>
+</div>
+**Paper**: [Demystifying Reinforcement Learning in Agentic Reasoning](https://huggingface.co/papers/2510.11701)
+**Project Page**: [Open-AgentRL Collection](https://huggingface.co/collections/Gen-Verse/open-agentrl-68eda4c05755ca5a8c663656)
 ## 🎯 About This Repository
 This repository contains the **DemyAgent-4B** model weights, a 4B-sized agentic reasoning model that achieves **state-of-the-art performance** on challenging benchmarks including AIME2024/2025, GPQA-Diamond, and LiveCodeBench-v6.
   journal={arXiv preprint arXiv:2510.11701},
   year={2025}
 }
+```
+## 🙏 Acknowledgements
+This work aims to explore more efficient paradigms for Agentic RL. Our implementation builds upon the excellent codebases of [VeRL](https://github.com/volcengine/verl) and [ReTool](https://github.com/ReTool-RL/ReTool). We sincerely thank these projects for their valuable insights and high-quality implementations, which have greatly facilitated our research.
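
Note on the metadata block this PR adds at the top of README.md: the sketch below shows one way to confirm locally that the block opens the file and carries the expected keys. It is a minimal illustration, not the Hub's actual parser. `parse_front_matter` is a hypothetical stdlib-only helper that handles only flat `key: value` pairs and simple `- item` lists like the ones in this diff, and the `README` string is a shortened copy of the added lines.

```python
def parse_front_matter(readme_text):
    """Return metadata from a leading '---' block, or {} if absent or malformed.

    Hypothetical helper: supports only flat 'key: value' pairs and '- item' lists.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the block only counts when it opens the file
    meta = {}
    current_key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        if line.startswith("- ") and isinstance(meta.get(current_key), list):
            # list item belonging to the most recent key with an empty value
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            # an empty value (e.g. 'tags:') opens a list
            meta[current_key] = value if value else []
    return {}  # no closing delimiter found


# Shortened copy of the README.md head after this PR.
README = """---
pipeline_tag: text-generation
library_name: transformers
license: cc-by-nc-4.0
tags:
- agentic-reasoning
- tool-use
- LLM
- Qwen
---
# Demystifying Reinforcement Learning in Agentic Reasoning
"""

meta = parse_front_matter(README)
print(meta["pipeline_tag"], meta["library_name"], meta["license"])
print(meta["tags"])
```

For real validation a full YAML parser would be the safer choice; the hand-rolled version here only keeps the example dependency-free.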