wizardII commited on
Commit
954c68b
·
verified ·
1 Parent(s): 78e4bad

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +13 -1
README.md CHANGED
@@ -18,7 +18,8 @@ license: apache-2.0
18
 
19
  <div align="center">
20
 
21
- [![Notion](https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white)](https://rogue-canopy-54a.notion.site/Asymmetric-Dual-Clipping-Policy-Optimization-Escaping-Local-Optima-Unlocking-the-Full-Potential--2650e4c8c16a8034a5d3dfec358c9021)
 
22
  [![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/wizard-III/Archer2.0)
23
  [![Model](https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/Fate-Zero/Archer2.0-Code-1.5B-Preview)
24
  [![Data](https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/datasets/Fate-Zero/Archer2.0-Code-1.5B)
@@ -26,6 +27,17 @@ license: apache-2.0
26
 
27
  </div>
28
 
 
 
 
 
 
 
 
 
 
 
 
29
  <table>
30
  <thead>
31
  <tr>
 
18
 
19
  <div align="center">
20
 
21
+ [![Notion](https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white)](https://rogue-canopy-54a.notion.site/ASPO-Asymmetric-Importance-Sampling-Policy-Optimization-2650e4c8c16a8034a5d3dfec358c9021)
22
+ [![Paper](https://img.shields.io/badge/Paper-%23000000?style=for-the-badge&logo=arxiv&logoColor=000&labelColor=white)](https://arxiv.org/abs/2510.06062)
23
  [![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/wizard-III/Archer2.0)
24
  [![Model](https://img.shields.io/badge/Model-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/Fate-Zero/Archer2.0-Code-1.5B-Preview)
25
  [![Data](https://img.shields.io/badge/Data-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https://huggingface.co/datasets/Fate-Zero/Archer2.0-Code-1.5B)
 
27
 
28
  </div>
29
 
30
+ ## Overview
31
+
32
+ **Archer2.0** marks a significant evolution from its predecessor through the introduction of **Asymmetric Importance Sampling Policy Optimization (ASPO)**, which is designed to overcome the fundamental limitations of **PPO-Clip**, effectively mitigating issues like **entropy collapse** and **repetitive outputs**, preventing **premature convergence**, and thereby enabling more advanced **reinforcement learning** capabilities.
33
+
34
+ <div align="center">
35
+ <img src="assets/archer2.0.png" width="100%"/>
36
+ </div>
37
+ <br>
38
+
39
+ While our mathematical models are still in training and have not converged, we have evaluated Archer2.0 on the LiveCodeBench v5 and v6 code benchmarks. The results are detailed in the table below.
40
+
41
  <table>
42
  <thead>
43
  <tr>