Update README.md

--- a/README.md
+++ b/README.md
@@ -9,30 +9,16 @@ pinned: false
 
 <!-- Banner -------------------------------------------------------------- -->
 <p align="center">
-<b>Fine-grain evaluation &
-ConditionedMath (AIME & MATH500) ·
-</p>
-<p align="center">
-<a href="https://github.com/your-org/ContradictMath/actions">
-<img alt="CI" src="https://github.com/your-org/ContradictMath/actions/workflows/ci.yml/badge.svg"/>
-</a>
-<a href="https://pypi.org/project/contradictmath">
-<img alt="PyPI" src="https://img.shields.io/pypi/v/contradictmath.svg"/>
-</a>
-<a href="LICENSE">
-<img alt="License" src="https://img.shields.io/github/license/your-org/ContradictMath.svg"/>
-</a>
+<b>Fine-grain evaluation & Large Reasoning Models that <i>fails in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
+ConditionedMath (AIME & MATH500) · PuzzleTrivial · Training scripts · Zero-shot pipelines
 </p>
 
 ---
 
 ## 📜 Why ReasoningTrap?
 
-> Current RL-tuned LLMs excel at *producing* answers, but often ignore explicit user constraints.
+> Current RL-tuned Reasoning LLMs excel at *producing* answers, but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
-
-* **Dual tracks** – AIME-style short answers and MATH500 long-form proofs.
-* **RL objective zoo** – GRPO, PRM, Absolute-Zero Reasoner, Eurus-PRIME, ThinkPRM & more.
-* **G-score metric** – geometric mean of correctness, brevity & justification quality.
+* **Modified from Famous MATH Reasoning Benchmark** – AIME & MATH500 long-form proofs.
 * **Plug-and-play** – evaluate any 🤗 Transformers, vLLM or OpenAI-style chat model in two lines.
 