yjyjyj98 committed on May 22, 2025

Commit 48163d7 (verified) · 1 Parent(s): 186cea5

Update README.md

Files changed (1): README.md

@@ -9,30 +9,16 @@ pinned: false
 <!-- Banner -------------------------------------------------------------- -->
 <p align="center">
-  <b>Fine-grain evaluation &amp; RL baselines for large language models that <i>think</i>.</b><br/>
-  ConditionedMath (AIME &amp; MATH500) · Model zoo · Training scripts · Zero-shot pipelines
-</p>
-<p align="center">
-  <a href="https://github.com/your-org/ContradictMath/actions">
-    <img alt="CI" src="https://github.com/your-org/ContradictMath/actions/workflows/ci.yml/badge.svg"/>
-  </a>
-  <a href="https://pypi.org/project/contradictmath">
-    <img alt="PyPI" src="https://img.shields.io/pypi/v/contradictmath.svg"/>
-  </a>
-  <a href="LICENSE">
-    <img alt="License" src="https://img.shields.io/github/license/your-org/ContradictMath.svg"/>
-  </a>
 </p>
 ---
 ## 📜 Why ReasoningTrap?
-> Current RL-tuned LLMs excel at *producing* answers, but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
-* **Dual tracks** – AIME-style short answers and MATH500 long-form proofs.
-* **RL objective zoo** – GRPO, PRM, Absolute-Zero Reasoner, Eurus-PRIME, ThinkPRM &amp; more.
-* **G-score metric** – geometric mean of correctness, brevity &amp; justification quality.
 * **Plug-and-play** – evaluate any 🤗 Transformers, vLLM or OpenAI-style chat model in two lines.

 <!-- Banner -------------------------------------------------------------- -->
 <p align="center">
+  <b>Fine-grained evaluation of Large Reasoning Models that <i>fail in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
+  ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Training scripts · Zero-shot pipelines
 </p>
 ---
 ## 📜 Why ReasoningTrap?
+> Current RL-tuned Reasoning LLMs excel at *producing* answers, but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
+* **Modified from famous math reasoning benchmarks** – AIME &amp; MATH500 long-form proofs.
 * **Plug-and-play** – evaluate any 🤗 Transformers, vLLM or OpenAI-style chat model in two lines.