yjyjyj98 committed · Commit 48163d7 · verified · 1 parent: 186cea5

Update README.md

Files changed (1): README.md (+4 −18)
README.md CHANGED

@@ -9,30 +9,16 @@ pinned: false
 
 <!-- Banner -------------------------------------------------------------- -->
 <p align="center">
-  <b>Fine-grain evaluation &amp; RL baselines for large language models that <i>think</i>.</b><br/>
-  ConditionedMath (AIME &amp; MATH500) · Model zoo · Training scripts · Zero-shot pipelines
-</p>
-<p align="center">
-  <a href="https://github.com/your-org/ContradictMath/actions">
-    <img alt="CI" src="https://github.com/your-org/ContradictMath/actions/workflows/ci.yml/badge.svg"/>
-  </a>
-  <a href="https://pypi.org/project/contradictmath">
-    <img alt="PyPI" src="https://img.shields.io/pypi/v/contradictmath.svg"/>
-  </a>
-  <a href="LICENSE">
-    <img alt="License" src="https://img.shields.io/github/license/your-org/ContradictMath.svg"/>
-  </a>
+  <b>Fine-grained evaluation &amp; large reasoning models that <i>fail</i> due to <i>reasoning rigidity</i>.</b><br/>
+  ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Training scripts · Zero-shot pipelines
 </p>
 
 ---
 
 ## 📜 Why ReasoningTrap?
 
-> Current RL-tuned LLMs excel at *producing* answers, but often ignore explicit user constraints.
+> Current RL-tuned reasoning LLMs excel at *producing* answers, but often ignore explicit user constraints.
 > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
-
-* **Dual tracks** – AIME-style short answers and MATH500 long-form proofs.
-* **RL objective zoo** – GRPO, PRM, Absolute-Zero Reasoner, Eurus-PRIME, ThinkPRM &amp; more.
-* **G-score metric** – geometric mean of correctness, brevity &amp; justification quality.
+* **Modified from well-known math reasoning benchmarks** – AIME-style short answers & MATH500 long-form proofs.
 * **Plug-and-play** – evaluate any 🤗 Transformers, vLLM or OpenAI-style chat model in two lines.
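
The removed **G-score** bullet describes the metric only as the geometric mean of correctness, brevity & justification quality. A minimal sketch of that definition (function and argument names are assumed, not taken from the repo) would be:

```python
import math


def g_score(correctness: float, brevity: float, justification: float) -> float:
    """Geometric mean of the three component scores, each assumed in [0, 1].

    The geometric mean rewards balanced performance: a zero on any
    component drives the whole score to zero.
    """
    return (correctness * brevity * justification) ** (1.0 / 3.0)


# Perfect correctness and justification, weaker brevity (0.8^3 = 0.512).
print(round(g_score(1.0, 0.512, 1.0), 3))  # → 0.8
```

Because the mean is multiplicative, an answer that is correct but entirely unjustified (justification = 0) scores 0 overall, which matches the metric's stated intent of jointly valuing all three qualities.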