Commit 186cea5 (verified) · committed by yjyjyj98 · 1 parent: f1a9ec4

Update README.md

Files changed (1)
  1. README.md +29 -1
README.md CHANGED
@@ -7,4 +7,32 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ <!-- Banner -------------------------------------------------------------- -->
+ <p align="center">
+ <b>Fine-grained evaluation &amp; RL baselines for large language models that <i>think</i>.</b><br/>
+ ConditionedMath (AIME &amp; MATH500) · Model zoo · Training scripts · Zero-shot pipelines
+ </p>
+ <p align="center">
+ <a href="https://github.com/your-org/ContradictMath/actions">
+ <img alt="CI" src="https://github.com/your-org/ContradictMath/actions/workflows/ci.yml/badge.svg"/>
+ </a>
+ <a href="https://pypi.org/project/contradictmath">
+ <img alt="PyPI" src="https://img.shields.io/pypi/v/contradictmath.svg"/>
+ </a>
+ <a href="LICENSE">
+ <img alt="License" src="https://img.shields.io/github/license/your-org/ContradictMath.svg"/>
+ </a>
+ </p>
+
+ ---
+
+ ## 📜 Why ReasoningTrap?
+
+ > Current RL-tuned LLMs excel at *producing* answers, but often ignore explicit user constraints.
+ > **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
+
+ * **Dual tracks** – AIME-style short answers and MATH500 long-form proofs.
+ * **RL objective zoo** – GRPO, PRM, Absolute-Zero Reasoner, Eurus-PRIME, ThinkPRM &amp; more.
+ * **G-score metric** – geometric mean of correctness, brevity &amp; justification quality.
+ * **Plug-and-play** – evaluate any 🤗 Transformers, vLLM or OpenAI-style chat model in two lines.
+
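The G-score bullet above describes a geometric mean of three components. The README does not say how each component is scaled or weighted, so the following is only a minimal sketch, assuming every component is already normalized to [0, 1] and weighted equally. The function name `g_score` and its parameter names are hypothetical and not taken from the project.

```python
import math


def g_score(correctness: float, brevity: float, justification: float) -> float:
    """Geometric mean of three component scores.

    Assumption (not stated in the README): each score lies in [0, 1]
    and all three carry equal weight.
    """
    components = (correctness, brevity, justification)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component score out of range [0, 1]: {value}")
    # A zero in any component drives the whole score to zero.
    if any(value == 0.0 for value in components):
        return 0.0
    # Average in log space, then exponentiate, to get the geometric mean.
    log_sum = sum(math.log(value) for value in components)
    return math.exp(log_sum / len(components))
```

Under these assumptions, `g_score(1.0, 0.25, 0.5)` comes out at about 0.5 (the cube root of 0.125), below the arithmetic mean of roughly 0.58. That gap is the usual reason to prefer a geometric mean here: one weak component pulls the combined score down more than it would in a plain average.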