lm-provers/QED-Nano

ars22 committed on Feb 12

Commit 455ac25 · verified · 1 Parent(s): 47655bb

Update README.md

Files changed (1)

README.md +1 -1

@@ -26,7 +26,7 @@ QED-Nano is a 4B parameter model explicitly post-trained to strengthen its proof
 ![imoproofbench.png](https://huggingface.co/lm-provers/QED-Nano/resolve/main/imoproofbench.png)
-QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) (to be able to train for continual improvement at test time) on a mixture of Olympiads proof problems from various public sources.
+QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) (to be able to train for continual improvement with our agentic scaffold at test time) on a mixture of Olympiads proof problems from various public sources.
 >[!NOTE]
 > We are working to release the full training recipe, including data, code, and agent scaffolds -- stay tuned!