ars22 committed on
Commit
455ac25
·
verified ·
1 Parent(s): 47655bb

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ QED-Nano is a 4B parameter model explicitly post-trained to strengthen its proof
 
  ![imoproofbench.png](https://huggingface.co/lm-provers/QED-Nano/resolve/main/imoproofbench.png)
 
- QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) (to be able to train for continual improvement at test time) on a mixture of Olympiads proof problems from various public sources.
+ QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) (to be able to train for continual improvement with our agentic scaffold at test time) on a mixture of Olympiads proof problems from various public sources.
 
  >[!NOTE]
  > We are working to release the full training recipe, including data, code, and agent scaffolds -- stay tuned!