Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ QED-Nano is a 4B parameter model explicitly post-trained to strengthen its proof
 
 
 
-QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) (to be able to train for continual improvement at test time) on a mixture of Olympiads proof problems from various public sources.
+QED-Nano is based on [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), and was post-trained via a combination of supervised fine-tuning and [reinforcement learning with a reasoning cache](https://huggingface.co/papers/2602.03773) (to be able to train for continual improvement with our agentic scaffold at test time) on a mixture of Olympiads proof problems from various public sources.
 
 >[!NOTE]
 > We are working to release the full training recipe, including data, code, and agent scaffolds -- stay tuned!