---
license: mit
base_model:
- Qwen/Qwen2.5-3B
---

A mathematical reasoning model trained from Qwen2.5-3B on the GSM8K and MATH training sets using [DERL](https://arxiv.org/abs/2512.13399).
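A minimal sketch of how a checkpoint like this might be queried with the Hugging Face `transformers` library. The card does not state the repository ID of the trained model or the prompt format used in training, so `MODEL_ID` (set here to the base model named in the metadata) and the `build_prompt` template are assumptions to replace with the real values.

```python
# Hypothetical placeholder: the card does not give this checkpoint's repo ID,
# so the base model named in the metadata is used as a stand-in.
MODEL_ID = "Qwen/Qwen2.5-3B"


def build_prompt(question: str) -> str:
    """Wrap a math question in a plain step-by-step prompt (assumed format)."""
    return f"Question: {question.strip()}\nLet's think step by step.\nAnswer:"


def generate_answer(question: str, model_id: str = MODEL_ID,
                    max_new_tokens: int = 256) -> str:
    """Return a greedy completion for one question."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 7 + 5?")` downloads the weights on first use and requires `transformers` and PyTorch to be installed.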