---
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
- yentinglin/s1K-1.1-trl-format
- simplescaling/s1K-1.1
language:
- en
metrics:
- accuracy
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
tags:
- reasoning
model-index:
- name: yentinglin/Mistral-Small-24B-Instruct-2501-reasoning
  results:
  - task:
      type: text-generation
    dataset:
      name: MATH-500
      type: MATH
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.95
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: AIME
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.5333
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: AIME 2024
      type: AIME
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.6667
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: GPQA Diamond
      type: GPQA
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.62022
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
---
# Mistral-Small-Reasoning

This model is a fine-tuned version of [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501), optimized for mathematical reasoning tasks. It was fine-tuned on [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) and [s1K-1.1](https://huggingface.co/datasets/simplescaling/s1K-1.1) to enhance its reasoning capabilities.

## Model Details

### Model Description

- **Developed by:** [Yenting Lin](https://www.linkedin.com/in/yen-ting-lin-416732b3/)
- **Funded by:** [Ubitus](https://ubitus.net)
- **Model type:** Instruction-tuned language model for reasoning
- **Language(s) (NLP):** English (en)
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

## How to Get Started with the Model

A demo is available at [twllm.com](https://twllm.com/models/yentinglin/mistral-sft), and inference can be run with vLLM or SGLang.

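As one way to run inference, you can start an OpenAI-compatible server with vLLM (e.g. `vllm serve yentinglin/Mistral-Small-24B-Instruct-2501-reasoning`) and query it over HTTP. The sketch below is illustrative, not an official client: the endpoint URL and sampling parameters are assumptions, and only the Python standard library is used.

```python
import json
from urllib import request

MODEL_ID = "yentinglin/Mistral-Small-24B-Instruct-2501-reasoning"

def build_chat_request(prompt: str, temperature: float = 0.6,
                       max_tokens: int = 8192) -> dict:
    # OpenAI-compatible chat-completions payload; the sampling values are
    # illustrative defaults, not settings recommended by this model card.
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def query(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    # POST to the server's /chat/completions endpoint and return the reply text.
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = request.Request(f"{base_url}/chat/completions", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(query("What is the sum of the first 100 positive integers?"))
```

SGLang exposes the same OpenAI-compatible API, so the client code works unchanged against either backend.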
## Training Details

The model was trained on **4×8 H100 GPUs** provided by [**Ubitus**](https://ubitus.net).

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See Training config</summary>

axolotl version: [`a98526ef7843a3e8aa006f260e6b4fb8912b5f1a`](https://github.com/axolotl-ai-cloud/axolotl/tree/a98526ef7843a3e8aa006f260e6b4fb8912b5f1a)

```yaml
base_model: mistralai/Mistral-Small-24B-Instruct-2501

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: yentinglin/s1K-1.1-trl-format
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: open-r1/OpenR1-Math-220k
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: from
    message_field_content: value
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/

sequence_len: 32768
sample_packing: true
eval_sample_packing: False
pad_to_sequence_len: true

wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json
special_tokens:
  pad_token: "<pad>"
```

</details><br>

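With 32 GPUs (4×8 H100), `micro_batch_size: 1`, and `gradient_accumulation_steps: 4`, the config above implies the following effective global batch size (a quick sanity check, assuming pure data parallelism under ZeRO-3):

```python
def effective_batch_size(num_gpus: int, micro_batch_size: int,
                         grad_accum_steps: int) -> int:
    # Global batch = data-parallel ranks x per-device micro batch x accumulation steps.
    return num_gpus * micro_batch_size * grad_accum_steps

# 4 nodes x 8 H100s, micro batch 1, accumulation 4 (values from the config above)
print(effective_batch_size(32, 1, 4))  # 128
```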
## Evaluation

The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 25 dataset to the full set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).

Our results below are averaged over multiple runs. See our evaluation details [here](https://huggingface.co/datasets/yentinglin/zhtw-reasoning-details-_fsx_ubuntu_yentinglin_ckpt_run_20250214_1600_checkpoint-800_).

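The scores reported here are pass@1, i.e. the fraction of problems solved on the first sampled answer, averaged over runs. For reference, a minimal sketch of the standard unbiased pass@k estimator (from the HumanEval paper; assumed, not confirmed, to match the exact averaging used here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator of pass@k: 1 - C(n-c, k) / C(n, k), the probability
    # that at least one of k samples drawn from n total (c correct) passes.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 4 samples per problem, 2 correct: pass@1 = 0.5
print(pass_at_k(4, 2, 1))  # 0.5
```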
| Model (pass@1)                   | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
|----------------------------------|----------|----------|-----------|-----------|--------------|
| **Mistral-24B-Reasoning (Ours)** | 24B      | 95.0     | 53.33     | 66.67     | 62.02        |
| Mistral-24B-Instruct             | 24B      | 70.6     | -         | -         | 45.3         |
| s1.1-32B                         | 32B      | 93.2     | 40.0      | 56.7      | 61.62        |
| LIMO                             | 32B      | 94.8     | 36.67     | 57.1      | 59.09        |
| DeepSeek-R1-Distill-Llama-70B    | 70B      | 94.5     | 46.67     | 70.0      | 65.2         |
| DeepSeek-R1-Distill-Qwen-32B     | 32B      | 94.3     | 60.0      | 72.6      | 62.1         |
| DeepSeek-R1                      | 671B     | 97.3     | 70.0      | 72.6      | 71.5         |
| o1                               | -        | 96.4     | 79.0      | -         | 75.7         |
| o3-mini (high)                   | -        | 97.9     | 86.5      | -         | 77.2         |
| o3-mini (medium)                 | -        | 97.3     | 76.5      | -         | 74.9         |

## Citation

If you use this model, please cite:

```bib
@article{yentinglin2025_mistral_reasoning,
  author  = {Yenting Lin},
  title   = {Mistral-Small-24B-Instruct-2501-reasoning},
  journal = {Hugging Face},
  year    = {2025},
  url     = {https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}
}
```

## Disclaimer

This model is provided "as-is" and without warranties of any kind. Users are solely responsible for evaluating the accuracy and suitability of its outputs. The developers assume no liability for any direct or indirect damages arising from its use.
The model is strictly not intended for high-risk applications such as medical diagnosis, legal advice, or financial investment. For such use cases, please consult qualified professionals.

本模型「如是」(as-is) 提供,使用者須自行評估結果之正確性與適用性。開發者對於使用本模型所引發之任何直接或間接損失,不承擔任何法律責任。
嚴禁用於醫療診斷、法律諮詢、金融投資等高風險場景;若有相關需求,請尋求專業人員協助。