---
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
- yentinglin/s1K-1.1-trl-format
- simplescaling/s1K-1.1
language:
- en
metrics:
- accuracy
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
tags:
- reasoning
model-index:
- name: yentinglin/Mistral-Small-24B-Instruct-2501-reasoning
  results:
  - task:
      type: text-generation
    dataset:
      name: MATH-500
      type: MATH
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.95
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: AIME
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.5333
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: AIME 2024
      type: AIME
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.6667
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: GPQA Diamond
      type: GPQA
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.62022
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
---
# Mistral-Small-Reasoning

This model is a fine-tuned version of [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501), optimized for mathematical reasoning tasks. It was fine-tuned on [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) and [s1K-1.1](https://huggingface.co/datasets/simplescaling/s1K-1.1) to enhance its reasoning capabilities.

## Model Details

### Model Description

- **Developed by:** [Yenting Lin](https://www.linkedin.com/in/yen-ting-lin-416732b3/)
- **Funded by:** [Ubitus](https://ubitus.net)
- **Model type:** Instruction-tuned language model for reasoning
- **Language(s) (NLP):** English (en)
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

## How to Get Started with the Model

A demo is available at [twllm.com](https://twllm.com/models/yentinglin/mistral-sft), and inference can be run with vLLM or SGLang.

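As one way to run inference, you can start an OpenAI-compatible server with vLLM (e.g. `vllm serve yentinglin/Mistral-Small-24B-Instruct-2501-reasoning`) and query it over HTTP. The sketch below is illustrative, not an official client: the endpoint URL and sampling parameters are assumptions, and only the Python standard library is used.

```python
import json
from urllib import request

MODEL_ID = "yentinglin/Mistral-Small-24B-Instruct-2501-reasoning"

def build_chat_request(prompt: str, temperature: float = 0.6,
                       max_tokens: int = 8192) -> dict:
    # OpenAI-compatible chat-completions payload; the sampling values are
    # illustrative defaults, not settings recommended by this model card.
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def query(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    # POST to the server's /chat/completions endpoint and return the reply text.
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = request.Request(f"{base_url}/chat/completions", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(query("What is the sum of the first 100 positive integers?"))
```

SGLang exposes the same OpenAI-compatible API, so the client code works unchanged against either backend.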
## Training Details

The model was trained on **4×8 H100 GPUs** provided by [**Ubitus**](https://ubitus.net).

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See Training config</summary>

axolotl version: [`a98526ef7843a3e8aa006f260e6b4fb8912b5f1a`](https://github.com/axolotl-ai-cloud/axolotl/tree/a98526ef7843a3e8aa006f260e6b4fb8912b5f1a)

```yaml
base_model: mistralai/Mistral-Small-24B-Instruct-2501

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: yentinglin/s1K-1.1-trl-format
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: open-r1/OpenR1-Math-220k
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: from
    message_field_content: value
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/

sequence_len: 32768
sample_packing: true
eval_sample_packing: False
pad_to_sequence_len: true

wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json
special_tokens:
  pad_token: "<pad>"
```

</details><br>

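With 32 GPUs (4×8 H100), `micro_batch_size: 1`, and `gradient_accumulation_steps: 4`, the config above implies the following effective global batch size (a quick sanity check, assuming pure data parallelism under ZeRO-3):

```python
def effective_batch_size(num_gpus: int, micro_batch_size: int,
                         grad_accum_steps: int) -> int:
    # Global batch = data-parallel ranks x per-device micro batch x accumulation steps.
    return num_gpus * micro_batch_size * grad_accum_steps

# 4 nodes x 8 H100s, micro batch 1, accumulation 4 (values from the config above)
print(effective_batch_size(32, 1, 4))  # 128
```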
## Evaluation

The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 25 dataset to the full set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).

Our results below are averaged over multiple runs. See our evaluation details [here](https://huggingface.co/datasets/yentinglin/zhtw-reasoning-details-_fsx_ubuntu_yentinglin_ckpt_run_20250214_1600_checkpoint-800_).

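The scores reported here are pass@1, i.e. the fraction of problems solved on the first sampled answer, averaged over runs. For reference, a minimal sketch of the standard unbiased pass@k estimator (from the HumanEval paper; assumed, not confirmed, to match the exact averaging used here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator of pass@k: 1 - C(n-c, k) / C(n, k), the probability
    # that at least one of k samples drawn from n total (c correct) passes.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 4 samples per problem, 2 correct: pass@1 = 0.5
print(pass_at_k(4, 2, 1))  # 0.5
```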
| Model (pass@1)                   | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
|----------------------------------|----------|----------|-----------|-----------|--------------|
| **Mistral-24B-Reasoning (Ours)** | 24B      | 95.0     | 53.33     | 66.67     | 62.02        |
| Mistral-24B-Instruct             | 24B      | 70.6     | -         | -         | 45.3         |
| s1.1-32B                         | 32B      | 93.2     | 40.0      | 56.7      | 61.62        |
| LIMO                             | 32B      | 94.8     | 36.67     | 57.1      | 59.09        |
| DeepSeek-R1-Distill-Llama-70B    | 70B      | 94.5     | 46.67     | 70.0      | 65.2         |
| DeepSeek-R1-Distill-Qwen-32B     | 32B      | 94.3     | 60.0      | 72.6      | 62.1         |
| DeepSeek-R1                      | 671B     | 97.3     | 70.0      | 72.6      | 71.5         |
| o1                               | -        | 96.4     | 79.0      | -         | 75.7         |
| o3-mini (high)                   | -        | 97.9     | 86.5      | -         | 77.2         |
| o3-mini (medium)                 | -        | 97.3     | 76.5      | -         | 74.9         |

## Citation

If you use this model, please cite:

```bib
@article{yentinglin2025_mistral_reasoning,
  author  = {Yenting Lin},
  title   = {Mistral-Small-24B-Instruct-2501-reasoning},
  journal = {Hugging Face},
  year    = {2025},
  url     = {https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}
}
```

## Disclaimer

This model is provided "as-is" and without warranties of any kind. Users are solely responsible for evaluating the accuracy and suitability of its outputs. The developers assume no liability for any direct or indirect damages arising from its use.
The model is strictly not intended for high-risk applications such as medical diagnosis, legal advice, or financial investment. For such use cases, please consult qualified professionals.

本模型「如是」(as-is) 提供,使用者須自行評估結果之正確性與適用性。開發者對於使用本模型所引發之任何直接或間接損失,不承擔任何法律責任。
嚴禁用於醫療診斷、法律諮詢、金融投資等高風險場景;若有相關需求,請尋求專業人員協助。