---
license: mit
datasets:
- GAIR/LIMO
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- R1
- DeepSeek
- Distill
- Qwen
- 7B
- LIMO
---

# LIMO-R1-Distill-Qwen-7B

Uses [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as the base model, fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).

Trained using LLaMA-Factory with the following config:

```python
max_seq_length = 6 * 1024

lora_rank = 128
lora_alpha = lora_rank
lora_target = "all"

args = dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
    dataset="limo_restructured",
    template="custom_template",
    finetuning_type="lora",
    lora_target=lora_target,
    output_dir="qwen_distill_7b_lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    logging_steps=1,
    warmup_ratio=0.05,
    learning_rate=1e-4,
    num_train_epochs=1.0,
    max_grad_norm=0.25,
    loraplus_lr_ratio=16.0,
    fp16=True,
    report_to="none",
    preprocessing_num_workers=16,
    cutoff_len=max_seq_length,
    optim="paged_adamw_8bit",
)
```

System prompt used:

```
'Please reason step by step inside the <think> and </think> tags, and put your final answer within \\boxed{}.'
```

Custom template used in training:

```python
register_template(
    name="custom_template",
    format_user=StringFormatter(
        slots=["<|User|>{{content}}<|Assistant|>"]
    ),
    format_assistant=StringFormatter(
        slots=["{{content}}<|end▁of▁sentence|>"]
    ),
    format_system=StringFormatter(
        slots=["<|begin▁of▁sentence|>{{content}}"]
    ),
    format_function=FunctionFormatter(
        slots=[
            "<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>{{type}}<|tool▁sep|>{{name}}\n```json\n{{arguments}}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>"
        ],
        tool_format="qwen"
    ),
    format_observation=StringFormatter(
        slots=[
            "<|tool▁outputs▁begin|><|tool▁output▁begin|>{{content}}<|tool▁output▁end|><|tool▁outputs▁end|>"
        ]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    default_system="Please reason step by step inside the tags <think> and </think>, and put your final answer within \\boxed{}.",
    stop_words=["<|end▁of▁sentence|>"]
)
```
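
To make the slot layout concrete, here is a minimal sketch (hypothetical; LLaMA-Factory performs the real assembly internally) of how a single-turn prompt is concatenated under this template:

```python
# Sketch of a single-turn prompt under the template above: the system,
# user, and assistant slots are plain string concatenations of special tokens.
system = ("Please reason step by step inside the tags <think> and </think>, "
          "and put your final answer within \\boxed{}.")
question = "What is 2 + 2?"
answer = "<think>Okay, 2 + 2 = 4.</think>\n\\boxed{4}"

prompt = (
    "<|begin▁of▁sentence|>" + system            # format_system slot
    + "<|User|>" + question + "<|Assistant|>"   # format_user slot
)
target = answer + "<|end▁of▁sentence|>"         # format_assistant slot

print(prompt + target)
```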

Every entry in the dataset opens its reasoning with `<think>` and closes it with `</think>`.
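
A quick sanity check for that invariant (a sketch with made-up sample entries, not the actual dataset) could look like:

```python
# Sketch: verify each reasoning trace opens with <think> and closes it
# with </think> before the final boxed answer. Sample entries are made up.
entries = [
    "<think>Okay, let me work through this step by step.</think>\n\\boxed{4}",
    "<think>Hmm, I should factor the polynomial first.</think>\n\\boxed{x-1}",
]

for text in entries:
    assert text.startswith("<think>"), "entry must open with <think>"
    assert "</think>" in text, "reasoning must close with </think>"
```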

For variation, I randomly replaced the leading "Okay," in each reasoning trace with one of the following:

```python
starts = [
    "Alright,",
    "Well,",
    "So,",
    "Hmm,",
    "Okay then,",
    "Right,",
    "Let's see,",
    "Now,",
    "Alrighty,",
    "Thinking about it,",
    "You know,",
    "Well then,",
    "Come to think of it,",
    "Actually,",
    "Now that I think about it,",
    "Good question,",
    "Let me think,",
    "Let's see now,",
    "Interesting,",
    "Now then,",
]
```