---
license: mit
datasets:
- GAIR/LIMO
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- R1
- DeepSeek
- Distill
- Qwen
- 7B
- LIMO
---

# LIMO-R1-Distill-Qwen-7B

Uses [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as the base model, fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).

Trained using LLaMA-Factory with the following config:

```python
max_seq_length = 6 * 1024

lora_rank = 128
lora_alpha = lora_rank
lora_target = "all"

args = dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
    dataset="limo_restructured",
    template="custom_template",
    finetuning_type="lora",
    lora_target=lora_target,
    output_dir="qwen_distill_7b_lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    logging_steps=1,
    warmup_ratio=0.05,
    learning_rate=1e-4,
    num_train_epochs=1.0,
    max_grad_norm=0.25,
    loraplus_lr_ratio=16.0,
    fp16=True,
    report_to="none",
    preprocessing_num_workers=16,
    cutoff_len=max_seq_length,
    optim="paged_adamw_8bit",
)
```

System prompt used:

```
'Please reason step by step inside the <think> and </think> tags, and put your final answer within \\boxed{}.'
```

Custom template used in training:

```python
register_template(
    name="custom_template",
    format_user=StringFormatter(
        slots=["<|User|>{{content}}<|Assistant|>"]
    ),
    format_assistant=StringFormatter(
        slots=["{{content}}<|end▁of▁sentence|>"]
    ),
    format_system=StringFormatter(
        slots=["<|begin▁of▁sentence|>{{content}}"]
    ),
    format_function=FunctionFormatter(
        slots=[
            "<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>{{type}}<|tool▁sep|>{{name}}\n```json\n{{arguments}}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>"
        ],
        tool_format="qwen"
    ),
    format_observation=StringFormatter(
        slots=[
            "<|tool▁outputs▁begin|><|tool▁output▁begin|>{{content}}<|tool▁output▁end|><|tool▁outputs▁end|>"
        ]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    default_system="Please reason step by step inside the tags <think> and </think>, and put your final answer within \\boxed{}.",
    stop_words=["<|end▁of▁sentence|>"]
)
```
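
To make the slot layout concrete, here is a minimal sketch (hypothetical; LLaMA-Factory performs the real assembly internally) of how a single-turn prompt is concatenated under this template:

```python
# Sketch of a single-turn prompt under the template above: the system,
# user, and assistant slots are plain string concatenations of special tokens.
system = ("Please reason step by step inside the tags <think> and </think>, "
          "and put your final answer within \\boxed{}.")
question = "What is 2 + 2?"
answer = "<think>Okay, 2 + 2 = 4.</think>\n\\boxed{4}"

prompt = (
    "<|begin▁of▁sentence|>" + system            # format_system slot
    + "<|User|>" + question + "<|Assistant|>"   # format_user slot
)
target = answer + "<|end▁of▁sentence|>"         # format_assistant slot

print(prompt + target)
```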

Every entry in the dataset opens its reasoning with `<think>` and closes it with `</think>`.
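
A quick sanity check for that invariant (a sketch with made-up sample entries, not the actual dataset) could look like:

```python
# Sketch: verify each reasoning trace opens with <think> and closes it
# with </think> before the final boxed answer. Sample entries are made up.
entries = [
    "<think>Okay, let me work through this step by step.</think>\n\\boxed{4}",
    "<think>Hmm, I should factor the polynomial first.</think>\n\\boxed{x-1}",
]

for text in entries:
    assert text.startswith("<think>"), "entry must open with <think>"
    assert "</think>" in text, "reasoning must close with </think>"
```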

For variation, I randomly replaced the leading "Okay," in each reasoning trace with one of the following:

```python
starts = [
    "Alright,",
    "Well,",
    "So,",
    "Hmm,",
    "Okay then,",
    "Right,",
    "Let's see,",
    "Now,",
    "Alrighty,",
    "Thinking about it,",
    "You know,",
    "Well then,",
    "Come to think of it,",
    "Actually,",
    "Now that I think about it,",
    "Good question,",
    "Let me think,",
    "Let's see now,",
    "Interesting,",
    "Now then,",
]
```