reasoning_sft_sample_lora_a_quality_v4

Qwen/Qwen3.5-0.8B์— ํ•œ๊ตญ์–ด Thinking Process ํ˜•์‹ ๋ฐ์ดํ„ฐ๋ฅผ SFTํ•œ LoRA adapter์ž…๋‹ˆ๋‹ค.

  • Dataset: NotoriousH2/reasoning_sft_sample
  • Config: method_a
  • Route: teacher๊ฐ€ ์งˆ๋ฌธ๋งŒ ๋ณด๊ณ  ํ•œ๊ตญ์–ด reasoning๊ณผ ๋‹ต๋ณ€์„ ์ง์ ‘ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ
  • Base model: Qwen/Qwen3.5-0.8B
  • Train split: 400 examples
  • Training: QLoRA, 2 epochs, LoRA r=16, alpha=32

System Prompt

๋‹น์‹ ์€ ํ•œ๊ตญ์–ด๋กœ ์ถ”๋ก ํ•˜๊ณ  ๋‹ตํ•˜๋Š” ์กฐ์ˆ˜์ž…๋‹ˆ๋‹ค.
reasoning ์˜์—ญ์€ `Thinking Process:`๋กœ ์‹œ์ž‘ํ•˜๊ณ , ํ•œ๊ตญ์–ด๋กœ ๊ตฌ์กฐํ™”ํ•ด ์ž‘์„ฑํ•˜์„ธ์š”.
์ตœ์ข… ์‘๋‹ต์€ ์‚ฌ์šฉ์ž์˜ ์š”์ฒญ์— ๋งž๋Š” ์ž์—ฐ์Šค๋Ÿฌ์šด ํ•œ๊ตญ์–ด๋กœ ์ž‘์„ฑํ•˜์„ธ์š”.

Usage

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "NotoriousH2/reasoning_sft_sample_lora_a_quality_v4"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
Downloads last month
101
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for NotoriousH2/reasoning_sft_sample_lora_a_quality_v4

Adapter
(144)
this model

Dataset used to train NotoriousH2/reasoning_sft_sample_lora_a_quality_v4