# Qwen2.5-7B Construction Equipment Summarization LoRA (Korean) - Best
LoRA adapter for summarizing construction equipment maintenance and repair claims in Korean (SFT v2, the final recommended model).
## Model Description
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Training Method: SFT v2 (Supervised Fine-Tuning)
- Domain: Construction equipment maintenance & repair claim summarization
- Task: Korean claim report → Korean summary
- Framework: Transformers + PEFT (a representative LoRA configuration is sketched below)
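This card does not publish the training hyperparameters. The sketch below shows what a typical PEFT LoRA setup for Qwen2.5-7B-Instruct looks like; every value (rank, alpha, dropout, target modules) is an assumption for illustration, not the configuration actually used to train this adapter.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative only: the card does not publish the real hyperparameters,
# so r / lora_alpha / dropout / target_modules below are common defaults
# for Qwen2.5-class models, not the values used to train this adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights train
```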
## Performance (summarization_test.json, 100 samples)
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Qwen Base | 32.94 | 14.57 | 32.94 |
| Qwen SFT v2 (this) | 34.64 | 14.66 | 34.39 |
| Qwen DPO v2 | 33.20 | 12.07 | 33.20 |
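A minimal sketch of how such scores can be reproduced, assuming the Hugging Face `evaluate` package and its default tokenization; the card does not state which Korean tokenization was used, so absolute values may differ.

```python
import evaluate

# Minimal evaluation sketch. Assumptions: the `evaluate` "rouge" metric with
# its default tokenization; the placeholder strings stand in for the
# 100-sample test set (summarization_test.json), which is not shown here.
rouge = evaluate.load("rouge")
predictions = ["model-generated summary for sample 1"]
references = ["gold reference summary for sample 1"]
scores = rouge.compute(predictions=predictions, references=references)
# `evaluate` reports fractions in [0, 1]; the tables in this card use values x 100.
print({k: round(v * 100, 2) for k, v in scores.items()})
```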
## Comparison with GPT-OSS-20B
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Qwen SFT v2 (7B, this) | 34.64 | 14.66 | 34.39 |
| GPT-OSS SFT v2 (20B) | 34.41 | 14.74 | 34.16 |
The two adapters perform nearly identically (within 0.25 ROUGE-1), even though Qwen2.5-7B is roughly one third the size of GPT-OSS-20B.
## Key Findings
- Qwen Base already has strong Korean summarization ability (ROUGE-1 32.94)
- SFT provides a modest improvement (+1.70 ROUGE-1 over the base model)
- DPO hurts performance (-1.44 ROUGE-1 vs. SFT)
- 100% of samples produce valid Korean summaries (one possible validity check is sketched below)
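The card does not define what counts as a "valid Korean summary". One plausible heuristic, shown purely as an assumption, is to require a minimum proportion of Hangul characters in the output:

```python
import re

# Hypothetical validity heuristic; the card does not specify its criterion.
def looks_korean(text: str, min_hangul_ratio: float = 0.3) -> bool:
    hangul = re.findall(r"[가-힣]", text)
    visible = [c for c in text if not c.isspace()]
    return bool(visible) and len(hangul) / len(visible) >= min_hangul_ratio

print(looks_korean("유압 호스를 교체했습니다."))       # True
print(looks_korean("Replaced the hydraulic hose."))  # False
```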
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "madokalif/qwen2.5-7b-construction-summarization-ko-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# System prompt: "Read the construction equipment claim report and summarize
# the key points concisely in Korean." The user message contains a truncated
# example report ("Symptom: coolant is leaking from the coolant hose...").
messages = [
    {"role": "system", "content": "건설장비 클레임 보고서를 읽고 핵심 내용을 간결하게 한국어로 요약하세요."},
    {"role": "user", "content": "다음 클레임 보고서를 요약하세요:\n\n현상: 냉각수 호스에서 냉각수가 새고 있습니다..."},
]

# Build the chat prompt, generate, and decode only the newly generated tokens.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.3, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
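For deployment without a runtime PEFT dependency, the adapter can optionally be folded into the base weights with PEFT's standard `merge_and_unload()`. This continues from the example above; the output directory is illustrative, not an official artifact.

```python
# Optional: merge the LoRA weights into the base model and save a standalone
# checkpoint that loads with plain Transformers (no PEFT required).
merged = model.merge_and_unload()
merged.save_pretrained("qwen2.5-7b-construction-ko-merged")
tokenizer.save_pretrained("qwen2.5-7b-construction-ko-merged")
```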