---
license: apache-2.0
datasets:
- beomi/KoAlpaca-RealQA
language:
- ko
base_model:
- Qwen/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
---
# Model Description

This model was fine-tuned from Qwen/Qwen2.5-Coder-1.5B-Instruct with QLoRA (4-bit quantization + PEFT).

The training data was beomi/KoAlpaca-RealQA.

Because QLoRA was applied to a small base model, the outputs are still not great, but there was a clear difference between the answers of the QLoRA-tuned model and those of the original model.
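For reference, a minimal sketch of how the training data could be loaded and formatted into Qwen's ChatML template is shown below. The split and column names (`question`, `answer`) are assumptions about the KoAlpaca-RealQA schema, not taken from the original training script.

```python
from datasets import load_dataset

# Load the Korean instruction dataset used for fine-tuning
dataset = load_dataset("beomi/KoAlpaca-RealQA", split="train")

def to_chatml(example):
    # NOTE: "question"/"answer" column names are assumptions; adjust to the actual schema.
    text = (
        "<|im_start|>user\n" + example["question"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["answer"] + "<|im_end|>\n"
    )
    return {"text": text}

train_data = dataset.map(to_chatml)
```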
# Quantization Configuration
```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and fp16 compute (QLoRA setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```
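A minimal sketch of how this config is typically applied when loading the base model for QLoRA training; `prepare_model_for_kbit_training` is the standard PEFT helper and is an assumption about the training setup, not confirmed from the original script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import prepare_model_for_kbit_training

base_model_id = "Qwen/Qwen2.5-Coder-1.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,  # the BitsAndBytesConfig defined above
    device_map="auto",
)

# Casts norms/embeddings to fp32 and prepares the 4-bit model for gradient updates
model = prepare_model_for_kbit_training(model)
```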
# LoRA Configuration
```python
from peft import LoraConfig

# Low-rank adapter configuration for causal-LM fine-tuning
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn", "q_proj", "v_proj"]
)
```
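Wrapping the quantized base model with these adapters would look roughly like the following sketch (the exact training script is not included in this card):

```python
from peft import get_peft_model

# Attach LoRA adapters to the 4-bit base model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters are actually trained
```

Note that `c_attn` is a GPT-2-style module name; Qwen2 attention layers expose `q_proj`, `k_proj`, `v_proj`, and `o_proj`, so only `q_proj` and `v_proj` are expected to pick up adapters here.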
# Training Arguments
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,
    logging_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
```
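These arguments would typically be passed to a `Trainer` together with the tokenized train/eval splits; a rough sketch follows (the dataset variable names are placeholders, not taken from the original training script):

```python
from transformers import Trainer, DataCollatorForLanguageModeling

trainer = Trainer(
    model=model,                    # the PEFT-wrapped 4-bit model from above
    args=training_args,
    train_dataset=tokenized_train,  # placeholder: tokenized KoAlpaca-RealQA train split
    eval_dataset=tokenized_eval,    # placeholder: held-out split used for eval_loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```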
# Training Progress

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 300  | 1.595000      | 1.611501        |
| 600  | 1.593300      | 1.596210        |
| 900  | 1.577600      | 1.586121        |
| 1200 | 1.564600      | 1.577804        |
| ...  | ...           | ...             |
| 7200 | 1.499700      | 1.525933        |
| 7500 | 1.493400      | 1.525612        |
| 7800 | 1.491000      | 1.525330        |
| 8100 | 1.499900      | 1.525138        |
# Inference Code
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization config (must match the QLoRA settings used during fine-tuning)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer and model (local or hub path)
model_path = "onebeans/Qwen2.5-Coder-KoInstruct-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto"
)
model.eval()

# Build the prompt in ChatML format (Qwen-style)
def build_chatml_prompt(question: str) -> str:
    # System prompt: "You are a helpful Korean assistant."
    system_msg = "<|im_start|>system\n당신은 유용한 한국어 도우미입니다.<|im_end|>\n"
    user_msg = f"<|im_start|>user\n{question}<|im_end|>\n"
    return system_msg + user_msg + "<|im_start|>assistant\n"

# Run inference
def generate_response(question: str, max_new_tokens: int = 128) -> str:
    prompt = build_chatml_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding, so sampling parameters (top_p, temperature) are not needed
            eos_token_id=tokenizer.eos_token_id,
        )

    # Return only the newly generated tokens, without the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example: "Where is the capital of Korea?"
# The base model (Qwen/Qwen2.5-Coder-1.5B-Instruct) answers "한국의 수도는 서울입니다." (The capital of Korea is Seoul.)
question = "한국의 수도는 어디인가요?"
response = generate_response(question)
print("Model response:\n", response)
```
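As an alternative to assembling the ChatML string by hand, the Qwen2.5 tokenizer ships with a chat template, so the same prompt can be built with `apply_chat_template`; a minimal sketch:

```python
# Equivalent prompt construction using the tokenizer's built-in chat template
messages = [
    {"role": "system", "content": "당신은 유용한 한국어 도우미입니다."},
    {"role": "user", "content": "한국의 수도는 어디인가요?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the "<|im_start|>assistant\n" turn
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```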
# Environment

Windows 10

NVIDIA GeForce RTX 4070 Ti
# Framework Versions

Python: 3.10.14

PyTorch: 1.12.1

Transformers: 4.46.2

Datasets: 3.2.0

Tokenizers: 0.20.3

PEFT: 0.8.2