kyleparrratt committed on
Commit cfb448d · verified · 1 Parent(s): c10a879

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +64 -3
README.md CHANGED
@@ -1,3 +1,64 @@
- ---
- license: mit
- ---
---
language:
- bg
- en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
- bulgarian
- lora
- peft
- qwen2.5
---

# Qwen2.5-Coder-7B-Instruct – Bulgarian coding LoRA

A LoRA adapter that makes [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) respond in **Bulgarian** to coding questions, with both code and explanations. Trained on 400 Bulgarian coding examples from evol-codealpaca-v1, machine-translated with OPUS; no prompt poisoning was used.
## Usage

Load the base model plus this adapter with PEFT:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "YOUR_USERNAME/qwen25coder-7b-bg-lora"  # or a local path

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter, is_trainable=False)

# The user turn asks (in Bulgarian): "Answer in Bulgarian. Write a Python
# function that checks whether a number is prime."
messages = [
    {"role": "system", "content": "You are a helpful coding assistant. Answer in Bulgarian when asked."},
    {"role": "user", "content": "Отговори на български. Напиши функция на Python за проверка на просто число."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## Training

- **Base:** Qwen2.5-Coder-7B-Instruct
- **Data:** 400 samples from evol-codealpaca-v1, prompts and completions translated to Bulgarian with OPUS (Helsinki-NLP/opus-mt-en-bg); no boost phrase
- **Adapter:** LoRA r=16, trained for 60 steps starting from a prior Bulgarian adapter, saved with Unsloth
- **Inference:** Use plain `transformers` + PEFT (Unsloth inference can hit RoPE shape errors with this adapter)

## Limitations

- Explanations may occasionally slip into English.
- For consistently Bulgarian answers, prefer prepending "Отговори на български." ("Answer in Bulgarian.") to the user message.
- Small dataset (400 samples); more data would likely improve coverage and style.

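To apply the language tip consistently, a small helper (hypothetical, not part of this repo) can prepend the instruction to every user turn before calling `apply_chat_template`:

```python
# Hypothetical helper, not shipped with this adapter: ensures the user turn
# starts with "Отговори на български." ("Answer in Bulgarian.").
BG_INSTRUCTION = "Отговори на български."

def bulgarian_messages(
    question: str,
    system: str = "You are a helpful coding assistant. Answer in Bulgarian when asked.",
) -> list:
    # Prepend the instruction only if the question does not already carry it.
    user = question if question.startswith(BG_INSTRUCTION) else f"{BG_INSTRUCTION} {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

msgs = bulgarian_messages("Напиши функция на Python за проверка на просто число.")
print(msgs[1]["content"])
```

The returned list drops straight into `tokenizer.apply_chat_template(...)` from the Usage section.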
## License

MIT for the adapter; the base model is distributed under its own license. Adapter weights and this card are provided as-is.