kyleparrratt
/

Vitosha-GPT-Code

+---
+language:
+  - bg
+  - en
+license: mit
+base_model: Qwen/Qwen2.5-Coder-7B-Instruct
+tags:
+  - code
+  - bulgarian
+  - lora
+  - peft
+  - qwen2.5
+---
+# Qwen2.5-Coder-7B-Instruct – Bulgarian coding LoRA
+LoRA adapter that makes [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) respond in **Bulgarian** for coding questions (with code and explanations). Trained on 400 OPUS-translated Bulgarian coding examples (evol-codealpaca-v1), no prompt poisoning.
+## Usage
+Load base model + this adapter with PEFT:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+import torch
+base = "Qwen/Qwen2.5-Coder-7B-Instruct"
+adapter = "YOUR_USERNAME/qwen25coder-7b-bg-lora"  # or local path
+tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    base,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+model = PeftModel.from_pretrained(model, adapter, is_trainable=False)
+messages = [
+    {"role": "system", "content": "You are a helpful coding assistant. Answer in Bulgarian when asked."},
+    {"role": "user", "content": "Отговори на български. Напиши функция на Python за проверка на просто число."},
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
+print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+```
+## Training
+- **Base:** Qwen2.5-Coder-7B-Instruct
+- **Data:** 400 samples from evol-codealpaca-v1, prompts/completions translated to Bulgarian with OPUS (Helsinki-NLP/opus-mt-en-bg), no boost phrase
+- **Adapter:** LoRA r=16, 60 steps from a prior Bulgarian adapter, saved with Unsloth
+- **Inference:** Use plain `transformers` + PEFT (Unsloth inference can hit RoPE shape errors with this adapter)
+## Limitations
+- Some explanations may slip into English on rare occasions.
+- Prefer “Отговори на български.” in the user message for consistent Bulgarian.
+- Small dataset (400 samples); more data would likely improve coverage and style.
+## License
+MIT (same as base model). Adapter weights and this card are provided as-is.