---
language:
- bg
- en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
- bulgarian
- lora
- peft
- vitosha-gpt-code
- slm
- offline
---

<div align="center">

# 🏔️ Vitosha-GPT-Code

### *The coding assistant that speaks Bulgarian — and runs anywhere.*

[MIT License](https://opensource.org/licenses/MIT) · [Base model: Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) · [Model page](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)

</div>

---

## 🇧🇬 Why This Exists

> *"Every Bulgarian has the right to AI."*

Right now, AI is a luxury. You need fast internet. You need expensive hardware. You need a subscription. If you're in a remote province on a 10-year-old PC, the 21st century is locked behind a paywall.

**That's wrong. And Vitosha-GPT-Code is the answer.**

---

## 🌍 The Problem We're Solving

```
❌ Remote area?      → No cloud access
❌ Old PC / low RAM? → Runs too slow
❌ No fiber optic?   → Can't stream tokens
❌ No credit card?   → Locked out of ChatGPT

✅ Vitosha-GPT-Code  → Works offline. 4GB RAM. Free. Forever.
```

A kid in a remote Bulgarian province deserves the **same coding tools** as a developer in Sofia. Whether it's building a website for the family business or learning to program for the first time — **hardware should never be a barrier to entry**.

---

## 🛠️ What It Does

Vitosha-GPT-Code is a **Bulgarian-first coding assistant** — it writes code, explains concepts, and answers technical questions **in Bulgarian by default**.

| Capability | Example |
|---|---|
| 🐍 Write Python functions | "Напиши функция за проверка на просто число" |
| 🌐 Build web projects | "Направи уебсайт за малък бизнес" |
| 🔁 Multi-turn coding chat | Remembers context across follow-up questions |
| 📖 Explain algorithms | "Обясни как работи binary search" |
| 🗃️ SQL queries | "Извлечи всички потребители от таблица users" |
| 🔧 Debug code | Spot errors and suggest fixes in Bulgarian |

### 🗣️ Real Example Output

```
Потребител: Напиши функция на Python, която проверява дали число е просто.

Витоша: Ето една проста функция на Python, която проверява дали даденото
число е просто:

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

Тази функция работи така: ако числото е по-малко или равно на 1, не е
просто. За всички останали — проверяваме всеки делител до корена на
числото...
```

✅ **Correct code. Bulgarian explanation. Zero internet. Zero cost.**
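The function in the transcript can be sanity-checked locally. This quick brute-force comparison (our test harness, not model output) confirms the generated code is correct:

```python
# The model's is_prime, copied verbatim from the transcript above.
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Brute-force reference: try every possible divisor.
def is_prime_naive(n):
    return n > 1 and all(n % d != 0 for d in range(2, n))

# The two implementations agree on the first 200 integers.
assert all(is_prime(n) == is_prime_naive(n) for n in range(200))
print([n for n in range(20) if is_prime(n)])  # → [2, 3, 5, 7, 11, 13, 17, 19]
```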

---

## ⚡ Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "kyleparrratt/Vitosha-GPT-Code"

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter, is_trainable=False)

messages = [
    {"role": "system", "content": "Ти си полезен асистент за програмиране. Отговаряш на български."},
    {"role": "user", "content": "Напиши функция на Python за проверка на просто число и обясни на български."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

---

## 🔬 How It Was Built

This isn't just a wrapper with a Bulgarian flag slapped on it: it's a LoRA fine-tune of a strong open code model, purpose-trained to think and respond in Bulgarian.

| Component | Detail |
|---|---|
| 🧠 **Base Model** | Qwen2.5-Coder-7B-Instruct — one of the strongest open code models available |
| 📚 **Training Data** | Bulgarian coding examples translated from evol-codealpaca-v1 (400 samples in V0.1; 5,000 planned for V0.2) |
| 🌍 **Translation** | OPUS-MT (opus-mt-tc-big-en-bg) — a dedicated EN→BG model, GPU-accelerated |
| ⚙️ **Fine-tuning** | LoRA (r=16) with Unsloth for parameter-efficient training |
| 🚫 **No tricks** | No prompt poisoning, no phrase-stuffing — clean Bulgarian training targets |
| 🔒 **Privacy** | Designed for 100% local inference — your data never leaves your machine |
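To give a sense of why LoRA makes this affordable: instead of updating a full weight matrix W of shape (d_out, d_in), LoRA trains two low-rank factors, so each adapted matrix adds only r·(d_in + d_out) trainable parameters. A back-of-envelope sketch (the hidden size 3584 matches Qwen2.5-7B, but which projections the adapter targets is an assumption, not this adapter's exact config):

```python
# LoRA replaces a full update of W (d_out x d_in) with two low-rank
# factors A (r x d_in) and B (d_out x r), so only r * (d_in + d_out)
# parameters are trained per adapted matrix.
def lora_params(d_in, d_out, r=16):
    return r * (d_in + d_out)

d = 3584                          # hidden size of Qwen2.5-7B
full = d * d                      # full-rank update: ~12.8M params per matrix
low_rank = lora_params(d, d)      # LoRA r=16: ~114.7K params per matrix
print(f"full: {full:,}  lora: {low_rank:,}  ratio: {full // low_rank}x")
```

At r=16 each square projection trains roughly 112× fewer parameters than full fine-tuning, which is what lets a 7B model be adapted on a single consumer GPU.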

---

## 🗺️ Roadmap

```
[✅] V0.1 — LoRA adapter: Bulgarian code explanations & generation
[ ] V0.2 — 5,000-sample OPUS-translated training run + re-train
[ ] V0.3 — GGUF export: run with llama.cpp on 4GB RAM
[ ] V0.4 — Windows installer for offline use with no tech knowledge
[ ] V1.0 — Full offline Bulgarian coding assistant for every Bulgarian
```
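For the V0.3 GGUF goal, a rough size estimate shows why quantization is the lever. The parameter count (~7.6B for Qwen2.5-7B) and the effective bits-per-weight of common llama.cpp quant types are ballpark assumptions, not measured numbers:

```python
# Back-of-envelope model file size: parameters * bits-per-weight / 8.
PARAMS = 7.6e9   # approximate parameter count of Qwen2.5-7B
GIB = 1024**3

# Effective bits-per-weight are approximate for llama.cpp quant types.
for name, bits in [("f16", 16), ("q8_0", 8.5), ("q4_k_m", 4.8), ("q2_k", 2.6)]:
    size_gib = PARAMS * bits / 8 / GIB
    print(f"{name:7s} ~{size_gib:.1f} GiB")
```

Full-precision weights need ~14 GiB; 4-bit quants land around 4 GiB, and the lower-bit quants are what make running on a 4GB-RAM machine plausible (at some quality cost).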

---

## ⚠️ Current Limitations

- The model occasionally slips into English on complex explanations. Adding *"Отговори на български."* ("Answer in Bulgarian.") to the user prompt keeps it consistent.
- Code identifiers, API names, and variable names remain in English (as they should).
- V0.1 is trained on 400 samples; V0.2 will use 5,000.

---

## 🏔️ The Name

**Vitosha** is the mountain that watches over Sofia — visible from the capital, unchanging, accessible to everyone. It doesn't care if you're a professor or a student, if you have a new laptop or an old one. You can walk up Vitosha for free.

That's what this model is.

---

## 📄 License

MIT. Free to use, modify, and deploy. Kept free on purpose.

---

<div align="center">

**Built solo. Kept free. For every Bulgarian. 🇧🇬**

*"От Витоша, за всички."* ("From Vitosha, for everyone.")

</div>