---
language:
- bg
- en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
- bulgarian
- lora
- peft
- vitosha-gpt-code
- slm
- offline
---
# 🏔️ Vitosha-GPT-Code

### *The coding assistant that speaks Bulgarian — and runs anywhere.*

![Vitosha-GPT-Code Banner](vitosha-banner.png)

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Base Model](https://img.shields.io/badge/Base-Qwen2.5--Coder--7B-blue)](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
[![Language](https://img.shields.io/badge/Language-Bulgarian%20🇧🇬-red)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
[![Offline](https://img.shields.io/badge/Runs-100%25%20Offline-orange)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
[![RAM](https://img.shields.io/badge/Target-4GB%20RAM-purple)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
[![Status](https://img.shields.io/badge/Status-V0.1%20in%20development-yellow)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
---

## 🇧🇬 Why This Exists

> *"Every Bulgarian has the right to AI."*

Right now, AI is a luxury. You need fast internet. You need expensive hardware. You need a subscription. If you're in a remote province on a 10-year-old PC, the 21st century is locked behind a paywall.

**That's wrong. And Vitosha-GPT-Code is the answer.**

---

## 🌍 The Problem We're Solving

```
❌ Remote area?       → No cloud access
❌ Old PC / low RAM?  → Runs too slow
❌ No fiber optic?    → Can't stream tokens
❌ No credit card?    → Locked out of ChatGPT

✅ Vitosha-GPT-Code   → Works offline. 4GB RAM. Free. Forever.
```

A kid in a remote Bulgarian province deserves the **same coding tools** as a developer in Sofia. Whether it's building a website for the family business or learning to program for the first time — **hardware should never be a barrier to entry**.

---

## 🛠️ What It Does

Vitosha-GPT-Code is a **Bulgarian-first coding assistant** — it writes code, explains concepts, and answers technical questions **in Bulgarian by default**.

| Capability | Example |
|---|---|
| 🐍 Write Python functions | "Напиши функция за проверка на просто число" |
| 🌐 Build web projects | "Направи уебсайт за малък бизнес" |
| 🔁 Multi-turn coding chat | Remembers context across follow-up questions |
| 📖 Explain algorithms | "Обясни как работи binary search" |
| 🗃️ SQL queries | "Извлечи всички потребители от таблица users" |
| 🔧 Debug code | Spots errors and suggests fixes in Bulgarian |

### 🗣️ Real Example Output

```
Потребител: Напиши функция на Python, която проверява дали число е просто.

Витоша: Ето една проста функция на Python, която проверява дали даденото число е просто:

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

Тази функция работи така: ако числото е по-малко или равно на 1, не е просто. За всички останали — проверяваме всеки делител до корена на числото...
```

✅ **Correct code. Bulgarian explanation. Zero internet.
Zero cost.**

---

## ⚡ Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "kyleparrratt/Vitosha-GPT-Code"

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter, is_trainable=False)

messages = [
    {"role": "system", "content": "Ти си полезен асистент за програмиране. Отговаряш на български."},
    {"role": "user", "content": "Напиши функция на Python за проверка на просто число и обясни на български."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

---

## 🔬 How It Was Built

This isn't just a wrapper with a Bulgarian flag slapped on it. It's purpose-trained from the ground up to think and respond in Bulgarian.
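The LoRA fine-tuning listed below trains two small low-rank factors per weight matrix instead of the full matrix. A minimal numpy sketch of what r=16 means in practice — the 3584 hidden size is an assumption for illustration (Qwen2.5-7B's), and the real training uses peft/Unsloth, not this code:

```python
import numpy as np

# LoRA sketch: instead of updating the full d_out x d_in weight matrix W,
# train two small factors B (d_out x r) and A (r x d_in): delta_W = B @ A.
d_in = d_out = 3584          # assumed hidden size, for scale only
r = 16                       # the LoRA rank used by this adapter

A = np.random.randn(r, d_in) * 0.01   # A starts small and random
B = np.zeros((d_out, r))              # B starts at zero, so delta_W starts at zero

delta_W = B @ A                        # an update of rank at most 16

full = d_in * d_out
lora = A.size + B.size
print(f"full matrix: {full:,} params, LoRA factors: {lora:,} params "
      f"({lora / full:.2%} of full)")
```

For one such matrix the trainable factors come to under 1% of the full parameter count, which is why a 7B model can be fine-tuned on modest hardware.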
| Component | Detail |
|---|---|
| 🧠 **Base Model** | Qwen2.5-Coder-7B-Instruct — one of the strongest open code models available |
| 📚 **Training Data** | 5,000 Bulgarian coding examples from evol-codealpaca-v1 (V0.1 uses a 400-sample subset; see Limitations) |
| 🌍 **Translation** | OPUS-MT (opus-mt-tc-big-en-bg) — a dedicated EN→BG model, GPU-accelerated |
| ⚙️ **Fine-tuning** | LoRA (r=16) with Unsloth — parameter-efficient training |
| 🚫 **No tricks** | No prompt poisoning, no phrase-stuffing — clean Bulgarian training targets |
| 🔒 **Privacy** | Designed for 100% local inference — your data never leaves your machine |

---

## 🗺️ Roadmap

```
[✅] V0.1 — LoRA adapter: Bulgarian code explanations & generation
[ ] V0.2 — 5,000-sample OPUS-translated training run + re-train
[ ] V0.3 — GGUF export: run with llama.cpp on 4GB RAM
[ ] V0.4 — Windows installer for offline use with no tech knowledge
[ ] V1.0 — Full offline Bulgarian coding assistant for every Bulgarian
```

---

## ⚠️ Current Limitations

- It occasionally slips into English on complex explanations. Add *"Отговори на български."* to the user prompt to stay consistent.
- Code identifiers, API names, and variable names remain in English (as they should).
- V0.1 is trained on 400 samples; V0.2 will use 5,000.

---

## 🏔️ The Name

**Vitosha** is the mountain that watches over Sofia — visible from the capital, unchanging, accessible to everyone. It doesn't care if you're a professor or a student, if you have a new laptop or an old one. You can walk up Vitosha for free. That's what this model is.

---

## 📄 License

MIT. Free to use, modify, and deploy. Kept free on purpose.

---
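On the 4GB RAM target in the roadmap: back-of-envelope arithmetic shows why a 4-bit GGUF quantization of a 7B model fits where fp16 cannot. A rough sketch that ignores KV cache and runtime overhead — an estimate, not a measurement:

```python
# Rough memory footprint of a 7B-parameter model at different precisions.
params = 7_000_000_000

def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30

fp16_bytes = params * 2.0    # 16-bit weights: 2 bytes per parameter
q4_bytes   = params * 0.5    # ~4-bit weights: 0.5 bytes per parameter
                             # (real GGUF Q4 formats add small per-block overhead)

print(f"fp16 : {gib(fp16_bytes):.1f} GiB")   # far beyond a 4GB machine
print(f"4-bit: {gib(q4_bytes):.1f} GiB")     # within a 4GB budget, before KV cache
```

The weights alone land around 3.3 GiB at 4-bit, which is why V0.3 targets llama.cpp on 4GB machines.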
**Built solo. Kept free. For every Bulgarian. 🇧🇬**

*"От Витоша, за всички."* ("From Vitosha, for everyone.")