---
language:
  - bg
  - en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
  - code
  - bulgarian
  - lora
  - peft
  - vitosha-gpt-code
  - slm
  - offline
---

<div align="center">

# 🏔️ Vitosha-GPT-Code

### *The coding assistant that speaks Bulgarian — and runs anywhere.*

![Vitosha-GPT-Code Banner](vitosha-banner.png)

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Base Model](https://img.shields.io/badge/Base-Qwen2.5--Coder--7B-blue)](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
[![Language](https://img.shields.io/badge/Language-Bulgarian%20🇧🇬-red)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
[![Offline](https://img.shields.io/badge/Runs-100%25%20Offline-orange)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
[![RAM](https://img.shields.io/badge/Target-4GB%20RAM-purple)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)
[![Status](https://img.shields.io/badge/Status-V0.1%20in%20development-yellow)](https://huggingface.co/kyleparrratt/Vitosha-GPT-Code)

</div>

---

## 🇧🇬 Why This Exists

> *"Every Bulgarian has the right to AI."*

Right now, AI is a luxury. You need fast internet. You need expensive hardware. You need a subscription. If you're in a remote province on a 10-year-old PC, the 21st century is locked behind a paywall.

**That's wrong. And Vitosha-GPT-Code is the answer.**

---

## 🌍 The Problem We're Solving

```
❌  Remote area?          → No cloud access
❌  Old PC / low RAM?     → Runs too slow  
❌  No fiber optic?       → Can't stream tokens
❌  No credit card?       → Locked out of ChatGPT

✅  Vitosha-GPT-Code      → Works offline. 4GB RAM target. Free. Forever.
```

A kid in a remote Bulgarian province deserves the **same coding tools** as a developer in Sofia. Whether it's building a website for the family business or learning to program for the first time — **hardware should never be a barrier to entry**.

---

## 🛠️ What It Does

Vitosha-GPT-Code is a **Bulgarian-first coding assistant** — it writes code, explains concepts, and answers technical questions **in Bulgarian by default**. 

| Capability | Example |
|---|---|
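| 🐍 Write Python functions | "Напиши функция за проверка на просто число" (Write a function that checks for a prime number) |
| 🌐 Build web projects | "Направи уебсайт за малък бизнес" (Make a website for a small business) |
| 🔁 Multi-turn coding chat | Keeps context across follow-up questions when the conversation history is passed back in |
| 📖 Explain algorithms | "Обясни как работи binary search" (Explain how binary search works) |
| 🗃️ SQL queries | "Извлечи всички потребители от таблица users" (Get all users from the users table) |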
| 🔧 Debug code | Spot errors and suggest fixes in Bulgarian |

### 🗣️ Real Example Output

```
Потребител: Напиши функция на Python, която проверява дали число е просто.

Витоша: Ето една проста функция на Python, която проверява дали даденото
число е просто:

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

Тази функция работи така: ако числото е по-малко или равно на 1, не е
просто. За всички останали — проверяваме всеки делител до корена на
числото...
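```

*In English: the user asks for a Python function that checks whether a number is prime. Vitosha answers with the function and explains that numbers less than or equal to 1 are not prime, and that every other number is tested against each possible divisor up to its square root.*

**Correct code. Bulgarian explanation. Zero internet. Zero cost.**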

---

## ⚡ Quick Start

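```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-Coder-7B-Instruct"    # base model the adapter sits on
adapter = "kyleparrratt/Vitosha-GPT-Code"  # Vitosha LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
# Load the full base model in bfloat16; device_map="auto" spreads it over available GPU(s) and CPU
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA adapter in inference mode (adapter weights stay frozen)
model = PeftModel.from_pretrained(model, adapter, is_trainable=False)

messages = [
    # System: "You are a helpful programming assistant. You answer in Bulgarian."
    {"role": "system", "content": "Ти си полезен асистент за програмиране. Отговаряш на български."},
    # User: "Write a Python function that checks for a prime number and explain it in Bulgarian."
    {"role": "user", "content": "Напиши функция на Python за проверка на просто число и обясни на български."},
]

# Render the Qwen chat template and open the assistant turn for generation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Greedy decoding, up to 512 new tokens
out = model.generate(**inputs, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```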
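The capability table lists multi-turn chat. `model.generate` keeps no state between calls, so "remembering" context means sending the whole conversation back each turn. A minimal sketch, assuming the `messages` format from Quick Start; `new_conversation`, `add_turn`, and `next_prompt` are hypothetical helpers, not part of `transformers` or this repo:

```python
SYSTEM_PROMPT = "Ти си полезен асистент за програмиране. Отговаряш на български."


def new_conversation(system_prompt=SYSTEM_PROMPT):
    """Start a chat history with the same system prompt as Quick Start."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(history, user_text, assistant_text):
    """Return a new history with one completed user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


def next_prompt(history, user_text):
    """Messages to pass to tokenizer.apply_chat_template for the next reply."""
    return history + [{"role": "user", "content": user_text}]


# Hypothetical two-turn session; in practice assistant_text comes from model.generate
history = new_conversation()
history = add_turn(
    history,
    "Напиши функция на Python за проверка на просто число.",
    "def is_prime(n): ...",
)
# Follow-up ("Now make it faster.") only makes sense with the previous turn in context
messages = next_prompt(history, "Сега я направи по-бърза.")
print([m["role"] for m in messages])
```

Pass `messages` to `tokenizer.apply_chat_template(..., add_generation_prompt=True)` as in Quick Start, then store the decoded reply with `add_turn`. The prompt grows with every turn, so on low-RAM machines you may need to drop the oldest exchanges.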

---

## 🔬 How It Was Built

This isn't just a wrapper with a Bulgarian flag slapped on it. It's fine-tuned specifically to think and respond in Bulgarian.

| Component | Detail |
|---|---|
| 🧠 **Base Model** | Qwen2.5-Coder-7B-Instruct — one of the strongest open code models available |
| 📚 **Training Data** | Coding examples from evol-codealpaca-v1, translated to Bulgarian (400 in V0.1; 5,000 for V0.2) |
| 🌍 **Translation** | OPUS-MT (opus-mt-tc-big-en-bg) — dedicated EN→BG model, GPU-accelerated |
| ⚙️ **Fine-tuning** | LoRA (r=16) with Unsloth, a parameter-efficient fine-tuning method |
| 🚫 **No tricks** | No prompt poisoning, no phrase-stuffing — clean Bulgarian training targets |
| 🔒 **Privacy** | Designed for 100% local inference — your data never leaves your machine |
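LoRA keeps each pretrained weight matrix W (d_out × d_in) frozen and learns a low-rank update ΔW = B·A, where B is d_out × r and A is r × d_in. Each adapted matrix therefore adds r·(d_in + d_out) trainable parameters instead of d_out·d_in. A quick sketch of that arithmetic for r=16; the 3584 × 3584 shape is an assumption (a square attention projection in a Qwen2.5-7B-class model), and this card does not say which modules were adapted:

```python
def lora_params(d_out, d_in, r):
    """Trainable parameters LoRA adds to one d_out x d_in matrix: B (d_out x r) + A (r x d_in)."""
    return r * (d_in + d_out)


def full_params(d_out, d_in):
    """Parameters in the frozen weight matrix itself."""
    return d_out * d_in


# Assumed shape: a square 3584 x 3584 projection (check the base model's config.json)
D_OUT, D_IN, RANK = 3584, 3584, 16

added = lora_params(D_OUT, D_IN, RANK)
frozen = full_params(D_OUT, D_IN)
print(f"LoRA adds {added:,} params vs {frozen:,} frozen ({added / frozen:.2%})")
```

For a square matrix the ratio is 2r/d, under 1% here, which is why the adapter is a small download compared with the 7B base model it plugs into.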

---

## 🗺️ Roadmap

```
[✅] V0.1 — LoRA adapter: Bulgarian code explanations & generation
[ ] V0.2 — 5,000-sample OPUS-translated training run + re-train
[ ] V0.3 — GGUF export: run with llama.cpp on 4GB RAM
[ ] V0.4 — Windows installer for offline use with no tech knowledge
[ ] V1.0 — Full offline Bulgarian coding assistant for every Bulgarian
```
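Why V0.3 depends on a quantized GGUF build: weight memory is roughly parameters × bits per weight ÷ 8. A back-of-the-envelope sketch, assuming about 7.6 billion parameters for the base model and counting weights only (the KV cache, the runtime, and the operating system need memory on top of this). Real GGUF quantization types also store per-block scales, so their effective bits per weight sit somewhat above the nominal figure:

```python
PARAMS = 7.6e9  # assumed parameter count of Qwen2.5-Coder-7B; check the base model card


def weight_gb(params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("bf16 (Quick Start)", 16), ("8-bit", 8), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label:>18}: ~{weight_gb(PARAMS, bits):.1f} GB")
```

At a nominal 4 bits the weights alone already take about 3.8 GB, so the 4GB RAM target hinges on aggressive quantization and a lean runtime such as llama.cpp.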

---

## ⚠️ Current Limitations

- The model occasionally slips into English on complex explanations. Adding *"Отговори на български."* ("Answer in Bulgarian.") to the user prompt keeps replies consistent.
- Code identifiers, API names, and variable names remain in English (as they should).
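- V0.1 is trained on 400 samples; V0.2 will use 5,000.
- V0.1 runs as a LoRA adapter on top of the full 7B base model, so the Quick Start setup (bf16) needs roughly 15 GB of memory for the weights alone. The 4GB RAM target depends on the GGUF export planned for V0.3.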
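One way to act on the English-slip limitation automatically: measure how much of a reply's prose is Cyrillic and, if it looks mostly English, ask again with *"Отговори на български."* appended. A hedged sketch; the 0.5 threshold and the helper names are illustrative assumptions, not part of the model or its training. Fenced code is excluded from the count because identifiers stay in English by design:

```python
import re

BG_SUFFIX = "Отговори на български."  # "Answer in Bulgarian."
CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)  # fenced code inside a reply


def cyrillic_ratio(reply):
    """Share of letters outside fenced code that are Cyrillic (0.0 to 1.0)."""
    prose = CODE_BLOCK.sub(" ", reply)
    letters = [ch for ch in prose if ch.isalpha()]
    if not letters:
        return 1.0  # nothing to judge, e.g. a code-only reply
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def needs_retry(reply, threshold=0.5):
    """Heuristic: treat the reply as an English slip if most prose letters are not Cyrillic."""
    return cyrillic_ratio(reply) < threshold


def with_bg_suffix(user_text):
    """Append the reminder from the list above unless it is already there."""
    if user_text.rstrip().endswith(BG_SUFFIX):
        return user_text
    return f"{user_text.rstrip()} {BG_SUFFIX}"


print(needs_retry("Тази функция проверява всеки делител до корена."))  # False
print(needs_retry("This function checks every divisor up to the root."))  # True
```

In a chat loop you would regenerate with `with_bg_suffix(user_text)` whenever `needs_retry` returns True. Letter counting is crude: a Bulgarian reply that quotes many English API names in running text can fall under the threshold.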

---

## 🏔️ The Name

**Vitosha** is the mountain that watches over Sofia — visible from the capital, unchanging, accessible to everyone. It doesn't care if you're a professor or a student, if you have a new laptop or an old one. You can walk up Vitosha for free.

That's what this model is.

---

## 📄 License

MIT. Free to use, modify, and deploy. Kept free on purpose.

---

<div align="center">

**Built solo. Kept free. For every Bulgarian. 🇧🇬**

*"От Витоша, за всички." (From Vitosha, for everyone.)*

</div>