---
language:
- mk
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- macedonian
- cyrillic
- mistral
- qlora
- peft
base_model: mistralai/Mistral-7B-v0.1
datasets:
- ainowmk/MK-LLM-Mistral-data
metrics:
- perplexity
pretty_name: MK-LLM (Mistral)
model-index:
- name: MK-LLM-Mistral
  results: []
---
# 🇲🇰 MK-LLM: The First Open Macedonian Language Model

## 🌍 About This Project
MK-LLM is Macedonia's first open-source Large Language Model (LLM), built by the community, for the community. It is fine-tuned from `mistralai/Mistral-7B-v0.1`, and the project is led by AI Now, the Association for Artificial Intelligence in Macedonia.

📌 **Website:** [www.ainow.mk](https://www.ainow.mk)  
📩 **Contact:** [contact@ainow.mk](mailto:contact@ainow.mk)  
🛠 **Model:** [MK-LLM-Mistral](https://huggingface.co/ainowmk/MK-LLM-Mistral)  
💻 **GitHub:** [MK-LLM](https://github.com/AI-now-mk/MK-LLM)

## 🆕 Latest Updates (14.10.2025)
- OpenAI-compatible endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/models` with JSON SSE streaming
- QLoRA training pipeline (4-bit) with LoRA adapters and gradient checkpointing
- Upgraded Macedonian data pipeline: cleaner extraction (trafilatura), gcld3 language filter, MinHash dedup
- Gradio demo UI and improved FastAPI server (env-based config, lazy model load, quantization toggles)
- Repository hygiene: LICENSE, model/dataset cards, Makefile, package inits, `.gitkeep` for data/models

## 📂 Repository Structure
```plaintext
MK-LLM/
├── data/
│   ├── wikipedia/
│   │   ├── download_wiki.py
│   │   └── parse_wiki.py
│   ├── cleaned/
│   ├── processed/
│   ├── raw/
│   ├── tokenized/
│   ├── eval/
│   │   └── mk_eval.jsonl
│   ├── process_all_data.py
│   └── clean_wikipedia.py
├── examples/
│   ├── client_python.py
│   ├── client_js.mjs
│   ├── data_loader.py
│   └── train_mistral_mk.py
├── inference/
│   ├── api.py
│   ├── gradio_app.py
│   └── chatbot.py
├── training/
│   ├── train_pipeline.py
│   └── fine_tune_mistral.py
├── scripts/
│   ├── preprocess_data.py
│   └── evaluate.py
├── configs/
│   ├── train_small.yaml
│   └── train_full.yaml
├── tests/
│   ├── test_api.py
│   ├── test_model.py
│   └── test_dataset.py
├── docs/
│   ├── EXTENDING.md
│   └── GITHUB_ISSUES.md
├── .github/
│   ├── workflows/ci.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   └── feature_request.yml
│   └── PULL_REQUEST_TEMPLATE.md
├── models/
├── notebooks/
│   └── evaluation.ipynb
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── requirements.txt
├── constraints.txt
├── LICENSE
├── MODEL_CARD.md
├── DATASET_CARD.md
├── CODE_OF_CONDUCT.md
├── SECURITY.md
└── README.md
```

## Getting Started
1) Clone the repository
```bash
git clone https://github.com/AI-now-mk/MK-LLM.git
cd MK-LLM
```

2) Install dependencies
```bash
pip install -r requirements.txt
```

Optional (recommended): create and activate a virtual environment first, then install the dependencies inside it
```bash
python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

3) Configure environment (optional)

Create a `.env` file in the project root:
```bash
HOST=0.0.0.0
PORT=8000
ALLOW_ORIGINS=*
MODEL_PATH=./models/mistral-finetuned-mk
MODEL_ID=mk-llm
TRUST_REMOTE_CODE=true
LOAD_IN_4BIT=false
LOAD_IN_8BIT=false
TORCH_DTYPE=float16
```
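
The variables above could be read into a typed settings object along the following lines. This is a minimal sketch, not the code in `inference/api.py`: the `ServerSettings` class, the `load_settings` helper, the fallback values, and the comma-separated reading of `ALLOW_ORIGINS` are all assumptions made for illustration. Python does not read `.env` files on its own, so the sketch also assumes the variables were exported in the shell or loaded by a tool such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def as_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'true', '1', 'yes', 'on'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerSettings:
    # Hypothetical container; field names mirror the .env keys above.
    host: str
    port: int
    allow_origins: List[str]
    model_path: str
    model_id: str
    trust_remote_code: bool
    load_in_4bit: bool
    load_in_8bit: bool
    torch_dtype: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: several origins may be given as a comma-separated list.
    origins = [o.strip() for o in env.get("ALLOW_ORIGINS", "*").split(",") if o.strip()]
    settings = ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        allow_origins=origins,
        model_path=env.get("MODEL_PATH", "./models/mistral-finetuned-mk"),
        model_id=env.get("MODEL_ID", "mk-llm"),
        trust_remote_code=as_bool(env.get("TRUST_REMOTE_CODE", "false")),
        load_in_4bit=as_bool(env.get("LOAD_IN_4BIT", "false")),
        load_in_8bit=as_bool(env.get("LOAD_IN_8BIT", "false")),
        torch_dtype=env.get("TORCH_DTYPE", "float16"),
    )
    # 4-bit and 8-bit quantization are mutually exclusive.
    if settings.load_in_4bit and settings.load_in_8bit:
        raise ValueError("Enable at most one of LOAD_IN_4BIT and LOAD_IN_8BIT")
    return settings
```

Rejecting the case where both quantization toggles are set is a design choice of this sketch; it surfaces a configuration mistake at startup instead of at model load time.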

4) Quick run: inference API
```bash
# Ensure a model exists at ./models/mistral-finetuned-mk (train or download)
python -m inference.api
# In another terminal, call the API
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Здраво Македонија!", "max_new_tokens":128}'
```
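
The same call can be made from Python with only the standard library. The sketch below assumes the `/generate` route accepts exactly the JSON body used in the curl command; the response schema is not documented here, so the helper returns the parsed JSON unchanged. The function names are illustrative and are not taken from `examples/client_python.py`.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/generate"  # matches the curl example above


def build_generate_request(prompt, max_new_tokens=128, url=API_URL):
    """Create a POST request carrying the prompt as UTF-8 encoded JSON."""
    body = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens},
        ensure_ascii=False,  # keep Cyrillic text readable in the payload
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=128, timeout=120):
    """Send the request and return the decoded JSON response as-is."""
    request = build_generate_request(prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `generate("Здраво Македонија!")` returns whatever JSON object the API produces. The generous timeout is a guess that allows for a slow first request while the model loads.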

5) Optional: Gradio demo UI
```bash
python -m inference.gradio_app
# Open http://localhost:7860
```

6) Prepare data (Macedonian)
```bash
# Download and extract Macedonian Wikipedia
python -m data.wikipedia.download_wiki
# Parse Wikipedia dump into clean text
python -m data.wikipedia.parse_wiki
# Collect web + combine + clean + mk language filter
python -m data.process_all_data
```
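
To illustrate the MinHash deduplication step mentioned in the latest updates, here is a small, dependency-free sketch of the technique. It is not the implementation in `data/process_all_data.py`: the shingle size, number of hash functions, similarity threshold, and all function names are assumptions, and the pairwise comparison is quadratic, whereas a production pipeline would typically add locality-sensitive hashing.

```python
import hashlib
import re

NUM_HASHES = 64  # illustrative signature length


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (Unicode-aware, so Cyrillic works)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each salted hash function, keep the minimum hash over all shingles."""
    items = shingles(text)
    if not items:
        return None
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt acts as a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return tuple(signature)


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(docs, threshold=0.8):
    """Keep the first document of every group of near-duplicates."""
    kept, signatures = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if sig is None:
            continue  # skip empty documents
        if any(estimated_jaccard(sig, other) >= threshold for other in signatures):
            continue  # near-duplicate of a document already kept
        kept.append(doc)
        signatures.append(sig)
    return kept
```

Lowering `threshold` removes more loosely related documents, at the risk of discarding legitimately distinct text.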

7) Train (example)
```bash
python -m training.train_pipeline
# or
python -m training.fine_tune_mistral
```
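
For orientation, a QLoRA setup with 4-bit quantization, LoRA adapters, and gradient checkpointing typically looks like the sketch below. It uses the public `transformers` and `peft` APIs, but the hyperparameters in `LORA_SETTINGS` and the helper name `build_qlora_model` are illustrative assumptions; the values actually used by this repository live in its training scripts and `configs/`. The sketch also assumes that `torch`, `transformers`, `peft`, and `bitsandbytes` are installed and that a CUDA GPU is available.

```python
# Illustrative LoRA hyperparameters (assumed, not taken from configs/*.yaml).
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Attention projection layers of the Mistral architecture.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def build_qlora_model(base_model="mistralai/Mistral-7B-v0.1"):
    """Load the base model in 4-bit and attach trainable LoRA adapters."""
    # Heavy imports are kept local so the module can be imported without a GPU stack.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=quant_config,
        device_map="auto",
    )
    # Prepares the quantized model for training and enables gradient checkpointing.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
    lora_config = LoraConfig(task_type="CAUSAL_LM", bias="none", **LORA_SETTINGS)
    return get_peft_model(model, lora_config)
```

Only the adapter weights are trainable in the returned model; the quantized base weights stay frozen, which is what keeps memory use low enough for a single GPU.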

### Docker
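Build and run the API with Docker. The `--gpus all` flag requires the NVIDIA Container Toolkit on the host, and `MODEL_PATH` is resolved inside the container, so the model must be present in the image or mounted as a volume: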
```bash
docker build -t mk-llm .
docker run --gpus all -p 8000:8000 -e MODEL_PATH=./models/mistral-finetuned-mk mk-llm
```

Or via docker-compose:
```bash
docker-compose up --build
```

### Continuous Integration
This repository includes a GitHub Actions workflow (`.github/workflows/ci.yml`) that lints, type-checks, and runs the tests on pull requests and commits to `main`.

### Constraints (reproducible installs)
To install with pinned versions:
```bash
pip install -r requirements.txt -c constraints.txt
```

### OpenAI-compatible endpoints
The server exposes OpenAI-style routes (`/v1/chat/completions`, `/v1/completions`, `/v1/models`), so common OpenAI-compatible clients, including gpt-oss-compatible tooling, can connect to it.

- Chat Completions (streaming supported):
```bash
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mk-llm",
    "messages": [
      {"role": "system", "content": "Ти си помошник кој зборува на македонски."},
      {"role": "user", "content": "Која е историјата на Охрид?"}
    ],
    "stream": false
  }'
```

- Text Completions:
```bash
curl http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Здраво Македонија!",
    "max_tokens": 128
  }'
```

Related project: [OpenAI gpt-oss](https://github.com/openai/gpt-oss) (open-weight models, client compatibility notes).

### Use with gpt-oss-compatible clients
Point any OpenAI-compatible client to this server.

Example (environment variables read by the OpenAI Python SDK; `dummy` is a placeholder key):
```bash
export OPENAI_API_KEY=dummy
export OPENAI_BASE_URL=http://localhost:8000/v1
```
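
With those variables set, a chat request through the official `openai` Python package (v1 or later) might look like this. It is a sketch under assumptions: the package is not known to be listed in `requirements.txt`, the helper names are illustrative, and the server is assumed to return the standard Chat Completions response shape.

```python
import os

SYSTEM_PROMPT = "Ти си помошник кој зборува на македонски."


def build_messages(question, system_prompt=SYSTEM_PROMPT):
    """Return the message list in the OpenAI chat format."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask(question, model="mk-llm"):
    """Send one chat request to the local server and return the reply text."""
    # Imported lazily so the helpers above work without the package installed.
    from openai import OpenAI

    client = OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL", "http://localhost:8000/v1"),
        api_key=os.environ.get("OPENAI_API_KEY", "dummy"),
    )
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
    )
    return response.choices[0].message.content
```

Calling `ask("Која е историјата на Охрид?")` while the server is running returns the assistant's reply as a string.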

Example streaming Chat Completions request (curl; `-N` turns off output buffering so chunks print as they arrive):
```bash
curl -N "$OPENAI_BASE_URL/chat/completions" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "mk-llm",
    "messages": [
      {"role": "system", "content": "Ти си помошник кој зборува на македонски."},
      {"role": "user", "content": "Која е историјата на Охрид?"}
    ],
    "stream": true
  }'
```
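
A streamed response arrives as server-sent events. The sketch below extracts the generated text from such a stream, assuming the server follows the OpenAI chunk format: each event is a `data: {...}` line carrying `choices[].delta.content`, and the stream ends with `data: [DONE]`. The function name is illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to `iter_stream_text` directly and the fragments printed as they arrive.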