Libraries

How to use ainowmk/MK-LLM-Mistral with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ainowmk/MK-LLM-Mistral")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ainowmk/MK-LLM-Mistral")
model = AutoModelForCausalLM.from_pretrained("ainowmk/MK-LLM-Mistral")

Local Apps

vLLM

How to use ainowmk/MK-LLM-Mistral with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ainowmk/MK-LLM-Mistral"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ainowmk/MK-LLM-Mistral",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

SGLang

How to use ainowmk/MK-LLM-Mistral with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ainowmk/MK-LLM-Mistral" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ainowmk/MK-LLM-Mistral",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ainowmk/MK-LLM-Mistral" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ainowmk/MK-LLM-Mistral",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ainowmk/MK-LLM-Mistral with Docker Model Runner:
```
docker model run hf.co/ainowmk/MK-LLM-Mistral
```

🇲🇰 MK-LLM: The First Open Macedonian Language Model

🌍 About This Project

MK-LLM is Macedonia's first open-source Large Language Model (LLM), developed for the community, by the community. This project is led by AI Now - Association for Artificial Intelligence in Macedonia.

📌 Website: www.ainow.mk
📩 Contact: contact@ainow.mk
🛠 Model: MK-LLM-Mistral
💻 GitHub: MK-LLM

🆕 Latest Updates (14.10.2025)

OpenAI-compatible endpoints: /v1/chat/completions, /v1/completions, /v1/models with JSON SSE streaming
QLoRA training pipeline (4-bit) with LoRA adapters and gradient checkpointing
Upgraded Macedonian data pipeline: cleaner extraction (trafilatura), gcld3 language filter, MinHash dedup
Gradio demo UI and improved FastAPI server (env-based config, lazy model load, quantization toggles)
Repository hygiene: LICENSE, model/dataset cards, Makefile, package inits, .gitkeep for data/models
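
The MinHash deduplication step in the data pipeline can be illustrated with a small, dependency-free sketch. This is not the project's pipeline code: the function names, the 5-character shingles, the 64 hash functions, and the 0.8 similarity threshold are all assumptions chosen for illustration.

```python
import hashlib

NUM_HASHES = 64  # assumed signature length; more hashes give a tighter estimate


def shingles(text, k=5):
    """Return the set of k-character shingles of a lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def _hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle, salted with the hash-function index."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each salted hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    return [min(_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def deduplicate(docs, threshold=0.8):
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept, signatures = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if all(estimated_jaccard(sig, s) < threshold for s in signatures):
            kept.append(doc)
            signatures.append(sig)
    return kept
```

This sketch compares every new document against all kept signatures, which is quadratic in the number of documents. At corpus scale, MinHash is usually combined with locality-sensitive hashing so that only likely matches are compared.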

📂 Repository Structure

MK-LLM/
├── data/
│   ├── wikipedia/
│   │   ├── download_wiki.py
│   │   └── parse_wiki.py
│   ├── cleaned/
│   ├── processed/
│   ├── raw/
│   ├── tokenized/
│   ├── eval/
│   │   └── mk_eval.jsonl
│   ├── process_all_data.py
│   └── clean_wikipedia.py
├── examples/
│   ├── client_python.py
│   ├── client_js.mjs
│   ├── data_loader.py
│   └── train_mistral_mk.py
├── inference/
│   ├── api.py
│   ├── gradio_app.py
│   └── chatbot.py
├── training/
│   ├── train_pipeline.py
│   └── fine_tune_mistral.py
├── scripts/
│   ├── preprocess_data.py
│   └── evaluate.py
├── configs/
│   ├── train_small.yaml
│   └── train_full.yaml
├── tests/
│   ├── test_api.py
│   ├── test_model.py
│   └── test_dataset.py
├── docs/
│   ├── EXTENDING.md
│   └── GITHUB_ISSUES.md
├── .github/
│   ├── workflows/ci.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   └── feature_request.yml
│   └── PULL_REQUEST_TEMPLATE.md
├── models/
├── notebooks/
│   └── evaluation.ipynb
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── requirements.txt
├── constraints.txt
├── LICENSE
├── MODEL_CARD.md
├── DATASET_CARD.md
├── CODE_OF_CONDUCT.md
├── SECURITY.md
└── README.md

Getting Started

Clone the repository:

git clone https://github.com/AI-now-mk/MK-LLM.git
cd MK-LLM

Install dependencies

pip install -r requirements.txt

Optional (recommended): use a virtual environment

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Configure the environment (optional). Create a .env file in the project root:

HOST=0.0.0.0
PORT=8000
ALLOW_ORIGINS=*
MODEL_PATH=./models/mistral-finetuned-mk
MODEL_ID=mk-llm
TRUST_REMOTE_CODE=true
LOAD_IN_4BIT=false
LOAD_IN_8BIT=false
TORCH_DTYPE=float16
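
As a rough illustration of how a server might consume these variables, the sketch below reads them into a typed settings object. The `Settings` class, the `load_settings` helper, the boolean parsing rules, and the defaults (copied from the example values above) are assumptions for illustration; the actual handling in inference/api.py may differ.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    allow_origins: list
    model_path: str
    model_id: str
    trust_remote_code: bool
    load_in_4bit: bool
    load_in_8bit: bool
    torch_dtype: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    origins = env.get("ALLOW_ORIGINS", "*")
    settings = Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        allow_origins=[o.strip() for o in origins.split(",") if o.strip()],
        model_path=env.get("MODEL_PATH", "./models/mistral-finetuned-mk"),
        model_id=env.get("MODEL_ID", "mk-llm"),
        trust_remote_code=_as_bool(env.get("TRUST_REMOTE_CODE", "true")),
        load_in_4bit=_as_bool(env.get("LOAD_IN_4BIT", "false")),
        load_in_8bit=_as_bool(env.get("LOAD_IN_8BIT", "false")),
        torch_dtype=env.get("TORCH_DTYPE", "float16"),
    )
    # Assumption of this sketch: only one quantization mode may be active at a time.
    if settings.load_in_4bit and settings.load_in_8bit:
        raise ValueError("Set at most one of LOAD_IN_4BIT and LOAD_IN_8BIT")
    return settings
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test, for example `load_settings({"PORT": "9000", "LOAD_IN_4BIT": "true"})`.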

Quick run: inference API

# Ensure a model exists at ./models/mistral-finetuned-mk (train or download)
python -m inference.api
# In another terminal, call the API
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Здраво Македонија!", "max_new_tokens":128}'
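
The same call can be made from Python using only the standard library. This is a minimal sketch, not a copy of examples/client_python.py: the helper names are hypothetical, and because the response schema of /generate is not documented here, the parsed JSON is returned unchanged.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=128, base_url="http://localhost:8000"):
    """Build a POST request for /generate that mirrors the curl call above."""
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=128, base_url="http://localhost:8000", timeout=120):
    """Send the request to a running server and return the decoded JSON body as-is."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `generate("Здраво Македонија!")` would return whatever JSON the server sends back.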

Optional: Gradio demo UI

python -m inference.gradio_app
# Open http://localhost:7860

Prepare data (Macedonian)

# Download and extract Macedonian Wikipedia
python -m data.wikipedia.download_wiki
# Parse Wikipedia dump into clean text
python -m data.wikipedia.parse_wiki
# Collect web + combine + clean + mk language filter
python -m data.process_all_data

Train (example)

python -m training.train_pipeline
# or
python -m training.fine_tune_mistral

Docker

Build and run the API with Docker:

docker build -t mk-llm .
docker run --gpus all -p 8000:8000 -e MODEL_PATH=./models/mistral-finetuned-mk mk-llm

Or via docker-compose:

docker-compose up --build

Continuous Integration

This repository includes a GitHub Actions CI workflow that lints, type-checks, and runs the tests on pull requests and on commits to main.

Constraints (reproducible installs)

To install with pinned versions:

pip install -r requirements.txt -c constraints.txt

OpenAI-compatible endpoints

This server exposes OpenAI-style routes so common clients (incl. gpt-oss-compatible tooling) can connect.

Chat Completions (streaming supported):

curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mk-llm",
    "messages": [
      {"role": "system", "content": "Ти си помошник кој зборува на македонски."},
      {"role": "user", "content": "Која е историјата на Охрид?"}
    ],
    "stream": false
  }'

Text Completions:

curl http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "mk-llm",
    "prompt": "Здраво Македонија!",
    "max_tokens": 128
  }'

Related project: OpenAI gpt-oss (open-weight models, client compatibility notes). See https://github.com/openai/gpt-oss.

Use with gpt-oss-compatible clients

Point any OpenAI-compatible client to this server.

Example (environment variables read by the OpenAI Python SDK):

export OPENAI_API_KEY=dummy
export OPENAI_BASE_URL=http://localhost:8000/v1
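
With those variables set, a chat request through the OpenAI Python SDK might look like the sketch below. It assumes `pip install openai` (version 1.0 or later) and a server running at the base URL; the helper names `build_messages` and `ask` are hypothetical, and the fallback values simply repeat the exports above.

```python
import os


def build_messages(question, system_prompt="Ти си помошник кој зборува на македонски."):
    """Assemble an OpenAI-style message list with a Macedonian system prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask(question, model="mk-llm"):
    """Send one non-streaming chat request and return the assistant's reply text."""
    from openai import OpenAI  # imported lazily so the module loads without the SDK

    client = OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL", "http://localhost:8000/v1"),
        api_key=os.environ.get("OPENAI_API_KEY", "dummy"),
    )
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
    )
    return response.choices[0].message.content
```

For example, `print(ask("Која е историјата на Охрид?"))` would print the model's answer once the server is up.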

Example Chat Completions (curl):

curl -N "$OPENAI_BASE_URL/chat/completions" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "mk-llm",
    "messages": [
      {"role": "system", "content": "Ти си помошник кој зборува на македонски."},
      {"role": "user", "content": "Која е историјата на Охрид?"}
    ],
    "stream": true
  }'
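
When "stream" is true, the reply arrives as server-sent events rather than a single JSON body. The sketch below shows one way to extract the text from such a stream. It assumes the server follows the OpenAI chunk format (lines of the form `data: {...}` with the text in `choices[].delta.content`, terminated by `data: [DONE]`), which is not confirmed by this README; `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel, as in the OpenAI API
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Any iterable of lines works as input. The response object returned by `urllib.request.urlopen` can be iterated line by line, so it could be passed in directly and the fragments printed as they arrive.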

Safetensors
Model size: 0.3B params
Tensor type: F16

Model tree for ainowmk/MK-LLM-Mistral
Base model: mistralai/Mistral-7B-v0.1