---
language:
- de
- en
license: apache-2.0
tags:
- vocabulary
- education
- german
- language-learning
- gguf
- 4bit
- qwen2.5
- word-level
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: qwen2
quantization: 4bit
library_name: llama-cpp
pipeline_tag: text-generation
datasets:
- custom
---

# Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF

## Model Description

This is a **4-bit GGUF quantized version** of Qwen 2.5 7B, fine-tuned to generate **3 thematically related German vocabulary words** (with English translations) for any given **single German input word**.

- **Base Model:** `Qwen/Qwen2.5-7B-Instruct`
- **Quantization:** GGUF 4-bit (Q4_K_M)
- **Format:** Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines
- **Primary Use Case:** Educational apps, language-learning tools, vocabulary expansion from a single word

The model is optimized for **word-level prompts** (e.g., "Strand", "Hotel", "Rechnung") and emits structured JSON output that applications can parse directly.

---

## ⚠️ Important Usage Notes

- **Input Format:** Primarily a **single German word** (or a very short phrase).
- **Output Format:** Always **exactly 3** thematically related vocabulary items as a JSON array:

```json
[
  {"index": 1, "de": "...", "en": "..."},
  {"index": 2, "de": "...", "en": "..."},
  {"index": 3, "de": "...", "en": "..."}
]
```

### Example

**Input:**

```text
Strand
```

**Expected Output:**

```json
[
  {"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
  {"index": 2, "de": "Badehandtuch", "en": "beach towel"},
  {"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]
```

---

## Training Details

- **Fine-tuning steps:** 50
- **Final training loss:** 0.2671
- **Final validation loss:** 0.2792

**Task:** Given a **single German word**, the model learns to generate 3 thematically related vocabulary items, each with a German and an English form, following a strict JSON schema.

**Training Data Format** (an illustrative example is sketched at the end of this section):

- `system`: Describes the task (3 related words, de/en, JSON, indices 1–3)
- `user`: A single German word (e.g., "Hotel", "Flugzeug", "Bibliothek")
- `assistant`: The target JSON array with exactly 3 word objects

The data covers common **everyday topics** (travel, hotel, restaurant, office, school, leisure, city, nature, etc.) and was prepared specifically for German language learners.

Training was performed in a **Kaggle notebook environment** using Hugging Face Transformers + TRL (SFTTrainer). After fine-tuning, the model was converted to **GGUF 4-bit** for efficient inference. There is **only one GGUF model file** (no extra merged/adapter variants).
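For illustration, a single training example in the chat-messages format that TRL's `SFTTrainer` accepts might look like the sketch below. The exact dataset layout and the "Hotel" triplet shown are assumptions for illustration, not excerpts from the actual training data:

```python
# Hypothetical training example in the chat-messages format consumed by
# TRL's SFTTrainer; the real dataset's exact layout may differ.
example = {
    "messages": [
        {
            "role": "system",
            # Same system prompt as shown in the Usage section (shortened here).
            "content": "Du bist ein linguistischer Assistent für eine Sprachenlern-App. ...",
        },
        {"role": "user", "content": "Hotel"},
        {
            "role": "assistant",
            # Target completion: the serialized 3-item JSON array.
            "content": (
                '[{"index": 1, "de": "Rezeption", "en": "reception"}, '
                '{"index": 2, "de": "Zimmerschlüssel", "en": "room key"}, '
                '{"index": 3, "de": "Frühstücksbuffet", "en": "breakfast buffet"}]'
            ),
        },
    ]
}
```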
---

## Usage

### Option 1: llama.cpp (Recommended)

**Why llama.cpp?** GGUF is the native format of **llama.cpp**, which supports many model architectures (including Qwen2.5) and provides very efficient CPU and GPU inference.

#### Installation

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

#### Download Model (from Hugging Face)

```bash
huggingface-cli download BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF \
  --local-dir ./models/
```

#### Run Inference

```bash
./build/bin/llama-cli -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf \
  -p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Strand\nAssistant:" \
  -n 150 \
  --temp 0.7 \
  --top-p 0.9
```

---

### Option 2: Ollama

#### Installation

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

#### Modelfile

Create a file named `Modelfile` next to your `.gguf` file:

```text
FROM ./qwen2.5-7b-instruct.Q4_K_M.gguf

SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "User:"
PARAMETER stop "\n\n"
```

#### Import and Run

```bash
ollama create qwen-triplets -f Modelfile
ollama run qwen-triplets "Strand"
```

---

### Option 3: Python (llama-cpp-python)

#### Installation

```bash
pip install llama-cpp-python
```

#### Example Code

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=35,  # 0 for CPU-only; adjust for your GPU
)

system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

user_input = "Strand"
prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:"

output = llm(
    prompt,
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    stop=["User:", "\n\n"],
)

print(output["choices"][0]["text"])
```
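Because the model is trained to return a strict JSON array, it is worth validating the output before using it downstream. The sketch below continues the example above (it reuses the `output` variable); the `parse_triplets` helper is our own illustration, not part of llama-cpp-python:

```python
import json

def parse_triplets(raw: str) -> list:
    """Extract and validate the 3-item JSON array from a model completion."""
    # Quantized models can occasionally emit stray text around the array,
    # so slice from the first '[' to the last ']' before parsing.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end == -1:
        raise ValueError(f"no JSON array found in output: {raw!r}")
    items = json.loads(raw[start:end + 1])
    if len(items) != 3:
        raise ValueError(f"expected exactly 3 items, got {len(items)}")
    for item in items:
        if not isinstance(item, dict) or {"index", "de", "en"} - item.keys():
            raise ValueError(f"malformed item: {item!r}")
    return items

for item in parse_triplets(output["choices"][0]["text"]):
    print(f"{item['index']}. {item['de']} – {item['en']}")
```

Keeping the check strict (exactly 3 items, all three fields present) surfaces malformed generations early instead of letting them propagate into a learning app.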
---

### Option 4: LM Studio (GUI)

1. Download **LM Studio** from https://lmstudio.ai
2. Import the GGUF file via **Local Models → Import**
3. Select the model in the chat tab
4. Set the system prompt (same as above)
5. Enter German words as user input

---

## Performance (Indicative)

| Hardware        | Inference Speed (per word) | Memory Usage |
|-----------------|----------------------------|--------------|
| CPU (8 cores)   | ~2–4 s                     | ~4–5 GB RAM  |
| GPU (8 GB VRAM) | ~1–2 s                     | ~5–6 GB VRAM |
| Apple M1/M2     | ~1–3 s                     | ~5–6 GB RAM  |

Actual performance depends on your hardware and llama.cpp build options.

---

## GGUF Benefits

- ✅ Single, self-contained model file
- ✅ 4-bit quantization provides a good quality/speed tradeoff
- ✅ Runs on CPU-only machines
- ✅ Supported by many frontends (CLI, Ollama, LM Studio, Web UIs)

---

## Limitations

- Optimized for **single German words**, not for long sentences or dialogues
- Output is always **exactly 3** vocabulary pairs (not dynamic)
- Not designed for general chat or complex reasoning
- 4-bit quantization introduces minor quality loss compared to full precision

---

## File Structure

```text
Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF/
├── qwen2.5-7b-instruct.Q4_K_M.gguf
├── config.json
└── README.md
```

---

## License

- **Base Model:** Qwen2.5-7B-Instruct – Apache 2.0
- **This fine-tuned GGUF variant:** Apache 2.0

Users are free to use, modify, and deploy this model (including commercial use) under the terms of the Apache 2.0 license.

---

## Acknowledgments

- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **GGUF Format & Inference:** [llama.cpp](https://github.com/ggerganov/llama.cpp) by @ggerganov
- **Training:** Hugging Face Transformers + TRL