---
language:
- de
- en
license: apache-2.0
tags:
- vocabulary
- education
- german
- language-learning
- gguf
- 4bit
- qwen2.5
- word-level
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: qwen2
quantization: 4bit
library_name: llama-cpp
pipeline_tag: text-generation
datasets:
- custom
---
# Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF
## Model Description
This is a **4-bit GGUF quantized version** of Qwen2.5-7B-Instruct, fine-tuned to generate **3 thematically related German vocabulary words** (with English translations) for any given **single German input word**.
- **Base Model:** `Qwen/Qwen2.5-7B-Instruct`
- **Quantization:** GGUF 4-bit (Q4_K_M)
- **Format:** Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines
- **Primary Use Case:** Educational apps, language learning tools, vocabulary expansion from a single word
The model is optimized for **word-level prompts** (e.g., "Strand" = beach, "Hotel" = hotel, "Rechnung" = invoice) and returns structured JSON output that applications can process directly.
---
## ⚠️ Important Usage Notes
- **Input Format:**
Primarily a **single German word** (or a very short phrase).
- **Output Format:**
Always **exactly 3** thematically related vocabulary items as a JSON array:
```json
[
{"index": 1, "de": "...", "en": "..."},
{"index": 2, "de": "...", "en": "..."},
{"index": 3, "de": "...", "en": "..."}
]
```
### Example
**Input:**
```text
Strand
```
**Expected Output:**
```json
[
{"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
{"index": 2, "de": "Badehandtuch", "en": "beach towel"},
{"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]
```
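Because the output is plain JSON, applications can parse and validate it directly. A minimal Python sketch using the example output above (`raw_output` stands in for whatever string your inference backend returns):
```python
import json

def parse_triplets(raw_output: str) -> list:
    """Parse the model's JSON array and verify the expected schema."""
    items = json.loads(raw_output)
    if not (isinstance(items, list) and len(items) == 3):
        raise ValueError("expected exactly 3 items")
    for i, item in enumerate(items, start=1):
        if item.get("index") != i or "de" not in item or "en" not in item:
            raise ValueError("items need a sequential 'index' plus 'de'/'en' fields")
    return items

raw_output = """[
  {"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
  {"index": 2, "de": "Badehandtuch", "en": "beach towel"},
  {"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]"""
for item in parse_triplets(raw_output):
    print(f"{item['de']} -> {item['en']}")
```
Since 4-bit quantized models occasionally emit malformed JSON, wrapping the parse in a try/except with a single retry is a cheap safeguard.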
---
## Training Details
- **Fine-tuning steps:** 50
- **Final training loss:** 0.2671
- **Final validation loss:** 0.2792
**Task:**
For a given **German word**, the model learns to generate 3 thematically related vocabulary items with German and English forms, in a strict JSON schema.
**Training Data Format:**
- `system`: Describes the task (3 related words, de/en, JSON, indices 1-3)
- `user`: A single German word (e.g., "Hotel", "Flugzeug", "Bibliothek")
- `assistant`: The target JSON array with exactly 3 word objects
The data covers common **everyday topics** (travel, hotel, restaurant, office, school, leisure, city, nature, etc.) and was prepared specifically for German language learners.
Training was performed in a **Kaggle notebook environment** using Hugging Face Transformers + TRL (SFTTrainer).
After fine-tuning, the model was converted to **GGUF 4-bit** for efficient inference.
There is **only one GGUF model file** (no extra merged/adapter variants).
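For illustration, a single training example in the conversational `messages` layout that TRL's `SFTTrainer` accepts might look like the sketch below (the system text is abbreviated; the vocabulary triple is hypothetical, not a row taken from the actual dataset):
```python
# Hypothetical training sample (illustrative; not taken from the real dataset)
sample = {
    "messages": [
        # the system message is the German task description shown in the Usage section
        {"role": "system", "content": "Du bist ein linguistischer Assistent ..."},
        {"role": "user", "content": "Hotel"},
        {
            "role": "assistant",
            "content": (
                '[{"index": 1, "de": "Rezeption", "en": "reception"}, '
                '{"index": 2, "de": "Zimmerschlüssel", "en": "room key"}, '
                '{"index": 3, "de": "Frühstücksbuffet", "en": "breakfast buffet"}]'
            ),
        },
    ]
}
```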
---
## Usage
### Option 1: llama.cpp (Recommended)
**Why llama.cpp?**
GGUF is the native format of **llama.cpp**, which now supports many architectures (including Qwen2.5). It provides very efficient CPU and GPU inference.
#### Installation
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# recent llama.cpp builds use CMake (older checkouts also supported plain `make`)
cmake -B build
cmake --build build --config Release
```
#### Download Model (from Hugging Face)
```bash
huggingface-cli download BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF \
--local-dir ./models/
```
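Alternatively, a single file can be fetched from Python with `huggingface_hub`; the filename below matches the one listed under File Structure:
```python
from huggingface_hub import hf_hub_download

# downloads just the GGUF file into ./models and returns its local path
path = hf_hub_download(
    repo_id="BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF",
    filename="qwen2.5-7b-instruct.Q4_K_M.gguf",
    local_dir="./models",
)
print(path)
```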
#### Run Inference
```bash
# note: in older llama.cpp builds this binary was ./main in the repository root
./build/bin/llama-cli -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf \
-p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Strand\nAssistant:" \
-n 150 \
--temp 0.7 \
--top-p 0.9
```
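The system prompt (in German) instructs the model to find exactly 3 thematically related main vocabulary words for the given German word, output them bilingually (German and English), give each a sequential index starting at 1, and return the result exclusively as a JSON array of objects with `index`, `de`, and `en` fields.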
---
### Option 2: Ollama
#### Installation
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
#### Modelfile
Create a file named `Modelfile` next to your `.gguf` file:
```text
FROM ./qwen2.5-7b-instruct.Q4_K_M.gguf
SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "User:"
PARAMETER stop "\n\n"
```
#### Import and Run
```bash
ollama create qwen-triplets -f Modelfile
ollama run qwen-triplets "Strand"
```
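Once created, the model can also be queried from code through Ollama's local REST API (default port `11434`); a minimal Python sketch using `requests`:
```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen-triplets",  # the name created above
        "prompt": "Strand",
        "stream": False,  # return the full response as a single JSON object
    },
)
print(resp.json()["response"])  # the JSON array with 3 word pairs
```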
---
### Option 3: Python (llama-cpp-python)
#### Installation
```bash
pip install llama-cpp-python
```
#### Example Code
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=35,  # 0 for CPU-only; adjust for your GPU
)

system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""
user_input = "Strand"
prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:"

output = llm(
    prompt,
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    stop=["User:", "\n\n"],
)
print(output["choices"][0]["text"])
```
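llama-cpp-python also provides a chat-style API that applies the chat template embedded in the GGUF; a sketch of the same request in that style, reusing `llm` and `system_prompt` from the example above (whether the raw-prompt or chat-template format matches the fine-tuning data better is worth testing on your own inputs):
```python
# Same request via the chat-completion API (applies the GGUF's built-in chat template)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Strand"},
    ],
    max_tokens=150,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```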
---
### Option 4: LM Studio (GUI)
1. Download **LM Studio** from https://lmstudio.ai
2. Import the GGUF file via **Local Models → Import**
3. Select the model in the chat tab
4. Set the system prompt (same as above)
5. Enter German words as user input (for programmatic access, see the sketch below)
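LM Studio can also serve the model through its OpenAI-compatible local server; a minimal sketch with the `openai` Python client, assuming the server is enabled on its default port 1234:
```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is a placeholder
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

system_prompt = "Du bist ein linguistischer Assistent ..."  # full German prompt from the Usage section

response = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to the currently loaded model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Strand"},
    ],
)
print(response.choices[0].message.content)
```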
---
## Performance (Indicative)
| Hardware | Inference Speed (per word) | Memory Usage |
|----------------|----------------------------|-------------|
| CPU (8 cores) | ~2–4 s | ~4–5 GB RAM |
| GPU (8 GB VRAM)| ~1–2 s | ~5–6 GB VRAM|
| Apple M1/M2 | ~1–3 s | ~5–6 GB RAM |
Actual performance depends on your hardware and llama.cpp build options.
---
## GGUF Benefits
- ✅ Single, self-contained model file
- ✅ 4-bit quantization provides a good quality/speed tradeoff
- ✅ Runs on CPU-only machines
- ✅ Supported by many frontends (CLI, Ollama, LM Studio, Web UIs)
---
## Limitations
- Optimized for **single German words**, not for long sentences or dialogues
- Output is always **exactly 3** vocabulary pairs (not dynamic)
- Not designed for general chat or complex reasoning
- 4-bit quantization introduces minor quality loss compared to full precision
---
## File Structure
```text
Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF/
├── qwen2.5-7b-instruct.Q4_K_M.gguf
├── config.json
└── README.md
```
---
## License
- **Base Model:** Qwen2.5-7B-Instruct – Apache 2.0
- **This fine-tuned GGUF variant:** Apache 2.0
Users are free to use, modify, and deploy this model (including commercial use) under the terms of the Apache 2.0 license.
---
## Acknowledgments
- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **GGUF Format & Inference:** [llama.cpp](https://github.com/ggerganov/llama.cpp) by @ggerganov
- **Training:** Hugging Face Transformers + TRL