Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF
Model Description
This is a 4-bit GGUF quantized version of Qwen 2.5 7B, fine-tuned to generate 3 thematically related German vocabulary words (with English translations) for any given single German input word.
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Quantization: GGUF 4-bit (Q4_K_M)
- Format: Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines
- Primary Use Case: Educational apps, language learning tools, vocabulary expansion from a single word
The model is optimized for word-level prompts (e.g., "Strand", "Hotel", "Rechnung") and delivers structured JSON outputs that can be easily processed by applications.
⚠️ Important Usage Notes
Input Format: Primarily a single German word (or a very short phrase).
Output Format: Always exactly 3 thematically related vocabulary items as a JSON array:
[
{"index": 1, "de": "...", "en": "..."},
{"index": 2, "de": "...", "en": "..."},
{"index": 3, "de": "...", "en": "..."}
]
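Because the schema is fixed, applications can validate a completion with a few lines of code. A minimal Python sketch (the helper name parse_triplets is illustrative, not part of the model or any library):

import json

def parse_triplets(raw: str) -> list[dict]:
    """Parse and validate the model's output against the fixed schema."""
    items = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if len(items) != 3:
        raise ValueError("expected exactly 3 vocabulary items")
    for item in items:
        if not {"index", "de", "en"} <= item.keys():
            raise ValueError("missing 'index', 'de' or 'en' field")
    return items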
Example
Input:
Strand
Expected Output:
[
{"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
{"index": 2, "de": "Badehandtuch", "en": "beach towel"},
{"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]
Training Details
- Fine-tuning steps: 50
- Final training loss: 0.2671
- Final validation loss: 0.2792
Task:
For a given German word, the model learns to generate 3 thematically related vocabulary items with German and English forms, in a strict JSON schema.
Training Data Format:
- system: Describes the task (3 related words, de/en, JSON, indices 1-3)
- user: A single German word (e.g., "Hotel", "Flugzeug", "Bibliothek")
- assistant: The target JSON array with exactly 3 word objects
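An illustrative record in this format, reusing the "Strand" example from above (a sketch of the chat structure, not an actual line from the training set):

{"messages": [
  {"role": "system", "content": "Du bist ein linguistischer Assistent für eine Sprachenlern-App. ... (full system prompt as shown under Usage)"},
  {"role": "user", "content": "Strand"},
  {"role": "assistant", "content": "[{\"index\": 1, \"de\": \"Strandkorb\", \"en\": \"wicker beach chair\"}, {\"index\": 2, \"de\": \"Badehandtuch\", \"en\": \"beach towel\"}, {\"index\": 3, \"de\": \"Sonnencreme\", \"en\": \"sunscreen\"}]"}
]}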
The data covers common everyday topics (travel, hotel, restaurant, office, school, leisure, city, nature, etc.) and was prepared specifically for German language learners.
Training was performed in a Kaggle notebook environment using Hugging Face Transformers + TRL (SFTTrainer).
After fine-tuning, the model was converted to GGUF 4-bit for efficient inference.
There is only one GGUF model file (no extra merged/adapter variants).
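For reference, an SFT setup of the kind described above might look like the following sketch. This is not the exact notebook; the dataset path and output directory are assumptions, and max_steps matches the 50 steps reported above:

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Chat-format JSONL with "messages" records as sketched earlier (path is hypothetical)
train_ds = load_dataset("json", data_files="triplets_train.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # base model from this model card
    train_dataset=train_ds,
    args=SFTConfig(output_dir="qwen2.5-7b-triplets-sft", max_steps=50),
)
trainer.train()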
Usage
Option 1: llama.cpp (Recommended)
Why llama.cpp?
GGUF is the native format of llama.cpp, which now supports many architectures (including Qwen2.5). It provides very efficient CPU and GPU inference.
Installation
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
Note: older llama.cpp revisions were built with make and produced a ./main binary; recent builds use CMake and produce ./build/bin/llama-cli.
Download Model (from Hugging Face)
huggingface-cli download BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF \
--local-dir ./models/
Run Inference
The system prompt below (in German) instructs the model to return exactly 3 thematically related bilingual vocabulary items as a JSON array with 'index', 'de', and 'en' fields.
./build/bin/llama-cli -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf \
-p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Strand\nAssistant:" \
-n 150 \
--temp 0.7 \
--top-p 0.9
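llama.cpp also ships an HTTP server (llama-server) that exposes a /completion endpoint, which is convenient for applications. A minimal Python sketch, assuming the server was started with ./build/bin/llama-server -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf --port 8080:

import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "<system prompt as above>\n\nUser: Strand\nAssistant:",
        "n_predict": 150,
        "temperature": 0.7,
        "top_p": 0.9,
        "stop": ["User:", "\n\n"],
    },
)
print(resp.json()["content"])  # the raw JSON array emitted by the model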
Option 2: Ollama
Installation
curl -fsSL https://ollama.com/install.sh | sh
Modelfile
Create a file named Modelfile next to your .gguf file:
FROM ./qwen2.5-7b-instruct.Q4_K_M.gguf
SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "User:"
PARAMETER stop "\n\n"
Import and Run
ollama create qwen-triplets -f Modelfile
ollama run qwen-triplets "Strand"
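Ollama also serves a local REST API on port 11434, so the imported model can be queried programmatically. A minimal sketch in Python (non-streaming):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen-triplets", "prompt": "Strand", "stream": False},
)
print(resp.json()["response"])  # JSON array with the 3 vocabulary items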
Option 3: Python (llama-cpp-python)
Installation
pip install llama-cpp-python
Example Code
from llama_cpp import Llama

# Load the quantized model (adjust paths and thread/GPU settings for your machine)
llm = Llama(
    model_path="./models/qwen2.5-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=35,  # 0 for CPU-only; adjust for your GPU
)

system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

user_input = "Strand"
prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:"

# Generate with the same sampling settings as the CLI examples above
output = llm(
    prompt,
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    stop=["User:", "\n\n"],
)
print(output["choices"][0]["text"])
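The returned text can then be fed straight into json.loads (or the parse_triplets helper sketched earlier):

import json

triplets = json.loads(output["choices"][0]["text"].strip())
for item in triplets:
    print(f'{item["index"]}. {item["de"]} - {item["en"]}')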
Option 4: LM Studio (GUI)
- Download LM Studio from https://lmstudio.ai
- Import the GGUF file via Local Models → Import
- Select the model in the chat tab
- Set the system prompt (same as above)
- Enter German words as user input
Performance (Indicative)
| Hardware | Inference Speed (per word) | Memory Usage |
|---|---|---|
| CPU (8 cores) | ~2–4 s | ~4–5 GB RAM |
| GPU (8 GB VRAM) | ~1–2 s | ~5–6 GB VRAM |
| Apple M1/M2 | ~1–3 s | ~5–6 GB RAM |
Actual performance depends on your hardware and llama.cpp build options.
GGUF Benefits
- ✅ Single, self-contained model file
- ✅ 4-bit quantization provides a good quality/speed tradeoff
- ✅ Runs on CPU-only machines
- ✅ Supported by many frontends (CLI, Ollama, LM Studio, Web UIs)
Limitations
- Optimized for single German words, not for long sentences or dialogues
- Output is always exactly 3 vocabulary pairs (not dynamic)
- Not designed for general chat or complex reasoning
- 4-bit quantization introduces minor quality loss compared to full precision
File Structure
Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF/
├── qwen2.5-7b-instruct.Q4_K_M.gguf
├── config.json
└── README.md
License
- Base Model: Qwen2.5-7B-Instruct → Apache 2.0
- This fine-tuned GGUF variant: Apache 2.0
Users are free to use, modify, and deploy this model (including commercial use) under the terms of the Apache 2.0 license.
Acknowledgments
- Base Model: Qwen/Qwen2.5-7B-Instruct
- GGUF Format & Inference: llama.cpp by @ggerganov
- Training: Hugging Face Transformers + TRL