|
|
--- |
|
|
language: |
|
|
- de |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- vocabulary |
|
|
- education |
|
|
- german |
|
|
- language-learning |
|
|
- gguf |
|
|
- 4bit |
|
|
- qwen2.5 |
|
|
- word-level |
|
|
base_model: Qwen/Qwen2.5-7B-Instruct |
|
|
model_type: qwen2 |
|
|
quantization: 4bit |
|
|
library_name: llama-cpp |
|
|
pipeline_tag: text-generation |
|
|
datasets: |
|
|
- custom |
|
|
--- |
|
|
|
|
|
# Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a **4-bit GGUF quantized version** of Qwen 2.5 7B, fine-tuned to generate **3 thematically related German vocabulary words** (with English translations) for any given **single German input word**. |
|
|
|
|
|
- **Base Model:** `Qwen/Qwen2.5-7B-Instruct` |
|
|
- **Quantization:** GGUF 4-bit (Q4_K_M) |
|
|
- **Format:** Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines |
|
|
- **Primary Use Case:** Educational apps, language learning tools, vocabulary expansion from a single word |
|
|
|
|
|
The model is optimized for **word-level prompts** (e.g., "Strand", "Hotel", "Rechnung") and delivers structured JSON outputs that can be easily processed by applications. |
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Important Usage Notes |
|
|
|
|
|
- **Input Format:** |
|
|
Primarily a **single German word** (or a very short phrase). |
|
|
|
|
|
- **Output Format:** |
|
|
Always **exactly 3** thematically related vocabulary items as a JSON array: |
|
|
|
|
|
```json |
|
|
[ |
|
|
{"index": 1, "de": "...", "en": "..."}, |
|
|
{"index": 2, "de": "...", "en": "..."}, |
|
|
{"index": 3, "de": "...", "en": "..."} |
|
|
] |
|
|
``` |
|
|
|
|
|
### Example |
|
|
|
|
|
**Input:** |
|
|
|
|
|
```text |
|
|
Strand |
|
|
``` |
|
|
|
|
|
**Expected Output:** |
|
|
|
|
|
```json |
|
|
[ |
|
|
{"index": 1, "de": "Strandkorb", "en": "wicker beach chair"}, |
|
|
{"index": 2, "de": "Badehandtuch", "en": "beach towel"}, |
|
|
{"index": 3, "de": "Sonnencreme", "en": "sunscreen"} |
|
|
] |
|
|
``` |
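
Because the output follows this fixed schema, an application can parse and sanity-check it in a few lines. The sketch below is illustrative only (the helper name and checks are not part of the model); it assumes the raw model response is available as a string:

```python
import json

def parse_triplets(raw_output: str) -> list[dict]:
    """Parse the model's response and check the expected 3-item schema.

    Illustrative helper only; `raw_output` is whatever text the model returned.
    """
    items = json.loads(raw_output.strip())
    if not (isinstance(items, list) and len(items) == 3):
        raise ValueError("expected a JSON array with exactly 3 items")
    for position, item in enumerate(items, start=1):
        if item.get("index") != position or "de" not in item or "en" not in item:
            raise ValueError(f"malformed item at position {position}: {item}")
    return items

# Using the example output shown above:
triplets = parse_triplets(
    '[{"index": 1, "de": "Strandkorb", "en": "wicker beach chair"}, '
    '{"index": 2, "de": "Badehandtuch", "en": "beach towel"}, '
    '{"index": 3, "de": "Sonnencreme", "en": "sunscreen"}]'
)
print(triplets[0]["de"])  # Strandkorb
```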
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Fine-tuning steps:** 50 |
|
|
- **Final training loss:** 0.2671 |
|
|
- **Final validation loss:** 0.2792 |
|
|
|
|
|
**Task:** |
|
|
For a given **German word**, the model learns to generate 3 thematically related vocabulary items, each with a German and an English form, in a strict JSON schema.
|
|
|
|
|
**Training Data Format:** |
|
|
|
|
|
- `system`: Describes the task (3 related words, de/en, JSON, indices 1-3) |
|
|
- `user`: A single German word (e.g., "Hotel", "Flugzeug", "Bibliothek") |
|
|
- `assistant`: The target JSON array with exactly 3 word objects |
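
An illustrative record in this format is sketched below as a Python dict. The field names follow the common chat-messages convention used with SFTTrainer, and the example vocabulary is made up for illustration; neither is taken verbatim from the actual dataset.

```python
# Illustrative only: field names and example words are assumptions, not dataset content.
example_record = {
    "messages": [
        {"role": "system", "content": "Du bist ein linguistischer Assistent ..."},
        {"role": "user", "content": "Hotel"},
        {
            "role": "assistant",
            "content": (
                '[{"index": 1, "de": "Rezeption", "en": "reception"}, '
                '{"index": 2, "de": "Zimmerschlüssel", "en": "room key"}, '
                '{"index": 3, "de": "Frühstücksbuffet", "en": "breakfast buffet"}]'
            ),
        },
    ]
}
```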
|
|
|
|
|
The data covers common **everyday topics** (travel, hotel, restaurant, office, school, leisure, city, nature, etc.) and was prepared specifically for German language learners. |
|
|
|
|
|
Training was performed in a **Kaggle notebook environment** using Hugging Face Transformers + TRL (SFTTrainer). |
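
The exact training script is not included here; the following is a minimal sketch of a TRL `SFTTrainer` setup under assumed hyperparameters and an assumed dataset file name:

```python
# Minimal sketch only: dataset path, batch size, and learning rate are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="triplets_train.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen2.5-7b-semantic-triplets",
        max_steps=50,                    # matches the step count reported above
        per_device_train_batch_size=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
```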
|
|
After fine-tuning, the model was converted to **GGUF 4-bit** for efficient inference. |
|
|
|
|
|
This repository contains **only one GGUF model file** (no separate merged or adapter variants).
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Option 1: llama.cpp (Recommended) |
|
|
|
|
|
**Why llama.cpp?** |
|
|
GGUF is the native format of **llama.cpp**, which supports many model architectures (including Qwen2.5) and provides efficient inference on both CPU and GPU.
|
|
|
|
|
#### Installation |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/ggerganov/llama.cpp |
|
|
cd llama.cpp |
|
|
make |
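# Note: newer llama.cpp releases have replaced the Makefile with CMake; if `make`
# is unavailable, use: cmake -B build && cmake --build build --config Release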
|
|
``` |
|
|
|
|
|
#### Download Model (from Hugging Face) |
|
|
|
|
|
```bash |
|
|
huggingface-cli download BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF \ |
|
|
--local-dir ./models/ |
|
|
``` |
|
|
|
|
|
#### Run Inference |
|
|
|
|
|
```bash |
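# Note: in recent llama.cpp builds the example binary is named `llama-cli`;
# replace `./main` with `./build/bin/llama-cli` if needed.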
|
|
./main -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf \ |
|
|
-p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Strand\nAssistant:" \ |
|
|
-n 150 \ |
|
|
--temp 0.7 \ |
|
|
--top-p 0.9 |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
### Option 2: Ollama |
|
|
|
|
|
#### Installation |
|
|
|
|
|
```bash |
|
|
curl -fsSL https://ollama.com/install.sh | sh |
|
|
``` |
|
|
|
|
|
#### Modelfile |
|
|
|
|
|
Create a file named `Modelfile` next to your `.gguf` file: |
|
|
|
|
|
```text |
|
|
FROM ./qwen2.5-7b-instruct.Q4_K_M.gguf |
|
|
|
|
|
SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.""" |
|
|
|
|
|
PARAMETER temperature 0.7 |
|
|
PARAMETER top_p 0.9 |
|
|
PARAMETER stop "User:" |
|
|
PARAMETER stop "\n\n" |
|
|
``` |
|
|
|
|
|
#### Import and Run |
|
|
|
|
|
```bash |
|
|
ollama create qwen-triplets -f Modelfile |
|
|
ollama run qwen-triplets "Strand" |
|
|
``` |
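
Once imported, the model can also be queried from application code through Ollama's local HTTP API (default port 11434). A small sketch, assuming the `qwen-triplets` model created above is available:

```python
# Query the locally imported model via Ollama's /api/generate endpoint.
import json
import urllib.request

payload = json.dumps({
    "model": "qwen-triplets",   # name used in `ollama create` above
    "prompt": "Strand",
    "stream": False,
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```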
|
|
|
|
|
--- |
|
|
|
|
|
### Option 3: Python (llama-cpp-python) |
|
|
|
|
|
#### Installation |
|
|
|
|
|
```bash |
|
|
pip install llama-cpp-python |
|
|
``` |
|
|
|
|
|
#### Example Code |
|
|
|
|
|
```python |
|
|
from llama_cpp import Llama |
|
|
|
|
|
llm = Llama( |
|
|
model_path="./models/qwen2.5-7b-instruct.Q4_K_M.gguf", |
|
|
n_ctx=2048, |
|
|
n_threads=8, |
|
|
n_gpu_layers=35 # 0 for CPU-only; adjust for your GPU |
|
|
) |
|
|
|
|
|
system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.""" |
|
|
|
|
|
user_input = "Strand" |
|
|
|
|
|
prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:" |
|
|
|
|
|
output = llm( |
|
|
prompt, |
|
|
max_tokens=150, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
stop=["User:", "\n\n"], |
|
|
) |
|
|
|
|
|
print(output["choices"][0]["text"]) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
### Option 4: LM Studio (GUI) |
|
|
|
|
|
1. Download **LM Studio** from https://lmstudio.ai |
|
|
2. Import the GGUF file via **Local Models → Import** |
|
|
3. Select the model in the chat tab |
|
|
4. Set the system prompt (same as above) |
|
|
5. Enter German words as user input |
|
|
|
|
|
--- |
|
|
|
|
|
## Performance (Indicative) |
|
|
|
|
|
| Hardware | Inference Speed (per word) | Memory Usage | |
|
|
|----------------|----------------------------|-------------| |
|
|
| CPU (8 cores) | ~2–4 s | ~4–5 GB RAM | |
|
|
| GPU (8 GB VRAM)| ~1–2 s | ~5–6 GB VRAM| |
|
|
| Apple M1/M2 | ~1–3 s | ~5–6 GB RAM | |
|
|
|
|
|
Actual performance depends on your hardware and llama.cpp build options. |
|
|
|
|
|
--- |
|
|
|
|
|
## GGUF Benefits |
|
|
|
|
|
- ✅ Single, self-contained model file |
|
|
- ✅ 4-bit quantization provides good quality/speed tradeoff |
|
|
- ✅ Runs on CPU-only machines |
|
|
- ✅ Supported by many frontends (CLI, Ollama, LM Studio, Web UIs) |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Optimized for **single German words**, not for long sentences or dialogues |
|
|
- Output is always **exactly 3** vocabulary pairs (not dynamic) |
|
|
- Not designed for general chat or complex reasoning |
|
|
- 4-bit quantization introduces minor quality loss compared to full precision |
|
|
|
|
|
--- |
|
|
|
|
|
## File Structure |
|
|
|
|
|
```text |
|
|
Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF/ |
|
|
├── qwen2.5-7b-instruct.Q4_K_M.gguf |
|
|
├── config.json |
|
|
└── README.md |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
- **Base Model:** Qwen2.5-7B-Instruct – Apache 2.0 |
|
|
- **This fine-tuned GGUF variant:** Apache 2.0 |
|
|
|
|
|
Users are free to use, modify, and deploy this model (including commercial use) under the terms of the Apache 2.0 license. |
|
|
|
|
|
--- |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
|
|
- **GGUF Format & Inference:** [llama.cpp](https://github.com/ggerganov/llama.cpp) by @ggerganov |
|
|
- **Training:** Hugging Face Transformers + TRL |
|
|
|