---
language:
- de
- en
license: apache-2.0
tags:
- vocabulary
- education
- german
- language-learning
- gguf
- 4bit
- qwen2.5
- word-level
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: qwen2
quantization: 4bit
library_name: llama-cpp
pipeline_tag: text-generation
datasets:
- custom
---
# Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF
## Model Description
This is a **4-bit GGUF quantized version** of Qwen2.5-7B-Instruct, fine-tuned to generate **3 thematically related German vocabulary words** (with English translations) for any given **single German input word**.
- **Base Model:** `Qwen/Qwen2.5-7B-Instruct`
- **Quantization:** GGUF 4-bit (Q4_K_M)
- **Format:** Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines
- **Primary Use Case:** Educational apps, language learning tools, vocabulary expansion from a single word
The model is optimized for **word-level prompts** (e.g., "Strand" = beach, "Hotel" = hotel, "Rechnung" = invoice) and returns structured JSON output that applications can process directly.
---
## ⚠️ Important Usage Notes
- **Input Format:**
Primarily a **single German word** (or a very short phrase).
- **Output Format:**
Always **exactly 3** thematically related vocabulary items as a JSON array:
```json
[
{"index": 1, "de": "...", "en": "..."},
{"index": 2, "de": "...", "en": "..."},
{"index": 3, "de": "...", "en": "..."}
]
```
### Example
**Input:**
```text
Strand
```
**Expected Output:**
```json
[
{"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
{"index": 2, "de": "Badehandtuch", "en": "beach towel"},
{"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]
```
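Because the output is plain JSON, applications can parse and validate it directly. A minimal Python sketch using the example output above (`raw_output` stands in for whatever string your inference backend returns):
```python
import json

def parse_triplets(raw_output: str) -> list:
    """Parse the model's JSON array and verify the expected schema."""
    items = json.loads(raw_output)
    if not (isinstance(items, list) and len(items) == 3):
        raise ValueError("expected exactly 3 items")
    for i, item in enumerate(items, start=1):
        if item.get("index") != i or "de" not in item or "en" not in item:
            raise ValueError("items need a sequential 'index' plus 'de'/'en' fields")
    return items

raw_output = """[
  {"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
  {"index": 2, "de": "Badehandtuch", "en": "beach towel"},
  {"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]"""
for item in parse_triplets(raw_output):
    print(f"{item['de']} -> {item['en']}")
```
Since 4-bit quantized models occasionally emit malformed JSON, wrapping the parse in a try/except with a single retry is a cheap safeguard.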
---
## Training Details
- **Fine-tuning steps:** 50
- **Final training loss:** 0.2671
- **Final validation loss:** 0.2792
**Task:**
For a given **German word**, the model learns to generate 3 thematically related vocabulary items with German and English forms, in a strict JSON schema.
**Training Data Format:**
- `system`: Describes the task (3 related words, de/en, JSON, indices 1-3)
- `user`: A single German word (e.g., "Hotel", "Flugzeug", "Bibliothek")
- `assistant`: The target JSON array with exactly 3 word objects
The data covers common **everyday topics** (travel, hotel, restaurant, office, school, leisure, city, nature, etc.) and was prepared specifically for German language learners.
Training was performed in a **Kaggle notebook environment** using Hugging Face Transformers + TRL (SFTTrainer).
After fine-tuning, the model was converted to **GGUF 4-bit** for efficient inference.
There is **only one GGUF model file** (no extra merged/adapter variants).
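For illustration, a single training example in the conversational `messages` layout that TRL's `SFTTrainer` accepts might look like the sketch below (the system text is abbreviated; the vocabulary triple is hypothetical, not a row taken from the actual dataset):
```python
# Hypothetical training sample (illustrative; not taken from the real dataset)
sample = {
    "messages": [
        # the system message is the German task description shown in the Usage section
        {"role": "system", "content": "Du bist ein linguistischer Assistent ..."},
        {"role": "user", "content": "Hotel"},
        {
            "role": "assistant",
            "content": (
                '[{"index": 1, "de": "Rezeption", "en": "reception"}, '
                '{"index": 2, "de": "Zimmerschlüssel", "en": "room key"}, '
                '{"index": 3, "de": "Frühstücksbuffet", "en": "breakfast buffet"}]'
            ),
        },
    ]
}
```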
---
## Usage
### Option 1: llama.cpp (Recommended)
**Why llama.cpp?**
GGUF is the native format of **llama.cpp**, which now supports many architectures (including Qwen2.5). It provides very efficient CPU and GPU inference.
#### Installation
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# recent llama.cpp builds use CMake (older checkouts also supported plain `make`)
cmake -B build
cmake --build build --config Release
```
#### Download Model (from Hugging Face)
```bash
huggingface-cli download BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF \
--local-dir ./models/
```
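Alternatively, a single file can be fetched from Python with `huggingface_hub`; the filename below matches the one listed under File Structure:
```python
from huggingface_hub import hf_hub_download

# downloads just the GGUF file into ./models and returns its local path
path = hf_hub_download(
    repo_id="BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF",
    filename="qwen2.5-7b-instruct.Q4_K_M.gguf",
    local_dir="./models",
)
print(path)
```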
#### Run Inference
```bash
# note: in older llama.cpp builds this binary was ./main in the repository root
./build/bin/llama-cli -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf \
-p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Strand\nAssistant:" \
-n 150 \
--temp 0.7 \
--top-p 0.9
```
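The system prompt (in German) instructs the model to find exactly 3 thematically related main vocabulary words for the given German word, output them bilingually (German and English), give each a sequential index starting at 1, and return the result exclusively as a JSON array of objects with `index`, `de`, and `en` fields.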
---
### Option 2: Ollama
#### Installation
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
#### Modelfile
Create a file named `Modelfile` next to your `.gguf` file:
```text
FROM ./qwen2.5-7b-instruct.Q4_K_M.gguf
SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "User:"
PARAMETER stop "\n\n"
```
#### Import and Run
```bash
ollama create qwen-triplets -f Modelfile
ollama run qwen-triplets "Strand"
```
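Once created, the model can also be queried from code through Ollama's local REST API (default port `11434`); a minimal Python sketch using `requests`:
```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen-triplets",  # the name created above
        "prompt": "Strand",
        "stream": False,  # return the full response as a single JSON object
    },
)
print(resp.json()["response"])  # the JSON array with 3 word pairs
```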
---
### Option 3: Python (llama-cpp-python)
#### Installation
```bash
pip install llama-cpp-python
```
#### Example Code
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=35,  # 0 for CPU-only; adjust for your GPU
)

system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""
user_input = "Strand"
prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:"

output = llm(
    prompt,
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    stop=["User:", "\n\n"],
)
print(output["choices"][0]["text"])
```
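llama-cpp-python also provides a chat-style API that applies the chat template embedded in the GGUF; a sketch of the same request in that style, reusing `llm` and `system_prompt` from the example above (whether the raw-prompt or chat-template format matches the fine-tuning data better is worth testing on your own inputs):
```python
# Same request via the chat-completion API (applies the GGUF's built-in chat template)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Strand"},
    ],
    max_tokens=150,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```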
---
### Option 4: LM Studio (GUI)
1. Download **LM Studio** from https://lmstudio.ai
2. Import the GGUF file via **Local Models → Import**
3. Select the model in the chat tab
4. Set the system prompt (same as above)
5. Enter German words as user input (for programmatic access, see the sketch below)
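LM Studio can also serve the model through its OpenAI-compatible local server; a minimal sketch with the `openai` Python client, assuming the server is enabled on its default port 1234:
```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is a placeholder
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

system_prompt = "Du bist ein linguistischer Assistent ..."  # full German prompt from the Usage section

response = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to the currently loaded model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Strand"},
    ],
)
print(response.choices[0].message.content)
```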
---
## Performance (Indicative)
| Hardware | Inference Speed (per word) | Memory Usage |
|----------------|----------------------------|-------------|
| CPU (8 cores) | ~2–4 s | ~4–5 GB RAM |
| GPU (8 GB VRAM)| ~1–2 s | ~5–6 GB VRAM|
| Apple M1/M2 | ~1–3 s | ~5–6 GB RAM |
Actual performance depends on your hardware and llama.cpp build options.
---
## GGUF Benefits
- ✅ Single, self-contained model file
- ✅ 4-bit quantization provides a good quality/speed tradeoff
- ✅ Runs on CPU-only machines
- ✅ Supported by many frontends (CLI, Ollama, LM Studio, Web UIs)
---
## Limitations
- Optimized for **single German words**, not for long sentences or dialogues
- Output is always **exactly 3** vocabulary pairs (not dynamic)
- Not designed for general chat or complex reasoning
- 4-bit quantization introduces minor quality loss compared to full precision
---
## File Structure
```text
Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF/
├── qwen2.5-7b-instruct.Q4_K_M.gguf
├── config.json
└── README.md
```
---
## License
- **Base Model:** Qwen2.5-7B-Instruct – Apache 2.0
- **This fine-tuned GGUF variant:** Apache 2.0
Users are free to use, modify, and deploy this model (including commercial use) under the terms of the Apache 2.0 license.
---
## Acknowledgments
- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **GGUF Format & Inference:** [llama.cpp](https://github.com/ggerganov/llama.cpp) by @ggerganov
- **Training:** Hugging Face Transformers + TRL