---
language:
- de
- en
license: apache-2.0
tags:
- vocabulary
- education
- german
- language-learning
- gguf
- 4bit
- qwen2.5
- word-level
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: qwen2
quantization: 4bit
library_name: llama-cpp
pipeline_tag: text-generation
datasets:
- custom
---

# Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF

## Model Description

This is a **4-bit GGUF quantized version** of Qwen 2.5 7B, fine-tuned to generate **3 thematically related German vocabulary words** (with English translations) for any given **single German input word**.

- **Base Model:** `Qwen/Qwen2.5-7B-Instruct`
- **Quantization:** GGUF 4-bit (Q4_K_M)
- **Format:** Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines
- **Primary Use Case:** Educational apps, language-learning tools, vocabulary expansion from a single word

The model is optimized for **word-level prompts** (e.g., "Strand", "Hotel", "Rechnung") and emits structured JSON output that applications can parse directly.

---

## ⚠️ Important Usage Notes

- **Input Format:** Primarily a **single German word** (or a very short phrase).
- **Output Format:** Always **exactly 3** thematically related vocabulary items as a JSON array:

```json
[
  {"index": 1, "de": "...", "en": "..."},
  {"index": 2, "de": "...", "en": "..."},
  {"index": 3, "de": "...", "en": "..."}
]
```

### Example

**Input:**

```text
Strand
```

**Expected Output:**

```json
[
  {"index": 1, "de": "Strandkorb", "en": "wicker beach chair"},
  {"index": 2, "de": "Badehandtuch", "en": "beach towel"},
  {"index": 3, "de": "Sonnencreme", "en": "sunscreen"}
]
```

---

## Training Details

- **Fine-tuning steps:** 50
- **Final training loss:** 0.2671
- **Final validation loss:** 0.2792

**Task:** Given a **single German word**, the model learns to generate 3 thematically related vocabulary items, each with a German and an English form, following a strict JSON schema.

**Training Data Format** (an illustrative example is sketched at the end of this section):

- `system`: Describes the task (3 related words, de/en, JSON, indices 1–3)
- `user`: A single German word (e.g., "Hotel", "Flugzeug", "Bibliothek")
- `assistant`: The target JSON array with exactly 3 word objects

The data covers common **everyday topics** (travel, hotel, restaurant, office, school, leisure, city, nature, etc.) and was prepared specifically for German language learners.

Training was performed in a **Kaggle notebook environment** using Hugging Face Transformers + TRL (SFTTrainer). After fine-tuning, the model was converted to **GGUF 4-bit** for efficient inference. There is **only one GGUF model file** (no extra merged/adapter variants).
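For illustration, a single training example in the chat-messages format that TRL's `SFTTrainer` accepts might look like the sketch below. The exact dataset layout and the "Hotel" triplet shown are assumptions for illustration, not excerpts from the actual training data:

```python
# Hypothetical training example in the chat-messages format consumed by
# TRL's SFTTrainer; the real dataset's exact layout may differ.
example = {
    "messages": [
        {
            "role": "system",
            # Same system prompt as shown in the Usage section (shortened here).
            "content": "Du bist ein linguistischer Assistent für eine Sprachenlern-App. ...",
        },
        {"role": "user", "content": "Hotel"},
        {
            "role": "assistant",
            # Target completion: the serialized 3-item JSON array.
            "content": (
                '[{"index": 1, "de": "Rezeption", "en": "reception"}, '
                '{"index": 2, "de": "Zimmerschlüssel", "en": "room key"}, '
                '{"index": 3, "de": "Frühstücksbuffet", "en": "breakfast buffet"}]'
            ),
        },
    ]
}
```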
---

## Usage

### Option 1: llama.cpp (Recommended)

**Why llama.cpp?** GGUF is the native format of **llama.cpp**, which supports many model architectures (including Qwen2.5) and provides very efficient CPU and GPU inference.

#### Installation

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

#### Download Model (from Hugging Face)

```bash
huggingface-cli download BlackbirdTI/Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF \
  --local-dir ./models/
```

#### Run Inference

```bash
./build/bin/llama-cli -m ./models/qwen2.5-7b-instruct.Q4_K_M.gguf \
  -p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Strand\nAssistant:" \
  -n 150 \
  --temp 0.7 \
  --top-p 0.9
```

---

### Option 2: Ollama

#### Installation

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

#### Modelfile

Create a file named `Modelfile` next to your `.gguf` file:

```text
FROM ./qwen2.5-7b-instruct.Q4_K_M.gguf

SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "User:"
PARAMETER stop "\n\n"
```

#### Import and Run

```bash
ollama create qwen-triplets -f Modelfile
ollama run qwen-triplets "Strand"
```

---

### Option 3: Python (llama-cpp-python)

#### Installation

```bash
pip install llama-cpp-python
```

#### Example Code

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=35,  # 0 for CPU-only; adjust for your GPU
)

system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Wort exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

user_input = "Strand"
prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:"

output = llm(
    prompt,
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    stop=["User:", "\n\n"],
)

print(output["choices"][0]["text"])
```
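Because the model is trained to return a strict JSON array, it is worth validating the output before using it downstream. The sketch below continues the example above (it reuses the `output` variable); the `parse_triplets` helper is our own illustration, not part of llama-cpp-python:

```python
import json

def parse_triplets(raw: str) -> list:
    """Extract and validate the 3-item JSON array from a model completion."""
    # Quantized models can occasionally emit stray text around the array,
    # so slice from the first '[' to the last ']' before parsing.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end == -1:
        raise ValueError(f"no JSON array found in output: {raw!r}")
    items = json.loads(raw[start:end + 1])
    if len(items) != 3:
        raise ValueError(f"expected exactly 3 items, got {len(items)}")
    for item in items:
        if not isinstance(item, dict) or {"index", "de", "en"} - item.keys():
            raise ValueError(f"malformed item: {item!r}")
    return items

for item in parse_triplets(output["choices"][0]["text"]):
    print(f"{item['index']}. {item['de']} – {item['en']}")
```

Keeping the check strict (exactly 3 items, all three fields present) surfaces malformed generations early instead of letting them propagate into a learning app.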
---

### Option 4: LM Studio (GUI)

1. Download **LM Studio** from https://lmstudio.ai
2. Import the GGUF file via **Local Models → Import**
3. Select the model in the chat tab
4. Set the system prompt (same as above)
5. Enter German words as user input

---

## Performance (Indicative)

| Hardware        | Inference Speed (per word) | Memory Usage |
|-----------------|----------------------------|--------------|
| CPU (8 cores)   | ~2–4 s                     | ~4–5 GB RAM  |
| GPU (8 GB VRAM) | ~1–2 s                     | ~5–6 GB VRAM |
| Apple M1/M2     | ~1–3 s                     | ~5–6 GB RAM  |

Actual performance depends on your hardware and llama.cpp build options.

---

## GGUF Benefits

- ✅ Single, self-contained model file
- ✅ 4-bit quantization provides a good quality/speed tradeoff
- ✅ Runs on CPU-only machines
- ✅ Supported by many frontends (CLI, Ollama, LM Studio, Web UIs)

---

## Limitations

- Optimized for **single German words**, not for long sentences or dialogues
- Output is always **exactly 3** vocabulary pairs (not dynamic)
- Not designed for general chat or complex reasoning
- 4-bit quantization introduces minor quality loss compared to full precision

---

## File Structure

```text
Qwen2.5-7B-Semantic-Triplets-DE-EN-GGUF/
├── qwen2.5-7b-instruct.Q4_K_M.gguf
├── config.json
└── README.md
```

---

## License

- **Base Model:** Qwen2.5-7B-Instruct – Apache 2.0
- **This fine-tuned GGUF variant:** Apache 2.0

Users are free to use, modify, and deploy this model (including commercial use) under the terms of the Apache 2.0 license.

---

## Acknowledgments

- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **GGUF Format & Inference:** [llama.cpp](https://github.com/ggerganov/llama.cpp) by @ggerganov
- **Training:** Hugging Face Transformers + TRL