Qwen2.5-7B-DE-EN-Vocab-Extractor-GGUF / README.md

BlackbirdTI

Update README.md

9ca4541 verified 2 days ago

preview code

raw

history blame contribute delete

8.8 kB

metadata

language:
  - de
  - en
license: apache-2.0
tags:
  - vocabulary
  - education
  - german
  - language-learning
  - gguf
  - 4bit
  - qwen2.5
  - sentence-context
base_model: Qwen/Qwen2.5-7B-Instruct
model_type: qwen2
quantization: 4bit
library_name: llama-cpp
pipeline_tag: text-generation
datasets:
  - custom

Qwen2.5-7B-DE-EN-Vocab-Extractor-GGUF

Model Description

This is a 4-bit GGUF quantized version of Qwen 2.5 7B, fine-tuned to extract and generate 3 thematically related German vocabulary words (with English translations) from German sentences or phrases.

Base Model: Qwen/Qwen2.5-7B-Instruct
Quantization: GGUF 4-bit (e.g. Q4_K_M)
Format: Compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines
Primary Use Case: Educational apps, language learning tools, vocabulary expansion from context

This README is intended as a complete description you can paste directly into the Hugging Face model page.

⚠️ Important Usage Notes

Input Format: Use complete German sentences or phrases (minimum ~3 words recommended).
The model was fine-tuned on contextual inputs, not isolated single words.
Output Format: Always generates exactly 3 related vocabulary words in JSON format:

[
  {"index": 1, "de": "...", "en": "..."},
  {"index": 2, "de": "...", "en": "..."},
  {"index": 3, "de": "...", "en": "..."}
]

✅ Good Input Examples

"Ich gehe am Wochenende an den Strand."
"Die Kinder spielen auf dem Spielplatz."
"Wir kaufen Brot in der Bäckerei."
"Der Arzt verschreibt ein neues Medikament."

❌ Less Suitable Inputs

"Strand"
"Spielplatz"
"Arzt"

Single-word prompts may still work, but quality is better with full sentences.

Training Details

Fine-tuning steps: 50
Final training loss: 0.2386
Final validation loss: 0.2654
Task: For a given German sentence, predict 3 thematically related vocabulary items (German + English) in a strict JSON schema.
Dataset format: JSONL with chat-style messages (system, user, assistant).
Frameworks: Hugging Face Transformers + TRL (SFTTrainer).
Conversion: After fine-tuning, the model was converted to GGUF 4-bit for efficient inference.

There is only one GGUF model file (no separate merged / adapter variants).

Usage

Option 1: llama.cpp (Recommended for Performance)

Why llama.cpp?
GGUF is a model format defined by the llama.cpp project. Although the name comes from LLaMA, llama.cpp is model-agnostic and supports Qwen, Mistral, Gemma and many other architectures. It is currently the most optimized and widely used backend for GGUF models.

Installation

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Download Model (from Hugging Face)

huggingface-cli download YOUR-USERNAME/Qwen2.5-7B-DE-EN-Vocab-Extractor-GGUF \
  --local-dir ./models/

Run Inference

./main -m ./models/qwen2.5-7b-de-en-vocab-extractor-q4_k_m.gguf \
  -p "Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Satz exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.\n\nUser: Ich gehe am Wochenende an den Strand.\nAssistant:" \
  -n 200 \
  --temp 0.7 \
  --top-p 0.9

Option 2: Ollama

Installation

curl -fsSL https://ollama.com/install.sh | sh

Modelfile

Create a file named Modelfile next to your qwen2.5-7b-de-en-vocab-extractor-q4_k_m.gguf:

FROM ./qwen2.5-7b-de-en-vocab-extractor-q4_k_m.gguf

SYSTEM """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Satz exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

PARAMETER temperature 0.5
PARAMETER top_p 0.9
PARAMETER stop "User:"
PARAMETER stop "\n\n"

Import and Run

ollama create qwen-vocab -f Modelfile
ollama run qwen-vocab "Ich gehe am Wochenende an den Strand."

Option 3: Python (llama-cpp-python)

Installation

pip install llama-cpp-python

Example Code

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="./models/qwen2.5-7b-de-en-vocab-extractor-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=35  # 0 for CPU-only, adjust for your GPU
)

system_prompt = """Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Satz exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten."""

user_input = "Die Kinder spielen auf dem Spielplatz."

prompt = f"{system_prompt}\n\nUser: {user_input}\nAssistant:"

output = llm(
    prompt,
    max_tokens=200,
    temperature=0.7,
    top_p=0.9,
    stop=["User:", "\n\n"],
)

print(output["choices"][0]["text"])

Example Output:

[
  {"index": 1, "de": "Schaukel", "en": "swing"},
  {"index": 2, "de": "Sandkasten", "en": "sandpit"},
  {"index": 3, "de": "Rutsche", "en": "slide"}
]

Option 4: LM Studio (GUI)

Download LM Studio from https://lmstudio.ai
Import the GGUF file via Local Models → Import.
Select the model in the chat tab.
Set the system prompt to:

Du bist ein linguistischer Assistent für eine Sprachenlern-App. Deine Aufgabe ist es, zu einem gegebenen deutschen Satz exakt 3 thematisch verwandte Hauptvokabeln zu finden und diese bilingual (Deutsch und Englisch) auszugeben. Jedes Wort MUSS einen eindeutigen sequenziellen Index haben, beginnend bei 1. Gib das Ergebnis ausschließlich als JSON-Array mit Objekten aus, die je ein 'index', 'de' und 'en' Feld enthalten.

Enter German sentences as user messages.

Performance (Indicative)

Hardware	Inference Speed	Memory Usage
CPU (8 cores)	~3–5 s per sentence	~4–5 GB RAM
GPU (8 GB VRAM)	~1–2 s per sentence	~5–6 GB VRAM
Apple M1/M2	~2–3 s per sentence	~5–6 GB RAM

Actual performance depends on your exact hardware and llama.cpp build options.

Why llama.cpp for Qwen Models?

GGUF is the native format of llama.cpp, but it supports many architectures (including Qwen2.5).
Very efficient CPU and GPU backends (including Metal for Apple Silicon).
Many tools (Ollama, LM Studio, text-generation-webui, etc.) build on llama.cpp or support its GGUF format.

The project name is historical; the engine itself is not limited to LLaMA models.

GGUF Benefits

✅ Single self-contained file (easy to distribute and deploy)
✅ Good quality/speed trade-off with 4-bit quantization
✅ Runs on CPU-only machines
✅ Works with many frontends (CLI, GUIs, web UIs)

Limitations

Optimized for German input sentences; not a general-purpose chat model.
Always returns exactly 3 vocabulary entries.
Not tuned for tasks like long-form generation, translation of full documents, or complex reasoning.
4-bit quantization introduces small quality trade-offs compared to full-precision weights.

File Structure (Suggested)

Qwen2.5-7B-DE-EN-Vocab-Extractor-GGUF/
├── qwen2.5-7b-de-en-vocab-extractor-q4_k_m.gguf
├── tokenizer.json
├── config.json
└── README.md

Example Use Cases

Language Learning App

User reads a German sentence.
Model extracts 3 key vocabulary items with English translations.
These can be turned into flashcards or exercises.

Reading Assistant

While reading articles or books in German, difficult sentences can be sent to the model.
The model returns 3 useful words per sentence to focus vocabulary study.

Vocabulary Trainer Backend

Use this model as an offline backend to generate structured vocabulary items from arbitrary German input.

License

Base Model: Qwen2.5-7B-Instruct
license: apache-2.0