
Qwen3-8B Swahili to English Translation

A fine-tuned version of Qwen/Qwen3-8B for translating Swahili text into English. The model was trained with QLoRA (4-bit quantized base plus LoRA adapters) and the adapters were merged back into the base, so this repository contains a standalone 16-bit model that loads with a single from_pretrained call.

Model Details

  • Base model: Qwen/Qwen3-8B
  • Task: Swahili to English translation (sw to en)
  • Fine-tuning method: QLoRA (NF4 4-bit base, LoRA adapters, merged to fp16)
  • Languages: Swahili (source), English (target)
  • Author: Prince (PrincekrampahReal)

Available Formats

| Repository | Format | Use case |
|---|---|---|
| PrincekrampahReal/Qwen3-8B-sw-en_fine-tuned | Merged fp16 | Python inference, vLLM serving |
| PrincekrampahReal/Qwen3-8B-sw-en-lora | LoRA adapter | Load on top of the base model |
| PrincekrampahReal/qwen_finetune | GGUF (q4_k_m, q8_0, f16) | Ollama, llama.cpp, local CPU |

Intended Use

The model is built to translate Swahili sentences into English. It expects a system instruction stating the task and the Swahili text as the user turn. It performs best on the kind of text it was trained on (see Training Data and Limitations below).

How to Use

Transformers

On a memory-limited GPU, load in 4-bit as shown. On a larger GPU, drop the quantization config and pass torch_dtype=torch.float16 to load the full fp16 weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "PrincekrampahReal/Qwen3-8B-sw-en_fine-tuned"

# 4-bit NF4 quantization (requires the bitsandbytes package).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(MODEL, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def translate(swahili_text, max_new_tokens=128):
    # Prompt format the model expects: a system instruction plus the Swahili text as the user turn.
    messages = [
        {"role": "system", "content": "Translate the following Swahili text into English."},
        {"role": "user",   "content": swahili_text},
    ]
    # enable_thinking=False is passed through to the Qwen3 chat template so the
    # model answers directly instead of opening a reasoning block.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        enable_thinking=False,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    # Greedy decoding for a deterministic, faithful translation.
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the prompt.
    gen = out[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(gen, skip_special_tokens=True, clean_up_tokenization_spaces=False).strip()

print(translate("Mungu ni upendo."))
```

Ollama

```shell
ollama run hf.co/PrincekrampahReal/qwen_finetune:q4_k_m
```

For a configured setup, create a Modelfile:

```
FROM hf.co/PrincekrampahReal/qwen_finetune:q4_k_m
SYSTEM "Translate the following Swahili text into English."
PARAMETER temperature 0
```

Then build and run the configured model:

```shell
ollama create sw-en -f Modelfile
ollama run sw-en "Mungu ni upendo."
```

A note on decoding: use greedy decoding (do_sample=False or temperature 0) for faithful translation, and keep enable_thinking=False so the model translates directly instead of emitting a reasoning block.
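If a reasoning block does appear in the output anyway (for example with a runtime that does not expose enable_thinking, or if the think tags survive decoding), a small post-processing step can remove it. This is a minimal sketch, assuming the block is delimited by `<think>` and `</think>` as in Qwen3's chat format; `strip_reasoning` is a hypothetical helper, not part of transformers or Ollama.

```python
import re

# Non-greedy match so each <think>...</think> block is removed separately;
# DOTALL lets "." match the newlines inside the block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Hypothetical helper: drop any reasoning block, then trim whitespace."""
    return THINK_BLOCK.sub("", text).strip()

print(strip_reasoning("<think>\nThe user wants a translation.\n</think>\n\nGod is love."))
# God is love.
```

Text without a reasoning block passes through unchanged apart from trimming, so the helper is safe to apply to every response.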

Training Data

The model was trained on a cleaned version of kariiiiiimu/english-to-swahili, a parallel Swahili and English corpus of biblical text. The cleaning pipeline applied:

  • Orientation normalization: the source dataset had inconsistent column orientation (some rows had Swahili in the English column and vice versa). Language detection was applied to each side, and reversed rows were flipped so that the English and Swahili columns are consistent.
  • Exact deduplication of (source, target) pairs.
  • Near-duplicate removal using MinHash and LSH on the source text.
  • Quality filters: removal of empty rows, untranslated rows where source equals target, and rows with an implausible source-to-target length ratio.
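The deduplication and filtering steps above can be sketched in plain Python. This is a minimal illustration, not the pipeline actually used: it assumes rows are already oriented (the language-detection step is omitted), and the 3-character shingles, 64-slot MinHash split into 16 LSH bands of 4 rows, 0.8 similarity threshold and 3:1 length-ratio limit are all hypothetical choices. The example sentences are illustrative only.

```python
import hashlib
import re
from collections import defaultdict

def shingles(text, k=3):
    """Character k-grams of the lowercased, whitespace-normalized text."""
    t = re.sub(r"\s+", " ", text.lower()).strip()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def minhash(shingle_set, num_perm):
    """MinHash signature: the minimum of each seeded 64-bit hash over the shingles."""
    return [
        min(int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingle_set)
        for seed in range(num_perm)
    ]

def jaccard_estimate(sig_a, sig_b):
    """The fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def clean_pairs(pairs, max_ratio=3.0, threshold=0.8, bands=16, rows=4):
    """Filter (swahili, english) pairs; assumes orientation is already fixed."""
    kept, seen_pairs, kept_sigs = [], set(), []
    buckets = defaultdict(list)  # (band index, band slice) -> indices into kept_sigs
    for sw, en in pairs:
        sw, en = sw.strip(), en.strip()
        if not sw or not en or sw == en:               # empty or untranslated rows
            continue
        ratio = len(sw) / len(en)
        if not (1 / max_ratio <= ratio <= max_ratio):  # implausible length ratio
            continue
        if (sw, en) in seen_pairs:                     # exact duplicate pair
            continue
        seen_pairs.add((sw, en))
        sig = minhash(shingles(sw), bands * rows)
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]
        # LSH: only rows sharing at least one band bucket are compared.
        candidates = {i for key in keys for i in buckets.get(key, [])}
        if any(jaccard_estimate(sig, kept_sigs[i]) >= threshold for i in candidates):
            continue                                   # near duplicate of an earlier source
        for key in keys:
            buckets[key].append(len(kept_sigs))
        kept_sigs.append(sig)
        kept.append((sw, en))
    return kept

LONG_SW = "Kwa maana jinsi hii Mungu aliupenda ulimwengu, hata akamtoa Mwanawe pekee."
LONG_EN = "For God so loved the world that he gave his only Son."
rows = [
    (LONG_SW, LONG_EN),
    (LONG_SW, LONG_EN),                                      # exact duplicate
    ("", "God is love."),                                    # empty source
    ("Amina.", "Amina."),                                    # untranslated
    (LONG_SW[:-1] + "!", LONG_EN[:-1] + "!"),                # near duplicate
    ("Yesu alilia.", "Jesus wept."),
    ("Ndiyo " * 30, "Yes."),                                 # implausible length ratio
]
cleaned = clean_pairs(rows)
print(cleaned)
```

Banding keeps the cost close to linear, since each row is only compared with rows that collide in some band; the estimated-Jaccard check then discards chance collisions.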

Training Procedure

The base model was loaded in 4-bit (NF4) and fine-tuned with LoRA adapters, then the adapters were merged into a 16-bit model for distribution.
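Merging folds each adapter into its base weight so inference needs no extra adapter computation. In the standard LoRA formulation an adapted layer computes W x + (alpha / r) B A x, so the merged weight is W + (alpha / r) B A. The sketch below checks that identity on hypothetical toy matrices in pure Python; real merges (for example PEFT's merge_and_unload) apply the same update to every adapted projection, and this card does not state which tool performed the merge.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def lora_forward(W, A, B, alpha, r, x):
    """Unmerged path: base output plus the scaled low-rank update B(Ax)."""
    scale = alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

def merge_lora(W, A, B, alpha, r):
    """Merged weight: W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

# Hypothetical toy shapes: W is d_out x d_in (3 x 2), A is r x d_in, B is d_out x r.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[1.0, 2.0]]
B = [[1.0], [0.0], [2.0]]
alpha, r = 2, 1      # rank 1 for brevity; same alpha / r ratio (2.0) as 32 / 16
x = [3.0, 4.0]

merged = merge_lora(W, A, B, alpha, r)
print(lora_forward(W, A, B, alpha, r, x))  # [25.0, 4.0, 51.0]
print(matvec(merged, x))                    # [25.0, 4.0, 51.0]
```

Dropout on the adapter input is only active during training, which is why the merged weight alone reproduces the adapted layer at inference time.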

| Hyperparameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 (2 per device, 4 gradient accumulation steps) |
| Learning rate | 2e-4 |
| LR scheduler | linear |
| Optimizer | adamw_8bit |
| Weight decay | 0.001 |
| Loss | assistant-only (trained on the English output, not the prompt) |
| Precision | fp16 (4-bit base via bitsandbytes) |
| Hardware | NVIDIA T4 |
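A few quantities follow directly from the hyperparameter table, and the arithmetic is sketched below. It assumes a single GPU (the table lists one T4). The 4096 x 4096 projection and the 10,000-pair dataset are hypothetical illustration values, not figures from this training run.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1              # assumption: a single T4
LORA_R, LORA_ALPHA = 16, 32

def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples that contribute to each optimizer step."""
    return per_device * grad_accum * num_devices

def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

def optimizer_steps(num_examples, batch, epochs):
    """Approximate optimizer steps over the whole run."""
    return math.ceil(num_examples / batch) * epochs

batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
scale = LORA_ALPHA / LORA_R                  # multiplier applied to B @ A
added = lora_params(4096, 4096, LORA_R)      # hypothetical 4096 x 4096 projection
fraction = added / (4096 * 4096)             # share of that matrix's size
steps = optimizer_steps(10_000, batch, epochs=3)  # hypothetical dataset size

print(batch, scale, added, fraction, steps)  # 8 2.0 131072 0.0078125 3750
```

With r = 16, the adapter on a square 4096-wide projection trains under 1% as many parameters as the full matrix, which is what makes fine-tuning an 8B model on a T4 feasible.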

Evaluation

The model was evaluated with BLEU and chrF (via sacrebleu) against a held-out set, comparing the fine-tuned model to the base model.

| Metric | Base | Fine-tuned |
|---|---|---|
| BLEU | 22.46 | 44.16 |
| chrF | 43.34 | 70.88 |

chrF is likely the more informative metric here. BLEU matches whole words, so an output word that differs from the reference only in inflection or spelling (for example "loves" against "loved") earns no credit, while character-level chrF gives partial credit for the characters the two share.
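To make the partial-credit point concrete, here is a simplified chrF in pure Python: character n-grams of orders 1 to 6 with whitespace removed, precision and recall averaged over orders, then combined as an F-score with beta = 2. It is an illustrative sketch, not sacrebleu's implementation, so its numbers will not match sacrebleu exactly; the reported scores come from sacrebleu. `exact_word_matches` is a hypothetical helper that counts whole-word matches, standing in for the word-level view BLEU takes.

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-gram counts with whitespace removed."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hyp, ref, max_n=6, beta=2.0):
    """Simplified sentence-level chrF on a 0-100 scale."""
    precs, recs = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:              # skip orders longer than either string
            continue
        overlap = sum((h & r).values())
        precs.append(overlap / sum(h.values()))
        recs.append(overlap / sum(r.values()))
    if not precs:
        return 0.0
    p, rc = sum(precs) / len(precs), sum(recs) / len(recs)
    if p + rc == 0:
        return 0.0
    b2 = beta ** 2                      # beta = 2 weights recall more than precision
    return 100 * (1 + b2) * p * rc / (b2 * p + rc)

def exact_word_matches(hyp, ref):
    """Hypothetical helper: number of whole words shared by hyp and ref."""
    return sum((Counter(hyp.split()) & Counter(ref.split())).values())

print(exact_word_matches("loves", "loved"))   # 0
print(round(chrf("loves", "loved"), 1))       # 54.3
```

At the word level "loves" and "loved" simply do not match, while the character view credits the shared stem "love" and scores the pair far above an unrelated word.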

Limitations and Bias

  • Domain: the training data is biblical parallel text. The model is strongest on that register and vocabulary and may translate general, conversational, or technical Swahili less accurately.
  • Coverage: the training set is relatively small, so rare words, idioms, and named entities outside the training domain may be mistranslated.
  • Direction: the model is trained specifically for Swahili to English. It is not intended for English to Swahili.
  • Reasoning mode: the base model supports a thinking mode. This fine-tune is designed for direct translation, so generate with enable_thinking=False.

License

Released under the Apache 2.0 license, inherited from the Qwen3-8B base model.

Citation

If you use this model, please credit the base model and this fine-tune:

```bibtex
@misc{qwen3-sw-en,
  title  = {Qwen3-8B Swahili to English Translation},
  author = {Prince},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/PrincekrampahReal/Qwen3-8B-sw-en_fine-tuned}},
}
```