Kikuyu-LoRA-Llama-3.2-3B-Instruct
A Llama 3.2 3B Instruct model fine-tuned with LoRA (Low-Rank Adaptation) for Kikuyu text generation. It was trained on 120,000 monolingual Kikuyu sentences to understand and generate text in Kikuyu (Gĩkũyũ).
Model Details
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Language: Kikuyu (Gĩkũyũ)
- Dataset: 120k Kikuyu monolingual sentences
- Training epochs: 10
- Custom tokenizer: Optimized BPE tokenizer for Kikuyu text
Features
- Specialized Kikuyu language understanding
- Memory-efficient LoRA architecture
- Custom tokenizer handling Kikuyu diacritics (ũ, ĩ, etc.); see the quick check after this list
- Optimized for Kikuyu text generation tasks
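To sanity-check the diacritic handling mentioned above, you can tokenize a sample phrase and round-trip it. A minimal sketch: the exact subword splits depend on the trained BPE merges, so the printed tokens are illustrative rather than verified output.

from transformers import AutoTokenizer

# Tokenizer repo name taken from the Usage section below
tokenizer = AutoTokenizer.from_pretrained("thirtyninetythree/kikuyu-bpe-tokenizer")
print(tokenizer.tokenize("Mũndũ ũrĩa"))  # subword splits vary with the learned merges
ids = tokenizer.encode("Mũndũ ũrĩa", add_special_tokens=False)
print(tokenizer.decode(ids))  # should round-trip with diacritics intact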
Installation
pip install torch transformers peft accelerate
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load Kikuyu tokenizer
tokenizer = AutoTokenizer.from_pretrained("thirtyninetythree/kikuyu-bpe-tokenizer")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Load base model and resize embeddings
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
base_model.resize_token_embeddings(len(tokenizer))
# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "thirtyninetythree/kikuyu-lora-llama-3.2-3b-instruct"
)
# Generation function (temperature and top_p are parameters so the
# Generation Parameters examples below work as written)
def generate_kikuyu(prompt, max_length=100, temperature=0.8, top_p=0.9):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    inputs.pop("token_type_ids", None)  # remove unused token types
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        temperature=temperature,
        do_sample=True,
        top_p=top_p,
        repetition_penalty=1.2,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example usage
examples = [
    "Githeri ni",
    "Mũndũ ũrĩa",
    "Ngai ni",
    "We mwega"
]
for prompt in examples:
    result = generate_kikuyu(prompt)
    print(f"Input: {prompt}")
    print(f"Output: {result}")
    print("-" * 50)
Generation Parameters
You can adjust generation parameters for different outputs:
# More creative (higher temperature)
result = generate_kikuyu("Githeri ni", temperature=1.0, top_p=0.95)
# More focused (lower temperature)
result = generate_kikuyu("Githeri ni", temperature=0.5, top_p=0.8)
# Longer output
result = generate_kikuyu("Githeri ni", max_length=200)
Requirements
- Python 3.8+
- PyTorch 2.0+ (recommended)
- Transformers 4.45+ (version that adds Llama 3.2 support)
- PEFT library
- CUDA-capable GPU (recommended)
Hardware Requirements
- Minimum: 8GB GPU memory
- Recommended: 16GB+ GPU memory
- CPU: 4+ cores
- RAM: 16GB+
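On an 8 GB card, float16 weights for a 3B model plus the KV cache can be tight. One option is to load the base model in 4-bit with bitsandbytes before attaching the adapter. A minimal sketch, assuming bitsandbytes is installed; the rest of the Usage section is unchanged.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantized base model to fit small GPUs (assumes `pip install bitsandbytes`)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model.resize_token_embeddings(len(tokenizer))  # tokenizer from the Usage section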
Limitations
- Trained primarily on monolingual Kikuyu text
- Performance may vary on mixed-language inputs
- Best results with culturally relevant Kikuyu contexts
- May require prompt engineering for optimal outputs
Model Architecture
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: All attention and MLP layers
- Dropout: 0.1
- Vocabulary size: ~32,000 tokens (Kikuyu-optimized)
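For reference, the hyperparameters above correspond to a PEFT configuration roughly like the following. This is a reconstruction, not the exact training script; the target_modules names assume the standard Llama projection layers.

from peft import LoraConfig

# Reconstructed from the hyperparameters listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)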
Citation
If you use this model, please cite:
@misc{kikuyu-lora-llama-3.2-3b,
  title={Kikuyu-LoRA-Llama-3.2-3B-Instruct},
  author={thirtyninetythree},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/thirtyninetythree/kikuyu-lora-llama-3.2-3b-instruct}
}
License
This model inherits the license from the base Llama 3.2 model. Please refer to Meta's Llama 3.2 Community License for usage terms.