
Kolkha-Mini

Kolkha-Mini is a lightweight language model fine-tuned to specialize in the Georgian language.
It is intended as an early-stage foundation model for Georgian-focused NLP work.

This model prioritizes coherence and language exposure over grammatical perfection and should be treated as a base to build upon, not a production-ready assistant.


Base Model

  • Qwen/Qwen3-1.7B

Fine-Tuning Overview

  • Method: QLoRA (4-bit)
  • Training type: Causal Language Modeling
  • Epochs: 2
  • Context length: 1024 tokens
  • Optimizer: paged AdamW (8-bit)
  • Scheduler: cosine
  • Precision: FP16 compute, NF4 quantized base during training

The final model provided here is a fully merged FP16 model (no LoRA adapters required).


Training Details (High-Level)

  • Base model loaded in 4-bit NF4 using bitsandbytes
  • LoRA applied to all major attention and MLP projection layers:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj
  • Dataset manually packed into fixed 1024-token blocks to maximize GPU utilization
  • Chat templates applied prior to tokenization
  • Gradient checkpointing enabled for stability

Training was intentionally kept simple and stable, favoring correctness over experimental tricks.
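The manual packing step described above can be sketched in plain Python: concatenate the tokenized examples, then slice the stream into fixed-size blocks (1024 tokens in training; a small block size is used here for brevity), dropping the incomplete remainder.

```python
# Minimal sketch of packing tokenized examples into fixed-size blocks.
def pack_into_blocks(token_streams, block_size):
    """Concatenate token lists and split into fixed-size blocks,
    dropping the trailing incomplete block."""
    flat = [tok for stream in token_streams for tok in stream]
    n_blocks = len(flat) // block_size
    return [flat[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

blocks = pack_into_blocks([[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]], block_size=4)
# Each block holds exactly 4 tokens; the remainder (9, 10) is dropped.
```

Packing this way keeps every training batch at a uniform 1024 tokens, which avoids padding waste and keeps GPU utilization high.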


Current Capabilities & Limitations

What it does well

  • Produces coherent Georgian text
  • Understands Georgian sentence structure
  • Serves as a solid starting point for further fine-tuning

Known issues

  • Grammatically incorrect sentences are common
  • Occasional hallucinations
  • Sometimes invents non-existent words
  • Not instruction-tuned or safety-aligned

These issues are expected given the limited dataset size and short training duration.
Performance should improve significantly with a larger, cleaner dataset.


Intended Use

  • Georgian language research
  • Further fine-tuning
  • Dataset experimentation
  • Low-resource language modeling

Not recommended for:

  • Production deployment
  • High-stakes or factual tasks
  • Safety-critical applications

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "GiorgiGE/Kolkha-Mini-Georgian",
    torch_dtype="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "GiorgiGE/Kolkha-Mini-Georgian"
)

# Generate a short continuation (prompt and sampling settings are illustrative)
inputs = tokenizer("საქართველო არის", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))