Kakugo 3B Assamese
A data-distilled model trained specifically for Assamese.
This is Kakugo 3B Assamese, a small language model (SLM) fine-tuned to interact with the user in Assamese.
For Kakugo in other languages, check out the model and dataset collections.
How to use
You can run this model with your preferred LLM inference package.
It should work with any package that supports the original base model, ibm-granite/granite-4.0-micro.
We provide examples for running it with Hugging Face Transformers and vLLM:
Hugging Face Transformers (Recommended for beginners)
First, make sure transformers is installed on your machine.
pip install transformers
Then run the following Python code to generate a response from the LLM.
from transformers import pipeline

generator = pipeline(model="ptrdvn/kakugo-3B-asm", task="text-generation")

user_input = input("Please enter your input to the model in Assamese:")

do_reasoning = False
open_thinking_tag = "<think>"
close_thinking_tag = "</think>"

if do_reasoning:
    sys_msg = f"Before you respond, first think about your response and enclose your thinking process in {open_thinking_tag} and {close_thinking_tag} delimiters."
else:
    sys_msg = "Be concise in your responses."

messages = [
    {"role": "system", "content": sys_msg},
    {"role": "user", "content": user_input},
]

output = generator(
    messages,
    do_sample=False,
    repetition_penalty=1.05,
)

model_response = output[0]["generated_text"][-1]["content"]

# In thinking mode, keep only the text after the closing thinking tag
if do_reasoning:
    model_response = model_response.split(close_thinking_tag)[-1]

print(model_response)
N.B. - We recommend using a repetition_penalty of 1.05, as the model can sometimes get stuck in a loop of repetitive text when generating low-resource languages.
You can set do_reasoning to True or False to turn "thinking mode" on or off, respectively. Thinking mode takes longer to generate a response, but may produce a better one.
This mode is still experimental, so try it both on and off for your use case.
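The thinking-mode post-processing above relies only on string splitting, so it can be sketched and checked independently of the model. The helper name and the sample string below are our illustrations, not real model output:

```python
def strip_thinking(model_response: str, close_thinking_tag: str = "</think>") -> str:
    # Keep only the text after the final closing thinking tag.
    # If the tag is absent (thinking mode off), the response is returned unchanged.
    return model_response.split(close_thinking_tag)[-1]

# Illustrative response in thinking mode (not real model output):
raw = "<think>The user greeted me, so I reply with a greeting.</think>Hello!"
print(strip_thinking(raw))  # → Hello!
```

Note that `split(...)[-1]` also handles responses that (unexpectedly) contain multiple closing tags, by keeping only the text after the last one.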
vLLM (Recommended for performance)
First, make sure vllm is installed on your machine.
pip install vllm
Then run the following Python code to generate a response from the LLM.
from vllm import LLM, SamplingParams

llm = LLM(model="ptrdvn/kakugo-3B-asm")

user_input = input("Please enter your input to the model in Assamese:")

do_reasoning = True
open_thinking_tag = "<think>"
close_thinking_tag = "</think>"

if do_reasoning:
    sys_msg = f"Before you respond, first think about your response and enclose your thinking process in {open_thinking_tag} and {close_thinking_tag} delimiters."
else:
    sys_msg = "Be concise in your responses."

sampling_params = SamplingParams(temperature=0, repetition_penalty=1.05, max_tokens=2048)

messages = [[
    {"role": "system", "content": sys_msg},
    {"role": "user", "content": user_input},
]]

output = llm.chat(messages, sampling_params)
model_response = output[0].outputs[0].text

# In thinking mode, keep only the text after the closing thinking tag
if do_reasoning:
    model_response = model_response.split(close_thinking_tag)[-1]

print(model_response)
N.B. - When using vLLM for inference on multiple inputs, we recommend passing them all at once, i.e., adding more items to the outer list of the messages variable in the script above. More on vLLM optimization.
We recommend using a repetition_penalty of 1.05, as the model can sometimes get stuck in a loop of repetitive text when generating low-resource languages.
You can set do_reasoning to True or False to turn "thinking mode" on or off, respectively. Thinking mode takes longer to generate a response, but may produce a better one.
This mode is still experimental, so try it both on and off for your use case.
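As a sketch of the batching advice above, the outer messages list can be built from any number of user inputs before a single llm.chat call. The placeholder prompt strings are ours, and the llm.chat call is commented out so the snippet runs without a GPU:

```python
# Build a batch of conversations for a single vLLM llm.chat call.
sys_msg = "Be concise in your responses."

# Placeholder inputs; in practice these would be Assamese prompts.
user_inputs = [
    "first Assamese prompt here",
    "second Assamese prompt here",
]

messages = [
    [
        {"role": "system", "content": sys_msg},
        {"role": "user", "content": user_input},
    ]
    for user_input in user_inputs
]

# outputs = llm.chat(messages, sampling_params)
# responses = [o.outputs[0].text for o in outputs]
print(len(messages))  # one conversation per user input → 2
```

Batching this way lets vLLM schedule all requests together, which is usually much faster than calling llm.chat once per input.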
Training data
The training data for this model can be found at ptrdvn/kakugo-asm.
This data was created by prompting openai/gpt-oss-120b to generate prompts and responses in Assamese. We also translated a set of prompts and responses from the BAAI/Infinity-Instruct dataset.
More details about exactly how we created our data can be found in our paper.
Training
Full details of how this model was created (and how you can train a model in your own chosen language) can be found in our GitHub repo.
To make this model, we fine-tuned ibm-granite/granite-4.0-micro for 1 epoch on ptrdvn/kakugo-asm using Llama Factory.
Full Llama Factory training hyperparameters
### model
model_name_or_path: ibm-granite/granite-4.0-micro
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset_dir: /workspace/train
dataset: ptrdvn/kakugo-asm
template: granite4
cutoff_len: 8000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
packing: true
### Reporting
report_to: wandb
run_name: ptrdvn/kakugo-asm
logging_steps: 1
### output
output_dir: ptrdvn/kakugo-asm
save_strategy: "no"
save_steps: 99999999
plot_loss: true
overwrite_output_dir: true
save_only_model: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
## eval
val_size: 0.02
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 0.2
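To launch training with this configuration, a command along these lines can be used, assuming Llama Factory is installed and the config above is saved to a YAML file (the filename here is our illustration, not from the repo):

```shell
# Launch full-parameter SFT via the Llama Factory CLI (filename is illustrative).
llamafactory-cli train kakugo_asm.yaml
```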
Credits
This model was trained by @ptrdvn
If you use this model, please cite:
@article{devine2026kakugo,
title={Kakugo: Distillation of Low-Resource Languages into Small Language Models},
author={Devine, Peter and Sanni, Mardhiyah and Adilazuarda, Farid and Loizaga, Julieta Gil and Haddow, Barry},
journal={arXiv preprint arXiv:2601.14051},
year={2026}
}