# Kisoku 3B SFT

The instruction-tuned version of Kisoku 3B Base, trained with supervised fine-tuning (SFT) on Google Cloud TPUs using MaxText.

Trained entirely from scratch (pretraining + SFT) by a solo researcher, supported by Google's TPU Research Cloud (TRC).

## Overview

This model was SFT'd from the Kisoku 3B base checkpoint using a custom text-only chat template (`### User` / `### Assistant` turns) designed to avoid the out-of-vocabulary special-token issues common with Llama-family tokenizers.
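The card shows only the `### User` / `### Assistant` markers, not the full template; a minimal sketch of how such a text-only history might be rendered (the exact whitespace and turn-terminator conventions here are assumptions — defer to the released tokenizer's template for the canonical form):

```python
def build_prompt(messages):
    """Render a chat history in the card's '### User' / '### Assistant'
    text-only format. Whitespace conventions are assumptions."""
    role_names = {"user": "User", "assistant": "Assistant"}
    parts = [f"### {role_names[m['role']]}\n{m['content']}" for m in messages]
    parts.append("### Assistant\n")  # cue the model to answer next
    return "\n\n".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hello!"}])
# prompt == "### User\nHello!\n\n### Assistant\n"
```

Because every marker is plain text, no special tokens need to exist in the vocabulary, which is the point of the design.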

The model uses Granite architecture (identical to Llama but with runtime logit scaling), enabling GGUF conversion and local deployment via llama.cpp.
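To see why the scaling matters at inference, here is a small self-contained illustration with made-up logits, assuming (per the deployment note later in this card) that the Granite head divides logits by the 55.43 scaling constant:

```python
import math

LOGIT_SCALE = 55.43  # logits_scaling value from this model card

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    z = [x / temperature for x in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

raw = [4.0, 2.0, -1.0]                    # made-up pre-scaling logits
scaled = [x / LOGIT_SCALE for x in raw]   # what a scaled head would emit

# Sampling the scaled logits at temperature 1/55.43 (~0.018) recovers the
# distribution the unscaled logits would yield at temperature 1.0.
restored = softmax(scaled, temperature=1.0 / LOGIT_SCALE)
original = softmax(raw, temperature=1.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(restored, original))
```

This is why the deployment note recommends a temperature near 0.01: a runtime unaware of the scaling would otherwise sample from a much flatter distribution.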

## Architecture

| Parameter | Value |
|---|---|
| Architecture | GraniteForCausalLM |
| Parameters | ~3B |
| Layers | 28 |
| Hidden size | 3072 |
| FFN size | 8192 |
| Attention heads | 24 |
| KV heads | 6 (Grouped-Query Attention) |
| Head dim | 128 |
| Vocab size | 128,256 |
| Context length | 4,096 |
| Logit scaling | 55.43 (Granite-specific) |
| Activation | SiLU |
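As a sanity check, the table's figures roughly reproduce the ~3B parameter count. This sketch assumes a Llama-style gated MLP (gate/up/down matrices), embedding weights tied with the LM head, and ignores norm and bias parameters as negligible:

```python
# Rough parameter count from the architecture table above.
vocab, hidden, ffn, layers = 128_256, 3072, 8192, 28
heads, kv_heads, head_dim = 24, 6, 128

embed = vocab * hidden                       # token embeddings (tied head)
attn = (hidden * heads * head_dim            # Q projection
        + 2 * hidden * kv_heads * head_dim   # K and V (GQA: 6 KV heads)
        + heads * head_dim * hidden)         # output projection
mlp = 3 * hidden * ffn                       # gate, up, and down matrices
total = embed + layers * (attn + mlp)
print(f"~{total / 1e9:.2f}B parameters")     # → ~3.17B parameters
```

Note how GQA shrinks the K/V projections: with 6 KV heads instead of 24, they cost a quarter of what full multi-head attention would.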

## Training Details

### Pretraining (Base Model)

| Detail | Value |
|---|---|
| Framework | MaxText (JAX) on TPU v4-32 |
| Steps | 460,000 |
| Data | DCLM-Baseline 1.0, FineWeb-Edu |

### SFT

| Detail | Value |
|---|---|
| Framework | MaxText SFT on TPU |
| Steps | ~2,499 |
| Final loss | ~1.6 |
| Chat template | Custom text-only (`### User` / `### Assistant`) |
| Tokenizer | Custom (at kisoku-sft-tokenizer/) |

## Local Deployment (GGUF)

A GGUF-quantized version (Q8_0, 3.5 GB) is available for local serving via llama.cpp:

```shell
# Serve with llama-server
llama-server -m kisoku-3b-sft-q8.gguf -c 4096 --port 8900

# Use with any OpenAI-compatible client
curl http://localhost:8900/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kisoku", "messages": [{"role": "user", "content": "Hello!"}]}'
```
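The same endpoint can be called from Python using only the standard library. A sketch reusing the server address and `kisoku` model name from the commands above; the `build_payload` / `chat` helpers are illustrative, not part of the release:

```python
import json
from urllib import request

API_URL = "http://localhost:8900/v1/chat/completions"  # llama-server above

def build_payload(prompt):
    """OpenAI-style request body; temperature is kept very low to
    compensate for Granite logit scaling (see the note below)."""
    return {
        "model": "kisoku",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.01,
    }

def chat(prompt):
    """POST one user turn and return the assistant's reply text."""
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```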

Note: Due to Granite logit scaling (55.4x), use temperature ~0.01 for standard behavior, or use the included proxy script that auto-adjusts temperature and injects logit_bias for special tokens.

## Limitations

- Undertrained base model (needs more pretraining tokens for competitive performance)
- English-focused
- No safety alignment (RLHF/DPO) applied
- Granite logit scaling requires temperature adjustment at inference

## Acknowledgments

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).

## License

Apache 2.0
