# Kisoku 3B SFT
The instruction-tuned version of Kisoku 3B Base, fine-tuned using supervised fine-tuning (SFT) on Google Cloud TPUs with MaxText.
Trained entirely from scratch (pretraining + SFT) by a solo researcher, supported by Google's TPU Research Cloud (TRC).
## Overview
This model was fine-tuned from the Kisoku 3B base checkpoint using a custom text-only chat template (`### User` / `### Assistant` format) designed to avoid the out-of-vocabulary special-token issues common with Llama-family tokenizers.
The model uses the Granite architecture (Llama-compatible, with the addition of runtime logit scaling), which enables GGUF conversion and local deployment via llama.cpp.
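A minimal sketch of how a message list might be rendered into this template. The card only specifies the `### User` / `### Assistant` markers; the exact newline conventions and the `format_chat` helper are assumptions for illustration:

```python
def format_chat(messages: list[dict]) -> str:
    """Render OpenAI-style messages into the text-only template.

    Whitespace details are assumed, not taken from the model card.
    """
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"### {role}\n{m['content']}")
    parts.append("### Assistant\n")  # cue the model to respond
    return "\n".join(parts)

print(format_chat([{"role": "user", "content": "Hello!"}]))
```

Because every marker is plain text, no new special tokens need to be added to the tokenizer, which is what sidesteps the out-of-vocabulary issue.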
## Architecture
| Parameter | Value |
|---|---|
| Architecture | GraniteForCausalLM |
| Parameters | ~3B |
| Layers | 28 |
| Hidden size | 3072 |
| FFN size | 8192 |
| Attention heads | 24 |
| KV heads | 6 (Grouped-Query Attention) |
| Head dim | 128 |
| Vocab size | 128,256 |
| Context length | 4,096 |
| Logit scaling | 55.43 (Granite-specific) |
| Activation | SiLU |
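The head counts in the table imply a 4:1 query-to-KV grouping for GQA. A quick arithmetic check (the fp16 KV-cache figure is an illustrative calculation, not a number from the card):

```python
# Values from the architecture table above
n_heads, n_kv_heads, head_dim, n_layers = 24, 6, 128, 28

groups = n_heads // n_kv_heads
print(groups)  # 4 query heads share each KV head

# Per-token KV cache in fp16: K and V, each n_kv_heads * head_dim
# values of 2 bytes, across all layers
kv_bytes_per_token = 2 * n_kv_heads * head_dim * 2 * n_layers
print(kv_bytes_per_token / 1024)  # 84.0 KiB per token
```

With 6 KV heads instead of 24, the KV cache is a quarter of what full multi-head attention would need at the same head dimension.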
## Training Details
### Pretraining (Base Model)
| Detail | Value |
|---|---|
| Framework | MaxText (JAX) on TPU v4-32 |
| Steps | 460,000 |
| Data | DCLM-Baseline 1.0, FineWeb-Edu |
### SFT
| Detail | Value |
|---|---|
| Framework | MaxText SFT on TPU |
| Steps | ~2,499 |
| Final loss | ~1.6 |
| Chat template | Custom text-only (### User / ### Assistant) |
| Tokenizer | Custom (at kisoku-sft-tokenizer/) |
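As a rough sanity check, a final SFT loss of ~1.6 nats corresponds to a per-token perplexity of about e^1.6 ≈ 5:

```python
import math

final_loss = 1.6  # from the table above
print(round(math.exp(final_loss), 2))  # ≈ 4.95
```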
## Local Deployment (GGUF)
A GGUF quantized version (Q8_0, 3.5GB) is available for local serving via llama.cpp:
```bash
# Serve with llama-server
llama-server -m kisoku-3b-sft-q8.gguf -c 4096 --port 8900

# Use with any OpenAI-compatible client
curl http://localhost:8900/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kisoku", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Note: Due to the Granite logit scaling (55.43×), use temperature ~0.01 for standard sampling behavior, or use the included proxy script, which auto-adjusts the temperature and injects `logit_bias` for special tokens.
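One way to read this note: if the exported logits still carry the Granite division by 55.43, then sampling them at temperature T behaves like sampling unscaled logits at T × 55.43, so a conventional temperature should be divided by the scale before being passed to llama.cpp. This interpretation, and the `adjust_temperature` helper, are assumptions for illustration, not part of the card:

```python
GRANITE_LOGIT_SCALE = 55.43  # from the architecture table

def adjust_temperature(desired_temp: float,
                       scale: float = GRANITE_LOGIT_SCALE) -> float:
    """Hypothetical helper: map a conventional sampling temperature to the
    value to pass to the server, assuming the served logits are already
    divided by the Granite scale (softmax(l/s/T) == softmax(l/(s*T)))."""
    return desired_temp / scale

print(round(adjust_temperature(0.7), 3))  # 0.013, close to the ~0.01 above
```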
## Limitations
- Undertrained base model (needs more pretraining tokens for competitive performance)
- English-focused
- No safety alignment (RLHF/DPO) applied
- Granite logit scaling requires temperature adjustment at inference
## Acknowledgments
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).
## License
Apache 2.0