
Granite Wordle SFT (200 steps)

Weights derived from ibm-granite/granite-3.0-1b-a400m-instruct, fine-tuned with PRIME-RL's SFT trainer on the willcb/V3-wordle dataset.

  • Steps: 200
  • Global batch size: 64 (micro batch 1, single H200 GPU)
  • Precision: bfloat16
  • Peak GPU memory: ~13 GiB
  • Final training loss: 0.053 (step 199)
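The settings above imply the gradient-accumulation arithmetic, although the accumulation count itself is not stated in this card. A minimal sketch of how the numbers fit together:

```python
# Effective-batch arithmetic implied by the card's settings.
# grad_accum is an inference from global/micro batch, not a value stated above.
global_batch = 64   # samples per optimizer update
micro_batch = 1     # samples per forward/backward pass
num_gpus = 1        # single H200

grad_accum = global_batch // (micro_batch * num_gpus)
total_samples = 200 * global_batch  # samples seen over the full run

print(grad_accum)      # micro-batches accumulated per optimizer step
print(total_samples)   # total training samples over 200 steps
```

Under these assumptions, each optimizer step accumulates 64 micro-batches, and the run sees 12,800 samples in total.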

The final checkpoint at step_200/ includes the tokenizer files and pytorch_model.bin, ready for inference or for warm-starting downstream RL experiments.
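Because the checkpoint ships standard tokenizer and weight files, it can be loaded with the transformers library. A minimal sketch, assuming the checkpoint directory sits at the output path used in the training command below (adjust `ckpt_dir` to wherever you downloaded it, and note the prompt is illustrative, not the dataset's exact format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed local path to the step_200 checkpoint; adjust as needed.
ckpt_dir = "outputs/granite_wordle_sft_200/step_200"

def load_checkpoint(path: str):
    """Load the tokenizer and bf16 model weights from a local checkpoint dir."""
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_checkpoint(ckpt_dir)
    prompt = "Let's play Wordle. Make your first guess."
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```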

Training command:

CUDA_VISIBLE_DEVICES=0 uv run sft @ examples/wordle/sft/granite_train.toml \
  --output-dir outputs/granite_wordle_sft_200 \
  --max-steps 200 \
  --weights.interval 10 \
  --ckpt.interval 50 \
  --ckpt.keep 20

For more details see the PRIME-RL repository.
