# Scaffold GPT-2 Portuguese
A 272M-parameter GPT-2 trained from scratch on Portuguese news articles, using scaffold (countdown) tokens for exact length control.
## Resources
| Resource | Link |
|---|---|
| Dataset | viniciusxpb/scaffold-tokens-dataset |
| Training code | github.com/viniciusxpb/scaffold-tokens |
| Author | Vinícius França |
## What are Scaffold Tokens?

Each word in the training data is preceded by a countdown token `<ff_N>` that tells the model how many words remain:

```
<ff_6> O <ff_5> presidente <ff_4> anunciou <ff_3> novas <ff_2> medidas <ff_1> econômicas <ff_0> .
```
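A minimal sketch of how this scaffolding could be applied to plain text (the `scaffold` helper is illustrative; the repo's actual preprocessing may differ, e.g. in how punctuation is tokenized):

```python
def scaffold(text: str) -> str:
    """Prefix each whitespace-separated token with a <ff_N> countdown of tokens remaining."""
    tokens = text.split()
    n = len(tokens)
    return " ".join(f"<ff_{n - 1 - i}> {tok}" for i, tok in enumerate(tokens))

print(scaffold("O presidente anunciou novas medidas econômicas ."))
# → <ff_6> O <ff_5> presidente <ff_4> anunciou <ff_3> novas <ff_2> medidas <ff_1> econômicas <ff_0> .
```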
The model achieves 100% length accuracy (53/53 exact match) across all tested ranges (50 to 999 words).
## Model Details
| Field | Value |
|---|---|
| Architecture | GPT-2 272M (12 layers, 768 dim, 6 heads) |
| Parameters | 272M (75M transformer + 118M value embeddings + 79M embeddings/head) |
| Vocab | 51,264 (50,257 BPE + 1,000 FF + 7 padding) |
| Tokenizer | tiktoken GPT-2 |
| Precision | BF16 |
| Training | 1,750 steps, 55 min on RTX 3060 |
| Final val loss | 1.381 |
| Dataset | scaffold-tokens-dataset (~208M tokens) |
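The vocabulary split in the table can be sanity-checked in a couple of lines (the multiple-of-64 rationale for the 7 padding tokens is an inference, not stated above):

```python
GPT2_BPE = 50_257   # standard GPT-2 BPE vocabulary
N_FF = 1_000        # <ff_0> … <ff_999> countdown tokens
PADDING = 7         # likely rounds the vocab up to a multiple of 64

VOCAB = GPT2_BPE + N_FF + PADDING
print(VOCAB, VOCAB % 64)  # → 51264 0
```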
## Quick Start

```bash
# Clone the repo
git clone https://github.com/viniciusxpb/scaffold-tokens
cd scaffold-tokens

# Setup, download model, and generate
make setup
make download-model
make generate
```
## Usage

```python
import torch
import tiktoken

# Load the checkpoint
ckpt = torch.load("model.pt", map_location="cuda", weights_only=False)

# The model uses a custom architecture (not HuggingFace Transformers).
# See the full inference code at the GitHub repo.
```
For full inference with forced countdown generation, see the training repository:
github.com/viniciusxpb/scaffold-tokens
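A rough sketch of what forced countdown generation could look like: the `<ff_k>` tokens are injected deterministically at every other position, so the model only chooses the words in between. `sample_word` is a hypothetical stand-in for the model's sampling step; the repo's actual decoding loop may differ:

```python
def generate_with_countdown(sample_word, n_words: int) -> str:
    """Interleave forced <ff_k> countdown tokens with model-sampled words."""
    out = []
    for k in range(n_words - 1, -1, -1):
        out.append(f"<ff_{k}>")       # forced: the countdown token is never sampled
        out.append(sample_word(out))  # free: the model picks the next word
    return " ".join(out)

# With a dummy sampler, a 3-word request yields exactly 3 word slots:
print(generate_with_countdown(lambda ctx: "palavra", 3))
# → <ff_2> palavra <ff_1> palavra <ff_0> palavra
```

Because the scaffold positions are forced, the output length is guaranteed by construction; the model's job reduces to writing fluent text that fits the countdown.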
## Training Data

Trained on scaffold-tokens-dataset: Portuguese news articles from Folha de S.Paulo (public domain), pre-tokenized with `<ff_N>` countdown tokens.
## Results
| Metric | Value |
|---|---|
| Length control accuracy | 100% (53/53 exact match) |
| Final word loss | 2.08 |
| Final FF loss | 0.045 |
| Peak VRAM | 5.8 GB / 12 GB |
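The exact-match length metric above can be computed by stripping the scaffold tokens and counting what remains (a sketch with illustrative function names, not the repo's evaluation code):

```python
def word_count(generated: str) -> int:
    """Count output words, ignoring <ff_N> scaffold tokens."""
    return sum(1 for t in generated.split() if not t.startswith("<ff_"))

def length_exact_match(generated: str, target: int) -> bool:
    """True iff the generation contains exactly `target` non-scaffold words."""
    return word_count(generated) == target

print(length_exact_match("<ff_1> olá <ff_0> mundo", 2))  # → True
```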
## Checkpoint Format

The file `model.pt` is a PyTorch checkpoint containing:

```python
{
    "model": OrderedDict,  # state_dict (weights only, no optimizer)
    "step": 1750,
    "val_loss": 1.381,
}
```
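For a quick sanity check after downloading, the checkpoint's metadata can be read without instantiating the model. The `torch.load` call in the comment is the real entry point; the literal dict below is just a stand-in with the same shape:

```python
from collections import OrderedDict

# In practice: ckpt = torch.load("model.pt", map_location="cpu", weights_only=False)
ckpt = {"model": OrderedDict(), "step": 1750, "val_loss": 1.381}  # stand-in

print(sorted(ckpt))  # → ['model', 'step', 'val_loss']
print(ckpt["step"], ckpt["val_loss"])  # → 1750 1.381
```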
## Citation

```bibtex
@misc{scaffold-tokens-2025,
  title={Scaffold Tokens: Teaching LLMs to Plan with Countdown Tokens},
  author={Vinícius França},
  year={2025},
  url={https://github.com/viniciusxpb/scaffold-tokens}
}
```