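# SAT Retrofit Experiment: Can AR Models Be Forced to Output Multiple Tokens?

## TL;DR
No. An autoregressive (AR) model cannot simply be "snapped" to semi-autoregressive (SAT) inference. In this experiment, forcing one to emit 2 tokens per step nearly doubled throughput (1.98x) but made perplexity 6x to 27x worse on every test prompt. The model's hidden states encode only a single next-token prediction, so joint AR+SAT training from scratch is required.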

## Key Results

### Speed
| Method | Tokens/sec | Speedup |
|--------|-----------|---------|
| AR (baseline) | 75.6 | 1.0x |
| Forced SAT (block=2) | 149.9 | 1.98x |
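
The near-2x speedup follows from counting forward passes. Below is a minimal sketch, assuming each forward pass costs about the same whether it emits one token or two (an assumption, not something measured here); the helper names are hypothetical and not taken from `sat_test.py`.

```python
import math


def forward_passes(num_tokens: int, block: int) -> int:
    """Forward passes needed to emit num_tokens when each pass emits `block` tokens."""
    return math.ceil(num_tokens / block)


def ideal_speedup(num_tokens: int, block: int) -> float:
    """Upper-bound speedup over plain AR decoding (block=1), assuming equal cost per pass."""
    return forward_passes(num_tokens, 1) / forward_passes(num_tokens, block)


# Measured throughput from the Speed table (tokens/sec)
AR_TOKENS_PER_SEC = 75.6
SAT_TOKENS_PER_SEC = 149.9

measured = SAT_TOKENS_PER_SEC / AR_TOKENS_PER_SEC
print(f"ideal speedup for block=2: {ideal_speedup(100, 2):.2f}x")
print(f"measured speedup:          {measured:.2f}x")
```

The measured 1.98x sits just under the ideal 2.00x, so the speed side of forced SAT works as expected; the problem is quality.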

### Quality (perplexity, lower is better)
| Prompt | AR | Forced SAT | Degradation (SAT / AR) |
|--------|-----|------------|------------------------|
| "The quick brown fox" | 16.2 | 424.0 | 26x worse |
| "In the beginning" | 7.4 | 69.0 | 9x worse |
| "Once upon a time" | 8.9 | 56.5 | 6x worse |
| "The scientist discovered" | 10.8 | 296.8 | 27x worse |
| "Machine learning is" | 7.7 | 133.6 | 17x worse |
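
Perplexity is the exponential of the mean per-token negative log-likelihood, and the Degradation column is the ratio of the two perplexities. The sketch below recomputes that column from the table. The README does not say which model scored the generated text, so `perplexity` here only illustrates the standard definition and is not the script's actual scoring code.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# (AR perplexity, forced-SAT perplexity) copied from the Quality table
RESULTS = {
    "The quick brown fox": (16.2, 424.0),
    "In the beginning": (7.4, 69.0),
    "Once upon a time": (8.9, 56.5),
    "The scientist discovered": (10.8, 296.8),
    "Machine learning is": (7.7, 133.6),
}

degradation = {prompt: sat / ar for prompt, (ar, sat) in RESULTS.items()}
for prompt, ratio in degradation.items():
    print(f"{prompt!r}: {ratio:.1f}x worse")
```

Rounded to whole numbers, these ratios match the table (26x, 9x, 6x, 27x, 17x). For scale, a model that assigns probability 0.5 to every observed token has perplexity 2.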

### Example Outputs

Prompt: "The scientist discovered that"

- AR (good): "the bacteria were able to grow in the air, and that they could also grow in the water."
- Forced SAT (broken): "the thatched bacteria were roofing able to offload growling the bacteria spores into into the the"

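## Why It Fails

An AR model's hidden state at position N is trained only to predict token N+1, a single token. Nothing in its training asks that state to represent tokens N+1 and N+2 simultaneously.

When you force 2-token output:
- Token 1 is predicted from position -1 (the last position, with the correct, current context) ✓
- Token 2 is predicted from position -2 (STALE context, one step behind) ✗

The logits at position -2 were trained to predict the token that now sits at position -1, which is already in the context. Every second token therefore answers a question about outdated context, which produces the alternating reasonable/wrong tokens seen in the example output.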
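
The failure mode can be reproduced with a toy model. In this sketch, `next_token(prefix)` stands in for a greedy AR model that sees only `prefix`; here it is a hypothetical bigram lookup rather than a transformer, so it ignores everything except the last token. Forced SAT takes token 1 from the full context and token 2 from the context minus its last token, mirroring the position -1 / position -2 split described above.

```python
from typing import Callable, List, Optional

# Hypothetical toy "AR model": greedy next token given a prefix.
# A bigram table stands in for a transformer and only reads prefix[-1].
BIGRAM = {
    "the": "scientist",
    "scientist": "discovered",
    "discovered": "that",
    "that": "bacteria",
    "bacteria": "can",
    "can": "grow",
}

NextToken = Callable[[List[str]], Optional[str]]


def toy_next_token(prefix: List[str]) -> Optional[str]:
    return BIGRAM.get(prefix[-1])


def ar_decode(next_token: NextToken, context: List[str], steps: int) -> List[str]:
    """Plain AR decoding: one forward pass, one new token."""
    out = list(context)
    for _ in range(steps):
        tok = next_token(out)
        if tok is None:
            break
        out.append(tok)
    return out


def forced_sat_decode(next_token: NextToken, context: List[str], steps: int) -> List[str]:
    """Forced block=2 decoding: two new tokens per step from the last two positions."""
    out = list(context)
    for _ in range(steps):
        tok1 = next_token(out)       # position -1: sees the full, current context
        tok2 = next_token(out[:-1])  # position -2: context is one token out of date
        if tok1 is None or tok2 is None:
            break
        out.extend([tok1, tok2])
    return out


print(" ".join(ar_decode(toy_next_token, ["the", "scientist"], 5)))
print(" ".join(forced_sat_decode(toy_next_token, ["the", "scientist"], 3)))
```

The AR run prints "the scientist discovered that bacteria can grow", while the forced-SAT run prints "the scientist discovered scientist discovered that bacteria that": each second token is a sensible continuation of an older prefix, not of the text actually produced so far.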

## The Solution: Joint AR+SAT Training

Train the model on BOTH objectives from initialization:
- AR loss: predict the next token
- SAT loss: predict the next N tokens simultaneously

This pushes the model to build representations that encode multiple future tokens at each position, so the same model supports both inference modes.
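
A minimal sketch of the joint objective, written in plain Python over toy probability tables rather than tensors. It assumes (hypothetically; the README does not specify AGILLM-3's actual formulation) that position t produces one AR distribution for token t+1 plus `block` separate SAT distributions for tokens t+1 through t+block in the same forward pass, and that the two losses are mean negative log-likelihoods added together with a tunable, hypothetical `sat_weight`.

```python
import math
from typing import Dict, List, Sequence

Dist = Dict[str, float]  # token -> predicted probability


def nll(dist: Dist, target: str) -> float:
    # Small floor avoids log(0) for tokens the toy distribution omits
    return -math.log(max(dist.get(target, 0.0), 1e-12))


def ar_loss(ar_dists: Sequence[Dist], tokens: Sequence[str]) -> float:
    """Mean NLL where ar_dists[t] predicts tokens[t + 1]."""
    losses = [nll(ar_dists[t], tokens[t + 1]) for t in range(len(tokens) - 1)]
    return sum(losses) / len(losses)


def sat_loss(sat_dists: Sequence[List[Dist]], tokens: Sequence[str], block: int) -> float:
    """Mean NLL where sat_dists[t][k] predicts tokens[t + 1 + k], k = 0..block-1,
    i.e. every token in the block is predicted from the same position t."""
    losses = [
        nll(sat_dists[t][k], tokens[t + 1 + k])
        for t in range(len(tokens) - block)
        for k in range(block)
    ]
    return sum(losses) / len(losses)


def joint_loss(ar_dists, sat_dists, tokens, block: int = 2, sat_weight: float = 1.0) -> float:
    return ar_loss(ar_dists, tokens) + sat_weight * sat_loss(sat_dists, tokens, block)


tokens = ["the", "bacteria", "grow"]
ar_dists = [{"bacteria": 0.5}, {"grow": 0.5}]
sat_dists = [[{"bacteria": 0.5}, {"grow": 0.25}]]  # both block tokens from position 0
print(f"joint loss: {joint_loss(ar_dists, sat_dists, tokens):.4f}")
```

Setting `sat_weight=0` reduces this to ordinary next-token training, the only objective the retrofitted model in this experiment ever saw.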

See [AGILLM-3](https://huggingface.co/OpenTransformer/AGILLM-3-large) for a working implementation.

## Reproducing

```bash
pip install torch transformers
python sat_test.py
```

## Citation

If you use this finding, please cite:
```bibtex
@misc{sat_retrofit_2026,
  author    = {Scott Bisset},
  title     = {AR Models Cannot Be Retrofitted for SAT Inference},
  year      = {2026},
  publisher = {OpenTransformers Ltd}
}
```

## License
MIT