Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: "Optical Context Compression Is Just (Bad) Autoencoding"

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

Available Checkpoints

Naming convention: {regime}_{config}_h{N}_{objective}[_recon-init]
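
The naming convention can be split into its fields programmatically. The following is a minimal sketch, not part of the released code: the regular expression is inferred from the checkpoint names listed in this README, and `CheckpointName` and `parse_checkpoint_name` are hypothetical helper names introduced here for illustration. The meaning of the `h{N}` field is not described in this README, so the sketch only extracts the integer.

```python
import re
from typing import NamedTuple

# Pattern inferred from the checkpoint names in this README
# (an assumption, not an official specification of the naming scheme).
_NAME_RE = re.compile(
    r"^(?P<regime>[a-z0-9]+)_(?P<config>[a-z0-9]+)_h(?P<n>\d+)"
    r"_(?P<objective>recon|lm)(?P<recon_init>_recon-init)?$"
)


class CheckpointName(NamedTuple):
    """Hypothetical container for the fields of a checkpoint name."""
    regime: str       # e.g. "vision", "meanpool", "conv1d", "text"
    config: str       # e.g. "base", "w4s4", "t250", "ctx277"
    h: int            # the N in h{N}
    objective: str    # "recon" or "lm"
    recon_init: bool  # True if the name ends with "_recon-init"


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a name of the form {regime}_{config}_h{N}_{objective}[_recon-init]."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        h=int(match["n"]),
        objective=match["objective"],
        recon_init=match["recon_init"] is not None,
    )
```

For example, `parse_checkpoint_name("vision_base_h0_lm_recon-init")` yields regime `vision`, config `base`, objective `lm`, and `recon_init=True`.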

Reconstruction

Checkpoint	Regime	Compression ratio (CR)	Perplexity (PPL)
`vision_base_h0_recon`	Vision base	3.60	1.03
`meanpool_w4s4_h0_recon`	Meanpool w4s4	3.97	1.04
`conv1d_t250_h0_recon`	Conv1D t250	3.97	1.00
`vision_tiny_h0_recon`	Vision tiny	12.82	1.14
`conv1d_t63_h0_recon`	Conv1D t63	15.38	1.01

Language Modeling

Checkpoint	Regime	Compression ratio (CR)	Init	Perplexity (PPL)
`vision_base_h0_lm`	Vision base	3.60	Direct	5.08
`vision_base_h0_lm_recon-init`	Vision base	3.60	From recon	5.06
`text_ctx277_h0_lm`	Text ctx277 (Truncation)	3.60	Direct	5.02
`meanpool_w4s4_h0_lm_recon-init`	Meanpool w4s4	3.97	From recon	5.02
`conv1d_t250_h0_lm_recon-init`	Conv1D t250	3.97	From recon	4.96
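
To compare the language-modeling checkpoints at matched compression ratios, the table above can be transcribed into a small data structure. This is an illustrative sketch only: the values are copied from the table, and `LM_CHECKPOINTS` and `best_per_compression_ratio` are hypothetical names that do not appear in the repository.

```python
# (checkpoint name, compression ratio, perplexity), copied from the
# Language Modeling table in this README.
LM_CHECKPOINTS = [
    ("vision_base_h0_lm", 3.60, 5.08),
    ("vision_base_h0_lm_recon-init", 3.60, 5.06),
    ("text_ctx277_h0_lm", 3.60, 5.02),
    ("meanpool_w4s4_h0_lm_recon-init", 3.97, 5.02),
    ("conv1d_t250_h0_lm_recon-init", 3.97, 4.96),
]


def best_per_compression_ratio(rows):
    """Return {compression ratio: (name, perplexity)} with the lowest perplexity per ratio."""
    best = {}
    for name, cr, ppl in rows:
        if cr not in best or ppl < best[cr][1]:
            best[cr] = (name, ppl)
    return best


for cr, (name, ppl) in sorted(best_per_compression_ratio(LM_CHECKPOINTS).items()):
    print(f"CR {cr:.2f}: {name} (PPL {ppl:.2f})")
```

On the listed numbers, the truncation baseline has the lowest perplexity at CR 3.60 and the Conv1D checkpoint has the lowest at CR 3.97.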

Model Details

Architecture: DeepSeek-OCR with vision encoder
Vision checkpoints: Trained encoder (base=768x768, tiny=384x384)
Text checkpoints: Truncation baseline (no vision encoder), context=277 tokens
Meanpool checkpoints: Frozen encoder, window=4, stride=4
Conv1D checkpoints: Trained hierarchical encoder (t250=CR 3.97, t63=CR 15.38)
Dataset: 510k samples from FineWiki

Usage

from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model"
)
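
The same call can be wrapped so that any checkpoint from the tables above is fetched by name. This is a sketch under one assumption: that every checkpoint directory follows the `{name}/model.pt` layout shown in the example, which this README only confirms for `vision_base_h0_lm`. The helper names `checkpoint_filename` and `download_checkpoint` are hypothetical.

```python
REPO_ID = "ivnle/bad-autoencoding"


def checkpoint_filename(name: str) -> str:
    """Build the in-repo path for a checkpoint.

    Assumes the {name}/model.pt layout from the README example
    holds for every checkpoint (not verified here).
    """
    return f"{name}/model.pt"


def download_checkpoint(name: str) -> str:
    """Download one checkpoint and return the local path of the cached file."""
    # Imported lazily so the filename helper works without the dependency installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(name),
        repo_type="model",
    )
```

Calling `download_checkpoint("conv1d_t250_h0_lm_recon-init")` would then return a local file path, provided the file exists at that location in the repository. The contents of `model.pt` are not described in this README; if it is a standard PyTorch-serialized file, it could presumably be read with `torch.load`.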

Citation

@article{lee2024optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}

Paper: Optical Context Compression Is Just (Bad) Autoencoding (arXiv:2512.03643, published Dec 3, 2025)
