---
license: apache-2.0
pipeline_tag: feature-extraction
tags:
- vision
- ocr
- compression
- autoencoding
---

# Bad Autoencoding - Model Checkpoints

Checkpoints for the paper: **"Optical Context Compression Is Just (Bad) Autoencoding"**

Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

## Links

- **Paper**: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
- **Code**: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

## Available Checkpoints

In the tables in this section, CR is the compression ratio and PPL is perplexity (lower is better).

Naming convention: `{regime}_{config}_h{N}_{objective}[_recon-init]`
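
The naming convention can be unpacked mechanically. The following is a minimal sketch, not part of the released code: the `parse_checkpoint_name` helper, the `CheckpointName` fields, and the regular expression are illustrative assumptions inferred from the checkpoint names listed in this card (in particular, that the objective is either `recon` or `lm`).

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    regime: str       # e.g. "vision", "meanpool", "conv1d", "text"
    config: str       # e.g. "base", "w4s4", "t250", "ctx277"
    h: int            # the integer N in the h{N} field
    objective: str    # "recon" or "lm"
    recon_init: bool  # True when the name ends in "_recon-init"


# Hypothetical pattern inferred from the names in this card.
_NAME_RE = re.compile(
    r"^(?P<regime>[a-z0-9]+)_(?P<config>[a-z0-9]+)_h(?P<h>\d+)"
    r"_(?P<objective>recon|lm)(?P<recon_init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    """Split a checkpoint directory name into its naming-convention fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return CheckpointName(
        regime=match["regime"],
        config=match["config"],
        h=int(match["h"]),
        objective=match["objective"],
        recon_init=match["recon_init"] is not None,
    )


parsed = parse_checkpoint_name("meanpool_w4s4_h0_lm_recon-init")
print(parsed)
```

A helper like this makes it easy to filter checkpoints, for example keeping only names whose `objective` is `"lm"`.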

### Reconstruction

| Checkpoint | Regime | CR | PPL |
|------------|--------|-----|-----|
| `vision_base_h0_recon` | Vision base | 3.60 | 1.03 |
| `meanpool_w4s4_h0_recon` | Meanpool w4s4 | 3.97 | 1.04 |
| `conv1d_t250_h0_recon` | Conv1D t250 | 3.97 | 1.00 |
| `vision_tiny_h0_recon` | Vision tiny | 12.82 | 1.14 |
| `conv1d_t63_h0_recon` | Conv1D t63 | 15.38 | 1.01 |

### Language Modeling

| Checkpoint | Regime | CR | Init | PPL |
|------------|--------|-----|------|-----|
| `vision_base_h0_lm` | Vision base | 3.60 | Direct | 5.08 |
| `vision_base_h0_lm_recon-init` | Vision base | 3.60 | From recon | 5.06 |
| `text_ctx277_h0_lm` | Text ctx277 (Truncation) | 3.60 | Direct | 5.02 |
| `meanpool_w4s4_h0_lm_recon-init` | Meanpool w4s4 | 3.97 | From recon | 5.02 |
| `conv1d_t250_h0_lm_recon-init` | Conv1D t250 | 3.97 | From recon | 4.96 |

## Model Details

- **Architecture**: DeepSeek-OCR with vision encoder
- **Vision checkpoints**: Trained encoder (base=768x768, tiny=384x384)
- **Text checkpoints**: Truncation baseline (no vision encoder), context=277 tokens
- **Meanpool checkpoints**: Frozen encoder, window=4, stride=4
- **Conv1D checkpoints**: Trained hierarchical encoder (t250=CR 3.97, t63=CR 15.38)
- **Dataset**: 510k samples from FineWiki
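
To illustrate what the `window=4, stride=4` setting of the Meanpool checkpoints means, here is a minimal pure-Python sketch of non-overlapping mean pooling over a sequence of vectors. It is not the released implementation: the `mean_pool` function, the choice to pool a list of embedding vectors, and the decision to drop a trailing partial window are all assumptions made for illustration.

```python
from typing import List, Sequence


def mean_pool(
    vectors: Sequence[Sequence[float]], window: int = 4, stride: int = 4
) -> List[List[float]]:
    """Average each run of `window` consecutive vectors, stepping by `stride`."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    pooled = []
    # Assumption: only full windows are kept; a trailing partial window is dropped.
    for start in range(0, len(vectors) - window + 1, stride):
        chunk = vectors[start:start + window]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / window for d in range(dim)])
    return pooled


# Eight toy 2-dimensional "embeddings" are reduced to two pooled vectors.
embeddings = [[float(i), float(2 * i)] for i in range(8)]
pooled = mean_pool(embeddings)
print(len(embeddings), "->", len(pooled))  # 8 -> 2
```

With window and stride both equal to 4, the sequence shrinks roughly fourfold, which is close to the CR of 3.97 listed for the Meanpool checkpoints; how that ratio is computed exactly is not stated in this card.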

## Usage

```python
from huggingface_hub import hf_hub_download

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="ivnle/bad-autoencoding",
    filename="vision_base_h0_lm/model.pt",
    repo_type="model"
)
```
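
To fetch several checkpoints at once, the same call can be wrapped in a small helper. This is a sketch under one stated assumption: that every checkpoint listed in this card is stored as `<checkpoint name>/model.pt`, as shown for `vision_base_h0_lm`. The names `checkpoint_filename` and `download_checkpoints` are hypothetical and do not come from the repository.

```python
from pathlib import Path
from typing import Dict, Iterable

REPO_ID = "ivnle/bad-autoencoding"  # repository ID from the usage snippet


def checkpoint_filename(name: str) -> str:
    # Assumption: each checkpoint follows the "<name>/model.pt" layout
    # shown for vision_base_h0_lm; verify against the repository file list.
    return f"{name}/model.pt"


def download_checkpoints(names: Iterable[str]) -> Dict[str, Path]:
    """Download each named checkpoint and return its local cache path."""
    # Imported lazily so checkpoint_filename works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return {
        name: Path(
            hf_hub_download(
                repo_id=REPO_ID,
                filename=checkpoint_filename(name),
                repo_type="model",
            )
        )
        for name in names
    }
```

For example, `download_checkpoints(["vision_base_h0_lm", "conv1d_t250_h0_lm_recon-init"])` would return a dictionary mapping each name to its downloaded file, provided the assumed layout holds.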

## Citation

```bibtex
@article{lee2025optical,
  title={Optical Context Compression Is Just (Bad) Autoencoding},
  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
  journal={arXiv preprint arXiv:2512.03643},
  year={2025}
}
```