	---
	license: apache-2.0
	pipeline_tag: feature-extraction
	tags:
	- vision
	- ocr
	- compression
	- autoencoding
	---

	# Bad Autoencoding - Model Checkpoints

	Checkpoints for the paper: "Optical Context Compression Is Just (Bad) Autoencoding"

	Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick

	## Links

	- Paper: [arXiv:2512.03643](https://arxiv.org/abs/2512.03643)
	- Code: [https://github.com/ivnle/bad-autoencoding](https://github.com/ivnle/bad-autoencoding)

	## Available Checkpoints

	Naming convention: `{regime}_{config}_h{N}_{objective}[_recon-init]`
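The naming convention can be split into its fields mechanically. The following is a minimal sketch, not code from the repository: `parse_checkpoint_name` and `CHECKPOINT_RE` are hypothetical names, and the pattern assumes that every checkpoint name uses lowercase letters and digits for the regime and config fields and that the objective is either `recon` or `lm`, as in the checkpoints listed on this card.

```python
import re
from typing import Dict, Union

# Pattern for `{regime}_{config}_h{N}_{objective}[_recon-init]`.
# Assumption: objective is one of the two values that appear on this card.
CHECKPOINT_RE = re.compile(
    r"^(?P<regime>[a-z0-9]+)_(?P<config>[a-z0-9]+)_h(?P<h>\d+)"
    r"_(?P<objective>recon|lm)(?P<recon_init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> Dict[str, Union[str, int, bool]]:
    """Split a checkpoint name into the fields of the naming convention."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the naming convention: {name!r}")
    return {
        "regime": match["regime"],
        "config": match["config"],
        "h": int(match["h"]),  # the integer N after `h`
        "objective": match["objective"],
        "recon_init": match["recon_init"] is not None,
    }


parsed = parse_checkpoint_name("vision_base_h0_lm_recon-init")
```

Names that deviate from the convention raise a `ValueError` rather than being parsed partially.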

	### Reconstruction

	| Checkpoint | Regime | CR | PPL |
	|------------|--------|-----|-----|
	| `vision_base_h0_recon` | Vision base | 3.60 | 1.03 |
	| `meanpool_w4s4_h0_recon` | Meanpool w4s4 | 3.97 | 1.04 |
	| `conv1d_t250_h0_recon` | Conv1D t250 | 3.97 | 1.00 |
	| `vision_tiny_h0_recon` | Vision tiny | 12.82 | 1.14 |
	| `conv1d_t63_h0_recon` | Conv1D t63 | 15.38 | 1.01 |

	### Language Modeling

	| Checkpoint | Regime | CR | Init | PPL |
	|------------|--------|-----|------|-----|
	| `vision_base_h0_lm` | Vision base | 3.60 | Direct | 5.08 |
	| `vision_base_h0_lm_recon-init` | Vision base | 3.60 | From recon | 5.06 |
	| `text_ctx277_h0_lm` | Text ctx277 (Truncation) | 3.60 | Direct | 5.02 |
	| `meanpool_w4s4_h0_lm_recon-init` | Meanpool w4s4 | 3.97 | From recon | 5.02 |
	| `conv1d_t250_h0_lm_recon-init` | Conv1D t250 | 3.97 | From recon | 4.96 |

	## Model Details

	- Architecture: DeepSeek-OCR with vision encoder
	- Vision checkpoints: Trained encoder (base=768x768, tiny=384x384)
	- Text checkpoints: Truncation baseline (no vision encoder), context=277 tokens
	- Meanpool checkpoints: Frozen encoder, window=4, stride=4
	- Conv1D checkpoints: Trained hierarchical encoder (t250=CR 3.97, t63=CR 15.38)
	- Dataset: 510k samples from FineWiki
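To illustrate what `window=4, stride=4` means for the meanpool checkpoints, here is a minimal pure-Python sketch of non-overlapping mean pooling over a sequence of embedding vectors. It is an illustration under assumptions, not the repository's implementation: `mean_pool` is a hypothetical helper, and averaging a shorter trailing window as-is is one possible way to handle sequence lengths that are not a multiple of 4.

```python
from typing import List


def mean_pool(
    vectors: List[List[float]], window: int = 4, stride: int = 4
) -> List[List[float]]:
    """Average consecutive vectors in windows of `window`, stepping by `stride`."""
    pooled = []
    for start in range(0, len(vectors), stride):
        chunk = vectors[start:start + window]  # trailing chunk may be shorter
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


# Ten toy 2-dimensional "embeddings" are reduced to three pooled vectors.
tokens = [[float(i), float(2 * i)] for i in range(10)]
pooled = mean_pool(tokens)
```

With window and stride both set to 4, the sequence length shrinks by roughly a factor of four.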

	## Usage

	```python
	from huggingface_hub import hf_hub_download

	# Download a specific checkpoint
	checkpoint_path = hf_hub_download(
	    repo_id="ivnle/bad-autoencoding",
	    filename="vision_base_h0_lm/model.pt",
	    repo_type="model",
	)
	```
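To fetch a different checkpoint, the download call above can be wrapped in a small helper. This is a sketch under one loud assumption: that every checkpoint listed on this card stores its weights at `{checkpoint_name}/model.pt`, which this card only shows for `vision_base_h0_lm`. The names `checkpoint_filename` and `download_checkpoint` are hypothetical.

```python
REPO_ID = "ivnle/bad-autoencoding"


def checkpoint_filename(name: str) -> str:
    """Build the in-repo path for a checkpoint.

    Assumption: each checkpoint directory contains a `model.pt`,
    as in the single documented example.
    """
    return f"{name}/model.pt"


def download_checkpoint(name: str) -> str:
    """Download one checkpoint and return the local file path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(name),
        repo_type="model",
    )
```

For example, `download_checkpoint("conv1d_t250_h0_lm_recon-init")` would request `conv1d_t250_h0_lm_recon-init/model.pt`. If the file layout differs, check the repository's file listing.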

	## Citation

	```bibtex
	@article{lee2025optical,
	  title={Optical Context Compression Is Just (Bad) Autoencoding},
	  author={Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor},
	  journal={arXiv preprint arXiv:2512.03643},
	  year={2025}
	}
	```