---
license: mit
pipeline_tag: image-to-image
tags:
- discrete tokenization
- autoregressive generation
---

# InsightTok

InsightTok is a discrete visual tokenizer designed to improve the fidelity of **text** and **faces**, two of the most challenging yet perceptually important structures in autoregressive image generation. It was introduced in the paper [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://huggingface.co/papers/2605.14333).

- **Paper:** [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://huggingface.co/papers/2605.14333)
- **Code:** [https://github.com/LeapLabTHU/InsightTok](https://github.com/LeapLabTHU/InsightTok)

## Model Details

| Property | Value |
|---|---:|
| Downsampling rate | 16× |
| Codebook size | 16,384 |
| Latent dimension | 256 |
| Number of parameters | 426M |

## Performance

InsightTok achieves strong text and face reconstruction quality while maintaining a compact discrete representation through localized, content-aware perceptual losses.
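
To make the "compact discrete representation" concrete, the arithmetic implied by the Model Details table can be sketched as follows. This is a rough illustration, not code from the InsightTok repository: the helper `token_budget` is hypothetical, and it assumes that the 16× downsampling applies to both spatial dimensions and that each latent position is encoded as a single codebook index. The image sizes are example inputs only.

```python
import math

# Values taken from the Model Details table.
DOWNSAMPLE = 16          # spatial downsampling rate
CODEBOOK_SIZE = 16_384   # number of entries in the codebook


def token_budget(height, width, downsample=DOWNSAMPLE, codebook_size=CODEBOOK_SIZE):
    """Hypothetical helper: estimate the token grid and bit budget for an image.

    Assumes one codebook index per latent position and equal downsampling
    along height and width (assumptions, not confirmed by the model card).
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling rate")

    rows, cols = height // downsample, width // downsample
    num_tokens = rows * cols
    # Each token selects one of `codebook_size` entries.
    bits_per_token = math.log2(codebook_size)
    return {
        "grid": (rows, cols),
        "num_tokens": num_tokens,
        "bits_per_token": bits_per_token,
        "total_bits": num_tokens * bits_per_token,
    }


if __name__ == "__main__":
    for size in (256, 512):  # example resolutions, chosen for illustration
        budget = token_budget(size, size)
        print(size, budget["grid"], budget["num_tokens"], budget["bits_per_token"])
```

Under these assumptions, a 256×256 image maps to a 16×16 grid of 256 tokens, each carrying 14 bits (since 2^14 = 16,384).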