---
license: apache-2.0
---
## Libra Vision Tokenizer

[**Libra: Building Decoupled Vision System on Large Language Models**](https://arxiv.org/abs/2405.10140)

This repo provides the pretrained weights of the Libra vision tokenizer, trained with lookup-free quantization.

### !!! NOTE !!!
1. Please copy the weights from this repo into ``llama-2-7b-chat-hf-libra``, a directory holding the [Hugging Face version of LLaMA2-7B-Chat](https://huggingface.co/docs/transformers/main/model_doc/llama2).

2. Please download the pretrained CLIP model from Hugging Face and place it in a subfolder of the same directory. The CLIP model can be downloaded [here](https://huggingface.co/openai/clip-vit-large-patch14-336).


The files should be organized as:

```
llama-2-7b-chat-hf-libra/
│
│   # original llama files
│
├── ...
│
│   # newly added vision tokenizer
│
├── vision_tokenizer_config.yaml
├── vqgan.ckpt
│
│   # CLIP model
│
└── openai-clip-vit-large-patch14-336/
    └── ...
```
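
To check the layout above before loading the model, a minimal sketch like the following may help. It only tests that the three entries named in the tree exist. The helper `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical names introduced for illustration and are not part of the Libra code base. The original LLaMA files are not checked, since the tree does not list them.

```python
from pathlib import Path

# Entries taken from the directory tree above (assumption: these three
# names are all that the vision tokenizer setup adds to the LLaMA folder).
EXPECTED_ENTRIES = (
    "vision_tokenizer_config.yaml",
    "vqgan.ckpt",
    "openai-clip-vit-large-patch14-336",
)


def missing_entries(model_dir):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("llama-2-7b-chat-hf-libra")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

Run the script from the parent folder of ``llama-2-7b-chat-hf-libra``, or pass a different path to `missing_entries`.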