---
license: apache-2.0
---
## Libra Vision Tokenizer
[**Libra: Building Decoupled Vision System on Large Language Models**](https://arxiv.org/abs/2405.10140)
This repo provides the pretrained weights of the Libra vision tokenizer, trained with lookup-free quantization.
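For readers unfamiliar with lookup-free quantization (LFQ), the sketch below follows the general formulation popularized by MAGVIT-v2. Each latent dimension is binarized by its sign, and the token id is the integer whose bits record which dimensions are positive, so no codebook embedding table is searched. This is an illustrative assumption about the technique in general, not Libra's code: the helper names `lfq_quantize` and `lfq_dequantize` are hypothetical, and the latent dimension and codebook size used by this checkpoint are not stated here.

```python
from typing import List, Tuple


def lfq_quantize(z: List[float]) -> Tuple[List[int], int]:
    """Hypothetical helper: lookup-free quantization of one latent vector.

    Each dimension maps to +1 if positive, else -1 (so 0.0 maps to -1).
    The token id sets bit i whenever z[i] > 0, giving a codebook of
    size 2 ** len(z) with no embedding-table lookup.
    """
    codes = [1 if v > 0 else -1 for v in z]
    index = sum(1 << i for i, v in enumerate(z) if v > 0)
    return codes, index


def lfq_dequantize(index: int, dim: int) -> List[int]:
    """Hypothetical helper: recover the {-1, +1} code vector from a token id."""
    return [1 if (index >> i) & 1 else -1 for i in range(dim)]
```

For example, `lfq_quantize([0.3, -1.2, 0.7, -0.1])` gives codes `[1, -1, 1, -1]` and token id `5` (bits 0 and 2 set), and `lfq_dequantize(5, 4)` returns the same code vector.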
### !!! NOTE !!!
1. Place these weights in the ``llama-2-7b-chat-hf-libra`` directory, i.e. your local copy of the [Hugging Face version of LLaMA2-7B-Chat](https://huggingface.co/docs/transformers/main/model_doc/llama2).
2. Download the pretrained CLIP model from Hugging Face and place it in the same directory. The CLIP model can be downloaded [here](https://huggingface.co/openai/clip-vit-large-patch14-336).
The files should be organized as follows:
```
llama-2-7b-chat-hf-libra/
│
│ # original llama files
│
├── ...
│
│ # newly added vision tokenizer
│
├── vision_tokenizer_config.yaml
├── vqgan.ckpt
│
│ # CLIP model
│
└── openai-clip-vit-large-patch14-336/
    └── ...
```