---
license: apache-2.0
---
## Libra Vision Tokenizer

[**Libra: Building Decoupled Vision System on Large Language Models**](https://arxiv.org/abs/2405.10140)

This repo provides the pretrained weights of the Libra vision tokenizer trained with lookup-free quantization.
### !!! NOTE !!!
1. Please merge the weights into ``llama-2-7b-chat-hf-libra`` (the [Hugging Face version of LLaMA2-7B-Chat](https://huggingface.co/docs/transformers/main/model_doc/llama2)).

2. Please download the pretrained CLIP model from Hugging Face and merge it into the same path. The CLIP model can be downloaded [here](https://huggingface.co/openai/clip-vit-large-patch14-336).
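The merge in steps 1 and 2 amounts to copying files into the model directory. A minimal sketch (the source paths `vision_tokenizer_config.yaml`, `vqgan.ckpt`, and the downloaded CLIP folder are assumptions about where you saved them; adjust to your setup):

```python
import shutil
from pathlib import Path

# Assumed locations of the downloaded pieces; change these to your paths.
base = Path("llama-2-7b-chat-hf-libra")          # the LLaMA2-7B-Chat directory
tokenizer_files = [Path("vision_tokenizer_config.yaml"), Path("vqgan.ckpt")]
clip_dir = Path("openai-clip-vit-large-patch14-336")  # downloaded CLIP model

def merge_into_model_dir(base: Path, tokenizer_files: list, clip_dir: Path) -> None:
    """Copy the vision-tokenizer weights and the CLIP folder into `base`."""
    base.mkdir(parents=True, exist_ok=True)
    for f in tokenizer_files:
        if f.exists():
            shutil.copy2(f, base / f.name)
    if clip_dir.exists():
        shutil.copytree(clip_dir, base / clip_dir.name, dirs_exist_ok=True)
```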

The files should be organized as:

```
llama-2-7b-chat-hf-libra/
│
│ # original llama files
│
├── ...
│
│ # newly added vision tokenizer
│
├── vision_tokenizer_config.yaml
├── vqgan.ckpt
│
│ # CLIP model
│
└── openai-clip-vit-large-patch14-336/
    └── ...
```
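After merging, you can sanity-check the layout above with a short script. This is only a sketch: the entry names come from the tree, while the root path and helper name are placeholders.

```python
import os

# Entries the merged directory should contain, per the tree above.
REQUIRED = [
    "vision_tokenizer_config.yaml",
    "vqgan.ckpt",
    "openai-clip-vit-large-patch14-336",
]

def missing_entries(root: str) -> list:
    """Return the required entries that are absent under `root`."""
    return [name for name in REQUIRED
            if not os.path.exists(os.path.join(root, name))]

# e.g. missing_entries("llama-2-7b-chat-hf-libra") lists anything still absent
```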