yifanxu committed on
Commit
a97a793
·
1 Parent(s): 9598fd3

model version 1.0

Files changed (3)
  1. README.md +28 -3
  2. vision_tokenizer_config.yaml +23 -0
  3. vqgan.ckpt +3 -0
README.md CHANGED
@@ -1,3 +1,28 @@
- ---
- license: apache-2.0
- ---
+ ## Libra Vision Tokenizer
+ This repo provides the pretrained weight of Libra vision tokenizer trained with lookup-free quantization.
+
+ ### !!! NOTE !!!
+ 1. Please merge the weights into ``llama-2-7b-chat-hf-libra`` (the huggingface version of LLaMA2-7B).
+
+ 2. Please download the pretrained CLIP model in huggingface and merge it into the path. The CLIP model can be downloaded [here](https://huggingface.co/openai/clip-vit-large-patch14-336).
+
+
+ The files should be organized as:
+
+ ```
+ llama-2-7b-chat-hf-libra/
+ |
+ │ # original llama files
+ |
+ ├── ...
+ │
+ │ # newly added vision tokenizer
+ │
+ ├── vision_tokenizer_config.yaml
+ ├── vqgan.ckpt
+ │
+ │ # CLIP model
+ │
+ └── openai-clip-vit-large-patch14-336/
+     └── ...
+ ```
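The layout described in the README above can be checked before loading the model. The following is a minimal sketch, not part of the commit: the helper name `missing_libra_files` and the idea of passing the merged directory as `root` are assumptions made for illustration; only the file and directory names come from the README.

```python
from pathlib import Path

# Names taken from the README's directory tree.
EXPECTED_FILES = ("vision_tokenizer_config.yaml", "vqgan.ckpt")
EXPECTED_DIRS = ("openai-clip-vit-large-patch14-336",)


def missing_libra_files(root):
    """Return the expected entries that are absent under `root`.

    `root` is assumed to be the merged `llama-2-7b-chat-hf-libra/` directory.
    Directories are reported with a trailing slash.
    """
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_libra_files("llama-2-7b-chat-hf-libra")` returns an empty list when the tokenizer config, the checkpoint, and the CLIP directory are all in place.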
vision_tokenizer_config.yaml ADDED
@@ -0,0 +1,23 @@
+ freeze: True
+ max_vision_token_length: 578 # 24*24 (resolution) + 2 (<img> and <\img>); corresponding to model_config.max_vision_token_length, dataset_config.image_size
+ params:
+   embed_dim: 1024 # debug
+   ckpt_path: vqgan.ckpt
+   codebook_size: 512
+   num_codebook: 2
+   ddconfig:
+     # only_auto_encoder: True
+     encoder_name: openai-clip-vit-large-patch14-336
+     select_layer: [2,10,18,22]
+     double_z: False
+     z_channels: 1024
+     resolution: 336 # 336
+     in_channels: 3
+     out_ch: 3
+     ch: 128
+     ch_mult: [ 1,1,2,4,8] # num_down = len(ch_mult)-1
+     num_res_blocks: 2
+     attn_resolutions: [24]
+     dropout: 0.0
+     initial_resolution: 24
+     num_attn_head: 8
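The comment on `max_vision_token_length` gives the arithmetic 24*24 + 2 = 578. A short sketch of how that number relates to the other values in the config, assuming the patch size of 14 implied by the encoder name `openai-clip-vit-large-patch14-336`; the function name `vision_token_length` is hypothetical and not taken from the Libra code.

```python
def vision_token_length(resolution: int, patch_size: int, num_special_tokens: int = 2) -> int:
    """Number of vision tokens for a square image split into square patches.

    `num_special_tokens` covers the two image delimiter tokens mentioned
    in the config comment.
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    grid = resolution // patch_size          # 336 // 14 = 24, matching initial_resolution
    return grid * grid + num_special_tokens  # 24 * 24 + 2


print(vision_token_length(336, 14))  # 578
```

Under this assumption, `resolution: 336`, `initial_resolution: 24`, and `max_vision_token_length: 578` are mutually consistent.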
vqgan.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d01a38fadd81dec3557120ec6e8d36d51758ac1a8a8afe58102f404d03e47a08
+ size 3247360961
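The three added lines are a Git LFS pointer: they record the SHA-256 digest and byte size of the real `vqgan.ckpt` rather than its contents. A downloaded checkpoint can be compared against the pointer as sketched below; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the Python standard library is used.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte checkpoint is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

The size check runs first because it is cheap and rejects a truncated download without hashing roughly 3 GB of data.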