# Improve Emu3.5-VisionTokenizer model card: Add pipeline tag, GitHub link & sync content

#1 · opened by nielsr (HF Staff)

Files changed (1): README.md (+11 −6)
**README.md**

```diff
@@ -1,12 +1,14 @@
 ---
 license: apache-2.0
+pipeline_tag: image-feature-extraction
 ---
+
 <div align='center'>
 <h1>Emu3.5: Native Multimodal Models are World Learners</h1>
 
 Emu3.5 Team, BAAI
 
-[Project Page](https://emu.world/) | [🤗HF Models](https://huggingface.co/collections/BAAI/emu35) | [Paper](https://arxiv.org/pdf/2510.26583)
+[Project Page](https://emu.world/) | [🤗HF Models](https://huggingface.co/collections/BAAI/emu35) | [Paper](https://arxiv.org/pdf/2510.26583) | [Code](https://github.com/baaivision/Emu3.5)
 </div>
 
 
@@ -49,6 +51,9 @@ Emu3.5 Team, BAAI
 | Emu3.5-Image | [🤗 HF link](https://huggingface.co/BAAI/Emu3.5-Image/tree/main) |
 | Emu3.5-VisionTokenizer | [🤗 HF link](https://huggingface.co/BAAI/Emu3.5-VisionTokenizer/tree/main) |
 
+**Emu3.5** handles general tasks (including interleaved generation and image generation/editing), while **Emu3.5-Image** focuses on high-quality image generation/editing.
+
+
 ## 2. Quick Start
 
 ### Environment Setup
@@ -64,7 +69,8 @@ pip install flash_attn==2.8.3 --no-build-isolation
 Edit `configs/config.py` to set:
 
 - Paths: `model_path`, `vq_path`
-- Task template: `task_type in {t2i, x2i, howto, story, explore, vla}`, `use_image` controls `<|IMAGE|>` usage (set to true when reference images are provided)
+- Task template: `task_type in {t2i, x2i, howto, story, explore, vla}`
+- Input image: `use_image` (True to provide reference images; controls the `<|IMAGE|>` token); set `reference_image` in each prompt to specify the image path.
 - Sampling: `sampling_params` (classifier_free_guidance, temperature, top_k/top_p, etc.)
 
 ### Run Inference
@@ -85,9 +91,9 @@ python src/utils/vis_proto.py --input <input_proto_file> --output <output_dir>
 
 ## 3. Schedule
 
-- [x] Inference Code
+- [x] Inference Code (auto-regressive version)
 - [ ] Advanced Image Decoder
-- [ ] Discrete Diffusion Adaptation(DiDA)
+- [ ] Discrete Diffusion Adaptation (DiDA) Inference & Weights
 
 
 ## 4. Citation
@@ -102,5 +108,4 @@ python src/utils/vis_proto.py --input <input_proto_file> --output <output_dir>
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2510.26583},
 }
-```
-
+```
```
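The `configs/config.py` fields described in the updated README can be pictured with a minimal sketch. This is an illustrative assumption of how the config might be laid out: the field names (`model_path`, `vq_path`, `task_type`, `use_image`, `reference_image`, `sampling_params`) come from the card, but the values and the `prompts` list shape shown here are hypothetical, not the repo's actual defaults.

```python
# Illustrative sketch of configs/config.py as described in the model card.
# Field names come from the README; values and structure are assumptions.

# Paths to the model weights and the vision tokenizer (VQ) weights
model_path = "BAAI/Emu3.5"                # assumed value
vq_path = "BAAI/Emu3.5-VisionTokenizer"   # assumed value

# Task template: one of the task types listed in the card
task_type = "x2i"
assert task_type in {"t2i", "x2i", "howto", "story", "explore", "vla"}

# Input image: True when prompts carry reference images,
# which enables the <|IMAGE|> token in the template.
use_image = True

# Each prompt specifies its reference image path when use_image is True
# (the "prompts" list itself is a hypothetical shape for illustration).
prompts = [
    {"text": "Edit the sky to sunset.", "reference_image": "examples/ref.png"},
]

# Sampling settings named in the card (values are placeholders)
sampling_params = {
    "classifier_free_guidance": 3.0,
    "temperature": 1.0,
    "top_k": 2048,
    "top_p": 0.9,
}
```

With `use_image = False` (e.g. plain `t2i` generation), the `reference_image` entries would simply be omitted.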