BAAI / Emu3.5-VisionTokenizer
nielsr (HF Staff) committed · Commit 73de3b6 · verified · 1 parent: 91789bb

Improve Emu3.5-VisionTokenizer model card: Add pipeline tag, GitHub link & sync content


This PR enhances the model card for `Emu3.5-VisionTokenizer` by:
- Adding `pipeline_tag: image-feature-extraction` to the metadata to improve discoverability on the Hugging Face Hub.
- Adding an explicit "Code" link to the GitHub repository in the introductory section for direct access to the codebase.
- Synchronizing the "Configuration" and "Schedule" sections, and adding an explanatory paragraph below the "Model & Weights" table to align with the latest details from the official GitHub README.

These updates ensure the model card is more complete, accurate, and discoverable.
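
Once this PR is merged, the new tag is queryable through the Hub API. A quick check with `huggingface_hub` (assuming only that the package is installed):

```python
# Verify the model card's pipeline tag via the Hub API; after this PR is
# merged it should print "image-feature-extraction".
from huggingface_hub import HfApi

info = HfApi().model_info("BAAI/Emu3.5-VisionTokenizer")
print(info.pipeline_tag)
```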

Files changed (1): README.md (+11, -6)
README.md CHANGED
````diff
@@ -1,12 +1,14 @@
 ---
 license: apache-2.0
+pipeline_tag: image-feature-extraction
 ---
+
 <div align='center'>
 <h1>Emu3.5: Native Multimodal Models are World Learners</h1>
 
 Emu3.5 Team, BAAI
 
-[Project Page](https://emu.world/) | [🤗HF Models](https://huggingface.co/collections/BAAI/emu35) | [Paper](https://arxiv.org/pdf/2510.26583)
+[Project Page](https://emu.world/) | [🤗HF Models](https://huggingface.co/collections/BAAI/emu35) | [Paper](https://arxiv.org/pdf/2510.26583) | [Code](https://github.com/baaivision/Emu3.5)
 </div>
 
 
@@ -49,6 +51,9 @@ Emu3.5 Team, BAAI
 | Emu3.5-Image | [🤗 HF link](https://huggingface.co/BAAI/Emu3.5-Image/tree/main) |
 | Emu3.5-VisionTokenizer | [🤗 HF link](https://huggingface.co/BAAI/Emu3.5-VisionTokenizer/tree/main) |
 
+**Emu3.5** handles general tasks(including interleaved generation and image generation/editing), while **Emu3.5-Image** focuses on high-quality image generation/editing.
+
+
 ## 2. Quick Start
 
 ### Environment Setup
@@ -64,7 +69,8 @@ pip install flash_attn==2.8.3 --no-build-isolation
 Edit `configs/config.py` to set:
 
 - Paths: `model_path`, `vq_path`
-- Task template: `task_type in {t2i, x2i, howto, story, explore, vla}`, `use_image` controls `<|IMAGE|>` usage (set to true when reference images are provided)
+- Task template: `task_type in {t2i, x2i, howto, story, explore, vla}`
+- Input image: `use_image` (True to provide reference images, controls <|IMAGE|> token); set `reference_image` in each prompt to specify the image path.
 - Sampling: `sampling_params` (classifier_free_guidance, temperature, top_k/top_p, etc.)
 
 ### Run Inference
@@ -85,9 +91,9 @@ python src/utils/vis_proto.py --input <input_proto_file> --output <output_dir>
 
 ## 3. Schedule
 
-- [x] Inference Code
+- [x] Inference Code(auto-regressive version)
 - [ ] Advanced Image Decoder
-- [ ] Discrete Diffusion Adaptation(DiDA)
+- [ ] Discrete Diffusion Adaptation(DiDA) Inference & Weights
 
 
 ## 4. Citation
@@ -102,5 +108,4 @@ python src/utils/vis_proto.py --input <input_proto_file> --output <output_dir>
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2510.26583},
 }
-```
-
+```
````
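
For readers who find this card via the new `image-feature-extraction` tag, a usage sketch may help. The snippet below is a minimal sketch only: it assumes Emu3.5-VisionTokenizer loads through `transformers`' `AutoModel`/`AutoImageProcessor` with `trust_remote_code=True` and exposes `encode`/`decode` methods, mirroring the earlier BAAI/Emu3-VisionTokenizer release; none of this is confirmed for Emu3.5, so consult the linked GitHub repository for the official inference path.

```python
# Minimal sketch: image -> discrete visual tokens -> reconstruction.
# ASSUMPTION: the loading pattern and encode()/decode() interface below
# mirror BAAI/Emu3-VisionTokenizer and are NOT confirmed for Emu3.5;
# see https://github.com/baaivision/Emu3.5 for the official pipeline.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

MODEL_ID = "BAAI/Emu3.5-VisionTokenizer"

processor = AutoImageProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).eval()

image = Image.open("example.png").convert("RGB")  # any local test image
pixel_values = processor(image, return_tensors="pt")["pixel_values"]

with torch.no_grad():
    codes = model.encode(pixel_values)  # discrete token grid (hypothetical API)
    recon = model.decode(codes)         # reconstructed pixels (hypothetical API)
```

In the repo's own pipeline, the tokenizer is not called directly like this: its path is supplied via `vq_path` in `configs/config.py`, as the Configuration section of the diff above shows.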