Improve Emu3.5-VisionTokenizer model card: Add pipeline tag, GitHub link & sync content
#1 by nielsr (HF Staff) - opened
README.md CHANGED

@@ -1,12 +1,14 @@
 ---
 license: apache-2.0
+pipeline_tag: image-feature-extraction
 ---
+
 <div align='center'>
 <h1>Emu3.5: Native Multimodal Models are World Learners</h1>
 
 Emu3.5 Team, BAAI
 
-[Project Page](https://emu.world/) | [🤗HF Models](https://huggingface.co/collections/BAAI/emu35) | [Paper](https://arxiv.org/pdf/2510.26583)
+[Project Page](https://emu.world/) | [🤗HF Models](https://huggingface.co/collections/BAAI/emu35) | [Paper](https://arxiv.org/pdf/2510.26583) | [Code](https://github.com/baaivision/Emu3.5)
 </div>
 
 
@@ -49,6 +51,9 @@ Emu3.5 Team, BAAI
 | Emu3.5-Image | [🤗 HF link](https://huggingface.co/BAAI/Emu3.5-Image/tree/main) |
 | Emu3.5-VisionTokenizer | [🤗 HF link](https://huggingface.co/BAAI/Emu3.5-VisionTokenizer/tree/main) |
 
+**Emu3.5** handles general tasks (including interleaved generation and image generation/editing), while **Emu3.5-Image** focuses on high-quality image generation/editing.
+
+
 ## 2. Quick Start
 
 ### Environment Setup
@@ -64,7 +69,8 @@ pip install flash_attn==2.8.3 --no-build-isolation
 Edit `configs/config.py` to set:
 
 - Paths: `model_path`, `vq_path`
-- Task template: `task_type in {t2i, x2i, howto, story, explore, vla}
+- Task template: `task_type in {t2i, x2i, howto, story, explore, vla}`
+- Input image: `use_image` (True to provide reference images, controls the <|IMAGE|> token); set `reference_image` in each prompt to specify the image path.
 - Sampling: `sampling_params` (classifier_free_guidance, temperature, top_k/top_p, etc.)
 
 ### Run Inference
@@ -85,9 +91,9 @@ python src/utils/vis_proto.py --input <input_proto_file> --output <output_dir>
 
 ## 3. Schedule
 
-- [x] Inference Code
+- [x] Inference Code (auto-regressive version)
 - [ ] Advanced Image Decoder
-- [ ] Discrete Diffusion Adaptation(DiDA)
+- [ ] Discrete Diffusion Adaptation (DiDA) Inference & Weights
 
 
 ## 4. Citation
@@ -102,5 +108,4 @@ python src/utils/vis_proto.py --input <input_proto_file> --output <output_dir>
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2510.26583},
 }
-```
-
+```
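The `configs/config.py` fields the diff enumerates (`model_path`, `vq_path`, `task_type`, `use_image`, `sampling_params`) can be pictured as a small Python config object. The sketch below is a hypothetical illustration of those settings, not the actual file from the Emu3.5 repo: the class names, default values, and validation logic are assumptions; only the field names and the task set `{t2i, x2i, howto, story, explore, vla}` come from the diff.

```python
# Hypothetical sketch of the config fields described in the model card;
# the real configs/config.py in the Emu3.5 repo may be structured differently.
from dataclasses import dataclass, field

VALID_TASK_TYPES = {"t2i", "x2i", "howto", "story", "explore", "vla"}

@dataclass
class SamplingParams:
    # Assumed defaults for illustration only; not taken from the model card.
    classifier_free_guidance: float = 3.0
    temperature: float = 1.0
    top_k: int = 2048
    top_p: float = 1.0

@dataclass
class InferenceConfig:
    model_path: str = "BAAI/Emu3.5"                 # path to the language model
    vq_path: str = "BAAI/Emu3.5-VisionTokenizer"    # path to the vision tokenizer
    task_type: str = "t2i"                          # one of VALID_TASK_TYPES
    use_image: bool = False   # True -> each prompt supplies a reference_image path
    sampling_params: SamplingParams = field(default_factory=SamplingParams)

    def __post_init__(self) -> None:
        # Reject task templates the card does not list.
        if self.task_type not in VALID_TASK_TYPES:
            raise ValueError(
                f"task_type must be one of {sorted(VALID_TASK_TYPES)}, "
                f"got {self.task_type!r}"
            )

cfg = InferenceConfig(task_type="x2i", use_image=True)
print(cfg.task_type, cfg.use_image)
```

Validating `task_type` at construction time mirrors the fixed task-template set the card documents, so a typo fails immediately rather than at generation time.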