---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
language:
- en
pipeline_tag: image-text-to-text
tags:
- vision
- object-detection
- multimodal
- ocr
- keypoint-detection
- visual-prompting
- open-set-detection
- object-pointing
library_name: transformers
license: other
---
This model is **Rex-Omni**, a 3B-parameter Multimodal Large Language Model (MLLM) presented in the paper "[Detect Anything via Next Point Prediction](https://huggingface.co/papers/2510.12798)". It is compatible with the Hugging Face `transformers` library and is licensed under the [IDEA License 1.0](https://github.com/IDEA-Research/Rex-Omni/blob/main/LICENSE).

# Detect Anything via Next Point Prediction

> Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that redefines object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.

## Quick Start
### Installation
```bash
conda create -n rexomni python=3.10 -y
conda activate rexomni
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
git clone https://github.com/IDEA-Research/Rex-Omni.git
cd Rex-Omni
pip install -v -e .
```
### Using Rex-Omni for Detection
```python
from PIL import Image

from rex_omni import RexOmniWrapper, RexOmniVisualize

# 1) Initialize the model
model = RexOmniWrapper(
    model_path="IDEA-Research/Rex-Omni",
    backend="transformers",  # or "vllm"
)

# 2) Load an image
image = Image.open("your_image.jpg")

# 3) Run object detection
results = model.inference(
    images=image,
    task="detection",
    categories=["person", "car", "dog"],
)
result = results[0]

# 4) Visualize the predictions
vis = RexOmniVisualize(
    image=image,
    predictions=result["extracted_predictions"],
    font_size=20,
    draw_width=5,
    show_labels=True,
)
vis.save("visualize.jpg")
```
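After inference, each entry in `results` carries the model's predictions under the `extracted_predictions` key. A common follow-up step is filtering detections by confidence before visualization. The snippet below is a minimal sketch of such post-processing; note that the exact schema of `extracted_predictions` (here assumed to be a list of dicts with `category`, `box`, and `score` fields) is an assumption for illustration — consult the official tutorials for the real format.

```python
# Hypothetical post-processing: keep only confident detections.
# NOTE: the prediction schema used here (dicts with "category",
# "box", "score") is an assumed format, not the confirmed API.

def filter_predictions(predictions, min_score=0.5):
    """Return only predictions whose confidence meets the threshold.

    Entries without a "score" field are kept (treated as score 1.0).
    """
    return [p for p in predictions if p.get("score", 1.0) >= min_score]


# Example with mock predictions in the assumed format:
mock_predictions = [
    {"category": "person", "box": [10, 20, 110, 220], "score": 0.92},
    {"category": "dog", "box": [50, 60, 90, 120], "score": 0.31},
]
kept = filter_predictions(mock_predictions, min_score=0.5)
print([p["category"] for p in kept])  # -> ['person']
```

The filtered list can then be passed to `RexOmniVisualize` in place of the raw `result["extracted_predictions"]` to draw only the confident boxes.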
## Tutorials
We provide a series of tutorials to help you get started with Rex-Omni.
- [Detection Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/detection_example/_full_notebook.ipynb)
- [Pointing Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/pointing_example/_full_tutorial.ipynb)
- [OCR Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/ocr_example/_full_tutorial.ipynb)
- [Keypointing Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/keypointing_example/_full_tutorial.ipynb)
- [Visual Prompting Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/visual_prompting_example/_full_tutorial.ipynb)
- [Batch Inference Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/other_example/batch_inference.py)
## License
Rex-Omni is licensed under the [IDEA License 1.0](LICENSE), Copyright (c) IDEA. All Rights Reserved. This model is based on Qwen, which is licensed under the [Qwen RESEARCH LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE), Copyright (c) Alibaba Cloud. All Rights Reserved.
## Links
- [Homepage](https://rex-omni.github.io/)
- [Demo](https://huggingface.co/spaces/Mountchicken/Rex-Omni)
## Contact
For questions and feedback, please contact us at:
- Email: jiangqing@idea.edu.cn
- GitHub Issues: [IDEA-Research/Rex-Omni](https://github.com/IDEA-Research/Rex-Omni/issues)
## Citation
Rex-Omni builds on a series of prior works. If you're interested, take a look:
- [RexThinker](https://arxiv.org/abs/2506.04034)
- [RexSeek](https://arxiv.org/abs/2503.08507)
- [ChatRex](https://arxiv.org/abs/2411.18363)
- [DINO-X](https://arxiv.org/abs/2411.14347)
- [Grounding DINO 1.5](https://arxiv.org/abs/2405.10300)
- [T-Rex2](https://link.springer.com/chapter/10.1007/978-3-031-73414-4_3)
- [T-Rex](https://arxiv.org/abs/2311.13596)
```bibtex
@misc{jiang2025detectpointprediction,
      title={Detect Anything via Next Point Prediction},
      author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
      year={2025},
      eprint={2510.12798},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12798},
}
```