---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
language:
- en
pipeline_tag: image-text-to-text
tags:
- vision
- object-detection
- multimodal
- ocr
- keypoint-detection
- visual-prompting
- open-set-detection
- object-pointing
library_name: transformers
license: other
---
This model is **Rex-Omni**, a 3B-parameter Multimodal Large Language Model (MLLM) presented in the paper "[Detect Anything via Next Point Prediction](https://huggingface.co/papers/2510.12798)". It is compatible with the Hugging Face `transformers` library and is licensed under the [IDEA License 1.0](https://github.com/IDEA-Research/Rex-Omni/blob/main/LICENSE).

# Detect Anything via Next Point Prediction

> Rex-Omni is a 3B-parameter Multimodal Large Language Model (MLLM) that redefines object detection and a wide range of other visual perception tasks as a simple next-token prediction problem.

## Quick Start
### Installation
```bash
conda create -n rexomni python=3.10 -y
conda activate rexomni
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
git clone https://github.com/IDEA-Research/Rex-Omni.git
cd Rex-Omni
pip install -v -e .
```
### Using Rex-Omni for Detection
```python
from PIL import Image

from rex_omni import RexOmniWrapper, RexOmniVisualize

# 1) Initialize the model
model = RexOmniWrapper(
    model_path="IDEA-Research/Rex-Omni",
    backend="transformers",  # or "vllm"
)

# 2) Load an image
image = Image.open("your_image.jpg")

# 3) Run object detection
results = model.inference(
    images=image,
    task="detection",
    categories=["person", "car", "dog"],
)
result = results[0]

# 4) Visualize the predictions
vis = RexOmniVisualize(
    image=image,
    predictions=result["extracted_predictions"],
    font_size=20,
    draw_width=5,
    show_labels=True,
)
vis.save("visualize.jpg")
```
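After inference, each entry in `results` carries the model's predictions under the `extracted_predictions` key. A common follow-up step is filtering detections by confidence before visualization. The snippet below is a minimal sketch of such post-processing; note that the exact schema of `extracted_predictions` (here assumed to be a list of dicts with `category`, `box`, and `score` fields) is an assumption for illustration — consult the official tutorials for the real format.

```python
# Hypothetical post-processing: keep only confident detections.
# NOTE: the prediction schema used here (dicts with "category",
# "box", "score") is an assumed format, not the confirmed API.

def filter_predictions(predictions, min_score=0.5):
    """Return only predictions whose confidence meets the threshold.

    Entries without a "score" field are kept (treated as score 1.0).
    """
    return [p for p in predictions if p.get("score", 1.0) >= min_score]


# Example with mock predictions in the assumed format:
mock_predictions = [
    {"category": "person", "box": [10, 20, 110, 220], "score": 0.92},
    {"category": "dog", "box": [50, 60, 90, 120], "score": 0.31},
]
kept = filter_predictions(mock_predictions, min_score=0.5)
print([p["category"] for p in kept])  # -> ['person']
```

The filtered list can then be passed to `RexOmniVisualize` in place of the raw `result["extracted_predictions"]` to draw only the confident boxes.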
## Tutorials
We provide a series of tutorials to help you get started with Rex-Omni.
- [Detection Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/detection_example/_full_notebook.ipynb)
- [Pointing Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/pointing_example/_full_tutorial.ipynb)
- [OCR Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/ocr_example/_full_tutorial.ipynb)
- [Keypointing Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/keypointing_example/_full_tutorial.ipynb)
- [Visual Prompting Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/visual_prompting_example/_full_tutorial.ipynb)
- [Batch Inference Example](https://github.com/IDEA-Research/Rex-Omni/blob/master/tutorials/other_example/batch_inference.py)
## License
Rex-Omni is licensed under the [IDEA License 1.0](LICENSE), Copyright (c) IDEA. All Rights Reserved. This model is based on Qwen, which is licensed under the [Qwen RESEARCH LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE), Copyright (c) Alibaba Cloud. All Rights Reserved.
## Links
- [Homepage](https://rex-omni.github.io/)
- [Demo](https://huggingface.co/spaces/Mountchicken/Rex-Omni)
## Contact
For questions and feedback, please contact us at:
- Email: jiangqing@idea.edu.cn
- GitHub Issues: [IDEA-Research/Rex-Omni](https://github.com/IDEA-Research/Rex-Omni/issues)
## Citation
Rex-Omni builds on a series of prior works. If you're interested, take a look:
- [RexThinker](https://arxiv.org/abs/2506.04034)
- [RexSeek](https://arxiv.org/abs/2503.08507)
- [ChatRex](https://arxiv.org/abs/2411.18363)
- [DINO-X](https://arxiv.org/abs/2411.14347)
- [Grounding DINO 1.5](https://arxiv.org/abs/2405.10300)
- [T-Rex2](https://link.springer.com/chapter/10.1007/978-3-031-73414-4_3)
- [T-Rex](https://arxiv.org/abs/2311.13596)
```bibtex
@misc{jiang2025detectpointprediction,
      title={Detect Anything via Next Point Prediction},
      author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
      year={2025},
      eprint={2510.12798},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12798},
}
```