---
license: apache-2.0
language:
- en
tags:
- scene-graph-generation
- object-detection
- visual-relationship-detection
- pytorch
- yolo
pipeline_tag: object-detection
library_name: sgg-benchmark
model-index:
- name: REACT++ yolo12m
  results:
  - task:
      type: object-detection
      name: Scene Graph Detection
    dataset:
      name: VG150
      type: vg150
    metrics:
    - type: mR@20
      value: 10.52
      name: mR@20
    - type: R@20
      value: 18.32
      name: R@20
    - type: F1@20
      value: 13.36
      name: F1@20
    - type: mR@50
      value: 13.22
      name: mR@50
    - type: R@50
      value: 22.54
      name: R@50
    - type: F1@50
      value: 16.67
      name: F1@50
    - type: mR@100
      value: 13.96
      name: mR@100
    - type: R@100
      value: 23.77
      name: R@100
    - type: F1@100
      value: 17.59
      name: F1@100
    - type: e2e_latency_ms
      value: 19.4
      name: e2e_latency_ms
- name: REACT++ yolo26m
  results:
  - task:
      type: object-detection
      name: Scene Graph Detection
    dataset:
      name: VG150
      type: vg150
    metrics:
    - type: mR@20
      value: 10.32
      name: mR@20
    - type: R@20
      value: 20.0
      name: R@20
    - type: mR@50
      value: 13.94
      name: mR@50
    - type: R@50
      value: 26.9
      name: R@50
    - type: mR@100
      value: 16.48
      name: mR@100
    - type: R@100
      value: 32.08
      name: R@100
    - type: mean_recall
      value: 21.87
      name: mean_recall
- name: REACT++ yolov8m
  results:
  - task:
      type: object-detection
      name: Scene Graph Detection
    dataset:
      name: VG150
      type: vg150
    metrics:
    - type: mR@20
      value: 12.05
      name: mR@20
    - type: R@20
      value: 22.78
      name: R@20
    - type: F1@20
      value: 15.76
      name: F1@20
    - type: mR@50
      value: 15.42
      name: mR@50
    - type: R@50
      value: 28.73
      name: R@50
    - type: F1@50
      value: 20.07
      name: F1@50
    - type: mR@100
      value: 16.51
      name: mR@100
    - type: R@100
      value: 30.84
      name: R@100
    - type: F1@100
      value: 21.51
      name: F1@100
    - type: e2e_latency_ms
      value: 17.8
      name: e2e_latency_ms
---

# REACT++ Scene Graph Generation: VG150 (yolo12m, yolo26m, yolov8m)

This repository contains **REACT++** model checkpoints for scene graph generation (SGG)
on the **VG150** benchmark, one for each of three YOLO backbone variants (yolo12m, yolo26m, yolov8m).

REACT++ is a parameter-efficient, attention-augmented relation predictor built on top of
a YOLO backbone. It uses:

- **DAMP** (Detection-Anchored Multi-Scale Pooling), a simple new pooling algorithm for one-stage object detectors such as YOLO
- **SwiGLU gated MLP** for all feed-forward blocks (½ the params of ReLU-MLP at equal capacity)
- **Visual × Semantic cross-attention**: visual tokens attend to GloVe prototype embeddings
- **Geometry RoPE**: box position encoded as a rotary frequency bias on the Q matrix
- **Prototype Momentum Buffer**: per-class EMA prototype bank
- **P5 Scene Context**: AIFI-enhanced P5 tokens provide global context via cross-attention

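To make the "per-class EMA prototype bank" idea concrete, here is a minimal, framework-free sketch of an exponential-moving-average update keyed by class id. It is a generic illustration, not the REACT++ implementation: the class name `PrototypeBank`, the momentum value `0.9`, and the use of plain Python lists as feature vectors are all assumptions made for the example.

```python
from typing import Dict, List


class PrototypeBank:
    """Hypothetical per-class EMA prototype store (illustration only)."""

    def __init__(self, momentum: float = 0.9) -> None:
        # momentum = 0.9 is an assumed value, not taken from the paper
        if not 0.0 <= momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
        self.momentum = momentum
        self.prototypes: Dict[int, List[float]] = {}

    def update(self, class_id: int, feature: List[float]) -> List[float]:
        """Blend a new feature vector into the running prototype of its class."""
        old = self.prototypes.get(class_id)
        if old is None:
            # First sample of this class initialises the prototype
            new = list(feature)
        else:
            if len(old) != len(feature):
                raise ValueError("feature dimension does not match the prototype")
            m = self.momentum
            # EMA rule: prototype <- m * prototype + (1 - m) * feature
            new = [m * o + (1.0 - m) * f for o, f in zip(old, feature)]
        self.prototypes[class_id] = new
        return new


bank = PrototypeBank(momentum=0.9)
bank.update(3, [1.0, 0.0])
bank.update(3, [0.0, 1.0])  # prototype for class 3 is now approximately [0.9, 0.1]
```

A higher momentum makes each prototype change more slowly, which damps noise from individual samples at the cost of slower adaptation.
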
The models were trained with the
[SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and are described in the
[REACT++ paper (Neau et al., 2026)](https://arxiv.org/abs/2603.06386).

---

## Results: SGDet on VG150 test split (ONNX, CUDA)

> Metrics from end-to-end ONNX evaluation (`tools/eval_onnx_psg.py`). E2E latency = image load + pre-processing + ONNX forward pass. A dash (-) marks a metric that is not reported for that backbone.

| Backbone | Params | R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 | F1@20 | F1@50 | F1@100 | E2E Lat. (ms) |
|----------|:------:|-----:|-----:|------:|------:|------:|-------:|------:|------:|-------:|--------------:|
| yolo12m | ~20.2M | 18.32 | 22.54 | 23.77 | 10.52 | 13.22 | 13.96 | 13.36 | 16.67 | 17.59 | 19.4 |
| yolo26m | ~20.2M | 20.0 | 26.9 | 32.08 | 10.32 | 13.94 | 16.48 | - | - | - | - |
| yolov8m | ~25.9M | 22.78 | 28.73 | 30.84 | 12.05 | 15.42 | 16.51 | 15.76 | 20.07 | 21.51 | 17.8 |

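This card does not state how F1@K is computed. The reported values match the harmonic mean of R@K and mR@K to within rounding, so the sketch below assumes that definition. `f1_at_k` is a hypothetical helper name, not a function from SGG-Benchmark.

```python
def f1_at_k(recall: float, mean_recall: float) -> float:
    """Harmonic mean of R@K and mR@K (both given in percent).

    Assumed definition: it reproduces the F1 columns of the results table
    to within rounding, but it is not confirmed by the card itself.
    """
    total = recall + mean_recall
    if total == 0:
        return 0.0
    return 2.0 * recall * mean_recall / total


# yolov8m row of the results table: (R@K, mR@K) for K = 20, 50, 100
yolov8m = {20: (22.78, 12.05), 50: (28.73, 15.42), 100: (30.84, 16.51)}
for k, (r, mr) in yolov8m.items():
    print(f"F1@{k} = {f1_at_k(r, mr):.2f}")
```

Because the inputs here are already rounded to two decimals, a recomputed value can differ from the table in the last digit.
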
---

## Checkpoints

| Variant | Sub-folder | Checkpoint files |
|---------|------------|-----------------|
| yolo12m | `yolo12m/` | `yolo12m/model.onnx` (ONNX) · `yolo12m/best_model_epoch_19.pth` (PyTorch) |
| yolo26m | `yolo26m/` | `yolo26m/react_pp_yolo26m.onnx` (ONNX) · `yolo26m/best_model_epoch_18.pth` (PyTorch) |
| yolov8m | `yolov8m/` | `yolov8m/model.onnx` (ONNX) · `yolov8m/best_model_epoch_6.pth` (PyTorch) |

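Since file names differ between variants, a small lookup keyed on the table above can avoid typos when downloading. This is a sketch: `CHECKPOINT_FILES`, `checkpoint_filename`, and `download_checkpoint` are hypothetical names introduced here, the paths are copied from the table, and the repository id is the one used in the Usage section.

```python
# File names copied from the Checkpoints table; helper names are hypothetical.
CHECKPOINT_FILES = {
    "yolo12m": {"onnx": "yolo12m/model.onnx",
                "pth": "yolo12m/best_model_epoch_19.pth"},
    "yolo26m": {"onnx": "yolo26m/react_pp_yolo26m.onnx",
                "pth": "yolo26m/best_model_epoch_18.pth"},
    "yolov8m": {"onnx": "yolov8m/model.onnx",
                "pth": "yolov8m/best_model_epoch_6.pth"},
}


def checkpoint_filename(variant: str, fmt: str = "onnx") -> str:
    """Return the in-repo path of a checkpoint, or raise a readable error."""
    try:
        return CHECKPOINT_FILES[variant][fmt]
    except KeyError:
        raise ValueError(f"unknown variant/format: {variant!r}/{fmt!r}") from None


def download_checkpoint(variant: str, fmt: str = "onnx",
                        repo_id: str = "maelic/REACT-pp-VG150") -> str:
    """Download one checkpoint from the Hub and return its local path."""
    # Imported lazily so the lookup helpers work without huggingface_hub
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=checkpoint_filename(variant, fmt),
        repo_type="model",
    )
```

For example, `download_checkpoint("yolov8m", "pth")` would fetch the PyTorch weights for the yolov8m variant.
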
---

## Usage

### ONNX (recommended: no Python dependencies beyond onnxruntime for inference)

```python
from huggingface_hub import hf_hub_download

onnx_path = hf_hub_download(
    repo_id="maelic/REACT-pp-VG150",
    filename="yolo12m/model.onnx",  # file name as listed in the Checkpoints table
    repo_type="model",
)
# Run with tools/eval_onnx_psg.py or load directly via onnxruntime
```

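Loading the downloaded file directly with onnxruntime might look like the sketch below. This card does not document the model's input and output tensor names or shapes, so the example inspects them at runtime instead of assuming them. `load_session` and `describe_io` are hypothetical helper names, and the provider preference (CUDA first, then CPU) is an assumption.

```python
def load_session(onnx_path: str):
    """Create an onnxruntime session, preferring CUDA when it is available."""
    import onnxruntime as ort  # imported lazily; requires `pip install onnxruntime`

    available = ort.get_available_providers()
    preferred = ("CUDAExecutionProvider", "CPUExecutionProvider")
    providers = [p for p in preferred if p in available]
    return ort.InferenceSession(onnx_path, providers=providers)


def describe_io(session) -> dict:
    """List (name, shape, dtype) for every model input and output."""
    return {
        "inputs": [(i.name, i.shape, i.type) for i in session.get_inputs()],
        "outputs": [(o.name, o.shape, o.type) for o in session.get_outputs()],
    }


# Usage, with `onnx_path` from hf_hub_download:
#   session = load_session(onnx_path)
#   print(describe_io(session))
```

The printed names and shapes show what `session.run` expects. The pre-processing itself (resize, normalisation, channel order) should follow `tools/eval_onnx_psg.py` rather than being guessed.
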
### PyTorch

```python
# 1. Clone the repository (shell)
#    git clone https://github.com/Maelic/SGG-Benchmark
#    cd SGG-Benchmark

# 2. Install dependencies (shell, from the repository root)
#    pip install -e .

# 3. Download checkpoint + config
import subprocess
import sys

from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="maelic/REACT-pp-VG150",
    filename="yolo12m/best_model_epoch_19.pth",  # as listed in the Checkpoints table
    repo_type="model",
)
cfg_path = hf_hub_download(
    repo_id="maelic/REACT-pp-VG150",
    filename="yolo12m/config.yml",
    repo_type="model",
)

# 4. Run evaluation (execute this from the root of the SGG-Benchmark clone)
subprocess.run(
    [
        sys.executable, "tools/relation_eval_hydra.py",  # same interpreter as this script
        "--config-path", str(cfg_path),
        "--task", "sgdet",
        "--eval-only",
        "--checkpoint", str(ckpt_path),
    ],
    check=True,  # raise CalledProcessError if the evaluation script fails
)
```

---

## Citation

```bibtex
@article{neau2026reactpp,
  title  = {REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation},
  author = {Neau, Maëlic and Falomir, Zoe},
  year   = {2026},
  url    = {https://arxiv.org/abs/2603.06386},
}
```