+---
+license: apache-2.0
+language:
+  - en
+tags:
+  - scene-graph-generation
+  - object-detection
+  - visual-relationship-detection
+  - pytorch
+  - yolo
+pipeline_tag: object-detection
+library_name: sgg-benchmark
+model-index:
+  - name: REACT++ yolo12n
+    results:
+      - task:
+          type: object-detection
+          name: Scene Graph Detection
+        dataset:
+          name: PSG
+          type: psg
+        metrics: []
+  - name: REACT++ yolo12s
+    results:
+      - task:
+          type: object-detection
+          name: Scene Graph Detection
+        dataset:
+          name: PSG
+          type: psg
+        metrics:
+          - type: mR@20
+            value: 2.91
+            name: mR@20
+          - type: R@20
+            value: 6.71
+            name: R@20
+          - type: zsR@20
+            value: 1.82
+            name: zsR@20
+          - type: mR@50
+            value: 3.93
+            name: mR@50
+          - type: R@50
+            value: 9.28
+            name: R@50
+          - type: zsR@50
+            value: 2.66
+            name: zsR@50
+          - type: mR@100
+            value: 4.62
+            name: mR@100
+          - type: R@100
+            value: 11.21
+            name: R@100
+          - type: zsR@100
+            value: 3.22
+            name: zsR@100
+          - type: mean_recall
+            value: 24.71
+            name: mean_recall
+  - name: REACT++ yolo12m
+    results:
+      - task:
+          type: object-detection
+          name: Scene Graph Detection
+        dataset:
+          name: PSG
+          type: psg
+        metrics:
+          - type: mR@20
+            value: 22.73
+            name: mR@20
+          - type: R@20
+            value: 31.11
+            name: R@20
+          - type: zsR@20
+            value: 1.81
+            name: zsR@20
+          - type: mR@50
+            value: 25.75
+            name: mR@50
+          - type: R@50
+            value: 36.29
+            name: R@50
+          - type: zsR@50
+            value: 2.8
+            name: zsR@50
+          - type: mR@100
+            value: 27.55
+            name: mR@100
+          - type: R@100
+            value: 39.44
+            name: R@100
+          - type: zsR@100
+            value: 3.77
+            name: zsR@100
+          - type: mean_recall
+            value: 26.32
+            name: mean_recall
+  - name: REACT++ yolo12l
+    results:
+      - task:
+          type: object-detection
+          name: Scene Graph Detection
+        dataset:
+          name: PSG
+          type: psg
+        metrics:
+          - type: mR@20
+            value: 23.34
+            name: mR@20
+          - type: R@20
+            value: 29.72
+            name: R@20
+          - type: zsR@20
+            value: 1.74
+            name: zsR@20
+          - type: mR@50
+            value: 25.82
+            name: mR@50
+          - type: R@50
+            value: 35.12
+            name: R@50
+          - type: zsR@50
+            value: 2.77
+            name: zsR@50
+          - type: mR@100
+            value: 27.47
+            name: mR@100
+          - type: R@100
+            value: 37.99
+            name: R@100
+          - type: zsR@100
+            value: 3.53
+            name: zsR@100
+          - type: mean_recall
+            value: 33.16
+            name: mean_recall
+  - name: REACT++ yolov8m
+    results:
+      - task:
+          type: object-detection
+          name: Scene Graph Detection
+        dataset:
+          name: PSG
+          type: psg
+        metrics:
+          - type: mR@20
+            value: 2.82
+            name: mR@20
+          - type: R@20
+            value: 10.02
+            name: R@20
+          - type: zsR@20
+            value: 1.97
+            name: zsR@20
+          - type: mR@50
+            value: 4.57
+            name: mR@50
+          - type: R@50
+            value: 13.75
+            name: R@50
+          - type: zsR@50
+            value: 2.8
+            name: zsR@50
+          - type: mR@100
+            value: 5.98
+            name: mR@100
+          - type: R@100
+            value: 16.24
+            name: R@100
+          - type: zsR@100
+            value: 3.49
+            name: zsR@100
+          - type: mean_recall
+            value: 21.42
+            name: mean_recall
+---
+# REACT++ Scene Graph Generation — PSG (yolo12n, yolo12s, yolo12m, yolo12l, yolov8m)
+This repository contains **REACT++** model checkpoints for scene graph generation (SGG)
+on the **PSG** benchmark, trained with five YOLO backbone variants.
+REACT++ is a parameter-efficient, attention-augmented relation predictor built on top of
+a YOLO backbone (YOLO12 or YOLOv8 in this repository). It uses:
+- **SwiGLU gated MLP** for all feed-forward blocks (½ the params of ReLU-MLP at equal capacity)
+- **Visual × Semantic cross-attention** — visual tokens attend to GloVe prototype embeddings
+- **Geometry RoPE** — box-position encoded as a rotary frequency bias on the Q matrix
+- **Prototype Momentum Buffer** — per-class EMA prototype bank (MoCo/DINO-style)
+- **P5 Scene Context** — AIFI-enhanced P5 tokens provide global context via cross-attention
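
The Prototype Momentum Buffer in the list above can be pictured as one exponential moving average per class. The following is a minimal sketch in plain Python, not the REACT++ implementation: the class name `PrototypeBank`, the momentum value `0.9`, and the rule of starting a prototype at the first feature seen for its class are all assumptions made for illustration.

```python
from typing import Dict, List


class PrototypeBank:
    """Hypothetical per-class EMA prototype bank (illustration only)."""

    def __init__(self, momentum: float = 0.9) -> None:
        # Assumed momentum; the value used by REACT++ is not stated in this card.
        self.momentum = momentum
        self.prototypes: Dict[int, List[float]] = {}

    def update(self, class_id: int, feature: List[float]) -> List[float]:
        """Blend a new feature into the running prototype of its class."""
        old = self.prototypes.get(class_id)
        if old is None:
            # First sample of a class: start the prototype at that feature.
            new = list(feature)
        else:
            m = self.momentum
            new = [m * o + (1.0 - m) * f for o, f in zip(old, feature)]
        self.prototypes[class_id] = new
        return new


bank = PrototypeBank(momentum=0.9)
bank.update(3, [1.0, 0.0])          # initialises the prototype of class 3
print(bank.update(3, [0.0, 1.0]))   # prototype moves slightly towards the new feature
```

A high momentum keeps each prototype stable across noisy batches, which is the usual motivation for MoCo/DINO-style momentum updates.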
+The models were trained with the
+[SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and are described in the
+[REACT paper (Neau et al., BMVC 2025)](https://arxiv.org/abs/2405.16116).
+---
+## Results — SGDet on PSG test split
+| Backbone | Params (backbone) | mR@20 | mR@50 | mR@100 | R@20 | R@50 | R@100 |
+|----------|:-----------------:|------:|------:|-------:|-----:|-----:|------:|
+| yolo12n | ~2.6M | - | - | - | - | - | - |
+| yolo12s | ~9.2M | 2.91 | 3.93 | 4.62 | 6.71 | 9.28 | 11.21 |
+| yolo12m | ~20.2M | 22.73 | 25.75 | 27.55 | 31.11 | 36.29 | 39.44 |
+| yolo12l | ~26.5M | 23.34 | 25.82 | 27.47 | 29.72 | 35.12 | 37.99 |
+| yolov8m | ~25.9M | 2.82 | 4.57 | 5.98 | 10.02 | 13.75 | 16.24 |
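
For readers new to the column headers: R@K is the fraction of ground-truth triplets recovered among the K highest-scoring predictions, and mR@K averages that recall per predicate class, so rare predicates count as much as frequent ones. The sketch below illustrates the difference on a single toy image. It is deliberately simplified and is not the SGG-Benchmark evaluator: triplets are matched by exact label equality, whereas SGDet evaluation also requires the predicted boxes to overlap the ground-truth boxes, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(gt: List[Triplet], ranked_preds: List[Triplet], k: int) -> float:
    """Fraction of ground-truth triplets found in the top-k predictions."""
    top_k = set(ranked_preds[:k])
    hits = sum(1 for t in gt if t in top_k)
    return hits / len(gt)  # assumes gt is non-empty


def mean_recall_at_k(gt: List[Triplet], ranked_preds: List[Triplet], k: int) -> float:
    """Average of the per-predicate recalls over predicates present in gt."""
    top_k = set(ranked_preds[:k])
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t in gt:
        totals[t[1]] += 1
        hits[t[1]] += int(t in top_k)
    return sum(hits[p] / totals[p] for p in totals) / len(totals)


gt = [("person", "on", "bench"), ("dog", "on", "grass"),
      ("cup", "on", "table"), ("person", "holding", "cup")]
preds = [("person", "on", "bench"), ("dog", "on", "grass"),
         ("cup", "on", "table"), ("person", "on", "cup")]
print(recall_at_k(gt, preds, k=4))       # 3 of the 4 triplets are recovered
print(mean_recall_at_k(gt, preds, k=4))  # "on" is fully recovered, "holding" is missed
```

Missing the single "holding" triplet costs little in R@K but halves mR@K, which is why the mR columns in the table sit below the R columns.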
+---
+## Checkpoints
+| Variant | Sub-folder | Checkpoint file |
+|---------|------------|-----------------|
+| yolo12n | `yolo12n/` | `yolo12n/best_model_epoch_5.pth` |
+| yolo12s | `yolo12s/` | `yolo12s/best_model_epoch_6.pth` |
+| yolo12m | `yolo12m/` | `yolo12m/best_model_epoch_9.pth` |
+| yolo12l | `yolo12l/` | `yolo12l/best_model_epoch_9.pth` |
+| yolov8m | `yolov8m/` | `yolov8m/best_model_epoch_6.pth` |
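
Because the checkpoint file name differs from variant to variant, a small lookup avoids typing paths by hand. This is a sketch only: `CHECKPOINTS` and `repo_paths` are hypothetical names, and the assumption that every sub-folder also contains a `hydra_config.yaml` is inferred from the usage snippet rather than confirmed by the table.

```python
from typing import Tuple

# Checkpoint file per variant, copied from the table above.
CHECKPOINTS = {
    "yolo12n": "yolo12n/best_model_epoch_5.pth",
    "yolo12s": "yolo12s/best_model_epoch_6.pth",
    "yolo12m": "yolo12m/best_model_epoch_9.pth",
    "yolo12l": "yolo12l/best_model_epoch_9.pth",
    "yolov8m": "yolov8m/best_model_epoch_6.pth",
}


def repo_paths(variant: str) -> Tuple[str, str]:
    """Return (checkpoint path, config path) relative to the repository root."""
    if variant not in CHECKPOINTS:
        raise KeyError(f"unknown variant {variant!r}; choose from {sorted(CHECKPOINTS)}")
    # Assumption: each sub-folder ships a hydra_config.yaml next to its checkpoint.
    return CHECKPOINTS[variant], f"{variant}/hydra_config.yaml"


print(repo_paths("yolo12m"))
```

Each returned string can be passed as the `filename` argument of `hf_hub_download`.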
+---
+## Usage
+```python
+# 1. Clone the repository (shell)
+#    git clone https://github.com/Maelic/SGG-Benchmark
+# 2. Install dependencies from the repository root (shell)
+#    pip install -e .
+import subprocess
+import sys
+from huggingface_hub import hf_hub_download
+# 3. Download a checkpoint (file name as listed under Checkpoints) and its config
+ckpt_path = hf_hub_download(
+    repo_id="maelic/REACTPlusPlus_PSG",
+    filename="yolo12n/best_model_epoch_5.pth",
+    repo_type="model",
+)
+cfg_path = hf_hub_download(
+    repo_id="maelic/REACTPlusPlus_PSG",
+    filename="yolo12n/hydra_config.yaml",
+    repo_type="model",
+)
+# 4. Run evaluation from the root of the cloned SGG-Benchmark repository
+#    (the script path below is relative to it); check=True raises on failure
+subprocess.run([
+    sys.executable, "tools/relation_train_net_hydra.py",
+    "--config-path", cfg_path,
+    "--task", "sgdet",
+    "--eval-only",
+    "--checkpoint", ckpt_path,
+], check=True)
+```
+---
+## Citation
+```bibtex
+@inproceedings{neau2025react,
+  title   = {REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation},
+  author  = {Neau, Maëlic and others},
+  booktitle = {BMVC},
+  year    = {2025},
+  url     = {https://arxiv.org/abs/2405.16116},
+}
+```