# 📘 MMScan Hierarchical Visual Grounding Challenge
## 🔍 Challenge Introduction

**Hierarchical Visual Grounding (HVG) Task in the MMScan Benchmark**:
This task evaluates a model's ability to perform visual grounding at multiple levels of granularity, from region-level to object-level, and from single-target to inter-target localization. Given a natural language description, models are expected to accurately locate the corresponding object(s) within a 3D scene, reflecting comprehensive spatial and attribute-level understanding.
- **Overview**: You can refer to this [website](https://neurips.cc/virtual/2024/poster/97429) for an overview and our [paper](https://arxiv.org/abs/2406.09401) for more details.
- **Challenge Data and Codebase**: The challenge dataset includes:
  - **Training set**: Language prompts + ground-truth bounding boxes
  - **Validation set**: Language prompts + ground-truth bounding boxes
  - **Test set**: Language prompts only (no ground truth provided)

  Follow the [instructions](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan) to get familiar with the data organization and MMScan APIs. All the code for MMScan is available [here](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan).

- **Evaluation Metrics**: For the visual grounding task, our evaluator computes multiple metrics, including AP@0.25 (Average Precision), gTop-1@0.25, and gTop-3@0.25. The gTop-k metric generalizes the traditional Top-k metric, offering greater flexibility and interpretability for multi-target grounding.
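All of these metrics threshold matches on 3D IoU (here at 0.25). As an illustration only — the official evaluator handles the full oriented boxes, whereas this sketch ignores orientation and treats boxes as axis-aligned `(cx, cy, cz, dx, dy, dz)` — the IoU between two boxes can be computed as:

```python
def aabb_iou(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each (cx, cy, cz, dx, dy, dz).

    Simplified sketch: real MMScan boxes carry orientation as well,
    which the official evaluator accounts for.
    """
    inter = 1.0
    vol_a = 1.0
    vol_b = 1.0
    for i in range(3):  # accumulate overlap and volumes per axis
        a_lo, a_hi = box_a[i] - box_a[i + 3] / 2, box_a[i] + box_a[i + 3] / 2
        b_lo, b_hi = box_b[i] - box_b[i + 3] / 2, box_b[i] + box_b[i + 3] / 2
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        vol_a *= box_a[i + 3]
        vol_b *= box_b[i + 3]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction then counts as a match for AP@0.25 or gTop-k@0.25 when this IoU exceeds 0.25.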
- **Contact**: For any questions related to the HVG challenge, feel free to reach out to [**Jingli Lin**](linjingli166@gmail).

---

## 📝 How to Participate

To register for the challenge, please contact us via [**Google Mail**](linjingli166@gmail) and include the following information:

- A **self-chosen username** (this will be shown on the leaderboard)
- A **login password**
- Your **team or institution name**
- A brief statement of your **motivation for participating**

> 📌 **Submission limit**: Each user is allowed a **maximum of 5 submissions per day**.

---

## 🚀 Submission Guidelines

- Your submission should be a **dictionary**, where each key is a **sample ID** from the test split.
- For each sample, provide:
  - `pred_bboxes`: a list of predicted bounding boxes
  - `scores`: the corresponding confidence scores
- An example of the expected format:
```python
{
    # key: a sample ID from the test split
    'VG_Inter_Space_OO__1mp3d_0009_region0__55': {
        'pred_bboxes': [[...], ...],  # list of up to 100 boxes, 9 values each
        'scores': [...],              # list of up to 100 confidence scores
    },
    ...
}
```
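Before submitting, it can help to verify that your dictionary follows this schema. The helper below is hypothetical (not part of the MMScan API); it assumes 9-value boxes and the per-sample box limit stated in these guidelines:

```python
def check_submission(sub):
    """Hypothetical sanity check for a submission dict: each sample must
    have 9-value boxes, a matching number of scores, and at most 100 boxes."""
    for sample_id, pred in sub.items():
        boxes, scores = pred['pred_bboxes'], pred['scores']
        assert len(boxes) == len(scores), f'{sample_id}: box/score count mismatch'
        assert len(boxes) <= 100, f'{sample_id}: more than 100 boxes'
        assert all(len(b) == 9 for b in boxes), f'{sample_id}: boxes need 9 values'
```

Running this over your final dictionary catches malformed entries before upload.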

> 💡 **Note**: The bounding boxes do **not** need to be sorted by confidence.

- ⛔ **Limit the number of predicted boxes to 100 per sample.** If your submission contains more than 100 boxes for a single sample, only the top 100 will be considered.

- ⏱️ **Efficiency Tip**: Round all floating-point numbers in your submission to **two decimal places** to reduce file size and transmission overhead.
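The two tips above can be combined into a single post-processing step. The function below is a sketch (the name `postprocess` and the JSON serialization are our own choices, not a prescribed API): it keeps the highest-scoring boxes per sample and rounds every float before saving.

```python
import json

def postprocess(sub, top_k=100, ndigits=2):
    """Keep the top_k highest-scoring boxes per sample and round all floats."""
    out = {}
    for sample_id, pred in sub.items():
        # pair each score with its box, highest-scoring first
        pairs = sorted(zip(pred['scores'], pred['pred_bboxes']),
                       key=lambda p: p[0], reverse=True)[:top_k]
        out[sample_id] = {
            'pred_bboxes': [[round(v, ndigits) for v in box] for _, box in pairs],
            'scores': [round(s, ndigits) for s, _ in pairs],
        }
    return out
```

For example, `json.dump(postprocess(raw_predictions), open('submission.json', 'w'))` would write a compact, limit-respecting file, assuming `raw_predictions` is your model's output in the dictionary format above.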