Critical Recall Collapse & Query Saturation in High-Density Crowds

#3
by Repaltoofficial - opened

I have been stress-testing the RF-DETR model on High-Density Crowd Scenarios to evaluate its robustness against occlusion and object saturation.

I observed a catastrophic drop in recall (>80% miss rate) when the model is applied to crowded street scenes (see attached logs).
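For anyone who wants to reproduce the number, here is a minimal sketch of how a miss rate like this can be computed. It assumes boxes are (x1, y1, x2, y2) tuples, predictions are already sorted by descending confidence, and a match requires IoU >= 0.5. The function names and the threshold are illustrative assumptions, not part of the RF-DETR API.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def miss_rate(gt_boxes, pred_boxes, iou_thr=0.5):
    """Fraction of ground-truth boxes left without a matching prediction.

    Greedy one-to-one matching: each prediction (assumed sorted by
    descending confidence) claims the unmatched ground-truth box with
    the highest IoU, provided that IoU reaches iou_thr.
    """
    if not gt_boxes:
        return 0.0
    matched = set()
    for p in pred_boxes:
        best_j, best_iou = None, iou_thr
        for j, g in enumerate(gt_boxes):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
    return 1.0 - len(matched) / len(gt_boxes)
```

Recall is simply 1 minus this value, so a miss rate above 0.8 corresponds to recall below 20%.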

Specific Failure Modes Observed:

Query Saturation / Attention Collapse: The model correctly identifies the first ~10-15 subjects in the immediate foreground but completely ignores the hundreds of subjects in the background. It appears the Deformable Attention mechanism fails to sample key points on smaller, densely packed objects, treating the crowd texture as background noise.

Instance Merging: In cases of high overlap, the model fails to separate individuals, occasionally generating a single "Mega-Box" around clusters of 3-4 distinct people.
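Both failure modes above can be quantified with a couple of small checks. The sketch below is only illustrative: it assumes (x1, y1, x2, y2) boxes and a per-ground-truth list of match flags from whatever matching step is used. The helper names, the 0.9 coverage cutoff, and the 3-person minimum are assumptions chosen for this example; the 32 x 32 pixel area cutoff follows the COCO convention for "small" objects.

```python
def recall_by_area(gt_boxes, matched, small_max_area=32 * 32):
    """Recall split by ground-truth box area (small vs. large).

    matched[i] is True if gt_boxes[i] was detected. A bucket with no
    boxes reports None instead of a recall value.
    """
    buckets = {"small": [0, 0], "large": [0, 0]}  # [hits, total]
    for box, hit in zip(gt_boxes, matched):
        area = (box[2] - box[0]) * (box[3] - box[1])
        key = "small" if area < small_max_area else "large"
        buckets[key][0] += int(hit)
        buckets[key][1] += 1
    return {k: (h / n if n else None) for k, (h, n) in buckets.items()}


def coverage(gt, pred):
    """Fraction of the ground-truth box area that lies inside pred."""
    iw = max(0.0, min(gt[2], pred[2]) - max(gt[0], pred[0]))
    ih = max(0.0, min(gt[3], pred[3]) - max(gt[1], pred[1]))
    area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return (iw * ih) / area if area > 0 else 0.0


def find_merged_boxes(gt_boxes, pred_boxes, min_people=3, min_cov=0.9):
    """Flag predictions that swallow min_people or more ground-truth boxes.

    Returns (prediction index, [ground-truth indices]) pairs.
    """
    merged = []
    for i, p in enumerate(pred_boxes):
        inside = [j for j, g in enumerate(gt_boxes) if coverage(g, p) >= min_cov]
        if len(inside) >= min_people:
            merged.append((i, inside))
    return merged
```

A large gap between the "large" and "small" recall values would support the foreground-only pattern, and each entry returned by `find_merged_boxes` is a candidate "Mega-Box". The second check is a heuristic: a correct foreground box can legitimately contain small background people, so flagged predictions still need a visual check.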

Conclusion: The current fine-tuning seems heavily biased toward Sparse/Low-Occlusion data (like COCO). As a result, the model lacks the "Instance Separation" logic required for real-world surveillance or counting tasks.

My team at Repalto specializes in Dense-Crowd Annotation and Hard-Occlusion mining. We can construct a specific "High-Density" evaluation set to help you tune the query distribution for these edge cases. Happy to share a sample batch if you are interested.

10.01.2026_07.29.38_REC
10.01.2026_07.26.36_REC
