---
license: other
license_name: insightface-non-commercial
license_link: https://github.com/deepinsight/insightface#license
tags:
- face-detection
- face-recognition
- scrfd
- arcface
- onnx
- batch-inference
- tensorrt
library_name: onnx
pipeline_tag: image-classification
---
# InsightFace Batch-Optimized Models (Max Batch 64)
Re-exported InsightFace models with **proper dynamic batch support** and **no cross-frame contamination**.
## ⚠️ Version Difference
| Repository | Max Batch | Best For |
|------------|-----------|----------|
| [alonsorobots/scrfd_320_batched](https://huggingface.co/alonsorobots/scrfd_320_batched) | 1-32 | Standard use, tested extensively |
| **This repo** | **1-64** | Experimentation with larger batches |
**Recommendation:** Use max batch=32 for optimal performance. Batch=64 provides similar throughput but uses more VRAM.
## Why These Models?
The original InsightFace ONNX models have issues with batch inference:
- `buffalo_l` detection model: hardcoded batch=1
- `buffalo_l_batch` detection model: **broken** - has cross-frame contamination due to reshape operations that flatten the batch dimension
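These re-exports set `dynamic_axes` correctly at export time, so the batch dimension stays dynamic throughout the ONNX graph and each frame in a batch is processed independently (**true batch inference**).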
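To illustrate why a batch-flattening reshape is a problem, here is a minimal NumPy sketch. The tensor shape (two anchors per cell on a 40×40 grid) and the variable names are illustrative assumptions, not taken from the actual `buffalo_l_batch` graph.

```python
import numpy as np

# Hypothetical detection-head output: N frames, 2 anchors per cell, 40x40 grid.
N, C, H, W = 4, 2, 40, 40
head = np.zeros((N, C, H, W), dtype=np.float32)
head[2, 0, 5, 5] = 0.99  # one confident detection, in frame 2 only

# Batch-flattening reshape: harmless for N=1, but merges all frames into one axis.
flat = head.transpose(0, 2, 3, 1).reshape(-1, 1)

# Batch-preserving reshape: axis 0 remains the frame index.
kept = head.transpose(0, 2, 3, 1).reshape(N, -1, 1)

anchors_per_frame = H * W * C
print(flat.shape)  # (12800, 1)
print(kept.shape)  # (4, 3200, 1)

# With the batch axis kept, frame 0 is clean.
print(float(kept[0].max()))  # 0.0

# With the flattened output, the hit from frame 2 sits in the same axis as
# frame 0's anchors; it can only be attributed if the consumer re-derives
# the frame index itself.
print(int(flat.argmax()) // anchors_per_frame)  # 2
```

Post-processing written for a single frame treats every row of `flat` as belonging to that frame, which is how detections from one frame can leak into another.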
## Models
| Model | Task | Input Shape | Output | Batch | Speedup |
|-------|------|-------------|--------|-------|---------|
| `scrfd_10g_320_batch64.onnx` | Face Detection | `[N, 3, 320, 320]` | boxes, landmarks | 1-64 | **6×** |
| `arcface_w600k_r50_batch64.onnx` | Face Embedding | `[N, 3, 112, 112]` | 512-dim vectors | 1-64 | **10×** |
## Performance (TensorRT FP16, RTX 5090)
### Batch Size Comparison (Full Video, 12,263 frames)
| Batch Size | FPS | Relative |
|------------|-----|----------|
| 16 | 2,007 | 1.00× |
| **32** | **2,097** | **1.05×** ✅ Optimal |
| 64 | 2,034 | 1.01× |
**Key Finding:** Batch=32 gives the highest throughput in this benchmark. Batch=64 is no faster (2,034 vs. 2,097 FPS), which is consistent with GPU memory bandwidth saturation.
### With Pipelined Preprocessing (4 workers)
| Configuration | FPS | Speedup |
|---------------|-----|---------|
| Sequential batch=16 | 1,211 | baseline |
| **Pipelined batch=32** | **2,097** | **1.73×** |
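
The pipelined configuration above overlaps CPU preprocessing with GPU inference. The benchmark code is not part of this card, so the following is only a minimal sketch of the idea using a thread pool. `preprocess_batch`, `run_pipelined`, and the `(x - 127.5) / 128` normalization are assumptions for illustration; `infer` is any callable that accepts a float32 NCHW batch.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def preprocess_batch(frames):
    """Hypothetical CPU step: list of uint8 HWC frames -> float32 NCHW batch."""
    batch = np.stack(frames).astype(np.float32)
    batch = (batch - 127.5) / 128.0  # assumed normalization
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))


def run_pipelined(frames, infer, batch_size=32, workers=4):
    """Preprocess chunks on worker threads while `infer` consumes earlier ones."""
    chunks = [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every chunk up front and yields results in order,
        # so later batches are prepared while inference runs on earlier ones.
        for batch in pool.map(preprocess_batch, chunks):
            results.append(infer(batch))
    return results


# Stand-in for a real session call; it just reports the batch size it received.
frames = [np.zeros((320, 320, 3), dtype=np.uint8)] * 70
print(run_pipelined(frames, infer=lambda b: b.shape[0]))  # [32, 32, 6]
```

With a real session, `infer` could be `lambda b: sess.run(None, {"input.1": b})`. Because all chunks are submitted at once, a long video would need a bounded queue instead to keep memory use in check.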
## Usage
```python
import numpy as np
import onnxruntime as ort
# Load model
sess = ort.InferenceSession(
    "scrfd_10g_320_batch64.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
# Batch inference (any size from 1-64)
batch = np.random.randn(32, 3, 320, 320).astype(np.float32)
outputs = sess.run(None, {"input.1": batch})
# outputs[0:3]: scores per FPN level (strides 8, 16, 32)
# outputs[3:6]: bboxes per FPN level
# outputs[6:9]: keypoints per FPN level
```
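The raw outputs still need decoding into boxes. The sketch below decodes one FPN level for a whole batch. It assumes each output has shape `[N, num_anchors, C]`, two anchors per grid cell, and box predictions expressed as left/top/right/bottom distances in units of the stride, as in InsightFace's reference SCRFD post-processing. `anchor_centers` and `decode_level` are hypothetical helper names, and NMS is omitted.

```python
import numpy as np


def anchor_centers(input_size, stride, num_anchors=2):
    """(x, y) centers of every anchor at one FPN level, in input pixels."""
    n = input_size // stride
    ys, xs = np.mgrid[:n, :n]
    centers = np.stack([xs, ys], axis=-1).astype(np.float32).reshape(-1, 2) * stride
    # Assumed layout: anchors of the same cell are stored consecutively.
    return np.repeat(centers, num_anchors, axis=0)


def decode_level(scores, bbox_preds, stride, input_size=320, threshold=0.5):
    """Return one (boxes, scores) pair per frame; boxes are [x1, y1, x2, y2]."""
    centers = anchor_centers(input_size, stride)
    results = []
    for img_scores, img_dists in zip(scores, bbox_preds):
        flat_scores = img_scores.reshape(-1)
        keep = flat_scores >= threshold
        pts = centers[keep]
        dist = img_dists[keep] * stride  # distances are predicted in stride units
        boxes = np.concatenate([pts - dist[:, :2], pts + dist[:, 2:]], axis=1)
        results.append((boxes, flat_scores[keep]))
    return results


# Synthetic stride-8 outputs for a batch of 2 frames (40 * 40 * 2 = 3200 anchors).
scores = np.zeros((2, 3200, 1), dtype=np.float32)
bboxes = np.zeros((2, 3200, 4), dtype=np.float32)
scores[1, 86, 0] = 0.9  # anchor 86 -> grid cell (x=3, y=1) -> center (24, 8)
bboxes[1, 86] = 1.0     # one stride in every direction
per_frame = decode_level(scores, bboxes, stride=8)
print(per_frame[1][0])  # [[16.  0. 32. 16.]]
```

Because decoding runs per frame along axis 0, detections stay attached to the frame they came from.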
## TensorRT Configuration
When using TensorRT, set profile shapes to support your desired batch range:
```python
providers = [
    ("TensorrtExecutionProvider", {
        "trt_fp16_enable": True,
        "trt_engine_cache_enable": True,
        "trt_profile_min_shapes": "input.1:1x3x320x320",
        "trt_profile_opt_shapes": "input.1:32x3x320x320",  # Optimize for batch=32
        "trt_profile_max_shapes": "input.1:64x3x320x320",  # Support up to 64
    }),
    "CUDAExecutionProvider",
]
```
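Since the three profile strings differ only in the batch dimension, they can be generated rather than typed by hand. `trt_profile_options` below is a hypothetical helper, not part of ONNX Runtime; it only builds the option dictionary shown above. Reusing the input name `input.1` for the ArcFace model is an assumption and should be checked against the model's actual input name.

```python
def trt_profile_options(input_name, chw, min_batch=1, opt_batch=32, max_batch=64):
    """Build TensorRT EP options with min/opt/max profile shapes for one input."""
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")

    def shape(n):
        # Produces strings such as "input.1:32x3x320x320"
        return f"{input_name}:{n}x" + "x".join(str(d) for d in chw)

    return {
        "trt_fp16_enable": True,
        "trt_engine_cache_enable": True,
        "trt_profile_min_shapes": shape(min_batch),
        "trt_profile_opt_shapes": shape(opt_batch),
        "trt_profile_max_shapes": shape(max_batch),
    }


det_opts = trt_profile_options("input.1", (3, 320, 320))
rec_opts = trt_profile_options("input.1", (3, 112, 112))  # input name assumed

providers = [("TensorrtExecutionProvider", det_opts), "CUDAExecutionProvider"]
print(det_opts["trt_profile_opt_shapes"])  # input.1:32x3x320x320
```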
## Verified: No Batch Contamination
```python
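# The same frame processed alone vs. inside a batch should agree to within float tolerance.
# Reuses `sess` from the Usage section; the random frame stands in for a real preprocessed image.
frame = np.random.randn(3, 320, 320).astype(np.float32)
batch = np.random.randn(32, 3, 320, 320).astype(np.float32)
batch[7] = frame
single_output = sess.run(None, {"input.1": frame[np.newaxis, ...]})
batch_output = sess.run(None, {"input.1": batch})
# Compare the first output for the lone frame against batch slot 7
max_diff = np.max(np.abs(single_output[0][0] - batch_output[0][7]))
# max_diff < 1e-5 ✓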
```
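The same check can be extended to every output rather than only the first. The sketch below wraps it in a hypothetical helper, `max_batch_diff`, that accepts any callable returning a list of arrays with the batch on axis 0. Two stand-in "models" (not the real networks) show what a clean and a contaminated result look like.

```python
import numpy as np


def max_batch_diff(run, frame, batch_size=32, index=7, seed=0):
    """Largest abs difference, over all outputs, between solo and in-batch runs."""
    rng = np.random.default_rng(seed)
    batch = rng.standard_normal((batch_size, *frame.shape)).astype(np.float32)
    batch[index] = frame
    single = run(frame[np.newaxis, ...])
    batched = run(batch)
    return max(float(np.max(np.abs(s[0] - b[index]))) for s, b in zip(single, batched))


def clean_model(x):
    # Each frame's output depends only on that frame.
    return [x.mean(axis=(2, 3))]


def leaky_model(x):
    # Subtracting a batch-wide statistic makes frames influence each other.
    return [x.mean(axis=(2, 3)) - x.mean()]


frame = np.ones((3, 8, 8), dtype=np.float32)
print(max_batch_diff(clean_model, frame) < 1e-5)  # True
print(max_batch_diff(leaky_model, frame) > 0.1)   # True
```

With the real model, `run` could be `lambda x: sess.run(None, {"input.1": x})` and `frame` a preprocessed `3x320x320` image.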
## Re-export Process
These models were re-exported from InsightFace's PyTorch source using MMDetection with proper `dynamic_axes`:
```python
dynamic_axes = {
    "input.1": {0: "batch"},
    "score_8": {0: "batch"},
    "score_16": {0: "batch"},
    # ... all outputs
}
```
## License
**Non-commercial research purposes only** - per [InsightFace license](https://github.com/deepinsight/insightface#license).
For commercial licensing, contact: `recognition-oss-pack@insightface.ai`
## Credits
- Original models: [InsightFace](https://github.com/deepinsight/insightface) by Jia Guo et al.
- SCRFD paper: [Sample and Computation Redistribution for Efficient Face Detection](https://arxiv.org/abs/2105.04714)
- ArcFace paper: [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)