---
license: mit
---
# Micron-Flow: Real-Time Optical Flow Model
## Model Overview
**Micron-Flow** is a lightweight optical flow model built for real-time inference: it runs at **80+ FPS** on an RTX 4090 with only **522K parameters**. It is trained by knowledge distillation from RAFT-Large, learning to reproduce the larger model's flow predictions so that it keeps **high accuracy** at a fraction of the size.
## Model Details
- **Architecture**: Modified U-Net with MobileNetV2-based Siamese encoder, residual blocks, and a flow refinement module.
- **Parameters**: 522K
- **Input Resolution**: (152, 240)
- **Training Dataset**: 200K video frame pairs taken from the **Moments of Time** dataset, with target flow for each pair generated by RAFT-Large.
- **Distillation Approach**:
  - Mean squared error (MSE) loss in tanh-space
  - Edge-aware smoothness loss
- **Optimization**: Trained with a **cosine annealing** learning-rate scheduler and progressive encoder unfreezing.
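
The two distillation terms listed above can be sketched as follows. This is a minimal NumPy illustration of one plausible reading of "tanh-space", not the repository's training code: the function names are hypothetical, and `flow_scale`, `alpha`, and `smooth_weight` are assumed values.

```python
import numpy as np

def tanh_space_mse(pred_flow, teacher_flow, flow_scale=20.0):
    """MSE between student and teacher flow after squashing both with tanh.

    flow_scale is an assumed normalisation constant in pixels; tanh bounds
    the values to (-1, 1) so very large motions do not dominate the loss.
    """
    pred = np.tanh(pred_flow / flow_scale)
    target = np.tanh(teacher_flow / flow_scale)
    return float(np.mean((pred - target) ** 2))

def edge_aware_smoothness(flow, image, alpha=10.0):
    """Penalise flow gradients, down-weighted where the image has edges.

    flow:  (2, H, W) array of (u, v) displacements.
    image: (C, H, W) array with values in [0, 1].
    """
    # Forward differences along x (width) and y (height).
    flow_dx = np.abs(flow[:, :, 1:] - flow[:, :, :-1])
    flow_dy = np.abs(flow[:, 1:, :] - flow[:, :-1, :])
    img_dx = np.mean(np.abs(image[:, :, 1:] - image[:, :, :-1]), axis=0, keepdims=True)
    img_dy = np.mean(np.abs(image[:, 1:, :] - image[:, :-1, :]), axis=0, keepdims=True)
    # Strong image gradients give weights near 0, so motion boundaries that
    # coincide with image edges are not smoothed away.
    weight_x = np.exp(-alpha * img_dx)
    weight_y = np.exp(-alpha * img_dy)
    return float(np.mean(flow_dx * weight_x) + np.mean(flow_dy * weight_y))

def distillation_loss(pred_flow, teacher_flow, image, smooth_weight=0.1):
    """Weighted sum of the two terms; smooth_weight is an assumed value."""
    mse = tanh_space_mse(pred_flow, teacher_flow)
    smooth = edge_aware_smoothness(pred_flow, image)
    return mse + smooth_weight * smooth
```

In a real training loop the same expressions would be written with differentiable tensor operations so that gradients reach the student model.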
## Performance
| Device | Inference Time | FPS |
|-------------|---------------|------|
| **RTX 4090** | 0.012 sec | 83 |
| **GTX 1650** | 0.013 sec | 76 |
| **CPU-Only** | 0.07 sec | 14 |
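
The FPS column is the reciprocal of the inference time (for example, 0.012 s per frame pair is about 83 FPS). A minimal sketch of how such numbers can be measured is given here; `benchmark` is a hypothetical helper, not part of the repository. For GPU inference, the timed callable should synchronize the device before returning (e.g. with `torch.cuda.synchronize()`), otherwise asynchronous kernel launches make the latency look smaller than it is.

```python
import time

def latency_to_fps(seconds_per_pair):
    """Convert mean latency per frame pair (seconds) to frames per second."""
    return 1.0 / seconds_per_pair

def benchmark(run_once, warmup=10, iters=100):
    """Return (mean latency in seconds, FPS) for a zero-argument callable."""
    # Warm-up runs are excluded from timing (caches, lazy initialisation).
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    mean_latency = (time.perf_counter() - start) / iters
    return mean_latency, latency_to_fps(mean_latency)
```

With a loaded model, this could be called as `benchmark(lambda: model(frame1, frame2))`.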
## Key Features
- **Real-time processing**: 80+ FPS on RTX 4090
- **Small model size**: Only 2.1 MB on disk
- **Efficient architecture**:
  - Depthwise convolutions for reduced parameters
  - Inverted residual blocks for better efficiency
  - Flow refiner for enhanced motion consistency
- **Optimized training pipeline**: GPU caching and JPEG decoding acceleration
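
To illustrate why depthwise convolutions reduce the parameter count, and how 522K parameters relate to the 2.1 MB file size, the arithmetic can be worked through as below. The layer shape (3×3 kernel, 32 to 64 channels) is an arbitrary example rather than a layer taken from the model, and the size estimate assumes the weights are stored as 32-bit floats.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

standard = conv_params(32, 64, 3)                  # 18432
separable = depthwise_separable_params(32, 64, 3)  # 2336
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")

# Assuming float32 storage (4 bytes per parameter):
size_mb = 522_000 * 4 / 1e6
print(f"estimated weight size: {size_mb:.1f} MB")  # 2.1 MB
```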
## Limitations
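- Trained on flow predicted by RAFT-Large rather than on measured ground truth, so the model may inherit the teacher's errors and biases.
- Input resolution is fixed to (152, 240); other input sizes require resizing the frames before inference and resizing the predicted flow afterwards.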
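
When the model is run at (152, 240) on frames of another size, the predicted flow has to be resized back and its vectors rescaled, because displacements are measured in pixels of the low-resolution grid. A minimal NumPy sketch follows. It assumes the flow is laid out as `(2, H, W)` with channel 0 horizontal and channel 1 vertical (the model's actual output layout is not stated here), and it uses nearest-neighbour resampling for brevity; `upscale_flow` is a hypothetical helper.

```python
import numpy as np

def upscale_flow(flow, out_h, out_w):
    """Resize a (2, H, W) flow field to (2, out_h, out_w) and rescale its vectors."""
    _, in_h, in_w = flow.shape
    # Nearest-neighbour source index for every output row and column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    resized = flow[:, rows[:, None], cols[None, :]].astype(np.float64)
    # Displacements are in pixels, so they grow with the image.
    resized[0] *= out_w / in_w  # horizontal component
    resized[1] *= out_h / in_h  # vertical component
    return resized
```

In practice bilinear interpolation usually gives smoother results than nearest-neighbour; the vector rescaling step is the same either way.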
## Model Usage
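```python
import torch
from torchvision.transforms.functional import to_tensor

# MicronFlow is the model class from the GitHub repository (see Links).
model = MicronFlow()
# Load the weights from the .pth file. The path is a placeholder, and this
# assumes the file stores a state_dict.
model.load_state_dict(torch.load("path/to/weights.pth", map_location="cpu"))
model.eval()
# image1, image2: consecutive frames, already resized to (152, 240).
frame1 = to_tensor(image1).unsqueeze(0)
frame2 = to_tensor(image2).unsqueeze(0)
with torch.no_grad():  # inference only, no gradients needed
    flow = model(frame1, frame2)
```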
## License
MIT License.
## Links
- **Code**: [GitHub](https://github.com/krasnoteh/micron-flow)