Two-Stream Violence Detection Network
Model Architecture
- Type: Two-Stream Network (Spatial + Temporal)
- Streams:
  - RGB Stream: ResNet3D (r3d_18) processes the raw video frames and captures appearance information.
  - Optical Flow Stream: ResNet3D (r3d_18) processes the computed dense optical flow and captures motion information.
- Fusion: Features from both streams are concatenated and passed through fully connected layers.
- Input: 16 RGB frames + 16 flow fields (computed on the fly).
- Computation: Optical flow is computed with the Farneback algorithm inside the dataloader.
Dataset Structure
Expects a Dataset folder in the parent directory, with one subfolder per class:
Dataset/
├── violence/
└── no-violence/
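One way to turn this layout into labelled samples is sketched below. The helper name `list_samples`, the label ids, and the accepted file extensions are illustrative assumptions; the sketch also assumes the video files sit directly inside each class folder.

```python
from pathlib import Path

# Assumed label ids; the source only names the two folders.
CLASSES = {"no-violence": 0, "violence": 1}
# Assumed video extensions.
VIDEO_EXTS = {".mp4", ".avi"}


def list_samples(root):
    """Return (path, label) pairs for every video under root/<class>/."""
    root = Path(root)
    samples = []
    for name, label in CLASSES.items():
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in VIDEO_EXTS:
                samples.append((path, label))
    return samples
```

A call such as `list_samples("../Dataset")` would then collect the clips for a dataset class to decode.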
How to Run
- Install dependencies: torch, torchvision, opencv-python (Farneback is part of the standard package; the contrib build is only needed for some other algorithms).
- Run: python train.py