# Feature Fusion Network
## Model Architecture
- **Type**: Multi-Modal Hybrid (CNN + Transformer) for binary classification of video clips.
- **Pathway 1 (Spatial)**: ResNet3D (`r3d_18`) extracts robust local features from the clip with 3D convolutions.
- **Pathway 2 (Spatiotemporal)**: A TimeSformer Transformer block attends over image patches across frames to capture long-range dependencies.
- **Fusion**: Late fusion by concatenating the flattened feature vectors of both pathways (512 features from the CNN + 256 features from the Transformer = 768 fused features).
- **Classification Head**: An MLP maps the 768 fused features to the two classes (violence / no-violence).
## Dataset Structure
The training script expects a `Dataset` folder in the parent directory, with one subfolder per class:
```
Dataset/
├── violence/
└── no-violence/
```
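A minimal sketch of how a loader might turn this layout into labeled samples, using only the standard library. The `list_samples` helper, the label mapping (`no-violence` → 0, `violence` → 1), and the accepted video extensions are hypothetical; `train.py` may assign labels or filter files differently.

```python
from pathlib import Path

# Hypothetical label mapping; train.py may use different indices.
CLASS_TO_LABEL = {"no-violence": 0, "violence": 1}
# Assumed video extensions (compared case-insensitively).
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def list_samples(root):
    """Return (video_path, label) pairs from Dataset/<class>/<video>, sorted per class."""
    root = Path(root)
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = root / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        for path in sorted(class_dir.iterdir()):
            # Skip subfolders and non-video files such as notes or thumbnails.
            if path.is_file() and path.suffix.lower() in VIDEO_EXTS:
                samples.append((path, label))
    return samples
```

Run from the project folder, `list_samples(Path("..") / "Dataset")` would return the `(path, label)` pairs for both classes; failing loudly on a missing class folder avoids silently training on one class.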
## How to Run
1. Install the dependencies `torch`, `torchvision`, `opencv-python`, `scikit-learn`, and `numpy` (for example, `pip install torch torchvision opencv-python scikit-learn numpy`).
2. Run `python train.py`.
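Two of the pip package names above differ from their import names: `opencv-python` is imported as `cv2` and `scikit-learn` as `sklearn`. A small pre-flight check such as the following (hypothetical, not part of the repository) can confirm the environment before training.

```python
import importlib

# pip package name -> module name used in `import`
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
}


def missing_dependencies():
    """Return the pip names of required packages that fail to import."""
    missing = []
    for pip_name, module_name in REQUIRED.items():
        try:
            importlib.import_module(module_name)
        except (ImportError, OSError):  # OSError: a native library failed to load
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found; run `python train.py`.")
```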