# Feature Fusion Network

## Model Architecture
- **Type**: Multi-Modal Hybrid (CNN + Transformer)
- **Pathway 1 (Spatial)**: 3D ResNet (`r3d_18` from `torchvision`) for localized feature extraction.

- **Pathway 2 (Spatiotemporal)**: TimeSformer (Transformer) block that attends over patches across frames to capture long-range dependencies.

- **Fusion**: Late fusion via concatenation of the flattened feature vectors (512 features from the CNN + 256 features from the Transformer, giving a 768-dimensional fused vector).

- **Classification Head**: MLP that maps the fused features to two classes (violence / no-violence).
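
The fusion and classification steps above can be sketched in PyTorch as follows. This is a minimal illustration, assuming both pathways already produce one flattened feature vector per clip. The class name `LateFusionHead`, the hidden width of 128, and the random stand-in tensors are assumptions for the example and are not taken from `train.py`.

```python
import torch
import torch.nn as nn


class LateFusionHead(nn.Module):
    """Concatenate CNN and Transformer features, then classify with an MLP."""

    def __init__(self, cnn_dim=512, transformer_dim=256, hidden_dim=128, num_classes=2):
        super().__init__()
        # hidden_dim is an assumed value; the README does not specify the MLP layout.
        self.mlp = nn.Sequential(
            nn.Linear(cnn_dim + transformer_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, cnn_feat, transformer_feat):
        # Late fusion: flatten each pathway's output and join along the feature axis.
        fused = torch.cat([cnn_feat.flatten(1), transformer_feat.flatten(1)], dim=1)
        return self.mlp(fused)


# Random tensors stand in for the outputs of the two pathways (batch of 4 clips).
head = LateFusionHead()
cnn_feat = torch.randn(4, 512)
transformer_feat = torch.randn(4, 256)
logits = head(cnn_feat, transformer_feat)  # one row of 2 class scores per clip
```

Concatenation keeps the two pathways independent until the head, so either backbone could be swapped out as long as its output width matches the corresponding `*_dim` argument.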



## Dataset Structure

The code expects a `Dataset` folder in the parent directory, with one subfolder per class:

```
Dataset/
├── violence/
└── no-violence/
```
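
As a rough illustration of how this layout can be turned into labeled samples, the sketch below walks the two class folders and returns `(path, label)` pairs. The helper name `list_videos`, the label mapping (`no-violence` = 0, `violence` = 1), and the accepted file extensions are assumptions; the actual loader in `train.py` may differ.

```python
from pathlib import Path

# Assumed label mapping; the README only names the two folders.
CLASS_TO_LABEL = {"no-violence": 0, "violence": 1}
# Assumed set of video file extensions.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}


def list_videos(root):
    """Return (path, label) pairs for every video file under root/<class>/."""
    root = Path(root)
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = root / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"Missing class folder: {class_dir}")
        # Sort for a reproducible order; skip anything that is not a video file.
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in VIDEO_EXTENSIONS:
                samples.append((path, label))
    return samples
```

A call such as `list_videos("../Dataset")` would then give a list that can be split into training and validation sets.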



## How to Run

1. Install dependencies: `torch`, `opencv-python`, `scikit-learn`, `numpy`, `torchvision`.

2. Run `python train.py`.
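
Two of the packages above are imported under a different name than the one used to install them (`opencv-python` as `cv2`, `scikit-learn` as `sklearn`). The optional check below, a sketch that is not part of the repository, reports which dependencies are missing before `train.py` is run. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# pip package name -> import name (the two differ for OpenCV and scikit-learn).
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```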