---
license: mit
base_model: MCG-NJU/videomae-base
tags:
- video-classification
- crime-detection
- violence-detection
- videomae
- computer-vision
- security
- surveillance
- generated_from_trainer
language:
- en
datasets:
- jinmang2/ucf_crime
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: video-classification
model-index:
- name: test-upload-model
  results:
  - task:
      name: Violence Detection
      type: video-classification
    dataset:
      name: UCF Crime Dataset (Subset)
      type: jinmang2/ucf_crime
      args: violence_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5000
    - name: Precision
      type: precision
      value: 0.2500
    - name: Recall
      type: recall
      value: 0.5000
    - name: F1
      type: f1
      value: 0.3333
---
# Nikeytas/Test Upload Model
This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on the UCF Crime dataset with **event-based binary classification**. It achieves the following results on the evaluation set:
- **Loss**: 0.5847
- **Accuracy**: 0.5000
- **Precision**: 0.2500
- **Recall**: 0.5000
- **F1 Score**: 0.3333
## Model Overview
This VideoMAE model has been fine-tuned for **binary violence detection** in video content. The model classifies videos into two categories:
- **Violent Crime** (1): Videos containing violent criminal activities
- **Non-Violent Incident** (0): Videos with non-violent or normal activities
The model is based on the **VideoMAE architecture** and has been specifically trained on a curated subset of the UCF Crime dataset with event-based categorization for realistic crime detection scenarios.
## Dataset & Training
### Dataset Composition
**Total Videos**: 20
- **Violent Crime Videos**: 10
- **Non-Violent Incident Videos**: 10
**Class Balance**: 50.0% violent crimes
**Event Distribution**:
- **Arrest**: 20 videos
- **Arson**: 20 videos
**Data Splits**:
- **Training**: 12 videos
- **Validation**: 4 videos
- **Test**: 4 videos
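The 12/4/4 split above corresponds to a 60/20/20 stratified split that preserves the 50/50 class balance. The helper below is a sketch of how such a split can be reproduced; the file names and the `stratified_split` function are illustrative, not the exact preparation code used.

```python
import random

def stratified_split(items, labels, train_frac=0.6, val_frac=0.2, seed=42):
    """Split items per class so every split keeps the original class balance."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(set(labels)):
        cls_items = [it for it, lb in zip(items, labels) if lb == cls]
        rng.shuffle(cls_items)
        n_train = int(len(cls_items) * train_frac)
        n_val = int(len(cls_items) * val_frac)
        splits["train"] += cls_items[:n_train]
        splits["val"] += cls_items[n_train:n_train + n_val]
        splits["test"] += cls_items[n_train + n_val:]
    return splits

# 20 videos, 10 per class -> 12 train / 4 val / 4 test
videos = [f"video_{i:02d}.mp4" for i in range(20)]
labels = [i % 2 for i in range(20)]  # alternating violent / non-violent
splits = stratified_split(videos, labels)
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))
```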
## Performance
### Performance Metrics
**Validation Performance**:
- **eval_loss**: 0.5847
- **eval_accuracy**: 0.5000
- **eval_precision**: 0.2500
- **eval_recall**: 0.5000
- **eval_f1**: 0.3333
- **eval_runtime**: 0.6636
- **eval_samples_per_second**: 6.0270
- **eval_steps_per_second**: 3.0140
- **epoch**: 1.0000
**Test Performance**:
- **eval_loss**: 0.6700
- **eval_accuracy**: 0.5000
- **eval_precision**: 0.2500
- **eval_recall**: 0.5000
- **eval_f1**: 0.3333
- **eval_runtime**: 0.4271
- **eval_samples_per_second**: 9.3660
- **eval_steps_per_second**: 4.6830
- **epoch**: 1.0000
**Training Information**:
- **Training Time**: 0.1 minutes
- **Best Accuracy Achieved**: 0.5000
- **Model Architecture**: VideoMAE Base (fine-tuned)
- **Fine-tuning Approach**: Event-based binary classification
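For reference, the precision/recall/F1 values above are consistent with macro-averaging over the two classes when the model predicts a single class on a balanced set. The function below is a minimal, dependency-free sketch of that metric computation; it illustrates the arithmetic and is not necessarily the exact evaluation code used in training.

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(p for p, _, _ in per_class) / n,
        "recall": sum(r for _, r, _ in per_class) / n,
        "f1": sum(f for _, _, f in per_class) / n,
    }

# A balanced 4-sample set where every prediction is class 0 reproduces
# the reported pattern: acc 0.50, precision 0.25, recall 0.50, F1 0.33
m = macro_metrics([0, 1, 0, 1], [0, 0, 0, 0])
print(m)
```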
## Training Procedure
### Training Hyperparameters
The following hyperparameters were used during training:
- **Learning Rate**: 5e-05
- **Train Batch Size**: 2
- **Eval Batch Size**: 2
- **Optimizer**: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- **LR Scheduler Type**: Linear
- **Training Epochs**: 1
- **Weight Decay**: 0.01
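Under the Transformers `Trainer` API, the hyperparameters above map to roughly the following `TrainingArguments`. This is a configuration sketch, not the exact arguments used; the output path is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",       # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    weight_decay=0.01,
    lr_scheduler_type="linear",       # linear decay
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```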
### Training Results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.5 | 1.00 | N/A | 0.5847 | 0.5000 |
### Framework Versions
- **Transformers**: 4.30.2+
- **PyTorch**: 2.0.1+
- **Datasets**: Latest
- **Device**: Apple Silicon MPS / CUDA / CPU (Auto-detected)
## Quick Start
### Installation
```bash
pip install transformers torch torchvision opencv-python pillow
```
### Basic Usage
```python
import cv2
import numpy as np
import torch
from transformers import AutoImageProcessor, AutoModelForVideoClassification

# Load model and processor
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/test-upload-model")
processor = AutoImageProcessor.from_pretrained("Nikeytas/test-upload-model")

def classify_video(video_path, num_frames=16):
    # Sample num_frames frames evenly across the video
    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total_frames - 1, num_frames, dtype=int)

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes to BGR; the processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    # Preprocess and run inference
    inputs = processor(frames, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0][predicted_class].item()

    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
    return label, confidence

# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
```
### Batch Processing
```python
from pathlib import Path

def process_video_directory(video_dir, output_file="results.txt"):
    results = []
    for video_file in sorted(Path(video_dir).glob("*.mp4")):
        try:
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence,
            })
            print(f"OK    {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"ERROR {video_file.name}: {e}")

    # Save results
    with open(output_file, "w") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")
    return results

# Process all videos in a directory
results = process_video_directory("./videos/")
```
## Technical Specifications
- **Base Model**: MCG-NJU/videomae-base
- **Architecture**: Vision Transformer (ViT) adapted for video
- **Input Resolution**: 224x224 pixels per frame
- **Temporal Resolution**: 16 frames per video clip
- **Output Classes**: 2 (Binary classification)
- **Training Framework**: HuggingFace Transformers
- **Optimization**: AdamW optimizer with learning rate 5e-5
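Given these specifications, VideoMAE tokenizes a 16-frame, 224x224 clip into space-time patches. Assuming the base configuration's defaults (16x16 spatial patches and a temporal tubelet size of 2), the transformer's sequence length works out as follows:

```python
# VideoMAE-base patch geometry (assumed defaults for MCG-NJU/videomae-base)
num_frames = 16
image_size = 224
patch_size = 16   # spatial patch edge, in pixels
tubelet_size = 2  # frames grouped into one temporal patch

patches_per_frame = (image_size // patch_size) ** 2     # 14 * 14 = 196
temporal_patches = num_frames // tubelet_size           # 8
sequence_length = patches_per_frame * temporal_patches  # tokens per clip
print(sequence_length)
```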
## ⚠️ Limitations
1. **Dataset Scope**: Trained on a subset of UCF Crime dataset - may not generalize to all types of violence
2. **Temporal Context**: Uses 16-frame clips which may miss context in longer sequences
3. **Environmental Bias**: Performance may vary with different lighting, camera angles, and video quality
4. **False Positives**: May misclassify intense but non-violent activities (sports, action movies)
5. **Real-time Performance**: Processing time depends on hardware capabilities
## Ethical Considerations
### Intended Use
- **Primary**: Research and development in video analysis
- **Secondary**: Security system enhancement with human oversight
- **Educational**: Computer vision and AI safety research
### Prohibited Uses
- **Surveillance without consent**: Do not use for unauthorized monitoring
- **Discriminatory profiling**: Avoid bias against specific groups or communities
- **Automated punishment**: Never use for automated legal or disciplinary actions
- **Privacy violation**: Respect privacy laws and individual rights
### Bias and Fairness
- Model trained on specific dataset that may not represent all populations
- Regular evaluation needed for bias detection and mitigation
- Human oversight required for critical applications
- Consider demographic representation in deployment scenarios
## Model Card Information
- **Developed by**: Research Team
- **Model Type**: Video Classification (Binary)
- **Training Data**: UCF Crime Dataset (Subset)
- **Training Date**: 2025-06-08 15:19:08 UTC
- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score
- **Intended Users**: Researchers, Security Professionals, Developers
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{Nikeytas_test_upload_model,
title={VideoMAE Fine-tuned for Crime Detection},
author={Research Team},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/Nikeytas/test-upload-model}
}
```
## Contributing
We welcome contributions to improve the model! Please:
1. Report issues with specific examples
2. Suggest improvements for bias reduction
3. Share evaluation results on new datasets
4. Contribute to documentation and examples
## Contact
For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.
---
*Last updated: 2025-06-08 15:19:08 UTC*
*Model version: 1.0*
*Framework: HuggingFace Transformers*