---
license: mit
base_model: MCG-NJU/videomae-base
tags:
- video-classification
- crime-detection
- violence-detection
- videomae
- computer-vision
- security
- surveillance
- generated_from_trainer
language:
- en
datasets:
- jinmang2/ucf_crime
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: video-classification
model-index:
- name: test-upload-model
  results:
  - task:
      name: Violence Detection
      type: video-classification
    dataset:
      name: UCF Crime Dataset (Subset)
      type: jinmang2/ucf_crime
      args: violence_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5000
    - name: Precision
      type: precision
      value: 0.2500
    - name: Recall
      type: recall
      value: 0.5000
    - name: F1
      type: f1
      value: 0.3333
---

# Nikeytas/Test Upload Model

This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on the UCF Crime dataset with **event-based binary classification**. It achieves the following results on the evaluation set:

- **Loss**: 0.5847
- **Accuracy**: 0.5000
- **Precision**: 0.2500
- **Recall**: 0.5000
- **F1 Score**: 0.3333

## 🎯 Model Overview

This VideoMAE model has been fine-tuned for **binary violence detection** in video content. The model classifies videos into two categories:
- **Violent Crime** (1): Videos containing violent criminal activities
- **Non-Violent Incident** (0): Videos with non-violent or normal activities

Built on the **VideoMAE architecture**, it was trained on a curated, event-categorized subset of the UCF Crime dataset to reflect realistic crime-detection scenarios.
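
If you need to confirm which integer maps to which label at inference time, the mapping saved with the checkpoint can be read back from the config. This is a minimal sketch; the exact label strings stored in `id2label` depend on what was written during fine-tuning:

```python
from transformers import AutoConfig

# Read the label mapping shipped with the checkpoint; the exact
# strings are whatever was saved at fine-tuning time.
config = AutoConfig.from_pretrained("Nikeytas/test-upload-model")
print(config.id2label)  # e.g. {0: "Non-Violent Incident", 1: "Violent Crime"}
print(config.label2id)
```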

## πŸ“Š Dataset & Training

### Dataset Composition

**Total Videos**: 20
- **Violent Crime Videos**: 10
- **Non-Violent Incident Videos**: 10

**Class Balance**: 50.0% violent crimes

**Event Distribution**:
- **Arrest**: 20 videos
- **Arson**: 20 videos

**Data Splits**:
- **Training**: 12 videos
- **Validation**: 4 videos  
- **Test**: 4 videos
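
The original split procedure is not documented; the sketch below shows one way a stratified 12/4/4 split over 20 labelled clips could be reproduced (the random seed and clip ordering are assumptions, so this is illustrative only):

```python
from sklearn.model_selection import train_test_split

# Illustrative only: stratified 12/4/4 split of 20 clips,
# 10 violent (1) and 10 non-violent (0) as described above.
indices = list(range(20))
labels = [1] * 10 + [0] * 10
train_idx, rest_idx, _, rest_labels = train_test_split(
    indices, labels, test_size=8, stratify=labels, random_state=42
)
val_idx, test_idx = train_test_split(
    rest_idx, test_size=4, stratify=rest_labels, random_state=42
)
print(len(train_idx), len(val_idx), len(test_idx))  # 12 4 4
```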

## 🎯 Performance

### Performance Metrics

**Validation Performance**:
- **eval_loss**: 0.5847
- **eval_accuracy**: 0.5000
- **eval_precision**: 0.2500
- **eval_recall**: 0.5000
- **eval_f1**: 0.3333
- **eval_runtime**: 0.6636
- **eval_samples_per_second**: 6.0270
- **eval_steps_per_second**: 3.0140
- **epoch**: 1.0000

**Test Performance**:
- **eval_loss**: 0.6700
- **eval_accuracy**: 0.5000
- **eval_precision**: 0.2500
- **eval_recall**: 0.5000
- **eval_f1**: 0.3333
- **eval_runtime**: 0.4271
- **eval_samples_per_second**: 9.3660
- **eval_steps_per_second**: 4.6830
- **epoch**: 1.0000

**Training Information**:
- **Training Time**: 0.1 minutes
- **Best Accuracy Achieved**: 0.5000
- **Model Architecture**: VideoMAE Base (fine-tuned)
- **Fine-tuning Approach**: Event-based binary classification
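
The metrics above follow the Trainer's `eval_` naming, so they were presumably produced by a `compute_metrics` callback. A hedged sketch of such a callback is below; the reported values are consistent with weighted averaging on a balanced set, but the exact averaging scheme is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # eval_pred is (logits, labels) as passed by the HF Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0  # averaging assumed
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```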

## πŸš€ Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- **Learning Rate**: 5e-05
- **Train Batch Size**: 2
- **Eval Batch Size**: 2 
- **Optimizer**: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- **LR Scheduler Type**: Linear
- **Training Epochs**: 1
- **Weight Decay**: 0.01
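
Expressed as a `TrainingArguments` object, the configuration above would look roughly like this (the `output_dir` is a placeholder; everything else mirrors the listed values):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="videomae-crime-detection",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```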

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.5 | 1.00 | N/A | 0.5847 | 0.5000 |

### Framework Versions

- **Transformers**: 4.30.2+
- **PyTorch**: 2.0.1+
- **Datasets**: Latest
- **Device**: Apple Silicon MPS / CUDA / CPU (Auto-detected)

## πŸš€ Quick Start

### Installation

```bash
pip install transformers torch torchvision opencv-python pillow
```

### Basic Usage

```python
import torch
from transformers import AutoImageProcessor, AutoModelForVideoClassification
import cv2
import numpy as np

# Load model and processor (VideoMAE checkpoints ship an image processor)
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/test-upload-model")
processor = AutoImageProcessor.from_pretrained("Nikeytas/test-upload-model")

def classify_video(video_path, num_frames=16):
    """Classify a video as violent or non-violent."""
    # Sample num_frames frames evenly spaced across the video
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {video_path}")

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total_frames - 1, num_frames, dtype=int)

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes to BGR; the processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    cap.release()

    # VideoMAE expects exactly num_frames frames; pad with the last
    # frame if any reads failed near the end of the file
    if not frames:
        raise ValueError(f"No frames could be read from {video_path}")
    while len(frames) < num_frames:
        frames.append(frames[-1])

    # Preprocess and run inference
    inputs = processor(frames, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(predictions, dim=-1).item()
        confidence = predictions[0][predicted_class].item()

    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
    return label, confidence

# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
```

### Batch Processing

```python
from pathlib import Path

def process_video_directory(video_dir, output_file="results.txt"):
    results = []
    
    for video_file in Path(video_dir).glob("*.mp4"):
        try:
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence
            })
            print(f"βœ… {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"❌ Error processing {video_file.name}: {e}")
    
    # Save results
    with open(output_file, "w") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")
    
    return results

# Process all videos in a directory
results = process_video_directory("./videos/")
```

## πŸ“ˆ Technical Specifications

- **Base Model**: MCG-NJU/videomae-base
- **Architecture**: Vision Transformer (ViT) adapted for video
- **Input Resolution**: 224x224 pixels per frame
- **Temporal Resolution**: 16 frames per video clip
- **Output Classes**: 2 (Binary classification)
- **Training Framework**: HuggingFace Transformers
- **Optimization**: AdamW optimizer with learning rate 5e-5

## ⚠️ Limitations

1. **Dataset Scope**: Trained on a 20-video subset of the UCF Crime dataset and evaluated on only 4 validation and 4 test videos, so the model may not generalize to other types of violence and the reported metrics carry little statistical weight
2. **Temporal Context**: Uses 16-frame clips which may miss context in longer sequences
3. **Environmental Bias**: Performance may vary with different lighting, camera angles, and video quality
4. **False Positives**: May misclassify intense but non-violent activities (sports, action movies)
5. **Real-time Performance**: Processing time depends on hardware capabilities

## πŸ”’ Ethical Considerations

### Intended Use
- **Primary**: Research and development in video analysis
- **Secondary**: Security system enhancement with human oversight
- **Educational**: Computer vision and AI safety research

### Prohibited Uses
- **Surveillance without consent**: Do not use for unauthorized monitoring
- **Discriminatory profiling**: Avoid bias against specific groups or communities  
- **Automated punishment**: Never use for automated legal or disciplinary actions
- **Privacy violation**: Respect privacy laws and individual rights

### Bias and Fairness
- Model trained on specific dataset that may not represent all populations
- Regular evaluation needed for bias detection and mitigation
- Human oversight required for critical applications
- Consider demographic representation in deployment scenarios

## πŸ“ Model Card Information

- **Developed by**: Research Team
- **Model Type**: Video Classification (Binary)
- **Training Data**: UCF Crime Dataset (Subset)
- **Training Date**: 2025-06-08 15:19:08 UTC
- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score
- **Intended Users**: Researchers, Security Professionals, Developers

## πŸ“š Citation

If you use this model in your research, please cite:

```bibtex
@misc{Nikeytas_test_upload_model,
    title={VideoMAE Fine-tuned for Crime Detection},
    author={Research Team},
    year={2024},
    publisher={Hugging Face},
    url={https://huggingface.co/Nikeytas/test-upload-model}
}
```

## 🀝 Contributing

We welcome contributions to improve the model! Please:
1. Report issues with specific examples
2. Suggest improvements for bias reduction
3. Share evaluation results on new datasets
4. Contribute to documentation and examples

## πŸ“ž Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.

---

*Last updated: 2025-06-08 15:19:08 UTC*
*Model version: 1.0*
*Framework: HuggingFace Transformers*