---
license: mit
base_model: MCG-NJU/videomae-base
tags:
- video-classification
- crime-detection
- violence-detection
- videomae
- computer-vision
- security
- surveillance
- generated_from_trainer
language:
- en
datasets:
- jinmang2/ucf_crime
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: video-classification
model-index:
- name: test-upload-model
  results:
  - task:
      name: Violence Detection
      type: video-classification
    dataset:
      name: UCF Crime Dataset (Subset)
      type: jinmang2/ucf_crime
      args: violence_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5000
    - name: Precision
      type: precision
      value: 0.2500
    - name: Recall
      type: recall
      value: 0.5000
    - name: F1
      type: f1
      value: 0.3333
---

# Nikeytas/Test Upload Model

This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on the UCF Crime dataset with **event-based binary classification**. It achieves the following results on the evaluation set:

- **Loss**: 0.5847
- **Accuracy**: 0.5000
- **Precision**: 0.2500
- **Recall**: 0.5000
- **F1 Score**: 0.3333

## 🎯 Model Overview

This VideoMAE model has been fine-tuned for **binary violence detection** in video content. The model classifies videos into two categories:

- **Violent Crime** (1): Videos containing violent criminal activities
- **Non-Violent Incident** (0): Videos with non-violent or normal activities

The model is based on the **VideoMAE architecture** and has been specifically trained on a curated subset of the UCF Crime dataset with event-based categorization for realistic crime detection scenarios.
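The integer in parentheses is the class ID the model emits. As a quick sanity check, you can read the mapping straight from the Hub configuration without downloading any weights; this is a minimal sketch assuming the repository's `config.json` stores the standard `id2label` field:

```python
from transformers import AutoConfig

# Fetch only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("Nikeytas/test-upload-model")

# Expected to print a two-entry mapping such as
# {0: "Non-Violent Incident", 1: "Violent Crime"}; the exact label
# strings may differ, so verify against your local copy.
print(config.id2label)
print(config.num_labels)  # expected: 2
```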
## 📊 Dataset & Training

### Dataset Composition

**Total Videos**: 20

- **Violent Crime Videos**: 10
- **Non-Violent Incident Videos**: 10

**Class Balance**: 50.0% violent crimes

**Event Distribution**:

- **Arrest**: 10 videos
- **Arson**: 10 videos

**Data Splits**:

- **Training**: 12 videos
- **Validation**: 4 videos
- **Test**: 4 videos

## 🎯 Performance

### Performance Metrics

**Validation Performance**:

- **eval_loss**: 0.5847
- **eval_accuracy**: 0.5000
- **eval_precision**: 0.2500
- **eval_recall**: 0.5000
- **eval_f1**: 0.3333
- **eval_runtime**: 0.6636
- **eval_samples_per_second**: 6.0270
- **eval_steps_per_second**: 3.0140
- **epoch**: 1.0000

**Test Performance**:

- **eval_loss**: 0.6700
- **eval_accuracy**: 0.5000
- **eval_precision**: 0.2500
- **eval_recall**: 0.5000
- **eval_f1**: 0.3333
- **eval_runtime**: 0.4271
- **eval_samples_per_second**: 9.3660
- **eval_steps_per_second**: 4.6830
- **epoch**: 1.0000

**Training Information**:

- **Training Time**: 0.1 minutes
- **Best Accuracy Achieved**: 0.5000
- **Model Architecture**: VideoMAE Base (fine-tuned)
- **Fine-tuning Approach**: Event-based binary classification

## 🚀 Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:

- **Learning Rate**: 5e-05
- **Train Batch Size**: 2
- **Eval Batch Size**: 2
- **Optimizer**: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- **LR Scheduler Type**: Linear
- **Training Epochs**: 1
- **Weight Decay**: 0.01

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.5           | 1.00  | N/A  | 0.5847          | 0.5000   |

### Framework Versions

- **Transformers**: 4.30.2+
- **PyTorch**: 2.0.1+
- **Datasets**: Latest
- **Device**: Apple Silicon MPS / CUDA / CPU (auto-detected)

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch torchvision opencv-python pillow
```

### Basic Usage

```python
import torch
from transformers import AutoModelForVideoClassification, AutoImageProcessor
import cv2
import numpy as np

# Load the fine-tuned model and its frame preprocessor
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/test-upload-model")
processor = AutoImageProcessor.from_pretrained("Nikeytas/test-upload-model")
model.eval()

def classify_video(video_path, num_frames=16):
    # Sample num_frames frames evenly across the video
    cap = cv2.VideoCapture(video_path)
    frames = []
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total_frames - 1, num_frames, dtype=int)

    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes to BGR; the processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    if not frames:
        raise ValueError(f"Could not read any frames from {video_path}")
    # Pad with the last frame if some reads failed, so the clip length stays fixed
    while len(frames) < num_frames:
        frames.append(frames[-1])

    # Preprocess the clip and run inference
    inputs = processor(frames, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0][predicted_class].item()

    label = "Violent Crime" if predicted_class == 1 else "Non-Violent Incident"
    return label, confidence

# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
```

### Batch Processing

```python
from pathlib import Path

def process_video_directory(video_dir, output_file="results.txt"):
    results = []

    for video_file in sorted(Path(video_dir).glob("*.mp4")):
        try:
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence,
            })
            print(f"✅ {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"❌ Error processing {video_file.name}: {e}")

    # Save results to a plain-text report
    with open(output_file, "w") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")

    return results

# Process all videos in a directory
results = process_video_directory("./videos/")
```

## 📈 Technical Specifications

- **Base Model**: MCG-NJU/videomae-base
- **Architecture**: Vision Transformer (ViT) adapted for video
- **Input Resolution**: 224x224 pixels per frame
- **Temporal Resolution**: 16 frames per video clip
- **Output Classes**: 2 (binary classification)
- **Training Framework**: Hugging Face Transformers
- **Optimization**: AdamW optimizer with learning rate 5e-5
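You can verify the 16-frame, 224x224 input contract with a dummy forward pass; the sketch below assumes VideoMAE's documented `(batch, frames, channels, height, width)` layout for `pixel_values`:

```python
import torch
from transformers import AutoModelForVideoClassification

# Loading the model downloads the fine-tuned weights from the Hub.
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/test-upload-model")
model.eval()

# Dummy clip: 1 video, 16 frames, 3 channels, 224x224 pixels.
pixel_values = torch.randn(1, 16, 3, 224, 224)

with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits

print(logits.shape)  # expected: torch.Size([1, 2]) for binary classification
```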
## ⚠️ Limitations

1. **Dataset Scope**: Trained on a small subset of the UCF Crime dataset, so it may not generalize to all types of violence
2. **Temporal Context**: Uses 16-frame clips, which may miss context in longer sequences
3. **Environmental Bias**: Performance may vary with different lighting, camera angles, and video quality
4. **False Positives**: May misclassify intense but non-violent activities (sports, action movies)
5. **Real-time Performance**: Processing time depends on hardware capabilities

## 🔒 Ethical Considerations

### Intended Use

- **Primary**: Research and development in video analysis
- **Secondary**: Security system enhancement with human oversight
- **Educational**: Computer vision and AI safety research

### Prohibited Uses

- **Surveillance without consent**: Do not use for unauthorized monitoring
- **Discriminatory profiling**: Avoid bias against specific groups or communities
- **Automated punishment**: Never use for automated legal or disciplinary actions
- **Privacy violation**: Respect privacy laws and individual rights

### Bias and Fairness

- The model was trained on a specific dataset that may not represent all populations
- Regular evaluation is needed for bias detection and mitigation
- Human oversight is required for critical applications
- Consider demographic representation in deployment scenarios

## 📝 Model Card Information

- **Developed by**: Research Team
- **Model Type**: Video Classification (Binary)
- **Training Data**: UCF Crime Dataset (Subset)
- **Training Date**: 2025-06-08 15:19:08 UTC
- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score
- **Intended Users**: Researchers, Security Professionals, Developers

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@misc{Nikeytas_test_upload_model,
  title={VideoMAE Fine-tuned for Crime Detection},
  author={Research Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Nikeytas/test-upload-model}
}
```

## 🤝 Contributing

We welcome contributions to improve the model! Please:

1. Report issues with specific examples
2. Suggest improvements for bias reduction
3. Share evaluation results on new datasets
4. Contribute to documentation and examples

## 📞 Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.

---

*Last updated: 2025-06-08 15:19:08 UTC*
*Model version: 1.0*
*Framework: HuggingFace Transformers*