Update README.md

![diagram-export-27-08-2025-19_58_02.png](https://cdn-uploads.huggingface.co/production/uploads/6883d9803cf41741e3a9f69a/XqEZybue6-StX1ayUt2kl.png)

Files changed (1)

README.md +103 -3

README.md CHANGED

@@ -1,3 +1,103 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- abdallahwagih/ucf101-videos
+metrics:
+- accuracy
+base_model:
+- google/mobilenet_v2_1.0_224
+pipeline_tag: video-classification
+tags:
+- action-recognition
+- cnn-gru
+- video-classification
+- ucf101
+- action
+- mobilenetv2
+- deep-learning
+- pytorch
+---
+# Action Detection with CNN-GRU on MobileNetV2
+## Overview
+This model performs human action classification on videos using a [CNN-GRU architecture](https://arxiv.org/abs/1412.7753) built on top of **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.crcv.ucf.edu/data/UCF101.php) dataset.
+It is well-suited for recognizing actions from short trimmed video clips.
+***
+## Model Details
+- **Base model:** `google/mobilenet_v2_1.0_224`
+- **Architecture:** CNN-GRU
+- **Dataset:** [UCF101 - Action Recognition Dataset](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
+- **Task:** Video Classification (Action Recognition)
+- **Metrics:** Accuracy
+- **License:** MIT
+***
+## Usage
+### Requirements
+```bash
+pip install torch torchvision opencv-python
+```
+### Example Code
+```python
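+import cv2
+from action_model import load_action_model, preprocess_frames, predict_action
+# Load the model checkpoint (5 output classes in this example)
+model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)
+# Open the video and fail early if it cannot be read
+video_path = "path_to_video.mp4"
+cap = cv2.VideoCapture(video_path)
+if not cap.isOpened():
+    raise IOError(f"Could not open video: {video_path}")
+# Read every frame; OpenCV returns each frame as a BGR NumPy array
+frames = []
+while True:
+    ret, frame = cap.read()
+    if not ret:  # end of stream or read failure
+        break
+    frames.append(frame)
+cap.release()
+if not frames:
+    raise ValueError(f"No frames could be read from: {video_path}")
+# Preprocess the first 16 frames into a clip tensor for the model
+clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112, 112))
+# Predict the action for the clip
+result = predict_action(model, clip_tensor, device="cpu")
+print(result)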
+```
+***
+## Training & Evaluation
+- Trained on UCF101 split 1 with a MobileNetV2 backbone.
+- Sequence length: 16 frames per clip.
+- Metric: Top-1 classification accuracy.
+***
+## Intended Use & Limitations
+**Intended for:**
+- Video analytics
+- Educational research
+- Baseline for video action recognition tasks
+**Limitations:**
+- Predicts only a subset of the UCF101 classes
+- Requires short, trimmed video clips
+- Not robust to out-of-domain videos or very low-resolution input
+***
+## Tags
+`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `pytorch`
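
A note on the `frames[:16]` step in the README's example code: slicing keeps only the first 16 frames, so in a longer video the model sees just the opening fraction of a second. A common alternative is to spread the 16 frames evenly across the whole clip. The sketch below shows one way to pick those indices. It assumes a hypothetical helper, `sample_frame_indices`, which is not part of `action_model`, and it has not been checked against how this model was actually trained.

```python
from typing import List


def sample_frame_indices(num_frames: int, seq_len: int = 16) -> List[int]:
    """Return seq_len frame indices spread evenly over a clip.

    Hypothetical helper for illustration only; it is not part of the
    action_model module. Clips shorter than seq_len repeat frames so the
    result always has exactly seq_len entries.
    """
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # Split the clip into seq_len equal-width segments and take the frame
    # at the centre of each one, clamped to the last valid index.
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / seq_len))
        for i in range(seq_len)
    ]
```

With the `frames` list from the README example, `[frames[i] for i in sample_frame_indices(len(frames))]` could be passed to `preprocess_frames` in place of `frames[:16]`, provided the model was trained on clips sampled the same way.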