---
license: mit
datasets:
- abdallahwagih/ucf101-videos
metrics:
- accuracy
base_model:
- google/mobilenet_v2_1.0_224
pipeline_tag: video-classification
tags:
- action-recognition
- cnn-gru
- video-classification
- ucf101
- action
- mobilenetv2
- deep-learning
- pytorch
---

# Action Detection with CNN-GRU on MobileNetV2

## Overview

This model classifies human actions in videos using a CNN-GRU architecture built on **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos) dataset.
It is best suited to recognizing actions in short, trimmed video clips.

***

## Model Details

- **Base model:** `google/mobilenet_v2_1.0_224`
- **Architecture:** CNN-GRU (MobileNetV2 encodes each frame; a GRU models the resulting frame sequence)
- **Dataset:** [UCF101 - Action Recognition Dataset](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
- **Task:** Video Classification (Action Recognition)
- **Metrics:** Accuracy
- **License:** MIT

***

## Usage

### Requirements

```bash
pip install torch torchvision opencv-python
```

### Example Code

```python
# Assumes action_model.py (the helper module for this model) is importable
from action_model import load_action_model, preprocess_frames, predict_action
import cv2

# Load the trained weights; num_classes must match the checkpoint
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read all frames from the video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:  # end of stream or read failure
        break
    frames.append(frame)
cap.release()

# Preprocess the first 16 frames for model input
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112, 112))

# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
```
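
The example above keeps only the first 16 frames, which for a longer video covers just its opening moments. One common alternative is to pick 16 evenly spaced frames across the whole clip, sketched below. `sample_frame_indices` is a hypothetical helper, not part of `action_model`, and whether this matches the sampling used during training is an assumption worth checking.

```python
from typing import List


def sample_frame_indices(num_frames: int, seq_len: int = 16) -> List[int]:
    """Return seq_len indices spread evenly over range(num_frames).

    Hypothetical helper: videos shorter than seq_len repeat indices,
    so the result always has exactly seq_len entries.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # Take the midpoint of each of seq_len equal-width segments.
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / seq_len))
        for i in range(seq_len)
    ]
```

With this helper, `frames[:16]` could be replaced by `[frames[i] for i in sample_frame_indices(len(frames))]` before calling `preprocess_frames`.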

***

## Training & Evaluation

- Trained on UCF101 split 1 with a MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy.
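
Top-1 accuracy counts a clip as correct only when the highest-scoring class equals the ground-truth label. A minimal sketch in plain Python follows; the function name is hypothetical and the scores are made up purely for illustration, not results from this model.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Illustrative scores for four clips and three classes (not real model output).
example_scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
example_labels = [1, 0, 2, 1]
print(top1_accuracy(example_scores, example_labels))  # 0.75
```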

***

## Intended Use & Limitations

**Intended for:**
- Video analytics
- Educational research
- Baseline for video action recognition tasks

**Limitations:**
- Predicts only the subset of UCF101 classes it was trained on
- Requires short, trimmed video clips
- Not robust to out-of-domain videos or very low-resolution input

***

## Tags

`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `torch`