R3D-18 for UCF-101 Action Recognition

Lightweight 3D ResNet-18 fine-tuned on UCF-101. Good baseline for action recognition with fast inference.

Model Details

Model Type: 3D ResNet-18
Base Model: R3D-18 pretrained on Kinetics-400
Parameters: 33.2M
Input: 16 frames @ 112×112 resolution
Classes: 101 action categories

Performance

Metric	Value
Accuracy	83.80%
F1 Score	0.828
Precision	0.842

Comparison:

Published R3D-18: 82.8%
This model: 83.80% (+1.0 percentage points)
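
For readers reproducing the numbers, the three metrics can be computed from per-video labels and predictions as sketched below. The card does not say how F1 and precision were averaged across classes; macro averaging is an assumption here, and `classification_metrics` is a hypothetical helper name.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (top-1 accuracy, macro precision, macro F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    accuracy = sum(tp.values()) / len(y_true)

    precisions, f1s = [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)

    # Macro average: every class counts equally (assumption, see lead-in).
    return accuracy, sum(precisions) / len(classes), sum(f1s) / len(classes)


# Toy example with three classes.
acc, macro_prec, macro_f1 = classification_metrics(
    [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]
)
```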

Usage

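import torch
from huggingface_hub import hf_hub_download
from torchvision.transforms import Compose, Resize, CenterCrop, Normalize, ToTensor

# Load from HuggingFace
model_path = hf_hub_download(repo_id="dronefreak/r3d-18-ucf101", filename="r3d18-ufc101-split-1.pth")
# Assumes the checkpoint is a pickled full model; PyTorch >= 2.6 needs weights_only=False for that.
# If the file holds a state dict instead, build the architecture first and call load_state_dict().
model = torch.load(model_path, map_location="cpu", weights_only=False)
model.eval()  # disable dropout / use running batch-norm statistics

# Per-frame preprocessing
transform = Compose([
    Resize((128, 171)),
    CenterCrop(112),
    ToTensor(),
    Normalize(mean=[0.43216, 0.394666, 0.37645],
              std=[0.22803, 0.22145, 0.216989])
])

# Prepare video (16 frames, C×T×H×W)
# `frames` is a list of 16 PIL images taken from one clip (not defined here)
clip = torch.stack([transform(frame) for frame in frames], dim=1)  # C×T×H×W
video_tensor = clip.unsqueeze(0)  # add batch dimension: 1×C×T×H×W

# Inference
with torch.no_grad():
    output = model(video_tensor)
    prediction = output.argmax(dim=1)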
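
The model expects exactly 16 frames per clip, so a longer or shorter video has to be reduced to 16 frame indices before preprocessing. The card does not state how frames were sampled; the sketch below assumes evenly spaced sampling, with looping for very short videos, and `sample_frame_indices` is a hypothetical helper name.

```python
def sample_frame_indices(num_frames, clip_len=16):
    """Pick `clip_len` frame indices from a video with `num_frames` frames."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames >= clip_len:
        # Split the video into clip_len equal segments and take each segment's centre.
        step = num_frames / clip_len
        return [int(step * i + step / 2) for i in range(clip_len)]
    # Fewer frames than needed: loop the video until the clip is full.
    return [i % num_frames for i in range(clip_len)]


indices_long = sample_frame_indices(160)
indices_short = sample_frame_indices(5)
```

The returned indices can then be used to select the frames that are passed through the per-frame transform.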

Training

Dataset: UCF-101 Split 1 (9,537 train / 3,783 test videos)
Epochs: 100
Batch Size: 32
Optimizer: SGD (lr=0.001, momentum=0.9, weight_decay=1e-4)
Augmentation: ColorJitter, RandomHorizontalFlip, RandomCrop
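
The optimizer settings listed above map directly onto `torch.optim.SGD`. The sketch below runs a single optimization step to show the wiring. Several parts are assumptions: the tiny network is a hypothetical stand-in so the example runs quickly (the card's model is R3D-18), the cross-entropy loss is not stated in the card, and the random batch holds 2 clips rather than 32.

```python
import torch
import torch.nn as nn

# Hypothetical tiny stand-in network; the card's model is R3D-18.
model = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(8, 101),
)

# Hyperparameters as listed in the Training section.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4
)
criterion = nn.CrossEntropyLoss()  # assumption: loss is not stated in the card

# Random stand-in batch: 2 clips of 16 frames at 112x112, with random labels.
clips = torch.randn(2, 3, 16, 112, 112)
labels = torch.randint(0, 101, (2,))

# One training step.
model.train()
optimizer.zero_grad()
loss = criterion(model(clips), labels)
loss.backward()
optimizer.step()
```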

Use Cases

✅ Best for:

Baseline comparisons
Transfer learning starting point
Educational purposes
Fast prototyping

⚠️ Consider alternatives for:

Maximum accuracy (use MC3-18: 87.05%)
Real-time inference (use 2D spatial models that operate on single frames)

Comparison with MC3-18

Model	Accuracy	Speed	Use Case
R3D-18	83.80%	Fast	Baseline, prototyping
MC3-18	87.05%	Moderate	Best performance

Limitations

Trained only on UCF-101 (limited to 101 action classes)
Requires 16-frame clips (not suitable for real-time single-frame inference)
Performs best on action types similar to those in UCF-101

Citation

@misc{r3d_18_ucf101,
  author = {Saumya Saksena},
  title = {R3D-18 for UCF-101 Action Recognition},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/dronefreak/r3d-18-ucf101}}
}

License

Apache-2.0

Paper

A Closer Look at Spatiotemporal Convolutions for Action Recognition (arXiv:1711.11248, published Nov 30, 2017)

Evaluation results

Top-1 Accuracy on UCF-101 (test set, self-reported): 83.800
F1 Score on UCF-101 (test set, self-reported): 82.770