# ViT Human Action Recognition Model
This is a fine-tuned Vision Transformer (ViT) model for multi-class human action recognition from static images. It predicts one of 15 common activities, such as "running", "eating", "texting", or "using a laptop".
## Model Details
- Base Model: google/vit-base-patch16-224-in21k
- Developed by: Harsha Vardhan Mannem
- Task: Image Classification
- Architecture: Vision Transformer (ViT)
- Total Classes: 15
- Language: N/A (Vision Model)
- License: MIT
## Supported Classes
The model predicts the following 15 activities:

`calling`, `clapping`, `cycling`, `dancing`, `drinking`, `eating`, `fighting`, `hugging`, `laughing`, `listening_to_music`, `running`, `sitting`, `sleeping`, `texting`, `using_laptop`
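For downstream use it can help to have the label-to-index mapping in code. A minimal sketch, assuming class indices follow the alphabetical label order listed above (the mapping shipped in the model's config is authoritative):

```python
# Hypothetical id2label mapping, assuming class indices follow
# the alphabetical label order listed above. Check the model's
# config.json for the actual mapping before relying on this.
LABELS = [
    "calling", "clapping", "cycling", "dancing", "drinking",
    "eating", "fighting", "hugging", "laughing", "listening_to_music",
    "running", "sitting", "sleeping", "texting", "using_laptop",
]
id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in id2label.items()}
```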
## Intended Use
### Direct Use
- Predicts human activity from a single image
- Useful for research, prototypes, computer vision pipelines involving human action detection
## Limitations & Risks
- Not designed for multi-label scenarios (only one action per image)
- Not intended for video recognition tasks
- Accuracy may degrade with poor lighting, extreme angles, or occlusions
- Model may reflect biases present in the training data
## Quickstart Example

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="harsha90145/vit-human-pose-classification-model")

url = "https://images.pexels.com/photos/1755385/pexels-photo-1755385.jpeg"
output = pipe(url)
print(output)
```
Example output:

```python
[{'label': 'running', 'score': 0.92}]
```
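The `score` field is a softmax probability computed over the model's 15 class logits; the pipeline handles this post-processing internally. A minimal sketch of that step, using made-up logits for illustration:

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for illustration only (one value per class).
logits = [0.1, -1.2, 0.3, 2.5, -0.4, 0.0, 1.1, -2.0, 0.6, -0.9,
          3.8, 0.2, -1.5, 0.4, 0.7]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(probs[best], 2))
```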
## Evaluation Results
Test set size: 2520 samples
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| calling | 0.66 | 0.69 | 0.67 | 159 |
| clapping | 0.82 | 0.81 | 0.82 | 191 |
| cycling | 0.94 | 0.92 | 0.93 | 167 |
| dancing | 0.91 | 0.83 | 0.87 | 155 |
| drinking | 0.79 | 0.82 | 0.81 | 170 |
| eating | 0.83 | 0.86 | 0.85 | 169 |
| fighting | 0.85 | 0.89 | 0.87 | 154 |
| hugging | 0.77 | 0.81 | 0.79 | 149 |
| laughing | 0.80 | 0.77 | 0.78 | 176 |
| listening_to_music | 0.75 | 0.65 | 0.70 | 179 |
| running | 0.80 | 0.88 | 0.84 | 159 |
| sitting | 0.65 | 0.67 | 0.66 | 166 |
| sleeping | 0.80 | 0.81 | 0.81 | 167 |
| texting | 0.67 | 0.66 | 0.66 | 179 |
| using_laptop | 0.72 | 0.69 | 0.71 | 180 |
- Overall Accuracy: 78%
- Macro F1 Score: 78%
- Weighted F1 Score: 78%
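The aggregate scores follow directly from the per-class table: macro F1 is the unweighted mean of the 15 F1 scores, while weighted F1 weights each class's F1 by its support. A quick sanity check in plain Python:

```python
# Per-class (F1, support) pairs copied from the table above.
f1_support = [
    (0.67, 159), (0.82, 191), (0.93, 167), (0.87, 155), (0.81, 170),
    (0.85, 169), (0.87, 154), (0.79, 149), (0.78, 176), (0.70, 179),
    (0.84, 159), (0.66, 166), (0.81, 167), (0.66, 179), (0.71, 180),
]

total = sum(s for _, s in f1_support)                     # 2520 test samples
macro_f1 = sum(f for f, _ in f1_support) / len(f1_support)
weighted_f1 = sum(f * s for f, s in f1_support) / total

print(total, round(macro_f1, 2), round(weighted_f1, 2))   # 2520 0.78 0.78
```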
## Citation

```bibtex
@misc{harsha2024humanactionvit,
  title={ViT-based Human Action Recognition Model},
  author={Harsha Vardhan Mannem},
  howpublished={\url{https://huggingface.co/harsha90145/vit-human-pose-classification-model}},
  year={2024}
}
```