same same
I'm making a little something similar and also don't fully know what I'm doing. I'm trying to map body posture in video files to detect incorrect human movement in the realm of athletics. It's like a natural human movement and possibly pre-injury detection type thing. I have a "relatively" functional base body-mapping script and some good data, if that's what you're aiming for and you want to workshop it a bit. Just let me know.
Your pre-injury detection sounds extremely cool. What kind of data do you have for training this, and what model are you using?
I'm looking for pre-trained pose models but can't find anything proper built on Transformers. The existing ones either use Keras or Diffusers, or they just aren't what I'm looking for.
I have code for classification using MediaPipe and MoveNet (and can probably get a classification dataset out of it), but I don't know how to build a transformer model from scratch (I've heard that's extra hard).
Data-wise: I have a friend who owns a fitness center specializing in natural movement, and he has many hours of video data from his movement consultations and in general. It's quite a diverse set (videos of incorrect posture, correct posture, different body types, different focal points of right and wrong movement, etc.). It's all well organized and labeled too.
Model: I've mainly just been testing and workshopping the idea, so I've messed with a couple. I've played around a bit with the OpenPose model, but the best base I've tried so far has been the MediaPipe Pose model, which is a type of CNN. It breaks the video up frame by frame, draws landmarks on different body focal points, then returns a video with the marked body landmarks. It's a pre-trained model and seems pretty solid.
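Once you have those per-frame landmarks, a common next step is reducing them to joint angles, which is what posture rules usually key off. A minimal sketch in plain Python, assuming you've already extracted normalized (x, y) landmark coordinates (e.g. hip, knee, ankle) from a pose model:

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (in degrees) between segments b->a and b->c.

    a, b, c are (x, y) landmark coordinates, e.g. hip, knee, ankle,
    taken from a pose model's per-frame output.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

# Straight leg: hip, knee, ankle roughly collinear -> angle near 180.
print(joint_angle((0.5, 0.2), (0.5, 0.5), (0.5, 0.8)))  # 180.0
```

Angles like this are camera-position-tolerant in a way raw coordinates aren't, which makes them decent features for correct-vs-incorrect movement classification.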
I'm pretty new to the ML world though. I work as a web dev (UIs, APIs, and stuff), so I'm by far no expert in machine learning. It's awesome though, so I've been getting into it and making random little ML tools as a learning experience.
Okay, that sounds extremely cool, not gonna lie.
Yup, I've tested MediaPipe, OpenPose, MoveNet, etc., and the first gives the best results, in real time with video input as well.
I've been doing ML for some time but I'm very new to Hugging Face. Web dev scares me a lot, so I've only messed around with basic HTML, CSS, and JS.
For now, I think I'll find a nice dataset (COCO, MPII Human Pose, or Human3.6M), label the images with my own code by locally applying MediaPipe and some math formulae, fine-tune the 'google/vit-base-patch16-224-in21k' vision transformer on those labels, and go from there. Let's see if it works; otherwise I'll pivot again.
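For the "own code + math formulae" labeling step, the simplest version is just thresholding a joint angle into class names. A toy sketch, where both the thresholds and the label names are made up for illustration (a real labeling pass would tune these against your friend's annotated videos):

```python
def label_squat_depth(knee_angle_deg):
    """Map a knee angle (degrees, from hip-knee-ankle landmarks)
    to a posture class label.

    The 120/90 degree cutoffs and label names are illustrative only,
    not taken from any dataset or coaching standard.
    """
    if knee_angle_deg > 120:
        return "shallow"
    if knee_angle_deg > 90:
        return "parallel"
    return "deep"

print(label_squat_depth(140))  # shallow
print(label_squat_depth(100))  # parallel
print(label_squat_deep := label_squat_depth(80))  # deep
```

Labels produced this way can then serve as the classification targets when fine-tuning the ViT on the rendered frames.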
Also, if you're planning to open-source your project, do keep me updated. It sounds very interesting, and a lot of ML and math is involved, with the bonus of fitness, which I'm interested in.
https://huggingface.co/ronka/postureDetection
This is my model. I can train it better with better data, but that'll take time. See if it's of any use to you.