Mobile-VideoGPT QVED Finetuned Model
This model is a finetuned version of Amshaker/Mobile-VideoGPT-0.5B on the QVED (Qualitative Video-based Exercise Dataset) for physiotherapy exercise assessment.
Model Description
- Base Model: Amshaker/Mobile-VideoGPT-0.5B
- Architecture: Mobile-VideoGPT with LoRA adapters
- Vision Encoder: VideoMamba + CLIP
- Task: Video-based exercise quality assessment and feedback generation
- Dataset: QVED (Physiotherapy Exercise Videos)
- Training Type: Fresh training with randomly initialized LoRA adapters
Training Details
Hyperparameters
- Epochs: 3
- Learning Rate: 0.0002
- MM Projector LR: 0.0001
- LoRA Rank: 64
- LoRA Alpha: 128
- Batch Size: 8
- Gradient Accumulation Steps: 8
- Effective Batch Size: 64
- Max Sequence Length: 2048
- Weight Decay: 0.0
- Warmup Ratio: 0.05
- LR Scheduler: Cosine
- Precision: bfloat16 + TF32
- Gradient Checkpointing: Enabled
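For reference, these settings would map onto a peft LoraConfig roughly as follows (a minimal sketch; the dropout value and target modules are assumptions, since the training script does not document them here):

from peft import LoraConfig

# Sketch of a LoRA config mirroring the hyperparameters above.
# target_modules and lora_dropout are assumptions; the actual
# training code may differ.
lora_config = LoraConfig(
    r=64,                  # LoRA rank
    lora_alpha=128,        # scaling factor
    lora_dropout=0.0,      # assumed
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)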
Training Infrastructure
- Framework: DeepSpeed with ZeRO-2
- Mixed Precision: bfloat16 + TF32
- Optimization: LoRA (Low-Rank Adaptation)
- Training Strategy: Fresh training with randomly initialized LoRA adapters
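As a rough illustration, the reported settings correspond to the following transformers TrainingArguments (a hedged sketch; the output directory and DeepSpeed config path are assumptions):

from transformers import TrainingArguments

# Sketch matching the reported hyperparameters. The separate MM
# projector learning rate (1e-4) requires custom optimizer parameter
# groups and is not expressible here.
training_args = TrainingArguments(
    output_dir="./mobile-videogpt-qved",  # assumed
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,        # 8 x 8 = effective batch size 64
    learning_rate=2e-4,
    weight_decay=0.0,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",
    bf16=True,
    tf32=True,
    gradient_checkpointing=True,
    deepspeed="configs/zero2.json",       # assumed path to the ZeRO-2 config
    report_to="wandb",
)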
Dataset Splits
- Train: 8107 samples
- Validation: 2702 samples
- Test: 2704 samples
- Total: 13513 samples
Training Configuration
- Vision Tower: OpenGVLab/VideoMamba
- Image Vision Tower: openai/clip-vit-base-patch16
- Projector Type: ETP (Efficient Token Projection)
- Frames per Video: 4 frames selected via TopK
- Image Aspect Ratio: Pad
- Group by Modality Length: Enabled
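The TopK selection above refers to Mobile-VideoGPT's learned frame scoring. The sketch below only illustrates the general pattern of scoring candidate frames and keeping the best four; the Laplacian sharpness heuristic is a stand-in for the model's actual scorer, and OpenCV is assumed for decoding:

import cv2
import numpy as np

def select_topk_frames(video_path, num_frames=4, num_candidates=16):
    # Uniformly sample candidate frames, score each one, and keep the
    # top-k. The sharpness score below is purely illustrative.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, total - 1, num_candidates, dtype=int)
    frames, scores = [], []
    for i in idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if not ok:
            continue
        frames.append(frame)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append(cv2.Laplacian(gray, cv2.CV_64F).var())
    cap.release()
    top = np.argsort(scores)[-num_frames:]
    return [frames[i] for i in sorted(top)]  # preserve temporal order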
Usage
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model (trust_remote_code may be required for the custom
# Mobile-VideoGPT architecture)
base_model = AutoModelForCausalLM.from_pretrained(
    "Amshaker/Mobile-VideoGPT-0.5B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726")
tokenizer = AutoTokenizer.from_pretrained("EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726")
Running Inference
# Prepare video input
video_path = "path/to/exercise_video.mp4"
prompt = "Analyze this physiotherapy exercise video and provide feedback."
# Generate response. Note: this mirrors the repository's high-level
# inference wrapper; the stock transformers generate() does not accept
# video_path/prompt keywords. See utils/test_inference.py below for the
# full frame-extraction and tokenization pipeline.
response = model.generate(
    video_path=video_path,
    prompt=prompt,
    max_new_tokens=512
)
print(response)
Using the Inference Script
python utils/test_inference.py \
--model_path EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726 \
--video_path sample_videos/exercise.mp4 \
--prompt "Evaluate this exercise" \
--max_new_tokens 512
Evaluation Metrics
The model is evaluated on:
- BERT Similarity: Semantic similarity between generated and ground truth descriptions
- METEOR Score: N-gram overlap metric (originally from machine translation) with stem and synonym matching
- ROUGE-L Score: Longest common subsequence based similarity
- Exercise Identification: Accuracy in identifying the correct exercise type
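A hedged sketch of how these metrics could be computed with the Hugging Face evaluate library follows; the card does not state which implementations were actually used, so these are stand-ins:

import evaluate

rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

# Toy prediction/reference pair for illustration only.
preds = ["Knee alignment drifts inward on the third repetition."]
refs = ["The knees cave inward during the third repetition."]

print(rouge.compute(predictions=preds, references=refs)["rougeL"])
print(meteor.compute(predictions=preds, references=refs)["meteor"])
print(bertscore.compute(predictions=preds, references=refs, lang="en")["f1"])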
Intended Use
This model is designed for:
- Automated physiotherapy exercise assessment
- Generating feedback on exercise form and technique
- Identifying exercise types from video
- Educational and research purposes in healthcare AI
Limitations
- Trained on a limited dataset (13513 samples)
- Performance may vary on exercises not seen during training
- Should not replace professional medical advice
- Video quality and angle significantly affect performance
Training Procedure
This model was finetuned using:
- Dataset Preparation: QVED videos with quality filtering and optional augmentation
- LoRA Finetuning: Efficient parameter-efficient finetuning of Mobile-VideoGPT-0.5B
- Validation: Continuous evaluation on validation set during training
- Metrics Tracking: WandB integration for experiment tracking
Citation
If you use this model, please cite:
@misc{mobile-videogpt-qved-finetune,
  author    = {EdgeVLM Labs},
  title     = {Mobile-VideoGPT QVED Finetuned Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726}
}
Model Card Authors
EdgeVLM Labs
Model Card Contact
For questions or feedback, please open an issue in the model repository.