Mobile-VideoGPT QVED Finetuned Model

This model is a finetuned version of Amshaker/Mobile-VideoGPT-0.5B on the QVED (Qualitative Video-based Exercise Dataset) for physiotherapy exercise assessment.

Model Description

  • Base Model: Amshaker/Mobile-VideoGPT-0.5B
  • Architecture: Mobile-VideoGPT with LoRA adapters
  • Vision Encoder: VideoMamba + CLIP
  • Task: Video-based exercise quality assessment and feedback generation
  • Dataset: QVED (Physiotherapy Exercise Videos)
  • Training Type: Fresh training with randomly initialized LoRA adapters

Training Details

Hyperparameters

  • Epochs: 3
  • Learning Rate: 0.0002
  • MM Projector LR: 0.0001
  • LoRA Rank: 64
  • LoRA Alpha: 128
  • Batch Size: 8
  • Gradient Accumulation Steps: 8
  • Effective Batch Size: 64
  • Max Sequence Length: 2048
  • Weight Decay: 0.0
  • Warmup Ratio: 0.05
  • LR Scheduler: Cosine
  • Precision: bfloat16 + TF32
  • Gradient Checkpointing: Enabled
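
For reference, a minimal sketch of how these hyperparameters map onto Hugging Face peft and transformers configuration objects. The target_modules list and output directory are assumptions; the exact training script is not published with this card.

from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration mirroring the values above
lora_config = LoraConfig(
    r=64,                       # LoRA rank
    lora_alpha=128,             # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Trainer configuration mirroring the values above. The separate MM projector
# learning rate (1e-4) has no direct field here and is assumed to be handled
# via optimizer parameter groups in the training script.
training_args = TrainingArguments(
    output_dir="./checkpoints",          # assumed
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,       # 8 x 8 = effective batch size 64
    weight_decay=0.0,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",
    bf16=True,
    tf32=True,
    gradient_checkpointing=True,
)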

Training Infrastructure

  • Framework: DeepSpeed with ZeRO-2
  • Mixed Precision: bfloat16 + TF32
  • Optimization: LoRA (Low-Rank Adaptation)
  • Training Strategy: Fresh training with randomly initialized LoRA adapters
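
The ZeRO-2 setup can be expressed as a DeepSpeed config; the dict below is a minimal assumed shape consistent with the settings above, not the actual config file used for training.

# Minimal DeepSpeed ZeRO-2 config (assumed; the real config file is not
# published with this card). Values mirror the hyperparameters above.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 8,
}

# Pass to the HF Trainer via TrainingArguments(deepspeed=ds_config),
# or save as JSON and reference it from the deepspeed launcher.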

Dataset Splits

  • Train: 8107 samples
  • Validation: 2702 samples
  • Test: 2704 samples
  • Total: 13513 samples

Training Configuration

  • Vision Tower: OpenGVLab/VideoMamba
  • Image Vision Tower: openai/clip-vit-base-patch16
  • Projector Type: ETP (Efficient Token Projection)
  • Frames per Video: 4 frames selected via TopK
  • Image Aspect Ratio: Pad
  • Group by Modality Length: Enabled
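
To illustrate the TopK frame selection step, here is a schematic sketch. The per-frame scores are an assumed input (in Mobile-VideoGPT they come from the model's frame-scoring stage); this is not the repository's implementation.

import torch

def select_topk_frames(frames: torch.Tensor, scores: torch.Tensor, k: int = 4) -> torch.Tensor:
    # frames: (T, C, H, W) decoded video frames
    # scores: (T,) per-frame relevance scores (hypothetical input for this
    # sketch; assumed to be produced by the model's frame-scoring stage)
    topk = torch.topk(scores, k=min(k, scores.numel())).indices
    topk, _ = torch.sort(topk)  # keep the selected frames in temporal order
    return frames[topk]

frames = torch.randn(16, 3, 224, 224)            # 16 dummy frames
scores = torch.rand(16)
clip = select_topk_frames(frames, scores, k=4)   # -> (4, 3, 224, 224)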

Usage

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model (Mobile-VideoGPT is a custom architecture, so
# trust_remote_code=True is likely needed when loading via Auto classes)
base_model = AutoModelForCausalLM.from_pretrained(
    "Amshaker/Mobile-VideoGPT-0.5B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load LoRA adapter and matching tokenizer
model = PeftModel.from_pretrained(base_model, "EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726")
tokenizer = AutoTokenizer.from_pretrained("EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726")
model.eval()  # switch to inference mode

Running Inference

# Prepare video input
video_path = "path/to/exercise_video.mp4"
prompt = "Analyze this physiotherapy exercise video and provide feedback."

# Generate a response. NOTE: this call is schematic; a PEFT-wrapped causal LM
# does not accept video_path or prompt keywords directly. Frame decoding,
# TopK frame selection, and prompt templating are handled by the repository's
# inference utilities (see the script below), which wrap model.generate().
response = model.generate(
    video_path=video_path,
    prompt=prompt,
    max_new_tokens=512
)

print(response)

Using the Inference Script

python utils/test_inference.py \
    --model_path EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726 \
    --video_path sample_videos/exercise.mp4 \
    --prompt "Evaluate this exercise" \
    --max_new_tokens 512

Evaluation Metrics

The model is evaluated on the following metrics (a computation sketch appears after the list):

  • BERT Similarity: Embedding-based semantic similarity between generated and ground-truth descriptions
  • METEOR Score: Unigram-matching metric from machine translation, with stemming and synonym matching
  • ROUGE-L Score: Similarity based on the longest common subsequence
  • Exercise Identification: Accuracy in identifying the correct exercise type
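
A minimal sketch of computing such text metrics with the Hugging Face evaluate library. The card does not publish the actual evaluation code, and "BERT Similarity" is assumed here to be a BERTScore-style measure; the prediction/reference pair is illustrative.

import evaluate

# Toy prediction/reference pair (illustrative only)
predictions = ["Knees cave inward during the squat; widen the stance."]
references = ["The squat shows knee valgus; the stance should be wider."]

rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(meteor.compute(predictions=predictions, references=references)["meteor"])
print(bertscore.compute(predictions=predictions, references=references, lang="en")["f1"])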

Intended Use

This model is designed for:

  • Automated physiotherapy exercise assessment
  • Generating feedback on exercise form and technique
  • Identifying exercise types from video
  • Educational and research purposes in healthcare AI

Limitations

  • Trained on a limited dataset (13513 samples)
  • Performance may vary on exercises not seen during training
  • Should not replace professional medical advice
  • Video quality and angle significantly affect performance

Training Procedure

This model was finetuned using:

  1. Dataset Preparation: QVED videos with quality filtering and optional augmentation
  2. LoRA Finetuning: Efficient parameter-efficient finetuning of Mobile-VideoGPT-0.5B
  3. Validation: Continuous evaluation on validation set during training
  4. Metrics Tracking: WandB integration for experiment tracking (a configuration sketch follows this list)
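
Steps 3 and 4 correspond to standard HF Trainer options. A hedged sketch of the assumed setup; the evaluation cadence and run name are illustrative, not taken from the actual training script.

from transformers import TrainingArguments

# Assumed Trainer settings for continuous validation and WandB tracking
training_args = TrainingArguments(
    output_dir="./checkpoints",
    eval_strategy="steps",               # periodic validation during training
    eval_steps=500,                      # hypothetical interval
    logging_steps=50,
    report_to="wandb",                   # stream metrics to Weights & Biases
    run_name="mobile-videogpt-qved-finetune",  # hypothetical run name
)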

Citation

If you use this model, please cite:

@misc{mobile-videogpt-qved-finetune,
  author = {EdgeVLM Labs},
  title = {Mobile-VideoGPT QVED Finetuned Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/EdgeVLM-Labs/mobile-videogpt-qved-finetune-20260317_124726}
}

Model Card Authors

EdgeVLM Labs

Model Card Contact

For questions or feedback, please open an issue in the model repository.
