---
license: apache-2.0
tags:
- vision-language
- video
- internvl
- homework
---
InterVL-HW1
Trained and exported on 2025-10-13_11-29-14.
- Backbone: InternVLChatModel
- AMP dtype: bfloat16
- Uses video pixel_values with temporal mean-pooling in the vision encoder.
- Includes training checkpoint in checkpoints/.
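For readers unfamiliar with temporal mean-pooling, the idea can be sketched as below. This is a minimal illustration, not the code used for this model: the function name `temporal_mean_pool`, the `(batch, frames, tokens, dim)` layout, and the use of NumPy instead of the training framework are all assumptions made for the example.

```python
import numpy as np


def temporal_mean_pool(frame_features: np.ndarray) -> np.ndarray:
    """Average per-frame features over the time axis.

    frame_features: array of shape (batch, frames, tokens, dim) (assumed layout).
    Returns an array of shape (batch, tokens, dim).
    """
    if frame_features.ndim != 4:
        raise ValueError("expected shape (batch, frames, tokens, dim)")
    # Collapse the frame axis so each clip is represented by one token grid.
    return frame_features.mean(axis=1)


# Toy input: 2 clips, 8 frames each, 4 tokens per frame, 16-dim features.
features = np.random.default_rng(0).normal(size=(2, 8, 4, 16))
pooled = temporal_mean_pool(features)
print(pooled.shape)  # (2, 4, 16)
```

Averaging over frames keeps the number of visual tokens independent of clip length, at the cost of discarding frame order.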
If you trained with a monkey-patched forward, the saved weights still use the standard parameter layout, so you can load them with the original InternVLChatModel codebase.
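A loading sketch, assuming the export follows the usual Hugging Face layout for InternVL (a config that maps to the remote InternVLChatModel code, plus tokenizer files). The helper name `load_exported_model` and the path in the usage line are hypothetical; `AutoModel.from_pretrained` and `AutoTokenizer.from_pretrained` are the standard `transformers` entry points.

```python
def load_exported_model(model_dir: str):
    """Load the exported weights with the stock InternVL remote code.

    model_dir is a hypothetical local path or hub id for this export.
    """
    # Imported lazily so that defining this helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        model_dir,
        torch_dtype=torch.bfloat16,  # matches the AMP dtype listed above
        trust_remote_code=True,      # InternVL ships its model class as remote code
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    return model, tokenizer
```

Usage would look like `model, tokenizer = load_exported_model("path/to/export")`. Loading this way runs the stock forward, so any behavior that lived only in the monkey patch has to be reapplied separately.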