---
license: apache-2.0
tags:
- vision-language
- video
- internvl
- homework
---

# InterVL-HW1

Trained and exported on 2025-10-13 at 11:29:14.

- Backbone: InternVLChatModel
- AMP dtype: bfloat16
- Input: video `pixel_values`; the vision encoder mean-pools features over the temporal (frame) axis.
- Includes a training checkpoint in `checkpoints/`.
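
To illustrate the temporal mean-pooling mentioned above, here is a minimal sketch in NumPy. It is not the training code: the `(T, N, D)` layout (frames, tokens per frame, channels), the function name `temporal_mean_pool`, and the choice to pool per-frame token features are all assumptions made for illustration. The actual model presumably operates on PyTorch tensors inside the vision encoder.

```python
import numpy as np


def temporal_mean_pool(frame_features: np.ndarray) -> np.ndarray:
    """Average per-frame token features over the time axis.

    frame_features: array of shape (T, N, D), an assumed layout with
        T frames, N tokens per frame, and D channels.
    Returns an array of shape (N, D).
    """
    if frame_features.ndim != 3:
        raise ValueError("expected an array of shape (T, N, D)")
    # Collapse the frame axis so a clip is represented like a single image.
    return frame_features.mean(axis=0)


# Toy input: 4 frames, 2 tokens per frame, 3 channels.
features = np.arange(24, dtype=np.float32).reshape(4, 2, 3)
pooled = temporal_mean_pool(features)
print(pooled.shape)
```

Pooling over frames keeps the number of visual tokens independent of clip length, at the cost of discarding frame order.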

> Even if training used a monkey-patched `forward`, the saved weights keep the standard layout, so you can load them with the original InternVLChatModel codebase.
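
As a rough sketch of reusing the weights, the helper below loads them through the `transformers` `AutoModel` API in bfloat16. The helper name `load_model` and the path in the usage note are hypothetical, and the sketch assumes the exported directory contains the InternVL modeling code (or points to it) so that `trust_remote_code=True` can resolve `InternVLChatModel`.

```python
def load_model(model_path: str):
    """Load the exported weights and tokenizer from a local directory or repo id."""
    # Imported lazily so that defining this helper does not pull in heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,  # matches the AMP dtype listed above
        trust_remote_code=True,      # assumption: InternVL code ships with the export
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    return model, tokenizer
```

A call would look like `model, tokenizer = load_model("path/to/export")`, where the path is a placeholder for wherever the exported files live.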