---
license: apache-2.0
tags:
- vision-language
- video
- internvl
- homework
---

# InterVL-HW1

Trained and exported on 2025-10-13_11-29-14.

- Backbone: InternVLChatModel
- AMP dtype: bfloat16
- Uses video pixel_values with temporal mean-pooling in vision encoder.
- Includes training checkpoint in `checkpoints/`.

> If you trained with a monkey-patched forward, runtime weights are still standard. You can reuse them with the original InternVLChatModel codebase.
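
For readers unfamiliar with temporal mean-pooling, the idea can be sketched as follows. This is a minimal illustration in plain Python, not the code used to train this model: the card does not state where in the vision encoder the pooling happens or which tensor axis holds the frames, so the `[T][N][D]` layout (frames, tokens per frame, feature dimension) and the function name `temporal_mean_pool` are assumptions made for the example. In a tensor library the same reduction would normally be a single mean over the frame axis.

```python
from statistics import fmean


def temporal_mean_pool(frame_features):
    """Average per-frame token features over the time axis.

    frame_features: nested list shaped [T][N][D] (assumed layout)
      T = number of frames, N = tokens per frame, D = feature dimension.
    Returns a nested list shaped [N][D].
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    num_tokens = len(frame_features[0])
    dim = len(frame_features[0][0])
    # For every token position and feature index, average across all frames.
    return [
        [fmean(frame[n][d] for frame in frame_features) for d in range(dim)]
        for n in range(num_tokens)
    ]


# Toy input: 2 frames, 2 tokens per frame, 3 features per token.
video = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
]
pooled = temporal_mean_pool(video)
```

The pooled output has one feature vector per token position regardless of how many frames went in, which is why a video input can be passed on in the same shape as a single image.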