This model was trained on the MSR-VTT dataset using a custom CLIP-based architecture. The current version is trained with an N-pairs margin loss.
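The exact loss formulation is not given here, so the following is only a minimal sketch of one common reading of an N-pairs margin loss: a bidirectional max-margin ranking loss over a batch of N matched video-text pairs. In it, every non-matching item in the batch acts as a negative. The function names, the use of cosine similarity, and the `margin=0.2` default are assumptions made for illustration, not details confirmed by this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def n_pairs_margin_loss(video_embs, text_embs, margin=0.2):
    """Hypothetical N-pairs margin loss over a batch of matched pairs.

    video_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other item in the batch is treated as a negative.
    The margin value is an illustrative assumption.
    """
    n = len(video_embs)
    if n == 0 or n != len(text_embs):
        raise ValueError("expected two non-empty batches of equal size")

    # sim[i][j] = similarity between video i and text j
    sim = [[cosine_similarity(v, t) for t in text_embs] for v in video_embs]

    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # video i should score higher with its own text than with text j
            total += max(0.0, margin + sim[i][j] - positive)
            # text i should score higher with its own video than with video j
            total += max(0.0, margin + sim[j][i] - positive)
    return total / n
```

With this formulation the loss is zero once every matched pair outscores all mismatched pairs in the batch by at least the margin, and it grows linearly with each violation. A real training setup would compute the same quantity on batched tensors in the training framework rather than on Python lists.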