ViD-GAN Encoder (VideoRAG Retrieval Model)

ViD-GAN is a custom-trained retrieval model for video-based question answering. It retrieves over multimodal video content: transcript text combined with detected visual objects.

This repository contains:

  • ViD-GAN Encoder (SentenceTransformer-based)
  • ViD-GAN Discriminator (grounding verification module)

🚀 What This Model Does

ViD-GAN Encoder generates embeddings for:

  • user questions
  • video transcript chunks
  • multimodal chunks (transcript + detected visual objects)
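The card does not say how transcript text and detected objects are merged into a single multimodal chunk. The sketch below is one plausible way to do it, assuming detections arrive as hypothetical (label, confidence) pairs from an object detector such as YOLO. The "Objects:" layout, the function name build_multimodal_chunk, and the 0.5 confidence threshold are illustrative assumptions, not the format ViD-GAN was trained on.

```python
from typing import Iterable, Tuple

# Hypothetical detection format: (label, confidence) pairs from an object detector.
Detection = Tuple[str, float]


def build_multimodal_chunk(transcript: str, detections: Iterable[Detection],
                           min_confidence: float = 0.5) -> str:
    """Combine a transcript chunk with detected object labels into one string.

    ASSUMPTION: the "Objects:" layout and the 0.5 threshold are illustrative only;
    they are not taken from the ViD-GAN training pipeline.
    """
    labels = []
    for label, confidence in detections:
        # Skip low-confidence detections and keep each label once, in first-seen order.
        if confidence >= min_confidence and label not in labels:
            labels.append(label)
    if not labels:
        return transcript.strip()
    return f"{transcript.strip()} Objects: {', '.join(labels)}"
```

The resulting string is ordinary text, so it can be passed to model.encode in the same way as a question or a plain transcript chunk.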

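It is trained with a contrastive objective (InfoNCE) that pulls each question's embedding toward its matching chunk and away from non-matching chunks, which improves retrieval quality.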
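As a rough illustration of the objective, the sketch below computes InfoNCE with in-batch negatives: query i is paired with positive i, and every other positive in the batch acts as a negative. The 0.05 temperature is an assumed value; the card does not state the hyperparameters or which training code was used. For reference, sentence-transformers' MultipleNegativesRankingLoss follows this same in-batch pattern with cosine similarity and a default scale of 20 (equivalent to a temperature of 0.05), but whether ViD-GAN used that class is not confirmed here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(queries: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives (ASSUMED temperature of 0.05).

    queries[i] is paired with positives[i]; every positives[j] with j != i
    serves as a negative for queries[i].
    """
    assert queries and len(queries) == len(positives), "need matching, non-empty batches"
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax probability of the matching positive.
        total += log_denominator - logits[i]
    return total / len(queries)
```

When every question is most similar to its own chunk the loss approaches zero; when questions are closer to the wrong chunks it grows, which is the signal that reshapes the embedding space during training.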


๐Ÿ› ๏ธ How to Use

Load Encoder

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')

Encode Text

emb = model.encode('In the UK, what is totally illegal?')
print(emb.shape)
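To use the encoder for retrieval, encode the question and the candidate chunks, then rank chunks by cosine similarity. The sketch below takes any encode callable that accepts a string or a list of strings, so model.encode from SentenceTransformer can be passed in directly. The function name retrieve and the top_k default of 3 are hypothetical choices for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return float(dot / (norm_a * norm_b))


def retrieve(encode: Callable, question: str, chunks: List[str],
             top_k: int = 3) -> List[Tuple[float, str]]:
    """Return the top_k (score, chunk) pairs, highest cosine similarity first.

    `encode` is assumed to behave like SentenceTransformer.encode: one vector
    for a string, one vector per item for a list of strings.
    """
    question_emb = encode(question)
    chunk_embs = encode(chunks)
    scored = [(cosine(question_emb, emb), text) for emb, text in zip(chunk_embs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With the model loaded as above, a call would look like retrieve(model.encode, 'In the UK, what is totally illegal?', chunks), where chunks is your own list of transcript or multimodal chunk strings. For large collections, sentence_transformers.util.cos_sim computes the same similarities in one batched tensor operation.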

📦 Files

  • ViD-GAN-Encoder/ : SentenceTransformer encoder
  • ViD-GAN-Discriminator.pt : grounding discriminator
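The card does not describe the discriminator's architecture or say whether ViD-GAN-Discriminator.pt stores a state_dict or a whole pickled module, so a safe first step is to load it and inspect its contents. The sketch below assumes the file sits at the root of the nandakrishnan1311/ViD-GAN-Encoder repository (not confirmed by the card) and uses huggingface_hub.hf_hub_download and torch.load. The helper names describe_checkpoint and load_discriminator_checkpoint are hypothetical.

```python
from collections.abc import Mapping


def describe_checkpoint(obj) -> list:
    """Summarize a loaded .pt object without assuming its architecture."""
    if isinstance(obj, Mapping):
        lines = []
        for key, value in obj.items():
            # Tensors expose .shape; other entries (e.g. epoch counters) do not.
            shape = getattr(value, "shape", None)
            detail = tuple(shape) if shape is not None else type(value).__name__
            lines.append(f"{key}: {detail}")
        return lines
    return [f"pickled object of type {type(obj).__name__}"]


def load_discriminator_checkpoint(repo_id: str = "nandakrishnan1311/ViD-GAN-Encoder",
                                  filename: str = "ViD-GAN-Discriminator.pt"):
    """Download and load the checkpoint on CPU.

    ASSUMPTION: the file lives at the repo root. weights_only=True loads plain
    state_dicts; if it fails, the file is likely a pickled module, which should
    only be loaded with weights_only=False from a source you trust.
    """
    # Imported lazily so describe_checkpoint works without torch installed.
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return torch.load(path, map_location="cpu", weights_only=True)
```

Printing describe_checkpoint(load_discriminator_checkpoint()) lists parameter names and shapes, which is usually enough to reconstruct a matching nn.Module before calling load_state_dict.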

โš ๏ธ Limitations

  • Trained on a small auto-generated dataset
  • Visual information is derived from YOLO object labels, which may include false detections
  • Intended for research and prototype use

👤 Author

Developed by Nandakrishnan O 🇮🇳
