---
language: en
tags:
- sentence-transformers
- retrieval
- contrastive-learning
- multimodal
- video
- rag
- faiss
- pytorch
license: apache-2.0
---
# ViD-GAN Encoder (VideoRAG Retrieval Model)

ViD-GAN is a custom-trained retrieval model for video-based question answering. It combines video transcript text with detected visual objects in a multimodal retrieval pipeline.
This repository contains:
- ViD-GAN Encoder (SentenceTransformer-based)
- ViD-GAN Discriminator (grounding verification module)
## What This Model Does
ViD-GAN Encoder generates embeddings for:
- user questions
- video transcript chunks
- multimodal chunks (transcript + detected visual objects)
It is trained using contrastive learning (InfoNCE) to improve retrieval quality.
## How to Use

### Load Encoder

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')
```
### Encode Text

```python
emb = model.encode('In the UK, what is totally illegal?')
print(emb.shape)
```
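A minimal end-to-end retrieval sketch follows. The multimodal chunk texts, their `transcript | objects: ...` format, and the embedding dimension (384) are illustrative assumptions; random vectors stand in for real `model.encode(...)` outputs so the example is self-contained. The brute-force dot-product search shown is equivalent to what a FAISS `IndexFlatIP` over L2-normalised vectors would compute.

```python
import numpy as np

# Hypothetical multimodal chunks: transcript text joined with YOLO labels.
chunks = [
    "The officer explains the speed limit. | objects: car, traffic light",
    "A chef seasons the dish. | objects: person, bowl, knife",
]

# Stand-in embeddings; in practice: chunk_embs = model.encode(chunks)
rng = np.random.default_rng(0)
dim = 384  # assumed; check model.get_sentence_embedding_dimension()
chunk_embs = rng.normal(size=(len(chunks), dim)).astype("float32")
query_emb = chunk_embs[0] + 0.01 * rng.normal(size=dim).astype("float32")

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity via normalised dot product, then pick the best chunk.
scores = normalize(chunk_embs) @ normalize(query_emb)
best = int(np.argmax(scores))
print(chunks[best])
```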
## Files

- `ViD-GAN-Encoder/`: SentenceTransformer encoder
- `ViD-GAN-Discriminator.pt`: grounding discriminator
## Limitations

- Trained on a small, auto-generated dataset
- Visual information comes from YOLO object labels, which may include false detections
- Intended for research and prototyping, not production use
## Author

Developed by Nandakrishnan O