metadata
language: en
tags:
  - sentence-transformers
  - retrieval
  - contrastive-learning
  - multimodal
  - video
  - rag
  - faiss
  - pytorch
license: apache-2.0

ViD-GAN Encoder (VideoRAG Retrieval Model)

ViD-GAN is a custom-trained retrieval model for video question answering. Given a question, it retrieves the most relevant video chunks, using both transcript text and detected visual objects.

This repository contains:

  • ViD-GAN Encoder (SentenceTransformer-based)
  • ViD-GAN Discriminator (grounding verification module)

🚀 What This Model Does

ViD-GAN Encoder generates embeddings for:

  • user questions
  • video transcript chunks
  • multimodal chunks (transcript + detected visual objects)
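
The exact text format used for multimodal chunks is not documented here. As a rough illustration only, the sketch below shows one way a transcript chunk and YOLO object labels could be merged into a single string before encoding. The function name `build_multimodal_chunk` and the `[objects: ...]` suffix are hypothetical and may differ from what the model was trained on.

```python
def build_multimodal_chunk(transcript, object_labels):
    """Join a transcript chunk with de-duplicated object labels (hypothetical format)."""
    seen = []
    for label in object_labels:
        cleaned = label.strip().lower()
        # Keep the first occurrence of each label, preserving detection order
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    if not seen:
        # No detections: fall back to the transcript alone
        return transcript.strip()
    return f"{transcript.strip()} [objects: {', '.join(seen)}]"

chunk = build_multimodal_chunk(
    "The presenter opens the laptop.", ["person", "laptop", "person"]
)
print(chunk)
```

The resulting string can be passed to `model.encode` like any other text.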

It is trained using contrastive learning (InfoNCE) to improve retrieval quality.
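
For readers unfamiliar with InfoNCE, here is a minimal dependency-free sketch of the loss with in-batch negatives: each question's matching chunk is the positive, and every other chunk in the batch is a negative. This is not the training code for this model. The temperature of 0.05 and the use of in-batch negatives are assumptions, and a real training loop would use PyTorch tensors rather than Python lists.

```python
import math

def info_nce_loss(query_vecs, chunk_vecs, temperature=0.05):
    """Mean InfoNCE loss; query i's positive is chunk i, other chunks are negatives."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    queries = [normalize(q) for q in query_vecs]
    chunks = [normalize(c) for c in chunk_vecs]

    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity of this query to every chunk, scaled by temperature
        logits = [sum(a * b for a, b in zip(q, c)) / temperature for c in chunks]
        # Log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching chunk as the target class
        total += log_denominator - logits[i]
    return total / len(queries)

# Toy check: correctly paired vectors should give a lower loss than swapped pairs
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing this loss pulls each question embedding toward its own chunk and pushes it away from the other chunks in the batch.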


🛠️ How to Use

Load Encoder

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')

Encode Text

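# Reuses `model` from the Load Encoder step; encode() returns a NumPy array by default
emb = model.encode('In the UK, what is totally illegal?')
print(emb.shape)  # (embedding_dim,) for a single input string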
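
Rank Chunks (illustrative)

Once questions and chunks are embedded, retrieval comes down to ranking chunks by similarity to the question. The sketch below uses cosine similarity on toy vectors so it runs without downloading the model. The helper names `cosine_similarity` and `rank_chunks` are hypothetical, and the choice of cosine similarity is an assumption rather than a documented setting of this model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_chunks(question_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs, highest cosine similarity first."""
    scored = [(i, cosine_similarity(question_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional vectors stand in for real embeddings here.
# With the encoder loaded, the inputs would come from
# model.encode(question) and model.encode(list_of_chunk_texts).
question_vec = [0.9, 0.1, 0.0]
chunk_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
top = rank_chunks(question_vec, chunk_vecs, top_k=2)
print([i for i, _ in top])
```

For larger collections, a vector index such as FAISS (listed in the tags) would typically replace this linear scan.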

📦 Files

  • ViD-GAN-Encoder/ : SentenceTransformer encoder
  • ViD-GAN-Discriminator.pt : grounding discriminator

⚠️ Limitations

  • Trained on a small auto-generated dataset
  • Visual info is based on YOLO object labels (may include false detections)
  • Intended for research and prototype use

👤 Author

Developed by Nandakrishnan O 🇮🇳