---
language: en
tags:
- sentence-transformers
- retrieval
- contrastive-learning
- multimodal
- video
- rag
- faiss
- pytorch
license: apache-2.0
---
# ViD-GAN Encoder (VideoRAG Retrieval Model)
ViD-GAN is a custom-trained multimodal retrieval model for **video-based question answering**.
This repository contains:
- **ViD-GAN Encoder** (SentenceTransformer-based)
- **ViD-GAN Discriminator** (grounding verification module)
---
## 🚀 What This Model Does
ViD-GAN Encoder generates embeddings for:
- user questions
- video transcript chunks
- multimodal chunks (transcript + detected visual objects)
It is trained using **contrastive learning (InfoNCE)** to improve retrieval quality.
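
For readers unfamiliar with InfoNCE, the following is a minimal, dependency-free sketch of the loss with in-batch negatives. It is not the training code used for this model: the function name `info_nce_loss`, the temperature value, and the toy vectors are all assumptions chosen for illustration.

```python
import math

def info_nce_loss(queries, chunks, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    queries[i] and chunks[i] form a positive pair; every other chunk in the
    batch is treated as a negative for query i. The temperature of 0.05 is
    an assumed value, not one taken from this model's training setup.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = [normalize(v) for v in queries]
    c = [normalize(v) for v in chunks]

    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i with every chunk, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, cj)) / temperature for cj in c]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching chunk as the target class.
        total += log_denominator - logits[i]
    return total / len(q)

# Toy 2-dimensional "embeddings": aligned pairs vs. deliberately swapped pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned < shuffled)
```

The loss is low when each question is closest to its own chunk and high when the pairs are mismatched, which is the signal that pushes matching question and chunk embeddings together during training.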
---
## 🛠️ How to Use
### Load Encoder
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')
```
### Encode Text
```python
# Encoding a single string returns a 1-D NumPy array
emb = model.encode('In the UK, what is totally illegal?')
# The shape is (embedding_dimension,)
print(emb.shape)
```
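
Once questions and chunks are embedded, retrieval comes down to ranking chunks by similarity to the question. The sketch below shows that step with plain Python and toy 3-dimensional vectors standing in for `model.encode(...)` outputs. The helper names `cosine_similarity` and `rank_chunks` are hypothetical and not part of this repository; a real pipeline would likely use a vector index such as FAISS instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_chunks(question_emb, chunk_embs, top_k=2):
    """Return (chunk_index, score) pairs sorted by descending similarity."""
    scored = [
        (i, cosine_similarity(question_emb, emb))
        for i, emb in enumerate(chunk_embs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors (assumed for illustration); real embeddings come from the encoder.
question_emb = [0.9, 0.1, 0.0]
chunk_embs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

top = rank_chunks(question_emb, chunk_embs)
print([i for i, _ in top])
```

With real data, `question_emb` would be the encoded question and `chunk_embs` the encoded transcript or multimodal chunks.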
---
## 📦 Files
- `ViD-GAN-Encoder/` : SentenceTransformer encoder
- `ViD-GAN-Discriminator.pt` : grounding discriminator
---
## ⚠️ Limitations
- Trained on a small auto-generated dataset
- Visual information comes from YOLO object labels, which may include false detections
- Intended for research and prototype use
---
## 👤 Author
Developed by **Nandakrishnan O** 🇮🇳