---
language: en
tags:
- sentence-transformers
- retrieval
- contrastive-learning
- multimodal
- video
- rag
- faiss
- pytorch
license: apache-2.0
---
# ViD-GAN Encoder (VideoRAG Retrieval Model)
ViD-GAN is a custom-trained multimodal retrieval model for **video-based question answering**.
This repository contains:
- **ViD-GAN Encoder** (SentenceTransformer-based)
- **ViD-GAN Discriminator** (grounding verification module)
---
## 🚀 What This Model Does
ViD-GAN Encoder generates embeddings for:
- user questions
- video transcript chunks
- multimodal chunks (transcript + detected visual objects)
It is trained using **contrastive learning (InfoNCE)** to improve retrieval quality.
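
For readers unfamiliar with InfoNCE, the snippet below is a minimal, dependency-free sketch of the loss with in-batch negatives. It is not the training code used for this model: the function name `info_nce_loss`, the temperature of `0.05`, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def info_nce_loss(query_embs, chunk_embs, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    query_embs[i] and chunk_embs[i] form a positive pair; every other
    chunk in the batch acts as a negative for query i.
    The temperature value is an assumption, not the model's setting.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    queries = [normalize(q) for q in query_embs]
    chunks = [normalize(c) for c in chunk_embs]

    total = 0.0
    for i, query in enumerate(queries):
        # Cosine similarity to every chunk, scaled by the temperature.
        logits = [
            sum(a * b for a, b in zip(query, chunk)) / temperature
            for chunk in chunks
        ]
        # Numerically stable log-sum-exp over the row.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching chunk (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: correctly paired embeddings vs. deliberately swapped ones.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is low when each question is closest to its own chunk and high when it is closer to another chunk in the batch, which is the property that pushes matching pairs together during training.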
---
## 🛠️ How to Use
### Load Encoder
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')
```
### Encode Text
```python
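# Reuses `model` from the "Load Encoder" step above.
emb = model.encode('In the UK, what is totally illegal?')
print(emb.shape)  # 1-D array whose length is the embedding dimension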
```
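
Once questions and chunks are embedded, retrieval comes down to ranking chunks by similarity to the question. The following is a minimal sketch of that step, assuming cosine similarity as the scoring function. The helper names `cosine_similarity` and `rank_chunks` are hypothetical, and the toy 3-d vectors stand in for real `model.encode(...)` outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(question_emb, chunk_embs, top_k=3):
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    scored = [
        (cosine_similarity(question_emb, emb), idx)
        for idx, emb in enumerate(chunk_embs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy vectors standing in for encoder outputs (hypothetical values).
question_emb = [0.9, 0.1, 0.0]
chunk_embs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
]
ranked = rank_chunks(question_emb, chunk_embs, top_k=2)
print([idx for idx, _ in ranked])
```

For larger collections, an approximate nearest-neighbour index such as FAISS would typically replace this linear scan.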
---
## 📦 Files
- `ViD-GAN-Encoder/` : SentenceTransformer encoder
- `ViD-GAN-Discriminator.pt` : grounding discriminator
---
## ⚠️ Limitations
- Trained on a small auto-generated dataset
- Visual information comes from YOLO object labels, which may include false detections
- Intended for research and prototype use
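
One possible way to limit the effect of false detections is to drop low-confidence labels before building the text of a multimodal chunk. The sketch below illustrates the idea only: the function `build_multimodal_chunk`, the `[objects: ...]` text format, and the `0.5` threshold are assumptions and do not describe how the training data for this model was built.

```python
def build_multimodal_chunk(transcript, detections, min_confidence=0.5):
    """Append confidently detected object labels to a transcript chunk.

    detections: list of (label, confidence) pairs, e.g. from an object
    detector. The threshold and output format are hypothetical.
    """
    # Keep each label once, and only if some detection clears the threshold.
    kept = sorted({label for label, conf in detections if conf >= min_confidence})
    if not kept:
        return transcript
    return f"{transcript} [objects: {', '.join(kept)}]"


chunk_text = build_multimodal_chunk(
    "The presenter points at the whiteboard.",
    [("person", 0.93), ("whiteboard", 0.71), ("giraffe", 0.12), ("person", 0.88)],
)
print(chunk_text)
```

Raising `min_confidence` trades recall of visual cues for fewer spurious labels in the encoded text.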
---
## 👤 Author
Developed by **Nandakrishnan O** 🇮🇳