---
language: en
tags:
- sentence-transformers
- retrieval
- contrastive-learning
- multimodal
- video
- rag
- faiss
- pytorch
license: apache-2.0
---

# ViD-GAN Encoder (VideoRAG Retrieval Model)

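ViD-GAN is a custom-trained retrieval model for **video-based question answering**. Given a user question, it retrieves the most relevant chunks of a video, where chunks are built from the transcript and from objects detected in the video.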

This repository contains:
- **ViD-GAN Encoder** (SentenceTransformer-based)
- **ViD-GAN Discriminator** (grounding verification module)

---

## 🚀 What This Model Does

ViD-GAN Encoder generates embeddings for:
- user questions
- video transcript chunks
- multimodal chunks (transcript + detected visual objects)

The encoder is trained with **contrastive learning (InfoNCE)** so that a question embeds close to its matching chunk and far from non-matching ones, which improves retrieval quality.
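
For intuition, here is a minimal, dependency-free sketch of an InfoNCE loss with in-batch negatives. It is not the training code for this model: the cosine similarity, the `temperature` value, and the function names `cosine` and `info_nce` are assumptions made for illustration, since this card does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(queries, chunks, temperature=0.05):
    """Mean InfoNCE loss where queries[i] is paired with chunks[i].

    Every other chunk in the batch acts as a negative for query i.
    The temperature is an assumed value, not one taken from this model.
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [cosine(query, chunk) / temperature for chunk in chunks]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of the matching chunk.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each query points the same way as its own chunk.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce(queries, aligned)
swapped_loss = info_nce(queries, swapped)
print(aligned_loss < swapped_loss)
```

The loss is near zero when each question is most similar to its own chunk and grows when a non-matching chunk scores higher, which is the pressure that pulls matching pairs together during training.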

---

## 🛠️ How to Use

### Load Encoder

```python
from sentence_transformers import SentenceTransformer

# Downloads the encoder from the Hugging Face Hub on first use
model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')
```

### Encode Text

```python
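# Reuses `model` from the Load Encoder step; a single string yields a 1-D NumPy array
emb = model.encode('In the UK, what is totally illegal?')
print(emb.shape)  # (embedding_dim,)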
```
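
### Rank Chunks (Sketch)

Once questions and chunks are embedded, retrieval reduces to ranking chunks by similarity to the question. The sketch below assumes cosine similarity and uses hand-made toy vectors so it runs without downloading the model; `cosine_similarity` and `rank_chunks` are hypothetical helper names, not part of this repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_chunks(question_emb, chunk_embs, top_k=3):
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    scored = [
        (index, cosine_similarity(question_emb, emb))
        for index, emb in enumerate(chunk_embs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoder output.
# With the real encoder, one would instead use something like:
#   question_emb = model.encode(question)
#   chunk_embs = model.encode(list_of_chunk_texts)
question_emb = [1.0, 0.0]
chunk_embs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

top = rank_chunks(question_emb, chunk_embs, top_k=2)
print([index for index, _ in top])
```

For larger collections, the same ranking is usually delegated to a vector index such as FAISS instead of a Python loop.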

---

## 📦 Files

- `ViD-GAN-Encoder/` : SentenceTransformer encoder
- `ViD-GAN-Discriminator.pt` : grounding discriminator

---

## ⚠️ Limitations

- Trained on a small auto-generated dataset
- Visual info is based on YOLO object labels (may include false detections)
- Intended for research and prototype use

---

## 👤 Author

Developed by **Nandakrishnan O** 🇮🇳