---
language: en
tags:
- sentence-transformers
- retrieval
- contrastive-learning
- multimodal
- video
- rag
- faiss
- pytorch
license: apache-2.0
---

# ViD-GAN Encoder (VideoRAG Retrieval Model)

ViD-GAN is a custom-trained retrieval model designed for **video-based question answering** using a multimodal retrieval approach.

This repository contains:

- **ViD-GAN Encoder** (SentenceTransformer-based)
- **ViD-GAN Discriminator** (grounding verification module)

---

## 🚀 What This Model Does

ViD-GAN Encoder generates embeddings for:

- user questions
- video transcript chunks
- multimodal chunks (transcript + detected visual objects)

It is trained using **contrastive learning (InfoNCE)** to improve retrieval quality.

---

## 🛠️ How to Use

### Load Encoder

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('nandakrishnan1311/ViD-GAN-Encoder')
```

### Encode Text

```python
emb = model.encode('In the UK, what is totally illegal?')
print(emb.shape)
```

---

## 📦 Files

- `ViD-GAN-Encoder/` : SentenceTransformer encoder
- `ViD-GAN-Discriminator.pt` : grounding discriminator

---

## ⚠️ Limitations

- Trained on a small auto-generated dataset
- Visual info is based on YOLO object labels (may include false detections)
- Intended for research and prototype use

---

## 👤 Author

Developed by **Nandakrishnan O** 🇮🇳
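
---

## 🔎 Example: Ranking Chunks and the InfoNCE Objective

The encoder's embeddings are meant to be compared so that a question lands close to the chunk that answers it. The following is a minimal sketch of that idea, not the repository's actual training or retrieval code. It uses only the Python standard library and toy 3-dimensional vectors in place of real `model.encode(...)` outputs. The function names (`cosine_similarity`, `rank_chunks`, `info_nce_loss`), the use of cosine similarity, and the temperature value `0.05` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_chunks(query_emb, chunk_embs, top_k=3):
    """Return (chunk_index, score) pairs, highest similarity first."""
    scores = [(i, cosine_similarity(query_emb, c)) for i, c in enumerate(chunk_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def info_nce_loss(query_emb, chunk_embs, positive_index, temperature=0.05):
    """InfoNCE loss for one query: cross-entropy of a softmax over similarities.

    The chunk at positive_index is the matching chunk; all others act as negatives.
    The temperature default is an illustrative assumption.
    """
    logits = [cosine_similarity(query_emb, c) / temperature for c in chunk_embs]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - logits[positive_index]


# Toy vectors standing in for a question embedding and three chunk embeddings.
query = [1.0, 0.0, 0.0]
chunks = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]

ranking = rank_chunks(query, chunks, top_k=2)
print(ranking)  # chunk 1 is ranked first, chunk 0 second

# The loss is near zero when the positive is already the closest chunk,
# and large when the positive is the farthest one.
loss_good = info_nce_loss(query, chunks, positive_index=1)
loss_bad = info_nce_loss(query, chunks, positive_index=2)
print(loss_good, loss_bad)
```

With real embeddings, `model.encode(...)` returns NumPy arrays by default, which can be passed to these functions unchanged because they only iterate over the values. For larger collections, a vector index such as FAISS would likely replace the linear scan in `rank_chunks`.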