BASH-Lab
/

RAVEN-AV-7B

Video-Text-to-Text

text-generation


Subrata132 committed on Aug 12, 2025

Commit

5611633

·

verified ·

1 Parent(s): e657dcc

Update README.md

Files changed (1)

README.md +48 -3

README.md CHANGED

@@ -1,3 +1,48 @@
----
-license: cc-by-4.0
----

+<p align="center">
+    <img src="./assets/raven_logo.png" width="100" style="margin-bottom: 0.2;"/>
+</p>
+<h3 align="center">
+    <a href="https://arxiv.org/pdf/2505.17114" style="color:#825987">
+        RAVEN: Query-Guided Representation Alignment for Question
+        Answering over Audio, Video, Embedded Sensors, and Natural Language
+    </a>
+</h3>
+<h5 align="center">
+    Project Page:
+    <a href="https://bashlab.github.io/raven_project/" style="color:#825987">
+        https://bashlab.github.io/raven_project/
+    </a>
+</h5>
+<p align="center">
+  <img src="./assets/raven_architecture.png" width="800" />
+</p>
+---
+## 🛠️ Requirements and Installation
+Basic Dependencies:
+* Python >= 3.8
+* PyTorch >= 2.2.0
+* CUDA Version >= 11.8
+* transformers == 4.40.0 (for reproducing paper results)
+* tokenizers == 0.19.1
+```bash
+cd RAVEN
+pip install -r requirements.txt
+pip install flash-attn==2.5.8 --no-build-isolation
+pip install opencv-python==4.5.5.64
+apt-get update && apt-get install ffmpeg libsm6 libxext6  -y
+```
+---
+## 🤖 Inference
+- **STEP 1:** Download `siglip-so400m-patch14-384` from [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384).
+- **STEP 2:** Download the **RAVEN** checkpoint.
+```bash
+CUDA_VISIBLE_DEVICES=0 python inference.py --model-path=<MODEL PATH> --modal-type=<MODAL TYPE>
+```
+## 👍 Acknowledgement
+The codebase of RAVEN is adapted from [**VideoLLaMA2**](https://github.com/DAMO-NLP-SG/VideoLLaMA2). We are grateful for their contributions.
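
The minimum versions listed under Requirements and Installation can be checked before running `pip install`. The following is a minimal sketch, not part of the RAVEN repository: the helper names `version_ge` and `check_min` are hypothetical, it assumes `python3` is on `PATH` and that `sort` supports `-V` (GNU coreutils), and it treats the CUDA version PyTorch was built against (`torch.version.cuda`) as a stand-in for the "CUDA Version" requirement. A missing interpreter or package is reported as version `0`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of RAVEN.

# version_ge A B: succeed when version A >= version B.
# Relies on `sort -V` (version sort, GNU coreutils) to order the two strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME FOUND REQUIRED: print a one-line verdict for a minimum version.
check_min() {
    if version_ge "$2" "$3"; then
        echo "ok: $1 $2 (>= $3)"
    else
        echo "too old: $1 $2 (need >= $3)"
    fi
}

# Query each version; fall back to 0 when python3 or torch is unavailable.
py_ver="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || echo 0)"
torch_ver="$(python3 -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null || echo 0)"
# Assumption: the CUDA version torch was built with approximates the requirement.
cuda_ver="$(python3 -c 'import torch; print(torch.version.cuda or 0)' 2>/dev/null || echo 0)"

check_min "Python" "$py_ver" "3.8"
check_min "PyTorch" "$torch_ver" "2.2.0"
check_min "CUDA" "$cuda_ver" "11.8"
```

The pinned packages (`transformers == 4.40.0`, `tokenizers == 0.19.1`) are exact-match requirements, so a minimum-version check like this does not apply to them.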