Subrata132 committed on
Commit 5611633 · verified · 1 Parent(s): e657dcc

Update README.md

Files changed (1)
  1. README.md +48 -3
README.md CHANGED
@@ -1,3 +1,48 @@
- ---
- license: cc-by-4.0
- ---
+ <p align="center">
+ <img src="./assets/raven_logo.png" width="100" style="margin-bottom: 0.2;"/>
+ </p>
+
+ <h3 align="center">
+ <a href="https://arxiv.org/pdf/2505.17114" style="color:#825987">
+ RAVEN: Query-Guided Representation Alignment for Question
+ Answering over Audio, Video, Embedded Sensors, and Natural Language
+ </a>
+ </h3>
+ <h5 align="center">
+ Project Page:
+ <a href="https://bashlab.github.io/raven_project/" style="color:#825987">
+ https://bashlab.github.io/raven_project/
+ </a>
+ </h5>
+ <p align="center">
+ <img src="./assets/raven_architecture.png" width="800" />
+ </p>
+
+ ---
+
+ ## 🛠️ Requirements and Installation
+ Basic Dependencies:
+ * Python >= 3.8
+ * PyTorch >= 2.2.0
+ * CUDA Version >= 11.8
+ * transformers == 4.40.0 (for reproducing paper results)
+ * tokenizers == 0.19.1
+
+ ```bash
+ cd RAVEN
+ pip install -r requirements.txt
+ pip install flash-attn==2.5.8 --no-build-isolation
+ pip install opencv-python==4.5.5.64
+ apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
+ ```
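
Since the README pins exact versions of `transformers` and `tokenizers` for reproducing the paper results, a quick environment check may save a failed run. The following is a minimal sketch, not part of the RAVEN repository: the helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical, the pins are copied from the dependency list above, and `torch` is assumed to be the installed distribution name for PyTorch. The version parser is deliberately crude and ignores suffixes such as `+cu118` or `.dev0`.

```python
from importlib import metadata

# Pins copied from the README's dependency list: (package, operator, version).
REQUIREMENTS = [
    ("torch", ">=", "2.2.0"),
    ("transformers", "==", "4.40.0"),
    ("tokenizers", "==", "0.19.1"),
]


def parse_version(text):
    """Turn '2.2.0+cu118' into (2, 2, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings under '>=' or '=='."""
    have, want = parse_version(installed), parse_version(required)
    # Pad the shorter tuple with zeros so that '2.2' compares equal to '2.2.0'.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want if op == ">=" else have == want


def check_environment(requirements=REQUIREMENTS, get_version=metadata.version):
    """Return a list of readable problems; an empty list means all pins are met."""
    problems = []
    for name, op, required in requirements:
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need {op} {required})")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name}: found {installed}, need {op} {required}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all pinned packages look fine"]:
        print(line)
```

Running the script after `pip install -r requirements.txt` prints one line per mismatch. It does not check the CUDA version or `flash-attn`, which would need their own probes.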
+ ---
+
+ ## 🤖 Inference
+ - **STEP 1:** Download $\texttt{siglip-so400m-patch14-384}$ from [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384)
+ - **STEP 2:** Download the **RAVEN** checkpoint, then run inference:
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python inference.py --model-path=<MODEL PATH> --modal-type=<MODAL TYPE>
+ ```
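
For scripted runs, the shell command above can be assembled from Python. This is a hedged sketch only: `build_inference_call` and `run_inference` are hypothetical helpers, not part of the RAVEN codebase, and the only flags used are the two shown in the README (`--model-path`, `--modal-type`). The accepted values for `--modal-type` are not listed in this diff, so the sketch passes through whatever string it is given.

```python
import os
import subprocess
import sys


def build_inference_call(model_path, modal_type, gpu="0"):
    """Return (argv, env) mirroring the README's inference command."""
    if not model_path or not modal_type:
        raise ValueError("both model_path and modal_type must be non-empty")
    argv = [
        sys.executable,            # same interpreter that runs this script
        "inference.py",
        f"--model-path={model_path}",
        f"--modal-type={modal_type}",
    ]
    # Copy the current environment and restrict the visible GPU, as the
    # README does with the CUDA_VISIBLE_DEVICES=0 prefix.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return argv, env


def run_inference(model_path, modal_type, gpu="0", cwd="."):
    """Run inference.py from `cwd` (assumed to be the RAVEN checkout)."""
    argv, env = build_inference_call(model_path, modal_type, gpu)
    return subprocess.run(argv, env=env, cwd=cwd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the checkpoint path contains spaces.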
+
+ ## 👍 Acknowledgement
+ The RAVEN codebase is adapted from [**VideoLLaMA2**](https://github.com/DAMO-NLP-SG/VideoLLaMA2). We are grateful to its authors for their contribution.