---
license: mit
---

# REVERSE-LLaVA-MORE-8B

## Model Summary

REVERSE-LLaVA-MORE-8B is an open-source vision-language model (VLM) that performs both next-token prediction and self-verification / self-correction during generation. It is built on LLaVA-MORE (LLaVA with LLaMA-3.1) and fine-tuned on the REVERSE Visual Instruct 1.3M dataset. The model is equipped with a retrospective resampling mechanism that detects and corrects hallucinations on the fly. Training was conducted in early March 2025.

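The resampling idea can be illustrated with a toy sketch: after each generated span, a self-verification score is compared against a threshold τ, and a low score triggers a rewind to the last reliable prefix and a fresh sample. This is a hypothetical, simplified illustration — the `propose` callback, retry loop, and phrase-level granularity are assumptions for demonstration, not the actual implementation (see the GitHub repository for that):

```python
import random

def generate_with_retrospective_resampling(
    propose,           # fn(prefix) -> (phrase, confidence in [0, 1]); stand-in for the VLM
    verify_threshold,  # tau: spans scoring below this are rejected and resampled
    n_phrases=5,
    max_retries=3,
):
    """Toy sketch of threshold-gated retrospective resampling.

    Each proposed phrase carries a (mock) self-verification score. If the
    score falls below tau, the phrase is discarded and regenerated from the
    same prefix instead of being committed to the output.
    """
    prefix = []
    for _ in range(n_phrases):
        for _attempt in range(max_retries):
            phrase, conf = propose(prefix)
            if conf >= verify_threshold:  # accepted: commit and move on
                prefix.append(phrase)
                break
        else:
            # all retries rejected: keep the most recent attempt anyway
            prefix.append(phrase)
    return " ".join(prefix)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["a dog", "on grass", "near a tree", "a frisbee", "in the park"]

    def mock_propose(prefix):
        # stand-in for the model: random phrase plus a random "verification" score
        return rng.choice(vocab), rng.random()

    print(generate_with_retrospective_resampling(mock_propose, verify_threshold=0.5))
```

Note how τ controls the trade-off visible in the benchmarks below: a stricter (lower-probability-tolerant) verification setting rejects more spans, trading coverage for fewer hallucinations.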
## Performance

REVERSE-LLaVA-MORE-8B delivers **strong performance gains** in hallucination reduction across multiple captioning and open-ended VQA benchmarks:

| Benchmark    | Metric                        | Best Baseline    | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
| ------------ | ----------------------------- | ---------------- | ----------------- | ------------------ |
| CHAIR-MSCOCO | CHAIR (↓)                     | DoLA (13.8)      | 12.2              | **8.4**            |
|              | CHAIRs (↓)                    | DoLA (51.8)      | 42.4              | **25.2**           |
| AMBER-G      | Hallucination (↓)             | Woodpecker (7.4) | 6.5               | **5.1**            |
|              | Coverage (↑)                  | DoLA (53.1)      | **54.8**          | 38.9               |
| MMHal-Bench  | Score (↑)                     | DoLA (2.54)      | 2.28              | **2.93**           |
|              | Hallucination Rate (↓)        | DoLA (0.51)      | 0.54              | **0.40**           |
| HaloQuest    | Avg. Accuracy (↑)             | DoLA (22.8)      | 26.7              | **36.7**           |
|              | False Premise Acc. (↑)        | DoLA (15.5)      | 30.0              | **39.5**           |
|              | Visual Challenging Acc. (↑)   | **DoLA (45.1)**  | 31.3              | 30.9               |
|              | Insufficient Context Acc. (↑) | DoLA (7.4)       | 11.7              | **38.1**           |
On discriminative tasks, REVERSE-LLaVA-MORE-8B performs competitively with its base VLM:

| Benchmark | Metric       | LLaVA-MORE-8B | REVERSE (τ=0.5) |
| --------- | ------------ | ------------- | --------------- |
| AMBER-D   | F1 Score (↑) | **71.6**      | 69.3            |
| POPE      | F1 Score (↑) | **85.1**      | 84.4            |
| MME-Hall  | Score (↑)    | **678.3**     | 657.6           |

## Usage

Please refer to the installation guide on GitHub to get started:
👉 [Installation Guide](https://github.com/tsunghan-wu/reverse_vlm)

## Additional Resources

- 📄 Project Page: [https://reverse-vlm.github.io/](https://reverse-vlm.github.io/)
- 🧾 Dataset: [REVERSE Visual Instruct 1.3M](https://huggingface.co/datasets/tsunghanwu/reverse-instruct-1.3m)
- 🔧 Ask Questions: [GitHub Issues](https://github.com/tsunghan-wu/reverse_vlm/issues)

## Intended Use

**Primary Use Cases:**
- Reducing hallucination in image captioning and open-ended VQA
- Evaluating hallucination-aware generation strategies
- Research on grounded and trustworthy multimodal reasoning

**Target Users:**
Researchers, developers, and students working on VLMs, hallucination mitigation, and vision-language alignment.