---
license: mit
---

# REVERSE-LLaVA-MORE-8B

## Model Summary

REVERSE-LLaVA-MORE-8B is an open-source vision-language model (VLM) that performs both next-token prediction and self-verification / self-correction during generation. It is built upon LLaVA-MORE (LLaVA with LLaMA-3.1) and fine-tuned on the REVERSE Visual Instruct 1.3M dataset. The model is equipped with a retrospective resampling mechanism that detects and corrects hallucinations on the fly. Training was conducted in early March 2025.
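The retrospective resampling idea can be illustrated with a toy decoding loop. This is a minimal sketch, not the model's actual implementation: `toy_generate`, the confidence scores, and the backtracking policy below are all hypothetical stand-ins for the model's learned verification signal and threshold τ.

```python
import random


def toy_generate(step: int) -> tuple[str, float]:
    """Hypothetical stand-in for one decoding step: returns a token
    plus a self-verification confidence score in [0, 1]."""
    random.seed(step)  # deterministic for illustration only
    return f"tok{step}", random.random()


def decode_with_retrospective_resampling(
    n_tokens: int, tau: float, max_retries: int = 5
) -> list[str]:
    """Generate tokens; whenever self-verification confidence for the
    newest token falls below the threshold tau, discard it and resample."""
    out: list[str] = []
    step = 0
    retries = 0
    while len(out) < n_tokens:
        token, confidence = toy_generate(step)
        step += 1
        if confidence < tau and retries < max_retries:
            # Self-verification flagged a likely hallucination:
            # drop the token (a real system may drop a longer span)
            # and resample before committing to the output.
            retries += 1
            continue
        retries = 0
        out.append(token)
    return out
```

Lowering τ makes the verifier stricter (fewer hallucinations, at the cost of more resampling), which mirrors the trade-off visible in the benchmark tables below.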

## Performance

REVERSE-LLaVA-MORE-8B delivers **strong performance gains** in hallucination reduction across multiple captioning and open-ended VQA benchmarks:

| Benchmark    | Metric                        | Best Baseline    | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
| ------------ | ----------------------------- | ---------------- | ----------------- | ------------------ |
| CHAIR-MSCOCO | CHAIR (↓)                     | DoLA (13.8)      | 12.2              | **8.4**            |
|              | CHAIRs (↓)                    | DoLA (51.8)      | 42.4              | **25.2**           |
| AMBER-G      | Hallucination (↓)             | Woodpecker (7.4) | 6.5               | **5.1**            |
|              | Coverage (↑)                  | DoLA (53.1)      | **54.8**          | 38.9               |
| MMHal-Bench  | Score (↑)                     | DoLA (2.54)      | 2.28              | **2.93**           |
|              | Hallucination Rate (↓)        | DoLA (0.51)      | 0.54              | **0.40**           |
| HaloQuest    | Avg. Accuracy (↑)             | DoLA (22.8)      | 26.7              | **36.7**           |
|              | False Premise Acc. (↑)        | DoLA (15.5)      | 30.0              | **39.5**           |
|              | Visually Challenging Acc. (↑) | **DoLA (45.1)**  | 31.3              | 30.9               |
|              | Insufficient Context Acc. (↑) | DoLA (7.4)       | 11.7              | **38.1**           |

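As a quick sanity check on the numbers above, the relative reduction in sentence-level hallucination (CHAIRs) versus the best baseline works out as follows (scores taken directly from the table; lower is better):

```python
# CHAIRs scores on CHAIR-MSCOCO, from the table above (lower is better)
best_baseline = 51.8    # DoLA
reverse_low_tau = 25.2  # REVERSE at the stricter threshold

# Fractional improvement over the strongest baseline
relative_reduction = (best_baseline - reverse_low_tau) / best_baseline
print(f"{relative_reduction:.1%}")  # roughly a 51% relative reduction
```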
On discriminative tasks, REVERSE-LLaVA-MORE performs competitively with the base VLM:

| Benchmark | Metric       | LLaVA-MORE-8B | REVERSE (τ=0.5) |
| --------- | ------------ | ------------- | --------------- |
| AMBER-D   | F1 Score (↑) | **71.6**      | 69.3            |
| POPE      | F1 Score (↑) | **85.1**      | 84.4            |
| MME-Hall  | Score (↑)    | **678.3**     | 657.6           |

## Usage

Please refer to the installation guide on GitHub to get started:
[Installation Guide](https://github.com/tsunghan-wu/reverse_vlm)

## Additional Resources

- Project Page: [https://reverse-vlm.github.io/](https://reverse-vlm.github.io/)
- Dataset: [REVERSE Visual Instruct 1.3M](https://huggingface.co/datasets/tsunghanwu/reverse-instruct-1.3m)
- Ask Questions: [GitHub Issues](https://github.com/tsunghan-wu/reverse_vlm/issues)

## Intended Use

**Primary Use Cases:**
- Reducing hallucination in image captioning and open-ended VQA
- Evaluating hallucination-aware generation strategies
- Research on grounded and trustworthy multimodal reasoning

**Target Users:**
Researchers, developers, and students working on VLMs, hallucination mitigation, and vision-language alignment.