Update README.md
README.md
CHANGED
@@ -179,6 +179,7 @@ Register in `vlmeval/config.py`:
 from functools import partial
 from vlmeval.vlm import InternVLChat
 
+# Add to ungrouped dict
 "KVL-DPO": partial(InternVLChat, model_path="amoeba04/KVL-DPO", max_new_tokens=16384, version="V2.0"),
 ```
 
@@ -187,37 +188,12 @@ Run evaluation:
 python run.py --data MMBench_DEV_EN --model KVL-DPO --verbose
 ```
 
-## Intended Use
+## Intended Use
 
-- **Scientific Document Understanding**:
-- **Medical Image Analysis**:
+- **Scientific Document Understanding**: Analyzing figures, tables, and diagrams from scientific papers
+- **Medical Image Analysis**: Radiology, pathology, and endoscopy image interpretation
 - **Visual Question Answering**: General and domain-specific VQA tasks
 - **Chain-of-Thought Reasoning**: Complex visual reasoning with step-by-step explanations
-- **Human-Aligned Responses**: Improved response quality through preference optimization
-
-## Model Comparison
-
-| Model | Training Method | Key Advantage |
-|-------|----------------|---------------|
-| KVL | SFT (4M samples) | Strong domain knowledge |
-| KVL-DPO | SFT + DPO | Better aligned with human preferences |
-
-## License
-
-This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
-
-## Citation
-
-If you use this model, please cite:
-```bibtex
-@misc{kvl-dpo,
-title={KVL-DPO: Vision-Language Model with Direct Preference Optimization},
-author={amoeba04},
-year={2025},
-publisher={Hugging Face},
-url={https://huggingface.co/amoeba04/KVL-DPO}
-}
-```
 
 ## Acknowledgments
 
@@ -225,3 +201,7 @@ If you use this model, please cite:
 - [ms-swift](https://github.com/modelscope/ms-swift) - Training framework
 - [MMInstruction](https://huggingface.co/MMInstruction) - VLFeedback dataset
 - All dataset creators for their valuable contributions
+
+## License
+
+This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
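In the registration hunk, the entry added to the `ungrouped` dict is a `functools.partial` object, not a model instance. Importing the config therefore loads nothing. The constructor runs, with the frozen `model_path`, `max_new_tokens`, and `version` arguments, only when the entry is called. The sketch below illustrates that deferred-construction pattern under some assumptions. `StubInternVLChat` is a hypothetical stand-in for `InternVLChat` that only records its arguments. The merge into `supported_VLM` and the `build_model` helper are also assumed; they approximate, rather than reproduce, how VLMEvalKit's `run.py` resolves the `--model KVL-DPO` name.

```python
from functools import partial


class StubInternVLChat:
    """Hypothetical stand-in for vlmeval.vlm.InternVLChat; it only records its arguments."""

    def __init__(self, model_path, max_new_tokens=1024, version="V1.0", **kwargs):
        self.model_path = model_path
        self.max_new_tokens = max_new_tokens
        self.version = version
        self.extra = kwargs


# Registration stores a factory: partial() freezes the keyword arguments,
# and no model object exists until the entry is called.
ungrouped = {
    "KVL-DPO": partial(
        StubInternVLChat,
        model_path="amoeba04/KVL-DPO",
        max_new_tokens=16384,
        version="V2.0",
    ),
}

# Assumption: the evaluation script merges model groups into one lookup table.
supported_VLM = {}
supported_VLM.update(ungrouped)


def build_model(name):
    """Hypothetical helper: resolve a --model name and instantiate it."""
    if name not in supported_VLM:
        raise KeyError(f"{name!r} is not registered; add it to a model group in vlmeval/config.py")
    return supported_VLM[name]()


model = build_model("KVL-DPO")
print(model.model_path, model.max_new_tokens, model.version)
# prints: amoeba04/KVL-DPO 16384 V2.0
```

Keyword arguments passed at call time override the frozen ones. For example, `supported_VLM["KVL-DPO"](max_new_tokens=512)` builds the stub with a 512-token limit without requiring a second registry entry.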