Update README.md
Browse files
README.md
CHANGED
|
@@ -137,12 +137,12 @@ question = 'Hello, who are you?'
|
|
| 137 |
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
|
| 138 |
print(f'User: {question} Assistant: {response}')
|
| 139 |
|
| 140 |
-
# text-image conversation
|
| 141 |
question = '<image> Please describe the image.'
|
| 142 |
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
|
| 143 |
print(f'User: {question} Assistant: {response}')
|
| 144 |
|
| 145 |
-
|
| 146 |
question = 'What is best title for the image?'
|
| 147 |
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
|
| 148 |
print(f'User: {question} Assistant: {response}')
|
|
@@ -153,15 +153,15 @@ print(f'User: {question} Assistant: {response}')
|
|
| 153 |
|
| 154 |
| Benchmark | Qwen2.5-VL-3B | InternVL2.5-4B | Ristretto-3B |
|
| 155 |
| :-------: | :----------: | :-------------: | :----: |
|
| 156 |
-
| MMBench-TEST-avg | 76.8 | 78.2 |
|
| 157 |
| MMStar | 56.3 | 58.7 | 62.6 |
|
| 158 |
| MMMU-VAL | 51.2 | 51.8 | 49.1 |
|
| 159 |
-
| MathVista-
|
| 160 |
| HallucinationBench | 46.6 | 46.6 | 50.2 |
|
| 161 |
| AI2D | 81.4 | 81.4 | 84.3 |
|
| 162 |
| OCRBench | 82.8 | 82.0 | 84.0 |
|
| 163 |
| MMVet | 60.0 | 61.5 | 61.8 |
|
| 164 |
-
| Average | 64.5 | 65.1 |
|
| 165 |
|
| 166 |
We use [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) to evaluate Ristretto-3B. Other results are taken from [OpenCompass](https://rank.opencompass.org.cn/leaderboard-multimodal)
|
| 167 |
|
|
|
|
| 137 |
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
|
| 138 |
print(f'User: {question} Assistant: {response}')
|
| 139 |
|
| 140 |
+
# text-image conversation and multi-round conversation
|
| 141 |
question = '<image> Please describe the image.'
|
| 142 |
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
|
| 143 |
print(f'User: {question} Assistant: {response}')
|
| 144 |
|
| 145 |
+
|
| 146 |
question = 'What is the best title for the image?'
|
| 147 |
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
|
| 148 |
print(f'User: {question} Assistant: {response}')
|
|
|
|
| 153 |
|
| 154 |
| Benchmark | Qwen2.5-VL-3B | InternVL2.5-4B | Ristretto-3B |
|
| 155 |
| :-------: | :----------: | :-------------: | :----: |
|
| 156 |
+
| MMBench-TEST-avg | 76.8 | 78.2 | 80.1 |
|
| 157 |
| MMStar | 56.3 | 58.7 | 62.6 |
|
| 158 |
| MMMU-VAL | 51.2 | 51.8 | 49.1 |
|
| 159 |
+
| MathVista-MINI-test | 61.2 | 60.8 | 67.9 |
|
| 160 |
| HallucinationBench | 46.6 | 46.6 | 50.2 |
|
| 161 |
| AI2D | 81.4 | 81.4 | 84.3 |
|
| 162 |
| OCRBench | 82.8 | 82.0 | 84.0 |
|
| 163 |
| MMVet | 60.0 | 61.5 | 61.8 |
|
| 164 |
+
| Average | 64.5 | 65.1 | 67.6 |
|
| 165 |
|
| 166 |
We use [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) to evaluate Ristretto-3B. Other results are taken from [OpenCompass](https://rank.opencompass.org.cn/leaderboard-multimodal).
|
| 167 |
|