Update README.md
README.md
@@ -32,7 +32,7 @@ This repository contains Diffusers-format model weights for **SkyworkVL-38B**, a
|
|
| 32 |
|
| 33 |
### 1. Multi-Resolution Processing
|
| 34 |
|
| 35 |
-
-
|
| 36 |
|
| 37 |
### 2. Multi-Stage Supervised Fine-Tuning (SFT)
|
| 38 |
|
|
@@ -42,11 +42,11 @@ This repository contains Diffusers-format model weights for **SkyworkVL-38B**, a
|
|
| 42 |
|
| 43 |
### 3. High-Quality Chain-of-Thought (CoT) Fine-Tuning
|
| 44 |
|
| 45 |
-
-
|
| 46 |
|
| 47 |
### 4. GRPO + Rule-Based Reward Training
|
| 48 |
|
| 49 |
-
-
|
| 50 |
|
| 51 |
## Model Introduction
|
| 52 |
|
|
@@ -58,8 +58,8 @@ This repository contains Diffusers-format model weights for **SkyworkVL-38B**, a
|
|
| 58 |
|
| 59 |
| Metric | MathVista (testmini) | MMMU (val) | AI2D (BBox) | OCRBench | MME | **RealWorldQA** | **HallusionBench** |
|
| 60 |
| --------------------------- | -------------------- | --------------- | --------------- | ------------- | -------------- | --------------- | ------------------ |
|
| 61 |
-
| InternVL2.5-38B
|
| 62 |
-
|
|
| 63 |
|
| 64 |
*The performance improvements above demonstrate notable gains in multi-disciplinary question answering, object detection (BBox), and scientific chart analysis among other benchmarks.*
|
| 65 |
|
|
|
|
| 32 |
|
| 33 |
### 1. Multi-Resolution Processing
|
| 34 |
|
| 35 |
+
- Images are processed at multiple resolutions. For each resolution, Closest Aspect Ratio Matching selects a tile grid and the image is partitioned into tiles. The original image is also resized to a single tile and appended to the final representation, which keeps a global view of the image alongside the detailed tiles.
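
The tiling step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the model's actual preprocessing code: the function names (`closest_grid`, `tile_boxes`, `plan_tiles`), the 448-pixel tile size, the tile budgets, and the tie-breaking rule are all hypothetical. The README does not say how the resolutions are chosen, so the sketch stands in one tile budget per pass.

```python
from itertools import product


def candidate_grids(max_tiles):
    """All (cols, rows) grids with at most max_tiles tiles, fewest tiles first."""
    grids = [
        (c, r)
        for c, r in product(range(1, max_tiles + 1), repeat=2)
        if c * r <= max_tiles
    ]
    return sorted(grids, key=lambda g: g[0] * g[1])


def closest_grid(width, height, max_tiles=12):
    """Pick the grid whose aspect ratio is closest to the image's.

    Assumption: ties go to the grid with the fewest tiles.
    """
    target = width / height
    return min(candidate_grids(max_tiles), key=lambda g: abs(target - g[0] / g[1]))


def tile_boxes(cols, rows, tile=448):
    """Crop boxes (left, top, right, bottom) for an image resized to the grid."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


def plan_tiles(width, height, budgets=(4, 12), tile=448):
    """One tiling pass per budget, plus a single-tile thumbnail at the end."""
    plan = []
    for budget in budgets:
        cols, rows = closest_grid(width, height, budget)
        plan.append(
            {"resized_to": (cols * tile, rows * tile), "boxes": tile_boxes(cols, rows, tile)}
        )
    # The whole image squeezed into one tile, appended last.
    plan.append({"resized_to": (tile, tile), "boxes": [(0, 0, tile, tile)]})
    return plan
```

For an 800x600 image, `closest_grid(800, 600)` selects a 4x3 grid because its aspect ratio matches exactly, and `plan_tiles(800, 600)` returns three entries: one per budget and the final thumbnail. An image library would then perform the actual resize and crops using these boxes.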
|
| 36 |
|
| 37 |
### 2. Multi-Stage Supervised Fine-Tuning (SFT)
|
| 38 |
|
|
|
|
| 42 |
|
| 43 |
### 3. High-Quality Chain-of-Thought (CoT) Fine-Tuning
|
| 44 |
|
| 45 |
+
- Integrates high-quality CoT data, including self-collected multimodal Chinese Gaokao data with detailed analysis, to boost the model’s reasoning capability.
|
| 46 |
|
| 47 |
### 4. GRPO + Rule-Based Reward Training
|
| 48 |
|
| 49 |
+
- Applies Group Relative Policy Optimization (GRPO) with rule-based rewards to further refine output quality and overall performance.
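
As a rough illustration of the idea behind this stage, the sketch below scores a group of sampled responses with a simple rule and converts the scores into group-relative advantages. The README does not describe the actual reward rules or training setup, so everything here is an assumption: the `<answer>` tag format, the exact-match rule, the function names, and the use of the population standard deviation.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format: the final answer is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rule_based_reward(response, reference):
    """Return 1.0 if the tagged answer matches the reference exactly, else 0.0."""
    match = ANSWER_RE.search(response)
    if match is None:
        return 0.0  # no parseable answer, no reward
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored against one reference.
group = [
    "<answer>42</answer>",
    "<answer>41</answer>",
    "no tagged answer",
    "<answer> 42 </answer>",
]
rewards = [rule_based_reward(resp, "42") for resp in group]
advantages = group_relative_advantages(rewards)
```

Correct responses receive positive advantages and incorrect ones negative, relative to the other samples for the same prompt. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.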
|
| 50 |
|
| 51 |
## Model Introduction
|
| 52 |
|
|
|
|
| 58 |
|
| 59 |
| Metric | MathVista (testmini) | MMMU (val) | AI2D (BBox) | OCRBench | MME | **RealWorldQA** | **HallusionBench** |
|
| 60 |
| --------------------------- | -------------------- | --------------- | --------------- | ------------- | -------------- | --------------- | ------------------ |
|
| 61 |
+
| InternVL2.5-38B | 71.9 | 63.9 | 87.6 | 842 | 2455 | 73.5 | 56.8 |
|
| 62 |
+
| SkyworkVL-38B | **74.4** | **64.0** | **88.4** | **854** | **2479** | **76.9** | **58.9** |
|
| 63 |
|
| 64 |
*The performance improvements above demonstrate notable gains in multi-disciplinary question answering, object detection (BBox), and scientific chart analysis among other benchmarks.*
|
| 65 |
|