jiangbop committed on
Commit
c8fc398
·
verified ·
1 Parent(s): ecb19a4

Update README.md

  1. README.md +5 -5
README.md CHANGED
@@ -32,7 +32,7 @@ This repository contains Diffusers-format model weights for **SkyworkVL-38B**, a
32
 
33
  ### 1. Multi-Resolution Processing
34
 
35
- - **Innovative Image Tiling:** Images are processed at multiple resolutions. For each resolution, we apply Closest Aspect Ratio Matching to partition the image into tiles. Finally, the original image is resized into a tile and appended to the final representation—ensuring comprehensive image understanding.
36
 
37
  ### 2. Multi-Stage Supervised Fine-Tuning (SFT)
38
 
@@ -42,11 +42,11 @@ This repository contains Diffusers-format model weights for **SkyworkVL-38B**, a
42
 
43
  ### 3. High-Quality Chain-of-Thought (CoT) Fine-Tuning
44
 
45
- - **Enhanced Reasoning:** Integrates high-quality CoT data including self-collected multimodal Chinese Gaokao data with detailed analysis to boost the model’s reasoning capability.
46
 
47
  ### 4. GRPO + Rule-Based Reward Training
48
 
49
- - **Performance Boost:** Utilizes GRPO and rule-based reward training to further refine output quality and overall performance.
50
 
51
  ## Model Introduction
52
 
@@ -58,8 +58,8 @@ This repository contains Diffusers-format model weights for **SkyworkVL-38B**, a
58
 
59
  | Metric | MathVista (testmini) | MMMU (val) | AI2D (BBox) | OCRBench | MME | **RealWorldQA** | **HallusionBench** |
60
  | --------------------------- | -------------------- | --------------- | --------------- | ------------- | -------------- | --------------- | ------------------ |
61
- | Internvl2.5-38B (official) | 71.9 | 63.9 | 87.6 | 842 | 2455 | 73.5 | 56.8 |
62
- | **SkyworkVL-38B (Current)** | **74.4 (+2.5)** | **64.0 (+0.1)** | **88.4 (+0.8)** | **854 (+12)** | **2479 (+24)** | **76.9 (+3.4)** | **58.9 (+2.1)** |
63
 
64
  *The performance improvements above demonstrate notable gains in multi-disciplinary question answering, object detection (BBox), and scientific chart analysis among other benchmarks.*
65
 
 
32
 
33
  ### 1. Multi-Resolution Processing
34
 
35
+ - Images are processed at multiple resolutions. For each resolution, we apply Closest Aspect Ratio Matching to partition the image into tiles. Finally, the original image is resized into a tile and appended to the final representation—ensuring comprehensive image understanding.
36
 
37
  ### 2. Multi-Stage Supervised Fine-Tuning (SFT)
38
 
 
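Reviewer note: the tiling step described in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the repository's preprocessing code. The function names (`closest_grid`, `tile_boxes`, `plan_tiles`), the 448-pixel tile size, the per-resolution tile budgets, and the tie-breaking rule are all assumptions made for the example; the README does not state them.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def closest_grid(width: int, height: int, max_tiles: int) -> Tuple[int, int]:
    """Pick the (cols, rows) grid whose aspect ratio is closest to the image's."""
    target = width / height
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    # Assumed tie-break: among equally close grids, prefer the one with more tiles.
    return min(candidates, key=lambda g: (abs(target - g[0] / g[1]), -g[0] * g[1]))


def tile_boxes(cols: int, rows: int, tile: int = 448) -> List[Box]:
    """Crop boxes, in row-major order, for an image resized to (cols*tile, rows*tile)."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


def plan_tiles(width: int, height: int,
               budgets: Tuple[int, ...] = (4, 12), tile: int = 448) -> List[Dict]:
    """One tiling per resolution (modelled here as a tile budget), plus a thumbnail."""
    plan = []
    for budget in budgets:
        cols, rows = closest_grid(width, height, budget)
        plan.append({
            "resize_to": (cols * tile, rows * tile),
            "boxes": tile_boxes(cols, rows, tile),
        })
    # Finally, the whole image is resized into a single tile and appended.
    plan.append({"resize_to": (tile, tile), "boxes": [(0, 0, tile, tile)]})
    return plan
```

With an image library, each entry would be applied by resizing the image to `resize_to` and cropping every box in `boxes`; the last entry is the global thumbnail.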
42
 
43
  ### 3. High-Quality Chain-of-Thought (CoT) Fine-Tuning
44
 
45
+ - Integrates high-quality CoT data including self-collected multimodal Chinese Gaokao data with detailed analysis to boost the model’s reasoning capability.
46
 
47
  ### 4. GRPO + Rule-Based Reward Training
48
 
49
+ - Utilizes GRPO and rule-based reward training to further refine output quality and overall performance.
50
 
51
  ## Model Introduction
52
 
 
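Reviewer note: GRPO (Group Relative Policy Optimization) scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that normalization step together with a toy rule-based reward. The `<answer>` tag convention and the exact-match rule are hypothetical, since the README does not describe the actual reward rules, and the use of the population standard deviation is a choice made for this example.

```python
import re
from statistics import mean, pstdev
from typing import List


def rule_based_reward(response: str, reference: str) -> float:
    """Hypothetical rule: 1.0 if the text inside <answer>...</answer> matches the reference."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.S)
    if match and match.group(1).strip() == reference.strip():
        return 1.0
    return 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt whose reference answer is "42".
responses = [
    "Reasoning... <answer>42</answer>",
    "Reasoning... <answer>41</answer>",
    "No tagged answer at all",
    "Reasoning... <answer> 42 </answer>",
]
rewards = [rule_based_reward(r, "42") for r in responses]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one; when every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.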
58
 
59
  | Metric | MathVista (testmini) | MMMU (val) | AI2D (BBox) | OCRBench | MME | **RealWorldQA** | **HallusionBench** |
60
  | --------------------------- | -------------------- | --------------- | --------------- | ------------- | -------------- | --------------- | ------------------ |
61
+ | Internvl2.5-38B | 71.9 | 63.9 | 87.6 | 842 | 2455 | 73.5 | 56.8 |
62
+ | SkyworkVL-38B | **74.4** | **64.0** | **88.4** | **854** | **2479** | **76.9** | **58.9** |
63
 
64
  *The performance improvements above demonstrate notable gains in multi-disciplinary question answering, object detection (BBox), and scientific chart analysis among other benchmarks.*
65
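Reviewer note: this commit drops the per-benchmark deltas from the results table. They remain recoverable from the two rows that are kept, as the short check below shows. The scores are copied from the table; the variable names are illustrative only.

```python
# Scores copied from the two table rows kept by this commit.
baseline = {
    "MathVista (testmini)": 71.9, "MMMU (val)": 63.9, "AI2D (BBox)": 87.6,
    "OCRBench": 842, "MME": 2455, "RealWorldQA": 73.5, "HallusionBench": 56.8,
}
skywork = {
    "MathVista (testmini)": 74.4, "MMMU (val)": 64.0, "AI2D (BBox)": 88.4,
    "OCRBench": 854, "MME": 2479, "RealWorldQA": 76.9, "HallusionBench": 58.9,
}

# Round to one decimal place to hide floating-point noise.
deltas = {name: round(skywork[name] - baseline[name], 1) for name in baseline}

for name, delta in deltas.items():
    print(f"{name}: {delta:+g}")
```

The computed values agree with the deltas shown in the removed row (+2.5, +0.1, +0.8, +12, +24, +3.4, +2.1).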