## 🏋️♂️ Training

### Data

<div align=center>
<img src="training_datasets_by_stage.jpg" width = "640" alt="training_datasets" align=center />
</div>

ALLaVA uses 1.0M and 1.5M samples for pre-training (PT) and fine-tuning (FT), respectively.

### Code

### Hyperparameters

| Global Batch Size | ZeRO Stage | Optimizer | Max LR | Min LR | Scheduler | Weight decay |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 256 (PT) / 128 (FT) | 1 | AdamW | 2e-5 | 2e-6 | CosineAnnealingWarmRestarts | 0 |

The LM backbone and projector are trainable, while the vision encoder is kept frozen.
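As a rough sketch of how these settings map onto PyTorch (the module names `vision_encoder`, `projector`, and `lm` are illustrative stand-ins, not ALLaVA's actual module names, and `T_0` is a placeholder the table does not specify):

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Toy stand-in for the real model; real training would load the VLM instead.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)  # kept frozen
        self.projector = nn.Linear(8, 8)       # trainable
        self.lm = nn.Linear(8, 8)              # trainable LM backbone

model = ToyVLM()

# Freeze the vision encoder; only the projector and LM backbone are updated.
for p in model.vision_encoder.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]

# AdamW with max LR 2e-5 and zero weight decay, per the table above.
optimizer = AdamW(trainable, lr=2e-5, weight_decay=0.0)

# Cosine annealing with warm restarts, annealing toward the min LR 2e-6.
# T_0 (iterations per restart cycle) is an assumed value for illustration.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=1000, eta_min=2e-6)
```

In a training loop, `scheduler.step()` would be called after each optimizer step so the learning rate follows the cosine schedule between restarts.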