# Update README.md

A compact vision language model that you can pretrain and finetune on a single consumer GPU.
* 08/17/2025: improved the **VQAv2** average dev-test score from **44.01%** to **56.91%** by upgrading the vision tower from SigLIP to SigLIP 2.
* 08/09/2025: initial version of MicroLlava released.
## TLDR

| Item | Detail |
|-----------------|--------|

Supervised finetuning on all datasets from the TinyLLaVA Factory guide (except `…`).

---
## Quick start
```python
from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM

# ... (model and processor loading plus input preparation elided in this excerpt)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Evaluation
### VQAv2 Evaluation Results (MicroLlama 300M + Siglip2-so400m-patch4-384)
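For context on how such scores are produced: VQAv2 uses a consensus metric in which a predicted answer earns `min(matches / 3, 1)` credit against the ten human annotations for each question. A minimal sketch of that formula (this is not the repo's evaluation script; the official tool additionally normalizes answers and averages over annotator subsets):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus VQA accuracy: agreeing with at least 3 of the 10
    annotators earns full credit; fewer matches earn partial credit."""
    norm = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == norm)
    return min(matches / 3.0, 1.0)


human = ["yes"] * 8 + ["no"] * 2   # 10 human annotations for one question
print(vqa_accuracy("yes", human))  # 1.0 (8 matches, capped at 1)
print(vqa_accuracy("no", human))   # 2/3 credit, since only 2 annotators said "no"
```

Per-question scores like these are then averaged over the dev-test split to give the percentages reported below.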

Community contributions with benchmark results are welcome and encouraged.

---
## Intended uses and limitations
**Intended uses**
- Rapid experimentation for vision-language research on limited hardware
---
## Reproducibility
To reproduce results and training runs:
1. Fix all random seeds in training scripts
2. Record exact dataset versions and any filtering applied
3. Log optimizer type, learning rate schedule, precision settings, and gradient accumulation steps
4. Save the exact TinyLLaVA Factory commit or fork commit used for both pretraining and finetuning
5. Document hardware and software versions (CUDA, PyTorch, etc.)
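
Step 1 of the checklist can be sketched as a small helper that pins the common RNG sources; the function name is illustrative, and the PyTorch calls are shown as comments since they apply only when torch is installed:

```python
import os
import random

import numpy as np


def fix_seeds(seed: int = 42) -> None:
    """Pin the RNGs that training scripts typically touch."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # With PyTorch installed you would also pin:
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    # torch.backends.cudnn.deterministic = True


fix_seeds(0)
first = [random.random() for _ in range(3)]
fix_seeds(0)
assert first == [random.random() for _ in range(3)]  # same seed, same draws
```

Calling this once at the top of each training script makes reruns comparable, though exact bitwise reproducibility on GPU also depends on the deterministic-algorithm settings noted in the comments.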
---
## Citation
```bibtex
@misc{wang2024microllama,
}
```
## License
This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

If you use this model in your research or applications, please credit the original …

---
## Acknowledgements
This work builds upon the efforts of many in the open-source AI community: