# Update README.md

A compact vision language model that you can pretrain and finetune on a single consumer GPU.
* 08/17/2025: improved the **VQAv2** average dev-test score from **44.01%** to **56.91%** by upgrading the vision tower from SigLIP to SigLIP 2.
* 08/09/2025: initial version of MicroLlava released.
## TLDR

| Item | Detail |
|-----------------|--------|

Supervised finetuning on all datasets from the TinyLLaVA Factory guide (except `…`).

---
## Quick start
```python
from transformers import AutoTokenizer, AutoProcessor, AutoModelForCausalLM

# ... (model and processor loading plus input preparation elided in this excerpt)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Evaluation
### VQAv2 Evaluation Results (MicroLlama 300M + Siglip2-so400m-patch4-384)
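For context on how such scores are produced: VQAv2 uses a consensus metric in which a predicted answer earns `min(matches / 3, 1)` credit against the ten human annotations for each question. A minimal sketch of that formula (this is not the repo's evaluation script; the official tool additionally normalizes answers and averages over annotator subsets):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus VQA accuracy: agreeing with at least 3 of the 10
    annotators earns full credit; fewer matches earn partial credit."""
    norm = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == norm)
    return min(matches / 3.0, 1.0)


human = ["yes"] * 8 + ["no"] * 2   # 10 human annotations for one question
print(vqa_accuracy("yes", human))  # 1.0 (8 matches, capped at 1)
print(vqa_accuracy("no", human))   # 2/3 credit, since only 2 annotators said "no"
```

Per-question scores like these are then averaged over the dev-test split to give the percentages reported below.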

Community contributions with benchmark results are welcome and encouraged.

---
## Intended uses and limitations
**Intended uses**
- Rapid experimentation for vision-language research on limited hardware
---
## Reproducibility
To reproduce results and training runs:
1. Fix all random seeds in training scripts
2. Record exact dataset versions and any filtering applied
3. Log optimizer type, learning rate schedule, precision settings, and gradient accumulation steps
4. Save the exact TinyLLaVA Factory commit or fork commit used for both pretraining and finetuning
5. Document hardware and software versions (CUDA, PyTorch, etc.)
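
Step 1 of the checklist can be sketched as a small helper that pins the common RNG sources; the function name is illustrative, and the PyTorch calls are shown as comments since they apply only when torch is installed:

```python
import os
import random

import numpy as np


def fix_seeds(seed: int = 42) -> None:
    """Pin the RNGs that training scripts typically touch."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # With PyTorch installed you would also pin:
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    # torch.backends.cudnn.deterministic = True


fix_seeds(0)
first = [random.random() for _ in range(3)]
fix_seeds(0)
assert first == [random.random() for _ in range(3)]  # same seed, same draws
```

Calling this once at the top of each training script makes reruns comparable, though exact bitwise reproducibility on GPU also depends on the deterministic-algorithm settings noted in the comments.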
---
## Citation
```bibtex
@misc{wang2024microllama,
}
```
## License
This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

If you use this model in your research or applications, please credit the original …

---
## Acknowledgements
This work builds upon the efforts of many in the open-source AI community: