sbrzz committed
Commit 947fc33 · verified · 1 Parent(s): dc64e73

Update README.md

Files changed (1)
  1. README.md +36 -27
README.md CHANGED
@@ -1,27 +1,36 @@
-
- ---
- # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
- # Doc / guide: https://huggingface.co/docs/hub/model-cards
- library_name: nanovlm
- license: mit
- pipeline_tag: image-text-to-text
- tags:
- - vision-language
- - multimodal
- - research
- ---
-
- **nanoVLM** is a minimal and lightweight Vision-Language Model (VLM) designed for efficient training and experimentation. Built using pure PyTorch, the entire model architecture and training logic fits within ~750 lines of code. It combines a ViT-based image encoder (SigLIP-B/16-224-85M) with a lightweight causal language model (SmolLM2-135M), resulting in a compact 222M parameter model.
-
- For more information, check out the base model on https://huggingface.co/lusxvr/nanoVLM-222M.
-
- **Usage:**
-
- Clone the nanoVLM repository: https://github.com/huggingface/nanoVLM.
- Follow the install instructions and run the following code:
-
- ```python
- from models.vision_language_model import VisionLanguageModel
-
- model = VisionLanguageModel.from_pretrained("sbrzz/nanoVLM")
- ```
+
+ ---
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ library_name: nanovlm
+ license: mit
+ pipeline_tag: image-text-to-text
+ tags:
+ - vision-language
+ - multimodal
+ - research
+ ---
+
+ **Introduction**
+
+ You can find the history behind this work in this blog post:
+
+ **Datasets**
+
+ - the "localized_narratives" subset of the_cauldron (200k items); see the loading sketch below
+ - a private dataset (30k items)
+
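+ A minimal sketch for loading the public subset with the `datasets` library (the Hub id and config name are assumptions inferred from the bullet above, not confirmed by this card):
+
+ ```python
+ from datasets import load_dataset
+
+ # Assumed Hub location of the_cauldron and its localized_narratives config
+ ds = load_dataset("HuggingFaceM4/the_cauldron", "localized_narratives", split="train")
+ print(ds[0])  # each item pairs images with their narrative texts
+ ```
+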
+ **nanoVLM** is a minimal, lightweight Vision-Language Model (VLM) designed for efficient training and experimentation. Built in pure PyTorch, the entire model architecture and training logic fit within ~750 lines of code. It combines a ViT-based image encoder (SigLIP-B/16-224-85M) with a lightweight causal language model (SmolLM2-135M), resulting in a compact 222M-parameter model.
+
+ For more information, check out the base model at https://huggingface.co/lusxvr/nanoVLM-222M.
+
+ **Usage:**
+
+ Clone the nanoVLM repository (https://github.com/huggingface/nanoVLM), follow its install instructions, and run the following code:
+
+ ```python
+ from models.vision_language_model import VisionLanguageModel
+
+ # Downloads the checkpoint from the Hub and instantiates the model
+ model = VisionLanguageModel.from_pretrained("sbrzz/nanoVLM")
+ ```
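+
+ The snippet above only builds the model. A minimal generation sketch, assuming the tokenizer/image-processor helpers and `generate` signature from the repository's generate.py (these helper names are assumptions, not confirmed by this card):
+
+ ```python
+ import torch
+ from PIL import Image
+
+ from models.vision_language_model import VisionLanguageModel
+ from data.processors import get_tokenizer, get_image_processor  # assumed repo helpers
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = VisionLanguageModel.from_pretrained("sbrzz/nanoVLM").to(device).eval()
+
+ # Rebuild the text tokenizer and image processor from the model config
+ tokenizer = get_tokenizer(model.cfg.lm_tokenizer)
+ image_processor = get_image_processor(model.cfg.vit_img_size)
+
+ prompt = "Question: What is in this image? Answer:"
+ tokens = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
+ image = image_processor(Image.open("example.jpg").convert("RGB")).unsqueeze(0).to(device)
+
+ # Autoregressive decoding conditioned on the image
+ out = model.generate(tokens, image, max_new_tokens=40)
+ print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
+ ```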