FineVision

lusxvr committed on Sep 2, 2025

Commit

4f6aa65

1 Parent(s): e6ebd3c

update

Files changed (1)

app/src/content/article.mdx +5 -4

@@ -4,11 +4,12 @@ subtitle: "A new open dataset for data-centric training of Vision Language Model
 description: "A new open dataset for data-centric training of Vision Language Models"
 authors:
   - "Luis Wiedmann"
-  - "Andi Marafioti"
   - "Orr Zohar"
+  - "Andi Marafioti"
+  - "Amir Mahla"
   - "Thibaud Frere"
 affiliation: "Hugging Face"
-published: "Sep 3, 2025"
+published: "Sep 4, 2025"
 tags:
   - research
   - vision-language models
@@ -52,12 +53,12 @@ We manually collect **over 180** image-text datasets from the recent literature
 <Wide>
 <Accordion title="FineVision Subsets">
-|Subset Name                           |Total Images|Total Samples|Total Turns|Total Question Tokens|Total Answer Tokens|Category              |Citation                                                           |
+|Subset Name                           |Total Images|Total Samples|Total Turns|Total Question Tokens|Total Answer Tokens|Category              |Source                                                             |
 |--------------------------------------|------------|-------------|-----------|---------------------|-------------------|----------------------|-------------------------------------------------------------------|
 |coco_colors                           |118,287     |118,287      |118,287    |1,301,157            |6,376,672          |Captioning & Knowledge|[@noauthor_hazal-karakusmscoco-controlnet-canny-less-colors_nodate]|
 |densefusion_1m                        |1,058,751   |1,058,751    |1,058,751  |10,692,478           |263,718,217        |Captioning & Knowledge|[@li_densefusion-1m_2024]                                          |
 |face_emotion                          |797         |797          |797        |8,767                |8,066              |Captioning & Knowledge|[@mollahosseini_affectnet_2017]                                    |
-|google_landmarks                      |299,993     |299,993      |842,127    |6,194,978            |10,202,980         |Captioning & Knowledge|                                                                   |
+|google_landmarks                      |299,993     |299,993      |842,127    |6,194,978            |10,202,980         |Captioning & Knowledge|Ours                                                               |
 |image_textualization(filtered)        |99,573      |99,573       |99,573     |917,577              |19,374,090         |Captioning & Knowledge|[@pi_image_2024]                                                   |
 |laion_gpt4v                           |9,301       |9,301        |9,301      |93,950               |1,875,283          |Captioning & Knowledge|[@noauthor_laiongpt4v-dataset_2023]                                |
 |localized_narratives                  |199,998     |199,998      |199,998    |2,167,179            |8,021,473          |Captioning & Knowledge|[@vedaldi_connecting_2020]                                         |