###### CVPR 2026 MetaFood Workshop Challenge Alert ######
🍽️ Dishcovery: Mission II VLM Challenge
We're excited to share that the Dishcovery II Vision-Language Model Challenge is LIVE as part of the 3rd MetaFood Workshop @ CVPR 2026.
Join the challenge:
https://www.kaggle.com/competitions/dishcovery-mission-ii-cvpr-2026
Dataset collection (Hugging Face):
jesusmolrdv/MTF25-VLM-Challenge-Dataset-Web
jesusmolrdv/MTF25-VLM-Challenge-Dataset-Synth
Workshop details:
https://sites.google.com/view/cvpr-metafood-2026
Task
Build a Vision-Language Model that aligns food images with text under real-world conditions:
1. Multi-label retrieval – identify relevant ingredients/components
2. Single-label retrieval – select the best dense food description
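At their core, both retrieval tasks reduce to ranking text candidates by similarity to an image embedding. A minimal sketch in plain NumPy, with random vectors standing in for the outputs of a real CLIP/SigLIP-style encoder (the function name and dimensions are illustrative, not part of the challenge API):

```python
import numpy as np

def rank_captions(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Return caption indices sorted by cosine similarity to the image, best first."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img              # cosine similarity of each caption to the image
    return np.argsort(-sims)      # descending order

# Toy example: 4 candidate "captions"; we make caption 2 nearly aligned with the image.
rng = np.random.default_rng(0)
image = rng.normal(size=512)
texts = rng.normal(size=(4, 512))
texts[2] = image + 0.1 * rng.normal(size=512)

ranking = rank_captions(image, texts)
print(ranking[0])  # → 2 (the planted near-duplicate ranks first)
```

For the multi-label task you would keep every candidate above a similarity threshold instead of only the top-1; for the single-label task you take the argmax.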
📦 Dataset Highlights
- 400K+ image–caption pairs
- Mix of real, noisy, and synthetic data
- Designed for fine-grained food understanding
- Reflects real-world multimodal challenges
⚙️ What makes this interesting?
- Not just accuracy – efficiency matters
- Robustness to noise and domain shift
- Fine-grained alignment between visual and semantic concepts
- Benchmark for next-gen VLMs (CLIP, SigLIP, LLaVA-style models)
Timeline (key milestones)
May 1, 2026 – Final predictions + method summary
Why participate?
- Benchmark your models on large-scale multimodal food data
- Test robustness under realistic conditions
- Gain visibility via a global leaderboard
- Contribute to the growing Food × Vision × Language research space
🍳 The kitchen is heating up – looking forward to seeing what the community builds!
#multimodal #computervision #vlm #deeplearning #datasets #kaggle #huggingface #ai #research
Dataset Citation: Precision at Scale: Domain-Specific Datasets On-Demand (arXiv:2407.03463)