Everything is open-sourced: datasets, adapters, and code.
https://huggingface.co/blog/OpenMed/synthvision
Interesting work. The part that stands out isn’t just the cost efficiency, but the discipline in pipeline design.
A lot of teams are still chasing “perfect” datasets with heavy manual annotation, while this approach shows that synthetic data + cross-model validation can already reach production-grade quality when done carefully.
A few takeaways that feel increasingly hard to ignore:
- Synthetic data is no longer the bottleneck if validation is handled properly.
- The real leverage is in data curation pipelines, not raw data collection.
- Smaller models (2–3B parameters) can exceed expectations when trained on clean, consistent signals.
The dual-VLM agreement (~93%) is particularly interesting. It’s a pragmatic way to approximate label reliability without introducing significant human cost.
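To make the idea concrete, here is a minimal sketch of what dual-model agreement filtering could look like. This is my own illustration, not the post's actual pipeline; the function name and the toy labels are assumptions, and the real system presumably compares structured VLM outputs rather than plain strings.

```python
# Hypothetical sketch of dual-VLM agreement filtering: keep only samples
# where two independently prompted VLMs produce the same label, and report
# the agreement rate as a cheap proxy for label reliability.

def agreement_filter(labels_a, labels_b):
    """Return (kept samples, agreement rate) for two label lists."""
    assert len(labels_a) == len(labels_b), "one label per sample from each model"
    agreed = [(i, a) for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b]
    rate = len(agreed) / len(labels_a)
    return agreed, rate

# Toy example: the two models agree on 14 of 15 samples (~93%).
labels_a = ["cat"] * 14 + ["dog"]
labels_b = ["cat"] * 15
kept, rate = agreement_filter(labels_a, labels_b)
print(f"kept {len(kept)} samples, agreement {rate:.1%}")
```

The design choice worth noting: disagreements are simply dropped rather than adjudicated, trading a little yield for label quality at essentially zero human cost.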
Also worth noting: achieving this for under $500 challenges a lot of assumptions about “necessary” infrastructure and annotation budgets.
Overall, this feels less like a modeling breakthrough and more like a well-executed data engineering strategy, which, in practice, is where most real gains come from.
Congrats!