Recent Activity

err805 updated a dataset 10 days ago
moondream/gsm8k
err805 authored a paper about 1 month ago
Moondream Segmentation: From Words to Masks
vikhyatk updated a dataset about 1 month ago
moondream/rps-finetune

vikhyatk updated a model 19 days ago
vikhyatk published a model 19 days ago

Always output inaccurate results

#34 opened about 2 months ago by Gureumi
vikhyatk posted an update 6 months ago
Announcing RefCOCO-M, a refreshed RefCOCO with pixel-accurate masks and the problematic prompts removed.

moondream/refcoco-m
vikhyatk posted an update 8 months ago
vikhyatk posted an update about 1 year ago
🚨 New VQA + captioning dataset! moondream/megalith-mdqa

Images from Megalith, captioned using Moondream, then transformed to short-form QA.

9M+ images, 6-10 QA pairs per image.
vikhyatk posted an update over 1 year ago
Just released a dataset with 7000+ hours of synthetically generated lo-fi music. vikhyatk/lofi
vikhyatk posted an update over 1 year ago
Pushed a new update to vikhyatk/moondream2 today. TextVQA up from 60.2 to 65.2, DocVQA up from 61.9 to 70.5.

Space has been updated to the new model if you want to try it out! vikhyatk/moondream2
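
The gains quoted in this post can be read either as absolute points or as relative change. As a minimal sketch (the `scores` dictionary and the `gains` helper are illustrative names, not part of any Moondream tooling), the arithmetic on the reported numbers works out as follows:

```python
# Scores quoted in the post above, as (before, after) pairs.
scores = {
    "TextVQA": (60.2, 65.2),
    "DocVQA": (61.9, 70.5),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = round(after - before, 1)
    relative = round((after - before) / before * 100, 1)
    return absolute, relative


for name, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative}% relative)")
# TextVQA: +5.0 points (8.3% relative)
# DocVQA: +8.6 points (13.9% relative)
```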
vikhyatk posted an update almost 2 years ago
🚀 Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model! 🌙

Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
* DocVQA: 61.9 (↑103%)
* TextVQA: 60.2 (↑5.2%)
* GQA: 64.9 (↑2.9%)

What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.

Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!

Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.

Curious? Try the live demo here: https://moondream.ai/playground
vikhyatk posted an update almost 2 years ago