melsiddieg's Collections

Audiovisual

updated Apr 28

  • microsoft/VibeVoice-1.5B

    Text-to-Speech • 3B • Updated Jan 22 • 237k • 2.41k

  • ibm-granite/granite-docling-258M

    Image-Text-to-Text • 0.3B • Updated Sep 23, 2025 • 101k • 1.2k

  • deepseek-ai/DeepSeek-OCR

    Image-Text-to-Text • 3B • Updated Nov 4, 2025 • 2.2M • 3.29k

  • Qwen/Qwen3-VL-2B-Thinking

    Image-Text-to-Text • 2B • Updated Oct 20, 2025 • 62.9k • 115

  • datalab-to/chandra

    Image-Text-to-Text • 9B • Updated Mar 26 • 160k • 527

  • Qwen/Qwen3-VL-2B-Instruct

    Image-Text-to-Text • 2B • Updated Oct 23, 2025 • 2.12M • 433

  • PokeeAI/pokee_research_7b

    Text Generation • 8B • Updated Oct 23, 2025 • 25 • 100

  • openbmb/MiniCPM-o-4_5

    Any-to-Any • 9B • Updated May 19 • 374k • 1.41k

  • Qwen/Qwen3-ForcedAligner-0.6B

    Automatic Speech Recognition • 0.9B • Updated Jan 30 • 465k • 145

  • seemorg/books-ocr

    Viewer • Updated May 2, 2025 • 49.1k • 33 • 5