Audiovisual - a melsiddieg Collection

Audiovisual

updated Apr 28

microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 237k • 2.41k

ibm-granite/granite-docling-258M

Image-Text-to-Text • 0.3B • Updated Sep 23, 2025 • 101k • 1.2k

deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • 3B • Updated Nov 4, 2025 • 2.2M • 3.29k

Qwen/Qwen3-VL-2B-Thinking

Image-Text-to-Text • 2B • Updated Oct 20, 2025 • 62.9k • 115

datalab-to/chandra

Image-Text-to-Text • 9B • Updated Mar 26 • 160k • 527

Qwen/Qwen3-VL-2B-Instruct

Image-Text-to-Text • 2B • Updated Oct 23, 2025 • 2.12M • 433

PokeeAI/pokee_research_7b

Text Generation • 8B • Updated Oct 23, 2025 • 25 • 100

openbmb/MiniCPM-o-4_5

Any-to-Any • 9B • Updated May 19 • 374k • 1.41k

Qwen/Qwen3-ForcedAligner-0.6B

Automatic Speech Recognition • 0.9B • Updated Jan 30 • 465k • 145

seemorg/books-ocr

Viewer • Updated May 2, 2025 • 49.1k • 33 • 5