MFM - Multimodal Foundation Models - a LeafInTheTree Collection

updated Mar 2

Idefics3 📊
Generate text based on an image and prompt
Paused • Featured • 102 likes

VideoLLaMA2 🎥
Media understanding
Running on Zero • 160 likes

GroundingDINO ⚔ OWL 🦖
Identify objects in images using text queries
Running on Zero • 54 likes

Paligemma HF 🤗
Generate text and segment images using PaliGemma
Runtime error • 85 likes

PaliGemma Demo 🤲
Annotate and describe images with text prompts
Paused • Featured • 315 likes

Florence2 + SAM2 🔥
Segment objects in images or videos using text prompts
Running on Zero • Featured • 518 likes

Florence 2 Vision Model V1 💻
Analyze images to caption, detect objects, and extract text
Runtime error • 11 likes

Marketing Vision 👁
Build error • 2 likes

Idefics3 📊
Runtime error • 2 likes

Theia ⚡
Generate detailed image analyses and depth predictions
Paused • 10 likes

XGen MM 💻
Generate detailed descriptions from images and questions
Runtime error • 16 likes

LLaMA 3.1 Vision 🦙
Sleeping

Chameleon 30b 🔥
Chat about images and get instant answers
Runtime error • Featured • 79 likes

InternVL ⚡
Chat with an AI that understands images and text
Running • Featured • 513 likes

Florence 2 📉
Generate captions, detections, and segmentations for any image
Running on Zero • Featured • 844 likes

Phi 3.5 Vision 🔥
Ask questions about images and get detailed answers
Running on Zero • Featured • 223 likes

MiniGPT-4 🚀
Chat with images using MiniGPT-4
Running on Zero • Featured • 885 likes

Mistral Pixtral Demo 👀
Chat with Pixtral 12B using Mistral Inference
Runtime error • 40 likes

Ovis1.6 Gemma2 9B 🐑
Interact with a chatbot that understands text and images
Build error • Featured • 323 likes

meta-llama/Llama-Guard-3-11B-Vision
Image-Text-to-Text • 11B • Updated Nov 18, 2024 • 2.78k • 76

Owlv2 👀
State-of-the-art Zero-shot Object Detection
Running • Featured • 105 likes

Llama-Vision-11B 🚀
Chat with Llama about images and text
Runtime error • Featured • 390 likes

SmolVLM 📊
Answer questions about images with AI chat
Running on Zero • 144 likes

GLM-Edge-V-5B Space 📷
Generate text responses based on images and chat history
Paused • 7 likes

Paligemma2 Detection 😻
Paligemma2 Detection with Supervision
Runtime error • 17 likes

Florence Llama 💬
Generate text responses from images and text input
Runtime error • 40 likes

Paligemma2 10b Ft Docci 448 📉
Runtime error • 6 likes

VisPer-LM 🔍
Visualize image depth, segmentation, and generation
Runtime error • 5 likes

Chat With Janus-Pro-7B 🌍
A unified multimodal understanding and generation model.
Runtime error • Featured • 2.02k likes