F5-TTS
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
List of Spaces
Generate audio from text using voice prompts
Depth Control for FLUX
Generate realistic dialogue from a script, using Dia!
Generate custom captions, tags, or prompts for any image
Describe any selected part of an image
Upgraded to v1.0!
Launch an interactive web interface for the tool
Generate detailed images from your text prompts
Clone a voice and generate custom speech
PDF translator powered by a local LLM, with side-by-side reading
A Step Towards Music Generation Foundation Model
Open-Source Music Generator
Generate a full song from lyrics and style prompts
A Family of Open Sourced Music Foundation Models
Generate speech from text using voice design, cloning or presets
This space offers an easy-to-use interface for voice cloning
Generate audio from text with voice presets
Generate speech in a cloned voice from a short audio clip
Chat with a multimodal AI using text and images
Chatterbox TTS supporting 23 languages
Isolate specific sounds from audio or video
A simple Gradio platform for demonstrating MOSS-TTS capabilities
Generate expressive speech from text with voice and emotion control
Generate multilingual embeddings for text and images
Identify named entities in text
Extract entities, classifications, and JSON structures from text
Generate detailed answers from PDFs and URLs
FireRed-Image-Edit-1.0
Try the Hugging Face API through the playground
Hub API Documentation
Zero GPU Text-to-Speech using Fish Audio S2 Pro
Multimodal OCR model for complex document understanding.
MolmoPoint - Image & Video Pointing & Tracking
0.1B multilingual TTS with voice cloning
High-quality voice cloning TTS for 600+ languages
Music Generation Foundation Model v1.5
Generate images from text prompts using ERNIE-Image
OpenAI Privacy Filter ZeroGPU demo
Z-Anime 6B - CPU anime image generation via sd.cpp
Pocket TTS optimized for Hugging Face Spaces on CPU
Expressive TTS with voice cloning - DramaBox demo
Chat with a 1930s-style language model
High-fidelity pixel-aligned image-to-3D generation.
All-in-one playground for Ling series LLMs
Zero-shot expressive voice cloning and speech generation
Analyze music and answer questions from audio or YouTube links
Transcribe audio files with timestamps and downloadable subtitles
Text-to-audio with SA3 Medium / Small Music / Small SFX.
Pixel Diffusion Decoder
Run Bonsai-Image-4B models on GPU
Run and monitor a self-hosted AI assistant via web dashboard
Zero-shot TTS & voice cloning with Higgs Audio v3 (4B)
Multilingual streaming ASR with NeMo
Generate natural-sounding speech from typed text