Use from the Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained("lichang0928/QA-MDT", dtype=torch.bfloat16, device_map="cuda")

prompt = "A lively orchestral piece with soaring strings and a triumphant brass melody"
# audio diffusion pipelines expose the generated waveform via .audios rather than .images
audio = pipe(prompt).audios[0]
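
To keep the result, the waveform can be written to a WAV file. A minimal sketch, assuming the pipeline returns a NumPy waveform and a 16 kHz sample rate (a common rate for comparable latent audio diffusion models; check the model repository for the exact value):

import scipy.io.wavfile

# 16 kHz is an assumption based on comparable audio diffusion pipelines;
# verify the actual sample rate in the model repository
scipy.io.wavfile.write("qa_mdt_sample.wav", rate=16000, data=audio)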

Model Description

QA-MDT is a text-to-music generation model that is straightforward to set up and use. It incorporates a quality-aware training strategy to improve the fidelity of the generated music.
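
As a rough, hypothetical illustration of the quality-aware idea (not the official implementation; see the paper and repository for the actual method), training captions can be tagged according to an estimated quality score, so that prompts carrying the high-quality tag steer inference toward cleaner audio:

def tag_caption_with_quality(caption: str, pseudo_mos: float) -> str:
    # hypothetical thresholds mapping an estimated quality score to a coarse text tag
    if pseudo_mos >= 4.0:
        tag = "high quality"
    elif pseudo_mos >= 3.0:
        tag = "medium quality"
    else:
        tag = "low quality"
    return f"{tag}, {caption}"

# during training, each clip's caption would carry its quality tag;
# at inference, prompts are prefixed with "high quality" to request cleaner outputs
print(tag_caption_with_quality("a gentle acoustic guitar ballad", 4.3))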

How to Use

A Hugging Face Diffusers implementation is available in this model repository and in an accompanying Space. For more detailed instructions and the official PyTorch implementation, please refer to the project's GitHub repository and project page.

The model was presented in the paper QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation.
