license: apache-2.0
title: Vaishnavi0404/Text2Sing-DiffSinger
sdk: gradio
emoji: 👀
colorFrom: purple
colorTo: gray

Text2Sing-DiffSinger

Convert plain text into a singing voice with musical accompaniment that reflects the emotional content of the text.

Overview

Text2Sing-DiffSinger is a machine-learning system that converts plain text into a singing voice with musical accompaniment. It analyzes the emotional content of the text, generates singing that matches the mood, and adds suitable background music.

Features

Text-to-singing conversion using neural speech synthesis followed by speech-to-singing conversion
Emotion detection from text input
Musical accompaniment generation based on detected emotions
Adjustable parameters for voice type, tempo, and pitch
Interactive web interface built with Gradio

Installation

Clone this repository:

git clone https://github.com/yourusername/Text2Sing-DiffSinger.git
cd Text2Sing-DiffSinger

Install dependencies:

pip install -r requirements.txt

Set up speaker embeddings:

python setup.py

Usage

Run the application:

python app.py

Open your web browser and navigate to http://localhost:7860 (Gradio's default port)
Enter your text, select voice options, and click "Convert to Singing"

How It Works

The system works in several steps:

Text Analysis: Analyzes the input text to detect emotional content and breaks it down into phonemes.
Speech Synthesis: Converts the text into speech using a neural text-to-speech model.
Singing Conversion: Transforms the speech into singing by modifying pitch, timing, and adding singing-specific effects.
Music Generation: Creates musical accompaniment that matches the emotional content of the text.
Audio Mixing: Combines the singing voice with the accompaniment to produce the final output.
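
As a rough illustration of the last two steps, the sketch below picks an accompaniment style from emotion scores and mixes two mono tracks. It is not the project's actual code: the `EMOTION_STYLES` table, the `MusicStyle` class, and the `choose_style` and `mix` functions are hypothetical names, and the score dictionary is only assumed to look like a mapping from emotion label to a number between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical emotion-to-music table (assumption): label -> (mode, tempo in BPM).
EMOTION_STYLES = {
    "Happy": ("major", 120),
    "Surprise": ("major", 132),
    "Angry": ("minor", 140),
    "Fear": ("minor", 100),
    "Sad": ("minor", 72),
}


@dataclass
class MusicStyle:
    emotion: str
    mode: str
    tempo_bpm: int


def choose_style(emotion_scores):
    """Pick the strongest emotion and look up an accompaniment style."""
    # Fall back to a neutral style when no emotion was detected.
    if not emotion_scores or max(emotion_scores.values()) <= 0:
        return MusicStyle("Neutral", "major", 100)
    emotion = max(emotion_scores, key=emotion_scores.get)
    mode, tempo = EMOTION_STYLES.get(emotion, ("major", 100))
    return MusicStyle(emotion, mode, tempo)


def mix(vocals, accompaniment, music_gain=0.4):
    """Sum two mono tracks (lists of floats in [-1, 1]) sample by sample.

    The shorter track is padded with silence, and the result is scaled
    down only if its peak would exceed 1.0.
    """
    length = max(len(vocals), len(accompaniment))
    mixed = []
    for i in range(length):
        v = vocals[i] if i < len(vocals) else 0.0
        m = accompaniment[i] if i < len(accompaniment) else 0.0
        mixed.append(v + music_gain * m)
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Keeping the accompaniment below the vocal level (`music_gain` less than 1) is a design choice in this sketch so that the singing stays intelligible; the real mixing levels may differ.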

Adjustable Parameters

Voice Type: Choose between neutral, feminine, or masculine voice.
Tempo: Adjust the speed of the singing (60-180 BPM).
Pitch Adjustment: Shift the pitch up or down (-12 to +12 semitones).
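
To make the ranges above concrete, here is a minimal sketch of how these settings could be validated and turned into numbers a synthesizer can use. The function names are hypothetical and the ranges are assumed to be inclusive. The only fixed facts are the arithmetic: in equal temperament a shift of n semitones multiplies frequency by 2^(n/12), and one beat at a given tempo lasts 60 / BPM seconds.

```python
VOICE_TYPES = ("neutral", "feminine", "masculine")
TEMPO_RANGE = (60, 180)   # BPM, assumed inclusive
PITCH_RANGE = (-12, 12)   # semitones, assumed inclusive


def validate_settings(voice_type, tempo_bpm, pitch_semitones):
    """Raise ValueError if a setting falls outside the documented options."""
    if voice_type not in VOICE_TYPES:
        raise ValueError(f"voice_type must be one of {VOICE_TYPES}")
    if not TEMPO_RANGE[0] <= tempo_bpm <= TEMPO_RANGE[1]:
        raise ValueError(f"tempo_bpm must be within {TEMPO_RANGE}")
    if not PITCH_RANGE[0] <= pitch_semitones <= PITCH_RANGE[1]:
        raise ValueError(f"pitch_semitones must be within {PITCH_RANGE}")


def pitch_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def beat_seconds(tempo_bpm):
    """Duration of one beat, in seconds, at the given tempo."""
    return 60.0 / tempo_bpm
```

For example, the maximum shift of +12 semitones doubles every frequency (one octave up), and the minimum tempo of 60 BPM gives one beat per second.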

Project Structure

.
├── app.py                   # Main application file with Gradio interface
├── text_processor.py        # Text analysis and phonetic processing
├── voice_synthesizer.py     # Speech synthesis module
├── singing_converter.py     # Speech-to-singing conversion
├── music_generator.py       # Musical accompaniment generation
├── setup.py                 # Setup script for speaker embeddings
├── requirements.txt         # Python dependencies
└── speaker_embeddings/      # Directory for speaker embedding files

Dependencies

torch & torchaudio: For neural network models
transformers: For speech synthesis
gradio: For web interface
librosa & soundfile: For audio processing
text2emotion: For emotion detection
music21: For music generation
nltk: For natural language processing
phonemizer: For phonetic transcription
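
Before launching the app, it can help to confirm that every package listed above is importable. The following sketch uses only the standard library; `missing_modules` is a hypothetical helper, and the module names are assumed to match the import names of the listed packages.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the dependencies listed above.
REQUIRED_MODULES = [
    "torch", "torchaudio", "transformers", "gradio", "librosa",
    "soundfile", "text2emotion", "music21", "nltk", "phonemizer",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is installed; otherwise the returned names are the packages to install with `pip install -r requirements.txt`.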

Future Improvements

Integration with more advanced DiffSinger models
Fine-tuning on singing voice datasets
Support for different musical styles
Multi-language support
Voice cloning capabilities

License

MIT License

Acknowledgments

This project builds upon various open-source projects and research, including:

DiffSinger
SpeechT5
Music21
Gradio