Commit 22b1baa (verified) by Vaishnavi0404 · 1 parent: dfe9736

Create README.md

Files changed (1)
  1. README.md +105 -13
README.md CHANGED
@@ -1,13 +1,105 @@
- ---
- title: Text2Sing DiffSinger
- emoji: 📉
- colorFrom: pink
- colorTo: yellow
- sdk: gradio
- sdk_version: 5.24.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Text2Sing-DiffSinger
+
+ Convert plain text into a singing voice with music matched to the emotional content of the text.
+
+ ## Overview
+
+ Text2Sing-DiffSinger is a machine-learning system that converts ordinary text into a singing voice with musical accompaniment. It analyzes the emotional content of the text and generates singing that matches the mood, along with suitable background music.
+
+ ## Features
+
+ - Text-to-singing conversion using neural voice synthesis
+ - Emotion detection from text input
+ - Musical accompaniment generation based on detected emotions
+ - Adjustable parameters for voice type, tempo, and pitch
+ - Interactive web interface built with Gradio
+
+ ## Installation
+
+ 1. Clone this repository:
+ ```bash
+ git clone https://github.com/yourusername/Text2Sing-DiffSinger.git
+ cd Text2Sing-DiffSinger
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Set up speaker embeddings:
+ ```bash
+ python setup.py
+ ```
+
+ ## Usage
+
+ 1. Run the application:
+ ```bash
+ python app.py
+ ```
+
+ 2. Open your web browser and navigate to http://localhost:7860
+
+ 3. Enter your text, select voice options, and click "Convert to Singing"
+
+ ## How It Works
+
+ The system works in five steps:
+
+ 1. **Text Analysis**: Detects the emotional content of the input text and breaks the text down into phonemes.
+ 2. **Speech Synthesis**: Converts the text into speech using a neural text-to-speech model.
+ 3. **Singing Conversion**: Transforms the speech into singing by modifying pitch and timing and adding singing-specific effects.
+ 4. **Music Generation**: Creates musical accompaniment that matches the emotional content of the text.
+ 5. **Audio Mixing**: Combines the singing voice with the accompaniment to produce the final output.
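
The final mixing step can be illustrated with a minimal sketch. It is not taken from the repository: the function name `mix_tracks`, the `accompaniment_gain` parameter, and the choice of plain Python lists of float samples are all assumptions made for illustration. It shows one common way to combine two mono tracks: pad the shorter one with silence, sum sample by sample, and scale down only if the result would clip.

```python
def mix_tracks(vocal, accompaniment, accompaniment_gain=0.5):
    """Mix two mono tracks given as lists of float samples in [-1.0, 1.0].

    Hypothetical helper for illustration; not the project's actual mixer.
    """
    length = max(len(vocal), len(accompaniment))
    # Pad the shorter track with silence so both tracks have the same length.
    vocal = list(vocal) + [0.0] * (length - len(vocal))
    accompaniment = list(accompaniment) + [0.0] * (length - len(accompaniment))
    # Sum sample by sample, scaling the accompaniment by its gain.
    mixed = [v + accompaniment_gain * a for v, a in zip(vocal, accompaniment)]
    # Rescale only when the summed signal would exceed full scale.
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed


if __name__ == "__main__":
    print(mix_tracks([0.5, 0.5], [0.5]))  # [0.75, 0.5]
```

Rescaling only on overflow leaves quiet mixes untouched; a real implementation would more likely operate on NumPy arrays loaded with a library such as librosa or soundfile, both of which the README lists as dependencies.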
+
+ ## Adjustable Parameters
+
+ - **Voice Type**: Choose a neutral, feminine, or masculine voice.
+ - **Tempo**: Adjust the speed of the singing (60-180 BPM).
+ - **Pitch Adjustment**: Shift the pitch up or down (-12 to +12 semitones).
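
To make the tempo and pitch ranges concrete, here is a small sketch of the arithmetic behind them. The function names `semitone_ratio` and `beat_seconds` are hypothetical and not part of the project; the formulas themselves are standard: in equal temperament a shift of n semitones multiplies frequency by 2^(n/12), and one beat at a given BPM lasts 60/BPM seconds. The range checks mirror the limits stated above.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift in equal temperament.

    Hypothetical helper; the accepted range follows the README's -12 to +12.
    """
    if not -12 <= semitones <= 12:
        raise ValueError("pitch adjustment must be between -12 and +12 semitones")
    return 2.0 ** (semitones / 12.0)


def beat_seconds(bpm):
    """Duration of one beat in seconds for a tempo in the README's 60-180 BPM range."""
    if not 60 <= bpm <= 180:
        raise ValueError("tempo must be between 60 and 180 BPM")
    return 60.0 / bpm


if __name__ == "__main__":
    print(semitone_ratio(12))   # 2.0: one octave up doubles the frequency
    print(semitone_ratio(-12))  # 0.5: one octave down halves it
    print(beat_seconds(120))    # 0.5 seconds per beat
```

So the full pitch range spans one octave in each direction, and the tempo range corresponds to beats lasting between one second (60 BPM) and a third of a second (180 BPM).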
+
+ ## Project Structure
+
+ ```
+ .
+ ├── app.py                # Main application file with Gradio interface
+ ├── text_processor.py     # Text analysis and phonetic processing
+ ├── voice_synthesizer.py  # Speech synthesis module
+ ├── singing_converter.py  # Speech-to-singing conversion
+ ├── music_generator.py    # Musical accompaniment generation
+ ├── setup.py              # Setup script for speaker embeddings
+ ├── requirements.txt      # Python dependencies
+ └── speaker_embeddings/   # Directory for speaker embedding files
+ ```
+
+ ## Dependencies
+
+ - torch & torchaudio: For neural network models
+ - transformers: For speech synthesis
+ - gradio: For web interface
+ - librosa & soundfile: For audio processing
+ - text2emotion: For emotion detection
+ - music21: For music generation
+ - nltk: For natural language processing
+ - phonemizer: For phonetic transcription
+
+ ## Future Improvements
+
+ - Integration with more advanced DiffSinger models
+ - Fine-tuning on singing voice datasets
+ - Support for different musical styles
+ - Multi-language support
+ - Voice cloning capabilities
+
+ ## License
+
+ [MIT License](LICENSE)
+
+ ## Acknowledgments
+
+ This project builds on several open-source projects and research efforts, including:
+ - DiffSinger
+ - SpeechT5
+ - music21
+ - Gradio