siddhartharyaai committed · Commit e2dc845 · verified · 1 Parent(s): bd4988a

Update README.md

  1. README.md +128 -91
README.md CHANGED
@@ -9,128 +9,165 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- # 🎙 MyPod - AI Based Podcast Generator
13
 
14
- Welcome to **MyPod**, your go-to AI-powered podcast generator! 🎉
15
-
16
- **MyPod** transforms your documents, webpages, YouTube videos, or researched topics into a more human-sounding, conversational podcast. Select a tone and a duration range, and let **MyPod** generate an engaging podcast tailored to your preferences.
17
-
18
- ## 🚀 **Features**
19
 
20
- 1. **Multiple Input Sources:**
21
- - **Upload PDF:** Convert your PDF documents into podcasts.
22
- - **Enter URL:** Convert the content of any webpage into a podcast.
23
- - **YouTube Link:** Transcribe and convert YouTube videos into podcasts *(Requires User Authentication - Work in Progress)*.
24
- - **Research a Topic:** Provide a detailed topic, and **MyPod** will research and generate a podcast based on the latest information.
25
 
26
- 2. **Customizable Tone and Length:**
27
- - **Tone Options:** Choose from Humorous, Formal, Casual, or Youthful to set the desired tone of your podcast.
28
- - **Duration Range:** Select from 1-3 Minutes, 3-5 Minutes, 5-10 Minutes, or 10-20 Minutes to specify the length of your podcast.
29
 
30
- 3. **Enhanced Research Capability:**
31
- - Utilizes Wikipedia and various News RSS feeds to gather relevant information.
32
- - Implements a fallback mechanism to leverage the LLM's (Groq API) knowledge base if primary sources do not provide sufficient information.
 
 
33
 
34
- 4. **Natural-Sounding Voices:**
35
- - **Jane and Emma:** Two distinct voices are used to create a natural and engaging conversational flow.
36
- - **Optimized Parameters:** Adjusted speed and pitch settings to enhance the naturalness without compromising clarity.
37
 
38
- 5. **Engaging Podcast Scripts:**
39
- - Dynamic and varied introductions and dialogues.
40
- - Incorporates storytelling techniques, analogies, and thought-provoking questions to captivate listeners.
 
 
41
 
42
- ## 📦 **Installation**
 
43
 
44
- To set up and run **MyPod** locally, follow these steps:
45
 
46
- 1. **Clone the Repository:**
47
 
48
- ```bash
49
- git clone https://github.com/yourusername/mypod.git
50
- cd mypod
51
- Create a Virtual Environment:
52
 
 
 
 
 
53
  bash
54
  Copy code
55
- python3 -m venv venv
56
- source venv/bin/activate # On Windows: venv\Scripts\activate
57
- Install Dependencies:
 
58
 
59
  bash
60
  Copy code
61
  pip install -r requirements.txt
62
- Set Up Environment Variables:
63
-
64
- Ensure you have a valid Groq API key. Set it as an environment variable:
65
-
66
- bash
67
- Copy code
68
- export GROQ_API_KEY='your_groq_api_key_here' # On Windows: set GROQ_API_KEY=your_groq_api_key_here
69
- Run the Application:
70
 
71
  bash
72
  Copy code
73
  python app.py
74
- The Gradio interface will launch, and you can access MyPod via the provided URL.
75
 
76
- 🛠 Usage Instructions
77
- ⏳ Please be patient while your podcast is being generated.
78
- This process involves content analysis, script creation, and high-quality audio synthesis, which may take a few minutes.
79
 
80
- Provide an Input Source:
81
-
82
- Upload PDF: Click to upload a PDF file from your device.
83
- Enter URL: Input the URL of a webpage you wish to convert into a podcast.
84
- YouTube Link: (Work in Progress) Enter a YouTube video URL to transcribe and convert into a podcast.
85
- Research a Topic: Enter a detailed topic to research and convert into a podcast.
86
  Select Tone and Duration:
87
 
88
  Tone: Choose from Humorous, Formal, Casual, or Youthful.
89
  Length: Select the desired duration range for your podcast.
90
  Generate Podcast:
91
 
92
- Click the "Submit" button to generate your podcast.
93
- The generated podcast will be available for download, along with the transcript.
94
- 📈 Technical Details
95
  Backend Technologies:
96
 
97
- Groq API: Utilized for generating engaging podcast scripts based on input texts.
98
- TTS (Text-to-Speech): Uses the tts_models/en/ljspeech/glow-tts model for generating speech.
99
- Whisper ASR: Implements OpenAI's Whisper model for transcribing YouTube videos.
100
- Web Scraping: Employs BeautifulSoup to extract text from webpages and RSS feeds.
101
- Audio Processing: Uses Pydub for audio manipulation and processing.
102
- Key Enhancements:
103
-
104
- Research Sufficiency Check: Ensures that the information gathered is comprehensive enough to generate a meaningful podcast. If primary sources are insufficient, it leverages the LLM's knowledge base to supplement the data.
105
- Dynamic System Prompt: Guides the LLM to create more engaging and dynamic podcast scripts, avoiding repetitive introductions and ensuring a natural conversational flow.
106
- Optimized TTS Parameters: Adjusted speed and pitch settings to make the voices sound more natural and human-like without introducing artificial pauses or breaks.
107
- 🧪 Testing and Feedback
108
- Generate a Podcast:
109
-
110
- Provide a detailed input (e.g., a comprehensive topic or a well-structured PDF).
111
- Select your desired tone and duration.
112
- Submit to generate the podcast.
113
- Assess Output Quality:
114
-
115
- Transcript: Review the transcript for accuracy and engagement.
116
- Audio Quality: Listen to the podcast to evaluate the naturalness of the voices and the overall flow of the conversation.
117
- Iterate on TTS Settings:
118
-
119
- If the voices still sound synthetic, consider experimenting with different speed_factor and semitones values in the utils.py file under the generate_audio_wav function.
120
- Example adjustments:
121
- Test Case 1: semitones = -2, speed_factor = 1.2
122
- Test Case 2: semitones = -3, speed_factor = 1.25 (Current Settings)
123
- Test Case 3: semitones = -4, speed_factor = 1.3
124
- Provide Feedback:
125
-
126
- Share your experiences and any suggestions for further improvements to help enhance MyPod.
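The `semitones` and `speed_factor` values listed under "Iterate on TTS Settings" can be related to concrete playback changes. The sketch below is only an illustration, not the code in `utils.py`: it assumes a pitch shift of n semitones scales frequency by 2^(n/12) (equal temperament) and that `speed_factor` is a plain tempo multiplier. The function names `pitch_ratio` and `adjusted_duration` are hypothetical.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def adjusted_duration(seconds: float, speed_factor: float) -> float:
    """Clip length when played `speed_factor` times faster (assumed meaning)."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return seconds / speed_factor


# The three test cases listed in the README.
for semitones, speed in [(-2, 1.2), (-3, 1.25), (-4, 1.3)]:
    print(
        f"semitones={semitones}: pitch x{pitch_ratio(semitones):.3f}, "
        f"60 s clip -> {adjusted_duration(60, speed):.1f} s"
    )
```

Printing the ratios side by side makes it easier to judge how far each test case moves away from the unmodified voice before listening to the result.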
127
- 📝 Contributing
128
- Contributions are welcome! Please fork the repository and submit a pull request with your enhancements or bug fixes.
129
-
130
- 📜 License
131
  This project is licensed under the MIT License.
132
 
133
- 📧 Contact
134
- For any questions or support, please contact siddhartharya@gmail.com.
135
 
136
- Launch MyPod and start transforming your content into engaging podcasts today!
 
9
  pinned: false
10
  ---
11
 
 
12
 
13
+ Thought for a few seconds
14
+ markdown
15
+ Copy code
16
+ # 🎙️ MyPod - AI Based Podcast Generator
 
17
 
18
+ Welcome to **MyPod**, your go-to AI-powered podcast generator! 🚀 Whether you have documents, webpages, YouTube videos, or topics you'd like to explore, MyPod transforms your content into engaging, conversational podcasts with ease.
 
 
 
 
19
 
20
+ ## 🌟 Features
 
 
21
 
22
+ - **Multiple Input Sources:**
23
+ - **Upload PDF:** Convert your PDF documents into podcasts.
24
+ - **Enter URL:** Transform the content of any webpage into a podcast.
25
+ - **YouTube Link (Requires User Auth - Work in Progress):** Transcribe and convert YouTube videos into podcasts.
26
+ - **Research a Topic:** Provide a detailed topic statement to generate a podcast based on researched information.
27
 
28
+ - **Customizable Output:**
29
+ - **Tone Selection:** Choose from Humorous, Formal, Casual, or Youthful tones to match your desired podcast style.
30
+ - **Duration Options:** Select the length of your podcast ranging from 1-3 minutes to 10-20 minutes.
31
 
32
+ - **Automated Pronunciation Handling:**
33
+ - **Abbreviation Splitting:** Automatically splits abbreviations and concatenated words for accurate pronunciation.
34
+
35
+ - **Distinct Speaker Voices:**
36
+ - **Jane & Emma:** Enjoy a conversation between two distinct voices: Jane with a natural tone and Emma with a deeper, richer voice.
37
 
38
+ - **Transcript Generation:**
39
+ - Receive a markdown-formatted transcript alongside your podcast audio.
40
 
41
+ ## 📦 Installation
42
 
43
+ Follow these steps to set up and run MyPod on your local machine:
44
 
45
+ ### 1. Clone the Repository
 
 
 
46
 
47
+ ```bash
48
+ git clone https://github.com/yourusername/mypod.git
49
+ cd mypod
50
+ 2. Create a Virtual Environment (Optional but Recommended)
51
  bash
52
  Copy code
53
+ python -m venv mypod_env
54
+ source mypod_env/bin/activate # On Windows: mypod_env\Scripts\activate
55
+ 3. Install Dependencies
56
+ Ensure you have Python 3.7 or higher installed. Then, install the required packages:
57
 
58
  bash
59
  Copy code
60
  pip install -r requirements.txt
61
+ 🚀 Usage
62
+ Launch the Gradio interface to start generating your podcasts:
 
 
 
 
 
 
63
 
64
  bash
65
  Copy code
66
  python app.py
67
+ This will start a local web server and provide a URL (e.g., http://127.0.0.1:7860) where you can interact with MyPod.
68
 
69
+ 📝 How to Use
70
+ Choose Your Input Source:
 
71
 
72
+ Upload PDF: Click on "Upload PDF" and select your PDF document.
73
+ Enter URL: Input the URL of the webpage you want to convert into a podcast.
74
+ Enter YouTube Link: Provide the YouTube video URL (Note: Requires User Auth - Work in Progress).
75
+ Research a Topic: Enter a detailed, specific topic statement. For very niche topics, the output may be less comprehensive.
 
 
76
  Select Tone and Duration:
77
 
78
  Tone: Choose from Humorous, Formal, Casual, or Youthful.
79
  Length: Select the desired duration range for your podcast.
80
  Generate Podcast:
81
 
82
+ Click on the "Submit" button.
83
+ Wait for the processing to complete. YouTube transcriptions may take longer due to processing requirements.
84
+ Download Your Podcast:
85
+
86
+ Once generated, download the podcast audio and view the transcript.
87
+ 📚 Input Sources Explained
88
+ 1. Upload PDF
89
+ Purpose: Convert the text content of a PDF document into an audio podcast.
90
+ Supported Formats: Only .pdf files are accepted.
91
+ 2. Enter URL
92
+ Purpose: Extract and convert the textual content of any webpage into a podcast.
93
+ Supported Content: Most standard webpages with readable text content.
94
+ 3. Enter YouTube Link (Requires User Auth - Work in Progress)
95
+ Purpose: Transcribe the audio from a YouTube video and convert it into a podcast.
96
+ Note: This feature is currently a work in progress and requires user authentication to access certain videos.
97
+ 4. Research a Topic
98
+ Purpose: Generate a podcast based on researched information from reputable sources.
99
+ Recommendation: Provide a detailed and specific topic statement to achieve the best results. Extremely niche or highly specialized topics may yield less comprehensive podcasts.
100
+ 🎨 Customization Options
101
+ Tone Selection:
102
+
103
+ Humorous: Funny and exciting, making listeners chuckle.
104
+ Formal: Business-like, well-structured, and professional.
105
+ Casual: Like a relaxed conversation between close friends.
106
+ Youthful: Energetic and lively, similar to how teenagers might chat.
107
+ Duration Selection:
108
+
109
+ 1-3 Minutes: Approximately 200-450 words.
110
+ 3-5 Minutes: Approximately 450-750 words.
111
+ 5-10 Minutes: Approximately 750-1500 words.
112
+ 10-20 Minutes: Approximately 1500-3000 words.
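The duration-to-word-count mapping above can be written down as a small lookup table. This is a minimal sketch, not code taken from `app.py` or `utils.py`: the names `WORD_RANGES`, `word_budget`, and `fits_budget` are hypothetical, and only the numbers come from the list above. The upper bound of each range works out to 150 words per minute (450/3, 750/5, 1500/10, 3000/20).

```python
from typing import Tuple

# Word ranges copied from the Duration Selection list in the README.
WORD_RANGES = {
    "1-3 Minutes": (200, 450),
    "3-5 Minutes": (450, 750),
    "5-10 Minutes": (750, 1500),
    "10-20 Minutes": (1500, 3000),
}


def word_budget(duration_label: str) -> Tuple[int, int]:
    """Return the (min_words, max_words) range for a duration label."""
    try:
        return WORD_RANGES[duration_label]
    except KeyError:
        raise ValueError(f"Unknown duration: {duration_label!r}") from None


def fits_budget(script: str, duration_label: str) -> bool:
    """Check whether a script's word count falls inside the range."""
    low, high = word_budget(duration_label)
    return low <= len(script.split()) <= high


print(word_budget("5-10 Minutes"))
```

A check like `fits_budget` could be run on a generated script before audio synthesis, so an over-long or too-short script is caught before the slower text-to-speech step.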
113
+ ⚙️ Technical Details
114
  Backend Technologies:
115
 
116
+ Gradio: For building the interactive web interface.
117
+ Groq API: For generating podcast scripts.
118
+ TTS (Text-to-Speech): For converting scripts into audio.
119
+ Whisper ASR: For transcribing YouTube videos.
120
+ Pydub: For audio manipulation and processing.
121
+ Performance Optimizations:
122
+
123
+ Regex Precompilation: Combined and precompiled regex patterns to speed up abbreviation splitting.
124
+ Efficient Processing: Optimized text preprocessing to minimize podcast generation time.
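The regex precompilation mentioned above can be illustrated with a short sketch. It is an assumption about how such splitting might look, not the implementation in `utils.py`: the pattern, the helper `_expand`, and the function `split_for_tts` are all hypothetical. The idea is to combine the rules into one pattern, compile it once at import time, and reuse it for every script.

```python
import re

# One combined pattern, compiled once at import time:
#   abbr         - a whole-word run of two or more capitals (e.g. "NASA")
#   lower/upper  - a lowercase letter directly followed by a capital,
#                  which marks a boundary in concatenated words ("OpenAI")
_SPLIT_RE = re.compile(
    r"\b(?P<abbr>[A-Z]{2,})\b"
    r"|(?P<lower>[a-z])(?P<upper>[A-Z])"
)


def _expand(match: "re.Match") -> str:
    """Space out an abbreviation, or insert a space at a word boundary."""
    if match.group("abbr"):
        return " ".join(match.group("abbr"))
    return f"{match.group('lower')} {match.group('upper')}"


def split_for_tts(text: str) -> str:
    """Rewrite text so a TTS engine reads abbreviations letter by letter."""
    return _SPLIT_RE.sub(_expand, text)


print(split_for_tts("NASA uses OpenAI"))  # N A S A uses Open AI
```

Using a single alternation means each script is scanned once rather than once per rule, which is the kind of saving the "combined and precompiled" wording suggests.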
125
+ 🛠️ Development
126
+ Repository Structure
127
+ scss
128
+ Copy code
129
+ mypod/
130
+ ├── app.py
131
+ ├── utils.py
132
+ ├── prompts.py
133
+ ├── requirements.txt
134
+ ├── README.md
135
+ └── ... (other files)
136
+ app.py: Contains the Gradio interface and main application logic.
137
+ utils.py: Handles text processing, audio generation, and other utility functions.
138
+ prompts.py: Defines the system prompt for the language model.
139
+ requirements.txt: Lists all Python dependencies.
140
+ Contributing
141
+ We welcome contributions to improve MyPod! Whether it's fixing bugs, enhancing features, or optimizing performance, your help is valuable.
142
+
143
+ Fork the Repository
144
+ Create a Feature Branch:
145
+ bash
146
+ Copy code
147
+ git checkout -b feature/YourFeature
148
+ Commit Your Changes:
149
+ bash
150
+ Copy code
151
+ git commit -m "Add some feature"
152
+ Push to the Branch:
153
+ bash
154
+ Copy code
155
+ git push origin feature/YourFeature
156
+ Open a Pull Request
157
+ Reporting Issues
158
+ If you encounter any bugs or have suggestions for improvements, please open an issue in the Issues section of the repository.
159
+
160
+ 📄 License
161
  This project is licensed under the MIT License.
162
 
163
+ 🀝 Acknowledgements
164
+ Gradio: For making it easy to build machine learning demos.
165
+ OpenAI Whisper: For powerful speech recognition capabilities.
166
+ TTS Community: For developing robust text-to-speech models.
167
+ Various RSS Feeds and Wikipedia: For providing reliable information sources.
168
+ 🎉 Get Started!
169
+ 🔥 Ready to create your personalized podcast? Give MyPod a try now and let the magic happen! 🔥
170
+
171
+ Launch MyPod and start transforming your content into engaging podcasts today!
172
 
173
+ Happy Podcasting! 🎧✨