zoya-hammad committed on
Commit c61fb2d · 1 Parent(s): 8ac9730
README.md CHANGED
@@ -16,7 +16,7 @@ A comprehensive analysis platform for the Pakistan Stock Exchange (PSX) with AI-
16
 
17
  - **Analysis Chatbot**: Ask questions about historical PSX data using RAG (Retrieval Augmented Generation)
18
  - 🎤 **Voice Input**: Speak your queries using speech-to-text (OpenAI Whisper API)
19
- - 🎙️ **Podcast Generation**: Convert answers to audio podcasts (Microsoft VibeVoice TTS)
20
  - **Current Trends**: Real-time market sentiment analysis and live stock price trends
21
  - 📊 **Excel Export**: Download reports as spreadsheet with multiple sheets
22
  - **Urdu Translation**: Translate analysis results to Urdu (OpenRouter free model)
 
16
 
17
  - **Analysis Chatbot**: Ask questions about historical PSX data using RAG (Retrieval Augmented Generation)
18
  - 🎤 **Voice Input**: Speak your queries using speech-to-text (OpenAI Whisper API)
19
+ - 🎙️ **Podcast Generation**: Convert answers to audio podcasts (OpenAI TTS)
20
  - **Current Trends**: Real-time market sentiment analysis and live stock price trends
21
  - 📊 **Excel Export**: Download reports as spreadsheet with multiple sheets
22
  - **Urdu Translation**: Translate analysis results to Urdu (OpenRouter free model)
VIBEVOICE_UPGRADE_GUIDE.md DELETED
@@ -1,366 +0,0 @@
1
- # VibeVoice TTS - Upgrade Guide
2
-
3
- **Date:** October 21, 2025
4
- **Issue:** VibeVoice model requires latest transformers version
5
- **Solution:** Upgrade transformers to version 4.48.0 or higher
6
-
7
- ---
8
-
9
- ## Quick Fix
10
-
11
- Run this command to upgrade transformers and install dependencies:
12
-
13
- ```bash
14
- pip install --upgrade "transformers>=4.48.0"
15
- pip install torch accelerate sentencepiece scipy
16
- ```
17
-
18
- Or install all requirements at once:
19
-
20
- ```bash
21
- pip install -r requirements.txt
22
- ```
23
-
24
- ---
25
-
26
- ## What Changed
27
-
28
- ### Updated Files
29
-
30
- 1. **`requirements.txt`**
31
- - Added `transformers>=4.48.0` (instead of generic `transformers`)
32
- - Re-added all TTS dependencies: `torch`, `accelerate`, `sentencepiece`, `scipy`
33
-
34
- 2. **`analysis_chatbot.py`**
35
- - Added `trust_remote_code=True` parameter to pipeline
36
- - Added transformers version debugging output
37
- - Improved error messages with helpful tips
38
-
39
- ---
40
-
41
- ## Installation Steps
42
-
43
- ### For Local Development
44
-
45
- ```bash
46
- # Option 1: Upgrade existing installation
47
- pip install --upgrade "transformers>=4.48.0"
48
-
49
- # Option 2: Fresh install from requirements
50
- pip install -r requirements.txt
51
-
52
- # Option 3: Install from latest transformers source (if still not working)
53
- pip install git+https://github.com/huggingface/transformers.git
54
- ```
55
-
56
- ### For HuggingFace Spaces
57
-
58
- The updated `requirements.txt` will automatically install the latest transformers version when you push to HuggingFace Spaces. No manual action needed.
59
-
60
- ```bash
61
- git add requirements.txt fintech_project/pages/analysis_chatbot.py
62
- git commit -m "Update transformers for VibeVoice support"
63
- git push
64
- ```
65
-
66
- ---
67
-
68
- ## Why This Should Work
69
-
70
- ### VibeVoice Model Details
71
-
72
- - **Model:** `microsoft/VibeVoice-1.5B`
73
- - **Released:** Recently (late 2024/early 2025)
74
- - **Requires:** Transformers 4.48.0+
75
- - **Feature:** `trust_remote_code=True` enables loading of newer model architectures
76
-
77
- ### Key Changes in Code
78
-
79
- **Before:**
80
- ```python
81
- pipe = pipeline("text-to-speech", model="microsoft/VibeVoice-1.5B")
82
- ```
83
-
84
- **After:**
85
- ```python
86
- pipe = pipeline(
87
- "text-to-speech",
88
- model="microsoft/VibeVoice-1.5B",
89
- trust_remote_code=True # ← This is crucial!
90
- )
91
- ```
92
-
93
- The `trust_remote_code=True` parameter allows transformers to load custom model code that ships with the model on HuggingFace, which is necessary for very new architectures.
94
-
95
- ---
96
-
97
- ## Testing
98
-
99
- ### Test 1: Check Transformers Version
100
-
101
- ```python
102
- import transformers
103
- print(transformers.__version__)
104
- # Should show: 4.48.0 or higher
105
- ```
106
-
107
- ### Test 2: Try Loading Model
108
-
109
- ```python
110
- from transformers import pipeline
111
-
112
- pipe = pipeline(
113
- "text-to-speech",
114
- model="microsoft/VibeVoice-1.5B",
115
- trust_remote_code=True
116
- )
117
- print("✓ Model loaded successfully!")
118
- ```
119
-
120
- ### Test 3: Generate Audio
121
-
122
- ```python
123
- result = pipe("Hello, this is a test.", max_length=1000)
124
- print(f"✓ Generated audio shape: {result['audio'].shape}")
125
- ```
126
-
127
- ### Test 4: Full Podcast Test
128
-
129
- 1. Run your Streamlit app:
130
- ```bash
131
- streamlit run fintech_project/app.py
132
- ```
133
-
134
- 2. Navigate to Analysis Chatbot
135
- 3. Generate an answer
136
- 4. Click "🎙️ Generate Podcast"
137
- 5. Should work without errors!
138
-
139
- ---
140
-
141
- ## Troubleshooting
142
-
143
- ### Issue 1: Still Getting "vibevoice not recognized"
144
-
145
- **Solution A:** Force upgrade transformers
146
- ```bash
147
- pip install --upgrade --force-reinstall "transformers>=4.48.0"
148
- ```
149
-
150
- **Solution B:** Install from source (bleeding edge)
151
- ```bash
152
- pip install git+https://github.com/huggingface/transformers.git
153
- ```
154
-
155
- **Solution C:** Check Python version
156
- - VibeVoice requires Python 3.8+
157
- - Recommended: Python 3.10 or 3.11
158
-
159
- ### Issue 2: CUDA/GPU Errors
160
-
161
- **If you see CUDA errors but don't have a GPU:**
162
-
163
- ```bash
164
- # Install CPU-only PyTorch
165
- pip install torch --index-url https://download.pytorch.org/whl/cpu
166
- ```
167
-
168
- **Note:** VibeVoice will work on CPU, just slower (30-60s per podcast instead of 10-20s).
169
-
170
- ### Issue 3: Memory Errors
171
-
172
- **Solution:** VibeVoice-1.5B needs ~6GB RAM
173
-
174
- - On local machine: Close other applications
175
- - On HuggingFace Spaces: Model loads on first use and is cached
176
- - If still issues: Consider using smaller TTS model
177
-
178
- ### Issue 4: "trust_remote_code" Warning
179
-
180
- **Warning message:**
181
- ```
182
- You are using a model with `trust_remote_code=True`. This can execute arbitrary code.
183
- ```
184
-
185
- **This is normal and safe for official Microsoft models.**
186
-
187
- To suppress the warning:
188
- ```bash
189
- export TRANSFORMERS_NO_ADVISORY_WARNINGS=1
190
- ```
191
-
192
- ---
193
-
194
- ## Alternative TTS Models (If Still Not Working)
195
-
196
- If VibeVoice still doesn't work after upgrading, here are proven alternatives:
197
-
198
- ### Option 1: OpenAI TTS (Recommended)
199
- - **Cost:** ~$0.015/1K chars
200
- - **Quality:** Excellent
201
- - **Speed:** Fast (10-20s)
202
- - **Code:** Already implemented (see `PODCAST_FIX.md`)
203
-
204
- ### Option 2: Coqui TTS (Open Source)
205
- ```bash
206
- pip install TTS
207
- ```
208
- ```python
209
- from TTS.api import TTS
210
- tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
211
- tts.tts_to_file(text="Hello", file_path="output.wav")
212
- ```
213
-
214
- ### Option 3: gTTS (Simple, Free)
215
- ```bash
216
- pip install gTTS
217
- ```
218
- ```python
219
- from gtts import gTTS
220
- tts = gTTS("Hello, this is a test", lang='en')
221
- tts.save("output.mp3")
222
- ```
223
-
224
- **Note:** gTTS is much simpler but lower quality than VibeVoice.
225
-
226
- ---
227
-
228
- ## Performance Comparison
229
-
230
- | Model | Quality | Speed (CPU) | Cost | Setup |
231
- |-------|---------|-------------|------|-------|
232
- | **VibeVoice-1.5B** | Excellent | 30-60s | Free | Complex |
233
- | **OpenAI TTS** | Excellent | 10-20s | ~$0.02 | Simple |
234
- | **Coqui TTS** | Good | 15-30s | Free | Medium |
235
- | **gTTS** | Basic | 5-10s | Free | Very Simple |
236
-
237
- ---
238
-
239
- ## Recommended Approach
240
-
241
- ### Step 1: Try VibeVoice with Upgrade
242
- ```bash
243
- pip install --upgrade "transformers>=4.48.0"
244
- streamlit run fintech_project/app.py
245
- ```
246
-
247
- ### Step 2: If Not Working, Use OpenAI TTS
248
- - Refer to `PODCAST_FIX.md` for OpenAI implementation
249
- - Already coded and tested
250
- - More reliable for production
251
-
252
- ### Step 3: For Cost Savings (Future)
253
- - Use VibeVoice once it's stable
254
- - Or use Coqui TTS for open-source solution
255
-
256
- ---
257
-
258
- ## HuggingFace Spaces Deployment
259
-
260
- ### Current Setup
261
-
262
- Your `requirements.txt` now has:
263
- ```txt
264
- transformers>=4.48.0
265
- torch
266
- accelerate
267
- sentencepiece
268
- scipy
269
- ```
270
-
271
- ### Expected Behavior on Spaces
272
-
273
- 1. **First deployment:** Will install transformers 4.48.0+
274
- 2. **Model loading:** First user triggers download (~1.5GB)
275
- 3. **Cached:** Subsequent uses are fast
276
- 4. **Works with:** `trust_remote_code=True` parameter
277
-
278
- ### Build Time
279
-
280
- - **Normal:** 10-15 minutes (torch installation)
281
- - **With GPU:** May take longer
282
- - **Model download:** First use only (~5 minutes)
283
-
284
- ---
285
-
286
- ## Docker Considerations
287
-
288
- If using Docker, your Dockerfile should have:
289
-
290
- ```dockerfile
291
- # Install PyTorch CPU version to reduce image size
292
- RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
293
-
294
- # Install other requirements
295
- RUN pip install -r requirements.txt
296
- ```
297
-
298
- **Image size impact:**
299
- - Base + requirements: ~4GB
300
- - With CPU PyTorch: ~6GB
301
- - With GPU PyTorch: ~10GB
302
-
303
- ---
304
-
305
- ## Security Note
306
-
307
- The `trust_remote_code=True` parameter allows the model to execute custom code. This is safe for:
308
-
309
- ✅ Official models from Microsoft, Meta, Google
310
- ✅ Well-known model repositories
311
- ✅ Models with many downloads/stars
312
-
313
- ⚠️ Use caution with unknown/untested models
314
-
315
- For VibeVoice from Microsoft: **Completely safe**
316
-
317
- ---
318
-
319
- ## Summary
320
-
321
- ### ✅ What We Did
322
-
323
- 1. Updated `requirements.txt` to specify `transformers>=4.48.0`
324
- 2. Added `trust_remote_code=True` to model loading
325
- 3. Kept all VibeVoice functionality (free, open-source)
326
- 4. Added better error messages and debugging
327
-
328
- ### 🚀 Next Steps
329
-
330
- 1. **Install/upgrade transformers:**
331
- ```bash
332
- pip install --upgrade "transformers>=4.48.0"
333
- ```
334
-
335
- 2. **Test locally:**
336
- ```bash
337
- streamlit run fintech_project/app.py
338
- ```
339
-
340
- 3. **Deploy to HuggingFace Spaces:**
341
- ```bash
342
- git add -A
343
- git commit -m "Update transformers for VibeVoice support"
344
- git push
345
- ```
346
-
347
- ### 📊 Expected Results
348
-
349
- ✅ VibeVoice loads successfully
350
- ✅ Podcast generation works
351
- ✅ Free and open-source
352
- ✅ High-quality audio output
353
- ✅ No API costs
354
-
355
- ---
356
-
357
- ## Fallback Plan
358
-
359
- If upgrading transformers still doesn't work, you have a ready-to-use OpenAI TTS implementation documented in `PODCAST_FIX.md`. Just uncomment that code and you're good to go!
360
-
361
- ---
362
-
363
- **Version:** 2.4
364
- **Last Updated:** October 21, 2025
365
- **Status:** 🔄 Testing with latest transformers
366
-
fintech_project/pages/analysis_chatbot.py CHANGED
@@ -303,30 +303,11 @@ def transcribe_audio(audio_bytes) -> str:
303
  except Exception as e:
304
  raise RuntimeError(f"Transcription failed: {e}")
305
 
306
- @st.cache_resource
307
- def load_tts_pipeline():
308
- """Load VibeVoice text-to-speech model with latest transformers"""
309
- try:
310
- from transformers import pipeline
311
- import transformers
312
-
313
- # Show transformers version for debugging
314
- print(f"Transformers version: {transformers.__version__}")
315
-
316
- # Load the pipeline
317
- pipe = pipeline(
318
- "text-to-speech",
319
- model="microsoft/VibeVoice-1.5B",
320
- trust_remote_code=True # Required for newer models
321
- )
322
- return pipe
323
- except Exception as e:
324
- st.error(f"Failed to load TTS model: {e}")
325
- st.info("💡 Tip: Try upgrading transformers with: pip install --upgrade transformers")
326
- return None
327
-
328
  def generate_podcast(question: str, answer: str) -> bytes:
329
- """Generate podcast audio from question and answer using VibeVoice TTS"""
330
  # Create podcast script
331
  podcast_script = f"""Welcome to FinSight PSX Insights.
332
 
@@ -352,27 +333,19 @@ Thank you for listening to FinSight PSX Insights."""
352
  except Exception as e:
353
  st.warning(f"Could not enhance podcast script: {e}. Using basic format.")
354
 
355
- # Generate audio using VibeVoice
356
- tts_pipe = load_tts_pipeline()
357
- if tts_pipe:
358
- try:
359
- # Generate speech
360
- result = tts_pipe(podcast_script, max_length=1000)
361
-
362
- # Convert to bytes for download
363
- audio_array = result["audio"]
364
-
365
- # Save as WAV format
366
- import scipy.io.wavfile as wav
367
- buffer = BytesIO()
368
- sample_rate = result.get("sampling_rate", 16000)
369
- wav.write(buffer, sample_rate, audio_array)
370
- buffer.seek(0)
371
- return buffer.getvalue()
372
- except Exception as e:
373
- raise RuntimeError(f"Podcast generation failed: {e}")
374
- else:
375
- raise RuntimeError("TTS model not available. Please check transformers installation.")
376
 
377
  # -------------------------------------------
378
  # STREAMLIT UI
@@ -506,12 +479,12 @@ if st.session_state.get("last_answer"):
506
  # Show podcast player if generated
507
  if st.session_state.get("podcast_audio"):
508
  st.markdown("### 🎧 Generated Podcast")
509
- st.audio(st.session_state["podcast_audio"], format="audio/wav")
510
  st.download_button(
511
  "📥 Download Podcast",
512
  data=st.session_state["podcast_audio"],
513
- file_name="finsight_podcast.wav",
514
- mime="audio/wav",
515
  use_container_width=True
516
  )
517
 
 
303
  except Exception as e:
304
  raise RuntimeError(f"Transcription failed: {e}")
305
 
306
  def generate_podcast(question: str, answer: str) -> bytes:
307
+ """Generate podcast audio from question and answer using OpenAI TTS"""
308
+ if client is None:
309
+ raise RuntimeError("OPENAI_API_KEY not set.")
310
+
311
  # Create podcast script
312
  podcast_script = f"""Welcome to FinSight PSX Insights.
313
 
 
333
  except Exception as e:
334
  st.warning(f"Could not enhance podcast script: {e}. Using basic format.")
335
 
336
+ # Generate audio using OpenAI TTS
337
+ try:
338
+ # Use OpenAI's TTS API with the 'alloy' voice
339
+ response = client.audio.speech.create(
340
+ model="tts-1", # Standard quality (use "tts-1-hd" for higher quality)
341
+ voice="alloy", # Options: alloy, echo, fable, onyx, nova, shimmer
342
+ input=podcast_script
343
+ )
344
+
345
+ # Return the audio bytes
346
+ return response.content
347
+ except Exception as e:
348
+ raise RuntimeError(f"Podcast generation failed: {e}")
349
 
350
  # -------------------------------------------
351
  # STREAMLIT UI
 
479
  # Show podcast player if generated
480
  if st.session_state.get("podcast_audio"):
481
  st.markdown("### 🎧 Generated Podcast")
482
+ st.audio(st.session_state["podcast_audio"], format="audio/mp3")
483
  st.download_button(
484
  "📥 Download Podcast",
485
  data=st.session_state["podcast_audio"],
486
+ file_name="finsight_podcast.mp3",
487
+ mime="audio/mpeg",
488
  use_container_width=True
489
  )
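
The hunk above changes the audio type in three hard-coded places (`format`, `file_name`, `mime`), and all three have to move together whenever the TTS backend changes. One way to avoid that coupling, shown here only as a sketch, is to derive the type from the bytes themselves. `sniff_audio` is a hypothetical helper that is not part of this commit. It covers only the two formats this diff mentions, and the MPEG frame-sync check is a heuristic that can also match other MPEG-style streams.

```python
from typing import Tuple


def sniff_audio(data: bytes) -> Tuple[str, str]:
    """Return (mime_type, file_extension) guessed from the first bytes of `data`."""
    # WAV: RIFF container whose form type at offset 8 is "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav", "wav"
    # MP3 with an ID3v2 tag at the start of the file.
    if data[:3] == b"ID3":
        return "audio/mpeg", "mp3"
    # Bare MPEG audio frame: the first 11 bits are all set (frame sync).
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "audio/mpeg", "mp3"
    raise ValueError("Unrecognized audio format")
```

Under these assumptions the UI code could call `mime, ext = sniff_audio(st.session_state["podcast_audio"])` once and pass `mime` to both `st.audio` and `st.download_button`, building the file name from `ext`.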
490
 
requirements.txt CHANGED
@@ -22,12 +22,5 @@ pyarrow
22
  tqdm
23
  openpyxl
24
 
25
- # Audio processing
26
- streamlit-audiorec
27
- scipy
28
-
29
- # Text-to-speech (latest version for VibeVoice support)
30
- transformers>=4.48.0
31
- torch
32
- accelerate
33
- sentencepiece
 
22
  tqdm
23
  openpyxl
24
 
25
+ # Audio processing (for voice input)
26
+ streamlit-audiorec
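
With the local model dependencies gone, `generate_podcast` now sends the whole script in a single `client.audio.speech.create` request. OpenAI's documentation has listed a 4096-character cap on `input` for this endpoint. Treat that figure as an assumption and check it against the current docs. A long, LLM-enhanced script could exceed the cap, so the following is a minimal sketch of splitting the script at sentence boundaries and joining the returned bytes. `split_script` and `synthesize` are hypothetical helpers, not part of this commit. Concatenating MP3 segments byte-for-byte usually plays back in common players but is not guaranteed to be gapless.

```python
import re
from typing import List

MAX_TTS_CHARS = 4096  # assumed per-request input cap; verify against current OpenAI docs


def split_script(script: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize(client, script: str, model: str = "tts-1", voice: str = "alloy") -> bytes:
    """Call the speech endpoint once per chunk and join the audio bytes."""
    parts = []
    for chunk in split_script(script):
        response = client.audio.speech.create(model=model, voice=voice, input=chunk)
        parts.append(response.content)
    return b"".join(parts)
```

Here `client` is expected to be the same OpenAI client object the module already uses. The `model` and `voice` defaults mirror the values in the diff.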