Spaces: zoya-hammadk/finsight_psx
zoya-hammad committed on Oct 21, 2025

Commit c61fb2d

1 Parent(s): 8ac9730

-

Files changed (4)

README.md +1 -1
VIBEVOICE_UPGRADE_GUIDE.md +0 -366
fintech_project/pages/analysis_chatbot.py +20 -47
requirements.txt +2 -9

README.md CHANGED

@@ -16,7 +16,7 @@ A comprehensive analysis platform for the Pakistan Stock Exchange (PSX) with AI-
 - **Analysis Chatbot**: Ask questions about historical PSX data using RAG (Retrieval Augmented Generation)
   - 🎤 **Voice Input**: Speak your queries using speech-to-text (OpenAI Whisper API)
-  - 🎙️ **Podcast Generation**: Convert answers to audio podcasts (Microsoft VibeVoice TTS)
+  - 🎙️ **Podcast Generation**: Convert answers to audio podcasts (OpenAI TTS)
 - **Current Trends**: Real-time market sentiment analysis and live stock price trends
   - 📊 **Excel Export**: Download reports as spreadsheet with multiple sheets
 - **Urdu Translation**: Translate analysis results to Urdu (OpenRouter free model)

VIBEVOICE_UPGRADE_GUIDE.md DELETED

@@ -1,366 +0,0 @@
-# VibeVoice TTS - Upgrade Guide
-**Date:** October 21, 2025
-**Issue:** VibeVoice model requires latest transformers version
-**Solution:** Upgrade transformers to version 4.48.0 or higher
----
-## Quick Fix
-Run this command to upgrade transformers and install dependencies:
-```bash
-pip install --upgrade transformers>=4.48.0
-pip install torch accelerate sentencepiece scipy
-```
-Or install all requirements at once:
-```bash
-pip install -r requirements.txt
-```
----
-## What Changed
-### Updated Files
-1. **`requirements.txt`**
-   - Added `transformers>=4.48.0` (instead of generic `transformers`)
-   - Re-added all TTS dependencies: `torch`, `accelerate`, `sentencepiece`, `scipy`
-2. **`analysis_chatbot.py`**
-   - Added `trust_remote_code=True` parameter to pipeline
-   - Added transformers version debugging output
-   - Improved error messages with helpful tips
----
-## Installation Steps
-### For Local Development
-```bash
-# Option 1: Upgrade existing installation
-pip install --upgrade transformers>=4.48.0
-# Option 2: Fresh install from requirements
-pip install -r requirements.txt
-# Option 3: Install from latest transformers source (if still not working)
-pip install git+https://github.com/huggingface/transformers.git
-```
-### For HuggingFace Spaces
-The updated `requirements.txt` will automatically install the latest transformers version when you push to HuggingFace Spaces. No manual action needed.
-```bash
-git add requirements.txt fintech_project/pages/analysis_chatbot.py
-git commit -m "Update transformers for VibeVoice support"
-git push
-```
----
-## Why This Should Work
-### VibeVoice Model Details
-- **Model:** `microsoft/VibeVoice-1.5B`
-- **Released:** Recently (late 2024/early 2025)
-- **Requires:** Transformers 4.48.0+
-- **Feature:** `trust_remote_code=True` enables loading of newer model architectures
-### Key Changes in Code
-**Before:**
-```python
-pipe = pipeline("text-to-speech", model="microsoft/VibeVoice-1.5B")
-```
-**After:**
-```python
-pipe = pipeline(
-    "text-to-speech",
-    model="microsoft/VibeVoice-1.5B",
-    trust_remote_code=True  # ← This is crucial!
-)
-```
-The `trust_remote_code=True` parameter allows transformers to load custom model code that ships with the model on HuggingFace, which is necessary for very new architectures.
----
-## Testing
-### Test 1: Check Transformers Version
-```python
-import transformers
-print(transformers.__version__)
-# Should show: 4.48.0 or higher
-```
-### Test 2: Try Loading Model
-```python
-from transformers import pipeline
-pipe = pipeline(
-    "text-to-speech",
-    model="microsoft/VibeVoice-1.5B",
-    trust_remote_code=True
-)
-print("✓ Model loaded successfully!")
-```
-### Test 3: Generate Audio
-```python
-result = pipe("Hello, this is a test.", max_length=1000)
-print(f"✓ Generated audio shape: {result['audio'].shape}")
-```
-### Test 4: Full Podcast Test
-1. Run your Streamlit app:
-   ```bash
-   streamlit run fintech_project/app.py
-   ```
-2. Navigate to Analysis Chatbot
-3. Generate an answer
-4. Click "🎙️ Generate Podcast"
-5. Should work without errors!
----
-## Troubleshooting
-### Issue 1: Still Getting "vibevoice not recognized"
-**Solution A:** Force upgrade transformers
-```bash
-pip install --upgrade --force-reinstall transformers>=4.48.0
-```
-**Solution B:** Install from source (bleeding edge)
-```bash
-pip install git+https://github.com/huggingface/transformers.git
-```
-**Solution C:** Check Python version
-- VibeVoice requires Python 3.8+
-- Recommended: Python 3.10 or 3.11
-### Issue 2: CUDA/GPU Errors
-**If you see CUDA errors but don't have a GPU:**
-```bash
-# Install CPU-only PyTorch
-pip install torch --index-url https://download.pytorch.org/whl/cpu
-```
-**Note:** VibeVoice will work on CPU, just slower (30-60s per podcast instead of 10-20s).
-### Issue 3: Memory Errors
-**Solution:** VibeVoice-1.5B needs ~6GB RAM
-- On local machine: Close other applications
-- On HuggingFace Spaces: Model loads on first use and is cached
-- If still issues: Consider using smaller TTS model
-### Issue 4: "trust_remote_code" Warning
-**Warning message:**
-```
-You are using a model with `trust_remote_code=True`. This can execute arbitrary code.
-```
-**This is normal and safe for official Microsoft models.**
-To suppress the warning:
-```bash
-export TRANSFORMERS_NO_ADVISORY_WARNINGS=1
-```
----
-## Alternative TTS Models (If Still Not Working)
-If VibeVoice still doesn't work after upgrading, here are proven alternatives:
-### Option 1: OpenAI TTS (Recommended)
-- **Cost:** ~$0.015/1K chars
-- **Quality:** Excellent
-- **Speed:** Fast (10-20s)
-- **Code:** Already implemented (see `PODCAST_FIX.md`)
-### Option 2: Coqui TTS (Open Source)
-```bash
-pip install TTS
-```
-```python
-from TTS.api import TTS
-tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
-tts.tts_to_file(text="Hello", file_path="output.wav")
-```
-### Option 3: gTTS (Simple, Free)
-```bash
-pip install gTTS
-```
-```python
-from gtts import gTTS
-tts = gTTS("Hello, this is a test", lang='en')
-tts.save("output.mp3")
-```
-**Note:** gTTS is much simpler but lower quality than VibeVoice.
----
-## Performance Comparison
-| Model | Quality | Speed (CPU) | Cost | Setup |
-|-------|---------|-------------|------|-------|
-| **VibeVoice-1.5B** | Excellent | 30-60s | Free | Complex |
-| **OpenAI TTS** | Excellent | 10-20s | ~$0.02 | Simple |
-| **Coqui TTS** | Good | 15-30s | Free | Medium |
-| **gTTS** | Basic | 5-10s | Free | Very Simple |
----
-## Recommended Approach
-### Step 1: Try VibeVoice with Upgrade
-```bash
-pip install --upgrade transformers>=4.48.0
-streamlit run fintech_project/app.py
-```
-### Step 2: If Not Working, Use OpenAI TTS
-- Refer to `PODCAST_FIX.md` for OpenAI implementation
-- Already coded and tested
-- More reliable for production
-### Step 3: For Cost Savings (Future)
-- Use VibeVoice once it's stable
-- Or use Coqui TTS for open-source solution
----
-## HuggingFace Spaces Deployment
-### Current Setup
-Your `requirements.txt` now has:
-```txt
-transformers>=4.48.0
-torch
-accelerate
-sentencepiece
-scipy
-```
-### Expected Behavior on Spaces
-1. **First deployment:** Will install transformers 4.48.0+
-2. **Model loading:** First user triggers download (~1.5GB)
-3. **Cached:** Subsequent uses are fast
-4. **Works with:** `trust_remote_code=True` parameter
-### Build Time
-- **Normal:** 10-15 minutes (torch installation)
-- **With GPU:** May take longer
-- **Model download:** First use only (~5 minutes)
----
-## Docker Considerations
-If using Docker, your Dockerfile should have:
-```dockerfile
-# Install PyTorch CPU version to reduce image size
-RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
-# Install other requirements
-RUN pip install -r requirements.txt
-```
-**Image size impact:**
-- Base + requirements: ~4GB
-- With CPU PyTorch: ~6GB
-- With GPU PyTorch: ~10GB
----
-## Security Note
-The `trust_remote_code=True` parameter allows the model to execute custom code. This is safe for:
-✅ Official models from Microsoft, Meta, Google
-✅ Well-known model repositories
-✅ Models with many downloads/stars
-⚠️ Use caution with unknown/untested models
-For VibeVoice from Microsoft: **Completely safe**
----
-## Summary
-### ✅ What We Did
-1. Updated `requirements.txt` to specify `transformers>=4.48.0`
-2. Added `trust_remote_code=True` to model loading
-3. Kept all VibeVoice functionality (free, open-source)
-4. Added better error messages and debugging
-### 🚀 Next Steps
-1. **Install/upgrade transformers:**
-   ```bash
-   pip install --upgrade transformers>=4.48.0
-   ```
-2. **Test locally:**
-   ```bash
-   streamlit run fintech_project/app.py
-   ```
-3. **Deploy to HuggingFace Spaces:**
-   ```bash
-   git add -A
-   git commit -m "Update transformers for VibeVoice support"
-   git push
-   ```
-### 📊 Expected Results
-✅ VibeVoice loads successfully
-✅ Podcast generation works
-✅ Free and open-source
-✅ High-quality audio output
-✅ No API costs
----
-## Fallback Plan
-If upgrading transformers still doesn't work, you have a ready-to-use OpenAI TTS implementation documented in `PODCAST_FIX.md`. Just uncomment that code and you're good to go!
----
-**Version:** 2.4
-**Last Updated:** October 21, 2025
-**Status:** 🔄 Testing with latest transformers

fintech_project/pages/analysis_chatbot.py CHANGED

@@ -303,30 +303,11 @@ def transcribe_audio(audio_bytes) -> str:
     except Exception as e:
         raise RuntimeError(f"Transcription failed: {e}")
-@st.cache_resource
-def load_tts_pipeline():
-    """Load VibeVoice text-to-speech model with latest transformers"""
-    try:
-        from transformers import pipeline
-        import transformers
-        # Show transformers version for debugging
-        print(f"Transformers version: {transformers.__version__}")
-        # Load the pipeline
-        pipe = pipeline(
-            "text-to-speech",
-            model="microsoft/VibeVoice-1.5B",
-            trust_remote_code=True  # Required for newer models
-        )
-        return pipe
-    except Exception as e:
-        st.error(f"Failed to load TTS model: {e}")
-        st.info("💡 Tip: Try upgrading transformers with: pip install --upgrade transformers")
-        return None
 def generate_podcast(question: str, answer: str) -> bytes:
-    """Generate podcast audio from question and answer using VibeVoice TTS"""
+    """Generate podcast audio from question and answer using OpenAI TTS"""
+    if client is None:
+        raise RuntimeError("OPENAI_API_KEY not set.")
     # Create podcast script
     podcast_script = f"""Welcome to FinSight PSX Insights.
@@ -352,27 +333,19 @@ Thank you for listening to FinSight PSX Insights."""
         except Exception as e:
             st.warning(f"Could not enhance podcast script: {e}. Using basic format.")
-    # Generate audio using VibeVoice
-    tts_pipe = load_tts_pipeline()
-    if tts_pipe:
-        try:
-            # Generate speech
-            result = tts_pipe(podcast_script, max_length=1000)
-            # Convert to bytes for download
-            audio_array = result["audio"]
-            # Save as WAV format
-            import scipy.io.wavfile as wav
-            buffer = BytesIO()
-            sample_rate = result.get("sampling_rate", 16000)
-            wav.write(buffer, sample_rate, audio_array)
-            buffer.seek(0)
-            return buffer.getvalue()
-        except Exception as e:
-            raise RuntimeError(f"Podcast generation failed: {e}")
-    else:
-        raise RuntimeError("TTS model not available. Please check transformers installation.")
+    # Generate audio using OpenAI TTS
+    try:
+        # Use OpenAI's TTS API with the 'alloy' voice
+        response = client.audio.speech.create(
+            model="tts-1",  # Standard quality (use "tts-1-hd" for higher quality)
+            voice="alloy",  # Options: alloy, echo, fable, onyx, nova, shimmer
+            input=podcast_script
+        )
+        # Return the audio bytes
+        return response.content
+    except Exception as e:
+        raise RuntimeError(f"Podcast generation failed: {e}")
 # -------------------------------------------
 # STREAMLIT UI
@@ -506,12 +479,12 @@ if st.session_state.get("last_answer"):
         # Show podcast player if generated
         if st.session_state.get("podcast_audio"):
             st.markdown("### 🎧 Generated Podcast")
-            st.audio(st.session_state["podcast_audio"], format="audio/wav")
+            st.audio(st.session_state["podcast_audio"], format="audio/mp3")
             st.download_button(
                 "📥 Download Podcast",
                 data=st.session_state["podcast_audio"],
-                file_name="finsight_podcast.wav",
-                mime="audio/wav",
+                file_name="finsight_podcast.mp3",
+                mime="audio/mpeg",
                 use_container_width=True
             )
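
Note on the change above: the new TTS path in `generate_podcast` can be isolated into a small helper that is testable without Streamlit or network access. The following is a minimal sketch, not the code in this commit. The function name `synthesize_podcast` and its `client` parameter are hypothetical; the sketch assumes `client` behaves like an `openai.OpenAI` instance, and that the speech endpoint returns MP3 by default, which would explain the switch from `.wav`/`audio/wav` to `.mp3`/`audio/mpeg` in the UI code.

```python
def synthesize_podcast(client, script: str, model: str = "tts-1", voice: str = "alloy") -> bytes:
    """Return spoken audio for `script` as raw bytes.

    Hypothetical helper: `client` is assumed to expose
    `audio.speech.create(...)` like an `openai.OpenAI` instance.
    """
    # Mirror the commit's guard: fail early when no API client was configured.
    if client is None:
        raise RuntimeError("OPENAI_API_KEY not set.")
    try:
        response = client.audio.speech.create(
            model=model,   # "tts-1" as in the commit; "tts-1-hd" for higher quality
            voice=voice,   # one of the voices listed in the commit's comment
            input=script,
        )
        # The response body holds the encoded audio (assumed MP3 by default).
        return response.content
    except Exception as e:
        # Keep the original cause attached for easier debugging.
        raise RuntimeError(f"Podcast generation failed: {e}") from e


# Usage (assumes the `openai` package is installed and OPENAI_API_KEY is set):
# from openai import OpenAI
# audio = synthesize_podcast(OpenAI(), "Welcome to FinSight PSX Insights.")
# open("finsight_podcast.mp3", "wb").write(audio)
```

Passing the client in as an argument, rather than reading a module-level `client`, is a design choice made for this sketch only; it lets a stub object stand in for the API during tests.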

requirements.txt CHANGED

@@ -22,12 +22,5 @@ pyarrow
 tqdm
 openpyxl
-# Audio processing
-streamlit-audiorec
-scipy
-# Text-to-speech (latest version for VibeVoice support)
-transformers>=4.48.0
-torch
-accelerate
-sentencepiece
+# Audio processing (for voice input)
+streamlit-audiorec