Shantika committed on
Commit · 598efec
Parent(s): d6c8310
Upload full project to Space
- IMPLEMENTATION_NOTES.md +135 -0
- README.md +277 -13
- download_models.py +40 -0
- pipe_method3.py +572 -0
- start.sh +67 -0
- stt_llm_ttsopenai.py +636 -0
- web_demo/.gitignore +24 -0
- web_demo/README.md +16 -0
- web_demo/envdatavars.txt +1 -0
- web_demo/eslint.config.js +29 -0
- web_demo/index.html +13 -0
- web_demo/package-lock.json +0 -0
- web_demo/package.json +28 -0
- web_demo/public/vite.svg +1 -0
- web_demo/src/App.css +211 -0
- web_demo/src/App.jsx +162 -0
- web_demo/src/assets/react.svg +1 -0
- web_demo/src/components/MicrophoneTest.css +315 -0
- web_demo/src/components/MicrophoneTest.jsx +307 -0
- web_demo/src/components/SttLlmTts.css +653 -0
- web_demo/src/components/SttLlmTts.jsx +505 -0
- web_demo/src/components/TextToSpeech.css +321 -0
- web_demo/src/components/TextToSpeech.jsx +327 -0
- web_demo/src/index.css +47 -0
- web_demo/src/main.jsx +10 -0
- web_demo/vite.config.js +7 -0
IMPLEMENTATION_NOTES.md
ADDED
@@ -0,0 +1,135 @@
# Implementation Notes

## Architecture Overview

The STT system is built in five progressive steps, each adding functionality on top of the previous:

1. **Step 1**: Basic offline transcription (Whisper/Vosk)
2. **Step 2**: HTTP API for file uploads
3. **Step 3**: WebSocket streaming for real-time audio
4. **Step 4**: Telephony audio format support (Twilio/Exotel)
5. **Step 5**: Production-ready with stability features

## Key Components

### Audio Processing

- **TelephonyAudioConverter**: Handles format conversion
  - Twilio: 8 kHz μ-law → 16 kHz PCM
  - Exotel: 8 kHz PCM → 16 kHz PCM
- Uses `scipy.signal.resample` for sample-rate conversion
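
The Twilio conversion path above can be sketched in a few lines (the function name is ours, not from the codebase; note that `audioop` is deprecated since Python 3.11 and removed in 3.13):

```python
import audioop  # stdlib mu-law codec (deprecated since Python 3.11)

import numpy as np
from scipy.signal import resample

def mulaw8k_to_pcm16k(mulaw_bytes: bytes) -> bytes:
    """Convert 8 kHz mu-law (Twilio) into 16 kHz 16-bit PCM for the STT engine."""
    pcm8k = audioop.ulaw2lin(mulaw_bytes, 2)         # mu-law -> 16-bit linear PCM
    samples = np.frombuffer(pcm8k, dtype=np.int16)
    upsampled = resample(samples, len(samples) * 2)  # 8 kHz -> 16 kHz
    return upsampled.astype(np.int16).tobytes()
```

The Exotel path would skip the `ulaw2lin` step, since its input is already linear PCM.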
### Voice Activity Detection (VAD)

- Simple energy-based VAD in Step 5
- Threshold: 0.01 (configurable)
- Frame-based analysis (25 ms frames)
- Detects speech vs. silence
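
A minimal sketch of that energy check, assuming float samples normalized to [-1, 1] (the function name is ours):

```python
import numpy as np

def is_speech(frame: np.ndarray, threshold: float = 0.01) -> bool:
    """Classify one frame as speech if its RMS energy exceeds the threshold."""
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
    return bool(rms > threshold)
```

At 16 kHz, a 25 ms frame is 400 samples.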
### Audio Buffering

- **AudioBuffer**: Accumulates audio chunks
- Configurable chunk duration (default: 1.0 s)
- Minimum interval between transcriptions (0.5 s)
- Handles silence timeouts (3.0 s)
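
The buffering policy can be sketched as follows (the class interface is ours and omits the silence-timeout handling; the real `AudioBuffer` may differ):

```python
import time

class AudioBuffer:
    """Accumulate 16-bit mono PCM and emit fixed-size chunks at a bounded rate."""

    def __init__(self, chunk_duration=1.0, min_interval=0.5, sample_rate=16000):
        self.chunk_bytes = int(chunk_duration * sample_rate * 2)  # 2 bytes/sample
        self.min_interval = min_interval
        self.buf = bytearray()
        self.last_emit = 0.0

    def add(self, chunk: bytes):
        self.buf.extend(chunk)

    def pop_ready(self):
        """Return a chunk for transcription, or None if too little audio / too soon."""
        now = time.monotonic()
        if len(self.buf) >= self.chunk_bytes and now - self.last_emit >= self.min_interval:
            out = bytes(self.buf[:self.chunk_bytes])
            self.buf = self.buf[self.chunk_bytes:]
            self.last_emit = now
            return out
        return None
```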
### Duplicate Prevention

- Compares each new transcription with the previous one
- Prevents sending identical text multiple times
- Simple substring matching (can be enhanced)
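
That substring check amounts to something like (illustrative; the function name is ours):

```python
def is_duplicate(new_text: str, prev_text: str) -> bool:
    """True if one transcription is contained in the other (case-insensitive)."""
    a, b = new_text.strip().lower(), prev_text.strip().lower()
    return bool(a) and bool(b) and (a in b or b in a)
```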
## Things to Consider

### Performance

1. **Model Loading**: Whisper models are loaded per connection (lazy loading)
   - Consider model caching/pooling for production
   - Larger models (medium/large) are more accurate but slower

2. **Chunk Size**: Balance between latency and accuracy
   - Smaller chunks = lower latency but less context
   - Larger chunks = better accuracy but higher latency

3. **Concurrent Connections**: Each connection loads its own model
   - Consider shared model instances for multiple connections
   - Monitor memory usage with many concurrent calls
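
The shared-model suggestion in point 3 can be sketched as a process-wide cache (the interface is ours; in production the injected `loader` would be e.g. `whisper.load_model` or `vosk.Model`, injected here so the sketch stays dependency-free):

```python
_MODELS = {}

def get_model(name="base", loader=None):
    """Return a process-wide shared model instance, loading it at most once."""
    if name not in _MODELS:
        _MODELS[name] = (loader or (lambda n: object()))(name)
    return _MODELS[name]
```

Note that a plain dict is only safe when all connections run on one event loop; with worker threads, guard the load with a lock.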
### Audio Quality

1. **Sample Rate**: Whisper works best with 16 kHz audio
   - Telephony audio (8 kHz) must be upsampled
   - Quality may be reduced compared to native 16 kHz

2. **Noise**: Telephony audio often has background noise
   - Consider noise-reduction preprocessing
   - VAD helps filter silence but not noise

3. **Format Conversion**: μ-law to PCM conversion may introduce artifacts
   - Test with real telephony audio
   - Consider alternative conversion methods if quality is poor

### Stability & Reliability

1. **Disconnections**: Handled gracefully in Step 5
   - Final transcription on the remaining buffer
   - Session cleanup on disconnect

2. **Error Handling**: Comprehensive error catching
   - Logs errors per call
   - Continues processing on individual failures

3. **Logging**: Per-call logging in Step 5
   - Logs stored in `logs/stt.log`
   - Includes `call_id` for tracking
### Scaling Considerations

1. **Model Memory**: Whisper models are large (base ~150 MB, large ~3 GB)
   - Consider GPU acceleration for faster inference
   - Consider model quantization for reduced memory

2. **API Rate Limiting**: No rate limiting implemented
   - Add rate limiting for production
   - Consider request queuing

3. **Database**: No persistent storage
   - Add a database for call transcripts
   - Store session metadata

4. **Load Balancing**: Single-server implementation
   - Consider multiple workers/instances
   - Use a message queue for audio processing

### Security

1. **Authentication**: No authentication implemented
   - Add API keys/tokens
   - Add WebSocket authentication

2. **Input Validation**: Basic validation only
   - Validate audio format/size
   - Rate-limit per client

3. **Data Privacy**: Transcripts are logged
   - Consider encryption for sensitive data
   - Implement data-retention policies
## Testing Recommendations

1. **Unit Tests**: Test audio conversion functions
2. **Integration Tests**: Test WebSocket streaming with real audio
3. **Load Tests**: Test with multiple concurrent connections
4. **Telephony Tests**: Test with actual Twilio/Exotel audio streams

## Future Enhancements

1. **Better VAD**: Use a more sophisticated VAD (e.g., WebRTC VAD)
2. **Streaming Model**: Use streaming-capable models for lower latency
3. **Language Detection**: Auto-detect the spoken language
4. **Speaker Diarization**: Identify different speakers
5. **Punctuation**: Better punctuation in transcripts
6. **Timestamping**: Word-level timestamps
7. **Confidence Scores**: Return per-word confidence scores
README.md
CHANGED
@@ -1,13 +1,277 @@
# NeuralVoice AI

A real-time voice AI assistant system that combines Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) capabilities for natural phone conversations. Built with FastAPI, Vosk, OpenAI, and Piper TTS, integrated with Twilio for telephony.

## 🎯 Overview

NeuralVoice AI enables real-time bidirectional voice conversations over phone calls. The system:

- **Listens** to caller speech using Vosk STT
- **Understands** and responds using OpenAI's GPT models
- **Speaks** back using Piper TTS with phone-optimized audio processing
- **Handles** natural conversation flow with barge-in support and voice activity detection

## ✨ Features

- **Real-time Speech Recognition**: Vosk-based STT with voice activity detection (VAD)
- **Intelligent Responses**: OpenAI GPT integration for contextual conversations
- **Natural Voice Synthesis**: Piper TTS with phone-optimized audio filters
- **Barge-in Support**: Callers can interrupt the AI mid-sentence
- **WebSocket Streaming**: Low-latency bidirectional audio streaming via Twilio Media Streams
- **Web Dashboard**: React-based frontend for monitoring live call transcripts
- **Production Ready**: Includes error handling, keepalive, and session management
## 🏗️ Architecture

```
┌─────────────┐
│   Twilio    │  ← Phone calls
└──────┬──────┘
       │ WebSocket (8kHz μ-law)
       ▼
┌─────────────────────────────────┐
│   FastAPI Backend (Python)      │
│  ┌──────────────────────────┐   │
│  │  STT (Vosk)              │   │
│  │    ↓                     │   │
│  │  LLM (OpenAI GPT)        │   │
│  │    ↓                     │   │
│  │  TTS (Piper + ffmpeg)    │   │
│  └──────────────────────────┘   │
└──────┬──────────────────────────┘
       │
       ├─→ WebSocket → React Frontend (Live Transcripts)
       └─→ WebSocket → Twilio (Audio Playback)
```
## 📁 Project Structure

```
nv2/
├── stt_llm_ttsopenai.py       # Main production server (STT+LLM+TTS pipeline)
├── pipe_method3.py            # Alternative implementation with improved VAD
├── download_models.py         # Script to download Vosk and Piper models
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Docker configuration
├── start.sh                   # Startup script
├── IMPLEMENTATION_NOTES.md    # Technical implementation details
├── README.md                  # This file
└── web_demo/                  # React frontend
    ├── src/
    │   ├── App.jsx                    # Main React app
    │   ├── components/
    │   │   ├── MicrophoneTest.jsx     # STT testing component
    │   │   ├── TextToSpeech.jsx       # TTS testing component
    │   │   └── SttLlmTts.jsx          # Full pipeline testing
    │   └── ...
    ├── package.json
    └── vite.config.js
```
## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- Node.js 16+ (for frontend)
- ffmpeg (for audio processing)
- Piper TTS binary (or install via package manager)
- OpenAI API key
- Twilio account (for phone integration)
### Installation

1. **Clone the repository**
   ```bash
   git clone https://github.com/NuralVoice-AI-Model/NeuralVoiceAI.git
   cd NeuralVoiceAI
   ```

2. **Install Python dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Download AI models**
   ```bash
   python download_models.py
   ```
   This will download:
   - Vosk STT model (English, ~1.8 GB)
   - Piper TTS model (English, ~50 MB)

4. **Install frontend dependencies**
   ```bash
   cd web_demo
   npm install
   cd ..
   ```

5. **Set environment variables**
   ```bash
   export OPENAI_API_KEY="your-openai-api-key"
   export OPENAI_MODEL="gpt-4o-mini"   # or gpt-4, gpt-3.5-turbo
   export PIPER_BIN="piper"            # or full path to piper binary
   export PIPER_MODEL_PATH="models/piper/en_US-lessac-medium.onnx"
   export VOSK_MODEL_PATH="models/vosk-model-en-us-0.22-lgraph"
   export TWILIO_STREAM_URL="wss://your-domain.com/stream"  # For Twilio integration
   export PORT=8080
   ```
### Running the Application

1. **Start the backend server**
   ```bash
   python stt_llm_ttsopenai.py
   # or
   python pipe_method3.py
   ```
   The server will start on `http://0.0.0.0:8080`.

2. **Start the frontend (optional)**
   ```bash
   cd web_demo
   npm run dev
   ```
   The frontend will be available at `http://localhost:5173`.

3. **Configure Twilio (for phone calls)**
   - Set your Twilio voice webhook to: `https://your-domain.com/voice`
   - Ensure `TWILIO_STREAM_URL` points to your WebSocket endpoint
   - Use ngrok or similar for local development:
     ```bash
     ngrok http 8080
     # Set TWILIO_STREAM_URL to wss://your-ngrok-url.ngrok.io/stream
     ```
## 🔧 Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key (required) | - |
| `OPENAI_MODEL` | OpenAI model to use | `gpt-4o-mini` |
| `VOSK_MODEL_PATH` | Path to Vosk STT model | `models/vosk-model-en-us-0.22-lgraph` |
| `PIPER_BIN` | Path to Piper TTS binary | `piper` |
| `PIPER_MODEL_PATH` | Path to Piper TTS model | - |
| `TWILIO_STREAM_URL` | WebSocket URL for Twilio streams | - |
| `HOST` | Server host | `0.0.0.0` |
| `PORT` | Server port | `8080` |

### Tuning Parameters

In `stt_llm_ttsopenai.py` or `pipe_method3.py`, you can adjust:

- **STT latency**: `SILENCE_MS`, `STABLE_PARTIAL_MS`
- **VAD sensitivity**: `RMS_SPEECH_THRESHOLD`, `SPEECH_START_FRAMES`
- **LLM response**: `SYSTEM_PROMPT`, `max_tokens`, `temperature`
- **TTS chunking**: `CHUNK_MAX_CHARS`, `CHUNK_END_RE`
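
For example, `CHUNK_MAX_CHARS` and `CHUNK_END_RE` control how LLM output is split before being handed to Piper. A minimal sketch of that policy (the function and exact regex are ours, not the code's):

```python
import re

CHUNK_MAX_CHARS = 120                    # mirrors the tuning parameter above
CHUNK_END_RE = re.compile(r"[.!?,;:]")   # assumed sentence/clause boundaries

def chunk_for_tts(text: str, max_chars: int = CHUNK_MAX_CHARS):
    """Split text into TTS chunks, preferring punctuation breaks within the limit."""
    chunks, start = [], 0
    while start < len(text):
        window = text[start:start + max_chars]
        # Break at the last punctuation mark inside the window, if any;
        # otherwise hard-break at the window edge.
        cut = max((m.end() for m in CHUNK_END_RE.finditer(window)), default=len(window))
        if start + len(window) >= len(text):
            cut = len(window)  # last piece: take everything that remains
        chunks.append(text[start:start + cut].strip())
        start += cut
    return [c for c in chunks if c]
```

Smaller chunks let the first audio play sooner, at the cost of more Piper invocations.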
## 📡 API Endpoints

### HTTP Endpoints

- `GET /health` - Health check
- `POST /voice` - Twilio webhook (returns TwiML)
- `GET /voice` - Twilio webhook (GET method)

### WebSocket Endpoints

- `WS /stream` - Main audio streaming endpoint for Twilio
- `WS /client-ws` - Frontend client WebSocket for live transcripts
## 🎤 Usage

### Making a Phone Call

1. Configure Twilio to call your `/voice` endpoint
2. The system will:
   - Answer the call
   - Stream audio bidirectionally
   - Transcribe speech in real time
   - Generate AI responses
   - Speak responses back to the caller

### Testing Components

The web dashboard (`web_demo`) provides three testing interfaces:

1. **Microphone Test**: Test STT with your microphone
2. **Text-to-Speech**: Test TTS with custom text
3. **STT-LLM-TTS**: Test the full pipeline

### Example Conversation Flow

```
Caller: "Hello, I need help with my account"
   ↓ [STT: Vosk transcribes]
   ↓ [LLM: OpenAI generates response]
   ↓ [TTS: Piper synthesizes audio]
AI: "I'd be happy to help. What's your account number?"
   ↓ [Caller can interrupt/barge-in at any time]
Caller: "It's 12345"
   ↓ [Process repeats...]
```
## 🐳 Docker Deployment

```bash
docker build -t neuralvoice-ai .
docker run -p 8080:8080 \
  -e OPENAI_API_KEY="your-key" \
  -e PIPER_MODEL_PATH="/app/models/piper/en_US-lessac-medium.onnx" \
  -v $(pwd)/models:/app/models \
  neuralvoice-ai
```
## 🔍 Monitoring

- **Logs**: Check console output for real-time STT, LLM, and TTS logs
- **Web Dashboard**: View live call transcripts in the React frontend
- **Health Endpoint**: `GET /health` for service status

## 🛠️ Development

### Key Components

1. **STT Engine** (`vosk`): Offline speech recognition
2. **LLM Integration** (`openai`): GPT-based conversation
3. **TTS Engine** (`piper`): Neural text-to-speech
4. **Audio Processing** (`audioop`, `ffmpeg`): Format conversion and filtering
5. **WebSocket Handler**: Real-time bidirectional streaming
### Code Flow

1. Twilio sends 8 kHz μ-law audio chunks via WebSocket
2. Audio is converted to 16 kHz PCM for Vosk
3. Vosk performs real-time transcription
4. VAD detects speech endpoints
5. User utterances trigger OpenAI API calls
6. LLM responses are chunked and sent to Piper TTS
7. TTS audio is processed with phone-optimized filters
8. Audio is converted back to 8 kHz μ-law and streamed to Twilio
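
Steps 1 and 8 revolve around the JSON envelope of Twilio's Media Streams protocol. A minimal decode/encode sketch (the function names and `streamSid` value are illustrative; the real inbound event carries extra fields such as `track` and `timestamp` that are ignored here):

```python
import base64
import json

def decode_media_event(message: str) -> bytes:
    """Extract the raw 8 kHz mu-law audio from a Twilio `media` event."""
    event = json.loads(message)
    assert event["event"] == "media"
    return base64.b64decode(event["media"]["payload"])

def encode_media_event(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap mu-law audio for playback on the same Twilio stream."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })
```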
## 📝 Notes

- **Latency**: Typical end-to-end latency is 1-3 seconds
- **Barge-in**: Callers can interrupt the AI by speaking (detected via VAD)
- **Audio Quality**: Phone-optimized filters (highpass/lowpass/compand) improve clarity
- **Model Size**: The Vosk model is ~1.8 GB; ensure sufficient disk space
- **Memory**: Each call loads the Vosk model (cached after first load)

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

Copyright © 2026 Blink Digital India Pvt Ltd. All rights reserved.

All code in this repository is the property of Blink Digital India Pvt Ltd. Unauthorized copying, modification, distribution, or use of this software, via any medium, is strictly prohibited without express written permission from Blink Digital India Pvt Ltd.

## 🙏 Acknowledgments

- [Vosk](https://alphacephei.com/vosk/) - Speech recognition
- [OpenAI](https://openai.com/) - Language models
- [Piper TTS](https://github.com/rhasspy/piper) - Text-to-speech
- [Twilio](https://www.twilio.com/) - Telephony platform
download_models.py
ADDED
@@ -0,0 +1,40 @@
import os
import urllib.request
import zipfile

def download_file(url, dest):
    if os.path.exists(dest):
        print(f"File already exists: {dest}")
        return
    print(f"Downloading {url} to {dest}...")
    urllib.request.urlretrieve(url, dest)
    print("Download complete.")

def setup_models():
    # Vosk model
    vosk_dir = "models/vosk-model-en-us-0.22-lgraph"
    if not os.path.exists(vosk_dir):
        os.makedirs("models", exist_ok=True)
        zip_path = "models/vosk-model.zip"
        download_file("https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip", zip_path)
        print("Extracting Vosk model...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall("models")
        os.remove(zip_path)
        print("Vosk model setup complete.")
    else:
        print("Vosk model already exists.")

    # Piper model
    piper_model_dir = "models/piper"
    os.makedirs(piper_model_dir, exist_ok=True)

    piper_onnx = os.path.join(piper_model_dir, "en_US-lessac-medium.onnx")
    piper_json = os.path.join(piper_model_dir, "en_US-lessac-medium.onnx.json")

    download_file("https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx", piper_onnx)
    download_file("https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json", piper_json)

if __name__ == "__main__":
    setup_models()
pipe_method3.py
ADDED
@@ -0,0 +1,572 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Twilio Media Streams (bidirectional) + Vosk + OpenAI Answer + Piper -> Twilio playback
|
| 3 |
+
|
| 4 |
+
What this version does:
|
| 5 |
+
- NO intent / NO clarify JSON
|
| 6 |
+
- Logs only:
|
| 7 |
+
STT_FINAL> ...
|
| 8 |
+
LLM_ANS> ...
|
| 9 |
+
TTS> ...
|
| 10 |
+
- Generation-id safe TTS (no self-cancel on Railway)
|
| 11 |
+
- Better phone clarity using ffmpeg filters (highpass/lowpass/compand)
|
| 12 |
+
- Proper 20ms pacing + keepalive marks to prevent WS idle timeouts
|
| 13 |
+
"""
|
| 14 |
+
|
| 15 |
+
import asyncio
|
| 16 |
+
import base64
|
| 17 |
+
import json
|
| 18 |
+
import logging
|
| 19 |
+
import os
|
| 20 |
+
import re
|
| 21 |
+
import tempfile
|
| 22 |
+
import time
|
| 23 |
+
import audioop
|
| 24 |
+
import subprocess
|
| 25 |
+
import threading
|
| 26 |
+
from dataclasses import dataclass, field
|
| 27 |
+
from typing import Optional, List, Dict
|
| 28 |
+
|
| 29 |
+
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, Request
|
| 30 |
+
from fastapi.responses import PlainTextResponse, Response
|
| 31 |
+
from fastapi.middleware.cors import CORSMiddleware
|
| 32 |
+
from vosk import Model, KaldiRecognizer
|
| 33 |
+
from openai import OpenAI
|
| 34 |
+
|
| 35 |
+
# ----------------------------
|
| 36 |
+
# Logging
|
| 37 |
+
# ----------------------------
|
| 38 |
+
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
|
| 39 |
+
log = logging.getLogger("app")
|
| 40 |
+
|
| 41 |
+
def P(tag: str, msg: str):
|
| 42 |
+
print(f"{tag} {msg}", flush=True)
|
| 43 |
+
|
| 44 |
+
# ----------------------------
|
| 45 |
+
# Env
|
| 46 |
+
# ----------------------------
|
| 47 |
+
VOSK_MODEL_PATH = os.getenv("VOSK_MODEL_PATH", "/app/models/vosk-model-en-us-0.22-lgraph").strip()
|
| 48 |
+
TWILIO_STREAM_URL = os.getenv("TWILIO_STREAM_URL", "").strip()
|
| 49 |
+
|
| 50 |
+
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "").strip()
|
| 51 |
+
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini").strip()
|
| 52 |
+
|
| 53 |
+
PIPER_BIN = os.getenv("PIPER_BIN", "piper").strip()
|
| 54 |
+
PIPER_MODEL_PATH = os.getenv("PIPER_MODEL_PATH", "").strip()
|
| 55 |
+
|
| 56 |
+
HOST = "0.0.0.0"
|
| 57 |
+
PORT = int(os.getenv("PORT", "8080"))
|
| 58 |
+
|
| 59 |
+
# ----------------------------
|
| 60 |
+
# FastAPI
|
| 61 |
+
# ----------------------------
|
| 62 |
+
app = FastAPI()
|
| 63 |
+
app.add_middleware(
|
| 64 |
+
CORSMiddleware,
|
| 65 |
+
allow_origins=["*"],
|
| 66 |
+
allow_credentials=True,
|
| 67 |
+
allow_methods=["*"],
|
| 68 |
+
allow_headers=["*"],
|
| 69 |
+
)
|
| 70 |
+
|
| 71 |
+
# ----------------------------
|
| 72 |
+
# Audio / Twilio
|
| 73 |
+
# ----------------------------
|
| 74 |
+
FRAME_MS = 20
|
| 75 |
+
INPUT_RATE = 8000
|
| 76 |
+
STT_RATE = 16000
|
| 77 |
+
BYTES_PER_20MS_MULAW = int(INPUT_RATE * (FRAME_MS / 1000.0)) # 160 bytes @ 8kHz, 20ms
|
| 78 |
+
|
| 79 |
+
# ----------------------------
|
| 80 |
+
# VAD settings
|
| 81 |
+
# ----------------------------
|
| 82 |
+
RMS_SPEECH_THRESHOLD = 450
|
| 83 |
+
SPEECH_START_FRAMES = 3
|
| 84 |
+
SPEECH_END_SILENCE_FRAMES = 40 # 800ms
|
| 85 |
+
MAX_UTTERANCE_MS = 12000
|
| 86 |
+
PARTIAL_EMIT_EVERY_MS = 250
|
| 87 |
+
|
| 88 |
+
# ----------------------------
|
| 89 |
+
# LLM prompt
|
| 90 |
+
# ----------------------------
|
| 91 |
+
SYSTEM_PROMPT = (
|
| 92 |
+
"You are a phone-call assistant. "
|
| 93 |
+
"Reply in 1 short sentence (max 15 words). "
|
| 94 |
+
"No filler. No greetings unless user greets first."
|
| 95 |
+
)
|
| 96 |
+
|
| 97 |
+
# ----------------------------
|
| 98 |
+
# Cached Vosk model
|
| 99 |
+
# ----------------------------
|
| 100 |
+
_VOSK_MODEL = None
|
| 101 |
+
|
| 102 |
+
def now_ms() -> int:
|
| 103 |
+
return int(time.time() * 1000)
|
| 104 |
+
|
| 105 |
+
def build_twiml(stream_url: str) -> str:
|
| 106 |
+
return f"""<?xml version="1.0" encoding="UTF-8"?>
|
| 107 |
+
<Response>
|
| 108 |
+
<Connect>
|
| 109 |
+
<Stream url="{stream_url}" />
|
| 110 |
+
</Connect>
|
| 111 |
+
<Pause length="600"/>
|
| 112 |
+
</Response>
|
| 113 |
+
"""
|
| 114 |
+
|
| 115 |
+
def split_mulaw_frames(mulaw_bytes: bytes) -> List[bytes]:
|
| 116 |
+
frames = []
|
| 117 |
+
for i in range(0, len(mulaw_bytes), BYTES_PER_20MS_MULAW):
|
| 118 |
+
chunk = mulaw_bytes[i:i + BYTES_PER_20MS_MULAW]
|
| 119 |
+
if len(chunk) < BYTES_PER_20MS_MULAW:
|
| 120 |
+
chunk += b"\xFF" * (BYTES_PER_20MS_MULAW - len(chunk))
|
| 121 |
+
frames.append(chunk)
|
| 122 |
+
return frames
|
| 123 |
+
|
| 124 |
+
async def drain_queue(q: asyncio.Queue):
|
| 125 |
+
try:
|
| 126 |
+
while True:
|
| 127 |
+
q.get_nowait()
|
| 128 |
+
q.task_done()
|
| 129 |
+
except asyncio.QueueEmpty:
|
| 130 |
+
return
|
| 131 |
+
# ----------------------------
# OpenAI
# ----------------------------
def openai_client() -> OpenAI:
    if not OPENAI_API_KEY:
        raise RuntimeError("OPENAI_API_KEY not set")
    return OpenAI(api_key=OPENAI_API_KEY)


def openai_answer_blocking(history: List[Dict], user_text: str) -> str:
    client = openai_client()
    msgs = [{"role": "system", "content": SYSTEM_PROMPT}]
    # short tail context
    tail = history[-6:] if len(history) > 1 else []
    msgs.extend(tail)
    msgs.append({"role": "user", "content": user_text})

    resp = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=msgs,
        temperature=0.3,
        max_tokens=80,
    )
    ans = (resp.choices[0].message.content or "").strip()
    return ans


# ----------------------------
# Piper TTS -> 8k mulaw (clarity improved)
# ----------------------------
def piper_tts_to_mulaw(text: str) -> bytes:
    if not PIPER_MODEL_PATH:
        raise RuntimeError("PIPER_MODEL_PATH not set")

    text = (text or "").strip()
    if not text:
        return b""

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as wavf:
        wav_path = wavf.name
    with tempfile.NamedTemporaryFile(suffix=".mulaw", delete=False) as mlf:
        mulaw_path = mlf.name

    try:
        r1 = subprocess.run(
            [PIPER_BIN, "--model", PIPER_MODEL_PATH, "--output_file", wav_path],
            input=text.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        if r1.returncode != 0:
            raise RuntimeError(f"piper rc={r1.returncode} stderr={r1.stderr.decode('utf-8', 'ignore')[:500]}")

        # Phone-clarity filter chain:
        # - highpass removes rumble
        # - lowpass removes harshness
        # - compand evens volume (helps "clarity" on phone)
        # - dynaudnorm is avoided (can pump / distort at 8k)
        af = "highpass=f=200,lowpass=f=3400,compand=attacks=0:decays=0.3:points=-80/-80|-20/-10|0/-3"

        r2 = subprocess.run(
            ["ffmpeg", "-y", "-i", wav_path,
             "-ac", "1", "-ar", "8000",
             "-af", af,
             "-f", "mulaw", mulaw_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        if r2.returncode != 0:
            raise RuntimeError(f"ffmpeg rc={r2.returncode} stderr={r2.stderr.decode('utf-8', 'ignore')[:500]}")

        with open(mulaw_path, "rb") as f:
            data = f.read()

        P("TTS>", f"audio_bytes={len(data)}")
        return data
    finally:
        for p in (wav_path, mulaw_path):
            try:
                os.unlink(p)
            except Exception:
                pass
# ----------------------------
# Call state
# ----------------------------
@dataclass
class CancelFlag:
    is_set: bool = False

    def set(self):
        self.is_set = True


@dataclass
class CallState:
    call_id: str
    stream_sid: str = ""

    # vad
    in_speech: bool = False
    speech_start_count: int = 0
    silence_count: int = 0
    utter_start_ms: int = 0

    rec: Optional[KaldiRecognizer] = None

    # partials
    last_partial: str = ""
    last_partial_emit_ms: int = 0

    # outbound
    outbound_q: asyncio.Queue = field(default_factory=lambda: asyncio.Queue(maxsize=50000))
    outbound_task: Optional[asyncio.Task] = None
    keepalive_task: Optional[asyncio.Task] = None
    mark_i: int = 0

    # speaking / generation
    bot_speaking: bool = False
    cancel_llm: CancelFlag = field(default_factory=CancelFlag)
    tts_generation_id: int = 0

    # conversation history
    history: List[Dict] = field(default_factory=list)
    bot_lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    def bump_tts_generation(self) -> int:
        self.tts_generation_id += 1
        return self.tts_generation_id


# ----------------------------
# Keepalive marks (prevents WS ping timeout)
# ----------------------------
async def twilio_keepalive(ws: WebSocket, st: CallState):
    try:
        while True:
            await asyncio.sleep(10)
            if st.stream_sid:
                st.mark_i += 1
                name = f"ka_{st.mark_i}"
                await ws.send_text(json.dumps({
                    "event": "mark",
                    "streamSid": st.stream_sid,
                    "mark": {"name": name},
                }))
                P("TWILIO>", f"keepalive_mark={name}")
    except asyncio.CancelledError:
        return
    except Exception as e:
        P("SYS>", f"keepalive_error={e}")


# ----------------------------
# HTTP
# ----------------------------
@app.get("/health")
async def health():
    return {"ok": True}


@app.post("/voice")
async def voice(request: Request):
    stream_url = TWILIO_STREAM_URL
    if not stream_url:
        host = request.headers.get("host")
        if host:
            stream_url = f"wss://{host}/stream"
            P("SYS>", f"auto_stream_url={stream_url}")
    if not stream_url:
        return PlainTextResponse("TWILIO_STREAM_URL not set and host not found", status_code=500)
    return Response(content=build_twiml(stream_url), media_type="application/xml")


@app.get("/voice")
async def voice_get(request: Request):
    return await voice(request)


# ----------------------------
# WebSocket /stream
# ----------------------------
@app.websocket("/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    st = CallState(call_id=str(id(ws)))
    st.history = [{"role": "system", "content": SYSTEM_PROMPT}]
    P("SYS>", f"ws_open call_id={st.call_id}")

    global _VOSK_MODEL
    if _VOSK_MODEL is None:
        P("SYS>", f"loading_vosk={VOSK_MODEL_PATH}")
        _VOSK_MODEL = Model(VOSK_MODEL_PATH)
        P("SYS>", "vosk_loaded")

    st.rec = KaldiRecognizer(_VOSK_MODEL, STT_RATE)
    st.rec.SetWords(False)

    st.outbound_task = asyncio.create_task(outbound_sender(ws, st))

    try:
        while True:
            raw = await ws.receive_text()
            msg = json.loads(raw)
            event = msg.get("event")

            if event == "start":
                st.stream_sid = msg["start"]["streamSid"]
                P("TWILIO>", f"start streamSid={st.stream_sid}")

                if st.keepalive_task is None:
                    st.keepalive_task = asyncio.create_task(twilio_keepalive(ws, st))

                # optional short greeting
                asyncio.create_task(speak_text(ws, st, "Hi! How can I help?"))

            elif event == "media":
                mulaw = base64.b64decode(msg["media"]["payload"])
                pcm16_8k = audioop.ulaw2lin(mulaw, 2)
                pcm16_16k, _ = audioop.ratecv(pcm16_8k, 2, 1, INPUT_RATE, STT_RATE, None)

                rms = audioop.rms(pcm16_16k, 2)
                is_speech = rms >= RMS_SPEECH_THRESHOLD

                # barge-in: cancel current bot audio if caller speaks
                if st.bot_speaking and is_speech:
                    await barge_in(ws, st)

                await vad_and_stt(ws, st, pcm16_16k, is_speech)

            elif event == "mark":
                name = (msg.get("mark") or {}).get("name")
                P("TWILIO>", f"mark_received={name}")

            elif event == "stop":
                P("TWILIO>", "stop")
                break

    except WebSocketDisconnect:
        P("SYS>", "ws_disconnect")
    except Exception as e:
        P("SYS>", f"ws_error={e}")
        log.exception("ws_error")
    finally:
        if st.keepalive_task:
            st.keepalive_task.cancel()
        if st.outbound_task:
            st.outbound_task.cancel()
        P("SYS>", "ws_closed")
# ----------------------------
# VAD + STT
# ----------------------------
async def vad_and_stt(ws: WebSocket, st: CallState, pcm16_16k: bytes, is_speech: bool):
    t = now_ms()

    if not st.in_speech:
        if is_speech:
            st.speech_start_count += 1
            if st.speech_start_count >= SPEECH_START_FRAMES:
                st.in_speech = True
                st.silence_count = 0
                st.utter_start_ms = t
                st.speech_start_count = 0
                st.last_partial = ""
                st.last_partial_emit_ms = 0

                st.rec = KaldiRecognizer(_VOSK_MODEL, STT_RATE)
                st.rec.SetWords(False)
        else:
            st.speech_start_count = 0
        return

    st.rec.AcceptWaveform(pcm16_16k)

    # partial logging only (UI integration comes later)
    if t - st.last_partial_emit_ms >= PARTIAL_EMIT_EVERY_MS:
        st.last_partial_emit_ms = t
        try:
            pj = json.loads(st.rec.PartialResult() or "{}")
            partial = (pj.get("partial") or "").strip()
        except Exception:
            partial = ""
        if partial and partial != st.last_partial:
            st.last_partial = partial
            P("STT_PART>", partial)

    if (t - st.utter_start_ms) > MAX_UTTERANCE_MS:
        await finalize_utterance(ws, st, "max_utterance")
        return

    if is_speech:
        st.silence_count = 0
        return

    st.silence_count += 1
    if st.silence_count >= SPEECH_END_SILENCE_FRAMES:
        await finalize_utterance(ws, st, f"vad_silence_{SPEECH_END_SILENCE_FRAMES * FRAME_MS}ms")


async def finalize_utterance(ws: WebSocket, st: CallState, reason: str):
    if not st.in_speech:
        return

    st.in_speech = False
    st.silence_count = 0
    st.speech_start_count = 0
    st.last_partial = ""

    try:
        j = json.loads(st.rec.FinalResult() or "{}")
    except Exception:
        j = {}

    user_text = (j.get("text") or "").strip()
    if not user_text:
        return

    P("STT_FINAL>", f"{user_text} ({reason})")

    async def bot_job():
        async with st.bot_lock:
            await answer_and_speak(ws, st, user_text)

    asyncio.create_task(bot_job())
# ----------------------------
# LLM Answer -> Speak
# ----------------------------
async def answer_and_speak(ws: WebSocket, st: CallState, user_text: str):
    st.cancel_llm = CancelFlag(False)

    # store user turn
    st.history.append({"role": "user", "content": user_text})
    st.history = st.history[:1] + st.history[-8:]

    loop = asyncio.get_running_loop()

    def worker():
        return openai_answer_blocking(st.history, user_text)

    ans = await loop.run_in_executor(None, worker)
    ans = (ans or "").strip()
    if not ans:
        ans = "Sorry, I didn't catch that."

    P("LLM_ANS>", ans)

    # store assistant turn
    st.history.append({"role": "assistant", "content": ans})
    st.history = st.history[:1] + st.history[-8:]

    await speak_text(ws, st, ans)


# ----------------------------
# Barge-in (clear + drain)
# ----------------------------
async def barge_in(ws: WebSocket, st: CallState):
    st.cancel_llm.set()
    st.bump_tts_generation()  # invalidate older audio

    if st.stream_sid:
        try:
            await ws.send_text(json.dumps({"event": "clear", "streamSid": st.stream_sid}))
            P("TWILIO>", "sent_clear")
        except Exception:
            pass

    await drain_queue(st.outbound_q)
    st.bot_speaking = False


# ----------------------------
# Speak / TTS with generation-id
# ----------------------------
async def speak_text(ws: WebSocket, st: CallState, text: str):
    gen = st.bump_tts_generation()

    # clear previously queued audio
    if st.stream_sid:
        try:
            await ws.send_text(json.dumps({"event": "clear", "streamSid": st.stream_sid}))
            P("TWILIO>", "sent_clear")
        except Exception:
            pass
    await drain_queue(st.outbound_q)

    await tts_enqueue(st, text, gen)


async def tts_enqueue(st: CallState, text: str, gen: int):
    my_gen = gen
    st.bot_speaking = True
    P("TTS>", f"text={text} gen={my_gen}")

    loop = asyncio.get_running_loop()
    try:
        mulaw_bytes = await loop.run_in_executor(None, piper_tts_to_mulaw, text)
    except Exception as e:
        P("TTS_ERR>", str(e))
        st.bot_speaking = False
        return

    if my_gen != st.tts_generation_id:
        P("TTS>", f"discard_gen my_gen={my_gen} current_gen={st.tts_generation_id}")
        return

    for fr in split_mulaw_frames(mulaw_bytes):
        if my_gen != st.tts_generation_id:
            P("TTS>", f"discard_midstream my_gen={my_gen} current_gen={st.tts_generation_id}")
            return
        await st.outbound_q.put(base64.b64encode(fr).decode("ascii"))

    await st.outbound_q.put("__END_CHUNK__")


async def outbound_sender(ws: WebSocket, st: CallState):
    try:
        while True:
            item = await st.outbound_q.get()

            if item == "__END_CHUNK__":
                await asyncio.sleep(0.02)
                if st.outbound_q.empty():
                    st.bot_speaking = False
                st.outbound_q.task_done()
                continue

            if not st.stream_sid:
                st.outbound_q.task_done()
                continue

            await ws.send_text(json.dumps({
                "event": "media",
                "streamSid": st.stream_sid,
                "media": {"payload": item},
            }))

            st.outbound_q.task_done()
            await asyncio.sleep(FRAME_MS / 1000.0)

    except asyncio.CancelledError:
        return
    except Exception as e:
        P("SYS>", f"outbound_sender_error={e}")
        log.exception("outbound_sender_error")


# ----------------------------
# main
# ----------------------------
if __name__ == "__main__":
    import uvicorn
    P("SYS>", f"starting {HOST}:{PORT}")
    uvicorn.run(app, host=HOST, port=PORT)
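The generation-id discard pattern used by `speak_text` / `tts_enqueue` above can be exercised in isolation. A minimal synchronous model (names hypothetical, not part of the project) showing how bumping the id invalidates in-flight audio from an older speak request:

```python
class Speaker:
    """Minimal model of the generation-id pattern: bumping the id
    invalidates any in-flight frames from an older speak request."""
    def __init__(self):
        self.gen = 0
        self.sent = []

    def bump(self) -> int:
        self.gen += 1
        return self.gen

    def speak(self, frames, barge_at=None):
        my_gen = self.bump()
        for i, fr in enumerate(frames):
            if i == barge_at:
                self.bump()          # simulate a caller barging in mid-stream
            if my_gen != self.gen:   # stale generation -> discard the rest
                return
            self.sent.append(fr)

sp = Speaker()
sp.speak(["a", "b", "c", "d"], barge_at=2)
print(sp.sent)  # ['a', 'b']
```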
start.sh
ADDED
@@ -0,0 +1,67 @@
#!/bin/bash

# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo -e "${GREEN}Starting setup for pipe_method3.py...${NC}"

# 1. Activate Virtual Environment
if [ -d ".venv" ]; then
    source .venv/bin/activate
else
    echo -e "${YELLOW}Virtual environment not found. Creating one...${NC}"
    python3 -m venv .venv
    source .venv/bin/activate
    pip install fastapi vosk openai uvicorn websockets
fi

# 2. Set Piper Paths
export PIPER_BIN="$(pwd)/.venv/bin/piper"
export PIPER_MODEL_PATH="$(pwd)/models/piper/en_US-lessac-medium.onnx"

# 3. Check OpenAI API Key
if [ -z "$OPENAI_API_KEY" ]; then
    echo -e "${YELLOW}OPENAI_API_KEY is not set.${NC}"
    read -p "Please enter your OpenAI API Key: " OPENAI_API_KEY
    export OPENAI_API_KEY
fi

# 4. Setup Ngrok
echo -e "${GREEN}Checking ngrok...${NC}"
NGROK_URL=""

# Check if ngrok is already running
if pgrep -x "ngrok" > /dev/null; then
    echo "ngrok is already running."
else
    echo "Starting ngrok..."
    ngrok http 8002 > /dev/null &
    sleep 3
fi

# Fetch ngrok URL
NGROK_API_URL="http://127.0.0.1:4040/api/tunnels"
if command -v curl > /dev/null; then
    NGROK_URL=$(curl -s $NGROK_API_URL | grep -o '"public_url":"[^"]*' | grep -o '[^"]*$' | head -n 1)
fi

if [ -z "$NGROK_URL" ]; then
    echo -e "${YELLOW}Could not automatically fetch ngrok URL.${NC}"
    echo "Please ensure ngrok is running (ngrok http 8002) and set TWILIO_STREAM_URL manually if needed."
else
    # Convert http/https to wss
    WSS_URL="${NGROK_URL/https/wss}"
    WSS_URL="${WSS_URL/http/wss}"
    WSS_URL="$WSS_URL/stream"

    export TWILIO_STREAM_URL="$WSS_URL"
    echo -e "${GREEN}Twilio Stream URL set to: ${WSS_URL}${NC}"
    echo -e "${YELLOW}IMPORTANT: Copy the URL above (or the https version for the webhook) to your Twilio Phone Number configuration.${NC}"
    echo -e "Webhook URL: ${NGROK_URL}/voice"
fi

# 5. Run Script
echo -e "${GREEN}Starting pipe_method3.py...${NC}"
python3 pipe_method3.py
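The http/https-to-wss rewrite in step 4 relies on bash's first-match pattern substitution; the same transformation mirrored in Python for a quick sanity check (the ngrok URL below is a made-up example):

```python
# Mirror of the bash rewrite: ${NGROK_URL/https/wss} then /http/wss, then append /stream.
ngrok_url = "https://abc123.ngrok-free.app"  # hypothetical ngrok public URL
wss_url = ngrok_url.replace("https", "wss", 1).replace("http", "wss", 1) + "/stream"
print(wss_url)  # wss://abc123.ngrok-free.app/stream
```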
stt_llm_ttsopenai.py
ADDED
@@ -0,0 +1,636 @@
import asyncio
import base64
import json
import logging
import os
import re
import tempfile
import time
import audioop
import subprocess
from dataclasses import dataclass, field
from typing import Optional, List, Dict

from fastapi import FastAPI, WebSocket, WebSocketDisconnect, Request
from fastapi.responses import PlainTextResponse, Response
from vosk import Model, KaldiRecognizer

from openai import OpenAI

# ----------------------------
# Logging
# ----------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("stt-llm-tts")

# ----------------------------
# Env
# ----------------------------
VOSK_MODEL_PATH = os.getenv("VOSK_MODEL_PATH", "models/vosk-model-en-us-0.22-lgraph")
TWILIO_STREAM_URL = os.getenv("TWILIO_STREAM_URL")  # must be wss://.../stream

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")

PIPER_BIN = os.getenv("PIPER_BIN", "piper")
PIPER_MODEL_PATH = os.getenv("PIPER_MODEL_PATH", "")

HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8002"))

# ----------------------------
# Tunables (latency vs accuracy)
# ----------------------------
# Endpointing:
SILENCE_MS = 700         # if no audio frames for this long -> commit utterance
STABLE_PARTIAL_MS = 650  # if partial text hasn't changed for this long -> commit
MIN_UTTER_WORDS = 2      # ignore single-word junk
MAX_UTTER_CHARS = 220    # safety cap

# LLM -> TTS chunking:
CHUNK_MAX_CHARS = 90     # emit TTS chunk if buffer grows past this
CHUNK_END_RE = re.compile(r"[.!?\n]")

# Twilio pacing:
FRAME_MS = 20
MULAW_RATE = 8000
BYTES_PER_20MS_MULAW = int(MULAW_RATE * (FRAME_MS / 1000.0))  # 160 bytes at 8 kHz mu-law

# Filter common garbage utterances:
SINGLE_WORD_IGNORE = {
    "the", "a", "an", "yeah", "yes", "no", "okay", "ok", "hmm", "um", "uh"
}

SYSTEM_PROMPT = (
    "You are a fast phone-call assistant. "
    "Reply in 1-2 short sentences. "
    "Ask only one question at a time. "
    "Be concise."
)

app = FastAPI()
# ----------------------------
# Frontend Clients
# ----------------------------
connected_clients: List[WebSocket] = []


async def broadcast_transcript(role: str, text: str):
    """Broadcasts a transcript message to all connected frontend clients."""
    if not connected_clients:
        return

    message = {
        "type": "transcript",
        "role": role,
        "text": text,
        "timestamp": now_ms(),
    }

    disconnected = []
    for client in connected_clients:
        try:
            await client.send_json(message)
        except Exception:
            disconnected.append(client)

    for client in disconnected:
        if client in connected_clients:
            connected_clients.remove(client)


# ----------------------------
# Helpers
# ----------------------------
def now_ms() -> int:
    return int(time.time() * 1000)


def safe_strip_key(key: str) -> str:
    return (key or "").strip().replace("\r", "").replace("\n", "")


def split_mulaw_frames(mulaw_bytes: bytes) -> List[bytes]:
    frames = []
    for i in range(0, len(mulaw_bytes), BYTES_PER_20MS_MULAW):
        chunk = mulaw_bytes[i:i + BYTES_PER_20MS_MULAW]
        if len(chunk) < BYTES_PER_20MS_MULAW:
            # pad with silence (mu-law silence is 0xFF)
            chunk += b"\xFF" * (BYTES_PER_20MS_MULAW - len(chunk))
        frames.append(chunk)
    return frames


def is_junk_utterance(text: str) -> bool:
    t = (text or "").strip().lower()
    if not t:
        return True
    if len(t) > MAX_UTTER_CHARS:
        return False
    words = [w for w in t.split() if w]
    if len(words) < MIN_UTTER_WORDS and (t in SINGLE_WORD_IGNORE):
        return True
    if len(words) < MIN_UTTER_WORDS and len(t) < 4:
        return True
    return False


def build_twiml(stream_url: str) -> str:
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="{stream_url}" />
  </Connect>
  <Pause length="600"/>
</Response>
"""


async def drain_queue(q: asyncio.Queue):
    try:
        while True:
            q.get_nowait()
            q.task_done()
    except asyncio.QueueEmpty:
        return
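The junk filter above can be sanity-checked standalone. A minimal mirror of `is_junk_utterance` with the same thresholds (MIN_UTTER_WORDS=2, short single words ignored), using a trimmed copy of the ignore set:

```python
# Standalone mirror of the junk-utterance filter with the same rules.
SINGLE_WORD_IGNORE = {"the", "a", "an", "yeah", "yes", "no", "okay", "ok", "hmm", "um", "uh"}

def is_junk(text: str) -> bool:
    t = (text or "").strip().lower()
    if not t:
        return True                      # empty transcript -> junk
    words = t.split()
    if len(words) < 2 and t in SINGLE_WORD_IGNORE:
        return True                      # lone filler word -> junk
    if len(words) < 2 and len(t) < 4:
        return True                      # very short single token -> junk
    return False

print(is_junk("uh"), is_junk("hello"), is_junk("book a table"))  # True False False
```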
# ----------------------------
# Piper TTS -> mulaw 8k
# ----------------------------
def piper_tts_to_mulaw(text: str) -> bytes:
    """
    Generates 8k mulaw raw bytes suitable for Twilio Media Streams.
    Pipeline: piper -> wav (22k/16k depending on the voice) -> ffmpeg -> mulaw 8k raw
    """
    if not PIPER_MODEL_PATH:
        raise RuntimeError("Set PIPER_MODEL_PATH to a valid .onnx voice model")
    if not shutil_which(PIPER_BIN):
        raise RuntimeError(f"piper binary not found: {PIPER_BIN} (set PIPER_BIN to full path)")

    text = (text or "").strip()
    if not text:
        return b""

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as wavf:
        wav_path = wavf.name
    with tempfile.NamedTemporaryFile(suffix=".mulaw", delete=False) as mlf:
        mulaw_path = mlf.name

    try:
        # piper writes a wav file; usage: piper --model <onnx> --output_file out.wav
        # it reads the text from stdin
        subprocess.run(
            [PIPER_BIN, "--model", PIPER_MODEL_PATH, "--output_file", wav_path],
            input=text.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )

        # Convert wav -> raw mulaw 8k
        subprocess.run(
            [
                "ffmpeg", "-y",
                "-i", wav_path,
                "-ac", "1",
                "-ar", "8000",
                "-f", "mulaw",
                mulaw_path,
            ],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )

        with open(mulaw_path, "rb") as f:
            return f.read()

    finally:
        try:
            os.unlink(wav_path)
        except Exception:
            pass
        try:
            os.unlink(mulaw_path)
        except Exception:
            pass


def shutil_which(cmd: str) -> Optional[str]:
    # tiny "which" helper without importing shutil (keeps deps minimal)
    if os.path.isabs(cmd) and os.path.exists(cmd):
        return cmd
    for p in os.getenv("PATH", "").split(os.pathsep):
        full = os.path.join(p, cmd)
        if os.path.exists(full) and os.access(full, os.X_OK):
            return full
    return None
# ----------------------------
|
| 237 |
+
# OpenAI streaming
|
| 238 |
+
# ----------------------------
|
| 239 |
+
def openai_stream_tokens_blocking(messages: List[Dict], model: str, cancel_flag: "CancelFlag"):
|
| 240 |
+
"""
|
| 241 |
+
Blocking generator for streaming tokens.
|
| 242 |
+
We stop yielding if cancel_flag is set.
|
| 243 |
+
"""
|
| 244 |
+
key = safe_strip_key(OPENAI_API_KEY)
|
| 245 |
+
if not key:
|
| 246 |
+
raise RuntimeError("OPENAI_API_KEY is empty. Set it in your environment.")
|
| 247 |
+
client = OpenAI(api_key=key)
|
| 248 |
+
|
| 249 |
+
stream = client.chat.completions.create(
|
| 250 |
+
model=model,
|
| 251 |
+
messages=messages,
|
| 252 |
+
temperature=0.4,
|
| 253 |
+
stream=True,
|
| 254 |
+
max_tokens=180,
|
| 255 |
+
)
|
| 256 |
+
|
| 257 |
+
for event in stream:
|
| 258 |
+
if cancel_flag.is_set:
|
| 259 |
+
break
|
| 260 |
+
delta = event.choices[0].delta
|
| 261 |
+
if delta and delta.content:
|
| 262 |
+
yield delta.content
|
| 263 |
+
|
| 264 |
+
|
| 265 |
+
class CancelFlag:
|
| 266 |
+
def __init__(self):
|
| 267 |
+
self.is_set = False
|
| 268 |
+
|
| 269 |
+
def set(self):
|
| 270 |
+
self.is_set = True
|
| 271 |
+
|
| 272 |
+
|
| 273 |
+
# ----------------------------
# Call state
# ----------------------------
@dataclass
class CallState:
    call_id: str
    stream_sid: str = ""
    # audio
    last_audio_ms: int = field(default_factory=now_ms)
    # partial tracking
    last_partial: str = ""
    last_partial_change_ms: int = field(default_factory=now_ms)
    # recognizer
    rec: Optional[KaldiRecognizer] = None
    # outbound audio
    outbound_q: asyncio.Queue = field(default_factory=lambda: asyncio.Queue(maxsize=4000))
    # barge-in / cancellation
    bot_speaking: bool = False
    cancel_llm: CancelFlag = field(default_factory=CancelFlag)
    cancel_tts: CancelFlag = field(default_factory=CancelFlag)
    # conversation
    history: List[Dict] = field(default_factory=list)
    # tasks
    outbound_task: Optional[asyncio.Task] = None
    # lock so only one bot response runs at a time
    bot_lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    def reset_cancels(self):
        self.cancel_llm = CancelFlag()
        self.cancel_tts = CancelFlag()
# ----------------------------
# FastAPI endpoints
# ----------------------------
@app.get("/health")
async def health():
    return {"ok": True}


@app.post("/voice")
async def voice(request: Request):
    if not TWILIO_STREAM_URL:
        return PlainTextResponse("TWILIO_STREAM_URL is not set", status_code=500)
    xml = build_twiml(TWILIO_STREAM_URL)
    log.info("Returning TwiML:\n%s", xml)
    return Response(content=xml, media_type="application/xml")


@app.websocket("/client-ws")
async def client_websocket(ws: WebSocket):
    await ws.accept()
    connected_clients.append(ws)
    log.info("Frontend client connected. Total clients: %d", len(connected_clients))
    try:
        while True:
            # Keep connection alive
            await ws.receive_text()
    except WebSocketDisconnect:
        if ws in connected_clients:
            connected_clients.remove(ws)
        log.info("Frontend client disconnected. Total clients: %d", len(connected_clients))
    except Exception as e:
        log.error("Frontend client error: %s", e)
        if ws in connected_clients:
            connected_clients.remove(ws)


@app.websocket("/stream")
async def stream(ws: WebSocket):
    await ws.accept()

    call_id = str(id(ws))
    st = CallState(call_id=call_id)
    st.history = [{"role": "system", "content": SYSTEM_PROMPT}]

    log.info("[%s] connection open", call_id)

    # Load the Vosk model once per process and cache it globally;
    # loading it per call would add several seconds of start latency.
    global _VOSK_MODEL
    if _VOSK_MODEL is None:
        log.info("Loading Vosk model: %s", VOSK_MODEL_PATH)
        _VOSK_MODEL = Model(VOSK_MODEL_PATH)
        log.info("Vosk model loaded.")
    st.rec = KaldiRecognizer(_VOSK_MODEL, 16000)
    st.rec.SetWords(False)

    # Outbound sender task (pacing)
    st.outbound_task = asyncio.create_task(outbound_sender(ws, st))

    try:
        while True:
            raw = await ws.receive_text()
            msg = json.loads(raw)

            event = msg.get("event")
            if event == "start":
                st.stream_sid = msg["start"]["streamSid"]
                enc = msg["start"].get("mediaFormat", {}).get("encoding") or msg["start"].get("mediaFormat", {}).get("codec")
                sr = msg["start"].get("mediaFormat", {}).get("sampleRate")
                log.info("[%s] start streamSid=%s encoding=%s sr=%s", call_id, st.stream_sid, enc, sr)

                # greet once
                await speak_text(ws, st, "Hi! How can I help you today?")

            elif event == "media":
                st.last_audio_ms = now_ms()
                payload_b64 = msg["media"]["payload"]
                mulaw = base64.b64decode(payload_b64)

                # decode mu-law 8 kHz -> PCM 16 kHz 16-bit
                pcm16 = audioop.ulaw2lin(mulaw, 2)  # 16-bit
                pcm16_16k, _ = audioop.ratecv(pcm16, 2, 1, 8000, 16000, None)

                # feed recognizer
                if st.rec.AcceptWaveform(pcm16_16k):
                    j = json.loads(st.rec.Result() or "{}")
                    text = (j.get("text") or "").strip()
                    if text:
                        await on_utterance(ws, st, text, reason="vosk_final")
                else:
                    j = json.loads(st.rec.PartialResult() or "{}")
                    partial = (j.get("partial") or "").strip()
                    if partial:
                        await on_partial(ws, st, partial)

                # endpointing checks (silence/stability)
                await maybe_endpoint(ws, st)

            elif event == "stop":
                log.info("[%s] stop", call_id)
                break

    except WebSocketDisconnect:
        log.info("[%s] websocket disconnected", call_id)
    except Exception as e:
        log.exception("[%s] websocket error: %s", call_id, e)
    finally:
        # cancel outbound task
        if st.outbound_task:
            st.outbound_task.cancel()
        log.info("[%s] connection closed", call_id)


# cached Vosk model (loaded lazily on the first call)
_VOSK_MODEL = None
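The media branch above relies on `audioop.ulaw2lin` to expand Twilio's G.711 mu-law payloads to linear PCM. As a standalone illustration of what that call does per sample, here is the standard G.711 mu-law decode formula in pure Python (a sketch for reference, not code from the file; note `audioop` itself was removed from the stdlib in Python 3.13):

```python
# Pure-Python G.711 mu-law decode: one transmitted byte -> one signed
# 16-bit linear PCM sample. Mu-law bytes are sent bit-complemented.
def ulaw_to_linear(byte: int) -> int:
    u = ~byte & 0xFF              # undo the complement
    sign = u & 0x80               # top bit: negative sample
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

print(ulaw_to_linear(0xFF))  # 0      (digital silence)
print(ulaw_to_linear(0x80))  # 32124  (maximum positive amplitude)
```

This makes the payload math concrete: 160 mu-law bytes (one 20 ms Twilio frame at 8 kHz) decode to 320 bytes of PCM16 before resampling to 16 kHz for Vosk.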
# ----------------------------
# Endpointing + Barge-in
# ----------------------------
async def on_partial(ws: WebSocket, st: CallState, partial: str):
    # barge-in trigger: the user starts speaking while the bot is speaking;
    # a partial-length threshold keeps single-word noise from interrupting
    words = partial.split()
    if st.bot_speaking and len(words) >= 2:
        log.info("[%s] BARGE-IN detected (partial=%r)", st.call_id, partial)
        await barge_in(ws, st)

    if partial != st.last_partial:
        st.last_partial = partial
        st.last_partial_change_ms = now_ms()
        log.info("[%s] partial: %s", st.call_id, partial)


async def maybe_endpoint(ws: WebSocket, st: CallState):
    # stable-partial endpoint
    if st.last_partial:
        stable_ms = now_ms() - st.last_partial_change_ms
        if stable_ms >= STABLE_PARTIAL_MS:
            # commit the partial as an utterance
            text = st.last_partial.strip()
            st.last_partial = ""
            if text and not is_junk_utterance(text):
                await on_utterance(ws, st, text, reason=f"stable_partial_{stable_ms}ms")

    # silence endpoint
    silence_ms = now_ms() - st.last_audio_ms
    if silence_ms >= SILENCE_MS and st.last_partial:
        text = st.last_partial.strip()
        st.last_partial = ""
        if text and not is_junk_utterance(text):
            await on_utterance(ws, st, text, reason=f"silence_{silence_ms}ms")


async def on_utterance(ws: WebSocket, st: CallState, text: str, reason: str):
    text = (text or "").strip()
    if not text:
        return
    if is_junk_utterance(text):
        log.info("[%s] ignore utterance=%r reason=%s", st.call_id, text, reason)
        return

    # print to terminal clearly
    print("\n" + "=" * 70)
    print(f"STT ({reason}): {text}")
    print("LLM: ", end="", flush=True)

    # broadcast to frontend
    await broadcast_transcript("user", text)

    # ensure only one bot response runs at a time
    async with st.bot_lock:
        st.reset_cancels()
        await run_llm_stream_and_tts(ws, st, text)


async def barge_in(ws: WebSocket, st: CallState):
    # cancel ongoing LLM/TTS
    st.cancel_llm.set()
    st.cancel_tts.set()

    # stop playback on the Twilio side (important)
    try:
        await ws.send_text(json.dumps({"event": "clear", "streamSid": st.stream_sid}))
    except Exception:
        pass

    # clear our outbound queue
    await drain_queue(st.outbound_q)

    st.bot_speaking = False
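`maybe_endpoint` commits an utterance on either of two triggers: a partial transcript that has stopped changing, or trailing silence while a partial is still pending. The decision can be isolated as a pure function (a sketch; the threshold values here are illustrative stand-ins for the `STABLE_PARTIAL_MS` / `SILENCE_MS` constants defined earlier in the file):

```python
STABLE_PARTIAL_MS = 700  # assumed value for illustration
SILENCE_MS = 900         # assumed value for illustration

def should_commit(partial: str, partial_age_ms: int, silence_ms: int) -> bool:
    # nothing pending -> nothing to commit
    if not partial:
        return False
    # trigger 1: the partial transcript has stopped changing
    if partial_age_ms >= STABLE_PARTIAL_MS:
        return True
    # trigger 2: trailing silence while a partial is still pending
    if silence_ms >= SILENCE_MS:
        return True
    return False

print(should_commit("book a table", 800, 0))  # True  (stable partial)
print(should_commit("book a", 100, 1000))     # True  (silence)
print(should_commit("book a", 100, 100))      # False (still changing)
```

Keeping both triggers matters: the stable-partial path gives fast turn-taking while the user keeps talking, and the silence path catches utterances whose partial never stabilizes before the caller goes quiet.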
# ----------------------------
# LLM streaming -> chunk -> TTS -> queue -> paced playback
# ----------------------------
async def run_llm_stream_and_tts(ws: WebSocket, st: CallState, user_text: str):
    # build a short rolling history: system prompt + last 8 non-system messages
    st.history.append({"role": "user", "content": user_text})
    st.history = st.history[:1] + st.history[1:][-8:]

    loop = asyncio.get_running_loop()
    token_q: asyncio.Queue = asyncio.Queue()

    def worker():
        try:
            for tok in openai_stream_tokens_blocking(st.history, OPENAI_MODEL, st.cancel_llm):
                if st.cancel_llm.is_set:
                    break
                asyncio.run_coroutine_threadsafe(token_q.put(tok), loop)
        finally:
            asyncio.run_coroutine_threadsafe(token_q.put(None), loop)

    # start the blocking OpenAI stream in a worker thread; do NOT await the
    # future here, or we would only start reading tokens after the stream ends
    llm_future = loop.run_in_executor(None, worker)

    # read tokens and chunk them
    buf = ""
    full = ""

    while True:
        tok = await token_q.get()
        if tok is None:
            break
        if st.cancel_llm.is_set:
            break

        full += tok
        buf += tok

        # print as it streams
        print(tok, end="", flush=True)

        # chunk rule: punctuation OR length
        if CHUNK_END_RE.search(buf) or len(buf) >= CHUNK_MAX_CHARS:
            chunk = buf.strip()
            buf = ""
            if chunk:
                await tts_enqueue(ws, st, chunk)

    await llm_future  # make sure the worker thread has finished

    # flush the remainder
    rem = buf.strip()
    if rem and not st.cancel_llm.is_set:
        await tts_enqueue(ws, st, rem)

    # store the assistant message for context (only if not cancelled)
    if full.strip() and not st.cancel_llm.is_set:
        st.history.append({"role": "assistant", "content": full.strip()})
        # broadcast to frontend
        await broadcast_transcript("assistant", full.strip())
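The chunking loop above flushes the token buffer at punctuation or a length cap so TTS can start speaking before the LLM finishes. The same rule, isolated as a generator (the regex and length cap here are assumed stand-ins for the `CHUNK_END_RE` / `CHUNK_MAX_CHARS` defined earlier in the file):

```python
import re

# assumed stand-ins; the real constants are defined earlier in the file
CHUNK_END_RE = re.compile(r"[.!?,;:]\s*$")
CHUNK_MAX_CHARS = 60

def chunk_tokens(tokens):
    """Flush buffered tokens at punctuation or a length cap, so TTS can
    start speaking before the LLM finishes generating."""
    buf = ""
    for tok in tokens:
        buf += tok
        if CHUNK_END_RE.search(buf) or len(buf) >= CHUNK_MAX_CHARS:
            chunk = buf.strip()
            buf = ""
            if chunk:
                yield chunk
    # flush whatever is left when the stream ends
    if buf.strip():
        yield buf.strip()

chunks = list(chunk_tokens(["Hello", " there", ".", " How", " can", " I", " help", "?"]))
print(chunks)  # ['Hello there.', 'How can I help?']
```

The length cap is the safety valve: a reply with no punctuation for a long stretch still gets flushed, so the caller never waits on an unbounded buffer.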
async def tts_enqueue(ws: WebSocket, st: CallState, text: str):
    if st.cancel_tts.is_set:
        return

    st.bot_speaking = True
    log.info("[%s] TTS start (chars=%d)", st.call_id, len(text))

    # run piper+ffmpeg in an executor (blocking)
    loop = asyncio.get_running_loop()
    mulaw_bytes = await loop.run_in_executor(None, piper_tts_to_mulaw, text)

    if st.cancel_tts.is_set:
        return

    frames = split_mulaw_frames(mulaw_bytes)
    log.info("[%s] TTS ready (frames=%d)", st.call_id, len(frames))

    # enqueue frames
    for fr in frames:
        if st.cancel_tts.is_set:
            break
        b64 = base64.b64encode(fr).decode("ascii")
        await st.outbound_q.put(b64)

    # marker: end of this chunk
    await st.outbound_q.put("__END_CHUNK__")


async def speak_text(ws: WebSocket, st: CallState, text: str):
    # used for the initial greeting
    await barge_in(ws, st)  # clear any previous audio
    st.reset_cancels()      # barge_in sets the cancel flags; clear them or tts_enqueue bails out
    await tts_enqueue(ws, st, text)
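`tts_enqueue` depends on `split_mulaw_frames`, which is defined earlier in the file. A plausible implementation, shown here as a sketch: cut the mu-law byte string into 20 ms frames of 160 bytes each (8000 samples/s x 0.020 s x 1 byte/sample), padding the tail with mu-law silence. The 0xFF padding choice is an assumption.

```python
FRAME_MS = 20
MULAW_BYTES_PER_FRAME = 8000 * FRAME_MS // 1000  # 160 bytes per 20 ms frame

def split_mulaw_frames(mulaw: bytes):
    """Cut mu-law audio into fixed 20 ms frames; pad the last frame
    with mu-law digital silence (0xFF) so every frame is full-length."""
    frames = []
    for i in range(0, len(mulaw), MULAW_BYTES_PER_FRAME):
        fr = mulaw[i:i + MULAW_BYTES_PER_FRAME]
        if len(fr) < MULAW_BYTES_PER_FRAME:
            fr += b"\xff" * (MULAW_BYTES_PER_FRAME - len(fr))
        frames.append(fr)
    return frames

frames = split_mulaw_frames(b"\xff" * 1000)
print(len(frames), len(frames[-1]))  # 7 160  (6 full frames + 1 padded)
```

Fixed-size frames are what makes the 20 ms pacing in `outbound_sender` map one queue item to exactly one frame-duration of audio.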
async def outbound_sender(ws: WebSocket, st: CallState):
    """
    Sends queued audio to Twilio at real-time pace (20 ms per frame).
    Also turns off bot_speaking after a chunk ends and the queue drains.
    """
    sent_last_sec = 0
    sec_tick = time.time()

    try:
        while True:
            item = await st.outbound_q.get()

            if item == "__END_CHUNK__":
                # if the queue is still empty after a short moment, the bot is done speaking
                await asyncio.sleep(0.02)
                if st.outbound_q.empty():
                    st.bot_speaking = False
                st.outbound_q.task_done()
                continue

            # Twilio media message
            msg = {
                "event": "media",
                "streamSid": st.stream_sid,
                "media": {"payload": item},
            }
            await ws.send_text(json.dumps(msg))
            st.outbound_q.task_done()

            # pacing
            await asyncio.sleep(FRAME_MS / 1000.0)

            # stats
            sent_last_sec += 1
            if time.time() - sec_tick >= 1.0:
                log.info("[%s] outbound media messages sent last 1s: %d", st.call_id, sent_last_sec)
                sent_last_sec = 0
                sec_tick = time.time()

    except asyncio.CancelledError:
        return
    except Exception as e:
        log.exception("[%s] outbound sender error: %s", st.call_id, e)


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host=HOST, port=PORT)
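The pacing idea in `outbound_sender` — sleep one frame duration between sends so delivery tracks real time (50 media messages per second at 20 ms frames) — can be demonstrated with a self-contained toy (names here are illustrative, not from the file):

```python
import asyncio
import time

FRAME_MS = 20  # same frame duration as the sender above

async def paced_drain(q: asyncio.Queue) -> int:
    """Pop frames and sleep one frame duration between sends,
    so draining N frames takes at least N * 20 ms of wall time."""
    sent = 0
    while not q.empty():
        await q.get()
        sent += 1
        await asyncio.sleep(FRAME_MS / 1000.0)
    return sent

async def main():
    q = asyncio.Queue()
    for _ in range(10):
        q.put_nowait(b"\xff" * 160)  # ten 20 ms frames = 200 ms of audio
    t0 = time.monotonic()
    sent = await paced_drain(q)
    return sent, time.monotonic() - t0

sent, elapsed = asyncio.run(main())
print(sent, elapsed >= 0.18)  # 10 True
```

Without the per-frame sleep, all queued audio would be pushed to Twilio in one burst, which makes barge-in useless — `clear` can only cut off audio that has not been sent yet.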
web_demo/.gitignore
ADDED
@@ -0,0 +1,24 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local
.env
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
web_demo/README.md
ADDED
@@ -0,0 +1,16 @@
# React + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Babel](https://babeljs.io/) (or [oxc](https://oxc.rs) when used in [rolldown-vite](https://vite.dev/guide/rolldown)) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh

## React Compiler

The React Compiler is not enabled on this template because of its impact on dev & build performance. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).

## Expanding the ESLint configuration

If you are developing a production application, we recommend using TypeScript with type-aware lint rules enabled. Check out the [TS template](https://github.com/vitejs/vite/tree/main/packages/create-vite/template-react-ts) for information on how to integrate TypeScript and [`typescript-eslint`](https://typescript-eslint.io) in your project.
web_demo/envdatavars.txt
ADDED
@@ -0,0 +1 @@
VITE_OPENAI_API_KEY=
web_demo/eslint.config.js
ADDED
@@ -0,0 +1,29 @@
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import { defineConfig, globalIgnores } from 'eslint/config'

export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{js,jsx}'],
    extends: [
      js.configs.recommended,
      reactHooks.configs.flat.recommended,
      reactRefresh.configs.vite,
    ],
    languageOptions: {
      ecmaVersion: 2020,
      globals: globals.browser,
      parserOptions: {
        ecmaVersion: 'latest',
        ecmaFeatures: { jsx: true },
        sourceType: 'module',
      },
    },
    rules: {
      'no-unused-vars': ['error', { varsIgnorePattern: '^[A-Z_]' }],
    },
  },
])
web_demo/index.html
ADDED
@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>web_demo</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>
  </body>
</html>
web_demo/package-lock.json
ADDED
The diff for this file is too large to render. See raw diff.
web_demo/package.json
ADDED
@@ -0,0 +1,28 @@
{
  "name": "web_demo",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "lucide-react": "^0.562.0",
    "react": "^19.2.0",
    "react-dom": "^19.2.0"
  },
  "devDependencies": {
    "@eslint/js": "^9.39.1",
    "@types/react": "^19.2.5",
    "@types/react-dom": "^19.2.3",
    "@vitejs/plugin-react": "^5.1.1",
    "eslint": "^9.39.1",
    "eslint-plugin-react-hooks": "^7.0.1",
    "eslint-plugin-react-refresh": "^0.4.24",
    "globals": "^16.5.0",
    "vite": "^7.2.4"
  }
}
web_demo/public/vite.svg
ADDED
web_demo/src/App.css
ADDED
@@ -0,0 +1,211 @@
:root {
  --bg-color: #0f172a;
  --card-bg: #1e293b;
  --text-primary: #f8fafc;
  --text-secondary: #94a3b8;
  --accent-color: #3b82f6;
  --user-bubble: #334155;
  --assistant-bubble: #2563eb;
  --error-color: #ef4444;
  --success-color: #22c55e;
}

* {
  box-sizing: border-box;
  margin: 0;
  padding: 0;
}

body {
  font-family: 'Inter', system-ui, -apple-system, sans-serif;
  background-color: var(--bg-color);
  color: var(--text-primary);
  height: 100vh;
  overflow: hidden;
}

.app-container {
  display: flex;
  flex-direction: column;
  height: 100vh;
  width: 100%;
  max-width: 100vw;
  margin: 0;
  padding: 1.5rem;
}

.header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 1rem;
  background-color: var(--card-bg);
  border-radius: 12px;
  margin-bottom: 1rem;
  box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
}

/* Tabs */
.tabs {
  display: flex;
  gap: 0.5rem;
  margin-bottom: 1rem;
  background-color: var(--card-bg);
  padding: 0.5rem;
  border-radius: 12px;
  box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
}

.tab {
  flex: 1;
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 0.5rem;
  padding: 0.75rem 1rem;
  background: transparent;
  border: none;
  border-radius: 8px;
  color: var(--text-secondary);
  font-size: 0.875rem;
  font-weight: 500;
  cursor: pointer;
  transition: all 0.2s ease;
  font-family: inherit;
}

.tab:hover {
  background-color: rgba(255, 255, 255, 0.05);
  color: var(--text-primary);
}

.tab.active {
  background-color: var(--accent-color);
  color: white;
  box-shadow: 0 2px 8px rgba(59, 130, 246, 0.3);
}

.tab svg {
  flex-shrink: 0;
}

.logo {
  display: flex;
  align-items: center;
  gap: 0.75rem;
}

.icon-logo {
  color: var(--accent-color);
}

h1 {
  font-size: 1.25rem;
  font-weight: 600;
}

.status-badge {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  padding: 0.5rem 1rem;
  border-radius: 9999px;
  font-size: 0.875rem;
  background-color: rgba(255, 255, 255, 0.05);
}

.status-badge.connected {
  color: var(--success-color);
  background-color: rgba(34, 197, 94, 0.1);
}

.status-badge.disconnected {
  color: var(--error-color);
  background-color: rgba(239, 68, 68, 0.1);
}

.main-content {
  flex: 1;
  background-color: var(--card-bg);
  border-radius: 12px;
  padding: 1rem;
  overflow-y: auto;
  position: relative;
  box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
}

.transcript-container {
  display: flex;
  flex-direction: column;
  gap: 1rem;
}

.empty-state {
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  height: 100%;
  color: var(--text-secondary);
  gap: 1rem;
  margin-top: 4rem;
}

.transcript-item {
  display: flex;
  flex-direction: column;
  gap: 0.25rem;
  max-width: 80%;
}

.transcript-item.user {
  align-self: flex-end;
  align-items: flex-end;
}

.transcript-item.assistant {
  align-self: flex-start;
  align-items: flex-start;
}

.message-header {
  display: flex;
  gap: 0.5rem;
  font-size: 0.75rem;
  color: var(--text-secondary);
}

.message-bubble {
  padding: 0.75rem 1rem;
  border-radius: 12px;
  line-height: 1.5;
  box-shadow: 0 1px 2px rgba(0, 0, 0, 0.1);
}

.transcript-item.user .message-bubble {
  background-color: var(--user-bubble);
  border-bottom-right-radius: 2px;
}

.transcript-item.assistant .message-bubble {
  background-color: var(--assistant-bubble);
  border-bottom-left-radius: 2px;
}

/* Scrollbar styling */
::-webkit-scrollbar {
  width: 8px;
}

::-webkit-scrollbar-track {
  background: transparent;
}

::-webkit-scrollbar-thumb {
  background: #475569;
  border-radius: 4px;
}

::-webkit-scrollbar-thumb:hover {
  background: #64748b;
}
web_demo/src/App.jsx
ADDED
@@ -0,0 +1,162 @@
import React, { useState, useEffect, useRef, useCallback } from 'react';
import { Activity, Wifi, WifiOff, Terminal, MessageSquare, Mic, Volume2, Settings } from 'lucide-react';
import './App.css';
import MicrophoneTest from './components/MicrophoneTest.jsx';
import TextToSpeech from './components/TextToSpeech.jsx';
import SttLlmTts from './components/SttLlmTts.jsx';

function App() {
  const [activeTab, setActiveTab] = useState('transcript');
  const [isConnected, setIsConnected] = useState(false);
  const [transcripts, setTranscripts] = useState([]);
  const [status, setStatus] = useState('Disconnected');
  const wsRef = useRef(null);
  const transcriptsEndRef = useRef(null);

  // Scroll to the bottom of the transcript on every update
  useEffect(() => {
    transcriptsEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [transcripts]);

  // Connect to WebSocket
  const connectWebSocket = useCallback(() => {
    setStatus('Connecting...');
    // Connect to the backend's frontend-client endpoint
    const wsUrl = `ws://localhost:8080/client-ws`;
    const ws = new WebSocket(wsUrl);

    ws.onopen = () => {
      setIsConnected(true);
      setStatus('Connected (Waiting for call data)');
    };

    ws.onclose = () => {
      setIsConnected(false);
      setStatus('Disconnected');
      // Try reconnecting after 3 seconds
      setTimeout(connectWebSocket, 3000);
    };

    ws.onerror = (error) => {
      console.error('WebSocket error:', error);
      setStatus('Error connecting');
    };

    ws.onmessage = (event) => {
      try {
        const data = JSON.parse(event.data);
        handleServerMessage(data);
      } catch (e) {
        console.error('Error parsing message:', e);
      }
    };

    wsRef.current = ws;
  }, []);

  useEffect(() => {
    connectWebSocket();
    return () => {
      if (wsRef.current) {
        wsRef.current.close();
      }
    };
  }, [connectWebSocket]);

  const handleServerMessage = (data) => {
    if (data.type === 'transcript') {
      setTranscripts(prev => [...prev, {
        id: Date.now(),
        role: data.role,
        text: data.text,
        timestamp: new Date(data.timestamp).toLocaleTimeString()
      }]);
    }
  };

  return (
    <div className="app-container">
      <header className="header">
        <div className="logo">
          <Activity className="icon-logo" />
          <h1>NeuralVoice AI</h1>
        </div>
        <div className={`status-badge ${isConnected ? 'connected' : 'disconnected'}`}>
          {isConnected ? <Wifi size={16} /> : <WifiOff size={16} />}
          <span>{status}</span>
        </div>
      </header>

      <div className="tabs">
        <button
          className={`tab ${activeTab === 'transcript' ? 'active' : ''}`}
          onClick={() => setActiveTab('transcript')}
        >
          <MessageSquare size={18} />
          <span>Live Call Transcript</span>
        </button>
        <button
          className={`tab ${activeTab === 'microphone' ? 'active' : ''}`}
          onClick={() => setActiveTab('microphone')}
        >
          <Mic size={18} />
          <span>Microphone Test (STT)</span>
        </button>
        <button
          className={`tab ${activeTab === 'tts' ? 'active' : ''}`}
          onClick={() => setActiveTab('tts')}
        >
          <Volume2 size={18} />
          <span>Text-to-Speech (TTS)</span>
        </button>
        <button
          className={`tab ${activeTab === 'stt-llm-tts' ? 'active' : ''}`}
          onClick={() => setActiveTab('stt-llm-tts')}
        >
          <Settings size={18} />
          <span>STT-LLM-TTS</span>
        </button>
      </div>

      <main className="main-content">
        {activeTab === 'transcript' && (
          <div className="transcript-container">
            {transcripts.length === 0 && (
              <div className="empty-state">
                <Terminal size={48} />
                <p>Waiting for call activity...</p>
              </div>
            )}

            {transcripts.map((t) => (
              <div key={t.id} className={`transcript-item ${t.role}`}>
                <div className="message-header">
                  <span className="role">{t.role === 'user' ? 'Caller' : 'AI Assistant'}</span>
                  <span className="timestamp">{t.timestamp}</span>
                </div>
                <div className="message-bubble">
                  <p className="text">{t.text}</p>
                </div>
              </div>
            ))}
            <div ref={transcriptsEndRef} />
          </div>
        )}

        {activeTab === 'microphone' && (
          <MicrophoneTest />
        )}

        {activeTab === 'tts' && (
          <TextToSpeech />
        )}

        {activeTab === 'stt-llm-tts' && (
          <SttLlmTts />
        )}
      </main>
    </div>
  );
}

export default App;
web_demo/src/assets/react.svg
ADDED

web_demo/src/components/MicrophoneTest.css
ADDED
@@ -0,0 +1,315 @@
.microphone-test {
  display: flex;
  flex-direction: column;
  gap: 1.5rem;
  padding: 0;
  height: 100%;
  width: 100%;
}

.mic-header {
  text-align: center;
}

.mic-header h2 {
  font-size: 1.5rem;
  font-weight: 600;
  margin-bottom: 0.5rem;
  color: var(--text-primary);
}

.language-selector-stt {
  display: flex;
  flex-direction: column;
  align-items: center;
  gap: 0.5rem;
  background: rgba(255, 255, 255, 0.03);
  padding: 1rem;
  border-radius: 8px;
  margin-bottom: 1rem;
}

.language-selector-stt label {
  font-size: 0.875rem;
  color: var(--text-secondary);
  font-weight: 500;
}

.tiny-info {
  font-size: 0.75rem;
  color: var(--error-color);
  margin-top: 0.25rem;
}

.subtitle {
  color: var(--text-secondary);
  font-size: 0.875rem;
}

.error-message {
  background-color: rgba(239, 68, 68, 0.1);
  border: 1px solid var(--error-color);
  color: var(--error-color);
  padding: 0.75rem 1rem;
  border-radius: 8px;
  font-size: 0.875rem;
}

/* Audio Visualizer */
.audio-visualizer {
  background: rgba(255, 255, 255, 0.03);
  border-radius: 12px;
  padding: 1.5rem;
  display: flex;
  flex-direction: column;
  gap: 1rem;
}

.visualizer-container {
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 4px;
  height: 80px;
  background: rgba(0, 0, 0, 0.2);
  border-radius: 8px;
  padding: 1rem;
}

.visualizer-bar {
  width: 6px;
  min-height: 5px;
  background: linear-gradient(to top, var(--accent-color), #60a5fa);
  border-radius: 3px;
  transition: height 0.1s ease;
}

.audio-level-indicator {
  display: flex;
  align-items: center;
  gap: 0.75rem;
  color: var(--text-secondary);
}

.level-bar {
  flex: 1;
  height: 8px;
  background: rgba(255, 255, 255, 0.1);
  border-radius: 4px;
  overflow: hidden;
}

.level-fill {
  height: 100%;
  background: linear-gradient(to right, var(--success-color), #4ade80);
  transition: width 0.1s ease;
  border-radius: 4px;
}

/* Controls */
.controls {
  display: flex;
  gap: 0.75rem;
  justify-content: center;
  flex-wrap: wrap;
}

.btn {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  padding: 0.75rem 1.5rem;
  border: none;
  border-radius: 8px;
  font-size: 0.875rem;
  font-weight: 500;
  cursor: pointer;
  transition: all 0.2s ease;
  font-family: inherit;
}

.btn:hover {
  transform: translateY(-2px);
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);
}

.btn:active {
  transform: translateY(0);
}

.btn-primary {
  background: var(--accent-color);
  color: white;
}

.btn-primary:hover {
  background: #2563eb;
}

.btn-success {
  background: var(--success-color);
  color: white;
}

.btn-success:hover {
  background: #16a34a;
}

.btn-warning {
  background: #f59e0b;
  color: white;
}

.btn-warning:hover {
  background: #d97706;
}

.btn-danger {
  background: var(--error-color);
  color: white;
}

.btn-danger:hover {
  background: #dc2626;
}

.btn-secondary {
  background: rgba(255, 255, 255, 0.1);
  color: var(--text-primary);
}

.btn-secondary:hover {
  background: rgba(255, 255, 255, 0.15);
}

/* Transcript Box */
.transcript-box {
  flex: 1;
  background: rgba(255, 255, 255, 0.03);
  border-radius: 12px;
  padding: 1.5rem;
  display: flex;
  flex-direction: column;
  min-height: 200px;
}

.transcript-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 1rem;
  padding-bottom: 0.75rem;
  border-bottom: 1px solid rgba(255, 255, 255, 0.1);
}

.transcript-header h3 {
  font-size: 1.125rem;
  font-weight: 600;
  color: var(--text-primary);
}

.recording-indicator {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  font-size: 0.875rem;
  color: var(--error-color);
  font-weight: 500;
}

.pulse-dot {
  width: 8px;
  height: 8px;
  background: var(--error-color);
  border-radius: 50%;
  animation: pulse 1.5s ease-in-out infinite;
}

@keyframes pulse {
  0%,
  100% {
    opacity: 1;
    transform: scale(1);
  }

  50% {
    opacity: 0.5;
    transform: scale(1.2);
  }
}

.transcript-content {
  flex: 1;
  overflow-y: auto;
  line-height: 1.6;
}

.placeholder {
  color: var(--text-secondary);
  font-style: italic;
  text-align: center;
  margin-top: 2rem;
}

.final-transcript {
  color: var(--text-primary);
  margin-bottom: 0.5rem;
}

.interim-transcript {
  color: var(--text-secondary);
  font-style: italic;
}

/* Info Box */
.info-box {
  background: rgba(59, 130, 246, 0.1);
  border: 1px solid rgba(59, 130, 246, 0.3);
  border-radius: 8px;
  padding: 1rem;
}

.info-box h4 {
  color: var(--accent-color);
  font-size: 0.875rem;
  font-weight: 600;
  margin-bottom: 0.5rem;
}

.info-box ul {
  list-style: none;
  padding: 0;
  margin: 0;
}

.info-box li {
  color: var(--text-secondary);
  font-size: 0.8125rem;
  padding: 0.25rem 0;
  padding-left: 1.25rem;
  position: relative;
}

.info-box li::before {
  content: "•";
  position: absolute;
  left: 0.5rem;
  color: var(--accent-color);
}

/* Responsive */
@media (max-width: 640px) {
  .controls {
    flex-direction: column;
  }

  .btn {
    width: 100%;
    justify-content: center;
  }

  .visualizer-container {
    height: 60px;
  }
}
web_demo/src/components/MicrophoneTest.jsx
ADDED
@@ -0,0 +1,307 @@
import React, { useState, useRef, useEffect } from 'react';
import { Mic, MicOff, Play, Pause, Trash2, Volume2 } from 'lucide-react';
import './MicrophoneTest.css';

function MicrophoneTest() {
  const [isRecording, setIsRecording] = useState(false);
  const [isPaused, setIsPaused] = useState(false);
  const [transcript, setTranscript] = useState('');
  const [interimTranscript, setInterimTranscript] = useState('');
  const [audioLevel, setAudioLevel] = useState(0);
  const [error, setError] = useState('');
  const [selectedLang, setSelectedLang] = useState('en-IN'); // Default to Indian English

  const recognitionRef = useRef(null);
  const audioContextRef = useRef(null);
  const analyserRef = useRef(null);
  const microphoneRef = useRef(null);
  const animationFrameRef = useRef(null);

  const languages = [
    { code: 'en-IN', name: 'English (India)' },
    { code: 'en-US', name: 'English (US)' },
    { code: 'en-GB', name: 'English (UK)' },
    { code: 'hi-IN', name: 'Hindi (India)' },
  ];

  // Initialize Web Speech API
  useEffect(() => {
    if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
      const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
      recognitionRef.current = new SpeechRecognition();
      recognitionRef.current.continuous = true;
      recognitionRef.current.interimResults = true;
      recognitionRef.current.lang = selectedLang;

      recognitionRef.current.onresult = (event) => {
        let interim = '';
        let final = '';

        for (let i = event.resultIndex; i < event.results.length; i++) {
          const transcriptPiece = event.results[i][0].transcript;
          if (event.results[i].isFinal) {
            final += transcriptPiece + ' ';
          } else {
            interim += transcriptPiece;
          }
        }

        if (final) {
          setTranscript(prev => prev + final);
          setInterimTranscript('');
        } else {
          setInterimTranscript(interim);
        }
      };

      recognitionRef.current.onerror = (event) => {
        console.error('Speech recognition error:', event.error);
        if (event.error === 'no-speech') {
          // Ignore no-speech errors to prevent UI flicker
          return;
        }
        setError(`Error: ${event.error}`);
      };

      recognitionRef.current.onend = () => {
        if (isRecording && !isPaused) {
          try {
            recognitionRef.current.start();
          } catch (e) {
            console.error("Failed to restart recognition:", e);
          }
        }
      };
    } else {
      setError('Speech recognition is not supported in this browser. Please use Chrome or Edge.');
    }

    return () => {
      if (recognitionRef.current) {
        recognitionRef.current.stop();
      }
      stopAudioVisualization();
    };
  }, [selectedLang]); // Re-initialize when language changes

  // Audio visualization
  const startAudioVisualization = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      audioContextRef.current = new (window.AudioContext || window.webkitAudioContext)();
      analyserRef.current = audioContextRef.current.createAnalyser();
      microphoneRef.current = audioContextRef.current.createMediaStreamSource(stream);

      analyserRef.current.fftSize = 256;
      microphoneRef.current.connect(analyserRef.current);

      const dataArray = new Uint8Array(analyserRef.current.frequencyBinCount);

      const updateLevel = () => {
        if (!analyserRef.current) return;
        analyserRef.current.getByteFrequencyData(dataArray);
        const average = dataArray.reduce((a, b) => a + b) / dataArray.length;
        setAudioLevel(average);
        animationFrameRef.current = requestAnimationFrame(updateLevel);
      };

      updateLevel();
    } catch (err) {
      console.error('Error accessing microphone:', err);
      setError('Could not access microphone. Please check permissions.');
    }
  };

  const stopAudioVisualization = () => {
    if (animationFrameRef.current) {
      cancelAnimationFrame(animationFrameRef.current);
    }
    if (microphoneRef.current && microphoneRef.current.mediaStream) {
      microphoneRef.current.mediaStream.getTracks().forEach(track => track.stop());
    }
    if (audioContextRef.current) {
      audioContextRef.current.close();
      audioContextRef.current = null;
      analyserRef.current = null;
    }
  };

  const startRecording = () => {
    setError('');
    setIsRecording(true);
    setIsPaused(false);

    if (recognitionRef.current) {
      try {
        recognitionRef.current.start();
      } catch (e) {
        console.error("Recognition already started:", e);
      }
    }
    startAudioVisualization();
  };

  const pauseRecording = () => {
    setIsPaused(true);
    if (recognitionRef.current) {
      recognitionRef.current.stop();
    }
  };

  const resumeRecording = () => {
    setIsPaused(false);
    if (recognitionRef.current) {
      try {
        recognitionRef.current.start();
      } catch (e) {
        console.error("Failed to resume recognition:", e);
      }
    }
  };

  const stopRecording = () => {
    setIsRecording(false);
    setIsPaused(false);

    if (recognitionRef.current) {
      recognitionRef.current.stop();
    }
    stopAudioVisualization();
    setAudioLevel(0);
  };

  const clearTranscript = () => {
    setTranscript('');
    setInterimTranscript('');
    setError('');
  };

  return (
    <div className="microphone-test">
      <div className="mic-header">
        <h2>Speech-to-Text Test</h2>
        <p className="subtitle">Test your microphone with Indian English support</p>
      </div>

      <div className="language-selector-stt">
        <label htmlFor="stt-lang">Recognition Language: </label>
        <select
          id="stt-lang"
          value={selectedLang}
          onChange={(e) => {
            const wasRecording = isRecording;
            if (wasRecording) stopRecording();
            setSelectedLang(e.target.value);
          }}
          disabled={isRecording}
          className="voice-select"
        >
          {languages.map(lang => (
            <option key={lang.code} value={lang.code}>{lang.name}</option>
          ))}
        </select>
        {isRecording && <p className="tiny-info">Stop recording to change language</p>}
      </div>

      {error && (
        <div className="error-message">
          <span>⚠️ {error}</span>
        </div>
      )}

      <div className="audio-visualizer">
        <div className="visualizer-container">
          {[...Array(20)].map((_, i) => (
            <div
              key={i}
              className="visualizer-bar"
              style={{
                height: `${isRecording && !isPaused ? Math.random() * audioLevel * 2 : 5}px`,
                animationDelay: `${i * 0.05}s`
              }}
            />
          ))}
        </div>
        <div className="audio-level-indicator">
          <Volume2 size={20} />
          <div className="level-bar">
            <div
              className="level-fill"
              style={{ width: `${Math.min(audioLevel, 100)}%` }}
            />
          </div>
          <span>{Math.round(audioLevel)}%</span>
        </div>
      </div>

      <div className="controls">
        {!isRecording ? (
          <button className="btn btn-primary" onClick={startRecording}>
            <Mic size={20} />
            <span>Start Recording</span>
          </button>
        ) : (
          <>
            {!isPaused ? (
              <button className="btn btn-warning" onClick={pauseRecording}>
                <Pause size={20} />
                <span>Pause</span>
              </button>
            ) : (
              <button className="btn btn-success" onClick={resumeRecording}>
                <Play size={20} />
                <span>Resume</span>
              </button>
            )}
            <button className="btn btn-danger" onClick={stopRecording}>
              <MicOff size={20} />
              <span>Stop</span>
            </button>
          </>
        )}
        {transcript && (
          <button className="btn btn-secondary" onClick={clearTranscript}>
            <Trash2 size={20} />
            <span>Clear</span>
          </button>
        )}
      </div>

      <div className="transcript-box">
        <div className="transcript-header">
          <h3>Transcript ({languages.find(l => l.code === selectedLang)?.name})</h3>
          {isRecording && (
            <span className="recording-indicator">
              <span className="pulse-dot"></span>
              {isPaused ? 'Paused' : 'Recording...'}
            </span>
          )}
        </div>
        <div className="transcript-content">
          {!transcript && !interimTranscript ? (
            <p className="placeholder">Your transcription will appear here...</p>
          ) : (
            <>
              <p className="final-transcript">{transcript}</p>
              {interimTranscript && (
                <p className="interim-transcript">{interimTranscript}</p>
              )}
            </>
          )}
        </div>
      </div>

      <div className="info-box">
        <h4>💡 Tips:</h4>
        <ul>
          <li>Selecting <b>English (India)</b> will significantly improve recognition for Indian accents.</li>
          <li>You can even try <b>Hindi</b> if you want to test multilingual support!</li>
          <li>Make sure your browser has microphone permissions enabled</li>
          <li>Works best in Chrome, Edge, or Safari</li>
        </ul>
      </div>
    </div>
  );
}

export default MicrophoneTest;
web_demo/src/components/SttLlmTts.css
ADDED
@@ -0,0 +1,653 @@
.stt-llm-tts-test {
  display: flex;
  flex-direction: column;
  gap: 2rem;
  padding: 1rem 0;
  height: calc(100vh - 180px); /* Fill the vertical space */
  width: 100%;
  margin: 0;
}

.test-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  background: var(--card-bg);
  padding: 1.25rem;
  border-radius: 12px;
  box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
}

.title-group h2 {
  font-size: 1.25rem;
  font-weight: 600;
  margin-bottom: 0.25rem;
}

.title-group .subtitle {
  font-size: 0.875rem;
  color: var(--text-secondary);
}

.action-buttons {
  display: flex;
  gap: 0.75rem;
  align-items: center;
}

.settings-btn {
  display: flex;
  align-items: center;
  justify-content: center;
  width: 42px;
  height: 42px;
  border: 1px solid rgba(255, 255, 255, 0.1);
  background: rgba(255, 255, 255, 0.05);
  color: var(--text-secondary);
  border-radius: 8px;
  cursor: pointer;
  transition: all 0.2s ease;
}

.settings-btn:hover {
  background: rgba(59, 130, 246, 0.1);
  color: var(--accent-color);
  border-color: var(--accent-color);
}

.record-toggle {
  display: flex;
  align-items: center;
  gap: 0.75rem;
  padding: 0.75rem 1.5rem;
  border: none;
  border-radius: 8px;
  background: var(--accent-color);
  color: white;
  font-weight: 600;
  cursor: pointer;
  transition: all 0.2s ease;
}

.record-toggle.recording {
  background: var(--error-color);
  animation: pulse-red 2s infinite;
}

.reset-session {
  display: flex;
  align-items: center;
  justify-content: center;
  width: 42px;
  height: 42px;
  border: 1px solid rgba(255, 255, 255, 0.1);
  background: rgba(255, 255, 255, 0.05);
  color: var(--text-secondary);
  border-radius: 8px;
  cursor: pointer;
  transition: all 0.2s ease;
}

.reset-session:hover {
  background: rgba(255, 255, 255, 0.1);
  color: var(--text-primary);
}

/* Pipeline Columns */
.pipeline-columns {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  gap: 1.5rem;
  flex: 2; /* Take more relative space */
  min-height: 450px; /* Force substantial height */
}

.pipeline-col {
  background: var(--card-bg);
  border-radius: 12px;
  display: flex;
  flex-direction: column;
  overflow: hidden;
  border: 1px solid rgba(255, 255, 255, 0.05);
  box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
}

.col-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  gap: 0.75rem;
  padding: 1rem 1.25rem;
  background: rgba(255, 255, 255, 0.03);
  border-bottom: 1px solid rgba(255, 255, 255, 0.05);
}

.title-with-model {
  display: flex;
  align-items: center;
  gap: 0.75rem;
}

.model-tag {
  font-size: 0.625rem;
  font-weight: 700;
  text-transform: uppercase;
  padding: 0.25rem 0.625rem;
  background: rgba(255, 255, 255, 0.06);
  border: 1px solid rgba(255, 255, 255, 0.1);
  border-radius: 999px;
  color: var(--text-secondary);
  letter-spacing: 0.05em;
}

.col-header.stt .model-tag {
  color: #60a5fa;
  border-color: rgba(96, 165, 250, 0.3);
  background: rgba(96, 165, 250, 0.1);
}

.col-header.llm .model-tag {
  color: #a78bfa;
  border-color: rgba(167, 139, 250, 0.3);
  background: rgba(167, 139, 250, 0.1);
}

.col-header.tts .model-tag {
  color: #4ade80;
  border-color: rgba(74, 222, 128, 0.3);
  background: rgba(74, 222, 128, 0.1);
}

.col-header h3 {
  font-size: 1rem;
  font-weight: 700;
  letter-spacing: 0.025em;
  text-transform: uppercase;
}

.col-header.stt {
  color: #60a5fa;
}

.col-header.llm {
  color: #a78bfa;
}

.col-header.tts {
  color: #4ade80;
}

.col-content {
  flex: 1;
  padding: 1.25rem;
  display: flex;
  flex-direction: column;
  overflow-y: auto;
  position: relative;
}

.text-display {
  flex: 1;
  padding: 1.5rem;
  background: rgba(15, 23, 42, 0.4);
|
| 196 |
+
border: 1px solid rgba(255, 255, 255, 0.05);
|
| 197 |
+
border-radius: 12px;
|
| 198 |
+
font-size: 1rem;
|
| 199 |
+
line-height: 1.7;
|
| 200 |
+
backdrop-filter: blur(4px);
|
| 201 |
+
transition: all 0.3s ease;
|
| 202 |
+
}
|
| 203 |
+
|
| 204 |
+
.text-display:hover {
|
| 205 |
+
background: rgba(15, 23, 42, 0.6);
|
| 206 |
+
border-color: rgba(255, 255, 255, 0.1);
|
| 207 |
+
}
|
| 208 |
+
|
| 209 |
+
.empty-msg {
|
| 210 |
+
color: var(--text-secondary);
|
| 211 |
+
font-style: italic;
|
| 212 |
+
font-size: 0.875rem;
|
| 213 |
+
text-align: center;
|
| 214 |
+
margin-top: 2rem;
|
| 215 |
+
}
|
| 216 |
+
|
| 217 |
+
/* STT Column */
|
| 218 |
+
.final-text {
|
| 219 |
+
color: var(--text-primary);
|
| 220 |
+
}
|
| 221 |
+
|
| 222 |
+
.interim-text {
|
| 223 |
+
color: var(--text-secondary);
|
| 224 |
+
font-style: italic;
|
| 225 |
+
}
|
| 226 |
+
|
| 227 |
+
.recording-pulse {
|
| 228 |
+
margin-top: 1rem;
|
| 229 |
+
font-size: 0.75rem;
|
| 230 |
+
color: var(--error-color);
|
| 231 |
+
display: flex;
|
| 232 |
+
align-items: center;
|
| 233 |
+
gap: 0.5rem;
|
| 234 |
+
}
|
| 235 |
+
|
| 236 |
+
.recording-pulse::before {
|
| 237 |
+
content: '';
|
| 238 |
+
width: 8px;
|
| 239 |
+
height: 8px;
|
| 240 |
+
background: var(--error-color);
|
| 241 |
+
border-radius: 50%;
|
| 242 |
+
animation: pulse-red 1s infinite;
|
| 243 |
+
}
|
| 244 |
+
|
| 245 |
+
.mic-muted-status {
|
| 246 |
+
margin-top: 1rem;
|
| 247 |
+
font-size: 0.75rem;
|
| 248 |
+
color: var(--accent-color);
|
| 249 |
+
display: flex;
|
| 250 |
+
align-items: center;
|
| 251 |
+
gap: 0.5rem;
|
| 252 |
+
padding: 0.5rem;
|
| 253 |
+
background: rgba(59, 130, 246, 0.1);
|
| 254 |
+
border-radius: 6px;
|
| 255 |
+
animation: fade-in 0.3s ease-out;
|
| 256 |
+
}
|
| 257 |
+
|
| 258 |
+
/* LLM Column */
|
| 259 |
+
.loading-state {
|
| 260 |
+
display: flex;
|
| 261 |
+
flex-direction: column;
|
| 262 |
+
align-items: center;
|
| 263 |
+
justify-content: center;
|
| 264 |
+
height: 100%;
|
| 265 |
+
gap: 1rem;
|
| 266 |
+
color: #a78bfa;
|
| 267 |
+
}
|
| 268 |
+
|
| 269 |
+
.spinner {
|
| 270 |
+
animation: rotate 2s linear infinite;
|
| 271 |
+
}
|
| 272 |
+
|
| 273 |
+
.response-box {
|
| 274 |
+
animation: fade-in 0.3s ease-out;
|
| 275 |
+
}
|
| 276 |
+
|
| 277 |
+
.response-text {
|
| 278 |
+
color: #e9d5ff;
|
| 279 |
+
}
|
| 280 |
+
|
| 281 |
+
/* TTS Column */
|
| 282 |
+
.tts-status {
|
| 283 |
+
flex: 1;
|
| 284 |
+
display: flex;
|
| 285 |
+
flex-direction: column;
|
| 286 |
+
align-items: center;
|
| 287 |
+
justify-content: center;
|
| 288 |
+
gap: 1.5rem;
|
| 289 |
+
}
|
| 290 |
+
|
| 291 |
+
.status-indicator {
|
| 292 |
+
display: flex;
|
| 293 |
+
flex-direction: column;
|
| 294 |
+
align-items: center;
|
| 295 |
+
gap: 1rem;
|
| 296 |
+
color: var(--text-secondary);
|
| 297 |
+
transition: all 0.3s ease;
|
| 298 |
+
}
|
| 299 |
+
|
| 300 |
+
.status-indicator.playing {
|
| 301 |
+
color: #4ade80;
|
| 302 |
+
}
|
| 303 |
+
|
| 304 |
+
.bouncing {
|
| 305 |
+
animation: bounce 1s infinite ease-in-out;
|
| 306 |
+
}
|
| 307 |
+
|
| 308 |
+
.replay-btn {
|
| 309 |
+
display: flex;
|
| 310 |
+
align-items: center;
|
| 311 |
+
gap: 0.5rem;
|
| 312 |
+
padding: 0.5rem 1rem;
|
| 313 |
+
background: rgba(255, 255, 255, 0.1);
|
| 314 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
| 315 |
+
border-radius: 6px;
|
| 316 |
+
color: var(--text-primary);
|
| 317 |
+
font-size: 0.8125rem;
|
| 318 |
+
cursor: pointer;
|
| 319 |
+
}
|
| 320 |
+
|
| 321 |
+
.voice-selection-compact {
|
| 322 |
+
display: flex;
|
| 323 |
+
flex-direction: column;
|
| 324 |
+
gap: 0.5rem;
|
| 325 |
+
margin-bottom: 1.5rem;
|
| 326 |
+
padding: 1rem;
|
| 327 |
+
background: rgba(255, 255, 255, 0.03);
|
| 328 |
+
border: 1px solid rgba(255, 255, 255, 0.05);
|
| 329 |
+
border-radius: 10px;
|
| 330 |
+
}
|
| 331 |
+
|
| 332 |
+
.voice-selection-compact label {
|
| 333 |
+
font-size: 0.75rem;
|
| 334 |
+
text-transform: uppercase;
|
| 335 |
+
font-weight: 700;
|
| 336 |
+
color: var(--text-secondary);
|
| 337 |
+
letter-spacing: 0.05em;
|
| 338 |
+
}
|
| 339 |
+
|
| 340 |
+
.voice-selection-compact select {
|
| 341 |
+
background: rgba(15, 23, 42, 0.6);
|
| 342 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
| 343 |
+
color: white;
|
| 344 |
+
padding: 0.625rem;
|
| 345 |
+
border-radius: 6px;
|
| 346 |
+
font-size: 0.875rem;
|
| 347 |
+
outline: none;
|
| 348 |
+
cursor: pointer;
|
| 349 |
+
transition: all 0.2s ease;
|
| 350 |
+
}
|
| 351 |
+
|
| 352 |
+
.voice-selection-compact select:hover {
|
| 353 |
+
background: rgba(15, 23, 42, 0.8);
|
| 354 |
+
border-color: var(--accent-color);
|
| 355 |
+
}
|
| 356 |
+
|
| 357 |
+
.auto-toggle {
|
| 358 |
+
display: flex;
|
| 359 |
+
align-items: center;
|
| 360 |
+
gap: 0.75rem;
|
| 361 |
+
padding-top: 1.25rem;
|
| 362 |
+
border-top: 1px solid rgba(255, 255, 255, 0.05);
|
| 363 |
+
font-size: 0.8125rem;
|
| 364 |
+
color: var(--text-secondary);
|
| 365 |
+
}
|
| 366 |
+
|
| 367 |
+
/* History Tray */
|
| 368 |
+
.history-tray {
|
| 369 |
+
flex: 1;
|
| 370 |
+
/* Limit growth compared to pipeline */
|
| 371 |
+
min-height: 150px;
|
| 372 |
+
background: rgba(30, 41, 59, 0.5);
|
| 373 |
+
backdrop-filter: blur(8px);
|
| 374 |
+
border-radius: 16px;
|
| 375 |
+
padding: 1.5rem;
|
| 376 |
+
border: 1px solid rgba(255, 255, 255, 0.08);
|
| 377 |
+
display: flex;
|
| 378 |
+
flex-direction: column;
|
| 379 |
+
}
|
| 380 |
+
|
| 381 |
+
.history-tray h4 {
|
| 382 |
+
font-size: 0.875rem;
|
| 383 |
+
font-weight: 600;
|
| 384 |
+
margin-bottom: 1rem;
|
| 385 |
+
color: var(--text-secondary);
|
| 386 |
+
}
|
| 387 |
+
|
| 388 |
+
.history-list {
|
| 389 |
+
display: flex;
|
| 390 |
+
flex-direction: column;
|
| 391 |
+
gap: 0.75rem;
|
| 392 |
+
max-height: 300px;
|
| 393 |
+
overflow-y: auto;
|
| 394 |
+
}
|
| 395 |
+
|
| 396 |
+
.history-item {
|
| 397 |
+
font-size: 0.875rem;
|
| 398 |
+
display: flex;
|
| 399 |
+
gap: 0.5rem;
|
| 400 |
+
}
|
| 401 |
+
|
| 402 |
+
.h-role {
|
| 403 |
+
font-weight: 600;
|
| 404 |
+
min-width: 40px;
|
| 405 |
+
}
|
| 406 |
+
|
| 407 |
+
.user .h-role {
|
| 408 |
+
color: #60a5fa;
|
| 409 |
+
}
|
| 410 |
+
|
| 411 |
+
.assistant .h-role {
|
| 412 |
+
color: #a78bfa;
|
| 413 |
+
}
|
| 414 |
+
|
| 415 |
+
.no-history {
|
| 416 |
+
font-size: 0.8125rem;
|
| 417 |
+
color: var(--text-secondary);
|
| 418 |
+
font-style: italic;
|
| 419 |
+
}
|
| 420 |
+
|
| 421 |
+
/* Switch Toggle */
|
| 422 |
+
.switch {
|
| 423 |
+
position: relative;
|
| 424 |
+
display: inline-block;
|
| 425 |
+
width: 34px;
|
| 426 |
+
height: 20px;
|
| 427 |
+
}
|
| 428 |
+
|
| 429 |
+
.switch input {
|
| 430 |
+
opacity: 0;
|
| 431 |
+
width: 0;
|
| 432 |
+
height: 0;
|
| 433 |
+
}
|
| 434 |
+
|
| 435 |
+
.slider {
|
| 436 |
+
position: absolute;
|
| 437 |
+
cursor: pointer;
|
| 438 |
+
top: 0;
|
| 439 |
+
left: 0;
|
| 440 |
+
right: 0;
|
| 441 |
+
bottom: 0;
|
| 442 |
+
background-color: #334155;
|
| 443 |
+
transition: .4s;
|
| 444 |
+
}
|
| 445 |
+
|
| 446 |
+
.slider:before {
|
| 447 |
+
position: absolute;
|
| 448 |
+
content: "";
|
| 449 |
+
height: 14px;
|
| 450 |
+
width: 14px;
|
| 451 |
+
left: 3px;
|
| 452 |
+
bottom: 3px;
|
| 453 |
+
background-color: white;
|
| 454 |
+
transition: .4s;
|
| 455 |
+
}
|
| 456 |
+
|
| 457 |
+
input:checked+.slider {
|
| 458 |
+
background-color: var(--success-color);
|
| 459 |
+
}
|
| 460 |
+
|
| 461 |
+
input:checked+.slider:before {
|
| 462 |
+
transform: translateX(14px);
|
| 463 |
+
}
|
| 464 |
+
|
| 465 |
+
.slider.round {
|
| 466 |
+
border-radius: 34px;
|
| 467 |
+
}
|
| 468 |
+
|
| 469 |
+
.slider.round:before {
|
| 470 |
+
border-radius: 50%;
|
| 471 |
+
}
|
| 472 |
+
|
| 473 |
+
/* Animations */
|
| 474 |
+
@keyframes pulse-red {
|
| 475 |
+
0% {
|
| 476 |
+
box-shadow: 0 0 0 0 rgba(239, 68, 68, 0.4);
|
| 477 |
+
}
|
| 478 |
+
|
| 479 |
+
70% {
|
| 480 |
+
box-shadow: 0 0 0 10px rgba(239, 68, 68, 0);
|
| 481 |
+
}
|
| 482 |
+
|
| 483 |
+
100% {
|
| 484 |
+
box-shadow: 0 0 0 0 rgba(239, 68, 68, 0);
|
| 485 |
+
}
|
| 486 |
+
}
|
| 487 |
+
|
| 488 |
+
@keyframes rotate {
|
| 489 |
+
from {
|
| 490 |
+
transform: rotate(0deg);
|
| 491 |
+
}
|
| 492 |
+
|
| 493 |
+
to {
|
| 494 |
+
transform: rotate(360deg);
|
| 495 |
+
}
|
| 496 |
+
}
|
| 497 |
+
|
| 498 |
+
@keyframes bounce {
|
| 499 |
+
|
| 500 |
+
0%,
|
| 501 |
+
100% {
|
| 502 |
+
transform: translateY(0);
|
| 503 |
+
}
|
| 504 |
+
|
| 505 |
+
50% {
|
| 506 |
+
transform: translateY(-10px);
|
| 507 |
+
}
|
| 508 |
+
}
|
| 509 |
+
|
| 510 |
+
@keyframes fade-in {
|
| 511 |
+
from {
|
| 512 |
+
opacity: 0;
|
| 513 |
+
transform: translateY(5px);
|
| 514 |
+
}
|
| 515 |
+
|
| 516 |
+
to {
|
| 517 |
+
opacity: 1;
|
| 518 |
+
transform: translateY(0);
|
| 519 |
+
}
|
| 520 |
+
}
|
| 521 |
+
|
| 522 |
+
/* Settings Overlay */
|
| 523 |
+
.pipeline-settings-overlay {
|
| 524 |
+
position: fixed;
|
| 525 |
+
top: 0;
|
| 526 |
+
left: 0;
|
| 527 |
+
right: 0;
|
| 528 |
+
bottom: 0;
|
| 529 |
+
background: rgba(15, 23, 42, 0.8);
|
| 530 |
+
backdrop-filter: blur(8px);
|
| 531 |
+
display: flex;
|
| 532 |
+
align-items: center;
|
| 533 |
+
justify-content: center;
|
| 534 |
+
z-index: 1000;
|
| 535 |
+
animation: fade-in 0.2s ease-out;
|
| 536 |
+
}
|
| 537 |
+
|
| 538 |
+
.settings-card {
|
| 539 |
+
background: var(--card-bg);
|
| 540 |
+
padding: 2.5rem;
|
| 541 |
+
border-radius: 20px;
|
| 542 |
+
width: 100%;
|
| 543 |
+
max-width: 650px;
|
| 544 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
| 545 |
+
box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.5);
|
| 546 |
+
}
|
| 547 |
+
|
| 548 |
+
.settings-card h3 {
|
| 549 |
+
margin-bottom: 1.5rem;
|
| 550 |
+
font-size: 1.25rem;
|
| 551 |
+
}
|
| 552 |
+
|
| 553 |
+
.setting-item {
|
| 554 |
+
display: flex;
|
| 555 |
+
flex-direction: column;
|
| 556 |
+
gap: 0.75rem;
|
| 557 |
+
margin-bottom: 2rem;
|
| 558 |
+
}
|
| 559 |
+
|
| 560 |
+
.setting-item label {
|
| 561 |
+
font-size: 0.875rem;
|
| 562 |
+
color: var(--text-secondary);
|
| 563 |
+
font-weight: 500;
|
| 564 |
+
}
|
| 565 |
+
|
| 566 |
+
.status-badge {
|
| 567 |
+
padding: 0.75rem 1rem;
|
| 568 |
+
background: rgba(0, 0, 0, 0.2);
|
| 569 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
| 570 |
+
border-radius: 8px;
|
| 571 |
+
font-size: 0.875rem;
|
| 572 |
+
display: flex;
|
| 573 |
+
align-items: center;
|
| 574 |
+
gap: 0.5rem;
|
| 575 |
+
}
|
| 576 |
+
|
| 577 |
+
.setting-item textarea {
|
| 578 |
+
width: 100%;
|
| 579 |
+
padding: 0.75rem 1rem;
|
| 580 |
+
background: rgba(0, 0, 0, 0.3);
|
| 581 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
| 582 |
+
border-radius: 8px;
|
| 583 |
+
color: white;
|
| 584 |
+
font-size: 0.9rem;
|
| 585 |
+
resize: none;
|
| 586 |
+
line-height: 1.5;
|
| 587 |
+
outline: none;
|
| 588 |
+
transition: border-color 0.2s;
|
| 589 |
+
}
|
| 590 |
+
|
| 591 |
+
.setting-item textarea:focus {
|
| 592 |
+
border-color: var(--accent-color);
|
| 593 |
+
}
|
| 594 |
+
|
| 595 |
+
.prompt-presets {
|
| 596 |
+
margin-top: 1.5rem;
|
| 597 |
+
margin-bottom: 1.5rem;
|
| 598 |
+
}
|
| 599 |
+
|
| 600 |
+
.prompt-presets label {
|
| 601 |
+
display: block;
|
| 602 |
+
font-size: 0.75rem;
|
| 603 |
+
text-transform: uppercase;
|
| 604 |
+
font-weight: 700;
|
| 605 |
+
color: var(--text-secondary);
|
| 606 |
+
margin-bottom: 0.75rem;
|
| 607 |
+
}
|
| 608 |
+
|
| 609 |
+
.preset-btns {
|
| 610 |
+
display: flex;
|
| 611 |
+
flex-wrap: wrap;
|
| 612 |
+
gap: 0.5rem;
|
| 613 |
+
}
|
| 614 |
+
|
| 615 |
+
.preset-btn {
|
| 616 |
+
background: rgba(255, 255, 255, 0.05);
|
| 617 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
| 618 |
+
color: var(--text-secondary);
|
| 619 |
+
padding: 0.5rem 0.75rem;
|
| 620 |
+
border-radius: 6px;
|
| 621 |
+
font-size: 0.8rem;
|
| 622 |
+
cursor: pointer;
|
| 623 |
+
transition: all 0.2s;
|
| 624 |
+
}
|
| 625 |
+
|
| 626 |
+
.preset-btn:hover {
|
| 627 |
+
background: rgba(255, 255, 255, 0.1);
|
| 628 |
+
color: white;
|
| 629 |
+
border-color: var(--accent-color);
|
| 630 |
+
}
|
| 631 |
+
|
| 632 |
+
.hint {
|
| 633 |
+
font-size: 0.75rem;
|
| 634 |
+
color: var(--text-secondary);
|
| 635 |
+
font-style: italic;
|
| 636 |
+
}
|
| 637 |
+
|
| 638 |
+
.close-settings {
|
| 639 |
+
width: 100%;
|
| 640 |
+
padding: 1rem;
|
| 641 |
+
background: var(--accent-color);
|
| 642 |
+
border: none;
|
| 643 |
+
border-radius: 10px;
|
| 644 |
+
color: white;
|
| 645 |
+
font-weight: 600;
|
| 646 |
+
cursor: pointer;
|
| 647 |
+
transition: all 0.2s ease;
|
| 648 |
+
}
|
| 649 |
+
|
| 650 |
+
.close-settings:hover {
|
| 651 |
+
background: var(--accent-hover);
|
| 652 |
+
transform: translateY(-2px);
|
| 653 |
+
}
|
web_demo/src/components/SttLlmTts.jsx
ADDED
@@ -0,0 +1,505 @@
import React, { useState, useEffect, useRef } from 'react';
import { Mic, MicOff, MessageSquare, Volume2, Loader2, Send, RotateCcw, Settings } from 'lucide-react';
import './SttLlmTts.css';

function SttLlmTts() {
  const [isRecording, setIsRecording] = useState(false);
  const [sttText, setSttText] = useState('');
  const [interimStt, setInterimStt] = useState('');
  const [llmResponse, setLlmResponse] = useState('');
  const [isLlmLoading, setIsLlmLoading] = useState(false);
  const [ttsStatus, setTtsStatus] = useState('Idle');
  const [history, setHistory] = useState(() => {
    const saved = localStorage.getItem('nv_history');
    return saved ? JSON.parse(saved) : [];
  });
  const [error, setError] = useState('');
  const [autoMode, setAutoMode] = useState(true);
  const [apiKey, setApiKey] = useState(import.meta.env.VITE_OPENAI_API_KEY || ''); // Load from .env if available
  const [showSettings, setShowSettings] = useState(false);
  const [voices, setVoices] = useState([]);
  const [selectedVoiceURI, setSelectedVoiceURI] = useState('');
  const [systemPrompt, setSystemPrompt] = useState(() => {
    return localStorage.getItem('nv_system_prompt') || 'You are a professional Health Insurance Seller. Start by greeting the user and asking if they want a plan for themselves or their family. Keep answers brief.';
  });

  const recognitionRef = useRef(null);
  const synthRef = useRef(window.speechSynthesis);
  const scrollRef = useRef(null);
  const isBusyRef = useRef(false);
  const autoModeRef = useRef(true);
  const isRecordingRef = useRef(false); // New: Track recording state for handlers
  const isMicActiveRef = useRef(false); // New: Track hardware status to prevent lock-up
  const silenceTimerRef = useRef(null); // Ref for auto-processing on silence
  const [isMicActuallyListening, setIsMicActuallyListening] = useState(false);

  // Persistent Storage
  useEffect(() => {
    localStorage.setItem('nv_history', JSON.stringify(history));
  }, [history]);

  useEffect(() => {
    localStorage.setItem('nv_system_prompt', systemPrompt);
  }, [systemPrompt]);

  // Auto-scroll to bottom
  useEffect(() => {
    if (scrollRef.current) {
      scrollRef.current.scrollTop = scrollRef.current.scrollHeight;
    }
  }, [history, sttText, interimStt, llmResponse]);

  // Load voices
  useEffect(() => {
    const loadVoices = () => {
      const availableVoices = synthRef.current.getVoices();
      setVoices(availableVoices);

      // Default to Indian English if not already set
      if (!selectedVoiceURI && availableVoices.length > 0) {
        const indianVoice = availableVoices.find(v => v.lang === 'en-IN' || v.name.includes('India'));
        const defaultVoice = indianVoice || availableVoices[0];
        setSelectedVoiceURI(defaultVoice.voiceURI || defaultVoice.name);
      }
    };

    loadVoices();
    if (synthRef.current.onvoiceschanged !== undefined) {
      synthRef.current.onvoiceschanged = loadVoices;
    }
  }, [selectedVoiceURI]);

  // Initialize STT
  useEffect(() => {
    if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
      const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
      recognitionRef.current = new SpeechRecognition();
      recognitionRef.current.continuous = false; // Stop when the user pauses
      recognitionRef.current.interimResults = true;
      recognitionRef.current.lang = 'en-IN'; // Indian English as default

      recognitionRef.current.onresult = (event) => {
        let interim = '';
        let final = '';

        for (let i = event.resultIndex; i < event.results.length; i++) {
          const piece = event.results[i][0].transcript;
          if (event.results[i].isFinal) {
            final += piece;
          } else {
            interim += piece;
          }
        }

        if (final) {
          clearTimeout(silenceTimerRef.current);
          handleFinalStt(final);
        } else {
          setInterimStt(interim);

          // AUTO-STOP: If we have interim text but no final for 1.5s, process it anyway
          if (interim.trim()) {
            clearTimeout(silenceTimerRef.current);
            silenceTimerRef.current = setTimeout(() => {
              handleFinalStt(interim);
              if (recognitionRef.current) recognitionRef.current.stop();
            }, 1500);
          }
        }
      };

      recognitionRef.current.onerror = (event) => {
        if (event.error !== 'no-speech') {
          setError(`STT Error: ${event.error}`);
        }
      };

      recognitionRef.current.onstart = () => {
        isMicActiveRef.current = true;
        setIsMicActuallyListening(true);
      };

      recognitionRef.current.onend = () => {
        isMicActiveRef.current = false;
        setIsMicActuallyListening(false);

        // Hardware cooldown: Wait 300ms before attempting to restart to avoid hardware lock
        setTimeout(() => {
          if (isRecordingRef.current && !isBusyRef.current && !isMicActiveRef.current) {
            try {
              recognitionRef.current.start();
            } catch (e) {
              console.log("Mic restart safe-check:", e.message);
            }
          }
        }, 300);
      };
    } else {
      setError('Speech recognition not supported in this browser.');
    }

    return () => {
      if (recognitionRef.current) recognitionRef.current.stop();
      synthRef.current.cancel();
    };
  }, [isRecording]);

  const handleFinalStt = (text) => {
    if (!text.trim() || isBusyRef.current) return;

    clearTimeout(silenceTimerRef.current);
    setSttText(text); // Clear previous and show new turn
    setInterimStt('');

    // STEP 1: Lock the turn
    isBusyRef.current = true;

    if (recognitionRef.current) {
      try { recognitionRef.current.stop(); } catch (e) { }
    }
    processLlm(text);
  };

  const processLlm = async (text) => {
    setIsLlmLoading(true);
    setLlmResponse('Thinking...');

    try {
      let responseText = '';

      if (apiKey) {
        // REAL AI CALL
        const response = await fetch('https://api.openai.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${apiKey}`
          },
          body: JSON.stringify({
            model: 'gpt-4o-mini',
            messages: [
              { role: 'system', content: systemPrompt },
              ...history.slice(-12),
              { role: 'user', content: text }
            ],
            temperature: 0.7,
            max_tokens: 100
          })
        });

        const data = await response.json();
        if (data.error) throw new Error(data.error.message);
        responseText = data.choices[0].message.content;
      } else {
        // FALLBACK SMART MOCK (for when no key is present)
        await new Promise(r => setTimeout(r, 1000));
        responseText = generateFallbackResponse(text);
      }

      setLlmResponse(responseText);
      setHistory(prev => [...prev, { role: 'user', content: text }, { role: 'assistant', content: responseText }]);

      if (autoModeRef.current) {
        speakText(responseText);
      } else {
        isBusyRef.current = false;
        // Re-awaken mic if auto-play is off
        setTimeout(() => {
          if (isRecordingRef.current && !isMicActiveRef.current) {
            try { recognitionRef.current.start(); } catch (e) { }
          }
        }, 300);
      }
    } catch (err) {
      setError(`LLM Error: ${err.message}`);
      setIsLlmLoading(false);
      isBusyRef.current = false;
      // Re-awaken mic on error
      setTimeout(() => {
        if (isRecordingRef.current && !isMicActiveRef.current) {
          try { recognitionRef.current.start(); } catch (e) { }
        }
      }, 300);
    } finally {
      setIsLlmLoading(false);
    }
  };

  const generateFallbackResponse = (input) => {
    const text = input.toLowerCase();
    if (text.includes('prime minister') && text.includes('india')) return "As of January 2026, Narendra Modi is the Prime Minister of India.";
    if (text.includes('hello') || text.includes('hi')) return "Hello! How can I help you today?";
    if (text.includes('time')) return `The current time is ${new Date().toLocaleTimeString()}.`;
    return `I processed your request: "${input}". For real answers, please add your OpenAI API Key in settings.`;
  };

  const speakText = (text) => {
    synthRef.current.cancel();
    const utterance = new SpeechSynthesisUtterance(text);

    // ALWAYS find the fresh voice object from the system right before speaking
    const currentVoices = synthRef.current.getVoices();
    const voice = currentVoices.find(v => (v.voiceURI || v.name) === selectedVoiceURI);

    if (voice) {
      utterance.voice = voice;
    }

    utterance.onstart = () => {
      setTtsStatus('Playing...');
      // Mic is already stopped by handleFinalStt, but we ensure busy state remains
      isBusyRef.current = true;
    };

    utterance.onend = () => {
      setTtsStatus('Finished');
      isBusyRef.current = false;
      // STEP 3: Clear busy state and resume mic with cooldown
      setTimeout(() => {
        if (isRecordingRef.current && !isMicActiveRef.current) {
          try { recognitionRef.current.start(); } catch (e) { }
        }
      }, 300);
    };

    utterance.onerror = () => {
      setTtsStatus('Error');
      isBusyRef.current = false;
      setTimeout(() => {
        if (isRecordingRef.current && !isMicActiveRef.current) {
          try { recognitionRef.current.start(); } catch (e) { }
        }
      }, 300);
    };

    synthRef.current.speak(utterance);
  };

  const toggleRecording = () => {
    if (isRecording) {
      isRecordingRef.current = false;
      isBusyRef.current = false;
      try { recognitionRef.current.stop(); } catch (e) { }
      setIsRecording(false);
    } else {
      setSttText('');
      setInterimStt('');
      setError('');
      isRecordingRef.current = true;
      isBusyRef.current = false;
      try { recognitionRef.current.start(); } catch (e) { }
      setIsRecording(true);
    }
  };

  const resetAll = () => {
    setHistory([]);
    setSttText('');
    setInterimStt('');
    setLlmResponse('');
    setTtsStatus('Idle');
    synthRef.current.cancel();
  };

  return (
    <div className="stt-llm-tts-test">
      <div className="test-header">
        <div className="title-group">
          <h2>STT → LLM → TTS Pipeline</h2>
          <p className="subtitle">Full Loop: Voice In, AI Processing, Voice Out</p>
        </div>
        <div className="action-buttons">
          <button className="settings-btn" onClick={() => setShowSettings(!showSettings)} title="AI Configuration">
            <Settings size={18} />
          </button>
          <button className={`record-toggle ${isRecording ? 'recording' : ''}`} onClick={toggleRecording}>
            {isRecording ? <MicOff size={20} /> : <Mic size={20} />}
            <span>{isRecording ? 'Stop Session' : 'Start Session'}</span>
          </button>
          <button className="reset-session" onClick={resetAll}>
            <RotateCcw size={18} />
          </button>
        </div>
      </div>

      {showSettings && (
        <div className="pipeline-settings-overlay">
          <div className="settings-card">
            <h3>AI Agent Configuration</h3>
            <div className="setting-item">
              <label>Agent Personality (System Prompt)</label>
              <textarea
                value={systemPrompt}
                onChange={(e) => setSystemPrompt(e.target.value)}
                rows={4}
                placeholder="Example: You are a professional doctor assistant..."
              />
              <p className="hint">This tells the AI how to behave and what to ask.</p>
            </div>

            <div className="prompt-presets">
              <label>Quick Presets</label>
              <div className="preset-btns">
                <button className="preset-btn" onClick={() => setSystemPrompt('You are a concise voice assistant. Give short answers (max 20 words).')}>
                  General Assistant
                </button>
                <button className="preset-btn" onClick={() => setSystemPrompt(`You are a highly professional Health Insurance Sales Agent.
Follow this EXACT conversation flow:
1. Greet the user and ask if they are looking for a plan for themselves or their family.
2. Once they answer, ask for the ages of the people to be insured.
3. Next, ask if anyone has any pre-existing medical conditions (Yes/No).
4. Finally, ask for their preferred annual budget.

Rules:
- Ask only ONE question at a time.
- Keep your responses under 20 words.
- Be polite, empathetic, and professional.
- If they say something unrelated, steer them back to the last question.`)}>
                  Health Insurance Agent (Structured)
                </button>
                <button className="preset-btn" onClick={() => setSystemPrompt('You are a helpful travel agent. Ask the user about their favorite destination. Keep answers short.')}>
                  Travel Agent
                </button>
              </div>
            </div>

            <button className="close-settings" onClick={() => setShowSettings(false)}>Save & Close</button>
          </div>
        </div>
      )}

      <div className="pipeline-columns">
        {/* Column 1: STT */}
        <div className="pipeline-col">
          <div className="col-header stt">
            <div className="title-with-model">
              <Mic size={18} />
              <h3>1. Speech-to-Text</h3>
            </div>
            <span className="model-tag">Web Speech API / Vosk</span>
          </div>
          <div className="col-content" ref={scrollRef}>
            <div className="text-display stt-display">
              {sttText && <p className="final-text">{sttText}</p>}
              {interimStt && <p className="interim-text">{interimStt}</p>}
              {!sttText && !interimStt && (
                <p className="empty-msg">Speak into your mic to start...</p>
              )}
            </div>
            {isMicActuallyListening && (
|
| 390 |
+
<div className="mic-muted-status listening">
|
| 391 |
+
<div className="pulse-dot"></div> Listening...
|
| 392 |
+
</div>
|
| 393 |
+
)}
|
| 394 |
+
{isRecording && isBusyRef.current && (
|
| 395 |
+
<div className="mic-muted-status processing">
|
| 396 |
+
<Volume2 size={14} /> AI is processing/speaking...
|
| 397 |
+
</div>
|
| 398 |
+
)}
|
| 399 |
+
</div>
|
| 400 |
+
</div>
|
| 401 |
+
|
| 402 |
+
{/* Column 2: LLM */}
|
| 403 |
+
<div className="pipeline-col">
|
| 404 |
+
<div className="col-header llm">
|
| 405 |
+
<div className="title-with-model">
|
| 406 |
+
<MessageSquare size={18} />
|
| 407 |
+
<h3>2. LLM Processing</h3>
|
| 408 |
+
</div>
|
| 409 |
+
<span className="model-tag">GPT-4o-mini</span>
|
| 410 |
+
</div>
|
| 411 |
+
<div className="col-content">
|
| 412 |
+
<div className="text-display llm-display">
|
| 413 |
+
{isLlmLoading ? (
|
| 414 |
+
<div className="loading-state">
|
| 415 |
+
<Loader2 className="spinner" size={32} />
|
| 416 |
+
<p>Processing...</p>
|
| 417 |
+
</div>
|
| 418 |
+
) : llmResponse ? (
|
| 419 |
+
<div className="response-box">
|
| 420 |
+
<p className="response-text">{llmResponse}</p>
|
| 421 |
+
</div>
|
| 422 |
+
) : (
|
| 423 |
+
<p className="empty-msg">Waiting for STT input...</p>
|
| 424 |
+
)}
|
| 425 |
+
</div>
|
| 426 |
+
</div>
|
| 427 |
+
</div>
|
| 428 |
+
|
| 429 |
+
{/* Column 3: TTS */}
|
| 430 |
+
<div className="pipeline-col">
|
| 431 |
+
<div className="col-header tts">
|
| 432 |
+
<div className="title-with-model">
|
| 433 |
+
<Volume2 size={18} />
|
| 434 |
+
<h3>3. Text-to-Speech</h3>
|
| 435 |
+
</div>
|
| 436 |
+
<span className="model-tag">Web Speech API / Piper</span>
|
| 437 |
+
</div>
|
| 438 |
+
<div className="col-content">
|
| 439 |
+
<div className="tts-status">
|
| 440 |
+
<div className={`status-indicator ${ttsStatus.toLowerCase().replace('...', '')}`}>
|
| 441 |
+
<Volume2 size={48} className={ttsStatus === 'Playing...' ? 'bouncing' : ''} />
|
| 442 |
+
<p>{ttsStatus}</p>
|
| 443 |
+
</div>
|
| 444 |
+
{llmResponse && !isLlmLoading && (
|
| 445 |
+
<button className="replay-btn" onClick={() => speakText(llmResponse)}>
|
| 446 |
+
<Volume2 size={16} /> Replay
|
| 447 |
+
</button>
|
| 448 |
+
)}
|
| 449 |
+
</div>
|
| 450 |
+
|
| 451 |
+
<div className="voice-selection-compact">
|
| 452 |
+
<label>Voice Output</label>
|
| 453 |
+
<select
|
| 454 |
+
value={selectedVoiceURI}
|
| 455 |
+
onChange={(e) => setSelectedVoiceURI(e.target.value)}
|
| 456 |
+
>
|
| 457 |
+
{voices.map(v => (
|
| 458 |
+
<option key={v.voiceURI || v.name} value={v.voiceURI || v.name}>
|
| 459 |
+
{v.name} ({v.lang})
|
| 460 |
+
</option>
|
| 461 |
+
))}
|
| 462 |
+
</select>
|
| 463 |
+
</div>
|
| 464 |
+
|
| 465 |
+
<div className="auto-toggle">
|
| 466 |
+
<label className="switch">
|
| 467 |
+
<input
|
| 468 |
+
type="checkbox"
|
| 469 |
+
checked={autoMode}
|
| 470 |
+
onChange={(e) => {
|
| 471 |
+
const val = e.target.checked;
|
| 472 |
+
setAutoMode(val);
|
| 473 |
+
autoModeRef.current = val;
|
| 474 |
+
}}
|
| 475 |
+
/>
|
| 476 |
+
<span className="slider round"></span>
|
| 477 |
+
</label>
|
| 478 |
+
<span>Auto-play TTS</span>
|
| 479 |
+
</div>
|
| 480 |
+
</div>
|
| 481 |
+
</div>
|
| 482 |
+
</div>
|
| 483 |
+
|
| 484 |
+
{error && <div className="pipeline-error">{error}</div>}
|
| 485 |
+
|
| 486 |
+
<div className="history-tray">
|
| 487 |
+
<h4>Recent Interactions</h4>
|
| 488 |
+
<div className="history-list">
|
| 489 |
+
{history.length === 0 ? (
|
| 490 |
+
<p className="no-history">No history yet</p>
|
| 491 |
+
) : (
|
| 492 |
+
history.map((h, i) => (
|
| 493 |
+
<div key={i} className={`history-item ${h.role}`}>
|
| 494 |
+
<span className="h-role">{h.role === 'user' ? 'You' : 'AI'}:</span>
|
| 495 |
+
<span className="h-text">{h.content}</span>
|
| 496 |
+
</div>
|
| 497 |
+
))
|
| 498 |
+
)}
|
| 499 |
+
</div>
|
| 500 |
+
</div>
|
| 501 |
+
</div>
|
| 502 |
+
);
|
| 503 |
+
}
|
| 504 |
+
|
| 505 |
+
export default SttLlmTts;
|
web_demo/src/components/TextToSpeech.css
ADDED
@@ -0,0 +1,321 @@
.text-to-speech {
  display: flex;
  flex-direction: column;
  gap: 1.5rem;
  padding: 0;
  height: 100%;
  width: 100%;
}

.tts-header {
  text-align: center;
}

.tts-header h2 {
  font-size: 1.5rem;
  font-weight: 600;
  margin-bottom: 0.5rem;
  color: var(--text-primary);
}

/* Text Input Section */
.text-input-section {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
}

.input-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
}

.input-header label {
  font-weight: 500;
  color: var(--text-primary);
  font-size: 0.875rem;
}

.char-count {
  font-size: 0.75rem;
  color: var(--text-secondary);
}

.text-input {
  width: 100%;
  padding: 1rem;
  background: rgba(255, 255, 255, 0.05);
  border: 1px solid rgba(255, 255, 255, 0.1);
  border-radius: 8px;
  color: var(--text-primary);
  font-size: 0.875rem;
  font-family: inherit;
  resize: vertical;
  transition: all 0.2s ease;
}

.text-input:focus {
  outline: none;
  border-color: var(--accent-color);
  background: rgba(255, 255, 255, 0.08);
  box-shadow: 0 0 0 3px rgba(59, 130, 246, 0.1);
}

.text-input::placeholder {
  color: var(--text-secondary);
}

/* Sample Texts */
.sample-texts {
  background: rgba(255, 255, 255, 0.03);
  border-radius: 8px;
  padding: 1rem;
}

.sample-label {
  font-size: 0.75rem;
  color: var(--text-secondary);
  margin-bottom: 0.5rem;
  font-weight: 500;
}

.sample-buttons {
  display: flex;
  gap: 0.5rem;
  flex-wrap: wrap;
}

.sample-btn {
  padding: 0.5rem 0.75rem;
  background: rgba(255, 255, 255, 0.1);
  border: 1px solid rgba(255, 255, 255, 0.1);
  border-radius: 6px;
  color: var(--text-secondary);
  font-size: 0.75rem;
  cursor: pointer;
  transition: all 0.2s ease;
  font-family: inherit;
}

.sample-btn:hover {
  background: rgba(255, 255, 255, 0.15);
  color: var(--text-primary);
  border-color: var(--accent-color);
}

/* Voice Selector */
.voice-selector {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
}

.voice-selector label {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  font-weight: 500;
  color: var(--text-primary);
  font-size: 0.875rem;
}

.voice-select {
  width: 100%;
  padding: 0.75rem;
  background: rgba(255, 255, 255, 0.05);
  border: 1px solid rgba(255, 255, 255, 0.1);
  border-radius: 8px;
  color: var(--text-primary);
  font-size: 0.875rem;
  cursor: pointer;
  font-family: inherit;
  transition: all 0.2s ease;
}

.voice-select:focus {
  outline: none;
  border-color: var(--accent-color);
  box-shadow: 0 0 0 3px rgba(59, 130, 246, 0.1);
}

.voice-select option {
  background: var(--card-bg);
  color: var(--text-primary);
}

/* Settings Section */
.settings-section {
  background: rgba(255, 255, 255, 0.03);
  border-radius: 8px;
  overflow: hidden;
}

.settings-toggle {
  width: 100%;
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 0.5rem;
  padding: 0.75rem;
  background: transparent;
  border: none;
  color: var(--text-secondary);
  font-size: 0.875rem;
  cursor: pointer;
  transition: all 0.2s ease;
  font-family: inherit;
}

.settings-toggle:hover {
  background: rgba(255, 255, 255, 0.05);
  color: var(--text-primary);
}

.settings-panel {
  padding: 1rem;
  display: flex;
  flex-direction: column;
  gap: 1.5rem;
  border-top: 1px solid rgba(255, 255, 255, 0.1);
}

.setting-control {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
}

.setting-control label {
  font-size: 0.875rem;
  font-weight: 500;
  color: var(--text-primary);
}

.slider {
  width: 100%;
  height: 6px;
  border-radius: 3px;
  background: rgba(255, 255, 255, 0.1);
  outline: none;
  -webkit-appearance: none;
  appearance: none;
}

.slider::-webkit-slider-thumb {
  -webkit-appearance: none;
  appearance: none;
  width: 18px;
  height: 18px;
  border-radius: 50%;
  background: var(--accent-color);
  cursor: pointer;
  transition: all 0.2s ease;
}

.slider::-webkit-slider-thumb:hover {
  background: #2563eb;
  transform: scale(1.1);
}

.slider::-moz-range-thumb {
  width: 18px;
  height: 18px;
  border-radius: 50%;
  background: var(--accent-color);
  cursor: pointer;
  border: none;
  transition: all 0.2s ease;
}

.slider::-moz-range-thumb:hover {
  background: #2563eb;
  transform: scale(1.1);
}

.slider-labels {
  display: flex;
  justify-content: space-between;
  font-size: 0.7rem;
  color: var(--text-secondary);
}

.reset-btn {
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 0.5rem;
  padding: 0.5rem 1rem;
  background: rgba(255, 255, 255, 0.1);
  border: 1px solid rgba(255, 255, 255, 0.1);
  border-radius: 6px;
  color: var(--text-secondary);
  font-size: 0.8125rem;
  cursor: pointer;
  transition: all 0.2s ease;
  font-family: inherit;
  align-self: flex-start;
}

.reset-btn:hover {
  background: rgba(255, 255, 255, 0.15);
  color: var(--text-primary);
  border-color: var(--accent-color);
}

/* Speaking Indicator */
.speaking-indicator {
  display: flex;
  flex-direction: column;
  align-items: center;
  gap: 1rem;
  padding: 1.5rem;
  background: rgba(59, 130, 246, 0.1);
  border: 1px solid rgba(59, 130, 246, 0.3);
  border-radius: 8px;
}

.sound-wave {
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 4px;
  height: 40px;
}

.wave-bar {
  width: 4px;
  height: 10px;
  background: var(--accent-color);
  border-radius: 2px;
  animation: wave 1s ease-in-out infinite;
}

@keyframes wave {
  0%, 100% {
    height: 10px;
  }

  50% {
    height: 30px;
  }
}

.speaking-indicator span {
  color: var(--accent-color);
  font-weight: 500;
  font-size: 0.875rem;
}

/* Responsive */
@media (max-width: 640px) {
  .sample-buttons {
    flex-direction: column;
  }

  .sample-btn {
    width: 100%;
  }
}
web_demo/src/components/TextToSpeech.jsx
ADDED
@@ -0,0 +1,327 @@
import React, { useState, useEffect, useRef } from 'react';
import { Volume2, VolumeX, Play, Pause, RotateCcw, Settings } from 'lucide-react';
import './TextToSpeech.css';

function TextToSpeech() {
  const [text, setText] = useState('');
  const [isSpeaking, setIsSpeaking] = useState(false);
  const [isPaused, setIsPaused] = useState(false);
  const [voices, setVoices] = useState([]);
  const [selectedVoice, setSelectedVoice] = useState(null);
  const [rate, setRate] = useState(1);
  const [pitch, setPitch] = useState(1);
  const [volume, setVolume] = useState(1);
  const [error, setError] = useState('');
  const [showSettings, setShowSettings] = useState(false);

  const synthRef = useRef(window.speechSynthesis);

  // Load available voices
  useEffect(() => {
    const loadVoices = () => {
      const availableVoices = synthRef.current.getVoices();
      setVoices(availableVoices);

      // Prioritize Indian English voices (en-IN)
      const indianVoice = availableVoices.find(voice =>
        voice.lang === 'en-IN' ||
        voice.lang === 'en_IN' ||
        voice.name.toLowerCase().includes('india')
      );

      const defaultVoice = indianVoice ||
        availableVoices.find(voice => voice.lang.startsWith('en')) ||
        availableVoices[0];

      setSelectedVoice(defaultVoice);
    };

    loadVoices();

    // Chrome loads voices asynchronously
    if (synthRef.current.onvoiceschanged !== undefined) {
      synthRef.current.onvoiceschanged = loadVoices;
    }

    return () => {
      synthRef.current.cancel();
    };
  }, []);

  const speak = () => {
    if (!text.trim()) {
      setError('Please enter some text to speak');
      return;
    }

    setError('');
    synthRef.current.cancel(); // Cancel any ongoing speech

    const utterance = new SpeechSynthesisUtterance(text);

    if (selectedVoice) {
      utterance.voice = selectedVoice;
    }

    utterance.rate = rate;
    utterance.pitch = pitch;
    utterance.volume = volume;

    utterance.onstart = () => {
      setIsSpeaking(true);
      setIsPaused(false);
    };

    utterance.onend = () => {
      setIsSpeaking(false);
      setIsPaused(false);
    };

    utterance.onerror = (event) => {
      console.error('Speech synthesis error:', event);
      setError(`Error: ${event.error}`);
      setIsSpeaking(false);
      setIsPaused(false);
    };

    synthRef.current.speak(utterance);
  };

  const pause = () => {
    if (synthRef.current.speaking && !synthRef.current.paused) {
      synthRef.current.pause();
      setIsPaused(true);
    }
  };

  const resume = () => {
    if (synthRef.current.paused) {
      synthRef.current.resume();
      setIsPaused(false);
    }
  };

  const stop = () => {
    synthRef.current.cancel();
    setIsSpeaking(false);
    setIsPaused(false);
  };

  const reset = () => {
    setRate(1);
    setPitch(1);
    setVolume(1);
  };

  const sampleTexts = [
    "Namaste! This is an Indian English voice test. How can I help you today?",
    "Hello! This is a test of the text-to-speech system.",
    "The quick brown fox jumps over the lazy dog.",
    "Welcome to NeuralVoice AI. We're testing the speech synthesis capabilities of your browser.",
  ];

  const loadSampleText = (sample) => {
    setText(sample);
    setError('');
  };

  return (
    <div className="text-to-speech">
      <div className="tts-header">
        <h2>Text-to-Speech Test</h2>
        <p className="subtitle">Enter text and hear it spoken aloud</p>
      </div>

      {error && (
        <div className="error-message">
          <span>⚠️ {error}</span>
        </div>
      )}

      <div className="text-input-section">
        <div className="input-header">
          <label htmlFor="text-input">Enter Text</label>
          <span className="char-count">{text.length} characters</span>
        </div>
        <textarea
          id="text-input"
          className="text-input"
          value={text}
          onChange={(e) => setText(e.target.value)}
          placeholder="Type or paste text here to convert to speech..."
          rows={6}
        />
      </div>

      <div className="sample-texts">
        <p className="sample-label">Quick samples:</p>
        <div className="sample-buttons">
          {sampleTexts.map((sample, index) => (
            <button
              key={index}
              className="sample-btn"
              onClick={() => loadSampleText(sample)}
            >
              Sample {index + 1}
            </button>
          ))}
        </div>
      </div>

      <div className="voice-selector">
        <label htmlFor="voice-select">
          <Settings size={16} />
          Voice
        </label>
        <select
          id="voice-select"
          value={selectedVoice?.name || ''}
          onChange={(e) => {
            const voice = voices.find(v => v.name === e.target.value);
            setSelectedVoice(voice);
          }}
          className="voice-select"
        >
          {voices.map((voice) => (
            <option key={voice.name} value={voice.name}>
              {voice.name} ({voice.lang})
            </option>
          ))}
        </select>
      </div>

      <div className="settings-section">
        <button
          className="settings-toggle"
          onClick={() => setShowSettings(!showSettings)}
        >
          <Settings size={18} />
          <span>{showSettings ? 'Hide' : 'Show'} Advanced Settings</span>
        </button>

        {showSettings && (
          <div className="settings-panel">
            <div className="setting-control">
              <label>
                Speed: {rate.toFixed(1)}x
              </label>
              <input
                type="range"
                min="0.5"
                max="2"
                step="0.1"
                value={rate}
                onChange={(e) => setRate(parseFloat(e.target.value))}
                className="slider"
              />
              <div className="slider-labels">
                <span>Slow</span>
                <span>Normal</span>
                <span>Fast</span>
              </div>
            </div>

            <div className="setting-control">
              <label>
                Pitch: {pitch.toFixed(1)}
              </label>
              <input
                type="range"
                min="0.5"
                max="2"
                step="0.1"
                value={pitch}
                onChange={(e) => setPitch(parseFloat(e.target.value))}
                className="slider"
              />
              <div className="slider-labels">
                <span>Low</span>
                <span>Normal</span>
                <span>High</span>
              </div>
            </div>

            <div className="setting-control">
              <label>
                Volume: {Math.round(volume * 100)}%
              </label>
              <input
                type="range"
                min="0"
                max="1"
                step="0.1"
                value={volume}
                onChange={(e) => setVolume(parseFloat(e.target.value))}
                className="slider"
              />
              <div className="slider-labels">
                <span>Quiet</span>
                <span>Normal</span>
                <span>Loud</span>
              </div>
            </div>

            <button className="reset-btn" onClick={reset}>
              <RotateCcw size={16} />
              Reset to Defaults
            </button>
          </div>
        )}
      </div>

      <div className="controls">
        {!isSpeaking ? (
          <button className="btn btn-primary" onClick={speak}>
            <Play size={20} />
            <span>Speak</span>
          </button>
        ) : (
          <>
            {!isPaused ? (
              <button className="btn btn-warning" onClick={pause}>
                <Pause size={20} />
                <span>Pause</span>
              </button>
            ) : (
              <button className="btn btn-success" onClick={resume}>
                <Play size={20} />
                <span>Resume</span>
              </button>
            )}
            <button className="btn btn-danger" onClick={stop}>
              <VolumeX size={20} />
              <span>Stop</span>
            </button>
          </>
        )}
      </div>

      {isSpeaking && (
        <div className="speaking-indicator">
          <div className="sound-wave">
            {[...Array(5)].map((_, i) => (
              <div
                key={i}
                className="wave-bar"
                style={{ animationDelay: `${i * 0.1}s` }}
              />
            ))}
          </div>
          <span>{isPaused ? 'Paused' : 'Speaking...'}</span>
        </div>
      )}

      <div className="info-box">
        <h4>💡 Tips:</h4>
        <ul>
          <li>Choose different voices to hear various accents and styles</li>
          <li>Adjust speed, pitch, and volume for customized speech</li>
          <li>Works in all modern browsers (Chrome, Firefox, Safari, Edge)</li>
          <li>Try longer texts to test natural speech flow</li>
        </ul>
      </div>
    </div>
  );
}

export default TextToSpeech;
web_demo/src/index.css
ADDED
@@ -0,0 +1,47 @@
:root {
  --bg-primary: #0f172a;
  --bg-secondary: #1e293b;
  --text-primary: #f8fafc;
  --text-secondary: #94a3b8;
  --accent-primary: #3b82f6;
  --accent-hover: #2563eb;
  --accent-glow: rgba(59, 130, 246, 0.5);
  --success: #10b981;
  --error: #ef4444;
  --font-sans: 'Inter', system-ui, -apple-system, sans-serif;
}

body {
  margin: 0;
  padding: 0;
  background-color: var(--bg-primary);
  color: var(--text-primary);
  font-family: var(--font-sans);
  -webkit-font-smoothing: antialiased;
  min-height: 100vh;
}

#root {
  width: 100%;
  min-height: 100vh;
  display: flex;
  flex-direction: column;
}

/* Scrollbar */
::-webkit-scrollbar {
  width: 8px;
}

::-webkit-scrollbar-track {
  background: var(--bg-primary);
}

::-webkit-scrollbar-thumb {
  background: var(--bg-secondary);
  border-radius: 4px;
}

::-webkit-scrollbar-thumb:hover {
  background: #334155;
}
web_demo/src/main.jsx
ADDED
@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App.jsx'

createRoot(document.getElementById('root')).render(
  <StrictMode>
    <App />
  </StrictMode>,
)
web_demo/vite.config.js
ADDED
@@ -0,0 +1,7 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

// https://vite.dev/config/
export default defineConfig({
  plugins: [react()],
})