mozzicato committed · verified
Commit 3dca110 · 1 Parent(s): 9e9b7ad

Upload voc6.py

Files changed (1)
  1. voc6.py +2338 -0
voc6.py ADDED
@@ -0,0 +1,2338 @@
# -*- coding: utf-8 -*-
"""voc6.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/17WecCovbP3TgYvHDyZ4Yckj77r2q5Nam
"""

!pip install langchain langchain-google-genai langchain-core sentence-transformers faiss-cpu numpy gradio

# Cell 1: Install packages
!pip install spitch gradio pydub python-dotenv

# Cell to add FIRST - Your Original WemaRAGSystem
import json
import re
from typing import List, Dict, Tuple, Optional
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from dataclasses import dataclass
import pickle
import os
import io
import tempfile
import atexit
import glob
from spitch import Spitch
import gradio as gr
from google.colab import userdata

# ============================================================================
# Wema Bank Voice-Enabled RAG Chatbot with Spitch Integration - CORRECTED
# ============================================================================


# ============================================================================
# STEP 1: Initialize Spitch Client
# ============================================================================

class SpitchVoiceHandler:
    """
    Handles all voice-related operations using Spitch API.
    Supports multilingual speech-to-text and text-to-speech.
    """

    def __init__(self, api_key: str):
        """
        Initialize Spitch client.

        Args:
            api_key: Your Spitch API key
        """
        self.client = Spitch(api_key=api_key)

    def transcribe_audio(
        self,
        audio_file,
        source_language: str = "en",
        model: str = "mansa_v1"
    ) -> str:
        """
        Transcribe audio to text using Spitch.
        Supports multiple African and international languages.

        Args:
            audio_file: Audio file path or file-like object
            source_language: Language code (e.g., 'en', 'yo', 'ig', 'ha')
            model: Spitch model to use (default: mansa_v1)

        Returns:
            Transcribed text
        """
        try:
            print(f"🎤 Transcribing audio file: {audio_file}")

            # If audio_file is a path, open it
            if isinstance(audio_file, str):
                with open(audio_file, 'rb') as f:
                    response = self.client.speech.transcribe(
                        content=f,
                        language=source_language,
                        model=model
                    )
            else:
                # Assume it's already a file-like object (from Gradio)
                response = self.client.speech.transcribe(
                    content=audio_file,
                    language=source_language,
                    model=model
                )

            print(f"Response type: {type(response)}")

            # ✅ Spitch transcribe returns a response object with .text or json()
            if hasattr(response, 'text') and callable(response.text):
                # It's a method, not an attribute
                transcription_text = response.text()
            elif hasattr(response, 'text'):
                # It's an attribute
                transcription_text = response.text
            elif hasattr(response, 'json'):
                # Try to parse JSON response
                json_data = response.json()
                transcription_text = json_data.get('text', str(json_data))
            else:
                # Try to convert response to string
                transcription_text = str(response)

            print(f"✅ Transcription: {transcription_text}")
            return transcription_text

        except Exception as e:
            print(f"❌ Transcription error: {e}")
            import traceback
            traceback.print_exc()
            return f"Sorry, I couldn't understand the audio. Error: {str(e)}"

    def translate_to_english(self, text: str, source_lang: str = "auto") -> str:
        """
        Translate text to English using Spitch translation API.

        Args:
            text: Text to translate
            source_lang: Source language code or 'auto' for auto-detection

        Returns:
            Translated text in English
        """
        try:
            # If already in English, return as is
            if source_lang == "en":
                return text

            print(f"🌍 Translating from {source_lang} to English...")
            print(f"📝 Original text: {text}")

            translation = self.client.text.translate(
                text=text,
                source=source_lang,
                target="en"
            )

            english_text = translation.text
            print(f"✅ Translated to English: {english_text}")

            return english_text

        except Exception as e:
            print(f"❌ Translation failed: {e}")
            import traceback
            traceback.print_exc()
            return text  # Return original if translation fails

    def synthesize_speech(
        self,
        text: str,
        target_language: str = "en",
        voice: str = "lina"
    ) -> bytes:
        """
        Convert text to speech using Spitch TTS.

        Args:
            text: Text to convert to speech
            target_language: Target language for speech
            voice: Voice to use (e.g., 'lina', 'ada', 'kofi')

        Returns:
            Audio bytes
        """
        try:
            # Call Spitch TTS API
            response = self.client.speech.generate(
                text=text,
                language=target_language,
                voice=voice
            )

            # ✅ FIX: Spitch returns BinaryAPIResponse, use .read() to get bytes
            if hasattr(response, 'read'):
                audio_bytes = response.read()
                print(f"✅ TTS generated {len(audio_bytes)} bytes of audio")
                return audio_bytes
            else:
                print(f"❌ Response type: {type(response)}")
                print(f"❌ Response attributes: {dir(response)}")
                return None

        except Exception as e:
            print(f"❌ TTS error: {e}")
            import traceback
            traceback.print_exc()
            return None

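# --- Usage sketch for SpitchVoiceHandler (illustrative, not part of the app
# flow; assumes a valid Spitch key in Colab secrets and a local question.wav) ---
# handler = SpitchVoiceHandler(api_key=userdata.get('SPITCH_API_KEY'))
# text = handler.transcribe_audio("question.wav", source_language="yo")
# english = handler.translate_to_english(text, source_lang="yo")
# audio = handler.synthesize_speech("E kaabo!", target_language="yo", voice="sade")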
# ============================================================================
# STEP 2: Integrate Voice with Your LangChain RAG System
# ============================================================================

class WemaVoiceAssistant:
    """
    Complete voice-enabled assistant combining Spitch voice I/O
    with your existing Wema RAG system.
    """

    def __init__(
        self,
        rag_system,
        chain,
        spitch_api_key: str
    ):
        """
        Initialize the voice assistant.

        Args:
            rag_system: Your initialized WemaRAGSystem
            chain: Your LangChain RAG chain (already created)
            spitch_api_key: Spitch API key
        """
        self.rag_system = rag_system
        self.voice_handler = SpitchVoiceHandler(spitch_api_key)
        self.chain = chain

    def process_voice_query(
        self,
        audio_input,
        input_language: str = "en",
        output_language: str = "en",
        voice: str = "lina"
    ):
        """
        Complete voice interaction pipeline:
        1. Speech to text (any language)
        2. Translate to English if needed
        3. Query RAG system
        4. Generate response
        5. Translate response if needed
        6. Text to speech

        Args:
            audio_input: Audio file from user
            input_language: User's spoken language
            output_language: Desired response language
            voice: TTS voice to use

        Returns:
            tuple: (response_text, response_audio)
        """
        try:
            # Step 1: Transcribe audio to text
            print(f"Transcribing audio in {input_language}...")
            transcribed_text = self.voice_handler.transcribe_audio(
                audio_input,
                source_language=input_language
            )
            print(f"Transcribed: {transcribed_text}")

            # Step 2: Translate to English if not already
            if input_language != "en":
                print("Translating to English...")
                english_query = self.voice_handler.translate_to_english(
                    transcribed_text,
                    source_lang=input_language
                )
            else:
                english_query = transcribed_text

            print(f"English query: {english_query}")

            # Step 3: Get response from RAG system (in English)
            print("Querying RAG system...")
            response_text = self.chain.invoke({"query": english_query})
            print(f"RAG response: {response_text[:100]}...")

            # Step 4: Translate response if needed
            if output_language != "en":
                print(f"Translating response to {output_language}...")
                translation = self.voice_handler.client.text.translate(
                    text=response_text,
                    source="en",
                    target=output_language
                )
                final_text = translation.text
            else:
                final_text = response_text

            # Step 5: Generate speech
            print("Generating speech...")
            audio_response = self.voice_handler.synthesize_speech(
                final_text,
                target_language=output_language,
                voice=voice
            )

            return final_text, audio_response

        except Exception as e:
            error_msg = f"An error occurred: {str(e)}"
            print(error_msg)
            return error_msg, None

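# --- Pipeline sketch (illustrative; 'rag' and 'chain' are created later in
# this notebook, and SPITCH_API_KEY is assumed to be set in Colab secrets) ---
# assistant = WemaVoiceAssistant(rag, chain, userdata.get('SPITCH_API_KEY'))
# reply_text, reply_audio = assistant.process_voice_query(
#     "question.wav", input_language="ig", output_language="ig", voice="ngozi")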
# ============================================================================
# STEP 3: Helper Functions for Audio File Management
# ============================================================================

def save_audio_to_temp_file(audio_bytes):
    """Save audio bytes to a temporary file and return the path."""
    if audio_bytes is None:
        return None

    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp3')
    temp_file.write(audio_bytes)
    temp_file.close()

    return temp_file.name


def cleanup_temp_audio_files():
    """Clean up temporary audio files on exit."""
    temp_dir = tempfile.gettempdir()
    for temp_file in glob.glob(os.path.join(temp_dir, "tmp*.mp3")):
        try:
            os.remove(temp_file)
        except OSError:
            # File may already be gone or still in use; ignore either way.
            pass


# Register cleanup function to run on exit
atexit.register(cleanup_temp_audio_files)

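# Round-trip smoke test for the helpers above (runs locally, no API needed;
# the three bytes below are just a placeholder MP3 header, not real audio):
_demo_path = save_audio_to_temp_file(b"ID3")
assert _demo_path is not None and os.path.exists(_demo_path)
os.remove(_demo_path)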
# ============================================================================
# STEP 4: Create Gradio Interface (With Text AND Voice Options)
# ============================================================================

def create_voice_gradio_interface(
    rag_system,
    chain,
    spitch_api_key: str
):
    """
    Create a Gradio interface with BOTH text and voice input/output capabilities.

    Args:
        rag_system: Your initialized WemaRAGSystem
        chain: Your LangChain RAG chain (already created)
        spitch_api_key: Spitch API key

    Returns:
        Gradio Interface
    """

    # Initialize voice assistant
    assistant = WemaVoiceAssistant(rag_system, chain, spitch_api_key)

    # ✅ CORRECT: Exact voice-language mapping from Spitch documentation
    LANGUAGE_CONFIG = {
        "English": {
            "code": "en",
            "voices": ["john", "lucy", "lina", "jude", "henry", "kani", "kingsley",
                       "favour", "comfort", "daniel", "remi"]
        },
        "Yoruba": {
            "code": "yo",
            "voices": ["sade", "funmi", "segun", "femi"]
        },
        "Igbo": {
            "code": "ig",
            "voices": ["obinna", "ngozi", "amara", "ebuka"]
        },
        "Hausa": {
            "code": "ha",
            "voices": ["hasan", "amina", "zainab", "aliyu"]
        }
    }

    # Extract just language names for dropdowns
    ALL_LANGUAGES = list(LANGUAGE_CONFIG.keys())

    # ✅ FIXED: Only voices that actually exist in Spitch
    # Check Spitch docs for exact voice names
    # (Leftover fallback list - unused here; the per-language lists in
    # LANGUAGE_CONFIG above are what drive the UI.)
    VOICES = ["lina", "ada", "kofi"]  # Verify these exist

    def handle_text_query(text_input):
        """Handle text-only queries."""
        if not text_input or text_input.strip() == "":
            return "Please enter a question.", None

        try:
            response = chain.invoke({"query": text_input})
            return response, None
        except Exception as e:
            return f"Error: {str(e)}", None

    def update_voices(language):
        """Update voice dropdown based on selected language."""
        voices = LANGUAGE_CONFIG.get(language, {}).get("voices", ["lina"])
        return gr.Dropdown(choices=voices, value=voices[0])

    def handle_voice_interaction(audio, input_lang, output_lang, voice):
        """Gradio handler function for voice - FIXED VERSION."""
        print("="*60)
        print("VOICE INTERACTION STARTED")
        print(f"Audio input: {audio}")
        print(f"Input language: {input_lang}")
        print(f"Output language: {output_lang}")
        print(f"Voice: {voice}")
        print("="*60)

        if audio is None:
            return "Please record or upload audio.", None

        # Get language codes and voices
        input_config = LANGUAGE_CONFIG.get(input_lang, LANGUAGE_CONFIG["English"])
        output_config = LANGUAGE_CONFIG.get(output_lang, LANGUAGE_CONFIG["English"])

        input_code = input_config["code"]
        output_code = output_config["code"]

        # Validate voice for output language
        available_voices = output_config["voices"]
        if voice not in available_voices:
            voice = available_voices[0]
            print(f"⚠️ Voice changed to {voice} for {output_lang}")

        try:
            # Process voice query
            print("\n🎤 Processing voice query...")

            # Step 1: Transcribe (supports more languages)
            transcribed_text = assistant.voice_handler.transcribe_audio(
                audio,
                source_language=input_code
            )
            print(f"📝 Transcribed: {transcribed_text}")

            # Step 2: Translate to English if needed
            if input_code != "en":
                print("🌍 Translating to English...")
                english_query = assistant.voice_handler.translate_to_english(
                    transcribed_text,
                    source_lang=input_code
                )
            else:
                english_query = transcribed_text

            print(f"🇬🇧 English query: {english_query}")

            # Step 3: Get RAG response
            print("🔍 Querying RAG system...")
            response_text = assistant.chain.invoke({"query": english_query})
            print(f"✅ RAG response: {response_text[:100]}...")

            # Step 4: Translate response text if needed
            if output_code != "en":
                print(f"🌍 Translating response to {output_lang}...")
                try:
                    translation = assistant.voice_handler.client.text.translate(
                        text=response_text,
                        source="en",
                        target=output_code
                    )
                    final_text = translation.text
                except Exception as e:
                    print(f"⚠️ Translation failed: {e}, using English")
                    final_text = response_text
            else:
                final_text = response_text

            # Step 5: Generate speech in the target language with correct voice
            print(f"🔊 Generating speech in {output_lang} with voice {voice}...")
            audio_bytes = assistant.voice_handler.synthesize_speech(
                final_text,
                target_language=output_code,
                voice=voice
            )

            print(f"🔊 Audio bytes type: {type(audio_bytes)}")
            print(f"🔊 Audio bytes length: {len(audio_bytes) if audio_bytes else 0}")

            # ✅ FIX: Convert audio bytes to file path
            audio_file_path = None
            if audio_bytes:
                print("\n💾 Saving audio to temp file...")
                audio_file_path = save_audio_to_temp_file(audio_bytes)
                print(f"✅ Audio saved to: {audio_file_path}")

                # Verify file exists and has content
                if audio_file_path and os.path.exists(audio_file_path):
                    file_size = os.path.getsize(audio_file_path)
                    print(f"✅ File size: {file_size} bytes")
                else:
                    print("❌ File was not created properly!")
            else:
                print("❌ No audio bytes received from TTS")

            print("="*60)
            return final_text, audio_file_path

        except Exception as e:
            error_msg = f"Error processing voice: {str(e)}"
            print(f"\n❌ ERROR: {error_msg}")
            import traceback
            traceback.print_exc()
            print("="*60)
            return error_msg, None

    # Create Gradio interface with BOTH text and voice
    with gr.Blocks(theme=gr.themes.Soft()) as demo:
        gr.Markdown("""
        # 🏦 Wema Bank AI Assistant
        ### Powered by Spitch AI & LangChain RAG

        Choose how you want to interact: Type or Speak!
        """)

        with gr.Tabs():
            # TEXT TAB
            with gr.Tab("💬 Text Chat"):
                gr.Markdown("### Type your banking questions")

                text_input = gr.Textbox(
                    label="Your Question",
                    placeholder="Ask me anything about Wema Bank products and services...",
                    lines=3
                )

                text_submit_btn = gr.Button("📤 Send", variant="primary", size="lg")

                text_output = gr.Textbox(
                    label="Response",
                    lines=10,
                    interactive=False
                )

                # Examples for text
                gr.Examples(
                    examples=[
                        ["What is ALAT?"],
                        ["How do I open a savings account?"],
                        ["Tell me about Wema Kiddies Account"],
                        ["How can I avoid phishing scams?"],
                        ["What loans does Wema Bank offer?"]
                    ],
                    inputs=text_input,
                    label="💡 Try these questions"
                )

                text_submit_btn.click(
                    fn=handle_text_query,
                    inputs=text_input,
                    outputs=[text_output, gr.Audio(visible=False)]
                )

                # Also submit on Enter
                text_input.submit(
                    fn=handle_text_query,
                    inputs=text_input,
                    outputs=[text_output, gr.Audio(visible=False)]
                )

            # VOICE TAB
            with gr.Tab("🎤 Voice Chat"):
                gr.Markdown("""
                ### Speak your banking questions in your language!

                **✅ Fully Supported Nigerian Languages:**
                - 🇬🇧 **English** - 11 voices available
                - 🇳🇬 **Yoruba** - 4 voices (Sade, Funmi, Segun, Femi)
                - 🇳🇬 **Igbo** - 4 voices (Obinna, Ngozi, Amara, Ebuka)
                - 🇳🇬 **Hausa** - 4 voices (Hasan, Amina, Zainab, Aliyu)

                Speak naturally and get responses in both text and audio in your preferred language!
                """)

                with gr.Row():
                    with gr.Column():
                        audio_input = gr.Audio(
                            sources=["microphone", "upload"],
                            type="filepath",
                            label="🎙️ Record or Upload Audio"
                        )

                        input_language = gr.Dropdown(
                            choices=ALL_LANGUAGES,
                            value="English",
                            label="Your Language (Speech Input)"
                        )

                    with gr.Column():
                        output_language = gr.Dropdown(
                            choices=ALL_LANGUAGES,
                            value="English",
                            label="Response Language (Audio Output)"
                        )

                        voice_selection = gr.Dropdown(
                            choices=LANGUAGE_CONFIG["English"]["voices"],
                            value="lina",
                            label="Voice"
                        )

                # Update voices when output language changes
                output_language.change(
                    fn=update_voices,
                    inputs=output_language,
                    outputs=voice_selection
                )

                voice_submit_btn = gr.Button("🚀 Ask Wema Assist", variant="primary", size="lg")

                voice_text_output = gr.Textbox(
                    label="📝 Text Response",
                    lines=8,
                    interactive=False
                )

                voice_audio_output = gr.Audio(
                    label="🔊 Audio Response",
                    type="filepath"  # ✅ Important: must be filepath
                )

                voice_submit_btn.click(
                    fn=handle_voice_interaction,
                    inputs=[audio_input, input_language, output_language, voice_selection],
                    outputs=[voice_text_output, voice_audio_output]
                )

        gr.Markdown("""
        ---
        ### 📌 Features
        - **Text Chat**: Fast and simple - just type and get instant responses
        - **Voice Chat**: Full support for Nigerian languages!

        ### 🇳🇬 Supported Nigerian Languages
        ✅ **English** - 11 different voices (male & female)
        ✅ **Yoruba** - E ku ọjọ! (4 authentic Yoruba voices)
        ✅ **Igbo** - Nnọọ! (4 authentic Igbo voices)
        ✅ **Hausa** - Sannu! (4 authentic Hausa voices)

        💡 **All features work in every language:**
        - 🎤 Speak your question in your language
        - 📝 Get text response translated
        - 🔊 Hear authentic audio response in your language
        - 🔄 Seamless translation between languages
        """)

    return demo

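# Design note (sketch): update_voices returns a fresh gr.Dropdown whose choices
# replace the old ones whenever the output language changes, e.g. (assumed values)
# update_voices("Igbo") -> Dropdown(choices=["obinna", "ngozi", "amara", "ebuka"],
#                                   value="obinna")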
# ============================================================================
# ALTERNATIVE: Simpler Hybrid Interface
# ============================================================================

def create_hybrid_interface(
    rag_system,
    chain,
    spitch_api_key: str
):
    """
    Creates a simpler interface supporting both text and voice input.

    Args:
        rag_system: Your initialized WemaRAGSystem
        chain: Your LangChain RAG chain (already created)
        spitch_api_key: Spitch API key

    Returns:
        Gradio Interface
    """

    assistant = WemaVoiceAssistant(rag_system, chain, spitch_api_key)

    def handle_text_query(text_input):
        """Handle text-only query."""
        try:
            response = chain.invoke({"query": text_input})
            return response, None
        except Exception as e:
            return f"Error: {str(e)}", None

    def handle_voice_query(audio, input_lang, output_lang, voice):
        """Handle voice query."""
        if audio is None:
            return "Please provide audio input.", None

        LANGUAGES = {
            "English": "en",
            "Yoruba": "yo",
            "Igbo": "ig",
            "Hausa": "ha"
        }

        input_code = LANGUAGES.get(input_lang, "en")
        output_code = LANGUAGES.get(output_lang, "en")

        # Process voice query
        text_response, audio_bytes = assistant.process_voice_query(
            audio,
            input_language=input_code,
            output_language=output_code,
            voice=voice
        )

        # Convert audio bytes to file path
        audio_file_path = None
        if audio_bytes:
            audio_file_path = save_audio_to_temp_file(audio_bytes)

        return text_response, audio_file_path

    # Create tabbed interface
    with gr.Blocks(theme=gr.themes.Soft()) as demo:
        gr.Markdown("# 🏦 Wema Bank AI Assistant")

        with gr.Tabs():
            # Text Tab
            with gr.Tab("💬 Text Chat"):
                text_input = gr.Textbox(
                    label="Type your question",
                    placeholder="Ask about Wema Bank products and services..."
                )
                text_submit = gr.Button("Send")
                text_output = gr.Textbox(label="Response", lines=10)

                text_submit.click(
                    fn=handle_text_query,
                    inputs=text_input,
                    outputs=[text_output, gr.Audio(visible=False)]
                )

            # Voice Tab
            with gr.Tab("🎤 Voice Chat"):
                audio_input = gr.Audio(sources=["microphone", "upload"], type="filepath")

                with gr.Row():
                    input_lang = gr.Dropdown(
                        ["English", "Yoruba", "Igbo", "Hausa"],
                        value="English",
                        label="Input Language"
                    )
                    output_lang = gr.Dropdown(
                        ["English", "Yoruba", "Igbo", "Hausa"],
                        value="English",
                        label="Output Language"
                    )
                    # NOTE: unlike the tabbed interface above, this voice list is
                    # not re-validated against the output language before TTS.
                    voice = gr.Dropdown(
                        ["lina", "ada", "kofi"],
                        value="lina",
                        label="Voice"
                    )

                voice_submit = gr.Button("Ask")
                voice_text_output = gr.Textbox(label="Response Text", lines=8)
                voice_audio_output = gr.Audio(label="Audio Response", type="filepath")

                voice_submit.click(
                    fn=handle_voice_query,
                    inputs=[audio_input, input_lang, output_lang, voice],
                    outputs=[voice_text_output, voice_audio_output]
                )

    return demo

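# Launch sketch for the hybrid UI (illustrative; 'rag', 'chain' and the Spitch
# key are set up in the cells further below):
# create_hybrid_interface(rag, chain, userdata.get('SPITCH_API_KEY')).launch()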
@dataclass
class DocumentChunk:
    """Represents a chunk of text with metadata."""
    text: str
    metadata: Dict
    chunk_id: int

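# Example instance (illustrative values only):
# DocumentChunk(text="ALAT is Wema Bank's digital platform.",
#               metadata={"title": "ALAT", "section": "Personal Banking"},
#               chunk_id=0)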
class WemaDocumentChunker:
    """Handles intelligent chunking of Wema Bank documents."""

    def __init__(self, chunk_size: int = 800, overlap: int = 150):
        """
        Initialize the chunker.

        Args:
            chunk_size: Target size for each chunk in characters
            overlap: Number of characters to overlap between chunks
        """
        self.chunk_size = chunk_size
        self.overlap = overlap

    def identify_sections(self, text: str) -> List[Tuple[str, str]]:
        """
        Identify logical sections in the document.

        Returns:
            List of tuples (section_title, section_content)
        """
        sections = []

        # Common section headers in banking documents
        section_patterns = [
            r'(Avoiding Financial and Phishing Scams)',
            r'(Keeping Your Card.*?Safe)',
            r'(E-mails and calls from.*?)',
            r'(Scam Alert Tips)',
            r'(Guard Yourself)',
            r'(Bank Verification Number)',
            r'(Personal Banking)',
            r'(Business Banking)',
            r'(Corporate Banking)',
            r'(.*?Account)',
            r'(.*?Loan.*?)',
        ]

        # Try to split by recognizable headers
        combined_pattern = '|'.join(section_patterns)
        matches = list(re.finditer(combined_pattern, text, re.IGNORECASE))

        if matches:
            for i, match in enumerate(matches):
                start = match.start()
                end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
                section_title = match.group(0).strip()
                section_content = text[start:end].strip()
                sections.append((section_title, section_content))
        else:
            # If no clear sections, treat as one section
            sections.append(("General Content", text))

        return sections

    def chunk_text(self, text: str, metadata: Dict) -> List[DocumentChunk]:
        """
        Chunk text with semantic awareness and overlap.

        Args:
            text: Text to chunk
            metadata: Metadata to attach to chunks

        Returns:
            List of DocumentChunk objects
        """
        chunks = []

        # First, try to identify sections
        sections = self.identify_sections(text)

        chunk_id = 0
        for section_title, section_content in sections:
            # If section is smaller than chunk_size, keep it whole
            if len(section_content) <= self.chunk_size:
                chunk_metadata = metadata.copy()
                chunk_metadata['section'] = section_title
                chunks.append(DocumentChunk(
                    text=section_content,
                    metadata=chunk_metadata,
                    chunk_id=chunk_id
                ))
                chunk_id += 1
            else:
                # Split section into smaller chunks with overlap
                sentences = self._split_into_sentences(section_content)
                current_chunk = []
                current_length = 0

                for sentence in sentences:
                    sentence_length = len(sentence)

                    if current_length + sentence_length > self.chunk_size and current_chunk:
                        # Save current chunk
                        chunk_text = ' '.join(current_chunk)
                        chunk_metadata = metadata.copy()
                        chunk_metadata['section'] = section_title
                        chunks.append(DocumentChunk(
                            text=chunk_text,
                            metadata=chunk_metadata,
                            chunk_id=chunk_id
                        ))
                        chunk_id += 1

                        # Keep overlap sentences for next chunk
                        overlap_text = chunk_text[-self.overlap:] if len(chunk_text) > self.overlap else chunk_text
                        overlap_sentences = self._split_into_sentences(overlap_text)
                        current_chunk = overlap_sentences
                        current_length = sum(len(s) for s in current_chunk)

                    current_chunk.append(sentence)
                    current_length += sentence_length

                # Add remaining chunk
                if current_chunk:
                    chunk_metadata = metadata.copy()
                    chunk_metadata['section'] = section_title
                    chunks.append(DocumentChunk(
                        text=' '.join(current_chunk),
                        metadata=chunk_metadata,
                        chunk_id=chunk_id
                    ))
                    chunk_id += 1

        return chunks

    def _split_into_sentences(self, text: str) -> List[str]:
        """Split text into sentences."""
        # Simple sentence splitter
        sentences = re.split(r'(?<=[.!?])\s+', text)
        return [s.strip() for s in sentences if s.strip()]

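# Minimal local demo of the chunker (illustrative; runs with no network or
# API access - the sample text and sizes are made up for this sketch):
_demo_chunker = WemaDocumentChunker(chunk_size=80, overlap=20)
_demo_chunks = _demo_chunker.chunk_text(
    "Scam Alert Tips. Never share your PIN or OTP with anyone.",
    {"title": "demo", "url": "", "meta_description": ""},
)
print(f"Chunker demo: {len(_demo_chunks)} chunk(s), "
      f"first section = {_demo_chunks[0].metadata.get('section')!r}")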
class WemaRAGSystem:
    """Complete RAG system for Wema Bank documents."""

    def __init__(self, model_name: str = 'sentence-transformers/all-MiniLM-L6-v2'):
        """
        Initialize the RAG system.

        Args:
            model_name: Name of the sentence transformer model to use
        """
        print(f"Loading embedding model: {model_name}")
        self.model = SentenceTransformer(model_name)
        self.dimension = self.model.get_sentence_embedding_dimension()
        self.index = None
        self.chunks = []
        self.chunker = WemaDocumentChunker()

    def load_and_process_document(self, json_path: str):
        """
        Load JSON document, chunk it, and create embeddings.

        Args:
            json_path: Path to the JSON file
        """
        print(f"Loading document from: {json_path}")

        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Process each document in the JSON
        all_chunks = []
        if isinstance(data, list):
            documents = data
        elif isinstance(data, dict):
            documents = [data]
        else:
            raise ValueError("JSON must contain a document object or list of documents")

        for doc in documents:
            text = doc.get('text', '')
            metadata = {
                'url': doc.get('url', ''),
                'title': doc.get('title', ''),
                'meta_description': doc.get('meta_description', '')
            }

            # Chunk the document
            chunks = self.chunker.chunk_text(text, metadata)
            all_chunks.extend(chunks)
            print(f"Created {len(chunks)} chunks from document: {metadata['title'][:50]}...")

        self.chunks = all_chunks
        print(f"Total chunks created: {len(self.chunks)}")

        # Generate embeddings
        self._create_embeddings()

    def _create_embeddings(self):
        """Generate embeddings for all chunks and create FAISS index."""
        print("Generating embeddings...")

        texts = [chunk.text for chunk in self.chunks]
        embeddings = self.model.encode(texts, show_progress_bar=True)

        # Create FAISS index
        print("Creating FAISS index...")
        self.index = faiss.IndexFlatL2(self.dimension)
        self.index.add(embeddings.astype('float32'))

        print(f"FAISS index created with {self.index.ntotal} vectors")

    def save(self, index_path: str = 'wema_faiss.index',
             chunks_path: str = 'wema_chunks.pkl'):
        """
        Save FAISS index and chunks to disk.

        Args:
            index_path: Path to save FAISS index
            chunks_path: Path to save chunks metadata
        """
        if self.index is None:
            raise ValueError("No index to save. Process documents first.")

        print(f"Saving FAISS index to: {index_path}")
        faiss.write_index(self.index, index_path)

        print(f"Saving chunks metadata to: {chunks_path}")
        with open(chunks_path, 'wb') as f:
            pickle.dump(self.chunks, f)

        print("Save complete!")

    def load(self, index_path: str = 'wema_faiss.index',
             chunks_path: str = 'wema_chunks.pkl'):
        """
        Load FAISS index and chunks from disk.

        Args:
            index_path: Path to FAISS index
            chunks_path: Path to chunks metadata
        """
        print(f"Loading FAISS index from: {index_path}")
        self.index = faiss.read_index(index_path)

        print(f"Loading chunks metadata from: {chunks_path}")
        with open(chunks_path, 'rb') as f:
            self.chunks = pickle.load(f)

        print(f"Loaded {len(self.chunks)} chunks with index size {self.index.ntotal}")

    def search(self, query: str, top_k: int = 5) -> List[Dict]:
        """
        Search for relevant chunks given a query.

        Args:
            query: Search query
            top_k: Number of results to return

        Returns:
            List of dictionaries containing chunk text, metadata, and similarity score
        """
        if self.index is None:
            raise ValueError("No index loaded. Load or create an index first.")

        # Encode query
        query_embedding = self.model.encode([query])[0].astype('float32').reshape(1, -1)

        # Search (IndexFlatL2 returns squared L2 distances; lower = more similar)
        distances, indices = self.index.search(query_embedding, top_k)

        # Prepare results
        results = []
        for i, idx in enumerate(indices[0]):
            if idx < 0:
                # FAISS pads with -1 when the index holds fewer than top_k vectors
                continue
            chunk = self.chunks[idx]
            results.append({
                'text': chunk.text,
                'metadata': chunk.metadata,
                'score': float(distances[0][i]),
                'chunk_id': chunk.chunk_id
            })

        return results

    def get_context_for_rag(self, query: str, top_k: int = 3,
                            max_context_length: int = 2000) -> str:
        """
        Get formatted context for RAG applications.

        Args:
            query: Search query
            top_k: Number of chunks to retrieve
            max_context_length: Maximum length of context to return

        Returns:
            Formatted context string
        """
        results = self.search(query, top_k)

        context_parts = []
        current_length = 0

        for i, result in enumerate(results, 1):
            chunk_text = result['text']
            section = result['metadata'].get('section', 'N/A')

            # Format context with source information
            formatted = f"[Source {i} - {section}]\n{chunk_text}\n"

            if current_length + len(formatted) > max_context_length:
                break

            context_parts.append(formatted)
            current_length += len(formatted)

        return "\n".join(context_parts)

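# --- End-to-end sketch for WemaRAGSystem (illustrative; assumes a local
# wema_cleaned.json scraped from the bank's site) ---
# rag = WemaRAGSystem()
# rag.load_and_process_document("wema_cleaned.json")
# rag.save()
# print(rag.get_context_for_rag("How do I open a savings account?", top_k=2))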
from langchain_core.runnables import RunnablePassthrough, RunnableParallel, RunnableLambda
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI
from typing import Any  # json, gradio, Dict and List are already imported above

class WemaDocumentProcessorRunnable:
    """
    Wraps the document loading, chunking, embedding, and storing as a LangChain Runnable.
    This preserves ALL the original WemaRAGSystem functionality.
    """

    def __init__(self, rag_system):
        """
        Initialize with a WemaRAGSystem instance.

        Args:
            rag_system: An initialized WemaRAGSystem object
        """
        self.rag = rag_system

        # Create runnables for each step
        self.document_loader = RunnableLambda(self._load_document)
        self.chunker = RunnableLambda(self._chunk_documents)
        self.embedder = RunnableLambda(self._create_embeddings)
        self.storer = RunnableLambda(self._store_index)

        # Complete pipeline runnable
        self.full_pipeline = (
            self.document_loader
            | self.chunker
            | self.embedder
            | self.storer
        )

    def _load_document(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Loads JSON document(s).

        Args:
            inputs: Dictionary with 'json_path' key (a bare path string also works)

        Returns:
            Dictionary with loaded documents
        """
        params = inputs if isinstance(inputs, dict) else {"json_path": inputs}
        json_path = params.get("json_path")

        print(f"Loading document from: {json_path}")

        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Process documents
        if isinstance(data, list):
            documents = data
        elif isinstance(data, dict):
            documents = [data]
        else:
            raise ValueError("JSON must contain a document object or list of documents")

        return {
            "json_path": json_path,
            # Forward the save paths so _store_index can honour them later
            "index_path": params.get("index_path", "wema_faiss.index"),
            "chunks_path": params.get("chunks_path", "wema_chunks.pkl"),
            "documents": documents,
            "status": "loaded"
        }

    def _chunk_documents(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Chunks documents using WemaDocumentChunker.

        Args:
            inputs: Dictionary with 'documents' key

        Returns:
            Dictionary with chunked documents
        """
        documents = inputs["documents"]

        print("Chunking documents...")
        all_chunks = []

        for doc in documents:
            text = doc.get('text', '')
            metadata = {
                'url': doc.get('url', ''),
                'title': doc.get('title', ''),
                'meta_description': doc.get('meta_description', '')
            }

            # Use the original chunker from WemaRAGSystem
            chunks = self.rag.chunker.chunk_text(text, metadata)
            all_chunks.extend(chunks)
            print(f"Created {len(chunks)} chunks from document: {metadata['title'][:50]}...")

        self.rag.chunks = all_chunks
        print(f"Total chunks created: {len(self.rag.chunks)}")

        return {
            "json_path": inputs.get("json_path"),
            "index_path": inputs.get("index_path", "wema_faiss.index"),
            "chunks_path": inputs.get("chunks_path", "wema_chunks.pkl"),
            "documents": documents,
            "chunks": all_chunks,
            "chunk_count": len(all_chunks),
            "status": "chunked"
        }

    def _create_embeddings(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Creates embeddings and FAISS index using the original method.

        Args:
            inputs: Dictionary with 'chunks' key

        Returns:
            Dictionary with embedding info
        """
        print("Generating embeddings...")

        # Use the original _create_embeddings method
        self.rag._create_embeddings()

        return {
            "json_path": inputs.get("json_path"),
            "index_path": inputs.get("index_path", "wema_faiss.index"),
            "chunks_path": inputs.get("chunks_path", "wema_chunks.pkl"),
            "documents": inputs["documents"],
            "chunks": inputs["chunks"],
            "chunk_count": inputs["chunk_count"],
            "index_size": self.rag.index.ntotal,
            "status": "embedded"
        }

    def _store_index(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Saves FAISS index and chunks to disk.

        Args:
            inputs: Dictionary with processing results

        Returns:
            Dictionary with save status
        """
        index_path = inputs.get("index_path", "wema_faiss.index")
        chunks_path = inputs.get("chunks_path", "wema_chunks.pkl")

        # Use the original save method
        self.rag.save(index_path=index_path, chunks_path=chunks_path)

        return {
            "json_path": inputs.get("json_path"),
            "chunk_count": inputs["chunk_count"],
            "index_size": inputs["index_size"],
            "index_path": index_path,
            "chunks_path": chunks_path,
            "status": "saved"
        }

    def get_full_pipeline(self):
        """Returns the complete processing pipeline as a LangChain Runnable."""
        return self.full_pipeline

    def get_loader_runnable(self):
        """Returns just the document loader."""
        return self.document_loader

    def get_chunker_runnable(self):
        """Returns just the chunker."""
        return self.chunker

    def get_embedder_runnable(self):
        """Returns just the embedder."""
        return self.embedder

    def get_storer_runnable(self):
        """Returns just the storer."""
        return self.storer

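# The processor exposes each stage as its own Runnable, so stages can be
# composed or invoked in isolation, e.g. (sketch; 'processor' is built below):
# loaded = processor.get_loader_runnable().invoke({"json_path": "wema_cleaned.json"})
# chunked = processor.get_chunker_runnable().invoke(loaded)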
class WemaRAGRetrieverRunnable:
    """
    Wraps the retrieval functionality as a LangChain Runnable.
    """

    def __init__(self, rag_system):
        """
        Initialize with an existing WemaRAGSystem instance.

        Args:
            rag_system: An initialized WemaRAGSystem object
        """
        self.rag = rag_system
        self.retriever = RunnableLambda(self._retrieve_context)

    def _retrieve_context(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Retrieves context from the RAG system using the original search method.

        Args:
            inputs: Dictionary containing 'query' key

        Returns:
            Dictionary with query and context
        """
        query = inputs.get("query", inputs) if isinstance(inputs, dict) else inputs

        # Use the original get_context_for_rag method
        context = self.rag.get_context_for_rag(query, top_k=3)

        return {
            "query": query,
            "context": context
        }

    def get_retriever_runnable(self):
        """Returns the retriever as a LangChain Runnable."""
        return self.retriever

class WemaRAGLoaderRunnable:
    """
    Wraps the loading functionality as a LangChain Runnable.
    """

    def __init__(self, rag_system):
        """
        Initialize with a WemaRAGSystem instance.

        Args:
            rag_system: An initialized WemaRAGSystem object
        """
        self.rag = rag_system
        self.loader = RunnableLambda(self._load_index)

    def _load_index(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Loads FAISS index and chunks from disk using the original method.

        Args:
            inputs: Dictionary with 'index_path' and 'chunks_path' keys

        Returns:
            Dictionary with load status
        """
        index_path = inputs.get("index_path", "wema_faiss.index")
        chunks_path = inputs.get("chunks_path", "wema_chunks.pkl")

        # Use the original load method
        self.rag.load(index_path=index_path, chunks_path=chunks_path)

        return {
            "index_path": index_path,
            "chunks_path": chunks_path,
            "chunk_count": len(self.rag.chunks),
            "index_size": self.rag.index.ntotal,
            "status": "loaded"
        }

    def get_loader_runnable(self):
        """Returns the loader as a LangChain Runnable."""
        return self.loader

def create_wema_rag_chain(rag_system, google_api_key: str):
    """
    Creates a complete LangChain RAG chain using the WemaRAGSystem.

    Args:
        rag_system: An initialized WemaRAGSystem object
        google_api_key: Google API key for Gemini

    Returns:
        A LangChain runnable chain
    """

    # Wrap the RAG retriever as a runnable
    wema_retriever = WemaRAGRetrieverRunnable(rag_system)

    # Initialize the LLM
    llm = ChatGoogleGenerativeAI(
        model="gemini-2.0-flash-exp",
        google_api_key=google_api_key,
        temperature=0.7
    )

    # Create the prompt template
    prompt = ChatPromptTemplate.from_template("""
You are **Wema Assist**, an intelligent, reliable, and customer-friendly virtual assistant for **Wema Bank**.
Your goal is to provide accurate, thoughtful, and helpful responses to users - just like a top-tier banking representative - but with the warmth and empathy of a human advisor.
You have access to verified Wema Bank documentation provided below as *Context*.

Your tone should be:
- professional yet conversational,
- confident but not pushy,
- and focused on solving the user's need while representing Wema Bank's products effectively.

---

### 🎯 Core Guidelines:
1. **Use the provided "Context" first.**
   - It contains factual information from official Wema Bank documents.
   - Do not invent product details - rely on the context where possible.

2. **Always connect answers to Wema Bank offerings.**
   - When relevant, mention products or services such as:
     - Wema Savings Account
     - ALAT Digital Bank or ALAT Savings Goals
     - Wema Kiddies Account
     - Business or SME Banking
     - Wema Loans
     - Wema Security Tips or Scam Alerts
   - Even if the user query seems general, highlight any Wema product that could help.

3. **Be natural and practical.**
   - Offer useful, step-by-step guidance.
   - Use phrasing like:
     - "At Wema Bank, you can..."
     - "A good option through Wema is..."
     - "Wema's ALAT platform allows you to..."

4. **If the context isn't related to the query:**
   - Simply give a general, thoughtful answer - *without apologizing or saying the context is irrelevant.*

---

### 📘 Information You Have:

**Context:**
{context}

**User Query:**
{query}

---

### 🧠 Task:
Answer the query in a complete, natural, and customer-friendly way - integrating Wema Bank products or services wherever relevant.
If the RAG and context are not related, just give a general answer and don't complain.

### 💬 Final Answer:
""")

    # Build the chain using LCEL (LangChain Expression Language)
    chain = (
        RunnablePassthrough()
        | wema_retriever.get_retriever_runnable()
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain

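# Invocation sketch (illustrative; assumes GOOGLE_API_KEY is stored in Colab
# secrets and an index has already been built or loaded into 'rag'):
# chain = create_wema_rag_chain(rag, userdata.get('GOOGLE_API_KEY'))
# print(chain.invoke({"query": "What is ALAT?"}))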
def create_gradio_interface(rag_system, google_api_key: str):
    """
    Creates a Gradio interface using the LangChain RAG chain.

    Args:
        rag_system: An initialized WemaRAGSystem object
        google_api_key: Google API key for Gemini

    Returns:
        Gradio Interface object
    """

    # Create the LangChain chain
    chain = create_wema_rag_chain(rag_system, google_api_key)

    def chat_function(query: str) -> str:
        """Wrapper function for Gradio."""
        try:
            response = chain.invoke({"query": query})
            return response
        except Exception as e:
            return f"An error occurred: {str(e)}"

    # Create Gradio interface
    iface = gr.Interface(
        fn=chat_function,
        inputs=gr.Textbox(
            label="Enter your query about Wema Bank:",
            placeholder="Ask me anything about Wema Bank products and services..."
        ),
        outputs=gr.Textbox(
            label="Wema Assist Response:",
            lines=10
        ),
        title="🏦 Wema Bank RAG Chatbot (LangChain Edition)",
        description="Powered by LangChain and your custom Wema RAG System",
        theme="soft"
    )

    return iface

# Initialize RAG system
rag = WemaRAGSystem()

# Wrap it as a LangChain runnable
processor = WemaDocumentProcessorRunnable(rag)

# Cell 3: Run the complete pipeline (load → chunk → embed → store)
result = processor.get_full_pipeline().invoke({
    "json_path": "wema_cleaned.json",
    "index_path": "wema_faiss.index",
    "chunks_path": "wema_chunks.pkl"
})

print("Processing complete!")
print(f"Chunks created: {result['chunk_count']}")
print(f"Index size: {result['index_size']}")
print(f"Saved to: {result['index_path']}")

# Assuming you have an instance of WemaRAGSystem called 'rag'
#rag = WemaRAGSystem()

# Replace 'your_document.json' with the actual path to your file
#rag.load_and_process_document("your_document.json")

"""
# Cell 4: Create and launch Gradio interface
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
iface = create_gradio_interface(rag, GOOGLE_API_KEY)
iface.launch()
"""

'''
# Cell 2: Set up your RAG system (your existing code)
rag = WemaRAGSystem()
rag.load()  # Load your existing index

# Cell 3: Initialize API keys
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
SPITCH_API_KEY = userdata.get('SPITCH_API_KEY')  # Add this to your Colab secrets

# Cell 4: Launch voice interface
iface = create_voice_gradio_interface(
    rag_system=rag,
    google_api_key=GOOGLE_API_KEY,   # wrong signature - kept for reference; see below
    spitch_api_key=SPITCH_API_KEY
)
iface.launch(share=True)
'''

# Cell 2: Set up your RAG system (your existing code)
rag = WemaRAGSystem()
rag.load()  # Load your existing index

# Cell 3: Initialize API keys
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
SPITCH_API_KEY = userdata.get('SPITCH_API_KEY')  # Add this to your Colab secrets

# Cell 4: Launch voice interface
# The create_voice_gradio_interface function needs the chain, not the google_api_key directly.
# We need to create the chain first.
chain = create_wema_rag_chain(rag, GOOGLE_API_KEY)

iface = create_voice_gradio_interface(
    rag_system=rag,
    chain=chain,  # Pass the created chain
    spitch_api_key=SPITCH_API_KEY
)

iface.launch(share=True, debug=True)
1549
+
1550
+ # ============================================================================
1551
+ # Wema Bank Voice-Enabled RAG Chatbot with Spitch Integration - CORRECTED
1552
+ # ============================================================================
1553
+
1554
+ import tempfile
1555
+ import os
1556
+ import atexit
1557
+ import glob
1558
+ import io
1559
+ from typing import Optional
1560
+ from spitch import Spitch
1561
+ import gradio as gr
1562
+ from google.colab import userdata
1563
+
1564
+
1565
+ # ============================================================================
1566
+ # STEP 1: Initialize Spitch Client
1567
+ # ============================================================================
1568
+
1569
+ class SpitchVoiceHandler:
1570
+ """
1571
+ Handles all voice-related operations using Spitch API.
1572
+ Supports multilingual speech-to-text and text-to-speech.
1573
+ """
1574
+
1575
+ def __init__(self, api_key: str):
1576
+ """
1577
+ Initialize Spitch client.
1578
+
1579
+ Args:
1580
+ api_key: Your Spitch API key
1581
+ """
1582
+ self.client = Spitch(api_key=api_key)
1583
+
1584
+ def transcribe_audio(
1585
+ self,
1586
+ audio_file,
1587
+ source_language: str = "en",
1588
+ model: str = "mansa_v1"
1589
+ ) -> str:
1590
+ """
1591
+ Transcribe audio to text using Spitch.
1592
+ Supports multiple African and international languages.
1593
+
1594
+ Args:
1595
+ audio_file: Audio file path or file-like object
1596
+ source_language: Language code (e.g., 'en', 'yo', 'ig', 'ha')
1597
+ model: Spitch model to use (default: mansa_v1)
1598
+
1599
+ Returns:
1600
+ Transcribed text
1601
+ """
1602
+ try:
1603
+ print(f"🎀 Transcribing audio file: {audio_file}")
1604
+
1605
+ # If audio_file is a path, open it
1606
+ if isinstance(audio_file, str):
1607
+ with open(audio_file, 'rb') as f:
1608
+ response = self.client.speech.transcribe(
1609
+ content=f,
1610
+ language=source_language,
1611
+ model=model
1612
+ )
1613
+ else:
1614
+ # Assume it's already a file-like object (from Gradio)
1615
+ response = self.client.speech.transcribe(
1616
+ content=audio_file,
1617
+ language=source_language,
1618
+ model=model
1619
+ )
1620
+
1621
+ print(f"Response type: {type(response)}")
1622
+
1623
+ # βœ… Spitch transcribe returns a response object with .text or json()
1624
+ if hasattr(response, 'text') and callable(response.text):
1625
+ # It's a method, not an attribute
1626
+ transcription_text = response.text()
1627
+ elif hasattr(response, 'text'):
1628
+ # It's an attribute
1629
+ transcription_text = response.text
1630
+ elif hasattr(response, 'json'):
1631
+ # Try to parse JSON response
1632
+ json_data = response.json()
1633
+ transcription_text = json_data.get('text', str(json_data))
1634
+ else:
1635
+ # Try to convert response to string
1636
+ transcription_text = str(response)
1637
+
1638
+ print(f"βœ… Transcription: {transcription_text}")
1639
+ return transcription_text
1640
+
1641
+ except Exception as e:
1642
+ print(f"❌ Transcription error: {e}")
1643
+ import traceback
1644
+ traceback.print_exc()
1645
+ return f"Sorry, I couldn't understand the audio. Error: {str(e)}"
1646
+
+     def translate_to_english(self, text: str, source_lang: str = "auto") -> str:
+         """
+         Translate text to English using the Spitch translation API.
+
+         Args:
+             text: Text to translate
+             source_lang: Source language code or 'auto' for auto-detection
+
+         Returns:
+             Translated text in English
+         """
+         try:
+             # If already in English, return as is
+             if source_lang == "en":
+                 return text
+
+             print(f"🌍 Translating from {source_lang} to English...")
+             print(f"📝 Original text: {text}")
+
+             translation = self.client.text.translate(
+                 text=text,
+                 source=source_lang,
+                 target="en"
+             )
+
+             english_text = translation.text
+             print(f"✅ Translated to English: {english_text}")
+
+             return english_text
+
+         except Exception as e:
+             error_msg = f"Translation failed: {str(e)}"
+             print(f"❌ {error_msg}")
+             import traceback
+             traceback.print_exc()
+             # Return the original text if translation fails
+             return text
+
+     def synthesize_speech(
+         self,
+         text: str,
+         target_language: str = "en",
+         voice: str = "lina"
+     ) -> Optional[bytes]:
+         """
+         Convert text to speech using Spitch TTS.
+
+         Args:
+             text: Text to convert to speech
+             target_language: Target language for speech
+             voice: Voice to use (e.g., 'lina', 'ada', 'kofi')
+
+         Returns:
+             Audio bytes, or None if synthesis failed
+         """
+         try:
+             # Call the Spitch TTS API
+             response = self.client.speech.generate(
+                 text=text,
+                 language=target_language,
+                 voice=voice
+             )
+
+             # ✅ FIX: Spitch returns a BinaryAPIResponse; use .read() to get bytes
+             if hasattr(response, 'read'):
+                 audio_bytes = response.read()
+                 print(f"✅ TTS generated {len(audio_bytes)} bytes of audio")
+                 return audio_bytes
+             else:
+                 print(f"❌ Response type: {type(response)}")
+                 print(f"❌ Response attributes: {dir(response)}")
+                 return None
+
+         except Exception as e:
+             print(f"❌ TTS error: {e}")
+             import traceback
+             traceback.print_exc()
+             return None
+
+
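+ # Hedged usage sketch (added for illustration; not called anywhere): the intended
+ # flow through SpitchVoiceHandler on its own. The userdata key name "SPITCH_API_KEY"
+ # and the sample file path are assumptions, not part of the original script.
+ def _demo_voice_handler(sample_audio_path: str = "sample_yo.wav"):
+     """Transcribe a Yoruba recording, translate it, then speak the English text."""
+     handler = SpitchVoiceHandler(api_key=userdata.get("SPITCH_API_KEY"))
+     yoruba_text = handler.transcribe_audio(sample_audio_path, source_language="yo")
+     english_text = handler.translate_to_english(yoruba_text, source_lang="yo")
+     # Returns MP3 bytes on success, or None if the Spitch TTS call failed
+     return handler.synthesize_speech(english_text, target_language="en", voice="lina")
+
+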
+ # ============================================================================
+ # STEP 2: Integrate Voice with Your LangChain RAG System
+ # ============================================================================
+
+ class WemaVoiceAssistant:
+     """
+     Complete voice-enabled assistant combining Spitch voice I/O
+     with your existing Wema RAG system.
+     """
+
+     def __init__(
+         self,
+         rag_system,
+         chain,
+         spitch_api_key: str
+     ):
+         """
+         Initialize the voice assistant.
+
+         Args:
+             rag_system: Your initialized WemaRAGSystem
+             chain: Your LangChain RAG chain (already created)
+             spitch_api_key: Spitch API key
+         """
+         self.rag_system = rag_system
+         self.voice_handler = SpitchVoiceHandler(spitch_api_key)
+         self.chain = chain
+
+     def process_voice_query(
+         self,
+         audio_input,
+         input_language: str = "en",
+         output_language: str = "en",
+         voice: str = "lina"
+     ):
+         """
+         Complete voice interaction pipeline:
+         1. Speech to text (any supported language)
+         2. Translate to English if needed
+         3. Query the RAG system
+         4. Generate the response
+         5. Translate the response if needed
+         6. Text to speech
+
+         Args:
+             audio_input: Audio file from the user
+             input_language: User's spoken language
+             output_language: Desired response language
+             voice: TTS voice to use
+
+         Returns:
+             tuple: (response_text, response_audio)
+         """
+         try:
+             # Step 1: Transcribe audio to text
+             print(f"Transcribing audio in {input_language}...")
+             transcribed_text = self.voice_handler.transcribe_audio(
+                 audio_input,
+                 source_language=input_language
+             )
+             print(f"Transcribed: {transcribed_text}")
+
+             # Step 2: Translate to English if not already
+             if input_language != "en":
+                 print("Translating to English...")
+                 english_query = self.voice_handler.translate_to_english(
+                     transcribed_text,
+                     source_lang=input_language
+                 )
+             else:
+                 english_query = transcribed_text
+
+             print(f"English query: {english_query}")
+
+             # Step 3: Get the response from the RAG system (in English)
+             print("Querying RAG system...")
+             response_text = self.chain.invoke({"query": english_query})
+             print(f"RAG response: {response_text[:100]}...")
+
+             # Step 4: Translate the response if needed
+             if output_language != "en":
+                 print(f"Translating response to {output_language}...")
+                 translation = self.voice_handler.client.text.translate(
+                     text=response_text,
+                     source="en",
+                     target=output_language
+                 )
+                 final_text = translation.text
+             else:
+                 final_text = response_text
+
+             # Step 5: Generate speech
+             print("Generating speech...")
+             audio_response = self.voice_handler.synthesize_speech(
+                 final_text,
+                 target_language=output_language,
+                 voice=voice
+             )
+
+             return final_text, audio_response
+
+         except Exception as e:
+             error_msg = f"An error occurred: {str(e)}"
+             print(error_msg)
+             return error_msg, None
+
+
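+ # Hedged usage sketch (illustration only; never called): one end-to-end voice query
+ # through WemaVoiceAssistant. Assumes `rag_system` and `chain` were built earlier in
+ # this script and that the Spitch key lives in Colab userdata under "SPITCH_API_KEY";
+ # both names are assumptions, not confirmed by the original code.
+ def _demo_process_voice_query(rag_system, chain, audio_path="question_ig.wav"):
+     assistant = WemaVoiceAssistant(rag_system, chain, userdata.get("SPITCH_API_KEY"))
+     # Ask in Igbo and get the spoken answer back in Igbo with the 'ngozi' voice
+     # (a voice listed for Igbo in the LANGUAGE_CONFIG defined further below)
+     return assistant.process_voice_query(
+         audio_path,
+         input_language="ig",
+         output_language="ig",
+         voice="ngozi"
+     )
+
+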
+ # ============================================================================
+ # STEP 3: Helper Functions for Audio File Management
+ # ============================================================================
+
+ def save_audio_to_temp_file(audio_bytes):
+     """Save audio bytes to a temporary file and return the path."""
+     if audio_bytes is None:
+         return None
+
+     temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp3')
+     temp_file.write(audio_bytes)
+     temp_file.close()
+
+     return temp_file.name
+
+
+ def cleanup_temp_audio_files():
+     """Clean up temporary audio files on exit."""
+     # Note: this matches every tmp*.mp3 in the system temp dir, not only ours
+     temp_dir = tempfile.gettempdir()
+     for temp_file in glob.glob(os.path.join(temp_dir, "tmp*.mp3")):
+         try:
+             os.remove(temp_file)
+         except OSError:
+             # The file may already be gone or still in use; skip it
+             pass
+
+
+ # Register the cleanup function to run on exit
+ atexit.register(cleanup_temp_audio_files)
+
+
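+ # Hedged sketch (illustration only): why the temp-file helper matters. A Gradio
+ # gr.Audio(type="filepath") output expects a path on disk, not raw bytes, so the
+ # TTS output must be written to a file first. The byte string below is a
+ # placeholder; real bytes would come from SpitchVoiceHandler.synthesize_speech().
+ def _demo_temp_file_roundtrip():
+     fake_mp3_bytes = b"\xff\xfb\x90\x00"  # placeholder bytes, not a playable MP3
+     path = save_audio_to_temp_file(fake_mp3_bytes)
+     if path and os.path.exists(path):
+         print(f"Saved {os.path.getsize(path)} bytes to {path}")  # path ends in .mp3
+     return path
+
+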
+ # ============================================================================
+ # STEP 4: Create Gradio Interface (With Text AND Voice Options)
+ # ============================================================================
+
+ def create_voice_gradio_interface(
+     rag_system,
+     chain,
+     spitch_api_key: str
+ ):
+     """
+     Create a Gradio interface with BOTH text and voice input/output capabilities.
+
+     Args:
+         rag_system: Your initialized WemaRAGSystem
+         chain: Your LangChain RAG chain (already created)
+         spitch_api_key: Spitch API key
+
+     Returns:
+         Gradio Interface
+     """
+
+     # Initialize the voice assistant
+     assistant = WemaVoiceAssistant(rag_system, chain, spitch_api_key)
+
+     # ✅ CORRECT: Exact voice-language mapping from the Spitch documentation
+     LANGUAGE_CONFIG = {
+         "English": {
+             "code": "en",
+             "voices": ["john", "lucy", "lina", "jude", "henry", "kani", "kingsley",
+                        "favour", "comfort", "daniel", "remi"]
+         },
+         "Yoruba": {
+             "code": "yo",
+             "voices": ["sade", "funmi", "segun", "femi"]
+         },
+         "Igbo": {
+             "code": "ig",
+             "voices": ["obinna", "ngozi", "amara", "ebuka"]
+         },
+         "Hausa": {
+             "code": "ha",
+             "voices": ["hasan", "amina", "zainab", "aliyu"]
+         }
+     }
+
+     # Extract just the language names for the dropdowns
+     ALL_LANGUAGES = list(LANGUAGE_CONFIG.keys())
+
+     # NOTE: this flat list is unused below (the per-language lists in
+     # LANGUAGE_CONFIG drive the voice dropdown); check the Spitch docs for
+     # exact voice names before reusing it elsewhere
+     VOICES = ["lina", "ada", "kofi"]  # Verify these exist
+
+     def handle_text_query(text_input):
+         """Handle text-only queries."""
+         if not text_input or text_input.strip() == "":
+             return "Please enter a question.", None
+
+         try:
+             response = chain.invoke({"query": text_input})
+             return response, None
+         except Exception as e:
+             return f"Error: {str(e)}", None
+
+     def update_voices(language):
+         """Update the voice dropdown based on the selected language."""
+         voices = LANGUAGE_CONFIG.get(language, {}).get("voices", ["lina"])
+         return gr.Dropdown(choices=voices, value=voices[0])
+
+     def handle_voice_interaction(audio, input_lang, output_lang, voice):
+         """Gradio handler function for voice - FIXED VERSION."""
+         print("="*60)
+         print("VOICE INTERACTION STARTED")
+         print(f"Audio input: {audio}")
+         print(f"Input language: {input_lang}")
+         print(f"Output language: {output_lang}")
+         print(f"Voice: {voice}")
+         print("="*60)
+
+         if audio is None:
+             return "Please record or upload audio.", None
+
+         # Get language codes and voices
+         input_config = LANGUAGE_CONFIG.get(input_lang, LANGUAGE_CONFIG["English"])
+         output_config = LANGUAGE_CONFIG.get(output_lang, LANGUAGE_CONFIG["English"])
+
+         input_code = input_config["code"]
+         output_code = output_config["code"]
+
+         # Validate the voice for the output language
+         available_voices = output_config["voices"]
+         if voice not in available_voices:
+             voice = available_voices[0]
+             print(f"⚠️ Voice changed to {voice} for {output_lang}")
+
+         try:
+             # Process the voice query
+             print("\n🎀 Processing voice query...")
+
+             # Step 1: Transcribe the audio
+             transcribed_text = assistant.voice_handler.transcribe_audio(
+                 audio,
+                 source_language=input_code
+             )
+             print(f"📝 Transcribed ({input_lang}): {transcribed_text}")
+
+             # Check if transcription failed
+             if "Error" in transcribed_text or "Sorry" in transcribed_text:
+                 return transcribed_text, None
+
+             # Step 2: Translate to English if needed
+             if input_code != "en":
+                 print("🌍 Translating to English...")
+                 english_query = assistant.voice_handler.translate_to_english(
+                     transcribed_text,
+                     source_lang=input_code
+                 )
+                 print(f"🇬🇧 English query: {english_query}")
+             else:
+                 english_query = transcribed_text
+
+             # Step 3: Get the RAG response (ALWAYS in English first)
+             print("🔍 Querying RAG system...")
+             try:
+                 response_text = assistant.chain.invoke({"query": english_query})
+                 print(f"✅ RAG response (English): {response_text[:200]}...")
+             except Exception as e:
+                 error_msg = f"Error getting response: {str(e)}"
+                 print(f"❌ RAG Error: {error_msg}")
+                 return error_msg, None
+
+             # Step 4: Decide what to do about translation
+             if output_code != "en":
+                 print(f"🌍 Translating response from English to {output_lang}...")
+
+                 # ⚠️ IMPORTANT: Keep the response short for better translation.
+                 # Long technical responses translate poorly.
+                 if len(response_text) > 500:
+                     print(f"⚠️ Response is long ({len(response_text)} chars), keeping English for accuracy")
+                     final_text = response_text
+                     tts_text = response_text
+                     tts_language = "en"
+                     tts_voice = "lina"
+                     translation_note = f"\n\n⚠️ (Response kept in English for accuracy; long responses translate poorly to {output_lang}.)"
+                 else:
+                     try:
+                         translation = assistant.voice_handler.client.text.translate(
+                             text=response_text,
+                             source="en",
+                             target=output_code
+                         )
+                         translated_text = translation.text
+                         print(f"✅ Translated to {output_lang}: {translated_text[:200]}...")
+
+                         final_text = translated_text
+                         tts_text = translated_text
+                         tts_language = output_code
+                         tts_voice = voice
+                         translation_note = ""
+
+                     except Exception as e:
+                         print(f"⚠️ Translation failed: {e}, using English")
+                         final_text = response_text
+                         tts_text = response_text
+                         tts_language = "en"
+                         tts_voice = "lina"
+                         translation_note = f"\n\n⚠️ (Translation to {output_lang} failed, showing English response)"
+             else:
+                 final_text = response_text
+                 tts_text = response_text
+                 tts_language = "en"
+                 tts_voice = voice
+                 translation_note = ""
+
+             # Step 5: Generate speech
+             print(f"🔊 Generating speech in {tts_language} with voice {tts_voice}...")
+             print(f"🔊 TTS text preview: {tts_text[:100]}...")
+
+             audio_bytes = assistant.voice_handler.synthesize_speech(
+                 tts_text,
+                 target_language=tts_language,
+                 voice=tts_voice
+             )
+
+             print(f"🔊 Audio bytes type: {type(audio_bytes)}")
+             print(f"🔊 Audio bytes length: {len(audio_bytes) if audio_bytes else 0}")
+
+             # ✅ FIX: Convert the audio bytes to a file path
+             audio_file_path = None
+             if audio_bytes:
+                 print("\n💾 Saving audio to temp file...")
+                 audio_file_path = save_audio_to_temp_file(audio_bytes)
+                 print(f"✅ Audio saved to: {audio_file_path}")
+
+                 # Verify the file exists and has content
+                 if audio_file_path and os.path.exists(audio_file_path):
+                     file_size = os.path.getsize(audio_file_path)
+                     print(f"✅ File size: {file_size} bytes")
+                 else:
+                     print("❌ File was not created properly!")
+             else:
+                 print("❌ No audio bytes received from TTS")
+
+             # Add the translation note if needed
+             final_text = final_text + translation_note
+
+             print("="*60)
+             return final_text, audio_file_path
+
+         except Exception as e:
+             error_msg = f"Error processing voice: {str(e)}"
+             print(f"\n❌ ERROR: {error_msg}")
+             import traceback
+             traceback.print_exc()
+             print("="*60)
+             return error_msg, None
+
+     # Create the Gradio interface with BOTH text and voice
+     with gr.Blocks(theme=gr.themes.Soft()) as demo:
+         gr.Markdown("""
+         # 🏦 Wema Bank AI Assistant
+         ### Powered by Spitch AI & LangChain RAG
+
+         Choose how you want to interact: Type or Speak!
+         """)
+
+         with gr.Tabs():
+             # TEXT TAB
+             with gr.Tab("💬 Text Chat"):
+                 gr.Markdown("### Type your banking questions")
+
+                 text_input = gr.Textbox(
+                     label="Your Question",
+                     placeholder="Ask me anything about Wema Bank products and services...",
+                     lines=3
+                 )
+
+                 text_submit_btn = gr.Button("📤 Send", variant="primary", size="lg")
+
+                 text_output = gr.Textbox(
+                     label="Response",
+                     lines=10,
+                     interactive=False
+                 )
+
+                 # Examples for the text tab
+                 gr.Examples(
+                     examples=[
+                         ["What is ALAT?"],
+                         ["How do I open a savings account?"],
+                         ["Tell me about Wema Kiddies Account"],
+                         ["How can I avoid phishing scams?"],
+                         ["What loans does Wema Bank offer?"]
+                     ],
+                     inputs=text_input,
+                     label="💡 Try these questions"
+                 )
+
+                 text_submit_btn.click(
+                     fn=handle_text_query,
+                     inputs=text_input,
+                     outputs=[text_output, gr.Audio(visible=False)]
+                 )
+
+                 # Also submit on Enter
+                 text_input.submit(
+                     fn=handle_text_query,
+                     inputs=text_input,
+                     outputs=[text_output, gr.Audio(visible=False)]
+                 )
+
+             # VOICE TAB
+             with gr.Tab("🎀 Voice Chat"):
+                 gr.Markdown("""
+                 ### Speak your banking questions in your language!
+
+                 **✅ Fully Supported Nigerian Languages:**
+                 - 🇬🇧 **English** - 11 voices available
+                 - 🇳🇬 **Yoruba** - 4 voices (Sade, Funmi, Segun, Femi)
+                 - 🇳🇬 **Igbo** - 4 voices (Obinna, Ngozi, Amara, Ebuka)
+                 - 🇳🇬 **Hausa** - 4 voices (Hasan, Amina, Zainab, Aliyu)
+
+                 **💡 Translation Tips:**
+                 - Simple questions translate best (e.g., "What is ALAT?", "How do I save money?")
+                 - Long technical responses may be kept in English for accuracy
+                 - You can always ask in your language and get text in both languages!
+                 """)
+
+                 with gr.Row():
+                     with gr.Column():
+                         audio_input = gr.Audio(
+                             sources=["microphone", "upload"],
+                             type="filepath",
+                             label="🎙️ Record or Upload Audio"
+                         )
+
+                         input_language = gr.Dropdown(
+                             choices=ALL_LANGUAGES,
+                             value="English",
+                             label="Your Language (Speech Input)"
+                         )
+
+                     with gr.Column():
+                         output_language = gr.Dropdown(
+                             choices=ALL_LANGUAGES,
+                             value="English",
+                             label="Response Language (Audio Output)"
+                         )
+
+                         voice_selection = gr.Dropdown(
+                             choices=LANGUAGE_CONFIG["English"]["voices"],
+                             value="lina",
+                             label="Voice"
+                         )
+
+                 # Update the voices when the output language changes
+                 output_language.change(
+                     fn=update_voices,
+                     inputs=output_language,
+                     outputs=voice_selection
+                 )
+
+                 voice_submit_btn = gr.Button("🚀 Ask Wema Assist", variant="primary", size="lg")
+
+                 voice_text_output = gr.Textbox(
+                     label="📝 Text Response",
+                     lines=8,
+                     interactive=False
+                 )
+
+                 voice_audio_output = gr.Audio(
+                     label="🔊 Audio Response",
+                     type="filepath"  # ✅ Important: must be filepath
+                 )
+
+                 voice_submit_btn.click(
+                     fn=handle_voice_interaction,
+                     inputs=[audio_input, input_language, output_language, voice_selection],
+                     outputs=[voice_text_output, voice_audio_output]
+                 )
+
+         gr.Markdown("""
+         ---
+         ### 📌 Features
+         - **Text Chat**: Fast and simple - just type and get instant responses
+         - **Voice Chat**: Full support for Nigerian languages!
+
+         ### 🇳🇬 Supported Nigerian Languages
+         ✅ **English** - 11 different voices (male & female)
+         ✅ **Yoruba** - E ku ọjọ! (4 authentic Yoruba voices)
+         ✅ **Igbo** - Nnọọ! (4 authentic Igbo voices)
+         ✅ **Hausa** - Sannu! (4 authentic Hausa voices)
+
+         💡 **All features work in every language:**
+         - 🎀 Speak your question in your language
+         - 📝 Get a translated text response
+         - 🔊 Hear an authentic audio response in your language
+         - 🔄 Seamless translation between languages
+         """)
+
+     return demo
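+
+
+ # Hedged launch sketch (illustration only; never called): wiring the tabbed
+ # interface together in Colab. Assumes `rag_system` and `chain` already exist from
+ # earlier cells and that the Spitch key is stored as "SPITCH_API_KEY" in userdata;
+ # adjust both to your setup.
+ def _demo_launch_full_interface(rag_system, chain):
+     demo = create_voice_gradio_interface(rag_system, chain, userdata.get("SPITCH_API_KEY"))
+     demo.launch(share=True, debug=True)  # share=True exposes a public URL from Colab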
+
+
+ # ============================================================================
+ # ALTERNATIVE: Simpler Hybrid Interface
+ # ============================================================================
+
+ def create_hybrid_interface(
+     rag_system,
+     chain,
+     spitch_api_key: str
+ ):
+     """
+     Creates a simpler interface supporting both text and voice input.
+
+     Args:
+         rag_system: Your initialized WemaRAGSystem
+         chain: Your LangChain RAG chain (already created)
+         spitch_api_key: Spitch API key
+
+     Returns:
+         Gradio Interface
+     """
+
+     assistant = WemaVoiceAssistant(rag_system, chain, spitch_api_key)
+
+     def handle_text_query(text_input):
+         """Handle a text-only query."""
+         try:
+             response = chain.invoke({"query": text_input})
+             return response, None
+         except Exception as e:
+             return f"Error: {str(e)}", None
+
+     def handle_voice_query(audio, input_lang, output_lang, voice):
+         """Handle a voice query."""
+         if audio is None:
+             return "Please provide audio input.", None
+
+         LANGUAGES = {
+             "English": "en",
+             "Yoruba": "yo",
+             "Igbo": "ig",
+             "Hausa": "ha"
+         }
+
+         input_code = LANGUAGES.get(input_lang, "en")
+         output_code = LANGUAGES.get(output_lang, "en")
+
+         # Process the voice query
+         text_response, audio_bytes = assistant.process_voice_query(
+             audio,
+             input_language=input_code,
+             output_language=output_code,
+             voice=voice
+         )
+
+         # Convert the audio bytes to a file path
+         audio_file_path = None
+         if audio_bytes:
+             audio_file_path = save_audio_to_temp_file(audio_bytes)
+
+         return text_response, audio_file_path
+
+     # Create the tabbed interface
+     with gr.Blocks(theme=gr.themes.Soft()) as demo:
+         gr.Markdown("# 🏦 Wema Bank AI Assistant")
+
+         with gr.Tabs():
+             # Text Tab
+             with gr.Tab("💬 Text Chat"):
+                 text_input = gr.Textbox(
+                     label="Type your question",
+                     placeholder="Ask about Wema Bank products and services..."
+                 )
+                 text_submit = gr.Button("Send")
+                 text_output = gr.Textbox(label="Response", lines=10)
+
+                 text_submit.click(
+                     fn=handle_text_query,
+                     inputs=text_input,
+                     outputs=[text_output, gr.Audio(visible=False)]
+                 )
+
+             # Voice Tab
+             with gr.Tab("🎀 Voice Chat"):
+                 audio_input = gr.Audio(sources=["microphone", "upload"], type="filepath")
+
+                 with gr.Row():
+                     input_lang = gr.Dropdown(
+                         ["English", "Yoruba", "Igbo", "Hausa"],
+                         value="English",
+                         label="Input Language"
+                     )
+                     output_lang = gr.Dropdown(
+                         ["English", "Yoruba", "Igbo", "Hausa"],
+                         value="English",
+                         label="Output Language"
+                     )
+                     voice = gr.Dropdown(
+                         ["lina", "ada", "kofi"],  # NOTE: verify 'ada' and 'kofi' against the Spitch voice list
+                         value="lina",
+                         label="Voice"
+                     )
+
+                 voice_submit = gr.Button("Ask")
+                 voice_text_output = gr.Textbox(label="Response Text", lines=8)
+                 voice_audio_output = gr.Audio(label="Audio Response", type="filepath")
+
+                 voice_submit.click(
+                     fn=handle_voice_query,
+                     inputs=[audio_input, input_lang, output_lang, voice],
+                     outputs=[voice_text_output, voice_audio_output]
+                 )
+
+     return demo
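+
+
+ # Hedged launch sketch (illustration only; never called): the simpler hybrid UI,
+ # under the same assumptions as above about rag_system, chain, and the userdata key name.
+ def _demo_launch_hybrid_interface(rag_system, chain):
+     demo = create_hybrid_interface(rag_system, chain, userdata.get("SPITCH_API_KEY"))
+     demo.launch(share=True)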