Spaces: Nguyen5/chatbot1

Nguyen5 committed on Dec 8, 2025

Commit bca3e7a

1 Parent(s): 921fc8a

commit

Files changed (5)

.trae/documents/Triển khai OpenAI Audio API + Audiomodus live cho chatbot.md +67 -0
app.py +1 -14
realtime_server.py +97 -0
requirements.txt +4 -0
speech_io.py +0 -1

.trae/documents/Triển khai OpenAI Audio API + Audiomodus live cho chatbot.md ADDED

	@@ -0,0 +1,67 @@

+## Platform choice
+- Use OpenAI as the primary platform for the Audio API (Whisper-1 and the Realtime API) because:
+  - Multilingual accuracy is high and stable
+  - The Python SDK is simple and compatible with the existing system, which already uses OpenAI Embeddings/LLM/Chat
+  - The Realtime API enables live two-way conversation
+## Architecture overview
+- Audio API layer:
+  - Transcribe: OpenAI Whisper (`audio.transcriptions.create`, model `whisper-1`), which processes WAV files from Gradio
+  - Audiomodus (live): Gradio streaming + VAD to detect speech and automatically send the transcript to the chat; optional integration with the OpenAI Realtime API for real-time streaming
+- The Chatbot/RAG layer stays unchanged; add audio orchestration state: `is_listening`, `status_text`, `last_record_path`
+## Files to modify
+- `app.py`
+  - Add the Audiomodus (live) option: streaming callback, VAD indicator, auto-send transcript
+  - Create state to manage the conversation and the recording status
+  - UI: pill-shaped input bar inside the chat frame, mic icon, Audiomodus/Text toggle, clear status display
+- `speech_io.py`
+  - Add `transcribe_with_openai(audio_path, language)` using Whisper-1
+  - Keep `transcribe_audio` (local) as a fallback
+  - Simple VAD (`detect_voice_activity`) for hands-free use
+- `requirements.txt`
+  - Ensure `openai` is listed
+## Tools/Function calls
+- Internal function calls used:
+  - `transcribe_with_openai(audio_path, language)`: uploads the WAV to OpenAI and returns `text`
+  - `detect_voice_activity(audio_data, sr, threshold)`: decides when to send the transcript
+  - `transcribe_audio_optimized(audio_path, language)`: backend router driven by ENV (OpenAI preferred)
+- Optional Realtime API (phase 2):
+  - WebRTC/WebSocket client in the browser (Gradio JS hook) to stream audio to OpenAI Realtime
+  - Python relay server (if needed) to keep the API key safe
+## Implementation steps
+1. `speech_io.py`:
+  - Add `OPENAI_API_KEY`; write `transcribe_with_openai(...)` using `OpenAI().audio.transcriptions.create(model="whisper-1")`
+  - Improve preprocessing: high-pass filter, normalize, mono, resample to 16 kHz; increase `ASR_MAX_DURATION_S`
+  - Simple VAD: compute RMS/peak + frame energy to detect speech
+2. `app.py`:
+  - `ConversationState` state and UI controls (Audiomodus toggle, status, VAD indicator)
+  - `chat_audio.stream/change` fills the transcript into `chat_text` and chains a chat call so it is sent automatically
+  - Show “Gesprochener Text wird gesendet” and a player for the recording
+3. ENV & configuration:
+  - `OPENAI_API_KEY`, `ASR_LANGUAGE=auto|de|en|vi`, `ASR_MAX_DURATION_S`
+4. Optional Realtime API (phase 2):
+  - Add a WebRTC client implementation; relay server to keep the API key safe
+## Testing
+- Test cases:
+  - Happy: 5–15 s utterances in German/English/Vietnamese; the transcript is accurate and sent straight to the chat
+  - Error: missing/invalid API key, empty file, speech too quiet, VAD detects nothing; no crash, and a message is shown
+  - Streaming: the transcript fills in progressively and is sent automatically when speech ends
+- Metrics:
+  - End-to-end latency (end of speech → answer available)
+  - Estimated WER/char-accuracy (internal test samples)
+  - No-speech/mishear rate
+  - CPU/RAM usage during local fallback
+## Security and extensibility
+- The API key is read from ENV; audio data is not logged
+- Allow recordings to be deleted from the server after use (delete button in the UI)
+- Easy to extend with the Realtime API and to add TTS replies if enabled
+## Deliverables
+- Updated code in `app.py`, `speech_io.py`, `requirements.txt`
+- Live Audiomodus UI with VAD indicator and auto-send transcript
+- ENV configuration guide and quick test
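
The plan above describes the simple VAD only as "RMS/peak + frame energy". The following is a minimal sketch of one way that could look, not the implementation in `speech_io.py`. The signature `detect_voice_activity(audio_data, sr, threshold)` comes from the plan; the default threshold, the 30 ms frame length, the `min_voiced_ratio` parameter, and the assumption that samples are mono floats in [-1, 1] are all illustrative choices.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (trailing partial frame dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_voice_activity(
    audio_data: Sequence[float],
    sr: int,
    threshold: float = 0.02,        # assumed default, not taken from the commit
    frame_ms: int = 30,             # assumed frame length
    min_voiced_ratio: float = 0.1,  # hypothetical parameter: share of frames that must be loud
) -> bool:
    """Return True if enough frames exceed the energy threshold."""
    if not audio_data:
        return False
    # Peak gate: if no sample reaches the threshold, no frame RMS can either.
    if max(abs(x) for x in audio_data) < threshold:
        return False
    frame_len = max(1, int(sr * frame_ms / 1000))
    energies = frame_rms(audio_data, frame_len)
    if not energies:
        return False
    voiced = sum(1 for e in energies if e >= threshold)
    return voiced / len(energies) >= min_voiced_ratio
```

A fixed threshold like this is sensitive to microphone gain and background noise, so in practice it would probably need to be tuned per setup or replaced by an adaptive noise floor.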

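The plan lists "estimated WER" as a metric without saying how it is computed. A common definition is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below assumes that definition; the function names and the lowercase/whitespace normalization are illustrative and do not come from the commit.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over tokens (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention chosen for this sketch: empty reference is perfect only if hypothesis is empty.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

For example, dropping one word from a four-word reference gives a WER of 0.25. Punctuation handling and language-specific tokenization (relevant for Vietnamese) are deliberately left out here.
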
app.py CHANGED

@@ -699,13 +699,6 @@ with gr.Blocks(title="Prüfungsrechts-Chatbot (RAG + Sprache) - Enhanced") as de
         on_audio_change,
         inputs=[chat_audio, vad_toggle],
         outputs=[chat_text, vad_indicator, status_display]
-    ).then(
-        process_chat,
-        inputs=[chat_text, chat_audio, chatbot, lang_selector, vad_toggle],
-        outputs=[chatbot, chat_text, chat_audio, status_display]
-    ).then(
-        lambda: update_vad_indicator(),
-        outputs=[vad_indicator]
     )
     # Audio Streaming
@@ -713,13 +706,6 @@ with gr.Blocks(title="Prüfungsrechts-Chatbot (RAG + Sprache) - Enhanced") as de
         on_audio_change,
         inputs=[chat_audio, vad_toggle],
         outputs=[chat_text, vad_indicator, status_display]
-    ).then(
-        process_chat,
-        inputs=[chat_text, chat_audio, chatbot, lang_selector, vad_toggle],
-        outputs=[chatbot, chat_text, chat_audio, status_display]
-    ).then(
-        lambda: update_vad_indicator(),
-        outputs=[vad_indicator]
     )
     # TTS Button
@@ -744,3 +730,4 @@ with gr.Blocks(title="Prüfungsrechts-Chatbot (RAG + Sprache) - Enhanced") as de
 if __name__ == "__main__":
     demo.queue().launch(ssr_mode=False, show_error=True)

realtime_server.py ADDED

	@@ -0,0 +1,97 @@

+"""
+realtime_server.py — v0.1 (2025-12-08)
+Realtime signaling & streaming server (WebSocket-based) for live audio chat.
+This module is optional and preserves backward compatibility with existing
+Gradio UI. When enabled, clients can stream microphone audio chunks to
+`/ws` and receive live transcripts (OpenAI Whisper API) and bot replies.
+NOTE: A full WebRTC peer-to-peer relay with SDP/ICE is scaffolded via
+`/webrtc/offer` but returns 501 until the upstream Realtime API is wired.
+"""
+import os
+import asyncio
+import json
+from typing import Optional
+from fastapi import FastAPI, WebSocket, WebSocketDisconnect
+from fastapi.responses import JSONResponse
+# Minimal import guard for OpenAI
+try:
+    from openai import OpenAI
+    OPENAI_AVAILABLE = True
+except Exception:
+    OPENAI_AVAILABLE = False
+OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
+app = FastAPI()
+def _openai_transcribe_file(path: str, language: Optional[str] = None) -> str:
+    """Transcribe a local WAV chunk via OpenAI Whisper-1.
+    Returns empty string on failure to keep the stream resilient."""
+    if not (OPENAI_AVAILABLE and OPENAI_API_KEY and path and os.path.exists(path)):
+        return ""
+    try:
+        client = OpenAI(api_key=OPENAI_API_KEY)
+        with open(path, "rb") as f:
+            resp = client.audio.transcriptions.create(
+                model="whisper-1",
+                file=f,
+                language=language if language and language != "auto" else None,
+            )
+        txt = getattr(resp, "text", "") or (resp.get("text") if isinstance(resp, dict) else "")
+        return (txt or "").strip()
+    except Exception:
+        return ""
+@app.get("/health")
+async def health():
+    """Basic health endpoint."""
+    return JSONResponse({"status": "ok"})
+@app.post("/webrtc/offer")
+async def webrtc_offer(body: dict):
+    """SDP offer scaffold (not fully implemented).
+    Returns 501 until Realtime API relay is wired (to keep backward compatibility)."""
+    return JSONResponse({"error": "not_implemented"}, status_code=501)
+@app.websocket("/ws")
+async def ws_stream(ws: WebSocket):
+    """WebSocket bidirectional streaming.
+    Client sends JSON frames:
+      {"type":"audio_chunk","path":"/tmp/chunk.wav","lang":"de"}
+    Server responds with transcript frames:
+      {"type":"transcript","text":"..."}
+    and bot reply frames (if desired in future).
+    """
+    await ws.accept()
+    try:
+        while True:
+            raw = await ws.receive_text()
+            try:
+                msg = json.loads(raw)
+            except Exception:
+                await ws.send_text(json.dumps({"type": "error", "message": "invalid_json"}))
+                continue
+            if msg.get("type") == "audio_chunk":
+                path = msg.get("path")
+                lang = msg.get("lang")
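+                # Run the blocking OpenAI call in a worker thread so the event loop stays responsive.
+                text = await asyncio.to_thread(_openai_transcribe_file, path, language=lang)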
+                await ws.send_text(json.dumps({"type": "transcript", "text": text}))
+            else:
+                await ws.send_text(json.dumps({"type": "error", "message": "unknown_type"}))
+    except WebSocketDisconnect:
+        pass
+    except Exception:
+        try:
+            await ws.close()
+        except Exception:
+            pass
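
The frame protocol in the `ws_stream` docstring can be exercised without a running server if the per-frame logic is pulled out into a pure function. The sketch below assumes such a refactor; `handle_frame` and the injected `transcribe` callable are hypothetical names, not part of the commit. It also answers any JSON value that is not an object with `invalid_json`, which is a deliberate deviation from the handler above.

```python
import json
from typing import Callable, Optional

# Hypothetical type for a transcription backend: (path, language) -> text
Transcriber = Callable[[Optional[str], Optional[str]], str]


def handle_frame(raw: str, transcribe: Transcriber) -> str:
    """Map one incoming text frame to one outgoing JSON frame."""
    try:
        msg = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return json.dumps({"type": "error", "message": "invalid_json"})
    if not isinstance(msg, dict):
        # Valid JSON but not an object (e.g. a list): treated as invalid here.
        return json.dumps({"type": "error", "message": "invalid_json"})
    if msg.get("type") == "audio_chunk":
        text = transcribe(msg.get("path"), msg.get("lang"))
        return json.dumps({"type": "transcript", "text": text})
    return json.dumps({"type": "error", "message": "unknown_type"})


if __name__ == "__main__":
    # Stand-in backend for a dry run; a real one would call the Whisper API.
    fake = lambda path, lang: f"[{lang}] transcript of {path}"
    print(handle_frame('{"type":"audio_chunk","path":"/tmp/chunk.wav","lang":"de"}', fake))
    print(handle_frame("not json", fake))
```

Note that the protocol has the client send a server-side file path. If the endpoint is ever exposed beyond a trusted local client, the path should probably be validated against an allowed directory before it is opened.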

requirements.txt CHANGED

@@ -15,6 +15,10 @@ langchain-text-splitters
 langchain-openai
 huggingface-hub
 groq
+google-generativeai
+fastapi
+uvicorn
+websockets
 # === VectorStore ===
 faiss-cpu

speech_io.py CHANGED

@@ -517,4 +517,3 @@ __all__ = [
     'normalize_audio',
     'preprocess_audio_for_vad'
 ]
