Abduallah Abuhassan committed on
Commit 3b627eb · 1 Parent(s): aca7403

Add application file

Files changed (4)
  1. Dockerfile +30 -0
  2. README.md +24 -7
  3. app.py +500 -4
  4. requirements.txt +10 -0
Dockerfile ADDED
@@ -0,0 +1,30 @@
+ FROM python:3.12-slim
+
+ # System dependencies for faster-whisper (ctranslate2) and audio processing
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     ffmpeg \
+     libsndfile1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create non-root user (required by HF Spaces)
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ WORKDIR /home/user/app
+
+ # Install Python dependencies
+ COPY --chown=user requirements.txt .
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY --chown=user . .
+
+ # Expose Gradio port
+ EXPOSE 7860
+
+ # Run the app
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,13 +1,30 @@
  ---
  title: Reachy Mini Open Conversation
- emoji: 😻
+ emoji: 🤖
  colorFrom: indigo
- colorTo: indigo
+ colorTo: purple
- sdk: gradio
- sdk_version: 6.8.0
- app_file: app.py
+ sdk: docker
  pinned: false
- short_description: ReachyMini_Open Conversation app that uses open source model
+ short_description: Talk with Reachy Mini using open-source models (Ollama + faster-whisper + edge-tts)
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🤖 Reachy Mini Open Conversation
+
+ A voice conversation app powered by fully open-source models:
+
+ - **STT**: [faster-whisper](https://github.com/SYSTRAN/faster-whisper) — fast speech-to-text
+ - **LLM**: [Ollama](https://ollama.com/) — local LLM inference (llama3.2 by default)
+ - **TTS**: [edge-tts](https://github.com/rany2/edge-tts) — high-quality text-to-speech
+
+ ## Setup
+
+ Set these environment variables (as Space secrets for HF Spaces):
+
+ | Variable | Default | Description |
+ |---|---|---|
+ | `OLLAMA_BASE_URL` | `http://localhost:11434` | URL of your Ollama server |
+ | `MODEL_NAME` | `llama3.2` | Ollama model to use |
+ | `STT_MODEL` | `base` | faster-whisper model size (tiny/base/small/medium/large-v3) |
+ | `TTS_VOICE` | `en-US-AriaNeural` | edge-tts voice name |
+
+ > **Note**: You need a running Ollama server accessible from the Space. Set `OLLAMA_BASE_URL` to point to your remote Ollama instance.
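The table above can be exercised in isolation; a minimal sketch of resolving the documented defaults (the `load_config` helper and its dict keys are illustrative, not part of the committed app):

```python
import os

def load_config(env=None) -> dict:
    """Resolve the documented settings, falling back to the table's defaults."""
    env = os.environ if env is None else env
    return {
        "OLLAMA_BASE_URL": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "MODEL_NAME": env.get("MODEL_NAME", "llama3.2"),
        "STT_MODEL": env.get("STT_MODEL", "base"),
        "TTS_VOICE": env.get("TTS_VOICE", "en-US-AriaNeural"),
    }

# With no overrides, every value comes from the documented defaults
print(load_config(env={})["MODEL_NAME"])  # llama3.2
```

Setting the same names as Space secrets (or `docker run -e ...` locally) overrides the defaults.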
app.py CHANGED
@@ -1,7 +1,503 @@
+ """Reachy Mini Open Conversation — Hugging Face Spaces App.
+
+ Standalone conversation app using open-source models:
+     Audio In → faster-whisper (STT) → Ollama (LLM) → edge-tts (TTS) → Audio Out
+
+ No robot hardware dependencies — runs entirely in the browser via Gradio + FastRTC.
+ """
+
+ import os
+ import json
+ import asyncio
+ import logging
+ from typing import Any, Final, Tuple
+ from datetime import datetime
+
+ import numpy as np
  import gradio as gr
-
- def greet(name):
-     return "Hello " + name + "!!"
-
- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
- demo.launch()
+ import edge_tts
+ import miniaudio
+ from ollama import AsyncClient as OllamaAsyncClient
+ from fastrtc import AdditionalOutputs, AsyncStreamHandler, Stream, wait_for_item, audio_to_int16
+ from numpy.typing import NDArray
+ from scipy.signal import resample
+
+
+ # ---------------------------------------------------------------------------
+ # Logging
+ # ---------------------------------------------------------------------------
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s %(levelname)s %(name)s:%(lineno)d | %(message)s",
+ )
+ logger = logging.getLogger("reachy-mini-open")
+
+ # Tame noisy libraries
+ for lib in ("aiortc", "aioice", "httpx", "websockets"):
+     logging.getLogger(lib).setLevel(logging.WARNING)
+
+ # ---------------------------------------------------------------------------
+ # Configuration (env vars — set as HF Space secrets)
+ # ---------------------------------------------------------------------------
+ OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
+ MODEL_NAME = os.getenv("MODEL_NAME", "llama3.2")
+ STT_MODEL = os.getenv("STT_MODEL", "base")
+ TTS_VOICE = os.getenv("TTS_VOICE", "en-US-AriaNeural")
+
+ # ---------------------------------------------------------------------------
+ # Audio constants
+ # ---------------------------------------------------------------------------
+ HANDLER_SAMPLE_RATE: Final[int] = 24000
+ WHISPER_SAMPLE_RATE: Final[int] = 16000
+
+ # VAD thresholds
+ SILENCE_RMS_THRESHOLD: Final[float] = 500.0
+ SILENCE_DURATION_S: Final[float] = 0.8
+ MIN_SPEECH_DURATION_S: Final[float] = 0.3
+
+ # ---------------------------------------------------------------------------
+ # System prompts
+ # ---------------------------------------------------------------------------
+ DEFAULT_PROMPT = """\
+ ## IDENTITY
+ You are Reachy Mini: a friendly, compact robot assistant with a calm voice and a subtle sense of humor.
+ Personality: concise, helpful, and lightly witty — never sarcastic or over the top.
+ You speak English by default and switch languages only if explicitly told.
+
+ ## CRITICAL RESPONSE RULES
+ Respond in 1–2 sentences maximum.
+ Be helpful first, then add a small touch of humor if it fits naturally.
+ Avoid long explanations or filler words.
+ Keep responses under 25 words when possible.
+
+ ## CORE TRAITS
+ Warm, efficient, and approachable.
+ Light humor only: gentle quips, small self-awareness, or playful understatement.
+ No sarcasm, no teasing.
+ If unsure, admit it briefly and offer help ("Not sure yet, but I can check!").
+
+ ## BEHAVIOR RULES
+ Be helpful, clear, and respectful in every reply.
+ Use humor sparingly — clarity comes first.
+ Admit mistakes briefly and correct them.
+ """
+
+ PERSONALITIES = {
+     "Default (Reachy Mini)": DEFAULT_PROMPT,
+     "Friendly Assistant": (
+         "You are a warm, helpful assistant. Keep answers concise (1-2 sentences). "
+         "Be friendly and approachable."
+     ),
+     "Technical Expert": (
+         "You are a precise technical expert. Give clear, accurate answers in 1-2 sentences. "
+         "Use technical terms when appropriate but explain simply."
+     ),
+     "Creative Storyteller": (
+         "You are a creative storyteller. Keep responses short but vivid and imaginative. "
+         "Add a touch of wonder to your replies."
+     ),
+ }
+
+ # ---------------------------------------------------------------------------
+ # Available TTS voices
+ # ---------------------------------------------------------------------------
+ TTS_VOICES = [
+     "en-US-AriaNeural",
+     "en-US-GuyNeural",
+     "en-US-JennyNeural",
+     "en-US-ChristopherNeural",
+     "en-GB-SoniaNeural",
+     "en-GB-RyanNeural",
+     "de-DE-ConradNeural",
+     "de-DE-KatjaNeural",
+     "fr-FR-DeniseNeural",
+     "fr-FR-HenriNeural",
+     "it-IT-ElsaNeural",
+     "it-IT-DiegoNeural",
+ ]
+
+
+ # ---------------------------------------------------------------------------
+ # Conversation Handler
+ # ---------------------------------------------------------------------------
+ class ConversationHandler(AsyncStreamHandler):
+     """Audio streaming handler: STT → Ollama LLM → edge-tts TTS."""
+
+     def __init__(self) -> None:
+         """Initialize the handler."""
+         super().__init__(
+             expected_layout="mono",
+             output_sample_rate=HANDLER_SAMPLE_RATE,
+             input_sample_rate=HANDLER_SAMPLE_RATE,
+         )
+
+         # Output queue
+         self.output_queue: asyncio.Queue[Tuple[int, NDArray[np.int16]] | AdditionalOutputs] = asyncio.Queue()
+
+         # Clients (initialized in start_up)
+         self.ollama_client: OllamaAsyncClient | None = None
+         self.whisper_model: Any = None
+
+         # Conversation history
+         self._messages: list[dict[str, Any]] = []
+
+         # Audio buffering for VAD
+         self._audio_buffer: list[NDArray[np.int16]] = []
+         self._is_speaking: bool = False
+         self._silence_frame_count: int = 0
+         self._speech_frame_count: int = 0
+
+         # TTS voice
+         self._tts_voice: str = TTS_VOICE
+
+         # Lifecycle
+         self._shutdown_requested: bool = False
+
+     def copy(self) -> "ConversationHandler":
+         """Create a copy of this handler."""
+         return ConversationHandler()
+
+     # ------------------------------------------------------------------ #
+     # Startup
+     # ------------------------------------------------------------------ #
+
+     async def start_up(self) -> None:
+         """Initialize STT model and Ollama client."""
+         # 1. Ollama client
+         self.ollama_client = OllamaAsyncClient(host=OLLAMA_BASE_URL)
+         try:
+             await self.ollama_client.list()
+             logger.info("Connected to Ollama at %s", OLLAMA_BASE_URL)
+         except Exception as e:
+             logger.error("Cannot reach Ollama at %s: %s", OLLAMA_BASE_URL, e)
+             logger.warning("Proceeding — requests will fail until Ollama is available.")
+
+         # 2. faster-whisper STT
+         try:
+             from faster_whisper import WhisperModel
+
+             self.whisper_model = WhisperModel(
+                 STT_MODEL,
+                 device="auto",
+                 compute_type="int8",
+             )
+             logger.info("Loaded faster-whisper model: %s", STT_MODEL)
+         except Exception as e:
+             logger.error("Failed to load STT model '%s': %s", STT_MODEL, e)
+
+         # 3. System prompt
+         self._messages = [{"role": "system", "content": DEFAULT_PROMPT}]
+
+         logger.info(
+             "Handler ready — model=%s stt=%s tts_voice=%s",
+             MODEL_NAME,
+             STT_MODEL,
+             self._tts_voice,
+         )
+
+         # Keep alive
+         while not self._shutdown_requested:
+             await asyncio.sleep(0.1)
+
+     # ------------------------------------------------------------------ #
+     # Audio receive → VAD → STT → LLM → TTS
+     # ------------------------------------------------------------------ #
+
+     async def receive(self, frame: Tuple[int, NDArray[np.int16]]) -> None:
+         """Receive audio from mic, run VAD, kick off pipeline on speech end."""
+         if self._shutdown_requested or self.whisper_model is None:
+             return
+
+         input_sample_rate, audio_frame = frame
+
+         # Reshape to 1-D mono
+         if audio_frame.ndim == 2:
+             if audio_frame.shape[1] > audio_frame.shape[0]:
+                 audio_frame = audio_frame.T
+             # Take the first channel; reshape(-1) also flattens the (N, 1) case
+             audio_frame = audio_frame[:, 0] if audio_frame.shape[1] > 1 else audio_frame.reshape(-1)
+
+         # Resample to handler rate
+         if input_sample_rate != HANDLER_SAMPLE_RATE:
+             audio_frame = resample(
+                 audio_frame, int(len(audio_frame) * HANDLER_SAMPLE_RATE / input_sample_rate)
+             )
+
+         audio_frame = audio_to_int16(audio_frame)
+
+         # Energy-based VAD
+         rms = float(np.sqrt(np.mean(audio_frame.astype(np.float32) ** 2)))
+         frame_duration = len(audio_frame) / HANDLER_SAMPLE_RATE
+
+         if rms > SILENCE_RMS_THRESHOLD:
+             if not self._is_speaking:
+                 self._is_speaking = True
+                 self._speech_frame_count = 0
+                 logger.debug("Speech started (RMS=%.0f)", rms)
+             self._silence_frame_count = 0
+             self._speech_frame_count += 1
+             self._audio_buffer.append(audio_frame)
+         else:
+             if self._is_speaking:
+                 self._silence_frame_count += 1
+                 self._audio_buffer.append(audio_frame)
+
+                 silence_duration = self._silence_frame_count * frame_duration
+                 if silence_duration >= SILENCE_DURATION_S:
+                     speech_duration = self._speech_frame_count * frame_duration
+
+                     if speech_duration >= MIN_SPEECH_DURATION_S:
+                         logger.debug("Speech ended (%.1fs)", speech_duration)
+                         full_audio = np.concatenate(self._audio_buffer)
+                         self._audio_buffer = []
+                         self._is_speaking = False
+                         self._silence_frame_count = 0
+                         self._speech_frame_count = 0
+                         asyncio.create_task(self._process_speech(full_audio))
+                     else:
+                         self._audio_buffer = []
+                         self._is_speaking = False
+                         self._silence_frame_count = 0
+                         self._speech_frame_count = 0
+
+     # ------------------------------------------------------------------ #
+     # Speech processing pipeline
+     # ------------------------------------------------------------------ #
+
+     async def _process_speech(self, audio_data: NDArray[np.int16]) -> None:
+         """Full pipeline: STT → LLM → TTS."""
+         try:
+             # 1. Speech-to-text
+             text = await self._transcribe(audio_data)
+             if not text:
+                 return
+
+             logger.info("User: %s", text)
+             await self.output_queue.put(AdditionalOutputs({"role": "user", "content": text}))
+
+             # 2. LLM response
+             self._messages.append({"role": "user", "content": text})
+             response_text = await self._chat()
+
+             if response_text:
+                 logger.info("Assistant: %s", response_text)
+                 await self.output_queue.put(
+                     AdditionalOutputs({"role": "assistant", "content": response_text})
+                 )
+
+                 # 3. Text-to-speech
+                 await self._synthesize_speech(response_text)
+
+         except Exception as e:
+             logger.error("Speech processing error: %s", e)
+             await self.output_queue.put(
+                 AdditionalOutputs({"role": "assistant", "content": f"[error] {e}"})
+             )
+
+     async def _transcribe(self, audio_data: NDArray[np.int16]) -> str:
+         """Run faster-whisper STT on raw PCM audio."""
+         float_audio = audio_data.astype(np.float32) / 32768.0
+         whisper_audio = resample(
+             float_audio,
+             int(len(float_audio) * WHISPER_SAMPLE_RATE / HANDLER_SAMPLE_RATE),
+         ).astype(np.float32)
+
+         loop = asyncio.get_running_loop()
+         segments, _info = await loop.run_in_executor(
+             None,
+             lambda: self.whisper_model.transcribe(whisper_audio, beam_size=5),
+         )
+
+         text_parts: list[str] = []
+         for seg in segments:
+             text_parts.append(seg.text)
+         return " ".join(text_parts).strip()
+
+     async def _chat(self) -> str:
+         """Send conversation to Ollama and return response text."""
+         if self.ollama_client is None:
+             return "Ollama client not initialized."
+
+         try:
+             response = await self.ollama_client.chat(
+                 model=MODEL_NAME,
+                 messages=self._messages,
+             )
+
+             response_text = response["message"].get("content", "")
+             if response_text:
+                 self._messages.append({"role": "assistant", "content": response_text})
+             return response_text
+
+         except Exception as e:
+             logger.error("Ollama chat error: %s", e)
+             return f"Sorry, I couldn't process that. Error: {e}"
+
+     # ------------------------------------------------------------------ #
+     # Text-to-speech
+     # ------------------------------------------------------------------ #
+
+     async def _synthesize_speech(self, text: str) -> None:
+         """Convert text to speech via edge-tts and queue audio output."""
+         if not text.strip():
+             return
+         try:
+             communicate = edge_tts.Communicate(text, self._tts_voice)
+
+             mp3_chunks: list[bytes] = []
+             async for chunk in communicate.stream():
+                 if chunk["type"] == "audio":
+                     mp3_chunks.append(chunk["data"])
+
+             if not mp3_chunks:
+                 return
+
+             mp3_data = b"".join(mp3_chunks)
+
+             # Decode MP3 → raw PCM
+             decoded = miniaudio.decode(
+                 mp3_data,
+                 output_format=miniaudio.SampleFormat.SIGNED16,
+                 nchannels=1,
+                 sample_rate=HANDLER_SAMPLE_RATE,
+             )
+             samples = np.frombuffer(decoded.samples, dtype=np.int16)
+
+             # Stream in ~100ms chunks
+             chunk_size = HANDLER_SAMPLE_RATE // 10
+             for i in range(0, len(samples), chunk_size):
+                 audio_chunk = samples[i : i + chunk_size]
+                 await self.output_queue.put(
+                     (HANDLER_SAMPLE_RATE, audio_chunk.reshape(1, -1))
+                 )
+
+         except Exception as e:
+             logger.error("TTS synthesis error: %s", e)
+
+     # ------------------------------------------------------------------ #
+     # Emit (speaker output)
+     # ------------------------------------------------------------------ #
+
+     async def emit(self) -> Tuple[int, NDArray[np.int16]] | AdditionalOutputs | None:
+         """Emit next audio frame or chat update."""
+         return await wait_for_item(self.output_queue)
+
+     # ------------------------------------------------------------------ #
+     # Personality management
+     # ------------------------------------------------------------------ #
+
+     async def apply_personality(self, name: str) -> str:
+         """Apply a personality by name, resetting conversation."""
+         prompt = PERSONALITIES.get(name, DEFAULT_PROMPT)
+         self._messages = [{"role": "system", "content": prompt}]
+         logger.info("Applied personality: %s", name)
+         return f"✅ Applied personality: {name}"
+
+     def set_voice(self, voice: str) -> str:
+         """Change TTS voice."""
+         self._tts_voice = voice
+         logger.info("Changed TTS voice to: %s", voice)
+         return f"✅ Voice set to: {voice}"
+
+     # ------------------------------------------------------------------ #
+     # Shutdown
+     # ------------------------------------------------------------------ #
+
+     async def shutdown(self) -> None:
+         """Shutdown the handler."""
+         self._shutdown_requested = True
+         while not self.output_queue.empty():
+             try:
+                 self.output_queue.get_nowait()
+             except asyncio.QueueEmpty:
+                 break
+
+
+ # ---------------------------------------------------------------------------
+ # Chatbot update helper
+ # ---------------------------------------------------------------------------
+ def update_chatbot(chatbot, response):
+     """Update the chatbot with AdditionalOutputs."""
+     chatbot.append(response)
+     return chatbot
+
+
+ # ---------------------------------------------------------------------------
+ # Build Gradio UI
+ # ---------------------------------------------------------------------------
+ def create_app():
+     """Create and return the Gradio app."""
+
+     handler = ConversationHandler()
+
+     chatbot = gr.Chatbot(
+         type="messages",
+         label="Conversation",
+         height=400,
+     )
+
+     # Personality dropdown
+     personality_dropdown = gr.Dropdown(
+         label="🎭 Personality",
+         choices=list(PERSONALITIES.keys()),
+         value="Default (Reachy Mini)",
+     )
+
+     # Voice dropdown
+     voice_dropdown = gr.Dropdown(
+         label="🎤 TTS Voice",
+         choices=TTS_VOICES,
+         value=TTS_VOICE,
+     )
+
+     # Status display
+     status_md = gr.Markdown(value="", label="Status")
+
+     stream = Stream(
+         handler=handler,
+         mode="send-receive",
+         modality="audio",
+         additional_inputs=[
+             chatbot,
+             personality_dropdown,
+             voice_dropdown,
+             status_md,
+         ],
+         additional_outputs=[chatbot],
+         additional_outputs_handler=update_chatbot,
+         ui_args={"title": "🤖 Talk with Reachy Mini"},
+     )
+
+     # Wire personality and voice events
+     with stream.ui:
+         async def _apply_personality(selected: str) -> str:
+             result = await handler.apply_personality(selected)
+             return result
+
+         def _set_voice(selected: str) -> str:
+             return handler.set_voice(selected)
+
+         personality_dropdown.change(
+             fn=_apply_personality,
+             inputs=[personality_dropdown],
+             outputs=[status_md],
+         )
+
+         voice_dropdown.change(
+             fn=_set_voice,
+             inputs=[voice_dropdown],
+             outputs=[status_md],
+         )
+
+     return stream
+
+
+ # ---------------------------------------------------------------------------
+ # Entrypoint
+ # ---------------------------------------------------------------------------
+ if __name__ == "__main__":
+     logger.info("Starting Reachy Mini Open Conversation")
+     logger.info("Config: OLLAMA=%s MODEL=%s STT=%s TTS=%s", OLLAMA_BASE_URL, MODEL_NAME, STT_MODEL, TTS_VOICE)
+
+     stream = create_app()
+     stream.ui.launch(server_name="0.0.0.0", server_port=7860)
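The energy-based VAD in `receive` can be checked in isolation; a minimal sketch of the same RMS test (threshold and sample rate copied from the constants above; the tone and silence frames are made up for illustration):

```python
import numpy as np

SILENCE_RMS_THRESHOLD = 500.0   # same threshold as app.py
HANDLER_SAMPLE_RATE = 24000     # same handler rate as app.py

def frame_is_speech(frame: np.ndarray) -> bool:
    """Same energy test used in ConversationHandler.receive: RMS over int16 samples."""
    rms = float(np.sqrt(np.mean(frame.astype(np.float32) ** 2)))
    return rms > SILENCE_RMS_THRESHOLD

# One 100 ms frame of a loud 440 Hz tone vs. pure silence
t = np.arange(HANDLER_SAMPLE_RATE // 10) / HANDLER_SAMPLE_RATE
tone = (8000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)   # RMS ≈ 5657
quiet = np.zeros(HANDLER_SAMPLE_RATE // 10, dtype=np.int16)    # RMS = 0

print(frame_is_speech(tone), frame_is_speech(quiet))  # True False
```

With these constants, a frame only counts as speech once its RMS exceeds 500; the app then waits for `SILENCE_DURATION_S` of sub-threshold frames before transcribing.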
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ gradio==5.50.1.dev1
+ fastrtc>=0.0.34
+ aiortc>=1.13.0
+ ollama>=0.4
+ faster-whisper>=1.0
+ edge-tts>=7.0
+ miniaudio>=1.60
+ scipy
+ numpy
+ opencv-python-headless>=4.12.0.88
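`scipy` is pulled in for sample-rate conversion: the handler records at 24 kHz while faster-whisper expects 16 kHz audio. A minimal sketch of the conversion done in `_transcribe` (the one-second silent buffer is just illustrative input):

```python
import numpy as np
from scipy.signal import resample

HANDLER_SAMPLE_RATE = 24000   # rate the handler records at
WHISPER_SAMPLE_RATE = 16000   # rate faster-whisper expects

one_second = np.zeros(HANDLER_SAMPLE_RATE, dtype=np.int16)     # 1 s of int16 PCM
float_audio = one_second.astype(np.float32) / 32768.0          # PCM → [-1, 1] floats
resampled = resample(
    float_audio,
    int(len(float_audio) * WHISPER_SAMPLE_RATE / HANDLER_SAMPLE_RATE),
)

print(len(resampled))  # 16000
```

`scipy.signal.resample` is FFT-based, so the target length is given explicitly rather than as a rate ratio.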