marcosremar2 Claude Opus 4.5 committed on
Commit 53b54ed · 1 Parent(s): 8d0b08c

Initial commit: Avatar integrator server with performance statistics


- Web interface for TTS + Wav2Lip avatar integration
- Real-time MJPEG video stream from Wav2Lip
- WebSocket proxy between client and Wav2Lip
- Performance statistics panel with timing metrics:
  - Round-trip time measurement
  - TTS generation time
  - Wav2Lip processing time
  - First frame latency
  - Request history with averages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
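
The statistics panel listed in the commit message keeps the most recent requests and averages each timing metric over them. A minimal Python sketch of that bookkeeping follows. It reuses the round-trip thresholds from the page's JavaScript (fast up to 3 s, medium up to 6 s), but the names `speed_class` and `RollingStats` are hypothetical and do not appear in the commit.

```python
from collections import deque
from typing import Optional


def speed_class(seconds: Optional[float], fast: float, medium: float) -> str:
    """Classify a duration as 'fast', 'medium' or 'slow' ('' when missing)."""
    if seconds is None:
        return ""
    if seconds <= fast:
        return "fast"
    if seconds <= medium:
        return "medium"
    return "slow"


class RollingStats:
    """Keep the newest `max_history` entries and average each metric over them."""

    def __init__(self, max_history: int = 10) -> None:
        # A deque with maxlen drops the oldest entry once it is full.
        self.history: deque = deque(maxlen=max_history)

    def add(self, **metrics: float) -> None:
        # Newest entry goes first, like the history list in the page.
        self.history.appendleft(dict(metrics))

    def average(self, key: str) -> Optional[float]:
        # Entries that lack the metric are skipped rather than counted as zero.
        values = [entry[key] for entry in self.history if key in entry]
        return sum(values) / len(values) if values else None


stats = RollingStats(max_history=3)
for round_trip in (1.0, 2.0, 3.0, 6.0):
    stats.add(roundTrip=round_trip)
print(stats.average("roundTrip"), speed_class(stats.average("roundTrip"), fast=3, medium=6))
```

Skipping entries that lack a metric matters here because the page records round-trip times and server-side timings as separate history entries.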

Files changed (2)
  1. server.py +634 -0
  2. startup.sh +116 -0
server.py ADDED
@@ -0,0 +1,634 @@
+ """
+ Integrator Server - Avatar with TTS
+ Port: 8080
+ Connects to:
+ - Orpheus TTS: localhost:8880
+ - Wav2Lip: localhost:8085
+ """
+ import asyncio
+ import json
+ import os
+ from aiohttp import web, ClientSession
+ import aiohttp
+
+ PORT = 8080
+ WAV2LIP_URL = "http://localhost:8085"
+ TTS_URL = "http://localhost:8880"
+
+ # HTML for the interface
+ HTML_TEMPLATE = """
+ <!DOCTYPE html>
+ <html lang="pt-BR">
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>Avatar Integrado</title>
+     <style>
+         * { margin: 0; padding: 0; box-sizing: border-box; }
+         body {
+             font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
+             background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%);
+             min-height: 100vh;
+             display: flex;
+             flex-direction: column;
+             align-items: center;
+             padding: 20px;
+             color: #fff;
+         }
+         h1 {
+             margin-bottom: 20px;
+             font-weight: 300;
+             font-size: 1.8rem;
+         }
+         .container {
+             display: flex;
+             gap: 20px;
+             flex-wrap: wrap;
+             justify-content: center;
+             max-width: 1200px;
+         }
+         .video-container {
+             background: #000;
+             border-radius: 12px;
+             overflow: hidden;
+             box-shadow: 0 10px 40px rgba(0,0,0,0.5);
+         }
+         #avatar-video {
+             width: 512px;
+             height: 512px;
+             object-fit: cover;
+         }
+         .controls {
+             background: rgba(255,255,255,0.1);
+             backdrop-filter: blur(10px);
+             border-radius: 12px;
+             padding: 20px;
+             width: 350px;
+         }
+         .control-group {
+             margin-bottom: 15px;
+         }
+         label {
+             display: block;
+             margin-bottom: 5px;
+             font-size: 0.9rem;
+             color: #aaa;
+         }
+         input, select, textarea {
+             width: 100%;
+             padding: 10px;
+             border: none;
+             border-radius: 8px;
+             background: rgba(255,255,255,0.1);
+             color: #fff;
+             font-size: 1rem;
+         }
+         textarea {
+             min-height: 100px;
+             resize: vertical;
+         }
+         button {
+             width: 100%;
+             padding: 12px;
+             border: none;
+             border-radius: 8px;
+             background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+             color: #fff;
+             font-size: 1rem;
+             cursor: pointer;
+             transition: transform 0.2s, box-shadow 0.2s;
+         }
+         button:hover {
+             transform: translateY(-2px);
+             box-shadow: 0 5px 20px rgba(102, 126, 234, 0.4);
+         }
+         button:disabled {
+             opacity: 0.5;
+             cursor: not-allowed;
+             transform: none;
+         }
+         .status {
+             margin-top: 15px;
+             padding: 10px;
+             border-radius: 8px;
+             background: rgba(0,0,0,0.3);
+             font-size: 0.85rem;
+         }
+         .status.connected { border-left: 3px solid #4caf50; }
+         .status.disconnected { border-left: 3px solid #f44336; }
+         .status.speaking { border-left: 3px solid #2196f3; }
+         #audio-player { display: none; }
+
+         /* Statistics Panel */
+         .stats-panel {
+             background: rgba(255,255,255,0.1);
+             backdrop-filter: blur(10px);
+             border-radius: 12px;
+             padding: 20px;
+             width: 350px;
+             max-height: 500px;
+             overflow-y: auto;
+         }
+         .stats-panel h2 {
+             font-size: 1.1rem;
+             margin-bottom: 15px;
+             color: #aaa;
+             font-weight: 400;
+         }
+         .stat-row {
+             display: flex;
+             justify-content: space-between;
+             padding: 8px 0;
+             border-bottom: 1px solid rgba(255,255,255,0.1);
+         }
+         .stat-label {
+             color: #aaa;
+             font-size: 0.85rem;
+         }
+         .stat-value {
+             font-weight: 500;
+             font-size: 0.9rem;
+         }
+         .stat-value.fast { color: #4caf50; }
+         .stat-value.medium { color: #ff9800; }
+         .stat-value.slow { color: #f44336; }
+         .stats-history {
+             margin-top: 15px;
+         }
+         .stats-history h3 {
+             font-size: 0.95rem;
+             margin-bottom: 10px;
+             color: #888;
+         }
+         .history-item {
+             background: rgba(0,0,0,0.3);
+             border-radius: 8px;
+             padding: 10px;
+             margin-bottom: 8px;
+             font-size: 0.8rem;
+         }
+         .history-item .timestamp {
+             color: #666;
+             font-size: 0.75rem;
+         }
+         .history-item .metrics {
+             display: grid;
+             grid-template-columns: 1fr 1fr;
+             gap: 5px;
+             margin-top: 5px;
+         }
+         .avg-stats {
+             background: rgba(102, 126, 234, 0.2);
+             border-radius: 8px;
+             padding: 10px;
+             margin-bottom: 15px;
+         }
+         .avg-stats h3 {
+             font-size: 0.9rem;
+             margin-bottom: 8px;
+             color: #667eea;
+         }
+     </style>
+ </head>
+ <body>
+     <h1>Avatar Integrado</h1>
+
+     <div class="container">
+         <div class="video-container">
+             <img id="avatar-video" alt="Avatar">
+         </div>
+
+         <div class="controls">
+             <div class="control-group">
+                 <label>Voz</label>
+                 <select id="voice-select">
+                     <option value="tara">Tara (Feminina)</option>
+                     <option value="leah">Leah (Feminina)</option>
+                     <option value="jess">Jess (Feminina)</option>
+                     <option value="mia">Mia (Feminina)</option>
+                     <option value="leo">Leo (Masculina)</option>
+                     <option value="dan">Dan (Masculina)</option>
+                     <option value="zac">Zac (Masculina)</option>
+                     <option value="zoe">Zoe (Feminina)</option>
+                 </select>
+             </div>
+
+             <div class="control-group">
+                 <label>Texto para falar</label>
+                 <textarea id="text-input" placeholder="Enter the text here...">Hello! I am a digital avatar with real-time voice synthesis.</textarea>
+             </div>
+
+             <button id="speak-btn" onclick="speak()">Falar</button>
+
+             <div id="status" class="status disconnected">
+                 Conectando...
+             </div>
+         </div>
+
+         <!-- Statistics Panel -->
+         <div class="stats-panel">
+             <h2>Performance Statistics</h2>
+
+             <div class="avg-stats">
+                 <h3>Averages (last 10)</h3>
+                 <div class="stat-row">
+                     <span class="stat-label">Avg Round-Trip</span>
+                     <span id="avg-roundtrip" class="stat-value">--</span>
+                 </div>
+                 <div class="stat-row">
+                     <span class="stat-label">Avg TTS Time</span>
+                     <span id="avg-tts" class="stat-value">--</span>
+                 </div>
+                 <div class="stat-row">
+                     <span class="stat-label">Avg Wav2Lip Time</span>
+                     <span id="avg-wav2lip" class="stat-value">--</span>
+                 </div>
+             </div>
+
+             <div class="stat-row">
+                 <span class="stat-label">Last Round-Trip</span>
+                 <span id="last-roundtrip" class="stat-value">--</span>
+             </div>
+             <div class="stat-row">
+                 <span class="stat-label">Last TTS Time</span>
+                 <span id="last-tts" class="stat-value">--</span>
+             </div>
+             <div class="stat-row">
+                 <span class="stat-label">Last Wav2Lip Time</span>
+                 <span id="last-wav2lip" class="stat-value">--</span>
+             </div>
+             <div class="stat-row">
+                 <span class="stat-label">First Frame</span>
+                 <span id="last-firstframe" class="stat-value">--</span>
+             </div>
+             <div class="stat-row">
+                 <span class="stat-label">Audio Duration</span>
+                 <span id="last-audioduration" class="stat-value">--</span>
+             </div>
+             <div class="stat-row">
+                 <span class="stat-label">Text Length</span>
+                 <span id="last-textlen" class="stat-value">--</span>
+             </div>
+
+             <div class="stats-history">
+                 <h3>Request History</h3>
+                 <div id="history-container"></div>
+             </div>
+         </div>
+     </div>
+
+     <audio id="audio-player"></audio>
+
+     <script>
+         const statusEl = document.getElementById('status');
+         const speakBtn = document.getElementById('speak-btn');
+         const audioPlayer = document.getElementById('audio-player');
+         let ws = null;
+         let isConnected = false;
+
+         // Statistics tracking
+         let requestStartTime = null;
+         let statsHistory = [];
+         const MAX_HISTORY = 10;
+
+         // Connect to the Wav2Lip WebSocket
+         function connectWebSocket() {
+             ws = new WebSocket('ws://' + window.location.host + '/ws/avatar');
+
+             ws.onopen = () => {
+                 isConnected = true;
+                 statusEl.textContent = 'Conectado ao servidor';
+                 statusEl.className = 'status connected';
+             };
+
+             ws.onmessage = (event) => {
+                 const data = JSON.parse(event.data);
+                 if (data.status === 'speaking') {
+                     statusEl.textContent = 'Falando...';
+                     statusEl.className = 'status speaking';
+                     speakBtn.disabled = true;
+                 } else if (data.status === 'idle') {
+                     statusEl.textContent = 'Pronto';
+                     statusEl.className = 'status connected';
+                     speakBtn.disabled = false;
+                     // Calculate round-trip time when idle
+                     if (requestStartTime) {
+                         const roundTrip = (performance.now() - requestStartTime) / 1000;
+                         updateStats({roundTrip});
+                         requestStartTime = null;
+                     }
+                 } else if (data.status === 'error') {
+                     statusEl.textContent = 'Erro: ' + data.message;
+                     statusEl.className = 'status disconnected';
+                     speakBtn.disabled = false;
+                     requestStartTime = null;
+                 } else if (data.audio) {
+                     // Received audio to play
+                     const audioBlob = base64ToBlob(data.audio, 'audio/wav');
+                     audioPlayer.src = URL.createObjectURL(audioBlob);
+                     audioPlayer.play();
+                 }
+                 // Process timing stats from server
+                 if (data.stats) {
+                     updateStats(data.stats);
+                 }
+                 // Also check for individual timing fields
+                 if (data.tts_time !== undefined || data.wav2lip_time !== undefined || data.first_frame_time !== undefined) {
+                     updateStats({
+                         tts_time: data.tts_time,
+                         wav2lip_time: data.wav2lip_time,
+                         first_frame_time: data.first_frame_time,
+                         audio_duration: data.audio_duration,
+                         text_length: data.text_length
+                     });
+                 }
+             };
+
+             ws.onclose = () => {
+                 isConnected = false;
+                 statusEl.textContent = 'Desconectado. Reconectando...';
+                 statusEl.className = 'status disconnected';
+                 setTimeout(connectWebSocket, 2000);
+             };
+
+             ws.onerror = (error) => {
+                 console.error('WebSocket error:', error);
+             };
+         }
+
+         function base64ToBlob(base64, mimeType) {
+             const byteCharacters = atob(base64);
+             const byteNumbers = new Array(byteCharacters.length);
+             for (let i = 0; i < byteCharacters.length; i++) {
+                 byteNumbers[i] = byteCharacters.charCodeAt(i);
+             }
+             const byteArray = new Uint8Array(byteNumbers);
+             return new Blob([byteArray], { type: mimeType });
+         }
+
+         async function speak() {
+             const text = document.getElementById('text-input').value.trim();
+             const voice = document.getElementById('voice-select').value;
+
+             if (!text) {
+                 alert('Digite um texto para falar');
+                 return;
+             }
+
+             if (!isConnected) {
+                 alert('Não conectado ao servidor');
+                 return;
+             }
+
+             speakBtn.disabled = true;
+             statusEl.textContent = 'Gerando áudio...';
+             statusEl.className = 'status speaking';
+
+             // Record start time for round-trip measurement
+             requestStartTime = performance.now();
+
+             // Send the speak command
+             ws.send(JSON.stringify({
+                 action: 'speak',
+                 text: text,
+                 voice: voice,
+                 text_length: text.length
+             }));
+         }
+
+         // Statistics functions
+         function formatTime(seconds) {
+             if (seconds === undefined || seconds === null) return '--';
+             return seconds.toFixed(2) + 's';
+         }
+
+         function getSpeedClass(seconds, thresholds) {
+             if (seconds === undefined || seconds === null) return '';
+             if (seconds <= thresholds.fast) return 'fast';
+             if (seconds <= thresholds.medium) return 'medium';
+             return 'slow';
+         }
+
+         function updateStats(newStats) {
+             const now = new Date();
+             const entry = {
+                 timestamp: now,
+                 ...newStats
+             };
+
+             // Update last stats display
+             if (newStats.roundTrip !== undefined) {
+                 const el = document.getElementById('last-roundtrip');
+                 el.textContent = formatTime(newStats.roundTrip);
+                 el.className = 'stat-value ' + getSpeedClass(newStats.roundTrip, {fast: 3, medium: 6});
+             }
+             if (newStats.tts_time !== undefined) {
+                 const el = document.getElementById('last-tts');
+                 el.textContent = formatTime(newStats.tts_time);
+                 el.className = 'stat-value ' + getSpeedClass(newStats.tts_time, {fast: 2, medium: 4});
+             }
+             if (newStats.wav2lip_time !== undefined) {
+                 const el = document.getElementById('last-wav2lip');
+                 el.textContent = formatTime(newStats.wav2lip_time);
+                 el.className = 'stat-value ' + getSpeedClass(newStats.wav2lip_time, {fast: 1, medium: 2});
+             }
+             if (newStats.first_frame_time !== undefined) {
+                 const el = document.getElementById('last-firstframe');
+                 el.textContent = formatTime(newStats.first_frame_time);
+                 el.className = 'stat-value ' + getSpeedClass(newStats.first_frame_time, {fast: 3, medium: 5});
+             }
+             if (newStats.audio_duration !== undefined) {
+                 document.getElementById('last-audioduration').textContent = formatTime(newStats.audio_duration);
+             }
+             if (newStats.text_length !== undefined) {
+                 document.getElementById('last-textlen').textContent = newStats.text_length + ' chars';
+             }
+
+             // Only add to history if we have timing data
+             if (newStats.tts_time !== undefined || newStats.roundTrip !== undefined) {
+                 statsHistory.unshift(entry);
+                 if (statsHistory.length > MAX_HISTORY) {
+                     statsHistory.pop();
+                 }
+                 updateAverages();
+                 updateHistory();
+             }
+         }
+
+         function updateAverages() {
+             const validRoundTrips = statsHistory.filter(s => s.roundTrip !== undefined).map(s => s.roundTrip);
+             const validTts = statsHistory.filter(s => s.tts_time !== undefined).map(s => s.tts_time);
+             const validWav2lip = statsHistory.filter(s => s.wav2lip_time !== undefined).map(s => s.wav2lip_time);
+
+             if (validRoundTrips.length > 0) {
+                 const avg = validRoundTrips.reduce((a, b) => a + b, 0) / validRoundTrips.length;
+                 const el = document.getElementById('avg-roundtrip');
+                 el.textContent = formatTime(avg);
+                 el.className = 'stat-value ' + getSpeedClass(avg, {fast: 3, medium: 6});
+             }
+             if (validTts.length > 0) {
+                 const avg = validTts.reduce((a, b) => a + b, 0) / validTts.length;
+                 const el = document.getElementById('avg-tts');
+                 el.textContent = formatTime(avg);
+                 el.className = 'stat-value ' + getSpeedClass(avg, {fast: 2, medium: 4});
+             }
+             if (validWav2lip.length > 0) {
+                 const avg = validWav2lip.reduce((a, b) => a + b, 0) / validWav2lip.length;
+                 const el = document.getElementById('avg-wav2lip');
+                 el.textContent = formatTime(avg);
+                 el.className = 'stat-value ' + getSpeedClass(avg, {fast: 1, medium: 2});
+             }
+         }
+
+         function updateHistory() {
+             const container = document.getElementById('history-container');
+             container.innerHTML = statsHistory.map((entry, idx) => {
+                 const time = entry.timestamp.toLocaleTimeString();
+                 return `
+                     <div class="history-item">
+                         <div class="timestamp">#${idx + 1} - ${time}</div>
+                         <div class="metrics">
+                             ${entry.roundTrip !== undefined ? `<span>Round-trip: ${formatTime(entry.roundTrip)}</span>` : ''}
+                             ${entry.tts_time !== undefined ? `<span>TTS: ${formatTime(entry.tts_time)}</span>` : ''}
+                             ${entry.wav2lip_time !== undefined ? `<span>Wav2Lip: ${formatTime(entry.wav2lip_time)}</span>` : ''}
+                             ${entry.first_frame_time !== undefined ? `<span>1st Frame: ${formatTime(entry.first_frame_time)}</span>` : ''}
+                         </div>
+                     </div>
+                 `;
+             }).join('');
+         }
+
+         // Start the connection
+         connectWebSocket();
+
+         // Set MJPEG source based on current host (uses port 8085 for Wav2Lip)
+         const avatarImg = document.getElementById('avatar-video');
+         avatarImg.src = 'http://' + window.location.hostname + ':8085/mjpeg';
+     </script>
+ </body>
+ </html>
+ """
+
+ async def index(request):
+     """Serve the main page"""
+     return web.Response(text=HTML_TEMPLATE, content_type='text/html')
+
+ async def proxy_mjpeg(request):
+     """Proxy for the Wav2Lip MJPEG stream"""
+     try:
+         async with ClientSession() as session:
+             async with session.get(f"{WAV2LIP_URL}/mjpeg") as resp:
+                 if resp.status == 200:
+                     response = web.StreamResponse()
+                     response.content_type = resp.content_type
+                     await response.prepare(request)
+
+                     async for chunk in resp.content.iter_any():
+                         await response.write(chunk)
+
+                     return response
+     except Exception as e:
+         print(f"Erro ao obter vídeo: {e}")
+
+     return web.Response(status=503, text="Video not available")
+
+ async def websocket_handler(request):
+     """WebSocket handler that connects to Wav2Lip and TTS"""
+     ws_response = web.WebSocketResponse()
+     await ws_response.prepare(request)
+
+     # Connect to the Wav2Lip WebSocket
+     wav2lip_ws = None
+     try:
+         async with ClientSession() as session:
+             async with session.ws_connect(f"{WAV2LIP_URL}/ws") as wav2lip_ws:
+
+                 async def forward_from_wav2lip():
+                     """Forward messages from Wav2Lip to the client"""
+                     try:
+                         async for msg in wav2lip_ws:
+                             if msg.type == aiohttp.WSMsgType.TEXT:
+                                 await ws_response.send_str(msg.data)
+                             elif msg.type == aiohttp.WSMsgType.BINARY:
+                                 await ws_response.send_bytes(msg.data)
+                             elif msg.type == aiohttp.WSMsgType.ERROR:
+                                 break
+                     except Exception as e:
+                         print(f"Erro ao encaminhar de Wav2Lip: {e}")
+
+                 async def forward_from_client():
+                     """Forward messages from the client to Wav2Lip"""
+                     try:
+                         async for msg in ws_response:
+                             if msg.type == aiohttp.WSMsgType.TEXT:
+                                 data = json.loads(msg.data)
+
+                                 if data.get('action') == 'speak':
+                                     # Send to Wav2Lip, which already integrates with the TTS
+                                     await wav2lip_ws.send_str(json.dumps({
+                                         'action': 'speak',
+                                         'text': data['text'],
+                                         'voice': data.get('voice', 'tara')
+                                     }))
+                                 else:
+                                     await wav2lip_ws.send_str(msg.data)
+
+                             elif msg.type == aiohttp.WSMsgType.ERROR:
+                                 break
+                     except Exception as e:
+                         print(f"Erro ao encaminhar do cliente: {e}")
+
+                 # Run both concurrently
+                 await asyncio.gather(
+                     forward_from_wav2lip(),
+                     forward_from_client()
+                 )
+
+     except Exception as e:
+         print(f"Erro WebSocket: {e}")
+         # The client may already be gone; only report the error if it is still connected
+         if not ws_response.closed:
+             await ws_response.send_json({"status": "error", "message": str(e)})
+
+     return ws_response
+
+ async def health(request):
+     """Health check endpoint"""
+     status = {
+         "status": "ok",
+         "services": {}
+     }
+
+     async with ClientSession() as session:
+         # Check TTS
+         try:
+             async with session.get(f"{TTS_URL}/") as resp:
+                 status["services"]["tts"] = resp.status == 200
+         except Exception:
+             status["services"]["tts"] = False
+
+         # Check Wav2Lip
+         try:
+             async with session.get(f"{WAV2LIP_URL}/") as resp:
+                 status["services"]["wav2lip"] = resp.status == 200
+         except Exception:
+             status["services"]["wav2lip"] = False
+
+     return web.json_response(status)
+
+ def create_app():
+     app = web.Application()
+     app.router.add_get('/', index)
+     app.router.add_get('/mjpeg', proxy_mjpeg)
+     app.router.add_get('/ws/avatar', websocket_handler)
+     app.router.add_get('/health', health)
+     return app
+
+ if __name__ == '__main__':
+     print("=== Servidor Integrador ===")
+     print(f"Porta: {PORT}")
+     print(f"TTS: {TTS_URL}")
+     print(f"Wav2Lip: {WAV2LIP_URL}")
+     print(f"Acesse: http://localhost:{PORT}")
+     print("=" * 30)
+
+     app = create_app()
+     web.run_app(app, host='0.0.0.0', port=PORT)
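
`websocket_handler` runs its two forwarding loops with `asyncio.gather`, which returns only after both loops have finished. If the browser disconnects, the loop reading from Wav2Lip can keep running until that socket closes or a send to the closed client fails. One possible alternative, sketched below with plain `asyncio.Queue` objects standing in for the two WebSockets, stops the remaining direction as soon as either side ends. The names `pump`, `relay`, and `demo`, and the `None` end-of-stream sentinel, are hypothetical and are not part of the commit or of aiohttp.

```python
import asyncio
from typing import List, Optional


async def pump(source: asyncio.Queue, sink: List[str]) -> None:
    """Copy messages from `source` into `sink` until a None sentinel arrives."""
    while True:
        msg: Optional[str] = await source.get()
        if msg is None:  # sentinel: this side closed its connection
            return
        sink.append(msg)


async def relay(client_in: asyncio.Queue, upstream_in: asyncio.Queue,
                to_upstream: List[str], to_client: List[str]) -> None:
    """Forward both directions; stop everything when either direction ends."""
    tasks = [
        asyncio.create_task(pump(client_in, to_upstream)),
        asyncio.create_task(pump(upstream_in, to_client)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the other side is gone, so stop waiting on this one
    # Let the cancellations finish; return_exceptions absorbs CancelledError.
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise any exception from the finished direction


async def demo() -> List[str]:
    client_in: asyncio.Queue = asyncio.Queue()
    upstream_in: asyncio.Queue = asyncio.Queue()
    to_upstream: List[str] = []
    to_client: List[str] = []
    client_in.put_nowait('{"action": "speak", "text": "hi"}')
    client_in.put_nowait(None)  # the client disconnects; upstream stays silent
    await asyncio.wait_for(
        relay(client_in, upstream_in, to_upstream, to_client), timeout=1
    )
    return to_upstream


forwarded = asyncio.run(demo())
```

In the demo the upstream queue never produces a message, so with `asyncio.gather` the relay would wait indefinitely; with `asyncio.wait(..., return_when=asyncio.FIRST_COMPLETED)` it returns once the client side ends.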
startup.sh ADDED
@@ -0,0 +1,116 @@
+ #!/bin/bash
+ # ===========================================
+ # Startup Script - Speech-to-Speech System
+ # ===========================================
+ #
+ # Architecture:
+ # marcosaudio (8880) -> Orpheus TTS
+ # marcosavatar (8085) -> Wav2Lip Lipsync
+ # marcosintegrador (8080) -> Web Interface
+ #
+ # Each service has its own isolated responsibility
+ # ===========================================
+
+ set -e
+
+ echo "==========================================="
+ echo " Sistema Speech-to-Speech - Startup"
+ echo "==========================================="
+
+ # Colors for output
+ RED='\033[0;31m'
+ GREEN='\033[0;32m'
+ YELLOW='\033[1;33m'
+ NC='\033[0m' # No Color
+
+ # Check whether a service is running
+ check_service() {
+     local port=$1
+     local name=$2
+     if curl -s "http://localhost:${port}/" > /dev/null 2>&1; then
+         echo -e "${GREEN}[OK]${NC} $name (porta $port)"
+         return 0
+     else
+         echo -e "${RED}[ERRO]${NC} $name (porta $port)"
+         return 1
+     fi
+ }
+
+ # Wait for a service to come up
+ wait_for_service() {
+     local port=$1
+     local name=$2
+     local max_attempts=30
+     local attempt=0
+
+     echo -n "Aguardando $name..."
+     while [ "$attempt" -lt "$max_attempts" ]; do
+         if curl -s "http://localhost:${port}/" > /dev/null 2>&1; then
+             echo -e " ${GREEN}OK${NC}"
+             return 0
+         fi
+         sleep 1
+         attempt=$((attempt + 1))
+         echo -n "."
+     done
+     echo -e " ${RED}TIMEOUT${NC}"
+     return 1
+ }
+
+ # 1. Check whether Orpheus TTS is running
+ echo ""
+ echo "1. Verificando Orpheus TTS (porta 8880)..."
+ if ! check_service 8880 "Orpheus TTS"; then
+     echo " Iniciando Orpheus TTS..."
+     cd /home/marcosaudio/orpheus-standalone
+     source .venv/bin/activate
+     nohup python3 fastapi_app.py > server.log 2>&1 &
+     wait_for_service 8880 "Orpheus TTS"
+
+     # Warm up the model
+     echo " Fazendo warmup do modelo..."
+     warmup_pids=()
+     for voice in tara leo mia dan; do
+         curl -s -X POST http://localhost:8880/v1/audio/speech \
+             -H "Content-Type: application/json" \
+             -d "{\"input\": \"Teste de aquecimento.\", \"voice\": \"$voice\"}" \
+             -o /dev/null &
+         warmup_pids+=($!)
+     done
+     # Wait only for the warmup requests; a bare `wait` would also block on
+     # the TTS server started in the background above.
+     wait "${warmup_pids[@]}" || true
+     echo -e " ${GREEN}Warmup concluído${NC}"
+ fi
+
+ # 2. Check whether Wav2Lip is running
+ echo ""
+ echo "2. Verificando Wav2Lip (porta 8085)..."
+ if ! check_service 8085 "Wav2Lip"; then
+     echo " Iniciando Wav2Lip..."
+     cd /home/marcosavatar/realtimeWav2lip
+     nohup python3 websocket_server.py > websocket.log 2>&1 &
+     wait_for_service 8085 "Wav2Lip"
+ fi
+
+ # 3. Start the integrator server
+ echo ""
+ echo "3. Iniciando Servidor Integrador (porta 8080)..."
+ cd /home/marcosintegrador/interface
+
+ # Kill any previous process still bound to the port
+ fuser -k 8080/tcp 2>/dev/null || true
+ sleep 1
+
+ nohup python3 server.py > server.log 2>&1 &
+ wait_for_service 8080 "Servidor Integrador"
+
+ # Final status
+ echo ""
+ echo "==========================================="
+ echo " Status dos Serviços"
+ echo "==========================================="
+ check_service 8880 "Orpheus TTS"
+ check_service 8085 "Wav2Lip"
+ check_service 8080 "Interface Web"
+ echo ""
+ echo "==========================================="
+ echo -e " ${GREEN}Sistema iniciado com sucesso!${NC}"
+ echo " Acesse: http://localhost:8080"
+ echo "==========================================="
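
The script's `wait_for_service` function polls a port once per second for up to 30 attempts. The same retry loop can be sketched in Python. Here the probe is passed in as a callable so the loop can be exercised without a running server. The Python names `http_probe` and `wait_for_service` and their parameters are hypothetical, chosen to mirror the shell function rather than taken from the commit.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(port: int, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP on localhost:<port>."""
    try:
        with urllib.request.urlopen(f"http://localhost:{port}/", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status; `curl -s` without -f
        # also treats that as reachable.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, malformed response, and similar.
        return False


def wait_for_service(probe: Callable[[], bool], max_attempts: int = 30,
                     delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` until it succeeds or `max_attempts` probes have failed."""
    for _ in range(max_attempts):
        if probe():
            return True
        sleep(delay)
    return False


# Exercise the loop with a fake probe that succeeds on the third attempt.
outcomes = iter([False, False, True])
came_up = wait_for_service(lambda: next(outcomes), max_attempts=5,
                           sleep=lambda _seconds: None)
```

Against a real service the call would look like `wait_for_service(lambda: http_probe(8880))`. Injecting `sleep` keeps the example fast to run and makes the timeout path easy to check.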