NurseCitizenDeveloper committed on
Commit becb41b · verified · 1 Parent(s): 239a6e0

Upload folder using huggingface_hub
.agent/skills/simboti-translate/SKILL.md ADDED
@@ -0,0 +1,52 @@
+ ---
+ name: simboti-translate
+ description: Specialized skill for SIMBOTI Live multimodal translation. Use this for translating speech, text, images, and video into clinical languages using TranslateGemma.
+ ---
+
+ # SIMBOTI Translation Skill
+
+ This skill guides the AI agent in providing accurate, clinical-grade translation using the SIMBOTI Live agent.
+
+ ## 1. When to use this skill
+ Use this skill whenever:
+ - **Speech Input**: The user provides audio (microphone) needing translation.
+ - **Text Input**: The user provides text needing translation (e.g., "Translate 'Where does it hurt?' to Polish").
+ - **Image Input**: The user provides an image of a document (e.g., medication leaflet, consent form) needing translation.
+ - **Video Input**: The user provides a video file with on-screen text needing extraction and translation.
+
+ ## 2. Supported Languages
+ SIMBOTI supports English plus the top 10 non-English languages in UK clinical settings:
+ | Language | Code |
+ |---|---|
+ | English | en |
+ | Polish | pl |
+ | Romanian | ro |
+ | Punjabi | pa |
+ | Urdu | ur |
+ | Portuguese | pt |
+ | Spanish | es |
+ | Arabic | ar |
+ | Bengali | bn |
+ | Gujarati | gu |
+ | Italian | it |
+
+ ## 3. Translation Workflow
+
+ ### Step 1: Identify Input Type
+ - **Audio**: Use `translate_audio(audio_path, source_lang, target_lang)`
+ - **Text**: Use `translate_text(text, source_lang, target_lang)`
+ - **Image**: Use `translate_image(image_path, source_lang, target_lang)`
+ - **Video**: Use `translate_video(video_path, source_lang, target_lang)`
+
+ ### Step 2: Execute Translation
+ - Call the appropriate method on the `CareBridgeTranslator` (or `SIMBOTIClient`) instance.
+ - The model will process the input and return the translated text.
+
+ ### Step 3: Generate Audio Output (TTS)
+ - If the user requires spoken output, call `speak_text(text, lang_name)`.
+ - This generates an MP3 file that can be played back to the patient.
+
+ ## 4. Best Practices
+ - **Privacy**: All translation happens on-device or via ZeroGPU (no data is sent to third parties).
+ - **Clinical Context**: Remind users this is a communication aid, not a certified medical interpreter.
+ - **Accessibility**: Always offer the "Show Patient" large-text display for the translated output.
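
The Step 1 dispatch described in the workflow above can be sketched as a small helper. This is illustrative only: `pick_method` and the extension table are assumptions for the sketch, not part of the SIMBOTI API; the returned names mirror the methods listed in the skill.

```python
from pathlib import Path

# Hypothetical extension table for routing inputs to translator methods.
AUDIO_EXT = {".wav", ".mp3", ".m4a", ".flac"}
IMAGE_EXT = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXT = {".mp4", ".mov", ".mkv"}

def pick_method(user_input: str) -> str:
    """Return the translator method name for a given input path or raw text."""
    ext = Path(user_input).suffix.lower()
    if ext in AUDIO_EXT:
        return "translate_audio"
    if ext in IMAGE_EXT:
        return "translate_image"
    if ext in VIDEO_EXT:
        return "translate_video"
    return "translate_text"  # plain text falls through
```

In a real agent the input type would come from the UI component (microphone, image upload, etc.) rather than a filename, but the routing logic is the same.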
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ simboti_logo.jpg filter=lfs diff=lfs merge=lfs -text
Dockerfile ADDED
@@ -0,0 +1,24 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements
+ COPY requirements.txt .
+
+ # Install Python dependencies (including spaces)
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Environment variables
+ ENV PYTHONUNBUFFERED=1
+
+ # Run the application
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,12 +1,34 @@
- ---
- title: SIMBOTI Live
- emoji: 🏃
- colorFrom: gray
- colorTo: blue
- sdk: gradio
- sdk_version: 6.3.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: SIMBOTI Live
+ emoji: 🎙️
+ colorFrom: green
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 5.12.0
+ app_file: live_app.py
+ pinned: false
+ ---
+
+ # 🎙️ SIMBOTI Live (Powered by TranslateGemma 4B)
+
+ **SIMBOTI** = **S**peech **I**ntelligent **M**ultimodal **B**ot for **O**utreach **T**ranslation **I**mplementation.
+
+ A privacy-first, multimodal translation agent for clinical care.
+
+ ## 🚀 Features
+ - 💬 **Text-to-Text** translation
+ - 🎙️ **Live Speech** translation (microphone input)
+ - 📄 **Document Scan** (image/photo translation)
+ - 🎥 **Video OCR** (translate text in videos)
+ - 🔊 **Text-to-Speech** audio output for patients
+
+ ## 🌍 Supported Languages
+ Polish, Romanian, Punjabi, Urdu, Portuguese, Spanish, Arabic, Bengali, Gujarati, Italian.
+
+ ## 🚀 Deployment (ZeroGPU)
+ 1. Create a [Hugging Face Space](https://huggingface.co/spaces/new) (SDK: Docker, Hardware: ZeroGPU).
+ 2. Upload all files from this folder.
+ 3. Run! (First start downloads ~8GB model)
+
+ ## 🔒 Privacy
+ All inference runs within the ephemeral ZeroGPU container. No data is logged or stored.
__pycache__/carebridge_client.cpython-314.pyc ADDED
Binary file (8.27 kB)
app.py ADDED
@@ -0,0 +1,240 @@
+ import gradio as gr
+ from carebridge_client import CareBridgeTranslator
+
+ # --- Initialize Client (Lazy) ---
+ translator = None
+
+ def load_translator():
+     global translator
+     if translator is None:
+         translator = CareBridgeTranslator()
+     return translator
+
+ # --- Languages ---
+ LANGUAGES = [
+     "English", "Polish", "Romanian", "Punjabi", "Urdu",
+     "Portuguese", "Spanish", "Arabic", "Bengali", "Gujarati", "Italian"
+ ]
+
+ # --- Minimal CSS (Gemini Dictation Inspired) ---
+ CUSTOM_CSS = """
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap');
+
+ * { font-family: 'Inter', sans-serif !important; }
+
+ .gradio-container {
+     max-width: 900px !important;
+     margin: auto !important;
+     background: #f8fafc !important;
+     min-height: 100vh !important;
+ }
+
+ /* --- Header --- */
+ .app-header {
+     display: flex;
+     align-items: center;
+     justify-content: space-between;
+     padding: 16px 24px;
+     border-bottom: 1px solid #e2e8f0;
+     background: white;
+ }
+ .app-title {
+     font-size: 1.25rem;
+     font-weight: 700;
+     color: #1e293b;
+ }
+ .lang-pills {
+     display: flex;
+     gap: 8px;
+     align-items: center;
+ }
+ .lang-pill {
+     background: #e0e7ff;
+     color: #4338ca;
+     padding: 6px 14px;
+     border-radius: 20px;
+     font-size: 0.85rem;
+     font-weight: 500;
+ }
+ .lang-arrow { color: #94a3b8; font-size: 1.2rem; }
+
+ /* --- Document Area --- */
+ .document-area {
+     background: white;
+     border-radius: 16px;
+     margin: 24px;
+     padding: 32px;
+     min-height: 400px;
+     box-shadow: 0 2px 8px rgba(0,0,0,0.05);
+     border: 1px solid #e2e8f0;
+ }
+ .output-text {
+     font-size: 1.75rem !important;
+     line-height: 1.6 !important;
+     color: #1e293b !important;
+     min-height: 200px !important;
+     text-align: center;
+     padding: 40px 20px;
+ }
+ .placeholder-text {
+     color: #94a3b8;
+     font-style: italic;
+ }
+
+ /* --- Floating Controls --- */
+ .floating-bar {
+     position: fixed;
+     bottom: 24px;
+     left: 50%;
+     transform: translateX(-50%);
+     display: flex;
+     gap: 16px;
+     align-items: center;
+     background: white;
+     padding: 12px 24px;
+     border-radius: 40px;
+     box-shadow: 0 4px 20px rgba(0,0,0,0.15);
+     z-index: 100;
+ }
+ .fab-btn {
+     width: 56px !important;
+     height: 56px !important;
+     border-radius: 50% !important;
+     font-size: 1.5rem !important;
+     display: flex !important;
+     align-items: center !important;
+     justify-content: center !important;
+     border: none !important;
+     cursor: pointer !important;
+     transition: transform 0.2s, box-shadow 0.2s !important;
+ }
+ .fab-btn:hover {
+     transform: scale(1.1) !important;
+     box-shadow: 0 4px 12px rgba(0,0,0,0.2) !important;
+ }
+ .fab-primary {
+     background: linear-gradient(135deg, #4F46E5, #6366f1) !important;
+     color: white !important;
+     width: 72px !important;
+     height: 72px !important;
+ }
+ .fab-secondary {
+     background: #f1f5f9 !important;
+     color: #475569 !important;
+ }
+
+ /* Hide Gradio footer */
+ footer { display: none !important; }
+
+ /* Input modals */
+ .input-modal {
+     background: white;
+     border-radius: 16px;
+     padding: 24px;
+     margin: 0 24px 100px 24px;
+     box-shadow: 0 2px 8px rgba(0,0,0,0.05);
+     border: 1px solid #e2e8f0;
+ }
+ """
+
+ # --- Translation Functions ---
+ def translate_text(text, source_lang, target_lang):
+     if not text:
+         # A `return value` inside a generator is discarded; yield the hint instead.
+         yield "Enter text above to translate..."
+         return
+     yield "⏳ Translating..."
+     t = load_translator()
+     result = t.translate_text(text, source_lang, target_lang)
+     yield result
+
+ def translate_speech(audio, source_lang, target_lang):
+     if audio is None:
+         yield "Record audio using the microphone..."
+         return
+     yield "🎧 Processing speech..."
+     t = load_translator()
+     result = t.translate_audio(audio, source_lang, target_lang)
+     yield result
+
+ def translate_document(image, source_lang, target_lang):
+     if image is None:
+         yield "Upload a document image..."
+         return
+     yield "📄 Scanning document..."
+     t = load_translator()
+     result = t.translate_image(image, source_lang, target_lang)
+     yield result
+
+ def get_tts(text, lang):
+     if not text or text.startswith(("⏳", "🎧", "📄")):
+         return None
+     t = load_translator()
+     return t.speak_text(text, lang)
+
+ # --- App Layout ---
+ with gr.Blocks(css=CUSTOM_CSS, title="SIMBOTI Live") as app:
+
+     # State for current mode
+     mode = gr.State("text")
+
+     # --- Header ---
+     with gr.Row(elem_classes="app-header"):
+         gr.HTML("<span class='app-title'>🌐 SIMBOTI</span>")
+         with gr.Row(elem_classes="lang-pills"):
+             source_lang = gr.Dropdown(LANGUAGES, value="English", show_label=False, container=False, scale=0, min_width=120)
+             gr.HTML("<span class='lang-arrow'>→</span>")
+             target_lang = gr.Dropdown(LANGUAGES, value="Polish", show_label=False, container=False, scale=0, min_width=120)
+
+     # --- Document Output Area ---
+     with gr.Column(elem_classes="document-area"):
+         output_display = gr.Textbox(
+             value="Your translation will appear here...",
+             show_label=False,
+             interactive=False,
+             lines=8,
+             elem_classes="output-text"
+         )
+         audio_output = gr.Audio(label="Listen", autoplay=True, visible=True)
+
+     # --- Input Section (Toggleable) ---
+     with gr.Column(elem_classes="input-modal", visible=True) as text_input_section:
+         text_input = gr.Textbox(label="Type your message", placeholder="e.g., Where does it hurt?", lines=2)
+         text_submit = gr.Button("Translate", variant="primary")
+
+     with gr.Column(elem_classes="input-modal", visible=False) as audio_input_section:
+         audio_input = gr.Audio(sources=["microphone"], type="filepath", label="🎤 Record")
+         audio_submit = gr.Button("Translate Speech", variant="primary")
+
+     with gr.Column(elem_classes="input-modal", visible=False) as doc_input_section:
+         doc_input = gr.Image(type="pil", label="📷 Upload Document")
+         doc_submit = gr.Button("Scan & Translate", variant="primary")
+
+     # --- Mode Switchers ---
+     with gr.Row(elem_classes="floating-bar"):
+         text_mode_btn = gr.Button("⌨️", elem_classes="fab-btn fab-secondary")
+         mic_mode_btn = gr.Button("🎤", elem_classes="fab-btn fab-primary")
+         doc_mode_btn = gr.Button("📎", elem_classes="fab-btn fab-secondary")
+
+     # --- Mode Toggle Logic ---
+     def show_text_mode():
+         return gr.update(visible=True), gr.update(visible=False), gr.update(visible=False)
+     def show_audio_mode():
+         return gr.update(visible=False), gr.update(visible=True), gr.update(visible=False)
+     def show_doc_mode():
+         return gr.update(visible=False), gr.update(visible=False), gr.update(visible=True)
+
+     text_mode_btn.click(show_text_mode, outputs=[text_input_section, audio_input_section, doc_input_section])
+     mic_mode_btn.click(show_audio_mode, outputs=[text_input_section, audio_input_section, doc_input_section])
+     doc_mode_btn.click(show_doc_mode, outputs=[text_input_section, audio_input_section, doc_input_section])
+
+     # --- Translation Triggers ---
+     text_submit.click(translate_text, inputs=[text_input, source_lang, target_lang], outputs=[output_display]).then(
+         get_tts, inputs=[output_display, target_lang], outputs=[audio_output]
+     )
+     audio_submit.click(translate_speech, inputs=[audio_input, source_lang, target_lang], outputs=[output_display]).then(
+         get_tts, inputs=[output_display, target_lang], outputs=[audio_output]
+     )
+     doc_submit.click(translate_document, inputs=[doc_input, source_lang, target_lang], outputs=[output_display]).then(
+         get_tts, inputs=[output_display, target_lang], outputs=[audio_output]
+     )
+
+ # Launch
+ if __name__ == "__main__":
+     app.launch(ssr_mode=False)
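
`load_translator()` in `app.py` is the usual lazy-singleton pattern; under Gradio's default single-worker queue the unlocked check is normally safe, but a double-checked lock avoids loading the 4B model twice if concurrency is raised. A sketch under that assumption (the `get_instance`/`factory` names are illustrative, not part of the app):

```python
import threading

_lock = threading.Lock()
_instance = None

def get_instance(factory):
    """Create the shared object on first call, then reuse it on every later call."""
    global _instance
    if _instance is None:          # fast path: no lock once created
        with _lock:
            if _instance is None:  # re-check under the lock to avoid a double load
                _instance = factory()
    return _instance
```

`factory` would be `CareBridgeTranslator` in this app; any zero-argument callable works.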
assets.py ADDED
The diff for this file is too large to render.
carebridge_client.py ADDED
@@ -0,0 +1,197 @@
+ import os
+ import torch
+ from transformers import AutoModelForImageTextToText, AutoProcessor
+ from PIL import Image
+ import librosa
+ from gtts import gTTS
+ import tempfile
+
+ # Hugging Face Spaces GPU decorator (optional for local development)
+ try:
+     import spaces
+ except ImportError:
+     # Fallback: no-op decorator for local testing
+     class spaces:
+         @staticmethod
+         def GPU(*args, **kwargs):
+             def decorator(func):
+                 return func
+             return decorator
+
+ class CareBridgeTranslator:
+     def __init__(self, model_id="google/translategemma-4b-it", device=None):
+         """
+         Initialize the CareBridge Translator with lazy loading for ZeroGPU compatibility.
+         """
+         self.model_id = model_id
+         if device is None:
+             self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         else:
+             self.device = device
+
+         self.model = None
+         self.processor = None
+         print("[SIMBOTI] Translator initialized. Model will load on first use.")
+
+         # Top 10 NHS Languages Mapping (ISO 639-1)
+         self.LANG_MAP = {
+             "English": "en",
+             "Polish": "pl",
+             "Romanian": "ro",
+             "Punjabi": "pa",
+             "Urdu": "ur",
+             "Portuguese": "pt",
+             "Spanish": "es",
+             "Arabic": "ar",
+             "Bengali": "bn",
+             "Gujarati": "gu",
+             "Italian": "it"
+         }
+
+     def _load_model(self):
+         if self.model is None:
+             print(f"[SIMBOTI] Loading model {self.model_id}...")
+             self.processor = AutoProcessor.from_pretrained(self.model_id)
+             self.model = AutoModelForImageTextToText.from_pretrained(
+                 self.model_id,
+                 device_map=self.device,
+                 torch_dtype=torch.float16 if self.device == "cuda" else torch.float32
+             )
+             print("[SIMBOTI] Model loaded successfully.")
+
+     def translate_text(self, text, source_lang_name, target_lang_name):
+         """
+         Translate text ensuring patient data stays local.
+         """
+         src_code = self.LANG_MAP.get(source_lang_name)
+         tgt_code = self.LANG_MAP.get(target_lang_name)
+
+         if not src_code or not tgt_code:
+             return f"Error: Language not supported. Available: {list(self.LANG_MAP.keys())}"
+
+         message = {
+             "role": "user",
+             "content": [{
+                 "type": "text",
+                 "source_lang_code": src_code,
+                 "target_lang_code": tgt_code,
+                 "text": text
+             }]
+         }
+
+         return self._run_inference([message])
+
+     def translate_image(self, image_path, source_lang_name, target_lang_name):
+         """
+         Extract and translate text from an image (e.g. instruction leaflet).
+         """
+         src_code = self.LANG_MAP.get(source_lang_name)
+         tgt_code = self.LANG_MAP.get(target_lang_name)
+
+         if not src_code or not tgt_code:
+             return "Error: Language not supported."
+
+         # Load image
+         if isinstance(image_path, str):
+             image = Image.open(image_path)
+         else:
+             image = image_path  # Assume PIL object
+
+         message = {
+             "role": "user",
+             "content": [{
+                 "type": "image",
+                 "source_lang_code": src_code,
+                 "target_lang_code": tgt_code,
+                 "image": image
+             }]
+         }
+
+         return self._run_inference([message])
+
+     def translate_audio(self, audio_path, source_lang_name, target_lang_name):
+         """
+         Speech-to-Text Translation using Gemma 3 native audio support.
+         """
+         src_code = self.LANG_MAP.get(source_lang_name)
+         tgt_code = self.LANG_MAP.get(target_lang_name)
+
+         if not src_code or not tgt_code:
+             return "Error: Language not supported."
+
+         # Load audio using librosa (Gemma 3 expects 16kHz usually)
+         audio, sr = librosa.load(audio_path, sr=16000)
+
+         message = {
+             "role": "user",
+             "content": [{
+                 "type": "audio",
+                 "source_lang_code": src_code,
+                 "target_lang_code": tgt_code,
+                 "audio": audio
+             }]
+         }
+
+         return self._run_inference([message])
+
+     def translate_video(self, video_path, source_lang_name, target_lang_name):
+         """
+         Video OCR/Translation using Gemma 3 native video support.
+         """
+         src_code = self.LANG_MAP.get(source_lang_name)
+         tgt_code = self.LANG_MAP.get(target_lang_name)
+
+         if not src_code or not tgt_code:
+             return "Error: Language not supported."
+
+         message = {
+             "role": "user",
+             "content": [{
+                 "type": "video",
+                 "source_lang_code": src_code,
+                 "target_lang_code": tgt_code,
+                 "video": video_path
+             }]
+         }
+
+         return self._run_inference([message])
+
+     def speak_text(self, text, lang_name):
+         """
+         Generate audio from translated text for the patient.
+         """
+         lang_code = self.LANG_MAP.get(lang_name, "en")
+         try:
+             tts = gTTS(text=text, lang=lang_code)
+             temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
+             tts.save(temp_file.name)
+             return temp_file.name
+         except Exception as e:
+             print(f"TTS Error: {e}")
+             return None
+
+     @spaces.GPU()
+     def _run_inference(self, messages):
+         self._load_model()
+
+         inputs = self.processor.apply_chat_template(
+             messages,
+             tokenize=True,
+             add_generation_prompt=True,
+             return_dict=True,
+             return_tensors="pt"
+         ).to(self.device)
+
+         # Generate (Greedy for stability in medical context)
+         with torch.no_grad():
+             outputs = self.model.generate(**inputs, max_new_tokens=512, do_sample=False)
+
+         # Decode response (Skipping input tokens)
+         input_len = inputs["input_ids"].shape[-1]
+         decoded = self.processor.decode(outputs[0][input_len:], skip_special_tokens=True)
+         return decoded.strip()
+
+ # Simple Verification Test if run directly
+ if __name__ == "__main__":
+     translator = CareBridgeTranslator()
+     print("Test 1 (Text):", translator.translate_text("Where does it hurt?", "English", "Polish"))
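
The per-modality methods in `carebridge_client.py` all build the same single-turn message shape before calling `_run_inference`. Pulled out as a standalone helper, that shape can be unit-tested without loading the 4B model (a sketch; `build_text_message` is an illustrative name, not an existing function in the client):

```python
def build_text_message(text: str, src_code: str, tgt_code: str) -> dict:
    """Construct the chat message used by CareBridgeTranslator.translate_text."""
    return {
        "role": "user",
        "content": [{
            "type": "text",                # "image"/"audio"/"video" for other modalities
            "source_lang_code": src_code,  # ISO 639-1, e.g. "en"
            "target_lang_code": tgt_code,  # ISO 639-1, e.g. "pl"
            "text": text,
        }],
    }
```

The image, audio, and video variants differ only in the `type` key and the payload key (`image`, `audio`, `video`).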
generate_assets.py ADDED
@@ -0,0 +1,11 @@
+ import base64
+
+ try:
+     with open("simboti_logo.jpg", "rb") as f:
+         encoded = base64.b64encode(f.read()).decode("utf-8")
+
+     with open("assets.py", "w") as f:
+         f.write(f'LOGO_BASE64 = "{encoded}"\n')
+     print("Successfully created assets.py")
+ except Exception as e:
+     print(f"Error: {e}")
live_app.py ADDED
@@ -0,0 +1,99 @@
+ """
+ SIMBOTI Live - Real-Time WebRTC Translation using FastRTC
+ This app provides live audio translation using the FastRTC library.
+ Simplified version using Echo handler (no VAD dependency).
+ """
+
+ from fastrtc import Stream, AdditionalOutputs
+ import numpy as np
+ import tempfile
+ import wave
+ import os
+
+ # Import the existing translator
+ from carebridge_client import CareBridgeTranslator
+
+ # --- Languages ---
+ LANGUAGES = {
+     "English": "en", "Polish": "pl", "Romanian": "ro", "Punjabi": "pa",
+     "Urdu": "ur", "Portuguese": "pt", "Spanish": "es", "Arabic": "ar",
+     "Bengali": "bn", "Gujarati": "gu", "Italian": "it"
+ }
+
+ # --- Lazy Load Translator ---
+ translator = None
+
+ def get_translator():
+     global translator
+     if translator is None:
+         translator = CareBridgeTranslator()
+     return translator
+
+ # --- Simple Audio Translation Handler ---
+ def translate_audio_chunks(audio: tuple):
+     """
+     Handler for FastRTC Stream.
+     Receives audio chunks, translates when complete, and returns TTS audio.
+
+     Args:
+         audio: Tuple of (sample_rate, audio_data)
+
+     Yields:
+         Translated audio as (sample_rate, audio_data)
+     """
+     sample_rate, audio_data = audio
+
+     # For simplicity, accumulate audio and process when we have enough
+     # In production, use ReplyOnPause with proper VAD
+     if len(audio_data) < sample_rate * 2:  # Less than 2 seconds
+         # Echo the chunk back while accumulating
+         yield (sample_rate, audio_data)
+         return
+
+     # Save to temp WAV file for translation
+     temp_wav = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
+     try:
+         with wave.open(temp_wav.name, 'wb') as wf:
+             wf.setnchannels(1)
+             wf.setsampwidth(2)  # 16-bit
+             wf.setframerate(sample_rate)
+             # Convert to int16
+             int_audio = (audio_data * 32767).astype(np.int16) if audio_data.dtype != np.int16 else audio_data
+             wf.writeframes(int_audio.tobytes())
+
+         # Translate the audio (English -> Polish by default)
+         t = get_translator()
+         translated_text = t.translate_audio(temp_wav.name, "English", "Polish")
+         print(f"[LIVE] Translated: {translated_text}")
+
+         # Generate TTS audio
+         tts_path = t.speak_text(translated_text, "Polish")
+         if tts_path:
+             import librosa
+             tts_audio, tts_sr = librosa.load(tts_path, sr=sample_rate)
+             os.unlink(tts_path)  # Cleanup temp TTS file
+             yield (tts_sr, tts_audio)
+         else:
+             yield (sample_rate, np.zeros(1000, dtype=np.float32))
+
+     except Exception as e:
+         print(f"[LIVE] Error: {e}")
+         yield (sample_rate, np.zeros(1000, dtype=np.float32))
+     finally:
+         if os.path.exists(temp_wav.name):
+             os.unlink(temp_wav.name)  # Cleanup temp WAV file
+
+ # --- FastRTC Stream (Echo mode for simplicity) ---
+ stream = Stream(
+     handler=translate_audio_chunks,
+     modality="audio",
+     mode="send-receive",
+ )
+
+ # Launch with Gradio UI
+ if __name__ == "__main__":
+     print("[SIMBOTI] Starting Live Translation...")
+     print("[SIMBOTI] Default: English -> Polish")
+     print("[SIMBOTI] Simplified mode (no VAD)")
+     print("[SIMBOTI] Open your browser to the URL below:")
+     stream.ui.launch()
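
The float-to-int16 step inside the handler above can be factored out for clarity; note that the inline version scales without clipping, so samples outside [-1, 1] would overflow when cast. A defensive sketch (the `to_int16_pcm` name is illustrative):

```python
import numpy as np

def to_int16_pcm(audio: np.ndarray) -> np.ndarray:
    """Convert float samples to 16-bit PCM, clipping to the valid range first."""
    if audio.dtype == np.int16:
        return audio  # already PCM, pass through unchanged
    return (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
```

The resulting array can be written straight to a WAV frame buffer with `wf.writeframes(to_int16_pcm(audio_data).tobytes())`.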
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ torch>=2.2.0
+ transformers>=4.40.0
+ accelerate>=0.28.0
+ bitsandbytes>=0.43.0
+ sentencepiece
+ protobuf
+ gradio>=5.0.0
+ fastrtc>=0.0.20
+ pillow
+ scipy
+ spaces
+ librosa
+ gTTS
+ silero-vad
run_carebridge.bat ADDED
@@ -0,0 +1,22 @@
+ @echo off
+ echo ==================================================
+ echo   CareBridge AI - Privacy-First Translator
+ echo ==================================================
+ echo.
+
+ echo [1/3] Checking dependencies...
+ pip install -r requirements.txt
+ if %errorlevel% neq 0 (
+     echo Error installing dependencies!
+     pause
+     exit /b %errorlevel%
+ )
+
+ echo.
+ echo [2/3] Starting CareBridge AI...
+ echo NOTE: First run will download the 8GB model. This may take time.
+ echo.
+
+ python app.py
+
+ pause
simboti_logo.jpg ADDED

Git LFS Details

  • SHA256: 7d726e0741e9ab922ef827725d3b9eaf04a01afaa78cd1c3fa428a53ec6051ef
  • Pointer size: 131 Bytes
  • Size of remote file: 163 kB