Implementation of various features, including restricting image downloads from the tool, downloading the labeling in different formats, etc.
- .gitattributes +6 -0
- .gitignore +0 -1
- .streamlit/config.toml +2 -0
- README.md +136 -36
- annotations.db +0 -0
- interface/components/__init__.py +0 -0
- interface/components/downloader.py +102 -0
- interface/components/gallery.py +106 -0
- interface/components/image_protection.py +177 -0
- interface/components/labeler.py +82 -0
- interface/components/recorder.py +186 -0
- interface/components/uploader.py +257 -0
- interface/config.py +38 -0
- interface/database.py +364 -79
- interface/i18n.py +182 -0
- interface/main.py +270 -168
- interface/services/__init__.py +0 -0
- interface/services/auth_service.py +99 -0
- interface/services/export_service.py +202 -0
- interface/services/session_manager.py +157 -0
- interface/services/whisper_service.py +102 -0
- interface/utils.py +22 -58
- requirements.txt +3 -1
.gitattributes
ADDED
@@ -0,0 +1,6 @@
+*.png filter=lfs diff=lfs merge=lfs -text
+*.tif filter=lfs diff=lfs merge=lfs -text
+*.tiff filter=lfs diff=lfs merge=lfs -text
+*.bmp filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
.gitignore
CHANGED
@@ -48,7 +48,6 @@ lightning_logs/
 .streamlit/secrets.toml
 .streamlit/.cache
 .streamlit/cache
-.streamlit/config.toml

 # =====================
 # Logs & Temp Files
.streamlit/config.toml
CHANGED
@@ -2,6 +2,8 @@
 headless = true
 port = 8501
 enableCORS = false
+maxUploadSize = 50
+enableXsrfProtection = true

 [browser]
 gatherUsageStats = false
README.md
CHANGED
@@ -1,59 +1,159 @@
-#
-```bash
-````
-cd /path/to/your/project
-```
-streamlit run interface/main.py
-```
-2. Launch Jupyter:
-jupyter notebook medgemma.ipynb
-```
+# 👁️ OphthalmoCapture
+
+**Ophthalmologic Medical Labeling System** — Web interface for uploading fundus images, labeling them (cataract / no cataract), dictating observations by voice with automatic transcription (Whisper), and downloading the complete labeling package.
+
+> **Ephemeral session model:** images and audio live only in the browser/server memory during the session. They are never persisted to disk or to a database. Only audit metadata is stored (label, transcription, doctor, date).
+
+---
+
+## 1. Prerequisites
+
+| Requirement | Minimum version | Notes |
+|-------------|-----------------|-------|
+| **Python** | 3.10+ | 3.11 recommended |
+| **pip** | 23+ | — |
+| **FFmpeg** | any recent version | Required by OpenAI Whisper. [Installation instructions](https://ffmpeg.org/download.html) |
+| **GPU (optional)** | CUDA 11.8+ | Speeds up Whisper transcription. Works on CPU without a GPU. |
+
+---
+
+## 2. Installation
+
+### A. Clone the repository
+
+```bash
+git clone <URL_DEL_REPO>
+cd Automatic-Labeling-with-Medgemma
+```
+
+### B. Create a virtual environment (recommended)
+
+```bash
+python -m venv .venv
+
+# Windows
+.venv\Scripts\activate
+
+# Linux / macOS
+source .venv/bin/activate
+```
+
+### C. Install dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+This installs `streamlit`, `openai-whisper`, `torch`, `pandas`, `pillow`, `streamlit-authenticator`, and the remaining dependencies.
+
+> **Note on PyTorch:** if you have an NVIDIA GPU and want to use it for Whisper, install the CUDA build before installing the requirements:
+> ```bash
+> pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+> pip install -r requirements.txt
+> ```
+
+### D. Verify FFmpeg
+
+```bash
+ffmpeg -version
+```
+
+If it is not installed:
+- **Windows:** `winget install ffmpeg` or download from [ffmpeg.org](https://ffmpeg.org/download.html)
+- **macOS:** `brew install ffmpeg`
+- **Linux:** `sudo apt install ffmpeg`
+
+---
+
+## 3. Run the web interface (Streamlit)
+
+```bash
+streamlit run interface/main.py
+```
+
+It opens automatically in the browser at **http://localhost:8501**.
+
+### Usage flow
+
+1. **Authentication** — If `streamlit-authenticator` is installed, sign in with the configured credentials. Otherwise, the app enters anonymous mode automatically.
+2. **Upload images** — Drag or select fundus images (JPG, PNG, TIFF, max. 50 MB each).
+3. **Gallery** — A thumbnail strip is shown with 🔴 (pending) / 🟢 (labeled) indicators. Click to select.
+4. **Label** — Classify the image as *Cataract* or *No Cataract*.
+5. **Dictate observations** — Record audio with the microphone. Whisper transcribes automatically with timestamps. You can edit the resulting text.
+6. **Download** — Download an individual ZIP (image + metadata + audio + transcription) or a package for the whole session. Also available in ML formats (HuggingFace CSV, JSONL).
+7. **End session** — The sidebar button clears all memory. There is also an automatic timeout after 30 minutes of inactivity.
+
+### Configuration
+
+The parameters live in `interface/config.py`:
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `SESSION_TIMEOUT_MINUTES` | 30 | Minutes of inactivity before the session is cleared |
+| `MAX_UPLOAD_SIZE_MB` | 50 | Maximum size per image |
+| `ALLOWED_EXTENSIONS` | jpg, jpeg, png, tif | Accepted image formats |
+| `WHISPER_MODEL_OPTIONS` | tiny → turbo | Available Whisper models |
+| `DEFAULT_WHISPER_LANGUAGE` | es | Default transcription language |
+| `UI_LANGUAGE` | es | Interface language (es / en) |
+| `LABEL_OPTIONS` | Catarata, No Catarata | Labeling categories |
+
+### Default credentials (authentication mode)
+
+| User | Password | Role |
+|------|----------|------|
+| admin | admin123 | Administrator |
+| doctor1 | admin123 | Doctor |
+| doctor2 | admin123 | Doctor |
+
+> ⚠️ Change these credentials in `interface/services/auth_service.py` before any production use.
+
+---
+
+## 4. Project structure
+
+```
+interface/
+├── main.py                    # Main Streamlit orchestrator
+├── config.py                  # Configuration constants
+├── database.py                # Metadata persistence (SQLite)
+├── utils.py                   # General utilities (image validation)
+├── i18n.py                    # Internationalization (es/en)
+├── components/
+│   ├── uploader.py            # Image upload with validation
+│   ├── gallery.py             # Thumbnail gallery with status
+│   ├── labeler.py             # Classification (cataract / no cataract)
+│   ├── recorder.py            # Audio recording + Whisper transcription
+│   └── downloader.py          # Individual, bulk, and ML-format downloads
+├── services/
+│   ├── session_manager.py     # Ephemeral in-memory session management
+│   ├── whisper_service.py     # Whisper loading and transcription
+│   ├── export_service.py      # ZIP, CSV, JSONL generation
+│   └── auth_service.py        # Authentication (optional)
+└── .streamlit/
+    └── config.toml            # Streamlit configuration
+```
+
+---
+
+## 5. Run the notebook (Jupyter)
+
+To explore the MedGemma model, tune parameters, or debug:
+
+```bash
+jupyter notebook medgemma.ipynb
+```
+
+Run the cells sequentially with **Shift + Enter**.
+
+---
+
+## 6. Troubleshooting
+
+| Problem | Solution |
+|---------|----------|
+| `ModuleNotFoundError: No module named 'whisper'` | `pip install openai-whisper` |
+| `FileNotFoundError: ffmpeg not found` | Install FFmpeg (see section 2.D) |
+| Audio does not record in the browser | Make sure you access via `localhost` or HTTPS. Browsers block the microphone on non-local HTTP. |
+| `streamlit-authenticator` not available | The app falls back to anonymous mode automatically. Install it with `pip install streamlit-authenticator` if authentication is desired. |
+| Unexpected session timeout | Adjust `SESSION_TIMEOUT_MINUTES` in `config.py` |
+| Images do not load | Check that the format is JPG/PNG/TIFF and under 50 MB |
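The configuration table above corresponds to module-level constants in `interface/config.py` (whose full diff is not reproduced here). A minimal sketch of what those defaults could look like; the `display`/`code` entry shape of `LABEL_OPTIONS` is inferred from how `labeler.py` reads it, and the code values themselves are illustrative:

```python
# Hypothetical sketch of interface/config.py. Values mirror the README table;
# the LABEL_OPTIONS entry shape and the code values are assumptions.
SESSION_TIMEOUT_MINUTES = 30
MAX_UPLOAD_SIZE_MB = 50
ALLOWED_EXTENSIONS = ["jpg", "jpeg", "png", "tif"]
WHISPER_MODEL_OPTIONS = ["tiny", "base", "small", "medium", "turbo"]
DEFAULT_WHISPER_LANGUAGE = "es"
UI_LANGUAGE = "es"
LABEL_OPTIONS = [
    {"display": "Catarata", "code": "catarata"},
    {"display": "No Catarata", "code": "no_catarata"},
]
```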
annotations.db
ADDED
Binary file (20.5 kB).
interface/components/__init__.py
ADDED
File without changes
interface/components/downloader.py
ADDED
|
@@ -0,0 +1,102 @@
```python
"""OphthalmoCapture — Download Component

Provides individual and bulk download buttons for the labeling package.
"""

import streamlit as st
from services.export_service import (
    export_single_image,
    export_full_session,
    get_session_summary,
    export_huggingface_csv,
    export_jsonl,
)


def render_downloader(image_id: str):
    """Render the download panel for the current image + bulk download."""
    img = st.session_state.images.get(image_id)
    if img is None:
        return

    st.subheader("📥 Descarga")

    # ── Individual download ──────────────────────────────────────────────
    st.markdown("**Imagen actual**")

    can_download = img["label"] is not None
    if not can_download:
        st.info("Etiquete la imagen para habilitar la descarga individual.")
    else:
        zip_bytes, zip_name = export_single_image(image_id)
        st.download_button(
            label=f"⬇️ Descargar etiquetado — {img['filename']}",
            data=zip_bytes,
            file_name=zip_name,
            mime="application/zip",
            key=f"dl_single_{image_id}",
            use_container_width=True,
        )

    st.divider()

    # ── Bulk download ────────────────────────────────────────────────────
    st.markdown("**Toda la sesión**")

    summary = get_session_summary()
    sc1, sc2 = st.columns(2)
    with sc1:
        st.metric("Imágenes", summary["total"])
        st.metric("Con audio", summary["with_audio"])
    with sc2:
        st.metric("Etiquetadas", f"{summary['labeled']} / {summary['total']}")
        st.metric("Con transcripción", summary["with_transcription"])

    if summary["unlabeled"] > 0:
        st.warning(
            f"⚠️ {summary['unlabeled']} imagen(es) sin etiquetar. "
            "Se incluirán en la descarga pero sin etiqueta."
        )

    if summary["total"] == 0:
        st.info("No hay imágenes para descargar.")
    else:
        zip_bytes, zip_name = export_full_session()
        if st.download_button(
            label="⬇️ Descargar todo el etiquetado (ZIP)",
            data=zip_bytes,
            file_name=zip_name,
            mime="application/zip",
            key="dl_bulk",
            use_container_width=True,
            type="primary",
        ):
            st.session_state.session_downloaded = True

    # ── ML-ready formats (Idea F) ────────────────────────────────────────
    if summary["labeled"] > 0:
        st.divider()
        st.markdown("**Formatos para ML**")
        ml1, ml2 = st.columns(2)
        with ml1:
            csv_bytes, csv_name = export_huggingface_csv()
            if st.download_button(
                label="📊 CSV (HuggingFace)",
                data=csv_bytes,
                file_name=csv_name,
                mime="text/csv",
                key="dl_hf_csv",
                use_container_width=True,
            ):
                st.session_state.session_downloaded = True
        with ml2:
            jsonl_bytes, jsonl_name = export_jsonl()
            if st.download_button(
                label="📄 JSONL (Fine-tuning)",
                data=jsonl_bytes,
                file_name=jsonl_name,
                mime="application/jsonl",
                key="dl_jsonl",
                use_container_width=True,
            ):
                st.session_state.session_downloaded = True
```
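`export_jsonl` above emits one JSON object per line, the usual shape for fine-tuning data. A minimal sketch of building such a line; the field names here are illustrative, not the actual `export_service` schema:

```python
import json

# Hypothetical record; field names are illustrative, not the real
# export_service schema.
record = {
    "image": "fundus_001.jpg",
    "label": "Catarata",
    "transcription": "Opacidad del cristalino en zona central.",
}

# One compact JSON object per line; ensure_ascii=False keeps accents readable.
line = json.dumps(record, ensure_ascii=False)
assert json.loads(line)["label"] == "Catarata"
```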
interface/components/gallery.py
ADDED
|
@@ -0,0 +1,106 @@
```python
"""OphthalmoCapture — Image Gallery Component

Renders a thumbnail strip of all uploaded images with labeling-status
badges and click-to-select behaviour.
"""

import streamlit as st
from services import session_manager as sm


def _label_badge(label):
    """Return a coloured status indicator for the label value."""
    if label is None:
        return "🔴"  # unlabeled
    return "🟢"  # labeled (any value)


def render_gallery():
    """Draw the horizontal thumbnail gallery with status badges.

    Returns True if the user clicked on a thumbnail (triggers rerun).
    """
    images = st.session_state.images
    order = st.session_state.image_order
    current_id = st.session_state.current_image_id

    if not order:
        return False

    # ── Progress bar ─────────────────────────────────────────────────────
    labeled, total = sm.get_labeling_progress()
    progress_text = f"Progreso: **{labeled}** / **{total}** etiquetadas"
    st.markdown(progress_text)
    st.progress(labeled / total if total > 0 else 0)

    # ── Thumbnail strip ──────────────────────────────────────────────────
    # Show up to 8 thumbnails per row; wrap if there are more.
    COLS_PER_ROW = 8
    num_images = len(order)

    # Paginate the gallery if many images
    if "gallery_page" not in st.session_state:
        st.session_state.gallery_page = 0

    total_pages = max(1, -(-num_images // COLS_PER_ROW))  # ceil division
    page = st.session_state.gallery_page
    start = page * COLS_PER_ROW
    end = min(start + COLS_PER_ROW, num_images)
    visible_ids = order[start:end]

    cols = st.columns(max(len(visible_ids), 1))

    clicked = False
    for i, img_id in enumerate(visible_ids):
        img = images[img_id]
        badge = _label_badge(img["label"])
        is_selected = (img_id == current_id)

        with cols[i]:
            # Visual border to highlight the selected thumbnail
            if is_selected:
                st.markdown(
                    "<div style='border:3px solid #4CAF50; border-radius:8px; "
                    "padding:2px;'>",
                    unsafe_allow_html=True,
                )

            st.image(img["bytes"], use_container_width=True)

            if is_selected:
                st.markdown("</div>", unsafe_allow_html=True)

            # Label + filename
            short_name = img["filename"]
            if len(short_name) > 18:
                short_name = short_name[:15] + "…"

            if st.button(
                f"{badge} {short_name}",
                key=f"thumb_{img_id}",
                use_container_width=True,
            ):
                sm.set_current_image(img_id)
                clicked = True

    # ── Gallery pagination ───────────────────────────────────────────────
    if total_pages > 1:
        gc1, gc2, gc3 = st.columns([1, 3, 1])
        with gc1:
            if page > 0:
                if st.button("◀ Ant.", key="gal_prev"):
                    st.session_state.gallery_page -= 1
                    clicked = True
        with gc2:
            st.markdown(
                f"<div style='text-align:center; padding-top:6px;'>"
                f"Página {page + 1} / {total_pages}</div>",
                unsafe_allow_html=True,
            )
        with gc3:
            if page < total_pages - 1:
                if st.button("Sig. ▶", key="gal_next"):
                    st.session_state.gallery_page += 1
                    clicked = True

    return clicked
```
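The `total_pages` computation in `render_gallery` uses negated floor division as a ceiling, avoiding a `math.ceil` import. A standalone check of that trick:

```python
def total_pages(num_images: int, per_page: int) -> int:
    # -(-a // b) equals ceil(a / b) for non-negative a and positive b;
    # max(1, ...) keeps an empty gallery on a single page.
    return max(1, -(-num_images // per_page))

assert total_pages(0, 8) == 1
assert total_pages(8, 8) == 1
assert total_pages(9, 8) == 2
```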
interface/components/image_protection.py
ADDED
|
@@ -0,0 +1,177 @@
```python
"""OphthalmoCapture — Image Protection Layer

Injects CSS and JavaScript into the Streamlit page to prevent users from
downloading, dragging, or otherwise saving the confidential medical images.

KEY DESIGN DECISION:
  Streamlit's st.markdown(unsafe_allow_html=True) renders <style> tags but
  STRIPS <script> tags for security. Therefore:
    • CSS protections → injected via st.markdown (works natively).
    • JS protections  → injected via st.components.v1.html(), which creates
      a real iframe where JavaScript executes. From that iframe we reach
      the main Streamlit page via window.parent.document (same-origin).

Protection layers (defence-in-depth):
  1. CSS: pointer-events:none, user-select:none, draggable:false on <img>.
  2. CSS: transparent ::after overlay on stImage containers blocks
     right-click "Save image as…".
  3. CSS: -webkit-touch-callout:none blocks mobile long-press save.
  4. JS: contextmenu event blocked on the ENTIRE parent document.
  5. JS: Ctrl+S / Ctrl+U / Ctrl+Shift+I / Ctrl+Shift+J / Ctrl+Shift+C /
     F12 all intercepted and cancelled.
  6. JS: dragstart blocked for all images.
  7. JS: MutationObserver re-applies draggable=false to dynamically added
     images (Streamlit re-renders on every interaction).
  8. JS: Blob/URL revocation — monkey-patches URL.createObjectURL to block
     programmatic image extraction.

IMPORTANT LIMITATION:
  No client-side measure can guarantee absolute prevention. A technically
  sophisticated user could still extract images through OS-level screenshots,
  network packet inspection, or browser extensions that bypass JS hooks.
  These protections eliminate the standard browser download paths and raise
  the bar significantly.
"""

import streamlit as st
import streamlit.components.v1 as components

# ── CSS injected via st.markdown (Streamlit renders <style> natively) ────────
_PROTECTION_CSS = """
<style>
/* Layer 1: Disable ALL interaction on <img> tags */
img {
    pointer-events: none !important;
    user-select: none !important;
    -webkit-user-select: none !important;
    -moz-user-select: none !important;
    -ms-user-select: none !important;
    -webkit-user-drag: none !important;
    -webkit-touch-callout: none !important;
}

/* Layer 2: Transparent overlay on every Streamlit image container */
[data-testid="stImage"] {
    position: relative !important;
}
[data-testid="stImage"]::after {
    content: "";
    position: absolute;
    top: 0; left: 0; right: 0; bottom: 0;
    z-index: 10;
    background: transparent;
    pointer-events: auto !important;
    cursor: default;
}

/* Layer 3: Extra drag prevention */
[data-testid="stImage"] img {
    -webkit-user-drag: none !important;
    user-drag: none !important;
}
</style>
"""

# ── JavaScript injected via components.html (runs in a real iframe) ──────────
# From the iframe we access window.parent.document to attach listeners
# on the ACTUAL Streamlit page, not just inside the hidden iframe.
_PROTECTION_JS_HTML = """
<script>
(function () {
    // The parent document is the real Streamlit page
    var doc;
    try { doc = window.parent.document; } catch(e) { doc = document; }

    // Guard: only inject once per page lifecycle
    if (doc.__ophthalmo_protection__) return;
    doc.__ophthalmo_protection__ = true;

    function block(e) {
        e.preventDefault();
        e.stopPropagation();
        e.stopImmediatePropagation();
        return false;
    }

    // ── Layer 4: Block context menu on ENTIRE page ──────────────────────
    doc.addEventListener('contextmenu', function (e) {
        return block(e);
    }, true);

    // ── Layer 5: Block keyboard shortcuts ───────────────────────────────
    doc.addEventListener('keydown', function (e) {
        var dominated = false;
        var ctrl = e.ctrlKey || e.metaKey;
        var key = e.key ? e.key.toLowerCase() : '';

        // Ctrl+S — Save page
        if (ctrl && key === 's') dominated = true;
        // Ctrl+U — View source
        if (ctrl && key === 'u') dominated = true;
        // Ctrl+P — Print (can save as PDF with images)
        if (ctrl && key === 'p') dominated = true;
        // F12 — DevTools
        if (e.keyCode === 123) dominated = true;
        // Ctrl+Shift+I — DevTools (Inspector)
        if (ctrl && e.shiftKey && key === 'i') dominated = true;
        // Ctrl+Shift+J — DevTools (Console)
        if (ctrl && e.shiftKey && key === 'j') dominated = true;
        // Ctrl+Shift+C — DevTools (Element picker)
        if (ctrl && e.shiftKey && key === 'c') dominated = true;

        if (dominated) return block(e);
    }, true);

    // ── Layer 6: Block drag-and-drop of images ──────────────────────────
    doc.addEventListener('dragstart', function (e) {
        if (e.target && e.target.tagName === 'IMG') return block(e);
    }, true);

    // ── Layer 7: MutationObserver — lock new images as they appear ──────
    function lockImages(root) {
        var imgs = (root.querySelectorAll) ? root.querySelectorAll('img') : [];
        for (var i = 0; i < imgs.length; i++) {
            imgs[i].setAttribute('draggable', 'false');
            imgs[i].ondragstart = function() { return false; };
            imgs[i].oncontextmenu = function() { return false; };
        }
    }
    lockImages(doc);

    var obs = new MutationObserver(function (mutations) {
        for (var m = 0; m < mutations.length; m++) {
            var nodes = mutations[m].addedNodes;
            for (var n = 0; n < nodes.length; n++) {
                if (nodes[n].nodeType === 1) lockImages(nodes[n]);
            }
        }
    });
    obs.observe(doc.body, { childList: true, subtree: true });

    // ── Layer 8: Neuter Blob URL creation for images ────────────────────
    // Prevents programmatic extraction via createObjectURL
    var origCreateObjectURL = URL.createObjectURL;
    URL.createObjectURL = function(obj) {
        if (obj instanceof Blob && obj.type && obj.type.startsWith('image/')) {
            console.warn('[OphthalmoCapture] Blob URL creation blocked for images');
            return '';
        }
        return origCreateObjectURL.call(URL, obj);
    };

})();
</script>
"""


def inject_image_protection():
    """Inject all CSS + JS image-protection layers into the page.

    Call this ONCE near the top of main.py, after st.set_page_config().
    """
    # CSS — works natively via st.markdown
    st.markdown(_PROTECTION_CSS, unsafe_allow_html=True)

    # JS — MUST use components.html so the <script> actually executes.
    # height=0 makes the iframe invisible.
    components.html(_PROTECTION_JS_HTML, height=0, scrolling=False)
```
interface/components/labeler.py
ADDED
|
@@ -0,0 +1,82 @@
```python
"""OphthalmoCapture — Labeling Component

Provides the radio-button selector for classifying images (e.g. catarata /
no catarata) and persists the choice in the ephemeral session. The label
list is driven by config.LABEL_OPTIONS so it can be extended without touching
this component.
"""

import streamlit as st
import config
import database as db
from services import session_manager as sm


def render_labeler(image_id: str):
    """Render the labeling panel for the given image.

    Displays a radio selector, saves the label into session state and
    optionally persists metadata to the audit database.
    """
    img = st.session_state.images.get(image_id)
    if img is None:
        return

    st.subheader("🏷️ Etiquetado")

    display_options = [opt["display"] for opt in config.LABEL_OPTIONS]
    current_label = img.get("label")

    # Determine current index (None if unlabeled)
    if current_label is not None and current_label in display_options:
        current_index = display_options.index(current_label)
    else:
        current_index = None

    # Styled container with radio buttons
    with st.container(border=True):
        if current_index is None:
            st.caption("⬇️ Seleccione una etiqueta para esta imagen")

        selected = st.radio(
            "Clasificación",
            display_options,
            index=current_index,
            key=f"label_radio_{image_id}",
            horizontal=True,
            label_visibility="collapsed",
        )

    # Map selection
    new_label = selected if selected in display_options else None

    # Detect change, update session and auto-save to DB
    if new_label is not None and new_label != current_label:
        st.session_state.images[image_id]["label"] = new_label
        st.session_state.images[image_id]["labeled_by"] = st.session_state.get(
            "doctor_name", ""
        )
        sm.update_activity()

        # Auto-save to audit DB (upsert — one record per image per session)
        try:
            db.save_or_update_annotation(
                image_filename=img["filename"],
                label=new_label,
                transcription=img.get("transcription", ""),
                doctor_name=st.session_state.get("doctor_name", ""),
                session_id=st.session_state.get("session_id", ""),
            )
        except Exception:
            pass  # Non-blocking: audit DB failure should not break labeling

    # ── Visual feedback ──────────────────────────────────────────────────
    if new_label is None:
        st.warning("🔴 Sin etiquetar")
    else:
        code = "—"
        for opt in config.LABEL_OPTIONS:
            if opt["display"] == new_label:
                code = opt["code"]
                break
        st.success(f"🟢 Etiqueta: **{new_label}** (código: {code})")
```
interface/components/recorder.py
ADDED
@@ -0,0 +1,186 @@
"""OphthalmoCapture — Audio Recorder & Transcription Component

Records audio via st.audio_input, transcribes with Whisper, stores the
audio bytes and transcription in the ephemeral session, and lets the
doctor edit the transcription or restore the original.

Includes timestamped segments from Whisper for reference.
"""

import hashlib
import streamlit as st
import database as db
from services import session_manager as sm
from services.whisper_service import transcribe_audio_with_timestamps, format_timestamp


def _audio_fingerprint(audio_bytes: bytes) -> str:
    """Return a short hash of the audio content for change detection."""
    return hashlib.md5(audio_bytes).hexdigest()


def render_recorder(image_id: str, model, language: str):
    """Render the audio recording + transcription panel.

    Parameters
    ----------
    image_id : str
        UUID of the currently selected image.
    model :
        Loaded Whisper model instance.
    language : str
        ISO language code for transcription (e.g. "es").
    """
    img = st.session_state.images.get(image_id)
    if img is None:
        return

    st.subheader("🎙️ Dictado y Transcripción")

    # ── Audio recording ──────────────────────────────────────────────────
    audio_wav = st.audio_input(
        "Grabar audio",
        key=f"audio_input_{image_id}",
    )

    # Track which audio blob we already processed so we don't re-transcribe
    processed_key = f"_last_audio_{image_id}"
    segments_key = f"_segments_{image_id}"

    if audio_wav is not None:
        audio_bytes = audio_wav.getvalue()
        fingerprint = _audio_fingerprint(audio_bytes)

        # Only transcribe if this is a *new* recording (content changed)
        if st.session_state.get(processed_key) != fingerprint:
            with st.spinner("Transcribiendo audio…"):
                text, segments = transcribe_audio_with_timestamps(
                    model, audio_bytes, language
                )

            # Store in session
            img["audio_bytes"] = audio_bytes

            # Append (don't overwrite) if there was previous text
            if img["transcription"]:
                img["transcription"] += " " + text
            else:
                img["transcription"] = text

            # Keep a copy of the raw Whisper output
            if img["transcription_original"]:
                img["transcription_original"] += " " + text
            else:
                img["transcription_original"] = text

            # Store timestamped segments
            existing_segments = st.session_state.get(segments_key, [])
            st.session_state[segments_key] = existing_segments + segments

            # Mark this audio as processed using content hash (stable across reruns)
            st.session_state[processed_key] = fingerprint
            # Update the text_area widget state so it reflects the new text
            st.session_state[f"transcription_area_{image_id}"] = img["transcription"]

            # Re-save to audit DB if the image is already labeled (upsert)
            if img.get("label"):
                try:
                    db.save_or_update_annotation(
                        image_filename=img["filename"],
                        label=img["label"],
                        transcription=img["transcription"],
                        doctor_name=st.session_state.get("doctor_name", ""),
                        session_id=st.session_state.get("session_id", ""),
                    )
                except Exception:
                    pass

            sm.update_activity()
            st.rerun()

    # ── Editable transcription ───────────────────────────────────────────
    edited_text = st.text_area(
        "Transcripción (editable)",
        value=img["transcription"],
        height=180,
        key=f"transcription_area_{image_id}",
        placeholder="Grabe un audio o escriba la transcripción manualmente…",
    )

    # Sync edits back to session
    if edited_text != img["transcription"]:
        img["transcription"] = edited_text
        sm.update_activity()

    # ── Timestamped segments (Idea C) ────────────────────────────────────
    segments = st.session_state.get(segments_key, [])
    if segments:
        with st.expander("🕐 Segmentos con timestamps", expanded=False):
            for seg in segments:
                ts_start = format_timestamp(seg["start"])
                ts_end = format_timestamp(seg["end"])
                st.markdown(
                    f"`{ts_start} → {ts_end}` {seg['text']}"
                )

    # ── Helper buttons ───────────────────────────────────────────────────
    btn_cols = st.columns(3)

    with btn_cols[0]:
        # Re-record: clear audio and transcription so a new recording can be made
        has_audio = img["audio_bytes"] is not None
        if st.button(
            "🎤 Volver a grabar",
            key=f"rerecord_{image_id}",
            disabled=not has_audio,
            use_container_width=True,
        ):
            img["audio_bytes"] = None
            img["transcription"] = ""
            img["transcription_original"] = ""
            st.session_state.pop(segments_key, None)
            st.session_state.pop(processed_key, None)
            st.session_state.pop(f"transcription_area_{image_id}", None)
            # Clear the audio_input widget state to reset the recorder
            st.session_state.pop(f"audio_input_{image_id}", None)
            sm.update_activity()
            st.rerun()

    with btn_cols[1]:
        # Restore original Whisper transcription
        has_original = bool(img["transcription_original"])
        is_different = img["transcription"] != img["transcription_original"]
        if st.button(
            "🔄 Restaurar original",
            key=f"restore_{image_id}",
            disabled=not (has_original and is_different),
            use_container_width=True,
        ):
            img["transcription"] = img["transcription_original"]
            sm.update_activity()
            st.rerun()

    with btn_cols[2]:
        # Clear transcription entirely
        if st.button(
            "🗑️ Limpiar texto",
            key=f"clear_text_{image_id}",
            disabled=not img["transcription"],
            use_container_width=True,
        ):
            img["transcription"] = ""
            sm.update_activity()
            st.rerun()

    # ── Status line ──────────────────────────────────────────────────────
    if img["transcription"]:
        modified_tag = ""
        if (
            img["transcription_original"]
            and img["transcription"] != img["transcription_original"]
        ):
            modified_tag = " ✏️ _modificada manualmente_"
        word_count = len(img["transcription"].split())
        st.caption(f"{word_count} palabras{modified_tag}")
    else:
        st.caption("Sin transcripción aún.")
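The component transcribes only when the recorded bytes actually change between Streamlit reruns, using a content hash as the guard. A minimal standalone sketch of that pattern (the `ChangeGuard` class here is illustrative, not part of the app's code):

```python
import hashlib

def audio_fingerprint(audio_bytes: bytes) -> str:
    """Short content hash used to detect a genuinely new recording."""
    return hashlib.md5(audio_bytes).hexdigest()

class ChangeGuard:
    """Skips reprocessing when the same audio blob is seen again on a rerun."""

    def __init__(self):
        self._last = None

    def is_new(self, audio_bytes: bytes) -> bool:
        fp = audio_fingerprint(audio_bytes)
        if fp == self._last:
            return False  # same content as last time: skip transcription
        self._last = fp
        return True

guard = ChangeGuard()
first = guard.is_new(b"take-1")   # new content: transcribe
repeat = guard.is_new(b"take-1")  # same bytes on a rerun: skip
second = guard.is_new(b"take-2")  # new recording: transcribe again
```

Hashing the bytes rather than comparing the widget object keeps the check stable across reruns, since Streamlit may hand back a fresh object wrapping identical content.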
interface/components/uploader.py
ADDED
@@ -0,0 +1,257 @@
"""OphthalmoCapture — Image Upload Component

Handles file upload, validation, and ingestion into the ephemeral session.
Uses @st.dialog modals to warn about:
- Previously labeled images (from DB) — doctor chooses which to re-label.
- Session duplicates — informational notice.
"""

import streamlit as st
import config
import database as db
from services import session_manager as sm
from utils import validate_image_bytes


def _reset_uploader():
    """Increment the uploader key counter to clear the file_uploader widget."""
    st.session_state._uploader_counter = st.session_state.get("_uploader_counter", 0) + 1


# ── Modal: previously labeled images ─────────────────────────────────────────
@st.dialog("⚠️ Imágenes ya etiquetadas", width="large", dismissible=False)
def _show_relabel_dialog():
    """Modal dialog asking the doctor which previously-labeled images to re-upload."""
    pending = st.session_state.get("_pending_upload_review")
    if not pending:
        st.rerun()
        return

    prev = pending["previously_labeled"]
    non_labeled_count = len(pending["files"]) - len(prev)

    st.markdown(
        f"**{len(prev)} imagen(es)** ya fueron etiquetadas anteriormente. "
        "Seleccione cuáles desea volver a etiquetar."
    )
    if non_labeled_count > 0:
        st.info(
            f"ℹ️ Las otras **{non_labeled_count}** imagen(es) nuevas se subirán automáticamente."
        )

    relabel_choices = {}
    for fname, records in prev.items():
        latest = records[0]
        label_info = latest.get("label", "—")
        doctor_info = latest.get("doctorName", "—")
        ts_info = str(latest.get("createdAt", ""))[:16]
        n_times = len(records)
        badge = f"({n_times} vez{'es' if n_times > 1 else ''})"

        relabel_choices[fname] = st.checkbox(
            f"**{fname}** — _{label_info}_ | {doctor_info} | {ts_info} {badge}",
            value=True,
            key=f"_dlg_relabel_{fname}",
        )

    st.divider()
    col_a, col_b = st.columns(2)
    with col_a:
        if st.button("✅ Aceptar y subir", type="primary", use_container_width=True):
            _process_pending(relabel_choices)
    with col_b:
        if st.button("❌ Cancelar etiquetadas", use_container_width=True):
            _cancel_pending()


def _process_pending(relabel_choices: dict[str, bool]):
    """Ingest accepted files from the pending review."""
    pending = st.session_state.pop("_pending_upload_review", None)
    if not pending:
        st.rerun()
        return

    prev = pending["previously_labeled"]
    files_dict = pending["files"]
    existing_filenames = {
        img["filename"] for img in st.session_state.images.values()
    }

    if "_processed_uploads" not in st.session_state:
        st.session_state._processed_uploads = set()

    added = 0
    for fname, raw_bytes in files_dict.items():
        # If it was previously labeled and doctor unchecked it → skip
        if fname in prev and not relabel_choices.get(fname, True):
            continue
        if fname not in existing_filenames:
            sm.add_image(fname, raw_bytes)
            st.session_state._processed_uploads.add(fname)
            st.session_state.session_downloaded = False
            added += 1

    _reset_uploader()
    if added > 0 and st.session_state.current_image_id is None:
        st.session_state.current_image_id = st.session_state.image_order[0]
    st.rerun()


def _cancel_pending():
    """Cancel previously-labeled images but still ingest new (non-labeled) ones."""
    pending = st.session_state.pop("_pending_upload_review", None)
    if pending:
        prev = pending["previously_labeled"]
        files_dict = pending["files"]
        existing_filenames = {
            img["filename"] for img in st.session_state.images.values()
        }
        if "_processed_uploads" not in st.session_state:
            st.session_state._processed_uploads = set()

        added = 0
        for fname, raw_bytes in files_dict.items():
            # Skip previously labeled — doctor chose to cancel them
            if fname in prev:
                continue
            if fname not in existing_filenames:
                sm.add_image(fname, raw_bytes)
                st.session_state._processed_uploads.add(fname)
                st.session_state.session_downloaded = False
                added += 1

        if added > 0 and st.session_state.current_image_id is None:
            st.session_state.current_image_id = st.session_state.image_order[0]

    _reset_uploader()
    st.rerun()


# ── Modal: session duplicates (informational) ────────────────────────────────
@st.dialog("ℹ️ Imágenes duplicadas en sesión", dismissible=False)
def _show_duplicates_dialog():
    """Informational modal listing images already present in the current session."""
    dup_names = st.session_state.get("_session_duplicates", [])
    if not dup_names:
        st.rerun()
        return

    st.markdown(
        "Las siguientes imágenes **ya se encuentran en la sesión actual** "
        "y no se volverán a subir:"
    )
    for fname in dup_names:
        st.markdown(f"- `{fname}`")

    if st.button("Aceptar", use_container_width=True):
        st.session_state.pop("_session_duplicates", None)
        st.rerun()


# ── Main uploader ────────────────────────────────────────────────────────────
def render_uploader():
    """Render the file uploader and process new uploads.

    Returns the number of newly added images (0 if none).
    """
    counter = st.session_state.get("_uploader_counter", 0)

    uploaded_files = st.file_uploader(
        "📤 Subir imágenes médicas",
        type=config.ALLOWED_EXTENSIONS,
        accept_multiple_files=True,
        help=f"Formatos aceptados: {', '.join(config.ALLOWED_EXTENSIONS)}. "
             f"Máx. {config.MAX_UPLOAD_SIZE_MB} MB por archivo.",
        key=f"uploader_{counter}",
    )

    # ── Show pending dialogs (survive reruns) ────────────────────────────
    if "_pending_upload_review" in st.session_state:
        _show_relabel_dialog()
        return 0

    if "_session_duplicates" in st.session_state:
        _show_duplicates_dialog()
        return 0

    if not uploaded_files:
        return 0

    if "_processed_uploads" not in st.session_state:
        st.session_state._processed_uploads = set()

    existing_filenames = {
        img["filename"] for img in st.session_state.images.values()
    }

    # ── Classify files ───────────────────────────────────────────────────
    new_files = []
    skipped_invalid = 0
    session_duplicates = []

    for uf in uploaded_files:
        # Already in the current session
        if uf.name in existing_filenames:
            if uf.name not in st.session_state._processed_uploads:
                session_duplicates.append(uf.name)
                st.session_state._processed_uploads.add(uf.name)
            continue

        # Already ingested via this uploader cycle
        if uf.name in st.session_state._processed_uploads:
            continue

        raw_bytes = uf.getvalue()
        if not validate_image_bytes(raw_bytes):
            skipped_invalid += 1
            continue

        new_files.append((uf.name, raw_bytes))

    # ── Check DB for previously labeled images ───────────────────────────
    if new_files:
        new_filenames = [name for name, _ in new_files]
        previously_labeled = db.get_previously_labeled_filenames(new_filenames)

        if previously_labeled:
            # Store all files (new + previously labeled) for review
            st.session_state["_pending_upload_review"] = {
                "files": {name: raw for name, raw in new_files},
                "previously_labeled": previously_labeled,
            }
            # Also show session duplicate dialog afterward if needed
            if session_duplicates:
                st.session_state["_session_duplicates"] = session_duplicates
            st.rerun()
            return 0

    # ── Ingest files that need no review ─────────────────────────────────
    new_count = 0
    for name, raw_bytes in new_files:
        if name in existing_filenames:
            continue
        if name in st.session_state._processed_uploads:
            continue

        sm.add_image(name, raw_bytes)
        existing_filenames.add(name)
        st.session_state._processed_uploads.add(name)
        st.session_state.session_downloaded = False
        new_count += 1

    if skipped_invalid > 0:
        st.warning(
            f"⚠️ {skipped_invalid} archivo(s) no son imágenes válidas y fueron ignorados."
        )

    if new_count > 0:
        _reset_uploader()
        if st.session_state.current_image_id is None:
            st.session_state.current_image_id = st.session_state.image_order[0]

    # ── Show session duplicate info dialog if any ────────────────────────
    if session_duplicates:
        st.session_state["_session_duplicates"] = session_duplicates
        st.rerun()

    return new_count
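The classify step in `render_uploader` is essentially a partition of the uploaded filenames against the session's existing set, with each session duplicate reported only once per uploader cycle. A hypothetical standalone version of just that logic (function name and signature are illustrative, and validation is omitted):

```python
def classify_uploads(uploaded_names, existing_filenames, processed_uploads):
    """Split incoming filenames into genuinely new files and session duplicates.

    A file already in the session is reported as a duplicate only the first
    time it is seen; the shared `processed_uploads` set carries that memory
    across calls, mirroring st.session_state._processed_uploads.
    """
    new_files, session_duplicates = [], []
    for name in uploaded_names:
        if name in existing_filenames:
            if name not in processed_uploads:
                session_duplicates.append(name)
                processed_uploads.add(name)
            continue
        if name in processed_uploads:
            continue  # already ingested in an earlier cycle
        new_files.append(name)
    return new_files, session_duplicates

processed = set()
new, dups = classify_uploads(["a.jpg", "b.jpg", "c.jpg"], {"b.jpg"}, processed)
# second cycle: the same duplicate is no longer reported
new_again, dups_again = classify_uploads(["b.jpg"], {"b.jpg"}, processed)
```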
interface/config.py
ADDED
@@ -0,0 +1,38 @@
"""OphthalmoCapture — Configuration Constants."""

# ── Label Options ────────────────────────────────────────────────────────────
# Designed as a configurable list for easy extension (e.g. glaucoma, DR, AMD).
LABEL_OPTIONS = [
    {"key": "catarata", "display": "Catarata", "code": 1},
    {"key": "no_catarata", "display": "No Catarata", "code": 0},
]

# ── Session Settings ─────────────────────────────────────────────────────────
SESSION_TIMEOUT_MINUTES = 30

# ── Upload Settings ──────────────────────────────────────────────────────────
ALLOWED_EXTENSIONS = ["jpg", "jpeg", "png", "tif"]
MAX_UPLOAD_SIZE_MB = 50

# ── Whisper Settings ─────────────────────────────────────────────────────────
WHISPER_MODEL_OPTIONS = [
    "tiny", "tiny.en", "base", "base.en",
    "small", "small.en", "medium", "medium.en",
    "large", "turbo",
]
DEFAULT_WHISPER_MODEL_INDEX = 1

WHISPER_LANGUAGE_OPTIONS = {
    "es": "Español",
    "en": "English",
}
DEFAULT_WHISPER_LANGUAGE = "es"

# ── App Metadata ─────────────────────────────────────────────────────────────
APP_TITLE = "OphthalmoCapture"
APP_ICON = "👁️"
APP_SUBTITLE = "Sistema de Etiquetado Médico Oftalmológico"

# ── UI Language ──────────────────────────────────────────────────────────────
# "es" = Español, "en" = English
UI_LANGUAGE = "es"
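Components such as labeler.py resolve a display label back to its numeric code by scanning LABEL_OPTIONS. A minimal sketch of that lookup, assuming the structure defined above (the helper name is illustrative):

```python
LABEL_OPTIONS = [
    {"key": "catarata", "display": "Catarata", "code": 1},
    {"key": "no_catarata", "display": "No Catarata", "code": 0},
]

def code_for_display(display_label):
    """Return the numeric code for a display label, or None if unknown."""
    for opt in LABEL_OPTIONS:
        if opt["display"] == display_label:
            return opt["code"]
    return None

catarata_code = code_for_display("Catarata")
unknown = code_for_display("Glaucoma")  # not configured yet
```

Because the mapping lives only in this list, adding a new class (say glaucoma) is a one-line config change and every component picks it up without code edits.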
interface/database.py
CHANGED

@@ -1,7 +1,12 @@
 import os
 import datetime
 import sqlite3
-import math

 # Try importing firebase_admin
 try:
@@ -12,12 +17,14 @@ except ImportError:
     FIREBASE_AVAILABLE = False

 DB_TYPE = "SQLITE"
 db_ref = None

 def init_db():
-    """
     global DB_TYPE, db_ref
-
     # Try Firebase first
     if FIREBASE_AVAILABLE and os.path.exists("serviceAccountKey.json"):
         try:
@@ -32,14 +39,25 @@ def init_db():

     # Fallback to SQLite
     try:
-        conn = sqlite3.connect(
         c = conn.cursor()
-        c.execute('''CREATE TABLE IF NOT EXISTS
-
-
-
-
-
         conn.commit()
         conn.close()
         DB_TYPE = "SQLITE"
@@ -47,123 +65,390 @@ def init_db():
     except Exception as e:
         raise Exception(f"Database initialization failed: {e}")

-
-
     timestamp = datetime.datetime.now()
-
     if DB_TYPE == "FIREBASE":
-        db_ref.collection("
-            "
-            "
             "createdAt": timestamp,
-            "doctor": doctor_name
         })
     else:
-        conn = sqlite3.connect(
         c = conn.cursor()
-
-
         conn.commit()
         conn.close()

-
-
     if DB_TYPE == "FIREBASE":
-        docs =
-            .
-            .
-            .
             .stream()
         for doc in docs:
-            return doc.to_dict()
     else:
-        conn = sqlite3.connect(
         c = conn.cursor()
-        c.execute(
         row = c.fetchone()
         conn.close()
         if row:
-            return
-

 def get_history_paginated(search_query="", page=1, per_page=10):
-    """
-
     Returns: (list_of_items, total_count)
     """
     offset = (page - 1) * per_page
     history = []
     total_count = 0
-
     if DB_TYPE == "FIREBASE":
-        ref = db_ref.collection("
         if search_query:
-
-
-
         else:
             query = ref.order_by("createdAt", direction=firestore.Query.DESCENDING)
-
         all_docs = list(query.stream())
         total_count = len(all_docs)
-
-        # In-memory pagination for Firebase
-        start = offset
-        end = offset + per_page
-        for doc in all_docs[start:end]:
             history.append(doc.to_dict())

     else:
-
-        conn = sqlite3.connect('local_diagnoses.db', check_same_thread=False)
         c = conn.cursor()
-
-        #
         if search_query:
-            c.execute(
         else:
-            c.execute("SELECT COUNT(*) FROM
         total_count = c.fetchone()[0]
-
-        #
-
         params = []
-
         if search_query:
-
             params.append(f"%{search_query}%")
-
-        query_sql += " ORDER BY id DESC LIMIT ? OFFSET ?"
         params.extend([per_page, offset])
-
-        c.execute(
-
-        for row in rows:
             history.append({
-            "
-            "
-            "
             })
         conn.close()
-
     return history, total_count

-
-
     if DB_TYPE == "FIREBASE":
-        docs = db_ref.collection("
-
-
-        .stream()
         for doc in docs:
-
     else:
-        conn = sqlite3.connect(
         c = conn.cursor()
-
-        c.
-
         conn.close()
-
-
-
| 1 |
+
"""OphthalmoCapture — Database Layer (Metadata Only)
|
| 2 |
+
|
| 3 |
+
Option B: The database persists annotation metadata (labels, transcriptions,
|
| 4 |
+
doctor info, timestamps) for audit and history. It NEVER stores images or audio.
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
import os
|
| 8 |
import datetime
|
| 9 |
import sqlite3
|
|
|
|
| 10 |
|
| 11 |
# Try importing firebase_admin
|
| 12 |
try:
|
|
|
|
| 17 |
FIREBASE_AVAILABLE = False
|
| 18 |
|
| 19 |
DB_TYPE = "SQLITE"
|
| 20 |
+
DB_FILE = "annotations.db"
|
| 21 |
db_ref = None
|
| 22 |
|
| 23 |
+
|
| 24 |
def init_db():
|
| 25 |
+
"""Initialize the database connection (Firebase or SQLite fallback)."""
|
| 26 |
global DB_TYPE, db_ref
|
| 27 |
+
|
| 28 |
# Try Firebase first
|
| 29 |
if FIREBASE_AVAILABLE and os.path.exists("serviceAccountKey.json"):
|
| 30 |
try:
|
|
|
|
| 39 |
|
| 40 |
# Fallback to SQLite
|
| 41 |
try:
|
| 42 |
+
conn = sqlite3.connect(DB_FILE, check_same_thread=False)
|
| 43 |
c = conn.cursor()
|
| 44 |
+
c.execute('''CREATE TABLE IF NOT EXISTS annotations (
|
| 45 |
+
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
| 46 |
+
image_filename TEXT NOT NULL,
|
| 47 |
+
label TEXT,
|
| 48 |
+
transcription TEXT,
|
| 49 |
+
doctor_name TEXT DEFAULT '',
|
| 50 |
+
created_at DATETIME
|
| 51 |
+
)''')
|
| 52 |
+
c.execute('''CREATE INDEX IF NOT EXISTS idx_ann_filename
|
| 53 |
+
ON annotations (image_filename)''')
|
| 54 |
+
# Migration: add session_id column if it doesn't exist yet
|
| 55 |
+
try:
|
| 56 |
+
c.execute("ALTER TABLE annotations ADD COLUMN session_id TEXT DEFAULT ''")
|
| 57 |
+
except sqlite3.OperationalError:
|
| 58 |
+
pass # column already exists
|
| 59 |
+
c.execute('''CREATE INDEX IF NOT EXISTS idx_ann_session
|
| 60 |
+
ON annotations (image_filename, session_id)''')
|
| 61 |
conn.commit()
|
| 62 |
conn.close()
|
| 63 |
DB_TYPE = "SQLITE"
|
|
|
|
| 65 |
except Exception as e:
|
| 66 |
raise Exception(f"Database initialization failed: {e}")
|
| 67 |
|
| 68 |
+
|
+def save_annotation(image_filename, label, transcription, doctor_name=""):
+    """Save an annotation record (always INSERT). Stores metadata only."""
     timestamp = datetime.datetime.now()
+
     if DB_TYPE == "FIREBASE":
+        db_ref.collection("annotations").add({
+            "imageFilename": image_filename,
+            "label": label,
+            "transcription": transcription,
+            "doctorName": doctor_name,
             "createdAt": timestamp,
         })
     else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
+        c = conn.cursor()
+        c.execute(
+            "INSERT INTO annotations "
+            "(image_filename, label, transcription, doctor_name, created_at) "
+            "VALUES (?, ?, ?, ?, ?)",
+            (image_filename, label, transcription, doctor_name, timestamp),
+        )
+        conn.commit()
+        conn.close()
+
+
+def save_or_update_annotation(
+    image_filename, label, transcription, doctor_name="", session_id=""
+):
+    """Upsert: within the same session, keep only ONE record per image.
+
+    If a record for (image_filename, session_id) already exists → UPDATE it.
+    Otherwise → INSERT a new one.
+    """
+    timestamp = datetime.datetime.now()
+
+    if DB_TYPE == "FIREBASE":
+        # Query for existing doc with matching filename + session
+        docs = list(
+            db_ref.collection("annotations")
+            .where("imageFilename", "==", image_filename)
+            .where("sessionId", "==", session_id)
+            .limit(1)
+            .stream()
+        )
+        if docs:
+            docs[0].reference.update({
+                "label": label,
+                "transcription": transcription,
+                "doctorName": doctor_name,
+                "createdAt": timestamp,
+            })
+        else:
+            db_ref.collection("annotations").add({
+                "imageFilename": image_filename,
+                "label": label,
+                "transcription": transcription,
+                "doctorName": doctor_name,
+                "sessionId": session_id,
+                "createdAt": timestamp,
+            })
+    else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
         c = conn.cursor()
+        # Check if a row for this image+session already exists
+        c.execute(
+            "SELECT id FROM annotations "
+            "WHERE image_filename = ? AND session_id = ? LIMIT 1",
+            (image_filename, session_id),
+        )
+        row = c.fetchone()
+        if row:
+            c.execute(
+                "UPDATE annotations "
+                "SET label = ?, transcription = ?, doctor_name = ?, created_at = ? "
+                "WHERE id = ?",
+                (label, transcription, doctor_name, timestamp, row[0]),
+            )
+        else:
+            c.execute(
+                "INSERT INTO annotations "
+                "(image_filename, label, transcription, doctor_name, created_at, session_id) "
+                "VALUES (?, ?, ?, ?, ?, ?)",
+                (image_filename, label, transcription, doctor_name, timestamp, session_id),
+            )
         conn.commit()
         conn.close()

+
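The SQLite branch of `save_or_update_annotation` implements the upsert as a SELECT followed by either UPDATE or INSERT. The same semantics can be sketched in isolation (schema reduced to the key columns; filenames and labels below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE annotations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_filename TEXT, session_id TEXT, label TEXT)""")

def upsert(filename, session_id, label):
    # One row per (image, session): UPDATE if present, else INSERT.
    c = conn.cursor()
    c.execute(
        "SELECT id FROM annotations "
        "WHERE image_filename = ? AND session_id = ? LIMIT 1",
        (filename, session_id),
    )
    row = c.fetchone()
    if row:
        c.execute("UPDATE annotations SET label = ? WHERE id = ?", (label, row[0]))
    else:
        c.execute(
            "INSERT INTO annotations (image_filename, session_id, label) "
            "VALUES (?, ?, ?)",
            (filename, session_id, label),
        )
    conn.commit()

upsert("fundus_001.png", "s1", "glaucoma")
upsert("fundus_001.png", "s1", "normal")    # same session: overwrites
upsert("fundus_001.png", "s2", "glaucoma")  # new session: new row
rows = conn.execute("SELECT session_id, label FROM annotations ORDER BY id").fetchall()
```

SQLite also offers `INSERT ... ON CONFLICT DO UPDATE`, but that would require a UNIQUE index on `(image_filename, session_id)`, which the schema above does not declare; the SELECT-then-write form works without one.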
+def get_latest_annotation(image_filename):
+    """Retrieve the most recent annotation for a given image filename."""
     if DB_TYPE == "FIREBASE":
+        docs = (
+            db_ref.collection("annotations")
+            .where("imageFilename", "==", image_filename)
+            .order_by("createdAt", direction=firestore.Query.DESCENDING)
+            .limit(1)
             .stream()
+        )
         for doc in docs:
+            return doc.to_dict()
+        return None
     else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
         c = conn.cursor()
+        c.execute(
+            "SELECT image_filename, label, transcription, doctor_name, created_at "
+            "FROM annotations WHERE image_filename = ? ORDER BY id DESC LIMIT 1",
+            (image_filename,),
+        )
         row = c.fetchone()
         conn.close()
         if row:
+            return {
+                "imageFilename": row[0],
+                "label": row[1],
+                "transcription": row[2],
+                "doctorName": row[3],
+                "createdAt": row[4],
+            }
+        return None
+

 def get_history_paginated(search_query="", page=1, per_page=10):
+    """Retrieve annotation history with search and pagination.
+
     Returns: (list_of_items, total_count)
     """
     offset = (page - 1) * per_page
     history = []
     total_count = 0
+
     if DB_TYPE == "FIREBASE":
+        ref = db_ref.collection("annotations")
         if search_query:
+            query = (
+                ref.where("imageFilename", ">=", search_query)
+                .where("imageFilename", "<=", search_query + "\uf8ff")
+            )
         else:
             query = ref.order_by("createdAt", direction=firestore.Query.DESCENDING)
+
         all_docs = list(query.stream())
         total_count = len(all_docs)
+        for doc in all_docs[offset : offset + per_page]:
             history.append(doc.to_dict())

     else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
         c = conn.cursor()
+
+        # Count
         if search_query:
+            c.execute(
+                "SELECT COUNT(*) FROM annotations WHERE image_filename LIKE ?",
+                (f"%{search_query}%",),
+            )
         else:
+            c.execute("SELECT COUNT(*) FROM annotations")
         total_count = c.fetchone()[0]
+
+        # Fetch page
+        sql = (
+            "SELECT image_filename, label, transcription, doctor_name, created_at "
+            "FROM annotations"
+        )
         params = []
         if search_query:
+            sql += " WHERE image_filename LIKE ?"
             params.append(f"%{search_query}%")
+        sql += " ORDER BY id DESC LIMIT ? OFFSET ?"
         params.extend([per_page, offset])
+
+        c.execute(sql, params)
+        for row in c.fetchall():
             history.append({
+                "imageFilename": row[0],
+                "label": row[1],
+                "transcription": row[2],
+                "doctorName": row[3],
+                "createdAt": row[4],
             })
         conn.close()
+
     return history, total_count

+
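`get_history_paginated` pages with `LIMIT ? OFFSET ?`, deriving the offset from a 1-indexed page number as `(page - 1) * per_page`. The arithmetic in isolation, using a plain list as a stand-in for the query result:

```python
import math

def page_bounds(page, per_page):
    # 1-indexed page number -> (LIMIT, OFFSET) for SQL paging
    offset = (page - 1) * per_page
    return per_page, offset

items = list(range(23))                 # stand-in for 23 annotation rows
per_page, offset = page_bounds(3, 10)   # page 3 of 10-per-page
page3 = items[offset:offset + per_page]
total_pages = math.ceil(len(items) / per_page)
```

With 23 rows and 10 per page, page 3 holds the last 3 rows and the page count is 3, matching what the sidebar pager computes from `total_count`.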
+def get_annotation_stats():
+    """Get summary statistics of all stored annotations."""
     if DB_TYPE == "FIREBASE":
+        docs = list(db_ref.collection("annotations").stream())
+        total = len(docs)
+        labels = {}
         for doc in docs:
+            lbl = doc.to_dict().get("label", "sin_etiqueta")
+            labels[lbl] = labels.get(lbl, 0) + 1
+        return {"total": total, "by_label": labels}
     else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
         c = conn.cursor()
+        c.execute("SELECT COUNT(*) FROM annotations")
+        total = c.fetchone()[0]
+        c.execute("SELECT label, COUNT(*) FROM annotations GROUP BY label")
+        labels = {row[0]: row[1] for row in c.fetchall()}
         conn.close()
+        return {"total": total, "by_label": labels}
+
+
+def get_previously_labeled_filenames(filenames: list[str]) -> dict[str, list[dict]]:
+    """Check which filenames have been previously annotated in the DB.
+
+    Returns a dict mapping filename → list of annotation records.
+    Only filenames with at least one record are included.
+    """
+    if not filenames:
+        return {}
+
+    result = {}
+
+    if DB_TYPE == "FIREBASE":
+        # Firestore doesn't support 'IN' with >30 items, so batch
+        for fname in filenames:
+            docs = (
+                db_ref.collection("annotations")
+                .where("imageFilename", "==", fname)
+                .order_by("createdAt", direction=firestore.Query.DESCENDING)
+                .stream()
+            )
+            records = [doc.to_dict() for doc in docs]
+            if records:
+                result[fname] = records
+    else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
+        c = conn.cursor()
+        placeholders = ",".join("?" for _ in filenames)
+        c.execute(
+            f"SELECT image_filename, label, transcription, doctor_name, created_at "
+            f"FROM annotations WHERE image_filename IN ({placeholders}) "
+            f"ORDER BY created_at DESC",
+            filenames,
+        )
+        for row in c.fetchall():
+            fname = row[0]
+            record = {
+                "imageFilename": row[0],
+                "label": row[1],
+                "transcription": row[2],
+                "doctorName": row[3],
+                "createdAt": row[4],
+            }
+            result.setdefault(fname, []).append(record)
+        conn.close()
+
+    return result
+
+
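Because `sqlite3` has no native list binding, `get_previously_labeled_filenames` builds one `?` placeholder per filename for the `IN (...)` clause. The idiom on its own (table contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotations (image_filename TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?)",
    [("a.png", "x"), ("b.png", "y"), ("c.png", "z")],
)

filenames = ["a.png", "c.png", "missing.png"]
placeholders = ",".join("?" for _ in filenames)  # -> "?,?,?"
rows = conn.execute(
    f"SELECT image_filename, label FROM annotations "
    f"WHERE image_filename IN ({placeholders})",
    filenames,  # one bound parameter per placeholder
).fetchall()
```

Only the filenames that exist come back; names absent from the table are simply not matched. Note that SQLite caps the number of bound variables (999 in older builds), so very long filename lists would need chunking.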
+def get_all_annotations_for_file(image_filename: str) -> list[dict]:
+    """Retrieve ALL annotations for a given image filename, ordered by date desc."""
+    if DB_TYPE == "FIREBASE":
+        docs = (
+            db_ref.collection("annotations")
+            .where("imageFilename", "==", image_filename)
+            .order_by("createdAt", direction=firestore.Query.DESCENDING)
+            .stream()
+        )
+        return [doc.to_dict() for doc in docs]
+    else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
+        c = conn.cursor()
+        c.execute(
+            "SELECT image_filename, label, transcription, doctor_name, created_at "
+            "FROM annotations WHERE image_filename = ? ORDER BY created_at DESC",
+            (image_filename,),
+        )
+        results = []
+        for row in c.fetchall():
+            results.append({
+                "imageFilename": row[0],
+                "label": row[1],
+                "transcription": row[2],
+                "doctorName": row[3],
+                "createdAt": row[4],
+            })
+        conn.close()
+        return results
+
+
+def get_history_grouped(search_query="", page=1, per_page=10):
+    """Retrieve annotation history GROUPED by image filename.
+
+    Returns: (list_of_groups, total_unique_images)
+    Each group = {"imageFilename": str, "annotations": [list of records]}
+    sorted by most recent annotation date per image.
+    """
+    offset = (page - 1) * per_page
+
+    if DB_TYPE == "FIREBASE":
+        ref = db_ref.collection("annotations")
+        if search_query:
+            query = (
+                ref.where("imageFilename", ">=", search_query)
+                .where("imageFilename", "<=", search_query + "\uf8ff")
+            )
+        else:
+            query = ref.order_by("createdAt", direction=firestore.Query.DESCENDING)
+
+        all_docs = [doc.to_dict() for doc in query.stream()]
+
+        # Group by filename
+        grouped = {}
+        for doc in all_docs:
+            fname = doc.get("imageFilename", "")
+            grouped.setdefault(fname, []).append(doc)
+
+        # Sort groups by most recent annotation
+        sorted_groups = sorted(
+            grouped.items(),
+            key=lambda x: max(str(a.get("createdAt", "")) for a in x[1]),
+            reverse=True,
+        )
+
+        total_unique = len(sorted_groups)
+        page_groups = sorted_groups[offset:offset + per_page]
+
+        result = []
+        for fname, annotations in page_groups:
+            result.append({
+                "imageFilename": fname,
+                "annotations": sorted(
+                    annotations,
+                    key=lambda a: str(a.get("createdAt", "")),
+                    reverse=True,
+                ),
+            })
+
+        return result, total_unique
+    else:
+        conn = sqlite3.connect(DB_FILE, check_same_thread=False)
+        c = conn.cursor()
+
+        # Count unique filenames
+        where = ""
+        params = []
+        if search_query:
+            where = " WHERE image_filename LIKE ?"
+            params.append(f"%{search_query}%")
+
+        c.execute(
+            f"SELECT COUNT(DISTINCT image_filename) FROM annotations{where}",
+            params,
+        )
+        total_unique = c.fetchone()[0]
+
+        # Get unique filenames for this page, sorted by most recent
+        c.execute(
+            f"SELECT image_filename, MAX(created_at) as latest "
+            f"FROM annotations{where} "
+            f"GROUP BY image_filename ORDER BY latest DESC "
+            f"LIMIT ? OFFSET ?",
+            params + [per_page, offset],
+        )
+        page_filenames = [row[0] for row in c.fetchall()]
+
+        # Fetch all annotations for those filenames
+        result = []
+        for fname in page_filenames:
+            c.execute(
+                "SELECT image_filename, label, transcription, doctor_name, created_at "
+                "FROM annotations WHERE image_filename = ? ORDER BY created_at DESC",
+                (fname,),
+            )
+            annotations = []
+            for row in c.fetchall():
+                annotations.append({
+                    "imageFilename": row[0],
+                    "label": row[1],
+                    "transcription": row[2],
+                    "doctorName": row[3],
+                    "createdAt": row[4],
+                })
+            result.append({
+                "imageFilename": fname,
+                "annotations": annotations,
+            })
+
+        conn.close()
+        return result, total_unique
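The Firestore branch of `get_history_grouped` has to group client-side: `setdefault` collects records per filename, and groups are then ordered by their most recent `createdAt`. That two-step pattern, sketched with illustrative records:

```python
records = [
    {"imageFilename": "a.png", "createdAt": "2024-01-02"},
    {"imageFilename": "b.png", "createdAt": "2024-01-05"},
    {"imageFilename": "a.png", "createdAt": "2024-01-04"},
]

# Step 1: bucket records by filename
grouped = {}
for rec in records:
    grouped.setdefault(rec["imageFilename"], []).append(rec)

# Step 2: order groups by their newest annotation, newest first
ordered = sorted(
    grouped.items(),
    key=lambda kv: max(r["createdAt"] for r in kv[1]),
    reverse=True,
)
names = [fname for fname, _ in ordered]
```

`b.png` sorts first because its single annotation (Jan 5) is newer than `a.png`'s newest (Jan 4). The SQLite branch gets the same ordering for free from `GROUP BY ... ORDER BY MAX(created_at) DESC`.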
interface/i18n.py
ADDED
@@ -0,0 +1,182 @@
+"""OphthalmoCapture — Internationalization (i18n)
+
+Centralized UI strings. Switch the active language by changing
+``ACTIVE_LANGUAGE``. All components import strings from here.
+"""
+
+ACTIVE_LANGUAGE = "es"
+
+_STRINGS = {
+    "es": {
+        # App
+        "app_subtitle": "Sistema de Etiquetado Médico Oftalmológico",
+        # Sidebar
+        "settings": "⚙️ Configuración",
+        "doctor_name": "👨⚕️ Nombre del Doctor",
+        "whisper_model": "Modelo Whisper",
+        "dictation_language": "Idioma de dictado",
+        "current_session": "📊 Sesión Actual",
+        "db_type": "Base de datos",
+        "images_loaded": "Imágenes cargadas",
+        "labeled_count": "Etiquetadas",
+        "no_images": "No hay imágenes en la sesión.",
+        "history": "🗄️ Historial",
+        "search_image": "🔍 Buscar por imagen",
+        "no_records": "Sin registros.",
+        "label_header": "Etiqueta",
+        "doctor_header": "Doctor",
+        "no_transcription": "Sin transcripción",
+        "end_session": "🗑️ Finalizar Sesión",
+        "undownloaded_warning": "⚠️ Datos no descargados",
+        "timeout_in": "⏱️ Timeout en",
+        "confirm_delete": "¿Está seguro? **Todos los datos se eliminarán permanentemente.**",
+        "yes_delete": "✅ Sí, eliminar",
+        "cancel": "❌ Cancelar",
+        "logout": "🚪 Cerrar sesión",
+        # Upload
+        "upload_images": "📤 Subir imágenes médicas",
+        "upload_help_formats": "Formatos aceptados",
+        "upload_help_max": "Máx.",
+        "invalid_files": "archivo(s) no son imágenes válidas y fueron ignorados.",
+        "duplicate_files": "archivo(s) duplicados fueron omitidos.",
+        "upload_prompt": "📤 Suba imágenes médicas para comenzar el etiquetado.",
+        # Gallery
+        "progress": "Progreso",
+        "labeled_suffix": "etiquetadas",
+        "page": "Página",
+        # Labeler
+        "labeling": "🏷️ Etiquetado",
+        "select_label": "— Seleccione una etiqueta —",
+        "classification": "Clasificación de la imagen",
+        "unlabeled": "🔴 Sin etiquetar",
+        "label_set": "🟢 Etiqueta",
+        "code": "código",
+        "save_label": "💾 Guardar etiqueta en historial",
+        "select_before_save": "Seleccione una etiqueta antes de guardar.",
+        "label_saved": "✅ Etiqueta guardada en la base de datos.",
+        "save_error": "Error al guardar",
+        # Recorder
+        "dictation": "🎙️ Dictado y Transcripción",
+        "record_audio": "Grabar audio",
+        "transcribing": "Transcribiendo audio…",
+        "transcription_editable": "Transcripción (editable)",
+        "transcription_placeholder": "Grabe un audio o escriba la transcripción manualmente…",
+        "segments_timestamps": "🕐 Segmentos con timestamps",
+        "restore_original": "🔄 Restaurar original",
+        "clear_text": "🗑️ Limpiar texto",
+        "words": "palabras",
+        "manually_modified": "✏️ _modificada manualmente_",
+        "no_transcription_yet": "Sin transcripción aún.",
+        # Downloader
+        "download": "📥 Descarga",
+        "current_image": "Imagen actual",
+        "label_to_enable": "Etiquete la imagen para habilitar la descarga individual.",
+        "download_label": "⬇️ Descargar etiquetado",
+        "full_session": "Toda la sesión",
+        "images_metric": "Imágenes",
+        "with_audio": "Con audio",
+        "labeled_metric": "Etiquetadas",
+        "with_transcription": "Con transcripción",
+        "unlabeled_warning": "imagen(es) sin etiquetar. Se incluirán en la descarga pero sin etiqueta.",
+        "no_images_download": "No hay imágenes para descargar.",
+        "download_all": "⬇️ Descargar todo el etiquetado (ZIP)",
+        "ml_formats": "Formatos para ML",
+        "hf_csv": "📊 CSV (HuggingFace)",
+        "jsonl_finetune": "📄 JSONL (Fine-tuning)",
+        # Nav
+        "previous": "⬅️ Anterior",
+        "next": "Siguiente ➡️",
+        "delete_image": "🗑️ Eliminar esta imagen",
+        # Timeout
+        "session_expired_data": "⏰ Sesión expirada por inactividad",
+        "session_expired_clean": "⏰ Sesión expirada por inactividad. Se inició una nueva sesión.",
+        "download_before_expire": "Descargue sus datos antes de que expire la sesión la próxima vez.",
+        # Auth
+        "login_prompt": "👨⚕️ Inicie sesión para acceder al sistema de etiquetado.",
+        "login_error": "❌ Usuario o contraseña incorrectos.",
+    },
+    "en": {
+        "app_subtitle": "Ophthalmological Medical Labeling System",
+        "settings": "⚙️ Settings",
+        "doctor_name": "👨⚕️ Doctor Name",
+        "whisper_model": "Whisper Model",
+        "dictation_language": "Dictation Language",
+        "current_session": "📊 Current Session",
+        "db_type": "Database",
+        "images_loaded": "Images loaded",
+        "labeled_count": "Labeled",
+        "no_images": "No images in session.",
+        "history": "🗄️ History",
+        "search_image": "🔍 Search by image",
+        "no_records": "No records.",
+        "label_header": "Label",
+        "doctor_header": "Doctor",
+        "no_transcription": "No transcription",
+        "end_session": "🗑️ End Session",
+        "undownloaded_warning": "⚠️ Undownloaded data",
+        "timeout_in": "⏱️ Timeout in",
+        "confirm_delete": "Are you sure? **All data will be permanently deleted.**",
+        "yes_delete": "✅ Yes, delete",
+        "cancel": "❌ Cancel",
+        "logout": "🚪 Log out",
+        "upload_images": "📤 Upload medical images",
+        "upload_help_formats": "Accepted formats",
+        "upload_help_max": "Max.",
+        "invalid_files": "file(s) are not valid images and were ignored.",
+        "duplicate_files": "duplicate file(s) were skipped.",
+        "upload_prompt": "📤 Upload medical images to start labeling.",
+        "progress": "Progress",
+        "labeled_suffix": "labeled",
+        "page": "Page",
+        "labeling": "🏷️ Labeling",
+        "select_label": "— Select a label —",
+        "classification": "Image classification",
+        "unlabeled": "🔴 Unlabeled",
+        "label_set": "🟢 Label",
+        "code": "code",
+        "save_label": "💾 Save label to history",
+        "select_before_save": "Select a label before saving.",
+        "label_saved": "✅ Label saved to database.",
+        "save_error": "Save error",
+        "dictation": "🎙️ Dictation & Transcription",
+        "record_audio": "Record audio",
+        "transcribing": "Transcribing audio…",
+        "transcription_editable": "Transcription (editable)",
+        "transcription_placeholder": "Record audio or type the transcription manually…",
+        "segments_timestamps": "🕐 Segments with timestamps",
+        "restore_original": "🔄 Restore original",
+        "clear_text": "🗑️ Clear text",
+        "words": "words",
+        "manually_modified": "✏️ _manually modified_",
+        "no_transcription_yet": "No transcription yet.",
+        "download": "📥 Download",
+        "current_image": "Current image",
+        "label_to_enable": "Label the image to enable individual download.",
+        "download_label": "⬇️ Download labeling",
+        "full_session": "Full session",
+        "images_metric": "Images",
+        "with_audio": "With audio",
+        "labeled_metric": "Labeled",
+        "with_transcription": "With transcription",
+        "unlabeled_warning": "unlabeled image(s). They will be included in the download without a label.",
+        "no_images_download": "No images to download.",
+        "download_all": "⬇️ Download all labeling (ZIP)",
+        "ml_formats": "ML Formats",
+        "hf_csv": "📊 CSV (HuggingFace)",
+        "jsonl_finetune": "📄 JSONL (Fine-tuning)",
+        "previous": "⬅️ Previous",
+        "next": "Next ➡️",
+        "delete_image": "🗑️ Delete this image",
+        "session_expired_data": "⏰ Session expired due to inactivity",
+        "session_expired_clean": "⏰ Session expired. A new session has started.",
+        "download_before_expire": "Download your data before the session expires next time.",
+        "login_prompt": "👨⚕️ Log in to access the labeling system.",
+        "login_error": "❌ Wrong username or password.",
+    },
+}
+
+
+def t(key: str) -> str:
+    """Return the translated string for *key* in the active language."""
+    lang_dict = _STRINGS.get(ACTIVE_LANGUAGE, _STRINGS["es"])
+    return lang_dict.get(key, key)
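`t()` degrades gracefully twice: an unknown active language falls back to the Spanish dictionary, and an unknown key falls back to the key itself, so a missing translation never crashes the UI. The lookup pattern in miniature (the strings and keys here are illustrative, not the app's full table):

```python
STRINGS = {
    "es": {"save": "Guardar"},
    "en": {"save": "Save"},
}
ACTIVE = "en"

def t(key):
    # Unknown language -> default ("es") dict; unknown key -> the key itself
    lang = STRINGS.get(ACTIVE, STRINGS["es"])
    return lang.get(key, key)
```

Returning the key as a last resort also makes untranslated strings easy to spot in the rendered UI, since the raw key shows up verbatim.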
interface/main.py
CHANGED
@@ -1,108 +1,183 @@
 import os
-# CRITICAL FIX: MUST BE THE FIRST LINE
 os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

 import streamlit as st
-import tempfile
 import math
 import database as db
 import utils

-#
-st.set_page_config(
-#
-
-IMAGE_FOLDER = "full-fundus"  # Folder containing your images

-#
-
 try:
     active_db_type = db.init_db()
 except Exception as e:
-    st.error(f"
     st.stop()

-#
-
-    st.session_state.dataset = utils.load_dataset(CSV_FILE_PATH, IMAGE_FOLDER)

-#
-
-    st.
-
-# SIDEBAR: SETTINGS & HISTORY
-with st.sidebar:
-    st.title("⚙️ Settings")
-
-    # Model Selector
-    model_options = ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large", "turbo"]
-    selected_model = st.selectbox("Whisper Model Size", model_options, index=1)
-
     st.divider()
-
-    #
-
-    if
         st.session_state.history_search = search_input
         st.session_state.history_page = 1
         st.rerun()

-    if
         st.session_state.history_page = 1
-
     ITEMS_PER_PAGE = 5
     try:
-
-        st.session_state.get(
-        st.session_state.history_page,
-        ITEMS_PER_PAGE
     )
     except Exception as e:
-        st.error(f"Error
-
-    if not
-        st.
     else:
-        for
     if total_pages > 1:
-        st.divider()
         c1, c2, c3 = st.columns([1, 2, 1])
         with c1:
             if st.session_state.history_page > 1:
@@ -110,118 +185,145 @@ with st.sidebar:
                 st.session_state.history_page -= 1
                 st.rerun()
         with c2:
-            st.markdown(
         with c3:
             if st.session_state.history_page < total_pages:
                 if st.button("▶️"):
                     st.session_state.history_page += 1
                     st.rerun()

-
-    with st.spinner(f"Loading Whisper '{selected_model}' model..."):
-        model = utils.load_whisper_model(selected_model)

-    #
-    if
-
-        for i, item in enumerate(DATASET):
-            if str(item["id"]) == str(last_id):
-                start_index = i
-                break
-    except Exception as e:
-        print(f"Could not restore session: {e}")
-
-    st.session_state.img_index = start_index

-    st.
 with col_img:
-    st.image(
     # Navigation
     c1, c2, c3 = st.columns([1, 2, 1])
     with c1:
-        if st.button("⬅️
             st.rerun()
     with c2:
-        st.markdown(
     with c3:
-        if st.button("
             st.rerun()

-    st.
-            if st.session_state.current_transcription:
-                st.session_state.current_transcription += " " + new_text
-            else:
-                st.session_state.current_transcription = new_text
-
-            st.session_state.last_processed_audio = audio_wav
-            os.remove(tmp_path)
-        except Exception as e:
-            st.error(f"Transcription Error: {e}")
-
-    diagnosis_text = st.text_area(
-        "Findings:",
-        value=st.session_state.current_transcription,
-        height=300
-    )
-
-    if diagnosis_text != st.session_state.current_transcription:
-        st.session_state.current_transcription = diagnosis_text
-
-    if st.button("💾 Save to Record", type="primary"):
-        if diagnosis_text.strip():
-            try:
-                db.save_diagnosis(current_img['id'], diagnosis_text)
-                st.success("Successfully saved to database.")
-            except Exception as e:
-                st.error(f"Save failed: {e}")
-        else:
-            st.warning("Cannot save empty diagnosis.")
 import os
+# CRITICAL FIX: MUST BE THE FIRST LINE
 os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

 import streamlit as st
 import math
+import config
 import database as db
 import utils
+import i18n
+from services import session_manager as sm
+from services.whisper_service import load_whisper_model
+from components.uploader import render_uploader
+from components.gallery import render_gallery
+from components.labeler import render_labeler
+from components.recorder import render_recorder
+from components.downloader import render_downloader
+from components.image_protection import inject_image_protection
+from services.auth_service import require_auth, render_logout_button

+# ── PAGE CONFIG ──────────────────────────────────────────────────────────────
+st.set_page_config(
+    page_title=config.APP_TITLE,
+    layout="wide",
+    page_icon=config.APP_ICON,
+)
+# ── AUTHENTICATION GATE ───────────────────────────────────────────────────────
+if not require_auth():
+    st.stop()

+# ── IMAGE PROTECTION (prevent download / right-click save) ───────────────────
+inject_image_protection()

+# Set UI language from config
+i18n.ACTIVE_LANGUAGE = config.UI_LANGUAGE
+# ── SESSION INITIALIZATION ──────────────────────────────────────────────────
+sm.init_session()
+
+# Check inactivity timeout
+if sm.check_session_timeout(config.SESSION_TIMEOUT_MINUTES):
+    if sm.has_undownloaded_data():
+        summary = sm.get_session_data_summary()
+        st.warning(
+            f"⏰ Sesión expirada por inactividad ({config.SESSION_TIMEOUT_MINUTES} min). "
+            f"Se eliminaron **{summary['total']}** imágenes, "
+            f"**{summary['labeled']}** etiquetadas, "
+            f"**{summary['with_audio']}** con audio. "
+            "Descargue sus datos antes de que expire la sesión la próxima vez."
+        )
+    else:
+        st.info("⏰ Sesión expirada por inactividad. Se inició una nueva sesión.")
+    sm.clear_session()
+    sm.init_session()

+# ── DATABASE (metadata only — never images or audio) ────────────────────────
+utils.setup_env()
 try:
     active_db_type = db.init_db()
 except Exception as e:
+    st.error(f"Error crítico de base de datos: {e}")
     st.stop()

+# ── SIDEBAR ──────────────────────────────────────────────────────────────────
+with st.sidebar:
+    st.title("⚙️ Configuración")

+    # Logout button (only visible if auth is active)
+    render_logout_button()

+    # Doctor name
+    doctor = st.text_input(
+        "👨⚕️ Nombre del Doctor",
+        value=st.session_state.get("doctor_name", ""),
+    )
+    if doctor != st.session_state.get("doctor_name", ""):
+        st.session_state.doctor_name = doctor
+
+    st.divider()
+
+    # Whisper language (select FIRST so models can be filtered)
+    lang_keys = list(config.WHISPER_LANGUAGE_OPTIONS.keys())
+    lang_labels = list(config.WHISPER_LANGUAGE_OPTIONS.values())
+    selected_lang_display = st.selectbox("Idioma de dictado", lang_labels, index=0)
+    selected_language = lang_keys[lang_labels.index(selected_lang_display)]
+
+    # Whisper model — filtered by selected language
+    # Models ending in ".en" → English only. Others → multilingual.
+    # "large" and "turbo" are multilingual and work for all languages.
+    if selected_language == "en":
+        available_models = [
|
| 91 |
+
m for m in config.WHISPER_MODEL_OPTIONS
|
| 92 |
+
if m.endswith(".en") or m in ("large", "turbo")
|
| 93 |
+
]
|
| 94 |
+
else:
|
| 95 |
+
available_models = [
|
| 96 |
+
m for m in config.WHISPER_MODEL_OPTIONS if not m.endswith(".en")
|
| 97 |
+
]
|
| 98 |
+
selected_model = st.selectbox(
|
| 99 |
+
"Modelo Whisper",
|
| 100 |
+
available_models,
|
| 101 |
+
index=0,
|
| 102 |
+
)
|
| 103 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 104 |
st.divider()
|
| 105 |
+
|
| 106 |
+
# ── Session progress ─────────────────────────────────────────────────────
|
| 107 |
+
labeled, total = sm.get_labeling_progress()
|
| 108 |
+
st.subheader("📊 Sesión Actual")
|
| 109 |
+
st.caption(f"Base de datos: **{active_db_type}**")
|
| 110 |
+
if total > 0:
|
| 111 |
+
st.write(f"Imágenes cargadas: **{total}**")
|
| 112 |
+
st.write(f"Etiquetadas: **{labeled}** / {total}")
|
| 113 |
+
st.progress(labeled / total if total > 0 else 0)
|
| 114 |
+
else:
|
| 115 |
+
st.info("No hay imágenes en la sesión.")
|
| 116 |
+
|
| 117 |
+
st.divider()
|
| 118 |
+
|
| 119 |
+
# ── Annotation History (from DB) — Grouped by image ────────────────────────
|
| 120 |
+
st.subheader("🗄️ Historial")
|
| 121 |
+
search_input = st.text_input(
|
| 122 |
+
"🔍 Buscar por imagen",
|
| 123 |
+
value=st.session_state.get("history_search", ""),
|
| 124 |
+
)
|
| 125 |
+
if search_input != st.session_state.get("history_search", ""):
|
| 126 |
st.session_state.history_search = search_input
|
| 127 |
st.session_state.history_page = 1
|
| 128 |
st.rerun()
|
| 129 |
|
| 130 |
+
if "history_page" not in st.session_state:
|
| 131 |
st.session_state.history_page = 1
|
| 132 |
+
|
| 133 |
ITEMS_PER_PAGE = 5
|
| 134 |
try:
|
| 135 |
+
history_groups, total_items = db.get_history_grouped(
|
| 136 |
+
st.session_state.get("history_search", ""),
|
| 137 |
+
st.session_state.history_page,
|
| 138 |
+
ITEMS_PER_PAGE,
|
| 139 |
)
|
| 140 |
except Exception as e:
|
| 141 |
+
st.error(f"Error al obtener historial: {e}")
|
| 142 |
+
history_groups, total_items = [], 0
|
| 143 |
+
|
| 144 |
+
if not history_groups:
|
| 145 |
+
st.caption("Sin registros.")
|
| 146 |
else:
|
| 147 |
+
for group in history_groups:
|
| 148 |
+
fname = group["imageFilename"]
|
| 149 |
+
annotations = group["annotations"]
|
| 150 |
+
n_annotations = len(annotations)
|
| 151 |
+
latest = annotations[0]
|
| 152 |
+
latest_label = latest.get("label") or "—"
|
| 153 |
+
|
| 154 |
+
# Badge showing number of labelings
|
| 155 |
+
badge = f" ({n_annotations}x)" if n_annotations > 1 else ""
|
| 156 |
+
|
| 157 |
+
with st.expander(f"📄 {fname}{badge} — {latest_label}"):
|
| 158 |
+
for i, ann in enumerate(annotations):
|
| 159 |
+
ts = str(ann.get("createdAt", ""))[:16]
|
| 160 |
+
label = ann.get("label") or "—"
|
| 161 |
+
doctor = ann.get("doctorName") or "—"
|
| 162 |
+
text = ann.get("transcription", "") or ""
|
| 163 |
+
preview = (text[:60] + "…") if len(text) > 60 else text
|
| 164 |
+
|
| 165 |
+
if n_annotations > 1:
|
| 166 |
+
st.markdown(
|
| 167 |
+
f"**#{i + 1}** — `{ts}`"
|
| 168 |
+
)
|
| 169 |
+
st.write(f"**Etiqueta:** {label}")
|
| 170 |
+
st.write(f"**Doctor:** {doctor}")
|
| 171 |
+
if preview:
|
| 172 |
+
st.caption(f"📝 {preview}")
|
| 173 |
+
else:
|
| 174 |
+
st.caption("_Sin transcripción_")
|
| 175 |
+
|
| 176 |
+
if i < n_annotations - 1:
|
| 177 |
+
st.divider()
|
| 178 |
+
|
| 179 |
+
total_pages = max(1, math.ceil(total_items / ITEMS_PER_PAGE))
|
| 180 |
if total_pages > 1:
|
|
|
|
| 181 |
c1, c2, c3 = st.columns([1, 2, 1])
|
| 182 |
with c1:
|
| 183 |
if st.session_state.history_page > 1:
|
|
|
|
| 185 |
st.session_state.history_page -= 1
|
| 186 |
st.rerun()
|
| 187 |
with c2:
|
| 188 |
+
st.markdown(
|
| 189 |
+
f"<div style='text-align:center'>"
|
| 190 |
+
f"{st.session_state.history_page} / {total_pages}</div>",
|
| 191 |
+
unsafe_allow_html=True,
|
| 192 |
+
)
|
| 193 |
with c3:
|
| 194 |
if st.session_state.history_page < total_pages:
|
| 195 |
if st.button("▶️"):
|
| 196 |
st.session_state.history_page += 1
|
| 197 |
st.rerun()
|
| 198 |
|
| 199 |
+
st.divider()
|
|
|
|
|
|
|
| 200 |
|
| 201 |
+
# ── End session ──────────────────────────────────────────────────────────
|
| 202 |
+
if sm.has_undownloaded_data() and not st.session_state.get("session_downloaded", False):
|
| 203 |
+
summary = sm.get_session_data_summary()
|
| 204 |
+
remaining = sm.get_remaining_timeout_minutes(config.SESSION_TIMEOUT_MINUTES)
|
| 205 |
+
st.warning(
|
| 206 |
+
f"⚠️ Datos no descargados: **{summary['total']}** imágenes, "
|
| 207 |
+
f"**{summary['labeled']}** etiquetadas, "
|
| 208 |
+
f"**{summary['with_audio']}** con audio."
|
| 209 |
+
)
|
| 210 |
+
st.caption(f"⏱️ Timeout en ~{remaining:.0f} min")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 211 |
|
| 212 |
+
# Two-step confirmation to prevent accidental data loss
|
| 213 |
+
if not st.session_state.get("confirm_end_session", False):
|
| 214 |
+
if st.button(
|
| 215 |
+
"🗑️ Finalizar Sesión",
|
| 216 |
+
type="secondary",
|
| 217 |
+
use_container_width=True,
|
| 218 |
+
):
|
| 219 |
+
st.session_state.confirm_end_session = True
|
| 220 |
+
st.rerun()
|
| 221 |
+
else:
|
| 222 |
+
st.error(
|
| 223 |
+
"¿Está seguro? **Todos los datos se eliminarán permanentemente.**"
|
| 224 |
+
)
|
| 225 |
+
cc1, cc2 = st.columns(2)
|
| 226 |
+
with cc1:
|
| 227 |
+
if st.button("✅ Sí, eliminar", type="primary", use_container_width=True):
|
| 228 |
+
sm.clear_session()
|
| 229 |
+
st.rerun()
|
| 230 |
+
with cc2:
|
| 231 |
+
if st.button("❌ Cancelar", use_container_width=True):
|
| 232 |
+
st.session_state.confirm_end_session = False
|
| 233 |
+
st.rerun()
|
| 234 |
+
|
| 235 |
+
# ── LOAD WHISPER MODEL ───────────────────────────────────────────────────────
|
| 236 |
+
with st.spinner(f"Cargando modelo Whisper '{selected_model}'..."):
|
| 237 |
+
model = load_whisper_model(selected_model)
|
| 238 |
+
# ── BROWSER CLOSE GUARD (beforeunload) ───────────────────────────────────
|
| 239 |
+
# Warn the user when they try to close/reload the tab with data in session.
|
| 240 |
+
if sm.has_undownloaded_data() and not st.session_state.get("session_downloaded", False):
|
| 241 |
+
st.components.v1.html(
|
| 242 |
+
"""
|
| 243 |
+
<script>
|
| 244 |
+
window.addEventListener('beforeunload', function (e) {
|
| 245 |
+
e.preventDefault();
|
| 246 |
+
e.returnValue = '';
|
| 247 |
+
});
|
| 248 |
+
</script>
|
| 249 |
+
""",
|
| 250 |
+
height=0,
|
| 251 |
+
)
|
| 252 |
+
# ── MAIN CONTENT ───────────────────────────────���─────────────────────────────
|
| 253 |
+
st.title(f"{config.APP_ICON} {config.APP_TITLE}")
|
| 254 |
+
st.caption(config.APP_SUBTITLE)
|
| 255 |
+
|
| 256 |
+
# ── IMAGE UPLOAD ─────────────────────────────────────────────────────────────
|
| 257 |
+
new_count = render_uploader()
|
| 258 |
+
if new_count > 0:
|
| 259 |
+
st.rerun()
|
| 260 |
+
|
| 261 |
+
# ── WORKSPACE (requires at least one image) ──────────────────────────────────
|
| 262 |
+
if not st.session_state.image_order:
|
| 263 |
+
st.info("📤 Suba imágenes médicas para comenzar el etiquetado.")
|
| 264 |
+
st.stop()
|
| 265 |
|
| 266 |
+
# ── IMAGE GALLERY ────────────────────────────────────────────────────────────
|
| 267 |
+
st.divider()
|
| 268 |
+
gallery_clicked = render_gallery()
|
| 269 |
+
if gallery_clicked:
|
| 270 |
+
st.rerun()
|
| 271 |
+
st.divider()
|
| 272 |
|
| 273 |
+
# Ensure a valid current image is selected
|
| 274 |
+
current_id = st.session_state.current_image_id
|
| 275 |
+
if current_id is None or current_id not in st.session_state.images:
|
| 276 |
+
st.session_state.current_image_id = st.session_state.image_order[0]
|
| 277 |
+
current_id = st.session_state.current_image_id
|
| 278 |
|
| 279 |
+
current_img = sm.get_current_image()
|
| 280 |
+
order = st.session_state.image_order
|
| 281 |
+
current_idx = order.index(current_id)
|
| 282 |
+
|
| 283 |
+
# ── Two-column layout: Image | Tools ─────────────────────────────────────────
|
| 284 |
+
col_img, col_tools = st.columns([1.5, 1])
|
| 285 |
|
| 286 |
with col_img:
|
| 287 |
+
st.image(
|
| 288 |
+
current_img["bytes"],
|
| 289 |
+
caption=current_img["filename"],
|
| 290 |
+
use_container_width=True,
|
| 291 |
+
)
|
| 292 |
+
|
| 293 |
# Navigation
|
| 294 |
c1, c2, c3 = st.columns([1, 2, 1])
|
| 295 |
with c1:
|
| 296 |
+
if st.button("⬅️ Anterior", disabled=(len(order) <= 1)):
|
| 297 |
+
new_idx = (current_idx - 1) % len(order)
|
| 298 |
+
st.session_state.current_image_id = order[new_idx]
|
| 299 |
+
sm.update_activity()
|
| 300 |
st.rerun()
|
| 301 |
with c2:
|
| 302 |
+
st.markdown(
|
| 303 |
+
f"<div style='text-align:center'><b>{current_img['filename']}</b>"
|
| 304 |
+
f"<br>({current_idx + 1} de {len(order)})</div>",
|
| 305 |
+
unsafe_allow_html=True,
|
| 306 |
+
)
|
| 307 |
with c3:
|
| 308 |
+
if st.button("Siguiente ➡️", disabled=(len(order) <= 1)):
|
| 309 |
+
new_idx = (current_idx + 1) % len(order)
|
| 310 |
+
st.session_state.current_image_id = order[new_idx]
|
| 311 |
+
sm.update_activity()
|
| 312 |
st.rerun()
|
| 313 |
|
| 314 |
+
# Delete image from session
|
| 315 |
+
if st.button("🗑️ Eliminar esta imagen", key="delete_img"):
|
| 316 |
+
sm.remove_image(current_id)
|
| 317 |
+
sm.update_activity()
|
| 318 |
+
st.rerun()
|
| 319 |
+
|
| 320 |
+
with col_tools:
|
| 321 |
+
render_labeler(current_id)
|
| 322 |
+
|
| 323 |
+
st.divider()
|
| 324 |
+
|
| 325 |
+
render_recorder(current_id, model, selected_language)
|
| 326 |
+
|
| 327 |
+
st.divider()
|
| 328 |
+
|
| 329 |
+
render_downloader(current_id)
|
interface/services/__init__.py
ADDED
File without changes

interface/services/auth_service.py
ADDED
@@ -0,0 +1,99 @@
```python
"""OphthalmoCapture — Basic Authentication Service

Provides a simple login gate using streamlit-authenticator.
Doctors must authenticate before accessing the labeling interface.
Their name is automatically set in the session for audit trails.

If streamlit-authenticator is not installed, authentication is skipped
and the app works in "anonymous" mode.
"""

import streamlit as st

try:
    import streamlit_authenticator as stauth
    AUTH_AVAILABLE = True
except ImportError:
    AUTH_AVAILABLE = False


# ── Default credentials ──────────────────────────────────────────────────────
# In production, load these from a secure YAML/env. For now, hardcoded demo.
DEFAULT_CREDENTIALS = {
    "usernames": {
        "admin": {
            "name": "Administrador",
            "password": "$2b$12$dcvvIg0q/2hZ1pO9gBKqY./LfujFHvoJUvPDLx1qhLS0LtD2kzJoq",
            # plain: "admin123" — generate new hashes with stauth.Hasher
        },
        "doctor1": {
            "name": "Dr. García",
            "password": "$2b$12$dcvvIg0q/2hZ1pO9gBKqY./LfujFHvoJUvPDLx1qhLS0LtD2kzJoq",
            # plain: "admin123"
        },
        "doctor2": {
            "name": "Dra. López",
            "password": "$2b$12$dcvvIg0q/2hZ1pO9gBKqY./LfujFHvoJUvPDLx1qhLS0LtD2kzJoq",
            # plain: "admin123"
        },
    }
}

COOKIE_NAME = "ophthalmocapture_auth"
COOKIE_KEY = "ophthalmocapture_secret_key"
COOKIE_EXPIRY_DAYS = 1


def _get_authenticator():
    """Return a single shared Authenticate instance per session."""
    if "authenticator" not in st.session_state:
        st.session_state["authenticator"] = stauth.Authenticate(
            credentials=DEFAULT_CREDENTIALS,
            cookie_name=COOKIE_NAME,
            cookie_key=COOKIE_KEY,
            cookie_expiry_days=COOKIE_EXPIRY_DAYS,
        )
    return st.session_state["authenticator"]


def require_auth() -> bool:
    """Show login form and return True if the user is authenticated.

    If streamlit-authenticator is not installed, returns True immediately
    (anonymous mode) and sets doctor_name to empty string.
    """
    if not AUTH_AVAILABLE:
        # Graceful degradation: no auth library → anonymous mode
        return True

    authenticator = _get_authenticator()

    try:
        authenticator.login(location="main")
    except Exception:
        pass

    if st.session_state.get("authentication_status"):
        # Set doctor name from authenticated user
        username = st.session_state.get("username", "")
        user_info = DEFAULT_CREDENTIALS["usernames"].get(username, {})
        st.session_state.doctor_name = user_info.get("name", username)
        return True

    elif st.session_state.get("authentication_status") is False:
        st.error("❌ Usuario o contraseña incorrectos.")
        return False

    else:
        st.info("👨⚕️ Inicie sesión para acceder al sistema de etiquetado.")
        return False


def render_logout_button():
    """Show a logout button in the sidebar (only if auth is active)."""
    if not AUTH_AVAILABLE:
        return

    if st.session_state.get("authentication_status"):
        authenticator = _get_authenticator()
        authenticator.logout("🚪 Cerrar sesión", location="sidebar")
```
interface/services/export_service.py
ADDED
@@ -0,0 +1,202 @@
```python
"""OphthalmoCapture — Export Service

Generates in-memory ZIP packages for individual images or the full session.
Also produces ML-ready formats (HuggingFace CSV, JSONL).
Everything is built from st.session_state — nothing touches disk.
"""

import io
import csv
import json
import zipfile
import datetime
import streamlit as st


def _sanitize(name: str) -> str:
    """Remove characters not safe for ZIP entry names."""
    return "".join(c if c.isalnum() or c in "._- " else "_" for c in name)


def _image_metadata(img: dict) -> dict:
    """Build a JSON-serialisable metadata dict for one image."""
    return {
        "filename": img["filename"],
        "label": img["label"],
        "transcription": img["transcription"],
        "transcription_original": img["transcription_original"],
        "doctor": img.get("labeled_by", ""),
        "timestamp": img["timestamp"].isoformat() if img.get("timestamp") else "",
        "has_audio": img["audio_bytes"] is not None,
    }


# ── Individual export ────────────────────────────────────────────────────────

def export_single_image(image_id: str) -> tuple[bytes, str]:
    """Create a ZIP for one image's labeling data.

    Returns (zip_bytes, suggested_filename).
    """
    img = st.session_state.images[image_id]
    safe_name = _sanitize(img["filename"].rsplit(".", 1)[0])
    folder = f"etiquetado_{safe_name}"

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # metadata.json
        meta = _image_metadata(img)
        zf.writestr(f"{folder}/metadata.json", json.dumps(meta, ensure_ascii=False, indent=2))

        # transcripcion.txt
        zf.writestr(f"{folder}/transcripcion.txt", img["transcription"] or "")

        # audio_dictado.wav (if recorded)
        if img["audio_bytes"]:
            zf.writestr(f"{folder}/audio_dictado.wav", img["audio_bytes"])

    zip_bytes = buf.getvalue()
    return zip_bytes, f"{folder}.zip"


# ── Bulk export (full session) ───────────────────────────────────────────────

def export_full_session() -> tuple[bytes, str]:
    """Create a ZIP with all images' labeling data + a summary CSV.

    Returns (zip_bytes, suggested_filename).
    """
    now = datetime.datetime.now().strftime("%Y-%m-%d_%H%M")
    root = f"sesion_{now}"
    images = st.session_state.images
    order = st.session_state.image_order

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ── Summary CSV ──────────────────────────────────────────────────
        csv_buf = io.StringIO()
        writer = csv.writer(csv_buf)
        writer.writerow(["filename", "label", "has_audio", "has_transcription", "doctor"])
        for img_id in order:
            img = images[img_id]
            writer.writerow([
                img["filename"],
                img["label"] or "",
                "yes" if img["audio_bytes"] else "no",
                "yes" if img["transcription"] else "no",
                img.get("labeled_by", ""),
            ])
        zf.writestr(f"{root}/resumen.csv", csv_buf.getvalue())

        # ── Full metadata JSON ───────────────────────────────────────────
        all_meta = []
        for img_id in order:
            all_meta.append(_image_metadata(images[img_id]))
        zf.writestr(
            f"{root}/etiquetas.json",
            json.dumps(all_meta, ensure_ascii=False, indent=2),
        )

        # ── Per-image folders ────────────────────────────────────────────
        for idx, img_id in enumerate(order, start=1):
            img = images[img_id]
            safe_name = _sanitize(img["filename"].rsplit(".", 1)[0])
            img_folder = f"{root}/{idx:03d}_{safe_name}"

            meta = _image_metadata(img)
            zf.writestr(f"{img_folder}/metadata.json", json.dumps(meta, ensure_ascii=False, indent=2))
            zf.writestr(f"{img_folder}/transcripcion.txt", img["transcription"] or "")

            if img["audio_bytes"]:
                zf.writestr(f"{img_folder}/audio_dictado.wav", img["audio_bytes"])

    zip_bytes = buf.getvalue()
    return zip_bytes, f"{root}.zip"


# ── Session summary ──────────────────────────────────────────────────────────

def get_session_summary() -> dict:
    """Return a summary dict for pre-download validation."""
    images = st.session_state.images
    total = len(images)
    labeled = sum(1 for img in images.values() if img["label"] is not None)
    with_audio = sum(1 for img in images.values() if img["audio_bytes"] is not None)
    with_text = sum(1 for img in images.values() if img["transcription"])
    return {
        "total": total,
        "labeled": labeled,
        "with_audio": with_audio,
        "with_transcription": with_text,
        "unlabeled": total - labeled,
    }


# ── ML-ready export formats (Idea F) ─────────────────────────────────────────

def export_huggingface_csv() -> tuple[bytes, str]:
    """Export a CSV compatible with HuggingFace datasets.

    Columns: filename, label, label_code, transcription, doctor
    Only labeled images are included.

    Returns (csv_bytes, suggested_filename).
    """
    import config

    images = st.session_state.images
    order = st.session_state.image_order
    label_map = {opt["display"]: opt["code"] for opt in config.LABEL_OPTIONS}

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["filename", "label", "label_code", "transcription", "doctor"])

    for img_id in order:
        img = images[img_id]
        if img["label"] is None:
            continue
        writer.writerow([
            img["filename"],
            img["label"],
            label_map.get(img["label"], ""),
            img["transcription"],
            img.get("labeled_by", ""),
        ])

    csv_bytes = buf.getvalue().encode("utf-8")
    now = datetime.datetime.now().strftime("%Y%m%d_%H%M")
    return csv_bytes, f"dataset_hf_{now}.csv"


def export_jsonl() -> tuple[bytes, str]:
    """Export JSONL (one JSON object per line) suitable for LLM fine-tuning.

    Each line: {"filename", "label", "label_code", "transcription", "doctor"}
    Only labeled images are included.

    Returns (jsonl_bytes, suggested_filename).
    """
    import config

    images = st.session_state.images
    order = st.session_state.image_order
    label_map = {opt["display"]: opt["code"] for opt in config.LABEL_OPTIONS}

    lines = []
    for img_id in order:
        img = images[img_id]
        if img["label"] is None:
            continue
        obj = {
            "filename": img["filename"],
            "label": img["label"],
            "label_code": label_map.get(img["label"], ""),
            "transcription": img["transcription"],
            "doctor": img.get("labeled_by", ""),
        }
        lines.append(json.dumps(obj, ensure_ascii=False))

    jsonl_bytes = "\n".join(lines).encode("utf-8")
    now = datetime.datetime.now().strftime("%Y%m%d_%H%M")
    return jsonl_bytes, f"dataset_{now}.jsonl"
```
interface/services/session_manager.py
ADDED
@@ -0,0 +1,157 @@
| 1 |
+
"""
|
| 2 |
+
OphthalmoCapture — Ephemeral Session Manager
|
| 3 |
+
|
| 4 |
+
All image data lives exclusively in st.session_state (RAM).
|
| 5 |
+
Nothing is written to disk. Data is only persisted when the user
|
| 6 |
+
explicitly downloads their labeling package.
|
| 7 |
+
"""
|
| 8 |
+
|
| 9 |
+
import streamlit as st
|
| 10 |
+
import uuid
|
| 11 |
+
import datetime
|
| 12 |
+
import gc
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
def init_session():
|
| 16 |
+
"""Initialize the ephemeral session data model."""
|
| 17 |
+
if "session_initialized" not in st.session_state:
|
| 18 |
+
st.session_state.session_initialized = True
|
| 19 |
+
st.session_state.session_id = str(uuid.uuid4()) # unique per session
|
| 20 |
+
st.session_state.images = {} # {uuid_str: image_data_dict}
|
| 21 |
+
st.session_state.image_order = [] # [uuid_str, ...] upload order
|
| 22 |
+
st.session_state.current_image_id = None
|
| 23 |
+
st.session_state.last_activity = datetime.datetime.now()
|
| 24 |
+
st.session_state.doctor_name = ""
|
| 25 |
+
st.session_state.confirm_end_session = False
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
def add_image(filename: str, image_bytes: bytes) -> str:
|
| 29 |
+
"""Add an uploaded image to the in-memory session store.
|
| 30 |
+
|
| 31 |
+
Returns the generated UUID for the image.
|
| 32 |
+
"""
|
| 33 |
+
img_id = str(uuid.uuid4())
|
| 34 |
+
st.session_state.images[img_id] = {
|
| 35 |
+
"filename": filename,
|
| 36 |
+
"bytes": image_bytes,
|
| 37 |
+
"label": None, # Set during labeling (Phase 3)
|
| 38 |
+
"audio_bytes": None, # WAV from recording (Phase 4)
|
| 39 |
+
"transcription": "", # Editable transcription text
|
| 40 |
+
"transcription_original": "", # Original Whisper output (read-only)
|
| 41 |
+
"timestamp": datetime.datetime.now(),
|
| 42 |
+
"labeled_by": st.session_state.get("doctor_name", ""),
|
| 43 |
+
}
|
| 44 |
+
st.session_state.image_order.append(img_id)
|
| 45 |
+
update_activity()
|
| 46 |
+
return img_id
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
def remove_image(img_id: str):
|
| 50 |
+
"""Remove a single image from the session, freeing memory."""
|
| 51 |
+
if img_id in st.session_state.images:
|
| 52 |
+
# Explicitly clear heavy byte fields before deletion
|
| 53 |
+
st.session_state.images[img_id]["bytes"] = None
|
| 54 |
+
st.session_state.images[img_id]["audio_bytes"] = None
|
| 55 |
+
del st.session_state.images[img_id]
|
| 56 |
+
|
| 57 |
+
if img_id in st.session_state.image_order:
|
| 58 |
+
st.session_state.image_order.remove(img_id)
|
| 59 |
+
|
| 60 |
+
# Update current selection if the deleted image was active
|
| 61 |
+
if st.session_state.current_image_id == img_id:
|
| 62 |
+
if st.session_state.image_order:
|
| 63 |
+
st.session_state.current_image_id = st.session_state.image_order[0]
|
| 64 |
+
else:
|
| 65 |
+
st.session_state.current_image_id = None
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
def get_current_image():
|
| 69 |
+
"""Get the data dict for the currently selected image, or None."""
|
| 70 |
+
img_id = st.session_state.get("current_image_id")
|
| 71 |
+
if img_id and img_id in st.session_state.images:
|
| 72 |
+
return st.session_state.images[img_id]
|
| 73 |
+
return None
|
+
+
+def get_current_image_id():
+    """Get the UUID of the currently selected image."""
+    return st.session_state.get("current_image_id")
+
+
+def set_current_image(img_id: str):
+    """Set the currently active image by UUID."""
+    if img_id in st.session_state.images:
+        st.session_state.current_image_id = img_id
+        update_activity()
+
+
+def get_image_count() -> int:
+    """Total number of images in the session."""
+    return len(st.session_state.images)
+
+
+def get_labeling_progress():
+    """Return (labeled_count, total_count)."""
+    total = len(st.session_state.images)
+    labeled = sum(
+        1 for img in st.session_state.images.values()
+        if img["label"] is not None
+    )
+    return labeled, total
+
+
+def has_undownloaded_data() -> bool:
+    """Check if there is any data in the session."""
+    return len(st.session_state.images) > 0
+
+
+def update_activity():
+    """Update the last-activity timestamp."""
+    st.session_state.last_activity = datetime.datetime.now()
+
+
+def check_session_timeout(timeout_minutes: int = 30) -> bool:
+    """Return True if the session has exceeded the inactivity timeout."""
+    last = st.session_state.get("last_activity")
+    if last:
+        elapsed = (datetime.datetime.now() - last).total_seconds() / 60
+        return elapsed > timeout_minutes
+    return False
+
+
+def clear_session():
+    """Completely wipe all session data: images, audio, everything.
+
+    Called on explicit cleanup or session timeout.
+    """
+    # Explicitly null out heavy byte fields to help garbage collection
+    for img in st.session_state.get("images", {}).values():
+        img["bytes"] = None
+        img["audio_bytes"] = None
+    st.session_state.clear()
+    gc.collect()
+
+
+def get_remaining_timeout_minutes(timeout_minutes: int = 30) -> float:
+    """Return how many minutes remain before timeout, or 0 if already expired."""
+    last = st.session_state.get("last_activity")
+    if not last:
+        return 0.0
+    elapsed = (datetime.datetime.now() - last).total_seconds() / 60
+    remaining = timeout_minutes - elapsed
+    return max(0.0, remaining)
+
+
+def get_session_data_summary() -> dict:
+    """Return a summary of what data exists in the session (for warnings)."""
+    images = st.session_state.get("images", {})
+    total = len(images)
+    labeled = sum(1 for img in images.values() if img["label"] is not None)
+    with_audio = sum(1 for img in images.values() if img["audio_bytes"] is not None)
+    with_text = sum(1 for img in images.values() if img["transcription"])
+    return {
+        "total": total,
+        "labeled": labeled,
+        "with_audio": with_audio,
+        "with_transcription": with_text,
+    }
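The timeout helpers above (`check_session_timeout`, `get_remaining_timeout_minutes`) reduce to simple `datetime` arithmetic. A minimal standalone sketch, independent of Streamlit session state (the function name `minutes_remaining` is illustrative, not from the diff):

```python
import datetime


def minutes_remaining(last_activity: datetime.datetime,
                      now: datetime.datetime,
                      timeout_minutes: float = 30) -> float:
    """Minutes left before an inactivity timeout expires, clamped at 0."""
    elapsed = (now - last_activity).total_seconds() / 60
    return max(0.0, timeout_minutes - elapsed)


now = datetime.datetime(2024, 1, 1, 12, 0)
last = now - datetime.timedelta(minutes=10)
print(minutes_remaining(last, now))  # → 20.0
```

A session is considered expired exactly when this value reaches 0, which is the condition `check_session_timeout` tests from the other direction.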
interface/services/whisper_service.py
ADDED
@@ -0,0 +1,102 @@
+"""OphthalmoCapture — Whisper Transcription Service
+
+Encapsulates all Whisper-related logic: model loading, transcription,
+and segment-level timestamps. Temporary files are ALWAYS cleaned up.
+"""
+
+import os
+import shutil
+import tempfile
+import streamlit as st
+import whisper
+
+# ── Ensure ffmpeg is available ───────────────────────────────────────────────
+# If system ffmpeg is not in PATH, use the bundled one from imageio-ffmpeg.
+if shutil.which("ffmpeg") is None:
+    try:
+        import imageio_ffmpeg
+        _ffmpeg_real = imageio_ffmpeg.get_ffmpeg_exe()
+        # The bundled binary has a long name; create an alias as ffmpeg.exe
+        # next to it so that Whisper (which calls "ffmpeg") can find it.
+        _ffmpeg_alias = os.path.join(os.path.dirname(_ffmpeg_real), "ffmpeg.exe")
+        if not os.path.exists(_ffmpeg_alias):
+            try:
+                os.link(_ffmpeg_real, _ffmpeg_alias)  # hard link (no admin)
+            except OSError:
+                import shutil as _sh
+                _sh.copy2(_ffmpeg_real, _ffmpeg_alias)  # fallback: copy
+        os.environ["PATH"] = (
+            os.path.dirname(_ffmpeg_alias) + os.pathsep + os.environ.get("PATH", "")
+        )
+    except ImportError:
+        pass  # Will fail later with a clear Whisper error
+
+
+@st.cache_resource
+def load_whisper_model(model_size: str):
+    """Load and cache a Whisper model."""
+    print(f"Loading Whisper model: {model_size}...")
+    return whisper.load_model(model_size)
+
+
+def transcribe_audio(model, audio_bytes: bytes, language: str = "es") -> str:
+    """Transcribe raw WAV bytes and return plain text.
+
+    The temporary file is **always** deleted (try/finally).
+    """
+    tmp_path = None
+    try:
+        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
+            tmp.write(audio_bytes)
+            tmp_path = tmp.name
+
+        result = model.transcribe(tmp_path, language=language)
+        return result.get("text", "").strip()
+    except Exception as e:
+        st.error(f"Error de transcripción: {e}")
+        return ""
+    finally:
+        if tmp_path and os.path.exists(tmp_path):
+            os.unlink(tmp_path)
+
+
+def transcribe_audio_with_timestamps(
+    model, audio_bytes: bytes, language: str = "es"
+) -> tuple[str, list[dict]]:
+    """Transcribe raw WAV bytes and return (plain_text, segments).
+
+    Each segment dict contains:
+        {"start": float, "end": float, "text": str}
+
+    Useful for syncing transcript highlights with audio playback.
+    """
+    tmp_path = None
+    try:
+        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
+            tmp.write(audio_bytes)
+            tmp_path = tmp.name
+
+        result = model.transcribe(tmp_path, language=language)
+        text = result.get("text", "").strip()
+
+        segments = []
+        for seg in result.get("segments", []):
+            segments.append({
+                "start": round(seg["start"], 2),
+                "end": round(seg["end"], 2),
+                "text": seg["text"].strip(),
+            })
+
+        return text, segments
+    except Exception as e:
+        st.error(f"Error de transcripción: {e}")
+        return "", []
+    finally:
+        if tmp_path and os.path.exists(tmp_path):
+            os.unlink(tmp_path)
+
+
+def format_timestamp(seconds: float) -> str:
+    """Convert seconds to MM:SS format."""
+    m, s = divmod(int(seconds), 60)
+    return f"{m:02d}:{s:02d}"
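The `format_timestamp` helper added at the bottom of `whisper_service.py` is pure Python and easy to exercise on its own:

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to MM:SS format (same logic as the helper in the diff)."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


print(format_timestamp(0))     # → 00:00
print(format_timestamp(75.9))  # → 01:15 (fractional seconds are truncated)
print(format_timestamp(3599))  # → 59:59
```

Note that minutes are not capped at 59, so an hour-long recording would print as `75:00` rather than rolling over to an HH:MM:SS form; that is adequate for the short per-image clips this tool records.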
interface/utils.py
CHANGED
@@ -1,67 +1,31 @@
-
-
+"""OphthalmoCapture — Utility Functions."""
+
 import os
-import pandas as pd
 
 
-
-"""
-
-
+# Known image magic byte signatures
+_IMAGE_SIGNATURES = [
+    (b"\xff\xd8\xff", "JPEG"),
+    (b"\x89PNG\r\n\x1a\n", "PNG"),
+    (b"II\x2a\x00", "TIFF (LE)"),
+    (b"MM\x00\x2a", "TIFF (BE)"),
+]
+
 
 def setup_env():
-    """
+    """Set up environment variables."""
     os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
 
-def load_dataset(csv_path, image_folder):
-    """
-    Reads a CSV and checks for image existence.
-    Expected CSV columns: 'filename' (required), 'label' (optional).
-    """
-    images_list = []
-
-    # 1. Check if CSV exists
-    if not os.path.exists(csv_path):
-        st.error(f"⚠️ CSV file not found: {csv_path}")
-        return []
-
-    try:
-        df = pd.read_csv(csv_path)
-    except Exception as e:
-        st.error(f"Error reading CSV: {e}")
-        return []
 
-    filename_col = 'filename'
-    if 'filename' not in df.columns:
-        filename_col = df.columns[0]
-        st.warning(f"Column 'filename' not found. Using '{filename_col}' as filename.")
 
-    for ext in ['.jpg', '.png', '.jpeg', '.tif']:
-        if os.path.exists(full_path + ext):
-            full_path = full_path + ext
-            break
-
-    # Only add if file actually exists
-    if os.path.exists(full_path):
-        images_list.append({
-            "id": base_name,
-            "label": row.get('label', base_name),  # Use 'label' column or fallback to name
-            "url": full_path  # Streamlit accepts local paths here
-        })
-
-    if not images_list:
-        st.warning(f"No valid images found in '{image_folder}' matching the CSV.")
-
-    return images_list
+def validate_image_bytes(data: bytes) -> bool:
+    """Verify that *data* starts with a known image magic-byte header.
+
+    Returns True if valid, False otherwise. This prevents non-image files
+    from being accepted even if they have a valid extension.
+    """
+    if not data or len(data) < 8:
+        return False
+    for sig, _ in _IMAGE_SIGNATURES:
+        if data[: len(sig)] == sig:
+            return True
+    return False
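The magic-byte validation added to `interface/utils.py` can be exercised without Streamlit or real files; for example, using the same signature table as the diff:

```python
# Signature table copied from the diff; only these four formats are accepted.
_IMAGE_SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II\x2a\x00", "TIFF (LE)"),
    (b"MM\x00\x2a", "TIFF (BE)"),
]


def validate_image_bytes(data: bytes) -> bool:
    """True if *data* begins with a known image magic-byte header."""
    if not data or len(data) < 8:
        return False
    return any(data[: len(sig)] == sig for sig, _ in _IMAGE_SIGNATURES)


print(validate_image_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))  # → True
print(validate_image_bytes(b"GIF89a" + b"\x00" * 16))             # → False (GIF not allowed)
```

Checking file content rather than the extension matters here because the uploader otherwise trusts the filename, and BMP is accepted by the `.gitattributes` LFS rules but not by this signature table.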
requirements.txt
CHANGED
@@ -1,5 +1,6 @@
 streamlit
 openai-whisper
+imageio-ffmpeg
 torch
 pandas
 firebase-admin
@@ -7,4 +8,5 @@ notebook
 transformers
 pillow
 whisper
-numba
+numba
+streamlit-authenticator