Spaces: borisbob91/wami

Bgk Injector SqLi committed on Feb 24

Commit fa547a0 (parent: 84f49d6)

Deploy Wami Dioula STT & TTS API

- FastAPI app with Speech-to-Text and Text-to-Speech
- Support for Dioula language (facebook/mms models)
- Docker configuration for HF Spaces
- CORS enabled for public API access
- Interactive documentation (Swagger + ReDoc)
- HF_TOKEN configured in Dockerfile

Files changed (5)

.dockerignore +14 -0
Dockerfile +21 -0
README.md +87 -8
app.py +281 -0
requirements.txt +8 -0

.dockerignore ADDED

	@@ -0,0 +1,14 @@

+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.pytest_cache/
+.venv/
+venv/
+ENV/
+.git/
+.gitignore
+*.md
+!README.md
+test_api.py
+*.log

Dockerfile ADDED

	@@ -0,0 +1,21 @@

+FROM python:3.11-slim
+WORKDIR /app
+# Install ffmpeg for audio conversion
+RUN apt-get update && apt-get install -y \
+    ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+# Copy the application files
+COPY requirements.txt .
+COPY app.py .
+# Install the Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+# Port expected by Hugging Face Spaces (matches app_port in README.md)
+EXPOSE 7860
+# Start the application
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
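The image installs ffmpeg for audio conversion, but app.py never calls it: non-WAV uploads go through soundfile, and libsndfile cannot decode WebM, the container that browser MediaRecorder uploads often use. One way to put the installed ffmpeg to work is to shell out to it before loading the audio. A minimal sketch, assuming the ffmpeg CLI is on PATH (as it is in this image); `ffmpeg_wav_command` and `convert_to_wav` are hypothetical helpers, not part of app.py:

```python
import subprocess
import tempfile
from pathlib import Path


def ffmpeg_wav_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that decodes `src` into mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-nostdin",               # never wait for terminal input
        "-loglevel", "error",     # keep stderr limited to real errors
        "-y",                     # overwrite dst (it is a fresh temp file)
        "-i", src,
        "-vn",                    # ignore any video stream in the container
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the rate the STT model expects
        "-c:a", "pcm_s16le",      # plain 16-bit PCM samples
        dst,
    ]


def convert_to_wav(src: str, sample_rate: int = 16000) -> Path:
    """Convert any ffmpeg-readable file to a temporary WAV file.

    Raises RuntimeError if ffmpeg is missing or cannot decode the input.
    The caller is responsible for deleting the returned file.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    try:
        subprocess.run(
            ffmpeg_wav_command(src, tmp.name, sample_rate),
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        # Do not leave an empty temp file behind on failure
        Path(tmp.name).unlink(missing_ok=True)
        detail = getattr(exc, "stderr", None) or str(exc)
        raise RuntimeError(f"ffmpeg could not decode {src}: {detail.strip()}") from exc
    return Path(tmp.name)
```

Because the output is already 16 kHz mono, the downmix and resampling steps in `speech_to_text` would become no-ops for converted files; the returned path would still need to be removed in the endpoint's `finally` block, like the other temporary files.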

README.md CHANGED

@@ -1,12 +1,91 @@
 ---
-title: Wami
-emoji: 🏢
-colorFrom: yellow
-colorTo: purple
+title: Wami - Dioula STT & TTS API
+emoji: 🎙️
+colorFrom: blue
+colorTo: green
 sdk: docker
-pinned: false
-license: apache-2.0
-short_description: 'wami lingual '
+app_port: 7860
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Wami - Dioula STT & TTS API
+Speech recognition (Speech-to-Text) and speech synthesis (Text-to-Speech) API for the Dioula language.
+## 🚀 Usage
+### Available endpoints
+#### 1. Speech-to-Text (STT)
+Transcribes an audio file into Dioula text.
+```bash
+curl -X POST https://votre-space-name.hf.space/api/stt \
+  -F "audio=@recording.wav"
+```
+**Response:**
+```json
+{
+  "transcription": "transcribed Dioula text"
+}
+```
+#### 2. Text-to-Speech (TTS)
+Generates Dioula audio from text.
+```bash
+curl -X POST https://votre-space-name.hf.space/api/tts \
+  -F "text=na an be do minkɛ" \
+  -o output.wav
+```
+**Response:** WAV audio file
+#### 3. Health Check
+Checks the status of the API.
+```bash
+curl https://votre-space-name.hf.space/health
+```
+**Response:**
+```json
+{
+  "status": "healthy",
+  "device": "cuda",
+  "models_loaded": {
+    "stt": true,
+    "tts": true
+  }
+}
+```
+## 📖 Interactive documentation
+- **Swagger UI:** `https://votre-space-name.hf.space/docs`
+- **ReDoc:** `https://votre-space-name.hf.space/redoc`
+## 🔧 Models used
+- **STT:** [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all) (Dioula adapter)
+- **TTS:** [facebook/mms-tts-dyu](https://huggingface.co/facebook/mms-tts-dyu)
+## 💻 Local deployment
+```bash
+pip install -r requirements.txt
+python app.py
+```
+Open [http://localhost:7860](http://localhost:7860)
+## 🌍 About Dioula
+Dioula (language code: `dyu`) is a Mande language spoken mainly in Côte d'Ivoire, Burkina Faso, and Mali.
+## 📝 License
+The models used are released under the CC-BY-NC 4.0 license (non-commercial use). See the model pages for details.
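The curl examples in this README map directly onto Python's standard library. A minimal sketch, assuming `base_url` is your Space URL (the `votre-space-name` placeholder) or `http://localhost:7860`; every function name here is hypothetical. The request builders are kept separate from the network calls so they can be inspected offline. The Content-Type of the uploaded part matters, because app.py picks the temporary file suffix from it.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path


def build_stt_request(base_url: str, filename: str, audio: bytes) -> urllib.request.Request:
    """POST /api/stt with the file in a multipart 'audio' field (like curl -F)."""
    boundary = uuid.uuid4().hex
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{Path(filename).name}"\r\n'.encode(),
        f"Content-Type: {part_type}\r\n\r\n".encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/api/stt",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_tts_request(base_url: str, text: str) -> urllib.request.Request:
    """POST /api/tts with `text` as a URL-encoded form field."""
    body = urllib.parse.urlencode({"text": text}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def is_ready(health: dict) -> bool:
    """True when a /health payload reports both models as loaded."""
    loaded = health.get("models_loaded", {})
    return health.get("status") == "healthy" and bool(loaded.get("stt")) and bool(loaded.get("tts"))


def transcribe(base_url: str, path: str) -> str:
    """Upload an audio file and return the 'transcription' field."""
    req = build_stt_request(base_url, path, Path(path).read_bytes())
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))["transcription"]


def synthesize(base_url: str, text: str, out_path: str = "output.wav") -> None:
    """Request Dioula speech for `text` and save the returned WAV bytes."""
    with urllib.request.urlopen(build_tts_request(base_url, text), timeout=300) as resp:
        Path(out_path).write_bytes(resp.read())
```

For example, `transcribe("http://localhost:7860", "recording.wav")` returns the transcription string, and `synthesize(base_url, "na an be do minkɛ")` writes output.wav. Sending the TTS text URL-encoded instead of as multipart works because FastAPI `Form` fields accept both encodings.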

app.py ADDED

	@@ -0,0 +1,281 @@

+import io
+import os
+import tempfile
+from pathlib import Path
+import numpy as np
+import scipy.io.wavfile
+import soundfile as sf
+import torch
+import torchaudio
+from fastapi import FastAPI, File, Form, HTTPException, Request, UploadFile
+from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
+from fastapi.middleware.cors import CORSMiddleware
+app = FastAPI(
+    title="Wami - Dioula STT & TTS API",
+    description="API de reconnaissance vocale (STT) et synthèse vocale (TTS) en Dioula",
+    version="1.0.0"
+)
+# CORS: allow calls from any origin
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# Global error handlers
+@app.exception_handler(HTTPException)
+async def http_exception_handler(request: Request, exc: HTTPException):
+    return JSONResponse(
+        status_code=exc.status_code,
+        content={"error": exc.detail}
+    )
+@app.exception_handler(Exception)
+async def global_exception_handler(request: Request, exc: Exception):
+    return JSONResponse(
+        status_code=500,
+        content={"error": f"Server error: {str(exc)}"}
+    )
+# Globals
+stt_processor = None
+stt_model = None
+tts_tokenizer = None
+tts_model = None
+device = "cpu"
+@app.on_event("startup")
+def load_models():
+    global stt_processor, stt_model, tts_tokenizer, tts_model, device
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    print(f"🚀 Device: {device}")
+    # STT
+    from transformers import AutoProcessor, Wav2Vec2ForCTC
+    print("⏳ Loading the STT model (Dioula)...")
+    stt_processor = AutoProcessor.from_pretrained("facebook/mms-1b-all", target_lang="dyu")
+    stt_model = Wav2Vec2ForCTC.from_pretrained(
+        "facebook/mms-1b-all",
+        target_lang="dyu",
+        ignore_mismatched_sizes=True
+    )
+    stt_model.load_adapter("dyu")
+    stt_model.to(device)
+    print("✅ STT ready!")
+    # TTS
+    from transformers import AutoTokenizer, VitsModel
+    print("⏳ Loading the TTS model (Dioula)...")
+    tts_tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-dyu")
+    tts_model = VitsModel.from_pretrained("facebook/mms-tts-dyu").to(device)
+    print("✅ TTS ready!")
+# Home page with documentation
+@app.get("/", response_class=HTMLResponse)
+def home():
+    return """
+    <!DOCTYPE html>
+    <html lang="fr">
+    <head>
+        <meta charset="UTF-8">
+        <meta name="viewport" content="width=device-width, initial-scale=1.0">
+        <title>Wami - Dioula STT & TTS API</title>
+        <style>
+            body { font-family: system-ui; max-width: 800px; margin: 40px auto; padding: 20px; line-height: 1.6; }
+            h1 { color: #2563eb; }
+            h2 { color: #1e40af; margin-top: 30px; }
+            code { background: #f1f5f9; padding: 2px 6px; border-radius: 4px; }
+            pre { background: #0f172a; color: #e2e8f0; padding: 16px; border-radius: 8px; overflow-x: auto; }
+            .endpoint { background: #f8fafc; padding: 16px; border-left: 4px solid #3b82f6; margin: 16px 0; }
+            .method { display: inline-block; padding: 4px 8px; border-radius: 4px; font-weight: bold; margin-right: 8px; }
+            .post { background: #10b981; color: white; }
+            .get { background: #3b82f6; color: white; }
+        </style>
+    </head>
+    <body>
+        <h1>🎙️ Wami - Dioula STT & TTS API</h1>
+        <p>Speech recognition (Speech-to-Text) and speech synthesis (Text-to-Speech) API for Dioula.</p>
+        <h2>📖 Endpoints</h2>
+        <div class="endpoint">
+            <p><span class="method get">GET</span> <code>/</code></p>
+            <p>This documentation page</p>
+        </div>
+        <div class="endpoint">
+            <p><span class="method get">GET</span> <code>/health</code></p>
+            <p>Status of the API and the models</p>
+        </div>
+        <div class="endpoint">
+            <p><span class="method post">POST</span> <code>/api/stt</code></p>
+            <p><strong>Speech-to-Text</strong> - Transcribes an audio file into Dioula text</p>
+            <p><strong>Input:</strong> Audio file (WebM, WAV, MP3)</p>
+            <p><strong>Output:</strong> <code>{"transcription": "Dioula text"}</code></p>
+            <pre>curl -X POST https://votre-space.hf.space/api/stt \\
+  -F "audio=@recording.wav"</pre>
+        </div>
+        <div class="endpoint">
+            <p><span class="method post">POST</span> <code>/api/tts</code></p>
+            <p><strong>Text-to-Speech</strong> - Generates Dioula audio from text</p>
+            <p><strong>Input:</strong> Dioula text (<code>text</code> parameter)</p>
+            <p><strong>Output:</strong> WAV file</p>
+            <pre>curl -X POST https://votre-space.hf.space/api/tts \\
+  -F "text=na an be do minkɛ" \\
+  -o output.wav</pre>
+        </div>
+        <h2>🔗 Interactive documentation</h2>
+        <p>
+            <a href="/docs">Swagger UI</a> |
+            <a href="/redoc">ReDoc</a>
+        </p>
+        <h2>ℹ️ Models</h2>
+        <ul>
+            <li><strong>STT:</strong> facebook/mms-1b-all (Dioula adapter)</li>
+            <li><strong>TTS:</strong> facebook/mms-tts-dyu</li>
+        </ul>
+    </body>
+    </html>
+    """
+@app.get("/health")
+def health_check():
+    """Check the status of the API and the models"""
+    return {
+        "status": "healthy",
+        "device": device,
+        "models_loaded": {
+            "stt": stt_model is not None,
+            "tts": tts_model is not None
+        }
+    }
+@app.post("/api/stt")
+async def speech_to_text(audio: UploadFile = File(...)):
+    """
+    Transcribe an audio file into Dioula text
+    - **audio**: Audio file (WebM, WAV, MP3, etc.)
+    """
+    tmp_input = None
+    tmp_wav = None
+    try:
+        audio_bytes = await audio.read()
+        # Determine the file extension from the MIME type
+        content_type = audio.content_type or ""
+        if "webm" in content_type:
+            suffix = ".webm"
+        elif "wav" in content_type:
+            suffix = ".wav"
+        elif "mp3" in content_type or "mpeg" in content_type:
+            # Browsers and curl usually report MP3 uploads as "audio/mpeg"
+            suffix = ".mp3"
+        else:
+            suffix = ".webm"
+        # Save to a temporary file
+        tmp_input = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
+        tmp_input.write(audio_bytes)
+        tmp_input.close()
+        # Convert to WAV if needed.
+        # Note: soundfile (libsndfile) cannot decode WebM, so WebM uploads
+        # currently end up in the 400 branch below.
+        if suffix != ".wav":
+            try:
+                audio_data, sample_rate = sf.read(tmp_input.name)
+                tmp_wav = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+                tmp_wav.close()
+                sf.write(tmp_wav.name, audio_data, sample_rate)
+                audio_path = tmp_wav.name
+            except Exception as e:
+                raise HTTPException(
+                    status_code=400,
+                    detail=f"Could not read the audio: unsupported format. Error: {str(e)}"
+                )
+        else:
+            audio_path = tmp_input.name
+        # Load with torchaudio
+        audio_input, sample_rate = torchaudio.load(audio_path)
+        # Downmix to mono
+        if audio_input.shape[0] > 1:
+            audio_input = torch.mean(audio_input, dim=0, keepdim=True)
+        # Resample to 16 kHz, the rate the MMS model expects
+        if sample_rate != 16000:
+            resampler = torchaudio.transforms.Resample(sample_rate, 16000)
+            audio_input = resampler(audio_input)
+        audio_input = audio_input.squeeze()
+        # Inference
+        inputs = stt_processor(audio_input, sampling_rate=16000, return_tensors="pt")
+        inputs = {k: v.to(device) for k, v in inputs.items()}
+        with torch.no_grad():
+            logits = stt_model(**inputs).logits
+        predicted_ids = torch.argmax(logits, dim=-1)
+        transcription = stt_processor.batch_decode(predicted_ids)[0]
+        return {"transcription": transcription}
+    except HTTPException:
+        raise
+    except Exception as e:
+        print(f"STT error: {e}")
+        raise HTTPException(status_code=500, detail=f"Transcription failed: {str(e)}")
+    finally:
+        if tmp_input and Path(tmp_input.name).exists():
+            Path(tmp_input.name).unlink(missing_ok=True)
+        if tmp_wav and Path(tmp_wav.name).exists():
+            Path(tmp_wav.name).unlink(missing_ok=True)
+@app.post("/api/tts")
+async def text_to_speech(text: str = Form(...)):
+    """
+    Generate Dioula audio from text
+    - **text**: Dioula text to synthesize
+    """
+    # FileResponse does not delete the file it serves; this background task
+    # removes the temporary WAV once the response has been sent.
+    from starlette.background import BackgroundTask
+    try:
+        if not text.strip():
+            raise HTTPException(status_code=400, detail="Text must not be empty")
+        inputs = tts_tokenizer(text, return_tensors="pt").to(device)
+        with torch.no_grad():
+            waveform = tts_model(**inputs).waveform
+        audio_data = waveform[0].cpu().numpy()
+        sample_rate = tts_model.config.sampling_rate
+        # Close the handle first, then let scipy write to the path
+        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
+        tmp.close()
+        scipy.io.wavfile.write(tmp.name, rate=sample_rate, data=audio_data)
+        return FileResponse(
+            tmp.name,
+            media_type="audio/wav",
+            filename="tts_dioula.wav",
+            background=BackgroundTask(os.unlink, tmp.name),
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        print(f"TTS error: {e}")
+        raise HTTPException(status_code=500, detail=f"Audio generation failed: {str(e)}")
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
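The last inference step in `speech_to_text` takes the per-frame argmax of the CTC logits and hands the ids to `batch_decode`. For a Wav2Vec2-style CTC tokenizer, that greedy decoding amounts to merging consecutive repeated ids, dropping the blank (padding) token, and turning the word delimiter `|` into spaces. A simplified pure-Python sketch of the idea, assuming a hypothetical toy vocabulary with the blank at id 0; the real tokenizer uses its own vocabulary and also strips other special tokens:

```python
from itertools import groupby
from typing import Sequence


def greedy_ctc_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Collapse repeated frame predictions, drop blanks, map ids to characters."""
    # 1. Merge runs of identical ids: [5, 5, 0, 5] -> [5, 0, 5]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the blank, which only served to separate genuine repeats
    chars = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # 3. Turn the word delimiter into spaces and trim the edges
    return "".join(chars).replace(word_delimiter, " ").strip()
```

With `vocab = ["<pad>", "|", "a", "n", "b", "e"]`, the frames `[2, 2, 3, 0, 1, 4, 4, 5, 5]` decode to "an be". The blank is also what lets CTC spell double letters: `[3, 0, 3]` gives "nn", while `[3, 3]` collapses to a single "n".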

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+fastapi>=0.115.0
+uvicorn[standard]>=0.34.0
+python-multipart>=0.0.18
+scipy>=1.14.0
+soundfile>=0.12.0
+torch>=2.5.0
+torchaudio>=2.5.0
+transformers>=4.40.0