| title: Auralis Api | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| license: apache-2.0 | |
| # Auralis: Vocal Fatigue Scoring API | |
|  | |
| **Auralis** is an MLOps system and API that analyzes voice recordings and generates a **vocal fatigue score**. It is built on an ECAPA-TDNN embedding model and is designed to handle real-world audio from a range of speakers, devices, and recording conditions. | |
| --- | |
| ## Overview | |
| * **System Name:** Auralis | |
| * **Current Version:** v1.0 | |
| * **Primary Function:** Estimate vocal fatigue score from uploaded audio files. | |
| * **Supported Audio Formats:** `.wav`, `.mp3`, `.m4a` | |
| * **Audio Duration:** Minimum 5 seconds, Maximum 10 seconds | |
| * **Scoring Range:** 0-100 (0 = healthy, 100 = fatigued) | |
| **Future Work:** Prosody features (pitch, jitter, shimmer, HNR) and a Python library for local usage are planned. | |
| ### Key Features | |
| 1. **Fatigue Scoring:** Uses the ECAPA-TDNN-VHS model to extract health-centric embeddings and compute a fatigue score. | |
| 2. **Audio Validation:** Ensures only supported formats and durations are processed. | |
| 3. **Robust Exception Handling:** Provides meaningful warnings and HTTP 400 responses for unsupported or invalid audio. | |
| 4. **MLOps Ready:** Fully structured API with versioning (`/api/v1/voice/score`) and logging per request. | |
| ### Limitations | |
| * Not intended for medical diagnosis. | |
| * Currently provides only the fatigue score; additional reports and prosody-based insights are planned. | |
| * Requires a local or cloud-deployed server to host the API. | |
| --- | |
| ## Installation | |
| > **Note:** The Python library is under development. For now, use the API through a deployed instance or a local server. | |
| ```bash | |
| # Clone the repository | |
| git clone https://github.com/Khubaib8281/auralis.git | |
| cd auralis | |
| # Create a virtual environment and activate it | |
| python -m venv venv | |
| source venv/bin/activate # Linux/macOS | |
| # On Windows, run this instead: venv\Scripts\activate | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| ``` | |
| --- | |
| ## Running the API Locally | |
| ```bash | |
| # From the project root | |
| uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 | |
| ``` | |
| * Swagger UI will be available at `http://127.0.0.1:8000/docs` | |
| * OpenAPI JSON at `http://127.0.0.1:8000/openapi.json` | |
| --- | |
| ## API Endpoint | |
| ### POST `https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score` | |
| **Description:** Upload a voice file to obtain a fatigue score. | |
| **Request:** | |
| * **Form Data:** `file` (UploadFile, required) | |
| **Example using `curl`:** | |
| ```bash | |
| curl -X POST "https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score" \ | |
| -F "file=@sample.wav" | |
| ``` | |
| **Response:** | |
| ```json | |
| { | |
| "fatigue_score": 42.7 | |
| } | |
| ``` | |
| **Error Responses:** | |
| * **400 Bad Request:** Unsupported file type or invalid audio duration | |
| * **500 Internal Server Error:** Unexpected server errors | |
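The request and error behavior described above can also be exercised from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the route is served at `/api/v1/voice/score` on the local server started with `uvicorn`. `build_multipart`, `score_file`, and `API_URL` are hypothetical names introduced for illustration.

```python
import json
import mimetypes
import urllib.error
import urllib.request
import uuid
from pathlib import Path

# Assumption: the API is running locally on port 8000.
API_URL = "http://127.0.0.1:8000/api/v1/voice/score"


def build_multipart(field_name, filename, data, boundary=None):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def score_file(path, url=API_URL, timeout=30):
    """Upload an audio file in the `file` form field and return its fatigue score."""
    path = Path(path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())["fatigue_score"]
    except urllib.error.HTTPError as err:
        # 400 covers unsupported types and invalid durations; 500 is unexpected.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Auralis returned HTTP {err.code}: {detail}") from err
```

With a server running, `score_file("sample.wav")` should return the number found under `fatigue_score` in the JSON response.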
| --- | |
| ## Logging | |
| * Logs all requests with method, endpoint, status code, and duration. | |
| * Logs warnings for invalid audio formats and durations. | |
| * Configurable logger is provided in `utils/logger.py`. | |
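The contents of `utils/logger.py` are not shown here, so the following is only a sketch of what a configurable logger and a per-request log line with method, endpoint, status code, and duration might look like. `get_logger` and `format_request_line` are hypothetical names, and the message format is an assumption.

```python
import logging


def get_logger(name="auralis", level=logging.INFO):
    """Return a logger with a single stream handler (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def format_request_line(method, endpoint, status_code, duration_s):
    """Build one log message covering the four fields listed above."""
    return f"{method} {endpoint} -> {status_code} ({duration_s * 1000:.1f} ms)"


if __name__ == "__main__":
    log = get_logger()
    log.info(format_request_line("POST", "/api/v1/voice/score", 200, 0.1234))
    log.warning("Rejected upload: unsupported audio format")
```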
| --- | |
| ## Audio Validation Rules | |
| * Supported formats: `.wav`, `.mp3`, `.m4a` | |
| * Minimum duration: 5 seconds | |
| * Maximum duration: 10 seconds | |
| * Files failing validation return HTTP 400 with detailed messages | |
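The rules above can be sketched as a small check. This is an illustration, not the project's actual validator. It assumes the duration has already been measured in seconds and that both bounds are inclusive, which the rules do not state. `validation_error` and the constants are hypothetical names.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_DURATION_S = 5.0   # assumed inclusive
MAX_DURATION_S = 10.0  # assumed inclusive


def validation_error(filename, duration_s):
    """Return a message suitable for an HTTP 400 response, or None if valid."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        allowed = ", ".join(sorted(SUPPORTED_EXTENSIONS))
        return f"Unsupported file type '{ext}'. Supported formats: {allowed}"
    if duration_s < MIN_DURATION_S:
        return f"Audio too short: {duration_s:.1f}s (minimum {MIN_DURATION_S:.0f}s)"
    if duration_s > MAX_DURATION_S:
        return f"Audio too long: {duration_s:.1f}s (maximum {MAX_DURATION_S:.0f}s)"
    return None
```

A server handler could call this before inference and return the message in a 400 response whenever the result is not `None`.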
| --- | |
| ## Future Features | |
| 1. **Python Library:** For local inference without API calls. | |
| 2. **Prosody Analysis:** Including pitch, jitter, shimmer, and HNR. | |
| 3. **Automatic Report Generation:** Human-readable vocal fatigue reports. | |
| 4. **Extended Audio Support:** Handling longer recordings and batch processing. | |
| --- | |
| ## References | |
| * ECAPA-TDNN-VHS model for speaker embeddings: [SpeechBrain](https://huggingface.co/Khubaib01/ECAPA-TDNN-VHE) | |
| * Supervised contrastive learning for embedding robustness | |
| * Real-world multi-speaker dataset (70-100 speakers, 60 male, 40 female) | |
| --- | |
| ## License | |
| **Auralis** is released under the Apache 2.0 license. | |
| --- | |
| *For research and feature extraction purposes only. Not intended for medical diagnosis or clinical use.* |