---
title: Auralis Api
emoji: 👀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---
# Auralis: Vocal Fatigue Scoring API
![Auralis Logo](./logo.png)
**Auralis** is an MLOps system and API that analyzes voice recordings and returns a **vocal fatigue score**. It uses a deep-learning embedding model (ECAPA-TDNN-VHS) and is designed to handle real-world audio recorded by different speakers, on different devices, and under varied conditions.

---
## Overview
* **System Name:** Auralis
* **Current Version:** v1.0
* **Primary Function:** Estimate vocal fatigue score from uploaded audio files.
* **Supported Audio Formats:** `.wav`, `.mp3`, `.m4a`
* **Audio Duration:** Minimum 5 seconds, Maximum 10 seconds
* **Scoring Range:** 0–100 (0 = healthy, 100 = fatigued)

**Future Work:** Prosody features (pitch, jitter, shimmer, HNR) and a Python library for local use are planned.
### Key Features
1. **Fatigue Scoring:** Uses the ECAPA-TDNN-VHS model to extract health-centric embeddings and compute a fatigue score.
2. **Audio Validation:** Processes only supported formats and durations.
3. **Robust Exception Handling:** Logs warnings and returns HTTP 400 responses with descriptive messages for unsupported or invalid audio.
4. **MLOps Ready:** Versioned API route (`/api/v1/voice/score`) with per-request logging.
### Limitations
* Not intended for medical diagnosis.
* Currently provides only the fatigue score; additional reports and prosody-based insights are planned.
* Requires a local or cloud-deployed server to host the API.
---
## Installation
> **Note:** The Python library is under development. For now, use the API via the hosted deployment or a local server.
```bash
# Clone the repository
git clone https://github.com/Khubaib8281/auralis.git
cd auralis
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate      # Linux/macOS
# venv\Scripts\activate       # Windows: run this instead of the line above
# Install dependencies
pip install -r requirements.txt
```
---
## Running the API Locally
```bash
# From the project root
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
* Swagger UI will be available at `http://127.0.0.1:8000/docs`
* OpenAPI JSON at `http://127.0.0.1:8000/openapi.json`
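
Once the server is running, one quick way to confirm that the scoring route is registered is to read the OpenAPI document. This is a minimal stdlib-only sketch, assuming the server started above is listening on `127.0.0.1:8000`; `fetch_openapi` and `list_routes` are hypothetical helper names, not part of the Auralis codebase. Calling `list_routes(fetch_openapi())` should include `("POST", "/api/v1/voice/score")` if the route is mounted as documented.

```python
import json
import urllib.request

# HTTP verbs an OpenAPI path item may declare; other keys (e.g. "parameters") are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download and parse the OpenAPI JSON served by the running API."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_routes(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI spec."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


# Example (requires the local server to be running):
# for method, path in list_routes(fetch_openapi()):
#     print(method, path)
```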
---
## API Endpoint
### POST `/api/v1/voice/score`

**Description:** Upload a voice file to obtain a fatigue score. On the hosted Hugging Face Space, requests go to the Space's direct URL (`https://khubaib01-auralis-api.hf.space`), not to the `huggingface.co/spaces/...` page URL; for a local server, use `http://127.0.0.1:8000`.

**Request:**

* **Form Data:** `file` (UploadFile, required)

**Example using `curl`:**
```bash
curl -X POST "https://khubaib01-auralis-api.hf.space/api/v1/voice/score" \
  -F "file=@sample.wav"
```
**Response:**
```json
{
"fatigue_score": 42.7
}
```
**Error Responses:**
* **400 Bad Request:** Unsupported file type or invalid audio duration
* **500 Internal Server Error:** Unexpected server errors
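
For scripted use, the same multipart request that the `curl` example sends can be made with Python's standard library. This is a sketch, not an official client: it assumes the form field is named `file` (as documented above) and that error responses carry a readable body; `build_multipart` and `score_audio` are hypothetical helper names. A successful call returns the parsed JSON (for example a dict with a `fatigue_score` key), while 400 and 500 responses are raised as `RuntimeError` with the server's message.

```python
import json
import mimetypes
import urllib.error
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode a single file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def score_audio(url: str, path: str) -> dict:
    """POST an audio file to the scoring endpoint and return the parsed JSON.

    Raises RuntimeError carrying the server's message on 4xx/5xx responses.
    """
    p = Path(path)
    body, content_type = build_multipart("file", p.name, p.read_bytes())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # 400 = unsupported format or invalid duration; 500 = unexpected server error.
        detail = err.read().decode(errors="replace")
        raise RuntimeError(f"HTTP {err.code}: {detail}") from err


# Example (against a local server):
# print(score_audio("http://127.0.0.1:8000/api/v1/voice/score", "sample.wav"))
```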
---
## Logging
* Logs all requests with method, endpoint, status code, and duration.
* Logs warnings for invalid audio formats and durations.
* A configurable logger is provided in `utils/logger.py`.
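
The contents of `utils/logger.py` are not shown here. As a hedged illustration only, a stdlib logger that records the fields listed above (method, endpoint, status code, duration) might look like the following; `get_logger` and `log_request` are hypothetical names, not the module's actual API. Logging 4xx responses at warning level mirrors the "warnings for invalid audio" behavior described above; that level mapping is an assumption.

```python
import logging
import sys


def get_logger(name: str = "auralis", level: int = logging.INFO) -> logging.Logger:
    """Return a logger with a single timestamped stdout handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def log_request(logger: logging.Logger, method: str, endpoint: str,
                status: int, duration_ms: float) -> None:
    """Log one request: 5xx as ERROR, 4xx (e.g. invalid audio) as WARNING, else INFO."""
    if status >= 500:
        level = logging.ERROR
    elif status >= 400:
        level = logging.WARNING
    else:
        level = logging.INFO
    logger.log(level, "%s %s -> %d (%.1f ms)", method, endpoint, status, duration_ms)
```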
---
## Audio Validation Rules
* Supported formats: `.wav`, `.mp3`, `.m4a`
* Minimum duration: 5 seconds
* Maximum duration: 10 seconds
* Files failing validation return HTTP 400 with detailed messages
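
The rules above map directly onto a check that runs before inference. This is a minimal sketch, assuming durations are measured in seconds and both bounds are inclusive; `validate_audio`, `wav_duration`, and `AudioValidationError` are hypothetical names. It uses the stdlib `wave` module, which reads only WAV, so measuring `.mp3`/`.m4a` duration would need an external decoder (for example ffmpeg) that is not shown. In a FastAPI app (which the `uvicorn app.main:app` command and `/docs` page suggest, though that is an assumption), a handler could catch `AudioValidationError` and raise `HTTPException(status_code=400, detail=str(err))`.

```python
import wave
from pathlib import Path
from typing import BinaryIO, Union

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_SECONDS = 5.0   # assumed inclusive lower bound
MAX_SECONDS = 10.0  # assumed inclusive upper bound


class AudioValidationError(ValueError):
    """Raised for input the API should reject with HTTP 400."""


def wav_duration(source: Union[str, BinaryIO]) -> float:
    """Duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def validate_audio(filename: str, duration_s: float) -> None:
    """Apply the format and duration rules; raise AudioValidationError on failure."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise AudioValidationError(
            f"Unsupported format '{ext or filename}'; expected one of "
            f"{sorted(SUPPORTED_EXTENSIONS)}"
        )
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        raise AudioValidationError(
            f"Audio is {duration_s:.1f} s long; it must be between "
            f"{MIN_SECONDS:.0f} and {MAX_SECONDS:.0f} s"
        )
```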
---
## Future Features
1. **Python Library:** For local inference without API calls.
2. **Prosody Analysis:** Including pitch, jitter, shimmer, and HNR.
3. **Automatic Report Generation:** Human-readable vocal fatigue reports.
4. **Extended Audio Support:** Handling longer recordings and batch processing.
---
## References
* ECAPA-TDNN-VHS model for speaker embeddings (SpeechBrain): [Khubaib01/ECAPA-TDNN-VHE](https://huggingface.co/Khubaib01/ECAPA-TDNN-VHE)
* Supervised contrastive learning for embedding robustness
* Real-world multi-speaker dataset (70–100 speakers, 60 male, 40 female)
---
## License
**Auralis** is released under the Apache 2.0 license.

---
*For research and feature extraction purposes only. Not intended for medical diagnosis or clinical use.*