title: Auralis Api
emoji: 👀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0

Auralis: Vocal Fatigue Scoring API

Auralis is an MLOps system and API that analyzes short voice recordings and returns a vocal fatigue score. It is built on a deep learning embedding model (ECAPA-TDNN-VHS) and is designed to handle real-world audio from varied speakers, devices, and recording conditions.

Overview

System Name: Auralis
Current Version: v1.0
Primary Function: Estimate vocal fatigue score from uploaded audio files.
Supported Audio Formats: .wav, .mp3, .m4a
Audio Duration: Minimum 5 seconds, Maximum 10 seconds
Scoring Range: 0-100 (0 = healthy, 100 = fatigued)

Future Work: Prosody features (pitch, jitter, shimmer, HNR) and a Python library for local usage are planned.

Key Features

Fatigue Scoring: Uses ECAPA-TDNN-VHS model to extract health-centric embeddings and compute a fatigue score.
Audio Validation: Ensures only supported formats and durations are processed.
Robust Exception Handling: Provides meaningful warnings and HTTP 400 responses for unsupported or invalid audio.
MLOps Ready: Versioned API routes (/api/v1/voice/score) with per-request logging.

Limitations

Not intended for medical diagnosis.
Currently provides only the fatigue score; additional reports and prosody-based insights are planned.
Requires a local or cloud-deployed server to host the API.

Installation

Note: The Python library is under development. For now, use the API through a hosted deployment or a local server.

# Clone the repository
git clone https://github.com/Khubaib8281/auralis.git
cd auralis

# Create a virtual environment and activate it
python -m venv venv
source venv/bin/activate    # Linux/Mac
# On Windows, run this instead of the line above: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Running the API Locally

# From the project root
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Swagger UI will be available at http://127.0.0.1:8000/docs
OpenAPI JSON at http://127.0.0.1:8000/openapi.json
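
As a quick check that the local server is up, the OpenAPI document can be fetched and its routes listed. The following is a minimal sketch using only the standard library; the helper names `list_routes` and `fetch_spec` are illustrative and not part of Auralis, and the base URL assumes the uvicorn command above.

```python
import json
from urllib.request import urlopen


def list_routes(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI document."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_spec(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the OpenAPI JSON from a running server (assumed local)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


# Example (requires the server to be running):
# for method, path in list_routes(fetch_spec()):
#     print(method, path)
```

If the server is running, the listing should include the scoring route described in the next section.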

API Endpoint

POST `https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score`

Description: Upload a voice file to obtain a fatigue score.

Request:

Form Data (multipart/form-data): file (UploadFile, required)

Example using curl:

curl -X POST "https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score" \
  -F "file=@sample.wav"

Response:

{
  "fatigue_score": 42.7
}

Error Responses:

400 Bad Request: Unsupported file type or invalid audio duration
500 Internal Server Error: Unexpected server errors
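
The same request can be made from Python. The sketch below uses only the standard library and is not an official client: `build_multipart` and `score_file` are hypothetical helper names, the default base URL assumes a server started as in "Running the API Locally", and the error body is reported as raw text because its exact format is not documented here.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.error import HTTPError
from urllib.request import Request, urlopen

SCORE_PATH = "/api/v1/voice/score"


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def score_file(path: str, base_url: str = "http://127.0.0.1:8000") -> float:
    """POST an audio file to the scoring endpoint and return the score."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    request = Request(
        base_url + SCORE_PATH,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    try:
        with urlopen(request, timeout=60) as response:
            return float(json.load(response)["fatigue_score"])
    except HTTPError as err:
        # 400: unsupported type or invalid duration; 500: unexpected server error.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Request failed with HTTP {err.code}: {detail}") from err
```

Calling `score_file("sample.wav")` against a running server returns the `fatigue_score` value as a float, or raises `RuntimeError` carrying the HTTP status and response text.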

Logging

Logs all requests with method, endpoint, status code, and duration.
Logs warnings for invalid audio formats and durations.
A configurable logger is provided in utils/logger.py.
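
Per-request logging of this kind can be sketched as a small ASGI middleware. This is an illustration only and does not reproduce the contents of utils/logger.py; the class name `RequestLogMiddleware`, the logger name, and the log line format are assumptions.

```python
import logging
import time

logger = logging.getLogger("auralis.requests")  # hypothetical logger name


class RequestLogMiddleware:
    """Pure-ASGI middleware that logs method, path, status code, and duration."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        status = {"code": 500}  # reported if the app fails before responding
        started = time.perf_counter()

        async def send_wrapper(message):
            # The status code arrives in the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info(
                "%s %s -> %d (%.1f ms)",
                scope["method"], scope["path"], status["code"], elapsed_ms,
            )
```

If the application is a FastAPI or Starlette app (an assumption based on the uvicorn command and the UploadFile parameter), the middleware could be registered with `app.add_middleware(RequestLogMiddleware)`.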

Audio Validation Rules

Supported formats: .wav, .mp3, .m4a
Minimum duration: 5 seconds
Maximum duration: 10 seconds
Files failing validation return HTTP 400 with detailed messages
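
The rules above can be sketched as a small check, for example to pre-validate a file on the client before uploading. This is not the server's implementation: the function names are hypothetical, the duration bounds are assumed to be inclusive, and the duration helper only reads uncompressed PCM WAV files (measuring .mp3 or .m4a would need an audio decoding library).

```python
import wave
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_SECONDS = 5.0   # assumed inclusive
MAX_SECONDS = 10.0  # assumed inclusive


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file, given a path string or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validation_errors(filename: str, duration_seconds: float) -> list[str]:
    """Return human-readable problems; an empty list means the file passes."""
    errors = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        errors.append(
            f"Unsupported file type '{extension}'; "
            f"expected one of {sorted(SUPPORTED_EXTENSIONS)}"
        )
    if duration_seconds < MIN_SECONDS:
        errors.append(
            f"Audio is {duration_seconds:.1f} s; minimum is {MIN_SECONDS:.0f} s"
        )
    elif duration_seconds > MAX_SECONDS:
        errors.append(
            f"Audio is {duration_seconds:.1f} s; maximum is {MAX_SECONDS:.0f} s"
        )
    return errors
```

For a WAV file, `validation_errors("sample.wav", wav_duration_seconds("sample.wav"))` returns an empty list when the recording is acceptable; a server would typically map a non-empty list to an HTTP 400 response.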

Future Features

Python Library: For local inference without API calls.
Prosody Analysis: Including pitch, jitter, shimmer, and HNR.
Automatic Report Generation: Human-readable vocal fatigue reports.
Extended Audio Support: Handling longer recordings and batch processing.

References

ECAPA-TDNN-VHS model for speaker embeddings: SpeechBrain
Supervised contrastive learning for embedding robustness
Real-world multi-speaker dataset (70–100 speakers, 60 male, 40 female)

License

Auralis is released under the Apache 2.0 license.

For research and feature extraction purposes only. Not intended for medical diagnosis or clinical use.