title: Auralis Api
emoji: 👀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0

Auralis: Vocal Fatigue Scoring API


Auralis is an MLOps system and API that analyzes voice recordings and generates a vocal fatigue score. It is built on deep learning models and is designed to handle real-world audio from a variety of speakers, devices, and recording conditions.


Overview

  • System Name: Auralis
  • Current Version: v1.0
  • Primary Function: Estimate vocal fatigue score from uploaded audio files.
  • Supported Audio Formats: .wav, .mp3, .m4a
  • Audio Duration: Minimum 5 seconds, Maximum 10 seconds
  • Scoring Range: 0-100 (0 = healthy, 100 = fatigued)

Future Work: Prosody features (pitch, jitter, shimmer, HNR) and a Python library for local usage are planned for a future release.

Key Features

  1. Fatigue Scoring: Uses the ECAPA-TDNN-VHS model to extract health-centric embeddings and compute a fatigue score.
  2. Audio Validation: Ensures only supported formats and durations are processed.
  3. Robust Exception Handling: Provides meaningful warnings and HTTP 400 responses for unsupported or invalid audio.
  4. MLOps Ready: Versioned API route (/api/v1/voice/score) with per-request logging.

Limitations

  • Not intended for medical diagnosis.
  • Currently provides only the fatigue score; additional reports and prosody-based insights are planned.
  • Requires a local or cloud-deployed server to host the API.

Installation

Note: The Python library is under development. For now, you can use the API through a deployed instance or a local server.

# Clone the repository
git clone https://github.com/Khubaib8281/auralis.git
cd auralis

# Create a virtual environment and activate it (run only the line for your platform)
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

Running the API Locally

# From the project root
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

  • Swagger UI will be available at http://127.0.0.1:8000/docs
  • OpenAPI JSON at http://127.0.0.1:8000/openapi.json

API Endpoint

POST https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score

Description: Upload a voice file to obtain a fatigue score.

Request:

  • Form Data: file (UploadFile, required)

Example using curl:

curl -X POST "https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score" \
  -F "file=@sample.wav"

Response:

{
  "fatigue_score": 42.7
}
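
The same upload can be sent from Python. The following is a minimal sketch using only the standard library; it assumes a locally running server at http://127.0.0.1:8000 (as in "Running the API Locally"), and the names SCORE_URL, build_score_request, and fetch_fatigue_score are illustrative, not part of Auralis.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Route taken from this README; the host is an assumption (local uvicorn server).
SCORE_URL = "http://127.0.0.1:8000/api/v1/voice/score"


def build_score_request(audio_path, url=SCORE_URL):
    """Build a multipart/form-data POST that carries the audio in the `file` field."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def fetch_fatigue_score(audio_path, url=SCORE_URL):
    """Send the request and return the `fatigue_score` field of the JSON response."""
    with urllib.request.urlopen(build_score_request(audio_path, url)) as resp:
        return json.load(resp)["fatigue_score"]
```

With the server running, fetch_fatigue_score("sample.wav") should return the numeric score shown in the response above. Note that urllib raises urllib.error.HTTPError for 4xx and 5xx responses.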

Error Responses:

  • 400 Bad Request: Unsupported file type or invalid audio duration
  • 500 Internal Server Error: Unexpected server errors
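
A client can map these status codes to distinct exceptions. The sketch below is one possible approach: the exception names and parse_score_response are hypothetical, and it assumes error bodies follow the FastAPI-style {"detail": ...} shape, which this README does not confirm, so it falls back to the raw body otherwise.

```python
import json


class AuralisClientError(Exception):
    """HTTP 400: unsupported file type or invalid duration (hypothetical name)."""


class AuralisServerError(Exception):
    """Any other non-200 status, such as HTTP 500 (hypothetical name)."""


def parse_score_response(status_code, body):
    """Return the fatigue score for a 200 response, otherwise raise."""
    if status_code == 200:
        return float(json.loads(body)["fatigue_score"])
    # Assumption: errors look like {"detail": "..."}; use the raw body if not.
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        detail = body
    if status_code == 400:
        raise AuralisClientError(detail)
    raise AuralisServerError(f"HTTP {status_code}: {detail}")
```

Separating the two exception types lets callers fix and retry a bad upload (400) while treating server failures (500) as a different class of problem.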

Logging

  • Logs all requests with method, endpoint, status code, and duration.
  • Logs warnings for invalid audio formats and durations.
  • Configurable logger is provided in utils/logger.py.
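
As an illustration of the request log described above, here is a minimal sketch built on the standard logging module. It is not the contents of utils/logger.py, which are not shown in this README; get_logger, log_request, the log format, and the mapping of status codes to log levels are all assumptions.

```python
import logging
import time


def get_logger(name="auralis", level=logging.INFO):
    """Return a logger with a single stream handler (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def log_request(logger, method, endpoint, status_code, started_at):
    """Log method, endpoint, status code, and duration for one request."""
    duration_ms = (time.perf_counter() - started_at) * 1000
    # Assumed mapping: 5xx -> ERROR, 4xx -> WARNING, everything else -> INFO.
    if status_code >= 500:
        level = logging.ERROR
    elif status_code >= 400:
        level = logging.WARNING
    else:
        level = logging.INFO
    logger.log(level, "%s %s -> %d (%.1f ms)", method, endpoint, status_code, duration_ms)
```

In a FastAPI app, a function like log_request would typically be called from an HTTP middleware that records time.perf_counter() before handing the request to the route.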

Audio Validation Rules

  • Supported formats: .wav, .mp3, .m4a
  • Minimum duration: 5 seconds
  • Maximum duration: 10 seconds
  • Files failing validation return HTTP 400 with detailed messages
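
These rules can also be checked on the client before uploading. The sketch below is an assumption-laden illustration, not the server's validator: the function names are hypothetical, the 5 and 10 second bounds are treated as inclusive, and duration is read only for .wav files via the standard wave module (.mp3 and .m4a would need a third-party decoder).

```python
import wave
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_SECONDS = 5.0   # assumed inclusive
MAX_SECONDS = 10.0  # assumed inclusive


def wav_duration_seconds(path):
    """Duration of a PCM .wav file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validation_errors(filename, duration_seconds):
    """Return a list of rule violations; an empty list means the file passes."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        errors.append(f"Unsupported file type: {ext or '(none)'}")
    if duration_seconds < MIN_SECONDS:
        errors.append(f"Audio too short: {duration_seconds:.1f}s < {MIN_SECONDS:.0f}s")
    elif duration_seconds > MAX_SECONDS:
        errors.append(f"Audio too long: {duration_seconds:.1f}s > {MAX_SECONDS:.0f}s")
    return errors
```

For example, validation_errors("sample.wav", wav_duration_seconds("sample.wav")) returns an empty list for a supported file whose length is within bounds.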

Future Features

  1. Python Library: For local inference without API calls.
  2. Prosody Analysis: Including pitch, jitter, shimmer, and HNR.
  3. Automatic Report Generation: Human-readable vocal fatigue reports.
  4. Extended Audio Support: Handling longer recordings and batch processing.

References

  • ECAPA-TDNN-VHS model for speaker embeddings: SpeechBrain
  • Supervised contrastive learning for embedding robustness
  • Real-world multi-speaker dataset (70–100 speakers, 60 male, 40 female)

License

Auralis is released under the Apache 2.0 license.


For research and feature extraction purposes only. Not intended for medical diagnosis or clinical use.