	---
	title: Auralis Api
	emoji: 👀
	colorFrom: blue
	colorTo: purple
	sdk: docker
	pinned: false
	license: apache-2.0
	---

	# Auralis: Vocal Fatigue Scoring API

	![Auralis Logo](./logo.png)

	Auralis is an MLOps system and API that analyzes short voice recordings and returns a vocal fatigue score. It uses an ECAPA-TDNN-based embedding model and is designed to handle real-world audio from different speakers, devices, and recording conditions.

	---

	## Overview

	* System Name: Auralis
	* Current Version: v1.0
	* Primary Function: Estimate vocal fatigue score from uploaded audio files.
	* Supported Audio Formats: `.wav`, `.mp3`, `.m4a`
	* Audio Duration: Minimum 5 seconds, Maximum 10 seconds
	* Scoring Range: 0-100 (0 = healthy, 100 = fatigued)

	Future work: prosody features (pitch, jitter, shimmer, HNR) and a Python library for local use are planned.

	### Key Features

	1. Fatigue Scoring: Uses the ECAPA-TDNN-VHS model to extract health-centric embeddings and compute a fatigue score.
	2. Audio Validation: Ensures only supported formats and durations are processed.
	3. Robust Exception Handling: Logs warnings and returns HTTP 400 responses with descriptive messages for unsupported or invalid audio.
	4. MLOps Ready: Versioned API route (`/api/v1/voice/score`) with per-request logging.

	### Limitations

	* Not intended for medical diagnosis.
	* Currently provides only the fatigue score; additional reports and prosody-based insights are planned.
	* Requires a local or cloud-deployed server to host the API.

	---

	## Installation

	> Note: The Python library is still under development. For now, use the API through the hosted deployment or by running a local server.

	```bash
	# Clone the repository
	git clone https://github.com/Khubaib8281/auralis.git
	cd auralis

	# Create and activate a virtual environment
	python -m venv venv
	source venv/bin/activate      # Linux/macOS
	# venv\Scripts\activate       # Windows: run this line instead of the one above

	# Install dependencies
	pip install -r requirements.txt
	```

	---

	## Running the API Locally

	```bash
	# From the project root
	uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
	```

	* Swagger UI will be available at `http://127.0.0.1:8000/docs`
	* OpenAPI JSON at `http://127.0.0.1:8000/openapi.json`

	---

	## API Endpoint

	### POST `https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score`

	Description: Upload a voice file to obtain a fatigue score.

	Request:

	* Form Data: `file` (UploadFile, required)

	Example using `curl`:

	```bash
	curl -X POST "https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score" \
	  -F "file=@sample.wav"
	```
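The same upload can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: `build_multipart` and `score_file` are hypothetical helper names, the URL is copied from this README, and the only details taken from the API description are the `file` form field and the `fatigue_score` response key.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

# Endpoint copied from this README; swap in http://127.0.0.1:8000/api/v1/voice/score
# when testing against a local server.
API_URL = "https://huggingface.co/spaces/Khubaib01/auralis-api/api/v1/voice/score"


def build_multipart(field_name: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def score_file(path: str, url: str = API_URL, timeout: float = 30.0) -> float:
    """POST the audio file under the `file` form field and return `fatigue_score`."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return float(json.loads(response.read())["fatigue_score"])
```

Calling `score_file("sample.wav")` mirrors the `curl` command above; `urllib` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers may want to catch that.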

	Response:

	```json
	{
	  "fatigue_score": 42.7
	}
	```

	Error Responses:

	* 400 Bad Request: Unsupported file type or invalid audio duration
	* 500 Internal Server Error: Unexpected server errors
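On the client side, the success and error cases listed above could be handled roughly as follows. This is a sketch under stated assumptions: `AuralisError` and `parse_response` are hypothetical names, and the error body is assumed to follow the FastAPI-style `{"detail": "..."}` shape, which this README does not confirm.

```python
import json


class AuralisError(Exception):
    """Raised for any non-200 response (hypothetical helper, not part of the API)."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status: int, body: bytes) -> float:
    """Return the fatigue score for a 200 response, otherwise raise AuralisError."""
    try:
        payload = json.loads(body)
    except ValueError:  # body was not valid JSON
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    if status == 200:
        return float(payload["fatigue_score"])
    # Assumption: errors carry a FastAPI-style {"detail": "..."} body.
    detail = str(payload.get("detail", "no detail provided"))
    if status == 400:
        raise AuralisError(status, f"rejected audio: {detail}")
    raise AuralisError(status, f"unexpected response: {detail}")
```

A 400 means the file itself should be fixed (format or duration) before retrying, while a 500 may be worth retrying later with the same file.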

	---

	## Logging

	* Logs all requests with method, endpoint, status code, and duration.
	* Logs warnings for invalid audio formats and durations.
	* Configurable logger is provided in `utils/logger.py`.
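The contents of `utils/logger.py` are not shown in this README, so the following is only a guess at what such a logger might look like, built on the standard `logging` module. The function names, message format, and the choice to log 4xx responses at warning level are all assumptions for illustration.

```python
import logging
import time


def get_logger(name: str = "auralis") -> logging.Logger:
    """Return a logger with a single stream handler (hypothetical stand-in)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def format_request_line(method: str, endpoint: str, status: int, duration_ms: float) -> str:
    """Build the per-request message: method, endpoint, status code, duration."""
    return f"{method} {endpoint} status={status} duration_ms={duration_ms:.1f}"


def log_request(logger: logging.Logger, method: str, endpoint: str,
                status: int, started_at: float) -> None:
    """Log one finished request; `started_at` comes from time.perf_counter()."""
    duration_ms = (time.perf_counter() - started_at) * 1000.0
    # Assumption: client errors (such as invalid audio) are logged as warnings.
    level = logging.WARNING if 400 <= status < 500 else logging.INFO
    logger.log(level, format_request_line(method, endpoint, status, duration_ms))
```

In a FastAPI app this kind of helper would typically be called from an HTTP middleware that records `time.perf_counter()` before the request is handled and calls `log_request` afterwards.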

	---

	## Audio Validation Rules

	* Supported formats: `.wav`, `.mp3`, `.m4a`
	* Minimum duration: 5 seconds
	* Maximum duration: 10 seconds
	* Files failing validation return HTTP 400 with detailed messages
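The rules above can be expressed as a small check. This is a sketch rather than the server's actual code: `validation_error` is a hypothetical name, the duration is assumed to be measured by the caller (for example after decoding the file), and the 5 and 10 second bounds are assumed to be inclusive.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_DURATION_S = 5.0
MAX_DURATION_S = 10.0


def validation_error(filename: str, duration_s: float) -> Optional[str]:
    """Return a message suitable for an HTTP 400 response, or None if the audio passes."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        allowed = ", ".join(sorted(SUPPORTED_EXTENSIONS))
        return f"Unsupported file type '{extension}'. Supported formats: {allowed}"
    # Assumption: both bounds are inclusive, so exactly 5 s and exactly 10 s pass.
    if duration_s < MIN_DURATION_S:
        return f"Audio too short: {duration_s:.2f}s (minimum {MIN_DURATION_S:.0f}s)"
    if duration_s > MAX_DURATION_S:
        return f"Audio too long: {duration_s:.2f}s (maximum {MAX_DURATION_S:.0f}s)"
    return None
```

Running the same check on the client before uploading avoids a round trip for files that would be rejected anyway.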

	---

	## Future Features

	1. Python Library: For local inference without API calls.
	2. Prosody Analysis: Including pitch, jitter, shimmer, and HNR.
	3. Automatic Report Generation: Human-readable vocal fatigue reports.
	4. Extended Audio Support: Handling longer recordings and batch processing.

	---

	## References

	* ECAPA-TDNN-VHS model for speaker embeddings: [SpeechBrain](https://huggingface.co/Khubaib01/ECAPA-TDNN-VHE)
	* Supervised contrastive learning for embedding robustness
	* Real-world multi-speaker dataset (70–100 speakers, 60 male, 40 female)

	---

	## License

	Auralis is released under the Apache 2.0 license.

	---

	For research and feature extraction purposes only. Not intended for medical diagnosis or clinical use.