---
license: mit
---
# Audio Feature Extraction Models

This repository contains pre-trained models for audio feature extraction, specifically:

- **Tempo Detection:** Estimates the tempo (BPM) of an audio track.
- **Key Detection:** Estimates the musical key of an audio track (key class and Major/Minor quality).

## Model Details

### Tempo Model
- **Model Type:** Custom CNN architecture for tempo classification.
- **Input:** Audio segments converted to Mel spectrograms, followed by autocorrelation.
- **Output:** Predicted tempo in beats per minute (BPM), within the range [85, 170].

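The tempo model's input and output conventions can be illustrated with a minimal sketch. The preprocessing parameters and the class-to-BPM mapping are not stated in this README, so everything below is an assumption: `autocorrelate` shows a plain autocorrelation over time for a single Mel band, `class_to_bpm` assumes one class per integer BPM (class 0 is 85 BPM, giving 86 classes), and `fold_bpm` assumes double-time and half-time tempos are treated as equivalent, which the range suggests (170 = 2 × 85) but which is not confirmed here. All three function names are hypothetical. The code uses plain Python lists so it runs without dependencies; in practice the same operations would be applied to tensors.

```python
MIN_BPM, MAX_BPM = 85, 170  # output range stated in this README


def autocorrelate(band, max_lag):
    """Autocorrelation of one mel band over time, for lags 0..max_lag-1.

    Assumption: the mean is removed first; the real preprocessing may differ.
    """
    mean = sum(band) / len(band)
    centered = [x - mean for x in band]
    return [
        sum(centered[t] * centered[t + lag] for t in range(len(centered) - lag))
        for lag in range(max_lag)
    ]


def class_to_bpm(class_index):
    """Assumed mapping: one class per integer BPM, class 0 -> 85 BPM."""
    bpm = MIN_BPM + class_index
    if not MIN_BPM <= bpm <= MAX_BPM:
        raise ValueError(f"class index {class_index} is outside the BPM range")
    return bpm


def fold_bpm(bpm):
    """Fold a reference tempo into [85, 170] by doubling or halving.

    Assumption: half-time and double-time tempos count as equivalent.
    """
    while bpm < MIN_BPM:
        bpm *= 2
    while bpm > MAX_BPM:
        bpm /= 2
    return bpm
```

For example, a band with a pulse every four frames produces its strongest non-zero-lag autocorrelation at lag 4, and `fold_bpm(60)` yields 120, which makes a 60 BPM annotation comparable with a prediction inside the model's range.
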
### Key Detection Models
- **Key Class Model:** Classifies audio into one of 12 relative key classes.
- **Key Quality Model:** Determines whether the key is Major or Minor.
- **Input:** Audio segments converted to Mel spectrograms.
- **Output:**
  - Key Class: One of 12 key signatures.
  - Key Quality: Binary classification (0 for Major, 1 for Minor).

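The two key outputs can be combined into a single key name. The sketch below is one possible decoding, not the repository's confirmed label scheme. It assumes that key class `k` denotes the key signature whose major tonic is pitch class `k` (with C as 0), and that "relative" means a major key and its relative minor share one class. Only the quality encoding (0 for Major, 1 for Minor) comes from this README; `decode_key` and `PITCH_CLASSES` are hypothetical names.

```python
# Pitch-class names, C = 0 (assumed index order; not confirmed by the README).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def decode_key(key_class, key_quality):
    """Combine a key class (0..11) and a quality (0 = Major, 1 = Minor)."""
    if not 0 <= key_class < 12:
        raise ValueError("key_class must be in 0..11")
    if key_quality not in (0, 1):
        raise ValueError("key_quality must be 0 (Major) or 1 (Minor)")
    if key_quality == 0:
        return f"{PITCH_CLASSES[key_class]} major"
    # The relative minor's tonic lies three semitones below the major tonic,
    # which is nine semitones above it modulo 12.
    return f"{PITCH_CLASSES[(key_class + 9) % 12]} minor"
```

Under these assumptions, `decode_key(0, 0)` gives `"C major"` and `decode_key(0, 1)` gives `"A minor"`, since both share the key signature with no sharps or flats. If the models use a different class ordering, only `PITCH_CLASSES` and the offset need to change.
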
## Usage

### Prerequisites
- Python 3.7+
- PyTorch
- torchaudio
- transformers

### Loading Models

To load these models with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForAudioClassification

# "your_username/..." are placeholders; replace them with the actual repository IDs.

# Load Tempo Model
tempo_model = AutoModelForAudioClassification.from_pretrained("your_username/tempo_model")

# Load Key Models
key_class_model = AutoModelForAudioClassification.from_pretrained("your_username/key_class_model")
key_quality_model = AutoModelForAudioClassification.from_pretrained("your_username/key_quality_model")