---
language: en
license: apache-2.0
tags:
- t5
- music
- spotify
- text2json
- audio-features
- fine-tuned
base_model: t5-base
datasets:
- custom
library_name: transformers
pipeline_tag: text2text-generation
---

# T5-Base Fine-tuned for Spotify Features Prediction

T5-Base fine-tuned to convert natural language prompts into Spotify audio feature JSON.

## Model Details

- **Base Model**: t5-base
- **Model Type**: Text-to-JSON generation
- **Language**: English
- **Task**: Convert natural-language music preferences into Spotify audio feature JSON objects
- **Fine-tuning Dataset**: Custom dataset pairing prompts with Spotify audio features

## Training Configuration

- **Epochs**: 7
- **Learning Rate**: 3e-4
- **Batch Size**: 8 (per device)
- **Gradient Accumulation Steps**: 4 (effective batch size of 32 per device)
- **Scheduler**: Cosine with warmup
- **Optimizer**: AdamW
- **Max Length**: 256 tokens
- **Precision**: bfloat16
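
The per-device batch size and gradient accumulation together determine how many samples contribute to each optimizer update, and "cosine with warmup" describes a learning rate that rises linearly to its peak and then decays along a half cosine. The minimal sketch below computes both in plain Python, assuming a single device and the curve shape used by `transformers`' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. The warmup and total step counts (`WARMUP_STEPS`, `TOTAL_STEPS`) are hypothetical, since this card does not state them.

```python
import math

# Values taken from the training configuration above.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
PEAK_LR = 3e-4

# Hypothetical: the card does not state the warmup length or total step count.
WARMUP_STEPS = 100
TOTAL_STEPS = 1000


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_with_warmup(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))  # 32 on one device
    for s in (0, 50, 100, 550, 1000):
        print(s, cosine_with_warmup(s, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

When training on several devices, pass the device count as `num_devices` to get the global effective batch size.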

## Usage

```python
import json

from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load model and tokenizer. AutoTokenizer loads the fast tokenizer from tokenizer.json;
# the slow T5Tokenizer additionally needs a SentencePiece file (spiece.model),
# which is not among the model files listed below.
model = T5ForConditionalGeneration.from_pretrained("afsagag/t5-spotify-features")
tokenizer = AutoTokenizer.from_pretrained("afsagag/t5-spotify-features")

# The model was trained on inputs prefixed with "prompt: "
prompt = "I want energetic dance music with high energy and danceability"
input_text = f"prompt: {prompt}"

# Tokenize (keeping the attention mask) and generate with beam search
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)
outputs = model.generate(
    **inputs,
    max_length=256,
    num_beams=4,
    early_stopping=True,
    do_sample=False,
)

# Decode result
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

# Parse JSON output
try:
    spotify_features = json.loads(result)
    print("Generated Spotify Features:", spotify_features)
except json.JSONDecodeError:
    print("Generated text is not valid JSON")
```

## Expected Output Format

The model generates a JSON object of Spotify audio features, for example:

```json
{
  "danceability": 0.85,
  "energy": 0.90,
  "valence": 0.75,
  "acousticness": 0.15,
  "instrumentalness": 0.05,
  "speechiness": 0.08
}
```
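
Because generation is free-form text, it can help to check a parsed object before passing it on. This minimal sketch assumes the model emits at least the six keys shown above and relies on Spotify documenting each of these features on a 0.0 to 1.0 scale; `validate_features` is a hypothetical helper, not part of this repository.

```python
import json

# The six keys from the example output above; Spotify documents each on a 0.0-1.0 scale.
EXPECTED_KEYS = (
    "danceability",
    "energy",
    "valence",
    "acousticness",
    "instrumentalness",
    "speechiness",
)


def validate_features(features: dict) -> list[str]:
    """Return a list of problems; an empty list means the object looks well-formed."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in features:
            problems.append(f"missing key: {key}")
            continue
        value = features[key]
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} is not numeric: {value!r}")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{key} out of range [0, 1]: {value}")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '{"danceability": 0.85, "energy": 0.90, "valence": 0.75, '
        '"acousticness": 0.15, "instrumentalness": 0.05, "speechiness": 0.08}'
    )
    print(validate_features(sample))  # []
```

Returning a list of problems rather than raising lets a caller decide whether to retry generation, clamp values, or discard the output.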

## Metrics

- **Per-set Mean Absolute Error (MAE)**: Average absolute difference between predicted and reference values across the features in each set
- **Per-set Root Mean Squared Error (RMSE)**: Square root of the mean squared difference per set; penalizes large errors more heavily than MAE
- **Per-feature Correlation**: Pearson correlation between predicted and reference values for each individual audio feature
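
The three metrics can be computed from paired predicted and reference feature dictionaries as sketched below. The card does not say how per-set scores are aggregated, so averaging them over the evaluation set is an assumption, as are the helper names `per_set_errors`, `pearson`, and `evaluate`.

```python
import math

FEATURES = ("danceability", "energy", "valence", "acousticness", "instrumentalness", "speechiness")


def per_set_errors(pred: dict, ref: dict, keys: tuple[str, ...] = FEATURES) -> tuple[float, float]:
    """MAE and RMSE over the features of one predicted/reference pair."""
    diffs = [pred[k] - ref[k] for k in keys]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns nan if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def evaluate(preds: list[dict], refs: list[dict], keys: tuple[str, ...] = FEATURES) -> dict:
    """Mean per-set MAE/RMSE plus per-feature Pearson r across an evaluation set."""
    errors = [per_set_errors(p, r, keys) for p, r in zip(preds, refs)]
    return {
        "mean_mae": sum(e[0] for e in errors) / len(errors),
        "mean_rmse": sum(e[1] for e in errors) / len(errors),
        "pearson": {k: pearson([p[k] for p in preds], [r[k] for r in refs]) for k in keys},
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    preds = [{"energy": 0.9, "valence": 0.7}, {"energy": 0.3, "valence": 0.2}, {"energy": 0.6, "valence": 0.5}]
    refs = [{"energy": 0.8, "valence": 0.75}, {"energy": 0.35, "valence": 0.1}, {"energy": 0.55, "valence": 0.6}]
    print(evaluate(preds, refs, keys=("energy", "valence")))
```

Correlation is computed per feature across sets, so it needs at least two sets with varying values; a constant series yields `nan`.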

## Model Files

- `config.json`: Model configuration
- `pytorch_model.bin`: Model weights
- `tokenizer.json`: Serialized fast tokenizer, including the vocabulary
- `tokenizer_config.json`: Tokenizer configuration
- `special_tokens_map.json`: Special token mappings

## Limitations

- The model may occasionally generate invalid JSON that requires post-processing
- It was trained on inputs prefixed with `prompt: `; inputs without this prefix may produce worse results
- Performance depends on how closely a prompt resembles the training data distribution
- It may not generalize well to very abstract or unusual music descriptions
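
The first two limitations can be partly handled in client code. The sketch below adds the `prompt: ` prefix when it is missing and applies a best-effort JSON repair: it restores missing outer braces (the stock t5-base SentencePiece vocabulary reportedly has no token for `{` or `}`, so unless the tokenizer was extended, decoded output may lose them) and drops trailing commas. Both helpers, `build_input` and `parse_features`, are hypothetical, and the repair steps are assumptions about likely failure modes rather than documented behavior of this model.

```python
import json
import re
from typing import Optional

PREFIX = "prompt: "

# Matches a comma followed only by whitespace before a closing brace or bracket.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def build_input(prompt: str) -> str:
    """Prepend the training prefix unless the prompt already has it."""
    return prompt if prompt.startswith(PREFIX) else PREFIX + prompt


def parse_features(text: str) -> Optional[dict]:
    """Best-effort parse of generated text into a dict; returns None on failure."""
    candidate = text.strip()
    # Assumption: braces may be missing if the tokenizer cannot produce them.
    if not candidate.startswith("{"):
        candidate = "{" + candidate
    if not candidate.endswith("}"):
        candidate = candidate + "}"
    # json.loads rejects trailing commas, so strip them before parsing.
    candidate = _TRAILING_COMMA.sub(r"\1", candidate)
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


if __name__ == "__main__":
    print(build_input("mellow acoustic songs for a rainy afternoon"))
    print(parse_features('"energy": 0.9, "valence": 0.75,'))
```

Anything `parse_features` cannot recover comes back as `None`, so a caller can fall back to regenerating (for example with a different `num_beams`) instead of crashing.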

## Training Data

The model was trained on a custom dataset pairing natural-language music descriptions with corresponding Spotify audio feature values.
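
For a text-to-text model, each training record has to be flattened into a source string and a target string. The sketch below shows one way a description/features pair could be serialized, consistent with the `prompt: ` input prefix and JSON targets described in this card; the function name `make_training_pair`, the two-decimal rounding, and the key order are assumptions, not the documented format of the custom dataset.

```python
import json

PREFIX = "prompt: "


def make_training_pair(description: str, features: dict, ndigits: int = 2) -> tuple[str, str]:
    """Serialize one record into (source, target) strings for seq2seq fine-tuning.

    Hypothetical format: rounding and key order are illustrative choices.
    """
    source = PREFIX + description.strip()
    # Round values so targets stay short and consistent in token length.
    rounded = {k: round(float(v), ndigits) for k, v in features.items()}
    target = json.dumps(rounded)
    return source, target


if __name__ == "__main__":
    src, tgt = make_training_pair(" upbeat pop ", {"energy": 0.876, "valence": 0.7})
    print(src)  # prompt: upbeat pop
    print(tgt)  # {"energy": 0.88, "valence": 0.7}
```

Shorter, consistently rounded targets also make it easier to stay within the 256-token maximum length listed in the training configuration.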

## Ethical Considerations

This model generates music preference predictions and should not be used as the sole basis for music recommendation systems without human oversight.