---
title: MMAU Evaluation
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
---
# MMAU Benchmark Evaluation
This Space allows you to evaluate your model predictions against the MMAU (Massive Multi-task Audio Understanding) benchmark.
## How to Use

1. Prepare your predictions in JSON format
2. Upload the JSON file
3. Click "Evaluate" to see your results
## Expected JSON Format
Your predictions file should be a JSON array with objects containing:
```json
[
  {
    "id": "sample-uuid-here",
    "model_prediction": "your model's answer"
  },
  {
    "id": "another-sample-uuid",
    "model_prediction": "another answer"
  }
]
```
- `id`: Must match the sample IDs from the MMAU test set
- `model_prediction`: Your model's predicted answer
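As a sketch, a predictions file in this shape can be written with the standard library alone (the sample IDs and answers below are placeholders, not real MMAU IDs):

```python
import json

# Hypothetical model outputs keyed by MMAU sample ID
predictions = {
    "sample-uuid-here": "your model's answer",
    "another-sample-uuid": "another answer",
}

# Convert to the list-of-objects shape the evaluator expects
records = [
    {"id": sample_id, "model_prediction": answer}
    for sample_id, answer in predictions.items()
]

with open("predictions.json", "w") as f:
    json.dump(records, f, indent=2)
```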
## Metrics
The evaluation provides:
- **Overall Accuracy**: Total correct predictions / total samples
- **Task-wise Accuracy**: Breakdown by sound, music, and speech tasks
- **Difficulty-wise Accuracy**: Breakdown by easy, medium, and hard difficulty levels
- **Sub-category Accuracy**: Detailed breakdown by specific sub-categories
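The breakdowns above are all grouped accuracy. A minimal sketch of the computation, assuming each ground-truth entry carries `answer`, `task`, and `difficulty` fields (these field names are assumptions for illustration, not the Space's actual schema):

```python
from collections import defaultdict

def evaluate(predictions, ground_truth):
    """Compute overall and grouped accuracy.

    predictions: dict mapping sample id -> predicted answer string.
    ground_truth: dict mapping sample id -> dict with "answer", "task",
    and "difficulty" keys (assumed field names).
    """
    correct = 0
    by_task = defaultdict(lambda: [0, 0])        # task -> [correct, total]
    by_difficulty = defaultdict(lambda: [0, 0])  # level -> [correct, total]
    for sample_id, truth in ground_truth.items():
        pred = predictions.get(sample_id)
        hit = pred is not None and pred.strip().lower() == truth["answer"].strip().lower()
        correct += hit
        for key, table in ((truth["task"], by_task), (truth["difficulty"], by_difficulty)):
            table[key][0] += hit
            table[key][1] += 1
    return {
        "overall": correct / len(ground_truth),
        "task": {k: c / t for k, (c, t) in by_task.items()},
        "difficulty": {k: c / t for k, (c, t) in by_difficulty.items()},
    }
```

Missing predictions simply count as wrong, so the denominator is always the full test set.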
## Deployment Instructions (for maintainers)
To deploy this Space:
1. Create a new Space on Hugging Face
2. Upload the following files:
   - `app.py`
   - `requirements.txt`
   - `mmau-test.json` (the ground truth file - keep this private!)
## Keeping Ground Truth Private
The `mmau-test.json` file contains the ground truth answers. To keep it private:
### Option 1: Private Space
- Make the entire Space private (requires Hugging Face Pro)
### Option 2: Use Hugging Face Secrets

- Store the ground truth as a secret/dataset
- Modify `app.py` to load from the secret
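For example, if the ground truth JSON were stored directly in a Space secret, `app.py` could read it from the environment. The secret name `GROUND_TRUTH_JSON` here is an assumption for this sketch, not the Space's actual configuration:

```python
import json
import os

# Fall back to an empty list if the secret is missing; GROUND_TRUTH_JSON
# is an assumed secret name, set it under Settings -> Variables and secrets
raw = os.environ.get("GROUND_TRUTH_JSON", "[]")
ground_truth = {item["id"]: item for item in json.loads(raw)}
```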
### Option 3: Git LFS with `.gitattributes`

- Add `mmau-test.json` to `.gitignore`
- Upload it manually via the Files interface (it won't be visible in the repo)
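Under this option, the ignore entry is a single line:

```
# keep the ground truth answers out of version control
mmau-test.json
```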
## Recommended Approach
1. Create a private dataset on Hugging Face with the ground truth
2. Modify the app to load from the private dataset, using your HF token as a secret
```python
from huggingface_hub import hf_hub_download
import os

# Load the ground truth from a private dataset
GROUND_TRUTH_PATH = hf_hub_download(
    repo_id="your-username/mmau-ground-truth",
    filename="mmau-test.json",
    repo_type="dataset",
    token=os.environ.get("HF_TOKEN"),
)
```
Then add `HF_TOKEN` as a secret in your Space settings.