---
title: MMAU Evaluation
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
---

# MMAU Benchmark Evaluation

This Space allows you to evaluate your model predictions against the MMAU (Massive Multi-task Audio Understanding) benchmark.

## How to Use

  1. Prepare your predictions in JSON format
  2. Upload the JSON file
  3. Click "Evaluate" to see your results

## Expected JSON Format

Your predictions file should be a JSON array of objects, each with the following fields:

```json
[
    {
        "id": "sample-uuid-here",
        "model_prediction": "your model's answer"
    },
    {
        "id": "another-sample-uuid",
        "model_prediction": "another answer"
    }
]
```
- `id`: must match a sample ID from the MMAU test set
- `model_prediction`: your model's predicted answer
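For instance, a predictions file in this format can be written with a short script (the IDs and answers below are placeholders, not real MMAU samples):

```python
import json

# Placeholder predictions; replace with your model's actual output,
# keyed by the sample IDs from the MMAU test set.
predictions = [
    {"id": "sample-uuid-here", "model_prediction": "your model's answer"},
    {"id": "another-sample-uuid", "model_prediction": "another answer"},
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=4)
```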

## Metrics

The evaluation provides:

- **Overall Accuracy**: total correct predictions / total samples
- **Task-wise Accuracy**: breakdown by sound, music, and speech tasks
- **Difficulty-wise Accuracy**: breakdown by easy, medium, and hard difficulty levels
- **Sub-category Accuracy**: detailed breakdown by specific sub-categories
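As a rough sketch, these breakdowns amount to joining predictions to the ground truth on `id` and grouping. The `task` and `answer` field names below are assumptions for illustration; the actual ground-truth schema may differ:

```python
from collections import defaultdict

def accuracy_breakdown(predictions, ground_truth):
    """Compute overall and per-task accuracy by matching samples on id.

    Assumes ground-truth entries look like {"id": ..., "task": ..., "answer": ...}.
    """
    truth = {g["id"]: g for g in ground_truth}
    correct_total = 0
    by_task = defaultdict(lambda: [0, 0])  # task -> [correct, total]
    for p in predictions:
        g = truth.get(p["id"])
        if g is None:
            continue  # skip predictions with no matching ground-truth sample
        hit = p["model_prediction"].strip().lower() == g["answer"].strip().lower()
        correct_total += hit
        by_task[g["task"]][0] += hit
        by_task[g["task"]][1] += 1
    overall = correct_total / max(len(predictions), 1)
    per_task = {t: c / n for t, (c, n) in by_task.items()}
    return overall, per_task
```

Difficulty-wise and sub-category breakdowns follow the same grouping pattern with a different key.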

## Deployment Instructions (for maintainers)

To deploy this Space:

1. Create a new Space on Hugging Face
2. Upload the following files:
   - `app.py`
   - `requirements.txt`
   - `mmau-test.json` (the ground-truth file; keep this private!)
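As a sketch, `requirements.txt` might look like the following (for a Gradio Space, `gradio` itself is installed from the `sdk_version` in the README metadata, so only extra dependencies need listing; the exact list depends on what `app.py` imports):

```text
huggingface_hub
```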

## Keeping Ground Truth Private

The `mmau-test.json` file contains the ground-truth answers. To keep it private, use one of the following options:

### Option 1: Private Space

- Make the entire Space private (requires Hugging Face Pro)

### Option 2: Use Hugging Face Secrets

- Store the ground truth in a private dataset
- Modify `app.py` to load it using a token stored as a Space secret

### Option 3: Keep the File Out of Version Control

- Add `mmau-test.json` to `.gitignore` so it is never committed from a local checkout
- Note that files uploaded via the Files interface are still committed to the repository, so this alone does not keep the file private; prefer Option 1 or 2

### Recommended Approach

1. Create a private dataset on Hugging Face with the ground truth
2. Modify the app to load from the private dataset, using your HF token as a secret:

```python
from huggingface_hub import hf_hub_download
import os

# Download the ground-truth file from the private dataset repo.
# The HF_TOKEN secret must have read access to that repo.
GROUND_TRUTH_PATH = hf_hub_download(
    repo_id="your-username/mmau-ground-truth",
    filename="mmau-test.json",
    repo_type="dataset",
    token=os.environ.get("HF_TOKEN"),
)
```

Then add `HF_TOKEN` as a secret in your Space settings.