---
title: MMAU Evaluation
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
---

# MMAU Benchmark Evaluation

This Space allows you to evaluate your model predictions against the MMAU (Massive Multi-task Audio Understanding) benchmark.

## How to Use

  1. Prepare your predictions in JSON format
  2. Upload the JSON file
  3. Click "Evaluate" to see your results

## Expected JSON Format

Your predictions file should be a JSON array of objects, each with the following fields:

```json
[
    {
        "id": "sample-uuid-here",
        "model_prediction": "your model's answer"
    },
    {
        "id": "another-sample-uuid",
        "model_prediction": "another answer"
    }
]
```
- `id`: must match a sample ID from the MMAU test set
- `model_prediction`: your model's predicted answer
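For instance, a predictions file in this format can be written with a short script (the IDs and answers below are placeholders, not real MMAU samples):

```python
import json

# Placeholder predictions; replace with your model's actual output,
# keyed by the sample IDs from the MMAU test set.
predictions = [
    {"id": "sample-uuid-here", "model_prediction": "your model's answer"},
    {"id": "another-sample-uuid", "model_prediction": "another answer"},
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=4)
```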

## Metrics

The evaluation provides:

- **Overall Accuracy**: total correct predictions / total samples
- **Task-wise Accuracy**: breakdown by sound, music, and speech tasks
- **Difficulty-wise Accuracy**: breakdown by easy, medium, and hard difficulty levels
- **Sub-category Accuracy**: detailed breakdown by specific sub-categories
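As a rough sketch, these breakdowns amount to joining predictions to the ground truth on `id` and grouping. The `task` and `answer` field names below are assumptions for illustration; the actual ground-truth schema may differ:

```python
from collections import defaultdict

def accuracy_breakdown(predictions, ground_truth):
    """Compute overall and per-task accuracy by matching samples on id.

    Assumes ground-truth entries look like {"id": ..., "task": ..., "answer": ...}.
    """
    truth = {g["id"]: g for g in ground_truth}
    correct_total = 0
    by_task = defaultdict(lambda: [0, 0])  # task -> [correct, total]
    for p in predictions:
        g = truth.get(p["id"])
        if g is None:
            continue  # skip predictions with no matching ground-truth sample
        hit = p["model_prediction"].strip().lower() == g["answer"].strip().lower()
        correct_total += hit
        by_task[g["task"]][0] += hit
        by_task[g["task"]][1] += 1
    overall = correct_total / max(len(predictions), 1)
    per_task = {t: c / n for t, (c, n) in by_task.items()}
    return overall, per_task
```

Difficulty-wise and sub-category breakdowns follow the same grouping pattern with a different key.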

## Deployment Instructions (for maintainers)

To deploy this Space:

1. Create a new Space on Hugging Face
2. Upload the following files:
   - `app.py`
   - `requirements.txt`
   - `mmau-test.json` (the ground-truth file; keep this private!)
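As a sketch, `requirements.txt` might look like the following (for a Gradio Space, `gradio` itself is installed from the `sdk_version` in the README metadata, so only extra dependencies need listing; the exact list depends on what `app.py` imports):

```text
huggingface_hub
```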

## Keeping Ground Truth Private

The `mmau-test.json` file contains the ground-truth answers. To keep it private, use one of the following options:

### Option 1: Private Space

- Make the entire Space private (requires Hugging Face Pro)

### Option 2: Use Hugging Face Secrets

- Store the ground truth in a private dataset
- Modify `app.py` to load it using a token stored as a Space secret

### Option 3: Keep the File Out of Version Control

- Add `mmau-test.json` to `.gitignore` so it is never committed from a local checkout
- Note that files uploaded via the Files interface are still committed to the repository, so this alone does not keep the file private; prefer Option 1 or 2

### Recommended Approach

1. Create a private dataset on Hugging Face with the ground truth
2. Modify the app to load from the private dataset, using your HF token as a secret:

```python
from huggingface_hub import hf_hub_download
import os

# Download the ground-truth file from the private dataset repo.
# The HF_TOKEN secret must have read access to that repo.
GROUND_TRUTH_PATH = hf_hub_download(
    repo_id="your-username/mmau-ground-truth",
    filename="mmau-test.json",
    repo_type="dataset",
    token=os.environ.get("HF_TOKEN"),
)
```

Then add `HF_TOKEN` as a secret in your Space settings.