AFDBench / README.md

manmeet3591

Upload folder using huggingface_hub

bd00fe6 verified about 2 months ago

metadata

title: AFDBench
emoji: 🌦
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: true
license: apache-2.0
python_version: 3.11
short_description: The Weather Forecast Discussion Alignment Benchmark

AFDBench: Area Forecast Discussion Benchmark

AFDBench evaluates how well AI models generate professional meteorological text compared with human National Weather Service (NWS) forecasters.

Core Metrics:

Met-Align: Physical accuracy, measured against the numerical choices made by the human forecaster.
Style-Align: Linguistic alignment with the professional prose of NWS Area Forecast Discussions (AFDs).
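
This README does not say how Met-Align is computed. Purely as an illustration of what "physical accuracy vs. human numerical choices" could mean, the sketch below extracts the numbers from a human discussion and checks how many of them a model's text reproduces within a tolerance. The names `extract_numbers` and `numeric_agreement`, the regular expression, and the default tolerance are all hypothetical and are not part of AFDBench.

```python
import re

# Hypothetical pattern: a number not glued to a preceding word or decimal point,
# so "mid-70s" yields 70 and "10-15 mph" yields 10 and 15 (not -15).
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every numeric value found in the text as a float."""
    return [float(match) for match in NUMBER.findall(text)]


def numeric_agreement(model_text, human_text, tolerance=2.0):
    """Fraction of human-stated numbers that the model text also states.

    A human number counts as matched if any model number lies within
    `tolerance` of it. Returns None when the human text has no numbers.
    This is an assumed stand-in, not the benchmark's actual Met-Align metric.
    """
    human_values = extract_numbers(human_text)
    model_values = extract_numbers(model_text)
    if not human_values:
        return None
    matched = sum(
        1
        for h in human_values
        if any(abs(h - m) <= tolerance for m in model_values)
    )
    return matched / len(human_values)


if __name__ == "__main__":
    human = "Highs near 72 with winds 10-15 mph."
    model = "Highs around 73, winds 10 to 15 mph."
    print(numeric_agreement(model, human))
```

A real metric would also need to pair each number with its variable and unit (temperature, wind speed, precipitation chance); this sketch ignores that and compares bare values only.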

Initial results on 7,734 human-written samples reveal a large Meteorological Hallucination Gap in zero-shot open models.