---
title: AFDBench
emoji: 🌦
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: true
license: apache-2.0
python_version: 3.11
short_description: The Weather Forecast Discussion Alignment Benchmark
---
# AFDBench: Area Forecast Discussion Benchmark

AFDBench evaluates how well AI models generate professional meteorological text compared with human National Weather Service (NWS) forecasters.

### Core Metrics:

1. **Met-Align**: physical accuracy, measured against the numerical choices made by human forecasters.
2. **Style-Align**: linguistic alignment with the professional prose of NWS Area Forecast Discussions (AFDs).
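
To make the idea behind Met-Align more concrete, here is a toy sketch of one way numerical agreement between a model-written and a human-written discussion could be scored. This is not AFDBench's actual metric: the function names `extract_numbers` and `toy_numeric_agreement`, the regex-based extraction, and the `tolerance` parameter are all assumptions made for illustration.

```python
import re

# Hypothetical helper: matches unsigned integers and decimals only.
# Negative values and units are deliberately ignored in this sketch.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Return every unsigned number mentioned in the text, in order."""
    return [float(m) for m in NUMBER.findall(text)]


def toy_numeric_agreement(model_text: str, human_text: str,
                          tolerance: float = 2.0) -> float:
    """Fraction of human-stated numbers that the model text matches.

    Illustrative stand-in only, NOT the benchmark's Met-Align definition.
    Each human number is greedily paired with the closest unused model
    number and counts as matched if the two differ by at most `tolerance`.
    """
    human = extract_numbers(human_text)
    remaining = extract_numbers(model_text)
    if not human:
        return 1.0  # nothing to match, so agreement is vacuously perfect

    matched = 0
    for h in human:
        best = min(remaining, key=lambda m: abs(m - h), default=None)
        if best is not None and abs(best - h) <= tolerance:
            matched += 1
            remaining.remove(best)  # each model number is used at most once
    return matched / len(human)


human_afd = "Highs near 72 with winds 10 to 15 mph."
model_afd = "Highs around 73, winds 10 to 20 mph."
score = toy_numeric_agreement(model_afd, human_afd)
print(f"toy agreement: {score:.2f}")
```

In this example the model matches the temperature and the lower wind speed but not the upper wind speed, so two of the three human numbers are matched. A real metric would also need to account for units, variable identity, and forecast period, which this sketch does not attempt.
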

Initial results on 7,734 human-written samples reveal a large **Meteorological Hallucination Gap** in zero-shot open models.