EPI-Eval

A curated collection of large epidemiological datasets, normalized to a single schema so they can be searched, joined, and benchmarked against each other.

What we track

Time-series surveillance data on infectious disease: primarily respiratory viruses (flu, COVID-19, RSV) and arboviral disease (dengue, Zika, chikungunya), with smaller coverage of notifiable, mortality, wastewater, and behavioural / search signals. Sources include CDC, WHO, ECDC, PAHO, OWID, and national public-health agencies. We re-publish each one as Parquet with a consistent set of row-level columns (date, location_id, location_level, and optional condition / case_status / as_of) and a metadata header describing pathogens, geography, cadence, and per-column units.
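As a rough illustration of what "a consistent set of row-level columns" means in practice, here is a minimal Python sketch that checks a single row against the column names listed above. It is not the org's validator: the function name, the idea of passing each dataset's value columns explicitly, the treatment of as_of as a date, and the example values ("US", "national", a "value" column) are all assumptions made for illustration.

```python
from datetime import date
from typing import Any

# Column names come from the schema description above; which extra
# value columns a dataset carries is an assumption passed in by the caller.
REQUIRED = {"date", "location_id", "location_level"}
OPTIONAL = {"condition", "case_status", "as_of"}


def check_row(row: dict[str, Any], value_columns: set[str]) -> list[str]:
    """Return the problems found in one row; an empty list means it fits."""
    problems = []
    columns = set(row)
    missing = REQUIRED - columns
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    unexpected = columns - REQUIRED - OPTIONAL - value_columns
    if unexpected:
        problems.append(f"unexpected columns: {sorted(unexpected)}")
    # date and the optional as_of (assumed here to be a report date) must be dates.
    for col in ("date", "as_of"):
        value = row.get(col)
        if value is not None and not isinstance(value, date):
            problems.append(f"{col} must be a date, got {type(value).__name__}")
    return problems


# Hypothetical example row with a single value column named "value".
row = {"date": date(2024, 1, 6), "location_id": "US", "location_level": "national", "value": 123.0}
print(check_row(row, value_columns={"value"}))  # []
```

Because optional columns are allowed but unknown ones are flagged, a typo such as "asof" surfaces as an unexpected column rather than being silently ignored.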

Why

Forecasting and modeling work routinely stalls on data plumbing — finding the canonical version of a series, normalizing geography codes, reconciling reporting cadences, tracking when a source was last revised. The goal of this org is to do that work once, in the open.

Schema

Every dataset card in this org uses the same frontmatter format (schema v0.1), validated against a controlled vocabulary (vocabularies.yaml). Curated metadata (pathogens, license, units) lives alongside computed metadata (time coverage, row count, observed cadence), which is generated at ingest.
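The computed fields can be derived directly from the rows. The sketch below shows one plausible way to do it; it is an assumption, not the ingest pipeline's actual logic. It takes one date per row, reports the first and last date, counts rows, and labels the cadence from the most common gap between consecutive distinct dates. The 90% agreement threshold (`min_share`) and the gap-to-label mapping are hypothetical choices; the labels themselves mirror the Cadence column in the dataset tables.

```python
from collections import Counter
from datetime import date
from typing import Optional


def _gap_label(gap_days: int) -> Optional[str]:
    """Hypothetical mapping from a gap in days to a cadence label."""
    if gap_days == 1:
        return "daily"
    if gap_days == 7:
        return "weekly"
    if gap_days in (365, 366):
        return "annual"
    return None


def computed_metadata(row_dates: list[date], min_share: float = 0.9) -> dict:
    """Derive time coverage, row count and observed cadence from one date per row."""
    if not row_dates:
        raise ValueError("no rows")
    distinct = sorted(set(row_dates))
    gaps = [(b - a).days for a, b in zip(distinct, distinct[1:])]
    cadence = "unknown"  # fewer than two distinct dates
    if gaps:
        label, count = Counter(_gap_label(g) for g in gaps).most_common(1)[0]
        # Call the series regular only if most gaps agree (assumed threshold).
        cadence = label if label is not None and count / len(gaps) >= min_share else "irregular"
    return {
        "time_coverage": {"start": distinct[0].isoformat(), "end": distinct[-1].isoformat()},
        "row_count": len(row_dates),
        "observed_cadence": cadence,
    }
```

Row count uses every row (several locations can share a date), while cadence is computed on distinct dates only, so a multi-location weekly series is still labelled weekly.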

Contributing a dataset

The ingest pipeline lives in apart-forecasting-tool/upload_pipeline. Adding a dataset means adding one ingest.py and one card.yaml under upload_pipeline/sources/<source_id>/; the validator checks that the card fits the schema before upload. Each new truth dataset automatically gets an empty <id>-predictions companion repo at upload time.
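A hypothetical sketch of the pre-upload checks described above follows. It is not the code in upload_pipeline: it assumes card.yaml has already been parsed into a dict, and `VOCAB`, `validate_source_dir`, and `companion_repo` are illustrative names. The vocabulary terms are taken from the dataset tables on this card; the real structure of vocabularies.yaml is not shown here.

```python
from pathlib import Path

# Hypothetical stand-in for vocabularies.yaml: field -> allowed terms.
VOCAB = {
    "pathogens": {"influenza", "sars-cov-2", "rsv", "dengue", "mpox", "tuberculosis"},
    "cadence": {"daily", "weekly", "annual", "irregular"},
}


def validate_source_dir(source_dir: Path, card: dict) -> list[str]:
    """Check the sources/<source_id>/ layout and the card's controlled terms."""
    errors = []
    for name in ("ingest.py", "card.yaml"):
        if not (source_dir / name).is_file():
            errors.append(f"missing {name}")
    for field, allowed in VOCAB.items():
        if field not in card:
            errors.append(f"card is missing '{field}'")
            continue
        values = card[field]
        # Accept either a single term or a list of terms.
        values = [values] if isinstance(values, str) else list(values)
        unknown = [v for v in values if v not in allowed]
        if unknown:
            errors.append(f"{field}: not in vocabulary: {unknown}")
    return errors


def companion_repo(dataset_id: str, org: str = "EPI-Eval") -> str:
    """Name of the empty predictions repo created alongside a truth dataset."""
    return f"{org}/{dataset_id}-predictions"


print(companion_repo("wikipedia-pageviews"))  # EPI-Eval/wikipedia-pageviews-predictions
```

Checking terms against a closed set is what lets cards be searched and joined: a card that says "flu" instead of "influenza" is rejected before upload rather than becoming an orphan in search results.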

Datasets (21)

Respiratory

Dataset	Pathogens	Geography	Cadence
CDC FluSurv-NET — weekly flu hospitalisation rates	influenza	US	weekly
CDC NHSN Hospital Respiratory Data (HRD)	influenza, sars-cov-2, rsv	US	weekly
CDC NREVSS — weekly RSV test specimens and positives	rsv	US	weekly
COVID Tracking Project — US states daily (archived)	sars-cov-2	US	daily
COVID-19 Forecast Hub — hospital admissions target	sars-cov-2	US	weekly
ECDC ERVISS — ILI/ARI primary-care consultation rates	influenza, sars-cov-2, rsv	multiple (30 countries)	weekly
Flu MetroCast Hub — sub-state flu hosp forecast target	influenza	US	weekly
FluSight Forecast Hub — flu hospital admission target	influenza	US	weekly
JHU CSSE COVID-19 — global daily (archived)	sars-cov-2	multiple	daily
NYT COVID-19 — US county daily	sars-cov-2	US	daily
OWID COVID-19 — global daily compiled	sars-cov-2	multiple	daily
PHAC Respiratory Virus Detection Surveillance — Canada weekly	influenza, influenza-a, influenza-b +7	CA	weekly
RSV Forecast Hub — RSV hospital admissions target	rsv	US	weekly
UKHSA Dashboard — England COVID-19 daily metrics	sars-cov-2	GB	daily
UKHSA Dashboard — England flu / COVID-19 / RSV weekly	influenza, sars-cov-2, rsv	GB	weekly

Syndromic / ED

Dataset	Pathogens	Geography	Cadence
CDC NSSP / ESSENCE — ED visits for ILI / COVID / RSV	influenza, sars-cov-2, rsv	US	weekly

Arboviral

Dataset	Pathogens	Geography	Cadence
OpenDengue — national dengue case counts (V1.3)	dengue	multiple	irregular

Mobility & contact

Dataset	Pathogens	Geography	Cadence
Google Community Mobility Reports — global daily	—	multiple	daily

Search & behavioural

Dataset	Pathogens	Geography	Cadence
Wikipedia pageviews — disease-article daily views	influenza, sars-cov-2, rsv +6	multiple	daily

Notifiable / other

Dataset	Pathogens	Geography	Cadence
OWID Mpox — global daily compiled	mpox	multiple	daily
WHO Global TB — annual country estimates	tuberculosis	multiple	annual

Predictions

Each truth dataset has a companion EPI-Eval/<id>-predictions repo that accumulates community-submitted forecasts. The schema is long-format: one row per (target_date, [dim values…], quantile, value), with quantile = NULL reserved for the point estimate. Forecasters submit through the EPI-Eval dashboard, and a maintainer reviews each PR before merging. Merged predictions appear under the "Show predictions" toggle on the corresponding truth dataset in the dashboard, alongside a per-submitter leaderboard (MAE / WIS / rWIS / coverage).
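To make the long format and two of the leaderboard metrics concrete, here is a minimal Python sketch. It assumes prediction rows are loaded as dicts with target_date, quantile (None standing in for NULL), and value keys, plus one key per dimension; the dimension name location_id and the example values are hypothetical. WIS is computed via the standard identity that the weighted interval score over K central intervals plus the median equals the sum of quantile (pinball) losses over the 2K+1 levels divided by K + 1/2. The dashboard's own scoring may differ (for example in how it handles missing quantile levels), and rWIS and coverage are omitted.

```python
from collections import defaultdict


def pinball(tau: float, q: float, y: float) -> float:
    """Quantile (pinball) loss of predicted quantile q at level tau for outcome y."""
    return (tau if y >= q else 1.0 - tau) * abs(y - q)


def wis(quantiles: dict[float, float], y: float) -> float:
    """Weighted interval score from a symmetric quantile set that includes the median."""
    levels = sorted(quantiles)
    symmetric = all(abs(lo + hi - 1.0) < 1e-9 for lo, hi in zip(levels, reversed(levels)))
    if 0.5 not in quantiles or not symmetric:
        raise ValueError("need symmetric quantile levels that include 0.5")
    k = (len(levels) - 1) // 2  # number of central prediction intervals
    return sum(pinball(t, quantiles[t], y) for t in levels) / (k + 0.5)


def score(rows: list[dict], truth: dict[tuple, float], dims: tuple[str, ...]) -> dict[str, float]:
    """Mean absolute error of point rows and mean WIS of quantile rows."""
    abs_errors = []
    by_target: dict[tuple, dict[float, float]] = defaultdict(dict)
    for row in rows:
        key = (row["target_date"], *(row[d] for d in dims))
        if row["quantile"] is None:  # NULL quantile = point estimate
            abs_errors.append(abs(row["value"] - truth[key]))
        else:
            by_target[key][row["quantile"]] = row["value"]
    wis_scores = [wis(qs, truth[key]) for key, qs in by_target.items()]
    nan = float("nan")
    return {
        "mae": sum(abs_errors) / len(abs_errors) if abs_errors else nan,
        "wis": sum(wis_scores) / len(wis_scores) if wis_scores else nan,
    }


rows = [
    {"target_date": "2024-01-13", "location_id": "US", "quantile": None, "value": 10.0},
    {"target_date": "2024-01-13", "location_id": "US", "quantile": 0.25, "value": 8.0},
    {"target_date": "2024-01-13", "location_id": "US", "quantile": 0.5, "value": 10.0},
    {"target_date": "2024-01-13", "location_id": "US", "quantile": 0.75, "value": 12.0},
]
print(score(rows, {("2024-01-13", "US"): 11.0}, dims=("location_id",)))  # {'mae': 1.0, 'wis': 1.0}
```

Keeping the point estimate as a NULL-quantile row means one table serves both metrics: MAE reads only the point rows, and WIS reads only the quantile rows for the same (target_date, dims) key.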

Status

Active. Coverage and dataset list grow through PRs to the upload pipeline.
