alexbie98

Update README.md

d79e92b verified about 22 hours ago


metadata

title: README
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: static
pinned: false

ContinuousBench

Blog post | arXiv

ContinuousBench is a benchmark for measuring progress in differentially private (DP) synthetic data generation.

ContinuousBench has two tracks:

Geminon: Fictional, Gemini-generated corpus
News: Scraped news articles from September 2025

Both datasets:

are designed to contain completely new information, so models cannot answer questions about it without first training on the corpus
are paired with question-answer (QA) pairs that can only be answered after training on the corpus

Generate a DP synthetic version of News or Geminon, then test it with the evaluation code: https://github.com/plau666/ContinuousBenchEval

Our evaluation trains a model on your DP synthetic data, then asks it the paired QA to check whether the synthetic data taught the model the knowledge present in the original corpus.
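The train-then-quiz loop described above can be illustrated with a toy sketch. This is not the ContinuousBenchEval code or its API: `train_toy_model`, `qa_accuracy`, the "X is Y" fact format, and the exact-match scoring are all assumptions made up for illustration, and the "model" is a simple fact lookup standing in for a real fine-tuned language model.

```python
from typing import Callable, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so string comparison is lenient."""
    return " ".join(text.lower().split())


def train_toy_model(synthetic_docs: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for fine-tuning: memorize 'X is Y' statements."""
    facts = {}
    for doc in synthetic_docs:
        for sentence in doc.split("."):
            subject, sep, value = sentence.partition(" is ")
            if sep:  # only keep sentences of the form '<subject> is <value>'
                facts[normalize(subject)] = value.strip()

    def answer(question: str) -> str:
        # Assumes questions of the form 'What is <subject>?'
        subject = normalize(question).rstrip("?")
        prefix = "what is "
        if subject.startswith(prefix):
            subject = subject[len(prefix):]
        return facts.get(subject, "")  # empty string means "does not know"

    return answer


def qa_accuracy(answer_fn: Callable[[str], str],
                qa_pairs: List[Tuple[str, str]]) -> float:
    """Fraction of questions answered correctly (exact match, an assumption)."""
    if not qa_pairs:
        return 0.0
    correct = sum(normalize(answer_fn(q)) == normalize(a) for q, a in qa_pairs)
    return correct / len(qa_pairs)


# Made-up data: a synthetic corpus that preserved two of the three facts.
synthetic_docs = ["The capital of Zorland is Mivar. The mayor of Mivar is Ana Tolu."]
qa_pairs = [
    ("What is the capital of Zorland?", "Mivar"),
    ("What is the mayor of Mivar?", "Ana Tolu"),
    ("What is the currency of Zorland?", "zor"),
]

untrained = train_toy_model([])             # has never seen the corpus
trained = train_toy_model(synthetic_docs)   # "trained" on the synthetic data

baseline_score = qa_accuracy(untrained, qa_pairs)
synthetic_score = qa_accuracy(trained, qa_pairs)
print(f"baseline={baseline_score:.2f} synthetic={synthetic_score:.2f}")
```

In this sketch, the gap between the untrained baseline and the model trained on synthetic data plays the role of the benchmark signal: facts that the DP synthetic data failed to preserve show up as unanswered questions.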