metadata
title: Ad Classification Exploration
emoji: 📰
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 6.10.0
app_file: app.py
pinned: false
license: cc-by-sa-4.0
short_description: Explore yearly ad and non-ad distributions in Impresso

The app loads this aggregated JSON file from the repo by default:

content-item-classification-base-multilingual_v1-0-0_aggregated_for_exploration.json

You can override the source with the DATA_SOURCE environment variable. Supported values are local paths, http(s) URLs, and s3:// URLs.
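
As a rough illustration of how the three supported kinds of value could be told apart, here is a minimal sketch. The function name `resolve_data_source`, the fallback on an empty variable, and the rule that anything without an http(s) or s3 scheme is a local path are assumptions for illustration, not the code in app.py.

```python
import os
from urllib.parse import urlparse

# Bundled default file named in this README.
DEFAULT_SOURCE = (
    "content-item-classification-base-multilingual_v1-0-0"
    "_aggregated_for_exploration.json"
)


def resolve_data_source(env=None):
    """Return (kind, location), where kind is 'local', 'http', or 's3'.

    Hypothetical helper: the real app may resolve DATA_SOURCE differently.
    """
    env = os.environ if env is None else env
    # Assumption: an unset or empty DATA_SOURCE falls back to the bundled file.
    location = env.get("DATA_SOURCE") or DEFAULT_SOURCE
    scheme = urlparse(location).scheme.lower()
    if scheme in ("http", "https"):
        return "http", location
    if scheme == "s3":
        return "s3", location
    # Assumption: every other value is treated as a local path.
    return "local", location
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out, for example `resolve_data_source({"DATA_SOURCE": "s3://my-bucket/data.json"})`, where the bucket name is a placeholder.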

Expected row shape:

[
  {
    "provider_alias": "SFA",
    "provider_name": "Swiss Federal Archives",
    "newspaper_alias": "FedGazDe",
    "newspaper_title": "Bundesblatt",
    "year": 1851,
    "ad_count": 1,
    "non_ad_count": 18,
    "total_count": 19,
    "ad_share": 0.0526
  }
]
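
A custom source can be checked against this shape before pointing the app at it. The sketch below is one possible validator. The helper name `validate_row`, the relation `total_count = ad_count + non_ad_count`, and the reading of `ad_share` as `ad_count / total_count` rounded to about four decimals are all inferred from the single example row above, so treat them as assumptions rather than a documented schema.

```python
# Field names come from the example row; the types are inferred from it.
REQUIRED_FIELDS = {
    "provider_alias": str,
    "provider_name": str,
    "newspaper_alias": str,
    "newspaper_title": str,
    "year": int,
    "ad_count": int,
    "non_ad_count": int,
    "total_count": int,
    "ad_share": (int, float),
}


def validate_row(row):
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[name], bool) or not isinstance(row[name], expected):
            problems.append(f"unexpected type for {name}")
    if problems:
        return problems
    # Assumed invariant: the two counts add up to the total.
    if row["ad_count"] + row["non_ad_count"] != row["total_count"]:
        problems.append("total_count != ad_count + non_ad_count")
    # Assumed invariant: ad_share is the rounded ratio of ads to all items.
    if row["total_count"] > 0:
        expected_share = row["ad_count"] / row["total_count"]
        if abs(row["ad_share"] - expected_share) > 1e-3:
            problems.append("ad_share does not match ad_count / total_count")
    return problems
```

To check a whole file, load it with `json.load` and call `validate_row` on each element of the resulting list.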

Optional S3 environment variables, used when DATA_SOURCE is an s3:// URL:

  • AWS_ENDPOINT_URL or S3_ENDPOINT_URL
  • AWS_REGION or S3_REGION
  • AWS_PROFILE or S3_PROFILE
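
The variable pairs above could be collapsed into a single settings dictionary along these lines. This is a sketch only: the helper name `s3_settings` is hypothetical, and the choice to prefer the AWS_* name over the S3_* name when both are set is an assumption, since the README does not state a precedence.

```python
import os


def s3_settings(env=None):
    """Collect the optional S3 settings from environment variables.

    Hypothetical helper; unset or empty variables yield None.
    """
    env = os.environ if env is None else env

    def first_set(*names):
        # Return the first non-empty value among the given variable names.
        for name in names:
            value = env.get(name)
            if value:
                return value
        return None

    # Assumption: AWS_* takes precedence over S3_* when both are defined.
    return {
        "endpoint_url": first_set("AWS_ENDPOINT_URL", "S3_ENDPOINT_URL"),
        "region_name": first_set("AWS_REGION", "S3_REGION"),
        "profile_name": first_set("AWS_PROFILE", "S3_PROFILE"),
    }
```

The dictionary keys were chosen to match boto3 keyword names (`profile_name` and `region_name` on `boto3.Session`, `endpoint_url` on `Session.client`), in case the values are passed to boto3. Whether the app itself uses boto3 is not stated here.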