---
title: Ad Classification Exploration
emoji: 📰
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 6.10.0
app_file: app.py
pinned: false
license: cc-by-sa-4.0
short_description: Explore yearly ad and non-ad distributions in Impresso
---

The app loads this aggregated JSON file from the repo by default:

`content-item-classification-base-multilingual_v1-0-0_aggregated_for_exploration.json`

You can override the source with the `DATA_SOURCE` environment variable.
Supported values are local paths, `http(s)` URLs, and `s3://` URLs.
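
The three supported kinds of `DATA_SOURCE` value can be told apart by URL scheme. The following is a minimal sketch of that dispatch, not the app's actual loader: the function names `classify_source` and `resolve_source` are hypothetical, and treating anything without an `http`, `https`, or `s3` scheme as a local path is an assumption.

```python
import os
from urllib.parse import urlparse

# Default file name taken from this README; how app.py locates it is not shown here.
DEFAULT_SOURCE = (
    "content-item-classification-base-multilingual_v1-0-0_aggregated_for_exploration.json"
)


def classify_source(source):
    """Return 'http', 's3', or 'local' for a DATA_SOURCE value (hypothetical helper)."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "s3":
        return "s3"
    # Assumption: everything else, including Windows drive letters, is a local path.
    return "local"


def resolve_source(env=None):
    """Pick DATA_SOURCE if set and non-empty, otherwise fall back to the default file."""
    env = os.environ if env is None else env
    source = env.get("DATA_SOURCE") or DEFAULT_SOURCE
    return source, classify_source(source)
```

For example, `DATA_SOURCE=s3://my-bucket/aggregated.json` (a made-up bucket name) would be classified as `s3`, while an unset variable falls back to the bundled file and is classified as `local`.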

Expected row shape:

```json
[
  {
    "provider_alias": "SFA",
    "provider_name": "Swiss Federal Archives",
    "newspaper_alias": "FedGazDe",
    "newspaper_title": "Bundesblatt",
    "year": 1851,
    "ad_count": 1,
    "non_ad_count": 18,
    "total_count": 19,
    "ad_share": 0.0526
  }
]
```
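
A custom data file can be checked against this shape before pointing the app at it. The sketch below is one possible check, not part of the app: `validate_row` is a hypothetical helper, and the two consistency rules (`total_count = ad_count + non_ad_count`, and `ad_share ≈ ad_count / total_count` rounded to four decimals) are assumptions inferred from the example row, where 1 / 19 ≈ 0.0526.

```python
# Field names come from the example row; the expected types are inferred from it.
REQUIRED_FIELDS = {
    "provider_alias": str,
    "provider_name": str,
    "newspaper_alias": str,
    "newspaper_title": str,
    "year": int,
    "ad_count": int,
    "non_ad_count": int,
    "total_count": int,
    "ad_share": (int, float),
}


def validate_row(row):
    """Return a list of human-readable problems; an empty list means the row looks fine."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"wrong type for {name}: {type(row[name]).__name__}")
    if problems:
        return problems

    # Assumed invariant: the total is the sum of the two class counts.
    if row["ad_count"] + row["non_ad_count"] != row["total_count"]:
        problems.append("total_count != ad_count + non_ad_count")

    # Assumed invariant: ad_share is ad_count / total_count, rounded to 4 decimals.
    if row["total_count"] > 0:
        expected_share = row["ad_count"] / row["total_count"]
        if abs(row["ad_share"] - expected_share) > 5e-4:
            problems.append("ad_share does not match ad_count / total_count")
    return problems
```

Running it over `json.load(...)` output and printing any non-empty results is usually enough to catch a renamed column or a miscomputed share.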

Optional S3 env vars when `DATA_SOURCE` uses `s3://`:

- `AWS_ENDPOINT_URL` or `S3_ENDPOINT_URL`
- `AWS_REGION` or `S3_REGION`
- `AWS_PROFILE` or `S3_PROFILE`
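
Each setting above accepts either of two variable names. A minimal sketch of resolving such pairs follows; `first_env` and `s3_settings` are hypothetical helpers, and preferring the `AWS_*` name when both are set is an assumption, since the list above does not state a precedence.

```python
import os


def first_env(names, env=None):
    """Return the first non-empty value among the given variable names, or None."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def s3_settings(env=None):
    """Collect the optional S3 settings; unset values are reported as None."""
    return {
        # Assumption: the AWS_* name wins when both variants are set.
        "endpoint_url": first_env(("AWS_ENDPOINT_URL", "S3_ENDPOINT_URL"), env),
        "region_name": first_env(("AWS_REGION", "S3_REGION"), env),
        "profile_name": first_env(("AWS_PROFILE", "S3_PROFILE"), env),
    }
```

The dictionary keys mirror the parameter names used by `boto3` (`profile_name` and `region_name` on `boto3.Session`, `endpoint_url` on `Session.client`), which would make the result easy to pass along if the loader is built on that library; whether the app actually uses `boto3` is not stated here.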