---
title: Ad Classification Exploration
emoji: 📰
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 6.10.0
app_file: app.py
pinned: false
license: cc-by-sa-4.0
short_description: Explore yearly ad and non-ad distributions in Impresso
---
The app loads this aggregated JSON file from the repo by default:
`content-item-classification-base-multilingual_v1-0-0_aggregated_for_exploration.json`
You can override the source with the `DATA_SOURCE` environment variable.
Supported values are local paths, `http(s)` URLs, and `s3://` URLs.
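The source selection described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names `resolve_data_source` and `classify_source` are hypothetical, and treating an empty `DATA_SOURCE` as unset is an assumption.

```python
import os
from urllib.parse import urlparse

# Default file name as given in this README.
DEFAULT_SOURCE = (
    "content-item-classification-base-multilingual_v1-0-0"
    "_aggregated_for_exploration.json"
)


def resolve_data_source(env=None):
    """Return DATA_SOURCE if set and non-empty, else the bundled default."""
    env = os.environ if env is None else env
    return env.get("DATA_SOURCE") or DEFAULT_SOURCE


def classify_source(source):
    """Classify a source string as 'http', 's3', or 'local' by URL scheme."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "s3":
        return "s3"
    # Anything else (no scheme, or an unrelated one) is treated as a local path.
    return "local"
```

For example, `classify_source("s3://bucket/rows.json")` yields `"s3"`, while a plain relative path falls through to `"local"`.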
The file is expected to contain a JSON array of rows with this shape:
```json
[
  {
    "provider_alias": "SFA",
    "provider_name": "Swiss Federal Archives",
    "newspaper_alias": "FedGazDe",
    "newspaper_title": "Bundesblatt",
    "year": 1851,
    "ad_count": 1,
    "non_ad_count": 18,
    "total_count": 19,
    "ad_share": 0.0526
  }
]
```
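A row with the shape above could be checked before loading with a small validator like the one below. This is only a sketch: `check_row` and `REQUIRED_FIELDS` are hypothetical names, and the two consistency rules (`total_count` equals `ad_count + non_ad_count`, and `ad_share` equals `ad_count / total_count` up to rounding) are assumptions inferred from the sample row rather than documented guarantees.

```python
# Field names come from the sample row; the expected types are assumptions.
REQUIRED_FIELDS = {
    "provider_alias": str,
    "provider_name": str,
    "newspaper_alias": str,
    "newspaper_title": str,
    "year": int,
    "ad_count": int,
    "non_ad_count": int,
    "total_count": int,
    "ad_share": (int, float),
}


def check_row(row, tolerance=1e-3):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"unexpected type for {name}")
    if problems:
        return problems

    # Assumed invariant: the total is the sum of ad and non-ad counts.
    if row["ad_count"] + row["non_ad_count"] != row["total_count"]:
        problems.append("total_count != ad_count + non_ad_count")

    # Assumed invariant: ad_share is the rounded ratio ad_count / total_count.
    if row["total_count"] > 0:
        ratio = row["ad_count"] / row["total_count"]
        if abs(ratio - row["ad_share"]) > tolerance:
            problems.append("ad_share inconsistent with counts")
    return problems
```

The tolerance allows for rounded shares such as `0.0526` for 1 ad out of 19 items.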
Optional S3 environment variables, read when `DATA_SOURCE` is an `s3://` URL:
- `AWS_ENDPOINT_URL` or `S3_ENDPOINT_URL`
- `AWS_REGION` or `S3_REGION`
- `AWS_PROFILE` or `S3_PROFILE`
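The paired variables above suggest a simple fallback lookup. The sketch below shows one way to resolve them; `first_set` and `s3_settings` are hypothetical helpers, and checking the `AWS_*` name before the `S3_*` name is an assumption, since the precedence is not stated here.

```python
import os


def first_set(env, *names):
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def s3_settings(env=None):
    """Collect optional S3 settings; unset values come back as None."""
    env = os.environ if env is None else env
    return {
        # Assumed precedence: AWS_* first, then S3_* as a fallback.
        "endpoint_url": first_set(env, "AWS_ENDPOINT_URL", "S3_ENDPOINT_URL"),
        "region_name": first_set(env, "AWS_REGION", "S3_REGION"),
        "profile_name": first_set(env, "AWS_PROFILE", "S3_PROFILE"),
    }
```

The dictionary keys are chosen to match the keyword names that an S3 client library such as boto3 accepts (`endpoint_url`, `region_name`, `profile_name`); whether the app itself uses boto3 is not stated in this README.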