metadata
title: README
emoji: 📊
colorFrom: blue
colorTo: green
sdk: static
pinned: false

kpubdata: Korean Public Data for Everyone

Making Korean government open data accessible worldwide with a single line of code.

from datasets import load_dataset

ds = load_dataset("kpubdata/seoul-apartment-trades")
df = ds["train"].to_pandas()

Mission

Korean public data (data.go.kr) is valuable but hard to access: it sits behind complex API authentication, comes back as XML, is documented only in Korean, and is not offered in standard formats like Parquet or HuggingFace Datasets.

We bridge the gap: raw public data, cleaned and published as HuggingFace Datasets. No feature engineering, no opinions. Just honest, well-documented government data ready to use.

Principles

  • Source fidelity: Original Korean text values preserved as-is. English column names for accessibility.
  • Schema honesty: What is declared in the config is exactly what you get. No phantom columns, no all-null surprises.
  • Global-first documentation: Dataset cards in English with Korean domain context explained for international users.
  • No feature engineering: We publish clean raw data. Users add derived features (geocoding, distances, etc.) themselves, just as they would with a Kaggle dataset.
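The "schema honesty" principle can be illustrated with a small check. This is a minimal sketch, not the actual kpubdata-builder validation: the function name `check_schema` and the column names are hypothetical, and rows are assumed to be plain dictionaries.

```python
from typing import Any


def check_schema(declared: list[str], rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable schema problems; an empty list means none found."""
    problems: list[str] = []

    # Collect every column name that actually appears in the data.
    seen: set[str] = set()
    for row in rows:
        seen.update(row.keys())

    # Phantom columns: declared in the config but never present in the data.
    for col in declared:
        if col not in seen:
            problems.append(f"phantom column: {col}")

    # Undeclared columns: present in the data but missing from the config.
    for col in sorted(seen - set(declared)):
        problems.append(f"undeclared column: {col}")

    # All-null columns: declared and present, but empty in every row.
    for col in declared:
        if col in seen and all(row.get(col) is None for row in rows):
            problems.append(f"all-null column: {col}")

    return problems


# Hypothetical column names; Korean text values are kept as-is.
declared = ["district", "deal_amount", "floor", "build_year"]
rows = [
    {"district": "강남구", "deal_amount": 150000, "floor": None, "extra": 1},
    {"district": "마포구", "deal_amount": 98000, "floor": None, "extra": 2},
]
problems = check_schema(declared, rows)
for problem in problems:
    print(problem)
```

A dataset would pass only when the returned list is empty, meaning the declared columns and the published columns match exactly.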

Available Datasets

| Dataset | Records | Period | Source | Description |
|---|---|---|---|---|
| seoul-apartment-trades | ~234k | 2020–2024 | MOLIT via data.go.kr | Apartment sale transactions in Seoul, all 25 districts |

More datasets are coming, including air quality, weather, and transit.

How It Works

[data.go.kr API] → [kpubdata SDK] → [kpubdata-builder pipeline] → [HuggingFace Dataset]
  1. kpubdata: Python SDK that handles API auth, pagination, and response parsing for Korean public data portals
  2. kpubdata-builder: Pipeline that fetches, transforms, validates, and publishes datasets to HuggingFace
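The stages above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real kpubdata or kpubdata-builder API: the XML tag names (`gu`, `price`), the column names, and every function here are hypothetical, and canned strings stand in for authenticated API responses. The publishing step is left out because it needs Hub credentials.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

# Canned XML pages standing in for API responses (hypothetical tag names).
PAGES = {
    1: "<response><items>"
       "<item><gu>강남구</gu><price>150,000</price></item>"
       "<item><gu>마포구</gu><price>98,000</price></item>"
       "</items></response>",
    2: "<response><items>"
       "<item><gu>송파구</gu><price>121,500</price></item>"
       "</items></response>",
}
EMPTY_PAGE = "<response><items/></response>"


def fetch_page(page_no: int) -> str:
    """Stand-in for an authenticated HTTP request to the portal."""
    return PAGES.get(page_no, EMPTY_PAGE)


def iter_items(fetch: Callable[[int], str]) -> Iterator[dict[str, str]]:
    """Walk pages until one comes back empty, yielding one dict per <item>."""
    page_no = 1
    while True:
        root = ET.fromstring(fetch(page_no))
        items = root.findall("./items/item")
        if not items:
            return
        for item in items:
            yield {child.tag: (child.text or "").strip() for child in item}
        page_no += 1


# Source tag -> English column name (both hypothetical).
RENAME = {"gu": "district", "price": "deal_amount"}


def transform(raw: dict[str, str]) -> dict[str, object]:
    """Rename columns and parse the amount; text values stay in Korean."""
    row: dict[str, object] = {RENAME[tag]: text for tag, text in raw.items() if tag in RENAME}
    row["deal_amount"] = int(str(row["deal_amount"]).replace(",", ""))
    return row


def validate(rows: list[dict[str, object]]) -> None:
    """Fail loudly if any row deviates from the declared columns."""
    expected = set(RENAME.values())
    for row in rows:
        if set(row) != expected:
            raise ValueError(f"unexpected columns: {sorted(row)}")


rows = [transform(raw) for raw in iter_items(fetch_page)]
validate(rows)
print(rows)
```

Passing the fetch function as an argument keeps the pagination loop testable without network access.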

Contributing

We welcome contributions! If there is a Korean public dataset you would like to see on HuggingFace:

  1. Check if the source API is available on data.go.kr
  2. Open an issue on kpubdata-builder
  3. Or submit a PR with a new dataset config (see publishing standards)

License

Datasets are published under licenses compatible with their original government data licenses. Most Korean public data uses 공공누리 (Korea Open Government License), mapped to CC-BY-4.0.

See individual dataset cards for specific licensing details.