Expand org README: datasets-first reframe + clearer contribution paths

by davanstrien HF Staff - opened 7 days ago

base: refs/heads/main ← from: refs/pr/3

README.md +13 -43

@@ -7,58 +7,28 @@ sdk: static
 pinned: false
 ---
-# 📚 BigLAM: Machine Learning for Libraries, Archives, and Museums
-**BigLAM** is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for **Libraries, Archives, and Museums (LAMs)**.
-We aim to:
-- 🗃️ Share machine-learning-ready datasets from LAMs via the [Hugging Face Hub](https://huggingface.co/biglam)
-- 🤖 Train and release open-source models for LAM-relevant tasks
-- 🛠️ Develop tools and approaches tailored to LAM use cases
----
-<details>
-<summary><strong>✨ Background</strong></summary>
-BigLAM began as a [datasets hackathon](https://github.com/bigscience-workshop/lam) within the [BigScience 🌸](https://bigscience.huggingface.co/) project, a large-scale, open NLP collaboration.
-Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.
-</details>
-<details>
-<summary><strong>📂 What You'll Find</strong></summary>
-The [BigLAM organization](https://huggingface.co/biglam) hosts:
-- **Datasets**: image, text, and tabular data from and about libraries, archives, and museums
-- **Models**: fine-tuned for tasks like:
-  - Art/historical image classification
-  - Document layout analysis and OCR
-  - Metadata quality assessment
-  - Named entity recognition in heritage texts
-- **Spaces**: tools for interactive exploration and demonstration
-</details>
-<details>
-<summary><strong>🧩 Get Involved</strong></summary>
-We welcome contributions! You can:
-- Use our [datasets and models](https://huggingface.co/biglam)
-- Join the discussion on [GitHub](https://github.com/bigscience-workshop/lam/discussions)
-- Contribute your own tools or data
-- Share your work using BigLAM resources
-</details>
-## 🌍 Why It Matters
-Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:
-- Supporting inclusive and responsible AI
-- Helping institutions experiment with ML for access, discovery, and preservation
-- Ensuring that ML systems reflect diverse human knowledge and expression
-- Developing tools and methods that work well with the unique formats, values, and needs of LAMs

+# 📚 BigLAM
+A community-run home for machine-learning-ready datasets from libraries, archives, and museums.
+Most cultural-heritage data wasn't originally prepared with ML workflows in mind: it lives in catalogue systems, IIIF endpoints, METS/MODS records, and idiosyncratic formats that vary from institution to institution. BigLAM is a place where those datasets get repackaged into formats ML practitioners can actually load and work with, contributed by the people who know the source material best.
+The org started as a [datasets hackathon](https://github.com/bigscience-workshop/lam) inside the [BigScience](https://bigscience.huggingface.co/) project in 2022 and has grown into a standing community for cultural-heritage ML.
+## What's here
+The org is datasets-first: 46+ image, text, and tabular collections from libraries, archives, and museums, prepared so they load cleanly with the `datasets` library. A handful of [models](https://huggingface.co/biglam?other=model) and [spaces](https://huggingface.co/biglam?other=space) live here too — mostly early experiments from the BigScience-era hackathon.
+For task-specific, deployable models built on top of these datasets, see the sibling org [small-models-for-glam](https://huggingface.co/small-models-for-glam).
+## Contributing a dataset
+If you've prepared a LAM dataset that other researchers might use, the best home is usually your **institution's own Hugging Face organisation** (e.g. [`NationalLibraryOfScotland`](https://huggingface.co/NationalLibraryOfScotland)). Institutional ownership signals authority over the data and makes long-term maintenance easier. Setting up a new org on the Hub is [free and quick](https://huggingface.co/organizations/new).
+If your institution isn't on the Hub yet, or you'd prefer to host the dataset here, [open a discussion](https://huggingface.co/spaces/biglam/README/discussions) and we'll help get it set up under BigLAM. Useful additions are typically datasets where the format conversion (METS/ALTO → parquet, IIIF manifest → loadable image splits, etc.) has already been done and the licensing is clear enough for open release.
+**Already have a dataset here that should sit under your institution's org?** Open a discussion or issue on the dataset repo — we're happy to transfer ownership.
+---
+60+ contributors over the years. Day-to-day maintenance is light-touch; for help with a contribution, open a discussion and someone will see it.
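
The new README describes useful contributions as datasets where "the format conversion (METS/ALTO → parquet, ...)" has already been done. As a rough illustration of what the first half of that conversion might involve, here is a minimal sketch that flattens an ALTO OCR file into one row per text line. The sample XML, the function names `local_name` and `alto_to_rows`, and the row fields are all hypothetical and do not come from the PR or from any BigLAM dataset; real ALTO files carry far more structure (coordinates, confidence scores, hyphenation) that a real conversion would need to decide how to handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Annual"/><SP/><String CONTENT="report"/>
          </TextLine>
          <TextLine ID="TL2">
            <String CONTENT="1897"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_rows(xml_text, source_id):
    """Flatten ALTO XML into a list of dicts, one per TextLine.

    Assumption: each word is a <String> child of <TextLine> whose text
    is stored in the CONTENT attribute; <SP> (space) elements are skipped
    and words are re-joined with single spaces.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        for line in page.iter():
            if local_name(line.tag) != "TextLine":
                continue
            words = [
                child.get("CONTENT", "")
                for child in line
                if local_name(child.tag) == "String"
            ]
            rows.append(
                {
                    "source_id": source_id,
                    "page_id": page.get("ID"),
                    "line_id": line.get("ID"),
                    "text": " ".join(words),
                }
            )
    return rows


rows = alto_to_rows(SAMPLE_ALTO, source_id="demo")
for row in rows:
    print(row)
```

A list of flat dicts like this is one possible intermediate form. Writing it out as Parquet would need an extra dependency (for example `pandas.DataFrame(rows).to_parquet(path)` with `pyarrow` installed), which is left out here to keep the sketch standard-library only.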