Expand org README: datasets-first reframe + clearer contribution paths

#3
by davanstrien HF Staff - opened
Files changed (1)
  1. README.md +13 -43
README.md CHANGED
@@ -7,58 +7,28 @@ sdk: static
  pinned: false
  ---

- # 📚 BigLAM: Machine Learning for Libraries, Archives, and Museums

- **BigLAM** is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for **Libraries, Archives, and Museums (LAMs)**.

- We aim to:

- - 🗃️ Share machine-learning-ready datasets from LAMs via the [Hugging Face Hub](https://huggingface.co/biglam)
- - 🤖 Train and release open-source models for LAM-relevant tasks
- - 🛠️ Develop tools and approaches tailored to LAM use cases
 
- ---
-
- <details>
- <summary><strong>✨ Background</strong></summary>
-
- BigLAM began as a [datasets hackathon](https://github.com/bigscience-workshop/lam) within the [BigScience 🌸](https://bigscience.huggingface.co/) project, a large-scale, open NLP collaboration.
-
- Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.
- </details>
 
 
- <details>
- <summary><strong>📂 What You'll Find</strong></summary>

- The [BigLAM organization](https://huggingface.co/biglam) hosts:

- - **Datasets**: image, text, and tabular data from and about libraries, archives, and museums
- - **Models**: fine-tuned for tasks like:
- - Art/historical image classification
- - Document layout analysis and OCR
- - Metadata quality assessment
- - Named entity recognition in heritage texts
- - **Spaces**: tools for interactive exploration and demonstration
- </details>

- <details>
- <summary><strong>🧩 Get Involved</strong></summary>

- We welcome contributions! You can:

- - Use our [datasets and models](https://huggingface.co/biglam)
- - Join the discussion on [GitHub](https://github.com/bigscience-workshop/lam/discussions)
- - Contribute your own tools or data
- - Share your work using BigLAM resources
- </details>
-
- ## 🌍 Why It Matters
-
- Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:
-
- - Supporting inclusive and responsible AI
- - Helping institutions experiment with ML for access, discovery, and preservation
- - Ensuring that ML systems reflect diverse human knowledge and expression
- - Developing tools and methods that work well with the unique formats, values, and needs of LAMs
64
 
 
 
  pinned: false
  ---

+ # 📚 BigLAM

+ A community-run home for machine-learning-ready datasets from libraries, archives, and museums.

+ Most cultural-heritage data wasn't originally prepared with ML workflows in mind: it lives in catalogue systems, IIIF endpoints, METS/MODS records, and a long tail of institution-specific formats. BigLAM is a place where those datasets are repackaged into formats ML practitioners can load and work with directly, contributed by the people who know the source material best.

+ The org started as a [datasets hackathon](https://github.com/bigscience-workshop/lam) inside the [BigScience](https://bigscience.huggingface.co/) project in 2022 and has grown into a standing community for cultural-heritage ML.
 
 
+ ## What's here

+ The org is datasets-first: 46+ image, text, and tabular collections from libraries, archives, and museums, prepared so they load cleanly with the `datasets` library. A handful of [models](https://huggingface.co/biglam?other=model) and [spaces](https://huggingface.co/biglam?other=space) live here too, mostly early experiments from the BigScience-era hackathon.

+ For task-specific, deployable models built on top of these datasets, see the sibling org [small-models-for-glam](https://huggingface.co/small-models-for-glam).
 
+ ## Contributing a dataset

+ If you've prepared a LAM dataset that other researchers might use, the best home is usually your **institution's own Hugging Face organisation** (e.g. [`NationalLibraryOfScotland`](https://huggingface.co/NationalLibraryOfScotland)). Institutional ownership signals authority over the data and makes long-term maintenance easier. Setting up a new org on the Hub is [free and quick](https://huggingface.co/organizations/new).

+ If your institution isn't on the Hub yet, or you'd prefer to host the dataset here, [open a discussion](https://huggingface.co/spaces/biglam/README/discussions) and we'll help get it set up under BigLAM. Useful additions are typically datasets where the format conversion (METS/ALTO → parquet, IIIF manifest → loadable image splits, etc.) has already been done and the licensing is clear enough for open release.
 
+ **Already have a dataset here that should sit under your institution's org?** Open a discussion or issue on the dataset repo; we're happy to transfer ownership.

+ ---

+ 60+ contributors over the years. Day-to-day maintenance is light-touch; for help with a contribution, open a discussion and someone will see it.
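The contributing section mentions format conversion such as METS/ALTO → parquet. As a rough sketch of the first half of that step, the Python below flattens one ALTO XML page into one row per text line using only the standard library. It is not BigLAM's own tooling: the function names `local_name` and `alto_to_rows` and the row layout (`page_id`, `line_id`, `text`) are hypothetical, and real ALTO files also carry coordinates and hyphenation (`HYP`) elements that this sketch ignores.

```python
# Minimal sketch (not BigLAM's own tooling): flatten ALTO OCR XML into
# one row per TextLine. Function names and row keys are hypothetical.
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # ALTO v2, v3 and v4 use different namespace URIs (and some files use
    # none), so match on the element name without its "{namespace}" prefix.
    return tag.rsplit("}", 1)[-1]


def alto_to_rows(xml_text: str):
    """Return a list of dicts with page_id, line_id and text for each line."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        for line in page.iter():
            if local_name(line.tag) != "TextLine":
                continue
            # Each String child holds one word in its CONTENT attribute;
            # SP (space) and HYP (hyphen) children are skipped here.
            words = [
                child.get("CONTENT", "")
                for child in line
                if local_name(child.tag) == "String"
            ]
            rows.append(
                {
                    "page_id": page.get("ID"),
                    "line_id": line.get("ID"),
                    "text": " ".join(words),
                }
            )
    return rows


# Tiny hand-written ALTO v4 fragment for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="B1">
    <TextLine ID="L1"><String CONTENT="Hello"/><SP/><String CONTENT="archive"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_rows(SAMPLE))
    # [{'page_id': 'P1', 'line_id': 'L1', 'text': 'Hello archive'}]
```

Matching on local element names is a deliberate choice so the same function accepts ALTO files regardless of schema version. From here, the rows could be written to parquet with a library such as pyarrow (for example `pyarrow.Table.from_pylist` followed by `pyarrow.parquet.write_table`).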