---
title: README
emoji: 📚
colorFrom: pink
colorTo: gray
sdk: static
pinned: false
---

# 📚 BigLAM: Machine Learning for Libraries, Archives, and Museums

**BigLAM** is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for **Libraries, Archives, and Museums (LAMs)**.

We aim to:

- 🗃️ Share machine-learning-ready datasets from LAMs via the [Hugging Face Hub](https://huggingface.co/biglam)
- 🤖 Train and release open-source models for LAM-relevant tasks
- 🛠️ Develop tools and approaches tailored to LAM use cases

---

<details>
<summary><strong>✨ Background</strong></summary>

BigLAM began as a [datasets hackathon](https://github.com/bigscience-workshop/lam) within the [BigScience 🌸](https://bigscience.huggingface.co/) project, a large-scale, open NLP collaboration.  

Our goal is to make LAM datasets easier to discover and use, in support of researchers, institutions, and ML practitioners working with cultural heritage data.
</details>


<details>
<summary><strong>📂 What You'll Find</strong></summary>

The [BigLAM organization](https://huggingface.co/biglam) hosts:

- **Datasets**: image, text, and tabular data from and about libraries, archives, and museums
- **Models**: fine-tuned for tasks like:
  - Art/historical image classification
  - Document layout analysis and OCR
  - Metadata quality assessment
  - Named entity recognition in heritage texts
- **Spaces**: tools for interactive exploration and demonstration
</details>

<details>
<summary><strong>🧩 Get Involved</strong></summary>

We welcome contributions! You can:

- Use our [datasets and models](https://huggingface.co/biglam)
- Join the discussion on [GitHub](https://github.com/bigscience-workshop/lam/discussions)
- Contribute your own tools or data
- Share your work using BigLAM resources
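
As a starting point for using the datasets, the sketch below lists what the organization currently publishes on the Hub. It assumes the `huggingface_hub` package is installed and relies on its `HfApi.list_datasets(author=...)` call; the helper names `org_dataset_ids` and `print_org_datasets` are hypothetical and exist only for this illustration.

```python
from typing import Any, List

ORG = "biglam"  # organization name, taken from https://huggingface.co/biglam


def org_dataset_ids(api: Any, org: str = ORG) -> List[str]:
    """Return the sorted ids of all datasets published by `org`.

    `api` is expected to behave like `huggingface_hub.HfApi`: its
    `list_datasets(author=...)` method yields objects with an `id` attribute.
    """
    return sorted(info.id for info in api.list_datasets(author=org))


def print_org_datasets() -> None:
    """Print every dataset id in the organization.

    Needs network access and `pip install huggingface_hub`.
    """
    # Imported lazily so org_dataset_ids() can be used and tested offline.
    from huggingface_hub import HfApi

    for dataset_id in org_dataset_ids(HfApi()):
        print(dataset_id)
```

Calling `print_org_datasets()` prints one repository id per line. Any of those ids can then typically be passed to `datasets.load_dataset`, although individual datasets may need extra arguments, so check each dataset card first.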
</details>

## 🌍 Why It Matters

Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:

- Supporting inclusive and responsible AI
- Helping institutions experiment with ML for access, discovery, and preservation
- Ensuring that ML systems reflect diverse human knowledge and expression
- Developing tools and methods that work well with the unique formats, values, and needs of LAMs