---
title: README
emoji: 👀
colorFrom: yellow
colorTo: red
sdk: static
pinned: false
---

# Hugging Face Data for Research

A resource hub for researchers studying AI ecosystem development and adoption using data from the Hugging Face platform.

## Why the Hub matters for research

The Hugging Face Hub offers a rich source of data for understanding how the AI ecosystem evolves. Information about **models**, **datasets**, **Spaces**, **papers**, and community activity is publicly accessible, which makes it possible to analyze trends in model development, dataset usage, research directions, and adoption patterns over time.

## How to access the data

**Pre-compiled datasets** provide the simplest entry point. We recommend starting with community-maintained snapshots such as:
- **[cfahlgren1/hub-stats](https://huggingface.co/datasets/cfahlgren1/hub-stats)** — Daily snapshots of models, datasets, Spaces, papers, and related metadata in Parquet format, suitable for large-scale analysis.
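
As a starting point, the sketch below lists the Parquet files in the snapshot repository and loads one of them into pandas. It assumes `huggingface_hub`, `pandas`, and a Parquet engine such as `pyarrow` are installed. File names are discovered at run time rather than hard-coded, since the repository layout may change. The helper names `parquet_files`, `load_snapshot_table`, and `main` are illustrative and not part of any library.

```python
from typing import Iterable, List

REPO_ID = "cfahlgren1/hub-stats"


def parquet_files(filenames: Iterable[str]) -> List[str]:
    """Return the Parquet files from a repository file listing, sorted by name."""
    return sorted(name for name in filenames if name.endswith(".parquet"))


def load_snapshot_table(filename: str, repo_id: str = REPO_ID):
    """Download one Parquet file from the dataset repo and load it as a DataFrame."""
    # Imported lazily so the pure helper above works without these packages.
    import pandas as pd
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
    return pd.read_parquet(local_path)


def main() -> None:
    """List the available Parquet files and inspect the first one (needs network)."""
    from huggingface_hub import list_repo_files

    files = parquet_files(list_repo_files(REPO_ID, repo_type="dataset"))
    print(files)
    if files:
        df = load_snapshot_table(files[0])
        print(df.shape)
        print(df.columns.tolist())
```

Calling `main()` prints the available files and the shape and column names of the first one, which is a reasonable way to check the schema before committing to an analysis.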

For custom views or real-time access, use the **[Hub API](https://huggingface.co/docs/hub/api)**.
The API supports programmatic access to repository metadata, search, and more.
See the [OpenAPI specification](https://huggingface.co/.well-known/openapi.json) and [documentation](https://huggingface.co/docs/hub/api) for details.
Python users can rely on the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) client.
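
For a sense of what a raw API call looks like, here is a minimal sketch that queries the model listing endpoint using only the Python standard library. The query parameters `sort`, `direction`, and `limit` are assumptions based on the Hub API documentation at the time of writing, so check them against the OpenAPI specification before relying on them. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://huggingface.co/api"


def models_url(sort: str = "downloads", limit: int = 5) -> str:
    """Build a /api/models query URL that requests descending order."""
    query = urlencode({"sort": sort, "direction": -1, "limit": limit})
    return f"{API_ROOT}/models?{query}"


def top_models(limit: int = 5) -> list:
    """Fetch model metadata as parsed JSON (needs network access)."""
    with urlopen(models_url(limit=limit), timeout=30) as response:
        return json.load(response)
```

Calling `top_models()` should return a list of JSON objects, one per model. The `huggingface_hub` client exposes a comparable `HfApi.list_models` method that handles pagination and authentication for you.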

## Understanding data limitations


Metrics such as **download counts** are [useful but imperfect](https://huggingface.co/spaces/HF-Data-for-Research/README/discussions/1#69b9cbb18ba1f99b81c7e3f7).
They reflect a complex process shaped by infrastructure, caching, and even repository type; the [download stats documentation](https://huggingface.co/docs/hub/models-download-stats) describes how they are counted. They work well for:
- Global and temporal trends
- Relative comparisons across resources
- Longitudinal studies of adoption

They are less reliable for fine-grained rankings or absolute comparisons.
When designing studies, consider what the data actually measures and how infrastructure and noise may affect your conclusions.
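
One possible way to lean on relative rather than absolute comparisons, offered as an illustration and not as a method prescribed here, is to convert raw counts into each repository's share of the total within a snapshot and then compare shares over time. The repository names and counts below are made up for the example, and `download_shares` is a hypothetical helper.

```python
from typing import Dict


def download_shares(downloads: Dict[str, int]) -> Dict[str, float]:
    """Convert raw download counts into each repository's share of the total."""
    total = sum(downloads.values())
    if total == 0:
        return {repo: 0.0 for repo in downloads}
    return {repo: count / total for repo, count in downloads.items()}


# Made-up counts for two hypothetical repositories at two snapshot dates.
snapshot_early = {"org/model-a": 800, "org/model-b": 200}
snapshot_late = {"org/model-a": 900, "org/model-b": 900}

shares_early = download_shares(snapshot_early)
shares_late = download_shares(snapshot_late)
for repo in snapshot_early:
    print(f"{repo}: {shares_early[repo]:.2f} -> {shares_late[repo]:.2f}")
# prints:
# org/model-a: 0.80 -> 0.50
# org/model-b: 0.20 -> 0.50
```

In this toy case the raw count for `org/model-a` rises while its share falls, which is the kind of pattern a purely absolute reading would miss. Shares do not remove noise from caching or infrastructure, so they are a complement to, not a substitute for, reading the download stats documentation.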


## Explore and connect

- **Collections** — Curated datasets and research using Hub data
- **Discussions** — Questions, feedback, and collaboration: [Join the conversation](https://huggingface.co/spaces/HF-Data-for-Research/README/discussions)
  
We welcome researchers interested in responsible use of Hub data for ecosystem studies.