---
title: README
emoji: 👀
colorFrom: yellow
colorTo: red
sdk: static
pinned: false
---

# Hugging Face Data for Research

A resource hub for researchers studying AI ecosystem development and adoption using data from the Hugging Face platform.

## Why the Hub matters for research

The Hugging Face Hub offers a rich source of data for understanding how the AI ecosystem evolves. Information about **models**, **datasets**, **Spaces**, **papers**, and community activity is publicly accessible, making it possible to analyze trends in model development, dataset usage, research directions, and adoption patterns over time.

## How to access the data

**Pre-compiled datasets** provide the simplest entry point. We recommend starting with community-maintained snapshots such as:

- **[cfahlgren1/hub-stats](https://huggingface.co/datasets/cfahlgren1/hub-stats)**: Daily snapshots of models, datasets, Spaces, papers, and related metadata in Parquet format, suitable for large-scale analysis.

For custom views or real-time access, use the **[Hub API](https://huggingface.co/docs/hub/api)**. The API supports programmatic access to repository metadata, search, and more. See the [OpenAPI specification](https://huggingface.co/.well-known/openapi.json) and [documentation](https://huggingface.co/docs/hub/api) for details. Python users can rely on the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) client.

## Understanding data limitations

Metrics such as **download counts** are [useful but imperfect](https://huggingface.co/spaces/HF-Data-for-Research/README/discussions/1#69b9cbb18ba1f99b81c7e3f7). They reflect a complex process shaped by infrastructure, caching, and even repository type; see the [download statistics documentation](https://huggingface.co/docs/hub/models-download-stats) for how they are computed. They work well for:

- Global and temporal trends
- Relative comparisons across resources
- Longitudinal studies of adoption

They are less reliable for fine-grained rankings or absolute comparisons.
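
As a rough sketch of how this guidance might be applied, the Python below pulls download counts through the `huggingface_hub` client (`HfApi.list_models`) and then compares models by order-of-magnitude tier rather than by exact rank. The tiering rule, the helper names (`fetch_download_counts`, `magnitude_bucket`, `group_by_magnitude`), and the example repository ids and counts are illustrative assumptions, not a method prescribed by the Hub documentation.

```python
from collections import defaultdict


def fetch_download_counts(limit=50):
    """Return {repo_id: downloads} for up to `limit` models on the Hub.

    Needs network access and `pip install huggingface_hub`. The import is
    kept inside the function so the rest of this sketch runs without it.
    """
    from huggingface_hub import HfApi

    api = HfApi()
    # sort="downloads" asks the Hub to order results by download count.
    models = api.list_models(sort="downloads", limit=limit)
    return {m.id: (m.downloads or 0) for m in models}


def magnitude_bucket(downloads):
    """Order of magnitude of a non-negative count: 7 -> 0, 1500 -> 3."""
    return len(str(int(downloads))) - 1


def group_by_magnitude(counts):
    """Group repo ids into order-of-magnitude tiers, highest tier first."""
    tiers = defaultdict(list)
    for repo_id, downloads in counts.items():
        tiers[magnitude_bucket(downloads)].append(repo_id)
    return {tier: sorted(ids) for tier, ids in sorted(tiers.items(), reverse=True)}


# Hypothetical counts, used here so the example runs offline.
# Replace with fetch_download_counts() to use live Hub data.
example_counts = {
    "org-a/model-x": 1_250_000,
    "org-b/model-y": 1_190_000,
    "org-c/model-z": 48_000,
    "org-d/model-w": 310,
}

tiers = group_by_magnitude(example_counts)
for tier, ids in tiers.items():
    print(f"~10^{tier} downloads: {ids}")
```

In this toy data the first two models differ by about 5% in raw counts, a gap that caching or infrastructure effects could plausibly account for, so the sketch places them in the same tier instead of ranking one above the other.
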
When designing studies, consider what the data actually measures and how infrastructure and noise may affect your conclusions.

## Explore and connect

- **Collections**: Curated datasets and research using Hub data
- **Discussions**: Questions, feedback, and collaboration: [Join the conversation](https://huggingface.co/spaces/HF-Data-for-Research/README/discussions)

We welcome researchers interested in responsible use of Hub data for ecosystem studies.