---
title: HF Model Ecosystem Visualizer
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face
**Authors**: Benjamin Laufer, Hamidah Oderinwale, Jon Kleinberg
**Research Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811)
## About This Tool
This interactive tool plots ~1.86M models from the Hugging Face ecosystem in a 3D embedding space, where similar models appear closer together. It uses **chunked embeddings** for fast startup and efficient memory usage.
## Features
- **Fast Startup**: 2-5 seconds (uses chunked embeddings)
- **Low Memory**: ~100MB idle (vs 2.8GB without chunking)
- **Scalable**: Handles millions of models efficiently
- **Interactive**: Filter, search, and explore model relationships
- **Family Trees**: Visualize parent-child relationships between models
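
A family tree can be represented as a parent-to-children map. The sketch below is illustrative only: it assumes each model arrives as a `(model_id, parent_id)` pair, which is a hypothetical shape rather than the dataset's confirmed schema, and the function names are made up for this example.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Tuple


def build_children(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Build a parent -> children map from (model_id, parent_id or None) pairs."""
    children: Dict[str, List[str]] = defaultdict(list)
    for model_id, parent in records:
        if parent is not None:
            children[parent].append(model_id)
    return dict(children)


def descendants(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Return every model derived from `root`, in breadth-first order."""
    seen = {root}
    queue = deque([root])
    found: List[str] = []
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against cycles in noisy metadata
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found
```

Calling `descendants("some-base-model", build_children(pairs))` would list fine-tunes, then fine-tunes of fine-tunes, and so on.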
## How It Works
The system uses:
1. **Chunked Embeddings**: Pre-computed embeddings stored in chunks (50k models per chunk)
2. **On-Demand Loading**: Only loads embeddings for filtered models
3. **Pre-computed Coordinates**: UMAP coordinates stored with model metadata
4. **Fast API**: FastAPI backend with efficient data loading
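
The chunk lookup behind items 1 and 2 can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each model has an integer row index, that chunk `k` holds rows `k * 50_000` through `(k + 1) * 50_000 - 1`, and that `load_chunk` is a hypothetical callable returning one chunk's embeddings.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

CHUNK_SIZE = 50_000  # models per chunk, as stated above


def group_by_chunk(row_indices: Sequence[int], chunk_size: int = CHUNK_SIZE) -> Dict[int, List[int]]:
    """Map chunk id -> offsets within that chunk for the requested rows."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for row in row_indices:
        chunk_id, offset = divmod(row, chunk_size)
        groups[chunk_id].append(offset)
    return dict(groups)


def load_embeddings(
    row_indices: Sequence[int],
    load_chunk: Callable[[int], Sequence],
    chunk_size: int = CHUNK_SIZE,
) -> dict:
    """Load embeddings only from chunks that contain a requested row.

    `load_chunk(chunk_id)` is a hypothetical loader (for example, one Parquet
    file per chunk); it must return a sequence indexable by offset.
    """
    result = {}
    for chunk_id, offsets in group_by_chunk(row_indices, chunk_size).items():
        chunk = load_chunk(chunk_id)  # one read per touched chunk
        for offset in offsets:
            result[chunk_id * chunk_size + offset] = chunk[offset]
    return result
```

With this layout, a filter that matches rows in only two chunks triggers two reads, regardless of how many chunks exist in total.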
## Data Source
- **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem)
- **Pre-computed Data**: Automatically downloaded from `modelbiome/hf-viz-precomputed` on startup
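
A startup download like the one described could look like the sketch below. `snapshot_download` is the `huggingface_hub` function for fetching a whole repository. The `repo_type="dataset"` default and the `precomputed_data` target directory are assumptions, since this README does not say how the repository is hosted or where the files are stored.

```python
from pathlib import Path

PRECOMPUTED_REPO = "modelbiome/hf-viz-precomputed"  # repo id named above


def fetch_precomputed(local_dir: str = "precomputed_data", repo_type: str = "dataset") -> Path:
    """Download the pre-computed files and return the local directory.

    Assumptions: the repo is a dataset repo and `local_dir` is a path of our
    choosing; neither is confirmed by this README.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=PRECOMPUTED_REPO,
        repo_type=repo_type,
        local_dir=local_dir,
    )
    return Path(path)
```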
## Deployment
This Space automatically:
1. Downloads pre-computed chunked data from Hugging Face Hub
2. Starts the FastAPI backend
3. Serves the React frontend
4. Uses chunked loading for efficient memory usage
## Performance
- **Startup**: 2-5 seconds
- **Memory**: ~100MB idle, ~200-500MB active
- **API Response**: <1s for filtered queries
- **Scales To**: Millions of models, since memory use depends on the filtered subset rather than the full dataset
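
The memory figures quoted in this README are consistent with a back-of-the-envelope estimate. The sketch below takes the ~1.86M model count and 50k chunk size from this README and the 384-dimensional output of all-MiniLM-L6-v2; storing values as `float32` (4 bytes each) is an assumption.

```python
import math

NUM_MODELS = 1_860_000   # ~1.86M models (from this README)
CHUNK_SIZE = 50_000      # models per chunk (from this README)
EMBED_DIM = 384          # all-MiniLM-L6-v2 output dimension
BYTES_PER_VALUE = 4      # assumption: float32 storage


def embedding_bytes(num_rows: int) -> int:
    """Raw size of an embedding matrix with `num_rows` rows."""
    return num_rows * EMBED_DIM * BYTES_PER_VALUE


full_gb = embedding_bytes(NUM_MODELS) / 1e9
chunk_mb = embedding_bytes(CHUNK_SIZE) / 1e6
num_chunks = math.ceil(NUM_MODELS / CHUNK_SIZE)

print(f"all embeddings: {full_gb:.2f} GB")
print(f"one chunk:      {chunk_mb:.1f} MB")
print(f"chunk count:    {num_chunks}")
```

Under these assumptions the full matrix is about 2.86 GB, close to the 2.8GB quoted for the unchunked case, while a single chunk is about 76.8 MB and the dataset spans 38 chunks.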
## Usage
1. **Filter Models**: Use the sidebar to filter by downloads, likes, or search query
2. **Explore**: Zoom and pan to explore the embedding space
3. **Search**: Search for specific models or tags
4. **View Details**: Click on models to see detailed information
## Technical Details
- **Backend**: FastAPI (Python)
- **Frontend**: React + TypeScript
- **Embeddings**: SentenceTransformer (all-MiniLM-L6-v2)
- **Visualization**: UMAP projection of the embeddings to 3D coordinates
- **Storage**: Parquet files with chunked embeddings
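
How the embedding and projection steps might fit together offline is sketched below. It is an illustration only: this README does not say which text is embedded or which UMAP settings were used, so `model_to_text`, the record fields, and `random_state=42` are hypothetical. The library calls (`SentenceTransformer.encode` and `umap.UMAP.fit_transform`) come from `sentence-transformers` and `umap-learn`.

```python
from typing import Dict, List


def model_to_text(record: Dict) -> str:
    """Hypothetical: build the string to embed from a model's metadata."""
    tags = ", ".join(record.get("tags", []))
    return f"{record['model_id']} | {tags}"


def embed_and_project(records: List[Dict]):
    """Return (embeddings, coords_3d) as NumPy arrays.

    UMAP needs more rows than its n_neighbors setting (default 15) to give a
    meaningful layout, so pass a reasonably large list of records.
    """
    # Heavy dependencies are imported lazily; install sentence-transformers
    # and umap-learn to run this function.
    from sentence_transformers import SentenceTransformer
    import umap

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode([model_to_text(r) for r in records])  # shape (n, 384)
    reducer = umap.UMAP(n_components=3, random_state=42)  # settings are assumptions
    coords_3d = reducer.fit_transform(embeddings)  # shape (n, 3)
    return embeddings, coords_3d
```

The resulting coordinates could then be written to Parquet alongside the model metadata, while the embeddings are split into chunks.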
## Resources
- **GitHub**: [bendlaufer/ai-ecosystem](https://github.com/bendlaufer/ai-ecosystem)
- **Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811)
- **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem)