---
title: HF Model Ecosystem Visualizer
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face

**Authors:** Benjamin Laufer, Hamidah Oderinwale, Jon Kleinberg

**Research Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811)

## About This Tool

This interactive visualization places ~1.86M models from the Hugging Face ecosystem in a 3D embedding space, where similar models appear closer together. The tool uses **chunked embeddings** for fast startup and low memory usage.

## Features

- **Fast Startup**: 2-5 seconds (uses chunked embeddings)
- **Low Memory**: ~100MB idle (vs 2.8GB without chunking)
- **Scalable**: Handles millions of models by loading embeddings only for the models a query needs
- **Interactive**: Filter, search, and explore model relationships
- **Family Trees**: Visualize parent-child relationships between models

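As a rough sanity check on the memory figures, the sketch below estimates the size of the full embedding matrix and of a single chunk. It assumes 384-dimensional vectors (the output size of all-MiniLM-L6-v2) stored as float32; the dtype is an assumption and is not stated in this README.

```python
import math

NUM_MODELS = 1_860_000   # ~1.86M models (from this README)
EMBEDDING_DIM = 384      # output size of all-MiniLM-L6-v2
BYTES_PER_VALUE = 4      # assumption: embeddings stored as float32
CHUNK_SIZE = 50_000      # models per chunk (from "How It Works")


def embedding_bytes(num_models: int) -> int:
    """Raw size in bytes of `num_models` embedding vectors."""
    return num_models * EMBEDDING_DIM * BYTES_PER_VALUE


full_gb = embedding_bytes(NUM_MODELS) / 1e9
chunk_mb = embedding_bytes(CHUNK_SIZE) / 1e6
num_chunks = math.ceil(NUM_MODELS / CHUNK_SIZE)

print(f"all embeddings: {full_gb:.2f} GB")   # all embeddings: 2.86 GB
print(f"one chunk:      {chunk_mb:.1f} MB")  # one chunk:      76.8 MB
print(f"chunks:         {num_chunks}")       # chunks:         38
```

Under these assumptions the full matrix comes to about 2.86 GB, in line with the ~2.8GB quoted above, while a single chunk is under 80 MB.
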
## How It Works

The system uses:
1. **Chunked Embeddings**: Pre-computed embeddings stored in chunks (50k models per chunk)
2. **On-Demand Loading**: Embeddings are loaded only for the models that pass the current filters
3. **Pre-computed Coordinates**: UMAP coordinates stored with model metadata
4. **Fast API**: FastAPI backend with efficient data loading

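The chunked, on-demand loading described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each model has a contiguous integer row index and that chunk `i` holds rows `i * chunk_size` through `(i + 1) * chunk_size - 1`. The `load_chunk` callable is a hypothetical stand-in for whatever reads one chunk from disk.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

CHUNK_SIZE = 50_000  # models per chunk, as stated in this README

Vector = List[float]


def group_by_chunk(row_ids: Sequence[int], chunk_size: int = CHUNK_SIZE) -> Dict[int, List[int]]:
    """Map chunk id -> offsets inside that chunk for the requested rows."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for row_id in row_ids:
        groups[row_id // chunk_size].append(row_id % chunk_size)
    return dict(groups)


def load_embeddings(
    row_ids: Sequence[int],
    load_chunk: Callable[[int], List[Vector]],  # hypothetical chunk reader
    chunk_size: int = CHUNK_SIZE,
) -> Dict[int, Vector]:
    """Fetch embeddings for `row_ids`, reading each needed chunk exactly once."""
    result: Dict[int, Vector] = {}
    for chunk_id, offsets in group_by_chunk(row_ids, chunk_size).items():
        chunk = load_chunk(chunk_id)
        for offset in offsets:
            result[chunk_id * chunk_size + offset] = chunk[offset]
    return result


# Toy demonstration: tiny chunks of 4 rows and an in-memory fake loader.
loaded_chunks: List[int] = []


def fake_load_chunk(chunk_id: int) -> List[Vector]:
    loaded_chunks.append(chunk_id)
    return [[float(chunk_id), float(i)] for i in range(4)]


embeddings = load_embeddings([1, 2, 9], fake_load_chunk, chunk_size=4)
print(sorted(loaded_chunks))  # [0, 2]  (chunk 1 is never read)
print(embeddings[9])          # [2.0, 1.0]
```

Grouping the requested rows by chunk first means a filter that touches only a few chunks never pays for the rest, which is the property the memory figures above depend on.
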
## Data Source

- **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem)
- **Pre-computed Data**: Automatically downloaded from `modelbiome/hf-viz-precomputed` on startup

## Deployment

This Space automatically:
1. Downloads pre-computed chunked data from Hugging Face Hub
2. Starts the FastAPI backend
3. Serves the React frontend
4. Uses chunked loading to keep memory usage low

## Performance

- **Startup**: 2-5 seconds
- **Memory**: ~100MB idle, ~200-500MB active
- **API Response**: <1s for filtered queries
- **Scales To**: Millions of models, since embeddings are loaded only for the filtered subset rather than the full dataset

## Usage

1. **Filter Models**: Use the sidebar to filter by downloads, likes, or search query
2. **Explore**: Zoom and pan to explore the embedding space
3. **Search**: Search for specific models or tags
4. **View Details**: Click on models to see detailed information

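To make the filter semantics concrete, here is a minimal sketch of combining download and like thresholds with a text search over model names and tags. The `ModelRecord` fields, the `filter_models` function, and the example model IDs are all hypothetical and chosen for illustration; the actual backend schema and matching rules may differ.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ModelRecord:
    # Hypothetical record; field names are illustrative, not the real schema.
    model_id: str
    downloads: int
    likes: int
    tags: List[str] = field(default_factory=list)


def filter_models(
    models: Iterable[ModelRecord],
    min_downloads: int = 0,
    min_likes: int = 0,
    query: Optional[str] = None,
) -> List[ModelRecord]:
    """Keep models that meet both thresholds and, if given, match the query."""
    needle = query.lower() if query else None
    kept: List[ModelRecord] = []
    for m in models:
        if m.downloads < min_downloads or m.likes < min_likes:
            continue
        if needle is not None:
            # Case-insensitive substring match on the model id or any tag.
            haystack = [m.model_id.lower()] + [t.lower() for t in m.tags]
            if not any(needle in text for text in haystack):
                continue
        kept.append(m)
    return kept


# Invented example records.
models = [
    ModelRecord("org-a/bert-small", downloads=1200, likes=15, tags=["fill-mask"]),
    ModelRecord("org-b/tiny-gpt", downloads=90, likes=3, tags=["text-generation"]),
    ModelRecord("org-c/bert-large-ft", downloads=5000, likes=40, tags=["text-classification"]),
]

hits = filter_models(models, min_downloads=1000, query="bert")
print([m.model_id for m in hits])  # ['org-a/bert-small', 'org-c/bert-large-ft']
```
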
## Technical Details

- **Backend**: FastAPI (Python)
- **Frontend**: React + TypeScript
- **Embeddings**: SentenceTransformer (all-MiniLM-L6-v2)
- **Visualization**: UMAP (3D coordinates)
- **Storage**: Parquet files with chunked embeddings

## Resources

- **GitHub**: [bendlaufer/ai-ecosystem](https://github.com/bendlaufer/ai-ecosystem)
- **Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811)
- **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem)