metadata

title: HF Model Ecosystem Visualizer
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860

Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face

Authors: Benjamin Laufer, Hamidah Oderinwale, Jon Kleinberg

Research Paper: arXiv:2508.06811

About This Tool

This interactive visualization places ~1.86M models from the Hugging Face ecosystem in a 3D embedding space where similar models appear closer together. Embeddings are stored in chunks and loaded on demand, which keeps startup fast and memory usage low.

Features

Fast Startup: 2-5 seconds (uses chunked embeddings)
Low Memory: ~100MB idle (vs 2.8GB without chunking)
Scalable: Handles millions of models efficiently
Interactive: Filter, search, and explore model relationships
Family Trees: Visualize parent-child relationships between models

How It Works

The system uses:

Chunked Embeddings: Pre-computed embeddings stored in chunks (50k models per chunk)
On-Demand Loading: Only loads embeddings for filtered models
Pre-computed Coordinates: UMAP coordinates stored with model metadata
API Layer: FastAPI backend with efficient data loading
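
The chunked, on-demand loading described above can be sketched as follows. This is a minimal illustration rather than the Space's actual code: it assumes models are numbered by row and that row r lives in chunk r // 50,000. The function name `group_rows_by_chunk` is hypothetical.

```python
from collections import defaultdict

CHUNK_SIZE = 50_000  # models per chunk, as stated in this README


def group_rows_by_chunk(rows, chunk_size=CHUNK_SIZE):
    """Map each needed chunk id to the sorted row offsets inside that chunk.

    Assumption: row r is stored in chunk r // chunk_size at offset r % chunk_size.
    """
    needed = defaultdict(list)
    for row in sorted(set(rows)):
        chunk_id, offset = divmod(row, chunk_size)
        needed[chunk_id].append(offset)
    return dict(needed)


# Three filtered models touch only two chunks, so only those two chunk
# files would have to be read.
plan = group_rows_by_chunk([12, 50_003, 49_999])
print(plan)  # {0: [12, 49999], 1: [3]}
```

Grouping by chunk first means each chunk file is opened at most once per query, however many filtered models fall inside it.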

Data Source

Dataset: modelbiome/ai_ecosystem
Pre-computed Data: Automatically downloaded from modelbiome/hf-viz-precomputed on startup
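
To fetch the pre-computed data manually, something like the following should work with the `huggingface_hub` library. It is a sketch under assumptions: the repository type (`"dataset"`) and the local folder name are guesses not stated in this README, and `download_precomputed` is a hypothetical helper name.

```python
PRECOMPUTED_REPO = "modelbiome/hf-viz-precomputed"  # repo id from this README


def download_precomputed(local_dir="precomputed_data", repo_type="dataset"):
    """Download the pre-computed files and return the local folder path.

    Assumptions: repo_type="dataset" and the local_dir default are
    illustrative; adjust them to match the actual repository.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=PRECOMPUTED_REPO,
        repo_type=repo_type,
        local_dir=local_dir,
    )
```

Calling `download_precomputed()` requires network access and `pip install huggingface_hub`.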

Deployment

This Space automatically:

Downloads pre-computed chunked data from Hugging Face Hub
Starts the FastAPI backend
Serves the React frontend
Uses chunked loading for efficient memory usage

Performance

Startup: 2-5 seconds
Memory: ~100MB idle, ~200-500MB active
API Response: <1s for filtered queries
Scales To: Millions of models, since memory use depends on the filtered subset rather than the full dataset
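
A back-of-the-envelope calculation shows why chunking matters for memory. It assumes embeddings are stored as float32 (not stated in this README) and uses the 384-dimensional output of all-MiniLM-L6-v2. The result is roughly in line with the 2.8GB figure quoted for the unchunked case, though this README does not say how that figure was derived.

```python
import math

N_MODELS = 1_860_000   # ~1.86M models, per this README (approximate)
EMBEDDING_DIM = 384    # output size of all-MiniLM-L6-v2
BYTES_PER_VALUE = 4    # assumption: float32 storage
CHUNK_SIZE = 50_000    # models per chunk, per this README


def embedding_bytes(n_models, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Raw size in bytes of an n_models x dim embedding matrix."""
    return n_models * dim * bytes_per_value


full_gb = embedding_bytes(N_MODELS) / 1e9
chunk_mb = embedding_bytes(CHUNK_SIZE) / 1e6
n_chunks = math.ceil(N_MODELS / CHUNK_SIZE)

print(f"all embeddings: {full_gb:.2f} GB")   # 2.86 GB
print(f"one chunk:      {chunk_mb:.1f} MB")  # 76.8 MB
print(f"chunks needed:  {n_chunks}")         # 38
```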

Usage

Filter Models: Use the sidebar to filter by downloads, likes, or search query
Explore: Zoom and pan to explore the embedding space
Search: Search for specific models or tags
View Details: Click on models to see detailed information
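
The filtering behavior can be pictured with a small sketch. Everything here is illustrative: the `Model` fields, the `matches` function, and the sample records are hypothetical and do not reflect the Space's real schema or data.

```python
from dataclasses import dataclass


@dataclass
class Model:
    model_id: str
    downloads: int
    likes: int
    tags: tuple = ()


def matches(model, min_downloads=0, min_likes=0, query=""):
    """Return True if the model passes the numeric thresholds and the
    query (case-insensitive substring of the id or any tag)."""
    if model.downloads < min_downloads or model.likes < min_likes:
        return False
    q = query.strip().lower()
    if not q:
        return True
    return q in model.model_id.lower() or any(q in t.lower() for t in model.tags)


# Made-up records, for illustration only.
models = [
    Model("org-a/example-bert", downloads=1200, likes=15, tags=("fill-mask",)),
    Model("org-b/example-gpt", downloads=80, likes=3, tags=("text-generation",)),
    Model("org-c/tiny-bert", downloads=5000, likes=40, tags=("fill-mask",)),
]

hits = [m.model_id for m in models if matches(m, min_downloads=1000, query="bert")]
print(hits)  # ['org-a/example-bert', 'org-c/tiny-bert']
```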

Technical Details

Backend: FastAPI (Python)
Frontend: React + TypeScript
Embeddings: SentenceTransformer (all-MiniLM-L6-v2)
Visualization: UMAP (3D coordinates)
Storage: Parquet files with chunked embeddings