---
title: ModelLens
emoji: 🔭
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Finding the Best Model for Your Task from Myriads of Models
---

# ModelLens — Finding the Best Model for Your Task from Myriads of Models

Describe your dataset → pick a task and metric → get a ranked list of HuggingFace models likely to perform well on it. Backed by the `MLPMetric` (ablation_no_id) checkpoint trained on the `unified_augmented` corpus, with a candidate pool of ~47k HuggingFace models.

## How it works

1. Your dataset description is embedded with OpenAI `text-embedding-3-small` (1536-dim, the same encoder used during training).
2. The MLPMetric scores every candidate model conditioned on the embedding + chosen task + chosen metric.
3. We return the top-k, optionally filtered by parameter count, "official pretrained only", or "HuggingFace-hosted only".

## Bring your own OpenAI key

This Space does **not** ship with a baked-in OpenAI key. Paste your own `sk-...` key into the "OpenAI API key" field — it is sent directly to OpenAI for that single request and is **not stored, logged, or reused** by this Space. A query costs roughly **$0.000001** on your account (about a millionth of a dollar).
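The score-then-filter flow in steps 2–3 of "How it works" can be sketched as follows. This is a minimal illustration only: `top_k_models` and the candidate fields (`name`, `score`, `params`) are hypothetical names, not the actual `recommend.py` code.

```python
def top_k_models(candidates, k=5, max_params=None):
    """Rank candidate models by score and return the top-k.

    candidates: list of dicts with 'name', 'score', and 'params'
                (parameter count); fields are illustrative.
    max_params: optional cap on parameter count, mirroring the
                Space's parameter-count filter.
    """
    # Apply the optional size filter before ranking.
    pool = [c for c in candidates
            if max_params is None or c["params"] <= max_params]
    # Highest predicted score first; keep only the top-k.
    return sorted(pool, key=lambda c: c["score"], reverse=True)[:k]
```

The real pipeline scores all ~47k pooled candidates in one batch, but the filter-then-rank shape is the same.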
If you don't have a key yet: https://platform.openai.com/api-keys

## Files in this Space

```
app.py               Gradio entry point
recommend.py         Recommender (loads checkpoint + model pool, embeds dataset desc)
inference_lib.py     Self-contained MLPMetric implementation (no module/ tree needed)
build_model_pool.py  Offline helper to (re)build assets/model_pool.npz
requirements.txt     Pinned deps
assets/
  model_pool.npz     Pre-computed candidate pool (47k models, size+family ids, popularity, HF urls)
checkpoint/
  MLPMetric.pt       ~37 MB trained weights
  args.json          Training-time hyperparameters (model dims, num_*)
data/
  task2id.json       Task vocab
  metric2id.json     Metric vocab
```

The Space looks for the checkpoint at `checkpoint/MLPMetric.pt` and the data JSONs under `data/`. Override with the env vars `MODEL_CKPT`, `MODEL_ARGS`, `DATA_DIR`, and `POOL_PATH` if you lay things out differently.

## Running locally

```bash
cd web
pip install -r requirements.txt
# either set OPENAI_API_KEY in env, or paste it into the UI at runtime
python app.py   # open http://localhost:7860
```

## Rebuilding the model pool

When you bump the candidate set (e.g. add new HF models to `model2id.json` / `model_profile.json`):

```bash
python web/build_model_pool.py \
  --data-dir data/unified_augmented \
  --args checkpoint/mlp/unified_augmented/ablation_no_model_id_no_dataset_id/args.json \
  --out web/assets/model_pool.npz \
  --min-popularity 0
```
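The env-var overrides described above can be resolved with a simple defaults-plus-environment lookup. This is a minimal sketch under the assumption that each variable falls back to the in-repo path when unset; `resolve_path` and the `MODEL_ARGS` default of `checkpoint/args.json` are illustrative, not the actual `recommend.py` code.

```python
import os

# Default locations inside the Space repo; any of these can be
# overridden by setting the env var of the same name.
DEFAULTS = {
    "MODEL_CKPT": "checkpoint/MLPMetric.pt",
    "MODEL_ARGS": "checkpoint/args.json",   # assumed default
    "DATA_DIR": "data",
    "POOL_PATH": "assets/model_pool.npz",
}

def resolve_path(name: str) -> str:
    """Return the env-var override if set, else the in-repo default."""
    return os.environ.get(name, DEFAULTS[name])
```

For example, `POOL_PATH=/tmp/pool.npz python app.py` would point the recommender at a rebuilt pool without touching the repo layout.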