---
title: ModelLens
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Finding the Best Model for Your Task from Myriads of Models
---
| |
# ModelLens – Finding the Best Model for Your Task from Myriads of Models
|
|
Describe your dataset → pick a task and metric → get a ranked list of HuggingFace
models likely to perform well on it. Backed by the `MLPMetric` (ablation_no_id)
checkpoint trained on the `unified_augmented` corpus, with a candidate pool of
~47k HuggingFace models.
|
|
## How it works
|
|
1. Your dataset description is embedded with OpenAI `text-embedding-3-small`
   (1536-dim, the same encoder used during training).
2. The MLPMetric scores every candidate model, conditioned on the embedding
   plus the chosen task and metric.
3. We return the top-k, optionally filtered by parameter count, "official
   pretrained only", or "HuggingFace-hosted only".
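As a rough sketch of steps 2–3: score the pool, apply filters, take the top-k. The array names below are invented for illustration, and random numbers stand in for real MLPMetric outputs; the actual interface lives in `inference_lib.py`:

```python
# Illustrative only: random stand-in scores instead of real MLPMetric outputs,
# and invented array names -- not the Space's actual API.
import numpy as np

rng = np.random.default_rng(0)
n_models = 1000                      # stand-in for the ~47k candidate pool
scores = rng.random(n_models)        # pretend per-model MLPMetric scores
param_counts = rng.integers(1_000_000, 10_000_000_000, n_models)

# Optional filter, e.g. "models under 1B parameters"
eligible = np.where(param_counts < 1_000_000_000)[0]

# Top-k eligible candidates, highest score first
k = 5
top = eligible[np.argsort(scores[eligible])[::-1][:k]]
```

The same index-then-sort pattern extends to the other filters (official-pretrained, HF-hosted) by intersecting boolean masks before ranking.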
|
|
## Bring your own OpenAI key
|
|
This Space does **not** ship with a baked-in OpenAI key. Paste your own
`sk-...` key into the "OpenAI API key" field; it is sent directly to OpenAI
for that single request and is **not stored, logged, or reused** by this Space.
A query costs roughly **$0.000001** on your account (about a millionth of a
dollar).
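As a back-of-envelope check on that figure, assuming OpenAI's published rate of $0.02 per 1M input tokens for `text-embedding-3-small` (verify against the current price list):

```python
# Pricing is an assumption; confirm against OpenAI's pricing page.
price_per_token = 0.02 / 1_000_000   # USD per input token
tokens = 50                          # a short dataset description
cost = tokens * price_per_token      # about a millionth of a dollar
```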
|
|
If you don't have a key yet: https://platform.openai.com/api-keys
|
|
## Files in this Space
|
|
```
app.py               Gradio entry point
recommend.py         Recommender (loads checkpoint + model pool, embeds dataset desc)
inference_lib.py     Self-contained MLPMetric implementation (no module/ tree needed)
build_model_pool.py  Offline helper to (re)build assets/model_pool.npz
requirements.txt     Pinned deps
assets/
  model_pool.npz     Pre-computed candidate pool (47k models, size+family ids, popularity, HF urls)
checkpoint/
  MLPMetric.pt       ~37 MB trained weights
  args.json          Training-time hyperparameters (model dims, num_*)
data/
  task2id.json       Task vocab
  metric2id.json     Metric vocab
```
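A pool file like `model_pool.npz` is a plain NumPy archive, so standard tooling can inspect it. The field names below are assumptions loosely based on the description above (ids, sizes, popularity), not the actual schema written by `build_model_pool.py`; the sketch round-trips a toy archive in memory:

```python
# Write and read back a toy .npz in memory; the keys are illustrative
# assumptions, not the Space's real schema.
import io
import numpy as np

buf = io.BytesIO()
np.savez(
    buf,
    model_ids=np.array(["org/model-a", "org/model-b"]),
    param_counts=np.array([125_000_000, 7_000_000_000]),
    popularity=np.array([4200, 310]),
)
buf.seek(0)

pool = np.load(buf)
fields = sorted(pool.files)   # list the arrays stored in the archive
```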
|
|
The Space looks for the checkpoint at `checkpoint/MLPMetric.pt` and the data
JSONs at `data/`. Override with env vars `MODEL_CKPT`, `MODEL_ARGS`, `DATA_DIR`,
`POOL_PATH` if you lay things out differently.
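The override pattern amounts to a few environment lookups with fallbacks. The defaults here are inferred from the layout above and may not match `recommend.py` exactly:

```python
# Each path falls back to the default Space layout unless the matching
# environment variable is set (defaults inferred, not copied from the source).
import os

MODEL_CKPT = os.getenv("MODEL_CKPT", "checkpoint/MLPMetric.pt")
MODEL_ARGS = os.getenv("MODEL_ARGS", "checkpoint/args.json")
DATA_DIR = os.getenv("DATA_DIR", "data")
POOL_PATH = os.getenv("POOL_PATH", "assets/model_pool.npz")
```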
|
|
## Running locally
|
|
```bash
cd web
pip install -r requirements.txt
# either set OPENAI_API_KEY in env, or paste it into the UI at runtime
python app.py
# open http://localhost:7860
```
|
|
## Rebuilding the model pool
|
|
When you bump the candidate set (e.g. add new HF models to `model2id.json` /
`model_profile.json`):
|
|
| ```bash |
| python web/build_model_pool.py \ |
| --data-dir data/unified_augmented \ |
| --args checkpoint/mlp/unified_augmented/ablation_no_model_id_no_dataset_id/args.json \ |
| --out web/assets/model_pool.npz \ |
| --min-popularity 0 |
| ``` |
|
|