---
title: ModelLens
emoji: 🔭
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Finding the Best Model for Your Task from Myriads of Models
---
# ModelLens — Finding the Best Model for Your Task from Myriads of Models
Describe your dataset → pick a task and metric → get a ranked list of HuggingFace
models likely to perform well on it. Backed by the `MLPMetric` (ablation_no_id)
checkpoint trained on the `unified_augmented` corpus, with a candidate pool of
~47k HuggingFace models.
## How it works
1. Your dataset description is embedded with OpenAI `text-embedding-3-small`
(1536-dim, the same encoder used during training).
2. The MLPMetric scores every candidate model conditioned on the embedding +
chosen task + chosen metric.
3. We return the top-k, optionally filtered by parameter count, "official
pretrained only", or "HuggingFace-hosted only".
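The filter-then-rank step above can be sketched in a few lines of NumPy. The scores here are random stand-ins (the real Space gets them from MLPMetric conditioned on the embedding and task/metric ids), and `top_k_filtered` is an illustrative helper, not the Space's actual code:

```python
import numpy as np

# Stand-in scores: the real Space computes one MLPMetric score per
# candidate model; here we fake them for illustration.
rng = np.random.default_rng(0)
n_candidates = 8
scores = rng.random(n_candidates)                              # one score per model
param_counts = rng.integers(1_000_000, 1_000_000_000, n_candidates)  # hypothetical sizes

def top_k_filtered(scores, param_counts, k=3, max_params=None):
    """Return indices of the k best-scoring candidates, optionally
    dropping models above a parameter budget."""
    keep = np.arange(len(scores))
    if max_params is not None:
        keep = keep[param_counts[keep] <= max_params]
    order = keep[np.argsort(scores[keep])[::-1]]  # descending by score
    return order[:k]

print(top_k_filtered(scores, param_counts, k=3, max_params=500_000_000))
```

The "official pretrained only" and "HuggingFace-hosted only" toggles work the same way: they just shrink the `keep` index set before the sort.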
## Bring your own OpenAI key
This Space does **not** ship with a baked-in OpenAI key. Paste your own
`sk-...` key into the "OpenAI API key" field — it is sent directly to OpenAI
for that single request and is **not stored, logged, or reused** by this Space.
A query costs roughly **$0.000001** on your account (about a millionth of a
dollar).
If you don't have a key yet: https://platform.openai.com/api-keys
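As a back-of-envelope check on that cost figure (assuming OpenAI's published rate of $0.02 per 1M tokens for `text-embedding-3-small` and a short, ~50-token dataset description):

```python
# Cost of embedding one dataset description, assuming the
# $0.02 / 1M-token rate for text-embedding-3-small.
price_per_token = 0.02 / 1_000_000
tokens_per_query = 50  # a short dataset description
cost = tokens_per_query * price_per_token
print(f"${cost:.7f}")  # on the order of a millionth of a dollar
```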
## Files in this Space
```
app.py               Gradio entry point
recommend.py         Recommender (loads checkpoint + model pool, embeds dataset desc)
inference_lib.py     Self-contained MLPMetric implementation (no module/ tree needed)
build_model_pool.py  Offline helper to (re)build assets/model_pool.npz
requirements.txt     Pinned deps
assets/
  model_pool.npz     Pre-computed candidate pool (47k models, size+family ids, popularity, HF urls)
checkpoint/
  MLPMetric.pt       ~37 MB trained weights
  args.json          Training-time hyperparameters (model dims, num_*)
data/
  task2id.json       Task vocab
  metric2id.json     Metric vocab
```
The Space looks for the checkpoint at `checkpoint/MLPMetric.pt` and the data
JSONs at `data/`. Override with env vars `MODEL_CKPT`, `MODEL_ARGS`, `DATA_DIR`,
`POOL_PATH` if you lay things out differently.
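The override logic amounts to a few `os.getenv` lookups with the defaults above. The env var names come from this README; the function itself is an illustrative sketch, not the Space's real code:

```python
import os

def resolve_paths():
    """Resolve checkpoint/data locations, letting env vars override the
    Space's default layout (illustrative sketch, not the real code)."""
    return {
        "ckpt": os.getenv("MODEL_CKPT", "checkpoint/MLPMetric.pt"),
        "args": os.getenv("MODEL_ARGS", "checkpoint/args.json"),
        "data_dir": os.getenv("DATA_DIR", "data"),
        "pool": os.getenv("POOL_PATH", "assets/model_pool.npz"),
    }

os.environ["MODEL_CKPT"] = "/tmp/MLPMetric.pt"  # example override
print(resolve_paths()["ckpt"])                  # -> /tmp/MLPMetric.pt
```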
## Running locally
```bash
cd web
pip install -r requirements.txt
# either set OPENAI_API_KEY in env, or paste it into the UI at runtime
python app.py
# open http://localhost:7860
```
## Rebuilding the model pool
When you bump the candidate set (e.g. add new HF models to `model2id.json` /
`model_profile.json`):
```bash
python web/build_model_pool.py \
--data-dir data/unified_augmented \
--args checkpoint/mlp/unified_augmented/ablation_no_model_id_no_dataset_id/args.json \
--out web/assets/model_pool.npz \
--min-popularity 0
```