---
title: ModelLens
emoji: 🔭
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Finding the Best Model for Your Task from Myriads of Models
---

# ModelLens — Finding the Best Model for Your Task from Myriads of Models

Describe your dataset → pick a task and metric → get a ranked list of HuggingFace models likely to perform well on it. Recommendations come from the `MLPMetric` (`ablation_no_id`) checkpoint trained on the `unified_augmented` corpus, scored over a candidate pool of ~47k HuggingFace models.

## How it works

1. Your dataset description is embedded with OpenAI `text-embedding-3-small` (1536-dim, the same encoder used during training).
2. The `MLPMetric` scores every candidate model, conditioned on the embedding plus your chosen task and metric.
3. The top-k results are returned, optionally filtered by parameter count, "official pretrained only", or "HuggingFace-hosted only".
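The steps above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the actual `MLPMetric` architecture: the dimensions, weight names, and single hidden layer are assumptions, and the real weights come from `checkpoint/MLPMetric.pt` rather than random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only -- the real checkpoint defines its own dims.
EMB_DIM, N_TASKS, N_METRICS, N_MODELS, HIDDEN = 1536, 8, 5, 100, 64

# Stand-ins for the learned tables and weights loaded from the checkpoint.
task_emb = rng.normal(size=(N_TASKS, 32))
metric_emb = rng.normal(size=(N_METRICS, 32))
model_feats = rng.normal(size=(N_MODELS, 16))   # per-candidate features (cf. model_pool.npz)
W1 = rng.normal(size=(EMB_DIM + 32 + 32 + 16, HIDDEN)) * 0.01
W2 = rng.normal(size=(HIDDEN, 1)) * 0.01

def score_candidates(dataset_emb, task_id, metric_id):
    """Score every candidate model for one (dataset, task, metric) query."""
    ctx = np.concatenate([dataset_emb, task_emb[task_id], metric_emb[metric_id]])
    x = np.concatenate([np.tile(ctx, (N_MODELS, 1)), model_feats], axis=1)
    h = np.maximum(x @ W1, 0.0)   # ReLU hidden layer
    return (h @ W2).ravel()       # one scalar score per candidate

def top_k(scores, k=5):
    return np.argsort(scores)[::-1][:k]

dataset_emb = rng.normal(size=EMB_DIM)   # in the Space: the OpenAI embedding
scores = score_candidates(dataset_emb, task_id=2, metric_id=1)
best = top_k(scores, k=5)
```

The filters in step 3 are just boolean masks over the candidate axis applied before the `top_k` sort.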

## Bring your own OpenAI key

This Space does not ship with a baked-in OpenAI key. Paste your own `sk-...` key into the "OpenAI API key" field; it is sent directly to OpenAI for that single request and is never stored, logged, or reused by this Space. A query costs roughly $0.000001 on your account (about a millionth of a dollar).

If you don't have a key yet: https://platform.openai.com/api-keys
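If you want to audit exactly what leaves your machine, a single query amounts to one request to OpenAI's embeddings endpoint. The sketch below builds that request without sending it; the `build_embedding_request` helper is hypothetical (the Space presumably uses the official `openai` client), but the endpoint URL, headers, and payload match the public embeddings API.

```python
import json

def build_embedding_request(api_key: str, description: str) -> dict:
    """Assemble the single embeddings request a query triggers (not sent here)."""
    return {
        "url": "https://api.openai.com/v1/embeddings",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "text-embedding-3-small",
            "input": description,
        }),
    }

req = build_embedding_request("sk-...", "10k labeled tweets, 3 sentiment classes")
```

Only the dataset description (and your key, in the auth header) is transmitted; nothing else about your session is included.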

## Files in this Space

```
app.py               Gradio entry point
recommend.py         Recommender (loads checkpoint + model pool, embeds dataset description)
inference_lib.py     Self-contained MLPMetric implementation (no module/ tree needed)
build_model_pool.py  Offline helper to (re)build assets/model_pool.npz
requirements.txt     Pinned dependencies
assets/
  model_pool.npz     Pre-computed candidate pool (47k models, size+family ids, popularity, HF urls)
checkpoint/
  MLPMetric.pt       ~37 MB trained weights
  args.json          Training-time hyperparameters (model dims, num_*)
data/
  task2id.json       Task vocab
  metric2id.json     Metric vocab
```
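Conceptually, `model_pool.npz` is a set of aligned per-candidate arrays (names, size/family ids, popularity, HF URLs). The field names below are illustrative assumptions, not the actual keys in the shipped file; the sketch writes and reads an in-memory pool just to show the shape of the format.

```python
import io
import numpy as np

# Hypothetical pool fields -- the real keys in model_pool.npz may differ.
pool = {
    "model_name": np.array(["bert-base-uncased", "roberta-large"]),
    "size_id":    np.array([1, 3], dtype=np.int64),
    "family_id":  np.array([0, 2], dtype=np.int64),
    "popularity": np.array([95.0, 88.5]),
    "hf_url":     np.array(["https://huggingface.co/bert-base-uncased",
                            "https://huggingface.co/roberta-large"]),
}

buf = io.BytesIO()                  # build_model_pool.py writes a file instead
np.savez_compressed(buf, **pool)
buf.seek(0)

loaded = np.load(buf, allow_pickle=False)
names = loaded["model_name"]
```

Because all arrays share the candidate axis, row `i` of every field describes the same model, which is what lets the recommender filter and rank with plain NumPy indexing.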

The Space looks for the checkpoint at `checkpoint/MLPMetric.pt` and the data JSONs under `data/`. Override with the env vars `MODEL_CKPT`, `MODEL_ARGS`, `DATA_DIR`, and `POOL_PATH` if you lay things out differently.
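A minimal sketch of how those overrides might be resolved; the actual logic lives in `recommend.py` and may differ (the `MODEL_ARGS` default path here is an assumption).

```python
import os

def resolve_paths(env=os.environ):
    """Fall back to the Space's default layout unless an env var overrides it."""
    return {
        "ckpt": env.get("MODEL_CKPT", "checkpoint/MLPMetric.pt"),
        "args": env.get("MODEL_ARGS", "checkpoint/args.json"),
        "data": env.get("DATA_DIR", "data"),
        "pool": env.get("POOL_PATH", "assets/model_pool.npz"),
    }

paths = resolve_paths(env={})                         # no overrides -> defaults
custom = resolve_paths(env={"DATA_DIR": "/mnt/data"})  # partial override
```

Unset variables fall through to the defaults, so you only need to export the paths you actually move.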

## Running locally

```bash
cd web
pip install -r requirements.txt
# either set OPENAI_API_KEY in the env, or paste it into the UI at runtime
python app.py
# open http://localhost:7860
```

## Rebuilding the model pool

When you bump the candidate set (e.g. add new HF models to `model2id.json` / `model_profile.json`):

```bash
python web/build_model_pool.py \
    --data-dir data/unified_augmented \
    --args     checkpoint/mlp/unified_augmented/ablation_no_model_id_no_dataset_id/args.json \
    --out      web/assets/model_pool.npz \
    --min-popularity 0
```