---
license: mit
library_name: pytorch
tags:
  - model-recommendation
  - model-selection
  - ranking
  - model-routing
  - benchmarks
  - leaderboard
pipeline_tag: tabular-regression
---

ModelLens — Trained Recommender Checkpoint

📄 Paper: ModelLens: Finding the Best Model for Your Task from Myriads of Models · 🤗 Collection: luisrui/modellens · 💻 Code: github.com/luisrui/ModelLens

This is the released ModelLens checkpoint: a metric-aware ranker that, given a dataset description, a task, and a metric, returns a ranked list of HuggingFace models likely to perform well on that combination. It requires no fine-tuning and no forward pass on the target dataset.

This repo ships only the weights. Everything else lives elsewhere:

Live demo (Gradio): 🤗 luisrui/ModelLens
Training data: 🤗 luisrui/ModelLens-corpus-v2 (1.81M rows, recommended)
Source code: github.com/luisrui/ModelLens
Paper: see citation below

What's in here

| File | Size | Description |
|---|---|---|
| `ModelLens.pt` | ~709 MB | Trained recommender weights (slim, inference-ready; ~3 unused parent-class buffers dropped) |
| `args.json` | ~2 KB | Training-time hyperparameters (model dims, num_models, num_tasks, num_metrics, etc.) |
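
As a quick sanity check after download, the hyperparameter file can be inspected before building the model. The sketch below assumes `args.json` is a flat JSON object and that the count fields use the key names listed in the table (`num_models`, `num_tasks`, `num_metrics`). The helper name `summarize_args` is hypothetical and not part of the ModelLens code.

```python
import json
from pathlib import Path

# Key names taken from the file table; whether args.json uses exactly
# these spellings is an assumption.
EXPECTED_KEYS = ("num_models", "num_tasks", "num_metrics")


def summarize_args(path):
    """Return (found, missing).

    found   - dict of expected keys that are present, with their values
    missing - list of expected keys that are absent from the file
    """
    with Path(path).open(encoding="utf-8") as f:
        args = json.load(f)
    found = {k: args[k] for k in EXPECTED_KEYS if k in args}
    missing = [k for k in EXPECTED_KEYS if k not in args]
    return found, missing
```

If `missing` is non-empty, the key names probably differ from the ones assumed here, and the source repository is the place to check.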

Provenance

Trained on: luisrui/ModelLens-corpus-v2 — 1,807,133 (model × dataset × metric × value) records
Coverage: 47,242 HuggingFace models · 2,581 tasks · 3,714 metrics · ~86k datasets
Architecture: MLPMetricFull (the paper model — see github repo)
Loss: ensemble (listwise + pairwise + pointwise, λ_list=0.5, λ_pair=1.0, w_point=0.1)
Training: 30 epochs, DDP × 4 GPUs, bs=8, lr=1e-3, wd=1e-4, learnable τ
Slimmed checkpoint: parent-class buffers unused at inference and the train-set dataset_desc_matrix were stripped, so load with strict=False.
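
The ensemble objective listed above can be sketched as a weighted sum of three terms. The README gives only the weights (λ_list=0.5, λ_pair=1.0, w_point=0.1) and mentions a learnable temperature τ. The term definitions below are assumptions chosen for illustration, not the released training code: softmax cross-entropy for the listwise term, logistic loss for the pairwise term, squared error for the pointwise term, and τ applied to the predicted scores in the listwise term.

```python
import math


def softmax(xs, tau=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    m = max(x / tau for x in xs)
    exps = [math.exp(x / tau - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, targets, tau=1.0):
    """Cross-entropy between target and predicted softmax distributions (assumed form)."""
    p_true = softmax(targets)
    p_pred = softmax(scores, tau)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def pairwise_loss(scores, targets):
    """Mean logistic loss over pairs (i, j) with targets[i] > targets[j] (assumed form)."""
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(scores))
        for j in range(len(scores))
        if targets[i] > targets[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0


def pointwise_loss(scores, targets):
    """Mean squared error between scores and target values (assumed form)."""
    return sum((s - t) ** 2 for s, t in zip(scores, targets)) / len(scores)


def ensemble_loss(scores, targets, tau=1.0, lam_list=0.5, lam_pair=1.0, w_point=0.1):
    """Weighted sum of the three terms, using the weights quoted in the Provenance list."""
    return (
        lam_list * listwise_loss(scores, targets, tau)
        + lam_pair * pairwise_loss(scores, targets)
        + w_point * pointwise_loss(scores, targets)
    )
```

Under these definitions, scores that order the candidates correctly yield a lower loss than scores that reverse the order, which is the property a ranker is trained for.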

Loading

from huggingface_hub import hf_hub_download
import torch, json

ckpt_path = hf_hub_download("luisrui/ModelLens", "ModelLens.pt")
args_path = hf_hub_download("luisrui/ModelLens", "args.json")

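# Read the training-time hyperparameters (the context manager closes the file).
with open(args_path) as f:
    args = json.load(f)

# Load the weights onto CPU; move them to a GPU after the model is built.
state = torch.load(ckpt_path, map_location="cpu")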

# Build the model from source (see github.com/luisrui/ModelLens) and load:
# model = MLPMetricFull(**args_to_kwargs(args))
# model.load_state_dict(state, strict=False)   # strict=False is intentional

For a complete, ready-to-run setup including the candidate model pool + metadata, see inference_lib.py and recommend.py in the Space.
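
Because the checkpoint is loaded with strict=False, a key mismatch does not raise an error, so it can be worth confirming that the only missing entries are the ones that were deliberately stripped. A minimal sketch of that check using plain key sets follows. The function name `diff_state_keys` and the key names in the usage example are hypothetical. In PyTorch, the value returned by `load_state_dict(..., strict=False)` exposes the same information as `missing_keys` and `unexpected_keys`.

```python
def diff_state_keys(model_keys, checkpoint_keys):
    """Compare the names a model expects with the names stored in a checkpoint.

    Returns (missing, unexpected) as sorted lists:
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not define
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Usage with made-up names: only the stripped buffer should show up as missing.
missing, unexpected = diff_state_keys(
    model_keys=["encoder.weight", "encoder.bias", "dataset_desc_matrix"],
    checkpoint_keys=["encoder.weight", "encoder.bias"],
)
print("missing:", missing, "unexpected:", unexpected)
```

With real objects, the call would look like `diff_state_keys(model.state_dict().keys(), state.keys())`. Any missing name that is a trainable weight, not a stripped buffer, suggests the model was built with the wrong arguments.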

How it works

The dataset description is embedded with OpenAI text-embedding-3-small (1536-dim — same encoder used at training time).
The ranker scores every candidate model conditioned on (dataset_embedding, task_id, metric_id, model_size_bucket, model_family_id, model_id).
Returns the top-K candidates, optionally filtered by param count / "HF-hosted only" / "official pretrained only".
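
The final step above, returning the top-K candidates after optional filtering, can be sketched as follows. The `Candidate` fields, the `top_k` function, and the unit chosen for parameter counts are hypothetical stand-ins. The actual candidate pool and metadata schema live in the Space's inference_lib.py and may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    model_id: str
    score: float      # ranker output; only the ordering is meaningful
    params_b: float   # parameter count in billions (assumed unit)
    hf_hosted: bool   # True if the weights are hosted on HuggingFace
    official: bool    # True if this is an official pretrained release


def top_k(cands: List[Candidate], k: int,
          max_params_b: Optional[float] = None,
          hf_only: bool = False,
          official_only: bool = False) -> List[Candidate]:
    """Apply the optional filters, then return the k highest-scoring candidates."""
    kept = [
        c for c in cands
        if (max_params_b is None or c.params_b <= max_params_b)
        and (not hf_only or c.hf_hosted)
        and (not official_only or c.official)
    ]
    return sorted(kept, key=lambda c: c.score, reverse=True)[:k]
```

Filtering before sorting means the result can hold fewer than k entries when the filters are strict, which is usually preferable to padding the list with candidates that violate a constraint.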

Intended use

Picking a starting model for a new task / dataset, without running every candidate.
Cheap pre-filter ahead of a more expensive transferability estimator or partial fine-tune.

Limitations

Knowledge is bounded by what's in corpus-v2 (up to early 2026).
Models / datasets that don't appear in the corpus fall back to text similarity over their descriptions — useful but weaker than the full signal available for in-corpus entities.
Scores are relative — the ranking is what matters; the absolute numbers are not calibrated to any specific metric scale.
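
The text-similarity fallback for out-of-corpus entities can be illustrated with a nearest-neighbor lookup over description embeddings. This is a sketch under assumptions: the README does not say which similarity measure or lookup strategy is used, so cosine similarity and the helper names here are illustrative only, and short toy vectors stand in for real 1536-dim embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_in_corpus(query_vec, corpus):
    """Return (name, similarity) for the in-corpus entry closest to query_vec.

    corpus maps an entity name to its description embedding.
    """
    return max(
        ((name, cosine(query_vec, vec)) for name, vec in corpus.items()),
        key=lambda pair: pair[1],
    )
```

A low best similarity is a hint that the new entity has no close neighbor in the corpus, in which case the resulting ranking deserves less trust.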

Citation

@article{cai2026modellens,
  title={ModelLens: Finding the Best for Your Task from Myriads of Models},
  author={Cai, Rui and Mo, Weijie Jacky and Wen, Xiaofei and Ma, Qiyao and Zhu, Wenhui and Chen, Xiwen and Chen, Muhao and Zhao, Zhe},
  journal={arXiv preprint arXiv:2605.07075},
  year={2026}
}

License

MIT.