---
license: mit
library_name: pytorch
tags:
- model-recommendation
- model-selection
- ranking
- model-routing
- benchmarks
- leaderboard
pipeline_tag: tabular-regression
---

# ModelLens: Trained Recommender Checkpoint
|
|
📄 **Paper**: [ModelLens: Finding the Best Model for Your Task from Myriads of Models](https://huggingface.co/papers/2605.07075)
· 🤗 **Collection**: [luisrui/modellens](https://huggingface.co/collections/luisrui/modellens)
· 💻 **Code**: [github.com/luisrui/ModelLens](https://github.com/luisrui/ModelLens)
|
|
This is the released **ModelLens** checkpoint: a metric-aware ranker that,
given a dataset description, a task, and a metric, returns a ranked list of
HuggingFace models likely to perform well on that dataset. It requires no
fine-tuning and no forward pass on the target dataset.
|
|
This repo ships only the weights. Related resources:
|
|
- **Live demo (Gradio)**: 🤗 [`luisrui/ModelLens`](https://huggingface.co/spaces/luisrui/ModelLens)
- **Training data**: 🤗 [`luisrui/ModelLens-corpus-v2`](https://huggingface.co/datasets/luisrui/ModelLens-corpus-v2) (1.81M rows, recommended)
- **Source code**: [github.com/luisrui/ModelLens](https://github.com/luisrui/ModelLens)
- **Paper**: see citation below
|
|
## What's in here
|
|
| File | Size | Description |
|---|---:|---|
| `ModelLens.pt` | ~709 MB | Trained recommender weights (slimmed for inference: ~3 unused parent-class buffers and the train-set `dataset_desc_matrix` dropped) |
| `args.json` | ~2 KB | Training-time hyperparameters (model dims, `num_models`, `num_tasks`, `num_metrics`, etc.) |

## Provenance

- **Trained on**: [`luisrui/ModelLens-corpus-v2`](https://huggingface.co/datasets/luisrui/ModelLens-corpus-v2): 1,807,133 (model × dataset × metric × value) records
- **Coverage**: 47,242 HuggingFace models · 2,581 tasks · 3,714 metrics · ~86k datasets
- **Architecture**: `MLPMetricFull` (the paper model; see the [GitHub repo](https://github.com/luisrui/ModelLens))
- **Loss**: ensemble (listwise + pairwise + pointwise, `λ_list=0.5, λ_pair=1.0, w_point=0.1`)
- **Training**: 30 epochs, DDP × 4 GPUs, `bs=8`, `lr=1e-3`, `wd=1e-4`, learnable `τ`
- **Slimmed checkpoint**: inference-unused parent-class buffers and the train-set `dataset_desc_matrix` are stripped, so load with `strict=False`.
|
|
## Loading
|
|
```python
import json

import torch
from huggingface_hub import hf_hub_download

# Fetch the slimmed weights and the training-time hyperparameters.
ckpt_path = hf_hub_download("luisrui/ModelLens", "ModelLens.pt")
args_path = hf_hub_download("luisrui/ModelLens", "args.json")

with open(args_path) as f:
    args = json.load(f)

# Load onto CPU first; move the built model to a GPU afterwards if needed.
state = torch.load(ckpt_path, map_location="cpu")

# Build the model from source (see github.com/luisrui/ModelLens) and load:
# model = MLPMetricFull(**args_to_kwargs(args))
# result = model.load_state_dict(state, strict=False)  # strict=False is intentional
# print(result.missing_keys)  # inspect which stripped entries were skipped
```
|
|
For a complete, ready-to-run setup including the candidate model pool +
metadata, see [`inference_lib.py`](https://huggingface.co/spaces/luisrui/ModelLens/blob/main/inference_lib.py)
and [`recommend.py`](https://huggingface.co/spaces/luisrui/ModelLens/blob/main/recommend.py)
in the Space.
|
|
## How it works
|
|
1. The dataset description is embedded with OpenAI `text-embedding-3-small`
   (1536-dim, the same encoder used at training time).
2. The ranker scores every candidate model conditioned on
   `(dataset_embedding, task_id, metric_id, model_size_bucket, model_family_id, model_id)`.
3. The top-K candidates are returned, optionally filtered by parameter count,
   "HF-hosted only", or "official pretrained only".
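
The three steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the Space's `inference_lib.py`: the `Candidate` fields, the `score_fn` callable standing in for the trained ranker, and the filter arguments are hypothetical names. The embedding call uses the OpenAI Python SDK's `client.embeddings.create` with the same `text-embedding-3-small` model; the client is passed in so the function can be exercised without network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

EMBED_MODEL = "text-embedding-3-small"
EMBED_DIM = 1536


@dataclass
class Candidate:
    # Hypothetical per-model metadata used for filtering.
    model_id: str
    num_params: int
    hf_hosted: bool
    official_pretrained: bool


def embed_description(client, text: str) -> list[float]:
    # Step 1: embed the dataset description with the training-time encoder.
    resp = client.embeddings.create(model=EMBED_MODEL, input=[text])
    vec = list(resp.data[0].embedding)
    if len(vec) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM}-dim embedding, got {len(vec)}")
    return vec


def recommend(
    dataset_emb: Sequence[float],
    task_id: int,
    metric_id: int,
    candidates: Sequence[Candidate],
    score_fn: Callable[[Sequence[float], int, int, Candidate], float],
    k: int = 10,
    max_params: Optional[int] = None,
    hf_hosted_only: bool = False,
    official_only: bool = False,
) -> list[tuple[str, float]]:
    # Step 3 filters: drop candidates that fail any requested constraint.
    pool = [
        c for c in candidates
        if (max_params is None or c.num_params <= max_params)
        and (not hf_hosted_only or c.hf_hosted)
        and (not official_only or c.official_pretrained)
    ]
    # Step 2: score each remaining candidate for this dataset/task/metric.
    scored = [(c.model_id, score_fn(dataset_emb, task_id, metric_id, c)) for c in pool]
    # Step 3: keep the top-k by score (higher is better; scores are relative).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With the real SDK (`openai>=1.0`), `client` would be `openai.OpenAI()`, which reads `OPENAI_API_KEY` from the environment. Filtering before scoring returns the same top-K as scoring every candidate and filtering afterwards, as long as each candidate's score does not depend on which other candidates are in the pool.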
|
|
## Intended use
|
|
- Picking a starting model for a new task / dataset, without running
  every candidate.
- Cheap pre-filter ahead of a more expensive transferability estimator
  or partial fine-tune.
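
As a pre-filter, the ranker narrows the pool so the expensive step runs on only a handful of models. A minimal sketch, assuming you already have ModelLens scores per model and some costly evaluator; `two_stage_select`, `expensive_eval`, and `shortlist_size` are hypothetical names (the evaluator could be a transferability estimate or a short partial fine-tune) and are not part of this release.

```python
from typing import Callable, Mapping


def two_stage_select(
    cheap_scores: Mapping[str, float],
    expensive_eval: Callable[[str], float],
    shortlist_size: int = 5,
) -> tuple[str, dict[str, float]]:
    # Stage 1: rank every model by its cheap ModelLens score and keep a shortlist.
    shortlist = sorted(cheap_scores, key=cheap_scores.get, reverse=True)[:shortlist_size]
    if not shortlist:
        raise ValueError("no candidates to evaluate")
    # Stage 2: spend the expensive budget only on the shortlist.
    results = {model_id: expensive_eval(model_id) for model_id in shortlist}
    # Pick the model with the best expensive-stage result.
    best = max(results, key=results.get)
    return best, results
```

The saving grows with the gap between the full pool and the shortlist; the trade-off is that a model ranked outside the shortlist is never evaluated.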
|
|
## Limitations
|
|
- Knowledge is bounded by what's in `corpus-v2` (data up to early 2026).
- Models / datasets that don't appear in the corpus fall back to text
  similarity over their descriptions. This is useful but weaker than the
  full signal available for in-corpus entities.
- Scores are *relative*: the ranking is what matters, and the absolute
  numbers are not calibrated to any specific metric scale.
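
The text-similarity fallback can be pictured as a nearest-neighbour lookup over description embeddings. This is a minimal sketch of that general idea, not the repository's code: an unseen entity's description embedding is compared by cosine similarity against in-corpus entities, and (as an assumption) the closest matches are the ones whose learned representations a system could reuse. `nearest_in_corpus` and the dict layout are hypothetical.

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_in_corpus(
    query_emb: Sequence[float],
    corpus_embs: Mapping[str, Sequence[float]],
    top_n: int = 1,
) -> list[tuple[str, float]]:
    # Rank known entities by how similar their description embeddings are to the query.
    sims = [(name, cosine(query_emb, emb)) for name, emb in corpus_embs.items()]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:top_n]
```

Because the match is based only on description text, it carries none of the benchmark evidence an in-corpus entity has, which is why this path is weaker.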
|
|
## Citation
|
|
```bibtex
@article{cai2026modellens,
  title={ModelLens: Finding the Best Model for Your Task from Myriads of Models},
  author={Cai, Rui and Mo, Weijie Jacky and Wen, Xiaofei and Ma, Qiyao and Zhu, Wenhui and Chen, Xiwen and Chen, Muhao and Zhao, Zhe},
  journal={arXiv preprint arXiv:2605.07075},
  year={2026}
}
```
|
|
## License
|
|
MIT.
|
|