---
license: mit
library_name: pytorch
tags:
  - model-recommendation
  - model-selection
  - ranking
  - model-routing
  - benchmarks
  - leaderboard
pipeline_tag: tabular-regression
---

# ModelLens — Trained Recommender Checkpoint

📄 **Paper**: [ModelLens: Finding the Best Model for Your Task from Myriads of Models](https://huggingface.co/papers/2605.07075)
 ·  🤗 **Collection**: [luisrui/modellens](https://huggingface.co/collections/luisrui/modellens)
 ·  💻 **Code**: [github.com/luisrui/ModelLens](https://github.com/luisrui/ModelLens)

This is the released **ModelLens** checkpoint: a metric-aware ranker that,
given a dataset description, a task, and a metric, returns a ranked list of
HuggingFace models likely to perform well on that dataset. It needs no
fine-tuning and no forward pass on the target dataset.

This repo ships only the weights. Everything else lives here:

- **Live demo (Gradio)**: 🤗 [`luisrui/ModelLens`](https://huggingface.co/spaces/luisrui/ModelLens)
- **Training data**: 🤗 [`luisrui/ModelLens-corpus-v2`](https://huggingface.co/datasets/luisrui/ModelLens-corpus-v2) (1.81M rows, recommended)
- **Source code**: [github.com/luisrui/ModelLens](https://github.com/luisrui/ModelLens)
- **Paper**: see citation below

## What's in here

| File | Size | Description |
|---|---:|---|
| `ModelLens.pt` | ~709 MB | Trained recommender weights (slim — inference-ready, ~3 unused parent-class buffers dropped) |
| `args.json` | ~2 KB | Training-time hyperparameters (model dims, num_models / num_tasks / num_metrics / etc.) |

## Provenance

- **Trained on**: [`luisrui/ModelLens-corpus-v2`](https://huggingface.co/datasets/luisrui/ModelLens-corpus-v2) — 1,807,133 (model × dataset × metric × value) records
- **Coverage**: 47,242 HuggingFace models · 2,581 tasks · 3,714 metrics · ~86k datasets
- **Architecture**: `MLPMetricFull` (the paper model — see [github repo](https://github.com/luisrui/ModelLens))
- **Loss**: ensemble (listwise + pairwise + pointwise, `λ_list=0.5, λ_pair=1.0, w_point=0.1`)
- **Training**: 30 epochs, DDP × 4 GPUs, `bs=8`, `lr=1e-3`, `wd=1e-4`, learnable `τ`
- **Slimmed checkpoint**: parent-class buffers unused at inference and the train-set `dataset_desc_matrix` have been stripped, so load with `strict=False`.

## Loading

```python
from huggingface_hub import hf_hub_download
import torch, json

ckpt_path = hf_hub_download("luisrui/ModelLens", "ModelLens.pt")
args_path = hf_hub_download("luisrui/ModelLens", "args.json")

# Read the training-time hyperparameters, closing the file afterwards.
with open(args_path) as f:
    args = json.load(f)
state = torch.load(ckpt_path, map_location="cpu")

# Build the model from source (see github.com/luisrui/ModelLens) and load:
# model = MLPMetricFull(**args_to_kwargs(args))
# model.load_state_dict(state, strict=False)   # strict=False is intentional
```

For a complete, ready-to-run setup including the candidate model pool +
metadata, see [`inference_lib.py`](https://huggingface.co/spaces/luisrui/ModelLens/blob/main/inference_lib.py)
and [`recommend.py`](https://huggingface.co/spaces/luisrui/ModelLens/blob/main/recommend.py)
in the Space.
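
The loading snippet above refers to an `args_to_kwargs` helper that this card does not define. One possible sketch, assuming the goal is simply to drop training-only entries from `args.json` before constructing the model, is to keep the keys that the constructor accepts by name. `ToyModel`, `hidden_dim`, and the exact key names in the sample JSON are stand-ins for illustration; the real `MLPMetricFull` signature and the real helper in the source repo may differ.

```python
import inspect
import json

def args_to_kwargs(args: dict, model_cls) -> dict:
    """Keep only the entries of `args` that `model_cls.__init__` accepts by name."""
    params = inspect.signature(model_cls.__init__).parameters
    accepted = {name for name in params if name != "self"}
    return {k: v for k, v in args.items() if k in accepted}

class ToyModel:
    # Hypothetical stand-in for MLPMetricFull; real arguments may differ.
    def __init__(self, num_models, num_tasks, num_metrics, hidden_dim=256):
        self.num_models = num_models
        self.num_tasks = num_tasks
        self.num_metrics = num_metrics
        self.hidden_dim = hidden_dim

# Sample args mixing model dimensions with training-only settings.
args = json.loads(
    '{"num_models": 47242, "num_tasks": 2581, "num_metrics": 3714,'
    ' "lr": 0.001, "epochs": 30}'
)
kwargs = args_to_kwargs(args, ToyModel)   # "lr" and "epochs" are dropped
model = ToyModel(**kwargs)
```

Filtering by signature keeps the loader tolerant of extra keys in `args.json`, at the cost of silently ignoring a misspelled one.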

## How it works

1. The dataset description is embedded with OpenAI `text-embedding-3-small`
   (1536-dim — same encoder used at training time).
2. The ranker scores every candidate model conditioned on
   `(dataset_embedding, task_id, metric_id, model_size_bucket, model_family_id, model_id)`.
3. Returns the top-K candidates, optionally filtered by param count /
   "HF-hosted only" / "official pretrained only".

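Step 3 above can be sketched as a filter followed by a top-K selection. This is a minimal illustration, not the Space's implementation: the `Candidate` fields, the `top_k` function, and the model ids and scores are all made up for the example.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Candidate:
    model_id: str      # hypothetical ids, for illustration only
    score: float       # ranker output; relative, not a calibrated metric value
    num_params: int
    hf_hosted: bool

def top_k(candidates, k=3, max_params=None, hf_only=False):
    """Apply the optional filters, then return the k highest-scoring candidates."""
    pool = [
        c for c in candidates
        if (max_params is None or c.num_params <= max_params)
        and (not hf_only or c.hf_hosted)
    ]
    return heapq.nlargest(k, pool, key=lambda c: c.score)

candidates = [
    Candidate("org/model-a", 2.1, 7_000_000_000, True),
    Candidate("org/model-b", 1.7, 350_000_000, True),
    Candidate("org/model-c", 1.9, 110_000_000, False),
    Candidate("org/model-d", 0.4, 66_000_000, True),
]

# Keep models up to 1B parameters that are hosted on the Hub.
best = top_k(candidates, k=2, max_params=1_000_000_000, hf_only=True)
best_ids = [c.model_id for c in best]
```

Filtering before ranking means K always refers to the number of results returned, even when the filters remove high-scoring candidates.
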
## Intended use

- Picking a starting model for a new task / dataset, without running
  every candidate.
- Cheap pre-filter ahead of a more expensive transferability estimator
  or partial fine-tune.

## Limitations

- Knowledge is bounded by what's in `corpus-v2` (up to early 2026).
- Models / datasets that don't appear in the corpus fall back to text
  similarity over their descriptions — useful but weaker than the full
  signal available for in-corpus entities.
- Scores are *relative* — the ranking is what matters; the absolute
  numbers are not calibrated to any specific metric scale.
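
The text-similarity fallback mentioned in the second limitation can be pictured as a nearest-neighbour lookup over description embeddings. The sketch below assumes cosine similarity and uses toy 3-dimensional vectors in place of the 1536-dimensional embeddings; how the released code actually implements the fallback is not specified in this card, and `nearest_in_corpus` and the dataset names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_in_corpus(query_vec, corpus_vecs):
    """Return (name, similarity) for the in-corpus entry closest to query_vec."""
    return max(
        ((name, cosine(query_vec, vec)) for name, vec in corpus_vecs.items()),
        key=lambda pair: pair[1],
    )

# Toy embeddings standing in for in-corpus dataset descriptions.
corpus_vecs = {
    "sentiment-reviews": [0.9, 0.1, 0.0],
    "image-classification": [0.0, 0.2, 0.9],
}
unseen_dataset = [0.8, 0.2, 0.1]   # embedding of an out-of-corpus description

name, similarity = nearest_in_corpus(unseen_dataset, corpus_vecs)
```

A low best similarity is a hint that the recommendation rests on weak evidence and deserves a closer manual check.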

## Citation

```bibtex
@article{cai2026modellens,
  title={ModelLens: Finding the Best Model for Your Task from Myriads of Models},
  author={Cai, Rui and Mo, Weijie Jacky and Wen, Xiaofei and Ma, Qiyao and Zhu, Wenhui and Chen, Xiwen and Chen, Muhao and Zhao, Zhe},
  journal={arXiv preprint arXiv:2605.07075},
  year={2026}
}
```

## License

MIT.