---
title: ModelLens
emoji: 🔭
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Finding the Best Model for Your Task from Myriads of Models
---
# ModelLens — Finding the Best Model for Your Task from Myriads of Models
Describe your dataset → pick a task and metric → get a ranked list of HuggingFace
models likely to perform well on it. Backed by the `MLPMetric` (ablation_no_id)
checkpoint trained on the `unified_augmented` corpus, with a candidate pool of
~47k HuggingFace models.
## How it works
1. Your dataset description is embedded with OpenAI `text-embedding-3-small`
(1536-dim, the same encoder used during training).
2. The MLPMetric scores every candidate model conditioned on the embedding +
chosen task + chosen metric.
3. We return the top-k, optionally filtered by parameter count, "official
pretrained only", or "HuggingFace-hosted only".
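The filter-then-rank step above can be sketched in a few lines of NumPy. The scores here are random stand-ins (the real Space gets them from MLPMetric conditioned on the embedding and task/metric ids), and `top_k_filtered` is an illustrative helper, not the Space's actual code:

```python
import numpy as np

# Stand-in scores: the real Space computes one MLPMetric score per
# candidate model; here we fake them for illustration.
rng = np.random.default_rng(0)
n_candidates = 8
scores = rng.random(n_candidates)                              # one score per model
param_counts = rng.integers(1_000_000, 1_000_000_000, n_candidates)  # hypothetical sizes

def top_k_filtered(scores, param_counts, k=3, max_params=None):
    """Return indices of the k best-scoring candidates, optionally
    dropping models above a parameter budget."""
    keep = np.arange(len(scores))
    if max_params is not None:
        keep = keep[param_counts[keep] <= max_params]
    order = keep[np.argsort(scores[keep])[::-1]]  # descending by score
    return order[:k]

print(top_k_filtered(scores, param_counts, k=3, max_params=500_000_000))
```

The "official pretrained only" and "HuggingFace-hosted only" toggles work the same way: they just shrink the `keep` index set before the sort.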
## Bring your own OpenAI key
This Space does **not** ship with a baked-in OpenAI key. Paste your own
`sk-...` key into the "OpenAI API key" field — it is sent directly to OpenAI
for that single request and is **not stored, logged, or reused** by this Space.
A query costs roughly **$0.000001** on your account (about a millionth of a
dollar).
If you don't have a key yet: https://platform.openai.com/api-keys
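As a back-of-envelope check on that cost figure (assuming OpenAI's published rate of $0.02 per 1M tokens for `text-embedding-3-small` and a short, ~50-token dataset description):

```python
# Cost of embedding one dataset description, assuming the
# $0.02 / 1M-token rate for text-embedding-3-small.
price_per_token = 0.02 / 1_000_000
tokens_per_query = 50  # a short dataset description
cost = tokens_per_query * price_per_token
print(f"${cost:.7f}")  # on the order of a millionth of a dollar
```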
## Files in this Space
```
app.py               Gradio entry point
recommend.py         Recommender (loads checkpoint + model pool, embeds dataset desc)
inference_lib.py     Self-contained MLPMetric implementation (no module/ tree needed)
build_model_pool.py  Offline helper to (re)build assets/model_pool.npz
requirements.txt     Pinned deps
assets/
  model_pool.npz     Pre-computed candidate pool (47k models, size+family ids, popularity, HF urls)
checkpoint/
  MLPMetric.pt       ~37 MB trained weights
  args.json          Training-time hyperparameters (model dims, num_*)
data/
  task2id.json       Task vocab
  metric2id.json     Metric vocab
```
The Space looks for the checkpoint at `checkpoint/MLPMetric.pt` and the data
JSONs at `data/`. Override with env vars `MODEL_CKPT`, `MODEL_ARGS`, `DATA_DIR`,
`POOL_PATH` if you lay things out differently.
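The override logic amounts to a few `os.getenv` lookups with the defaults above. The env var names come from this README; the function itself is an illustrative sketch, not the Space's real code:

```python
import os

def resolve_paths():
    """Resolve checkpoint/data locations, letting env vars override the
    Space's default layout (illustrative sketch, not the real code)."""
    return {
        "ckpt": os.getenv("MODEL_CKPT", "checkpoint/MLPMetric.pt"),
        "args": os.getenv("MODEL_ARGS", "checkpoint/args.json"),
        "data_dir": os.getenv("DATA_DIR", "data"),
        "pool": os.getenv("POOL_PATH", "assets/model_pool.npz"),
    }

os.environ["MODEL_CKPT"] = "/tmp/MLPMetric.pt"  # example override
print(resolve_paths()["ckpt"])                  # -> /tmp/MLPMetric.pt
```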
## Running locally
```bash
cd web
pip install -r requirements.txt
# either set OPENAI_API_KEY in env, or paste it into the UI at runtime
python app.py
# open http://localhost:7860
```
## Rebuilding the model pool
When you bump the candidate set (e.g. add new HF models to `model2id.json` /
`model_profile.json`):
```bash
python web/build_model_pool.py \
--data-dir data/unified_augmented \
--args checkpoint/mlp/unified_augmented/ablation_no_model_id_no_dataset_id/args.json \
--out web/assets/model_pool.npz \
--min-popularity 0
```