title: eval-card-registry
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
eval-card-registry
Query-only disambiguation API for AI evaluation entity names. Resolves raw benchmark / model / metric / harness strings (e.g. "MATH Level 5") to stable canonical IDs (e.g. math-level-5).
This Space runs in read-only mode — it serves lookups against pre-built entity data. Write operations (entity creation, alias edits) happen in a separate pipeline.
Base URL
https://evaleval-entity-registry.hf.space/api/v1
Resolve
curl -X POST https://evaleval-entity-registry.hf.space/api/v1/resolve \
  -H 'Content-Type: application/json' \
  -d '{"raw_value": "MATH Level 5", "entity_type": "benchmark"}'
Response:
{
  "canonical_id": "math-level-5",
  "strategy": "exact",
  "confidence": 1.0,
  "created_new": false,
  "review_status": "reviewed"
}
If nothing matches, canonical_id is null and strategy is "no_match". In read-only mode, no draft entity is created.
entity_type is one of: benchmark, model, metric, harness. Optional source_config scopes the lookup to a specific source.
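The request and response rules above can be sketched as a small Python client using only the standard library. The function names (`build_resolve_payload`, `canonical_id_or_none`, `resolve`) and the timeout are illustrative assumptions, not part of the API. The value type of `source_config` is not specified here, so the sketch passes it through unchanged.

```python
import json
import urllib.request

BASE_URL = "https://evaleval-entity-registry.hf.space/api/v1"
ENTITY_TYPES = {"benchmark", "model", "metric", "harness"}


def build_resolve_payload(raw_value, entity_type, source_config=None):
    """Build the JSON body for POST /resolve (hypothetical helper)."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity_type: {entity_type!r}")
    payload = {"raw_value": raw_value, "entity_type": entity_type}
    if source_config is not None:
        # Passed through as-is; its expected shape is an open assumption.
        payload["source_config"] = source_config
    return payload


def canonical_id_or_none(response):
    """Return the canonical ID, or None when the registry reports no match."""
    if response.get("strategy") == "no_match":
        return None
    return response.get("canonical_id")


def resolve(raw_value, entity_type, source_config=None, timeout=10.0):
    """Send one resolve request and return the parsed JSON response."""
    payload = build_resolve_payload(raw_value, entity_type, source_config)
    request = urllib.request.Request(
        f"{BASE_URL}/resolve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

A caller would typically run `canonical_id_or_none(resolve("MATH Level 5", "benchmark"))` and treat `None` as "keep the raw string", since no draft entity is created in read-only mode.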
Batch resolve:
curl -X POST https://evaleval-entity-registry.hf.space/api/v1/resolve/batch \
  -H 'Content-Type: application/json' \
  -d '[
    {"raw_value": "MATH Level 5", "entity_type": "benchmark"},
    {"raw_value": "meta-llama/Llama-3.1-8B", "entity_type": "model"}
  ]'
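The same batch call might look like this from Python. This is a minimal sketch: `build_batch_payload` and `resolve_batch` are hypothetical names, and because the batch response shape is not described here, the function returns the parsed JSON without interpreting it.

```python
import json
import urllib.request

BATCH_URL = "https://evaleval-entity-registry.hf.space/api/v1/resolve/batch"


def build_batch_payload(pairs):
    """Turn (raw_value, entity_type) pairs into the JSON array sent to the batch endpoint."""
    return [{"raw_value": raw, "entity_type": kind} for raw, kind in pairs]


def resolve_batch(pairs, timeout=30.0):
    """POST all pairs in one request and return the parsed JSON reply unchanged."""
    request = urllib.request.Request(
        BATCH_URL,
        data=json.dumps(build_batch_payload(pairs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # The response layout is an assumption left to the caller to inspect.
        return json.load(reply)
```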
Browse entities
GET /api/v1/benchmarks?search=math
GET /api/v1/benchmarks/{id}
GET /api/v1/models
GET /api/v1/metrics
GET /api/v1/harnesses
GET /api/v1/aliases?status=uncertain&entity_type=benchmark
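For scripted browsing, the URLs above can be assembled with the standard library so that query values are escaped correctly. The helper names `collection_url` and `benchmark_url` are illustrative assumptions. Only the query parameters listed above (`search`, `status`, `entity_type`) are known to exist.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://evaleval-entity-registry.hf.space/api/v1"


def collection_url(collection, **params):
    """Build a list URL such as /benchmarks?search=math, skipping unset parameters."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = f"{BASE_URL}/{collection}"
    return f"{url}?{query}" if query else url


def benchmark_url(entity_id):
    """Build the detail URL for one benchmark, percent-escaping the ID."""
    return f"{BASE_URL}/benchmarks/{quote(entity_id, safe='')}"


print(collection_url("benchmarks", search="math"))
print(collection_url("aliases", status="uncertain", entity_type="benchmark"))
print(benchmark_url("math-level-5"))
```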
Health
GET /api/v1/health
GET /api/v1/stats
Write endpoints
Disabled in this Space. POST/PATCH on entities and aliases return 405 Method Not Allowed. Mutations happen in the data pipeline (separate from this Space).
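A client shared between this Space and a writable deployment may want to detect the read-only rejection explicitly. The sketch below assumes such a shared client. `is_read_only_rejection` and `try_write` are hypothetical helpers, and the caller builds the `urllib.request.Request` itself.

```python
import urllib.error
import urllib.request


def is_read_only_rejection(status_code):
    """True for the 405 Method Not Allowed that write endpoints return here."""
    return status_code == 405


def try_write(request, timeout=10.0):
    """Send a write request; return False if it is rejected as read-only."""
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if is_read_only_rejection(err.code):
            return False
        raise  # Any other HTTP error is left to the caller.
```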
Interactive docs
OpenAPI docs at /docs.
Data sources
- Entity data: HF Dataset repo evaleval/entity-registry-data (read at startup)
- Resolve logs: HF Storage Bucket evaleval/entity-registry-storage (written asynchronously for resolver improvement)