Update README.md

README.md CHANGED
@@ -11,8 +11,16 @@ tags:
 ---
 
 # SCOPE: Scalable and Controllable Outcome Performance Estimator
 
-SCOPE is a specialized model that predicts how a target LLM will perform on a given question. Given a target question and a set of anchor questions with known performance results, SCOPE predicts the **output length** and **correctness** of the target model's response.
 
 ## Model Description
 
@@ -396,12 +404,6 @@ for sample in dataset:
 3. **Batch Processing**: Use vLLM for high-throughput batch inference
 4. **Anchor Selection**: Choose anchors similar to your target question domain
 
-## Limitations
-
-- Performance predictions are estimates based on anchor patterns
-- Accuracy depends on the quality and relevance of anchor questions
-- Works best when anchors are from the same domain as the target question
-
 ## Citation
 
 ```bibtex
@@ -11,8 +11,16 @@ tags:
 ---
 
 # SCOPE: Scalable and Controllable Outcome Performance Estimator
+[📄 Paper (arXiv:2601.22323)](https://www.arxiv.org/abs/2601.22323)
+
+This repository accompanies the paper "Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning", which introduces SCOPE (Scalable and Controllable Outcome Performance Estimator), a new framework for large language model (LLM) routing.
+SCOPE reframes model routing as a pre-hoc estimation problem: instead of directly selecting a model from a fixed candidate set, it predicts each model's expected performance (correctness) and inference cost (token length) before execution, based on the model's historical behavior on similar queries. This enables training-free generalization to unseen models and allows users to flexibly control the trade-off between accuracy and cost through a budget-aware utility function.
+Overall, SCOPE provides a scalable, explainable, and controllable solution for allocating test-time compute across heterogeneous model portfolios.
+
+
+The figure above illustrates the core difference between traditional routers and SCOPE.
+Conventional LLM routers treat routing as a closed-set classification problem, simply memorizing model names and selecting one model per query. In contrast, SCOPE reasons over models' past behavior, explicitly predicting outcome correctness and token cost, and then makes a budget-aware decision based on these estimates. This design allows SCOPE to generalize to unseen models and supports dynamic cost–accuracy control at inference time.
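The budget-aware decision described above can be sketched in a few lines. This is an illustrative sketch only, not the released SCOPE code: the function names, the linear utility form `p_correct − λ · tokens/budget`, and the example numbers are all assumptions.

```python
# Illustrative sketch (not the released SCOPE implementation): pick the model
# maximizing a budget-aware utility over SCOPE's per-model predictions.
# All names and the linear utility form are assumptions for illustration.

def route(predictions, budget_tokens, lam=0.5):
    """predictions maps model name -> (predicted correctness, predicted tokens)."""
    def utility(item):
        _, (p_correct, tokens) = item
        # Reward expected correctness, penalize predicted cost relative to budget.
        return p_correct - lam * (tokens / budget_tokens)
    best_model, _ = max(predictions.items(), key=utility)
    return best_model

preds = {
    "small-model": (0.62, 180),   # cheap but less accurate
    "large-model": (0.91, 900),   # accurate but expensive
}
print(route(preds, budget_tokens=2000))  # generous budget -> "large-model"
print(route(preds, budget_tokens=400))   # tight budget    -> "small-model"
```

Tightening the budget (or raising `lam`) shifts the choice toward cheaper models without retraining anything, which is the controllability the paper describes.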
 
 
 ## Model Description
 
@@ -396,12 +404,6 @@ for sample in dataset:
 3. **Batch Processing**: Use vLLM for high-throughput batch inference
 4. **Anchor Selection**: Choose anchors similar to your target question domain
 
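One way to act on the anchor-selection tip is to rank candidate anchors by similarity to the target question. The sketch below is a hedged stand-in: a production setup would more likely use sentence embeddings, but token-overlap (Jaccard) similarity keeps the idea dependency-free, and all names here are hypothetical.

```python
# Illustrative anchor selection (hypothetical names, not SCOPE's pipeline):
# rank candidate anchors by token-overlap similarity to the target question
# and keep the top k. A real setup would likely use embedding similarity.

def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0

def select_anchors(target, anchors, k=2):
    return sorted(anchors, key=lambda q: jaccard(target, q), reverse=True)[:k]

anchors = [
    "Integrate x^2 over the interval [0, 1]",
    "What year did the French Revolution begin?",
    "Differentiate sin(x) with respect to x",
]
top = select_anchors("Integrate sin(x) over the interval [0, pi]", anchors)
print(top[0])  # the integration anchor is the closest match
```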
 ## Citation
 
 ```bibtex