Agent Cost Optimizer (ACO)

Lightweight cost control layer for AI agent harnesses. Reduces agent run cost while preserving or improving task quality.

Status (verified 2026-06-15): v1 DistilBERT specialists are the production models. The v2 ModernBERT experiment did not improve results and regressed the tool-gater by 19.6pp F1, so use v1. See AUDIT.md for the independent verification and TRUTH.md for the honest project state. The headline "56% cost reduction" is a trace-based simulation result, not a result verified by running tests live in Docker (see caveats below).


What ACO Does

| Module | Problem | Solution | Impact |
|---|---|---|---|
| Model Cascade Router | Overusing frontier models | Routes to the cheapest capable model | ~56% cost reduction (simulated, iso-quality) |
| Tool-Use Cost Gate | Unnecessary tool calls | Predicts whether a tool call is worth its cost | Gate F1 = 0.92 (verified) |
| Verifier Budgeter | Verifying everything | Selectively gates verification | Gate F1 = 0.65 (verified, weak) |
| Context Compressor | Growing context windows | ACON-pattern state-preserving compression | 26-54% token reduction (literature) |
| Cache-Aware Layout | Cache-unfriendly prompts | Static-at-prefix optimization | 50-90% input cost reduction (literature) |
| Intervention Threshold | Verifier-caused regressions | Only verify when safe (d/(d+r) threshold) | Prevents collapse (literature) |
| Token Budget Estimator | Verbose outputs | Per-query optimal length prediction | 15-25% output token reduction (literature) |
| Doom Detector | Continuing doomed runs | Terminate/rescue on failure signals | Prevents wasted cost |
| Meta-Tool Miner | Repeated workflows | Compress successful traces into macros | LLM calls saved on repeats |

Evidence levels: "verified" = independently re-evaluated on held-out data in this repo. "simulated" = derived from trace simulation (aco/benchmark.py, SWE-Router traces), not live test execution. "literature" = reported by cited papers, not independently measured here.
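The Intervention Threshold row can be read as an expected-value rule. Assume (our interpretation, not a definition taken from ACO's code) that `d` is the rate at which verification disrupts a run that would otherwise have succeeded and `r` is the rate at which it rescues a run that would otherwise have failed. At predicted failure probability p, verifying gains p·r and loses (1 - p)·d in expectation, which is positive exactly when p > d/(d+r). A minimal sketch with hypothetical helper names:

```python
def intervention_threshold(d: float, r: float) -> float:
    """Failure probability above which verifying is expected to help.

    Hypothetical helper, not part of ACO's API. Assumes:
      d = P(verification breaks a run that would have succeeded)
      r = P(verification fixes a run that would have failed)
    At failure probability p the expected gain is p*r - (1 - p)*d,
    which is positive exactly when p > d / (d + r).
    """
    if d < 0 or r < 0 or d + r == 0:
        raise ValueError("need d >= 0, r >= 0 and d + r > 0")
    return d / (d + r)


def should_verify(p_fail: float, d: float, r: float) -> bool:
    """Verify only when the predicted failure probability clears the threshold."""
    return p_fail > intervention_threshold(d, r)


# A critic that breaks 10% of good runs and rescues 30% of bad ones
# is worth invoking only when P(fail) exceeds 0.1 / (0.1 + 0.3).
print(intervention_threshold(0.1, 0.3))
print(should_verify(0.2, d=0.1, r=0.3), should_verify(0.4, d=0.1, r=0.3))
```

Because the threshold rises as `d` grows, a disruptive critic is invoked only on runs that already look likely to fail, which is how a rule of this form limits verifier-caused regressions.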

Verified vs. Simulated: what is actually proven

| Claim | Status | Evidence |
|---|---|---|
| Tool-gater F1 = 0.92, acc = 0.98 | ✅ Verified | Independent re-eval, N=1695 held-out |
| Tier-router F1 = 0.67, acc = 0.70 | ✅ Verified | Independent re-eval, N=1260 held-out |
| Verifier-gater F1 = 0.65, acc = 0.65 | ✅ Verified (weak model) | Independent re-eval, N=863 held-out |
| v2 ModernBERT beats v1 | ❌ False | v2 regressed tool-gater −19.6pp F1; flat elsewhere |
| Cascade ≈ 56% cost reduction at iso-quality | ⚠️ Simulated only | Trace simulation; frontier-retry actually solves +4 instances |
| "4/4 Django instances solved by T1" | ⚠️ Misleading | Patches only passed `git apply --check`; not test-verified |
| Live SWE-bench resolution by cascade agent | ❌ Not demonstrated | No Docker/test-verified results file exists in repo |

See AUDIT.md for full detail.
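The "Simulated only" status of the cascade figure means savings are computed by replaying recorded outcomes rather than by running the agent. A minimal sketch of such a replay, assuming each trace records, for every model tier, whether that tier's attempt solved the task and what it cost; the field names, tier names, and prices are hypothetical and are not taken from aco/benchmark.py or the SWE-Router traces:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TierOutcome:
    solved: bool  # did this tier's recorded attempt solve the task?
    cost: float   # recorded cost of that attempt (e.g. USD)


# Hypothetical tier names, cheapest first; the last one is the frontier model.
TIERS = ["small", "mid", "frontier"]


def cascade_cost(trace: dict[str, TierOutcome]) -> tuple[float, bool]:
    """Replay one trace: pay for each tier in order, stop at the first success."""
    total = 0.0
    for tier in TIERS:
        outcome = trace[tier]
        total += outcome.cost
        if outcome.solved:
            return total, True
    return total, False


def simulate(traces: list[dict[str, TierOutcome]]) -> dict:
    """Compare the replayed cascade with always calling the frontier tier."""
    cascade_total = frontier_total = 0.0
    cascade_solved = frontier_solved = 0
    for trace in traces:
        cost, solved = cascade_cost(trace)
        cascade_total += cost
        cascade_solved += solved
        frontier_total += trace["frontier"].cost
        frontier_solved += trace["frontier"].solved
    return {
        "cost_reduction": 1 - cascade_total / frontier_total,
        "cascade_solved": cascade_solved,
        "frontier_solved": frontier_solved,
    }


# Toy data (made up): the small tier solves task 1; only the frontier solves task 2.
traces = [
    {"small": TierOutcome(True, 0.01), "mid": TierOutcome(True, 0.05),
     "frontier": TierOutcome(True, 0.50)},
    {"small": TierOutcome(False, 0.01), "mid": TierOutcome(False, 0.05),
     "frontier": TierOutcome(True, 0.50)},
]
print(simulate(traces))
```

The replay knows which attempt succeeded, whereas a live cascade must detect failure before escalating; that optimistic assumption is one reason a simulated saving can differ from a live, test-verified one.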

Architecture

```
Agent Request
      │
      ▼
┌─────────────────────────────────────┐
│         ACO Control Layer           │
│                                     │
│  ┌──────────┐  ┌───────────┐        │
│  │Classifier│─▶│Tier Router│──┐     │
│  └──────────┘  └───────────┘  │     │
│                               ▼     │
│  ┌──────────┐  ┌────────────┐       │
│  │Compressor│  │Cache Layout│       │
│  └──────────┘  └────────────┘       │
│       │              │              │
│       └──────┬───────┘              │
│              ▼                      │
│  ┌──────────┐  ┌───────────┐        │
│  │Tool Gate │  │Ver. Budget│        │
│  └──────────┘  └───────────┘        │
│       │              │              │
│       ▼              ▼              │
│  ┌───────────────────────────┐      │
│  │  Retry + Recovery + Doom  │      │
│  └───────────────────────────┘      │
│              │                      │
└──────────────┼──────────────────────┘
               ▼
        Agent API Call
```
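The Retry + Recovery + Doom stage decides when a run is no longer worth paying for. This README does not specify which failure signals ACO's Doom Detector uses, so the following is a hypothetical sketch: it flags a run when the agent repeats the same tool call several times in a row, or when several steps pass without any improvement in a progress score such as the number of passing tests. The class name, signals, and thresholds are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DoomDetector:
    """Hypothetical doom detector: flags runs that loop or stop making progress."""
    max_repeats: int = 3   # identical consecutive actions that count as a loop
    stall_window: int = 5  # steps allowed without a new best progress score
    _recent: deque = field(default_factory=deque, init=False)
    _best_score: float = field(default=float("-inf"), init=False)
    _steps_since_progress: int = field(default=0, init=False)

    def observe(self, action: str, score: float) -> bool:
        """Record one agent step and return True if the run looks doomed.

        action: canonical string for the tool call (name plus arguments).
        score: any progress signal, e.g. the number of passing tests.
        """
        # Keep only the last `max_repeats` actions.
        self._recent.append(action)
        if len(self._recent) > self.max_repeats:
            self._recent.popleft()
        looping = (len(self._recent) == self.max_repeats
                   and len(set(self._recent)) == 1)

        # Count steps since the progress score last improved.
        if score > self._best_score:
            self._best_score = score
            self._steps_since_progress = 0
        else:
            self._steps_since_progress += 1
        stalled = self._steps_since_progress >= self.stall_window

        return looping or stalled


detector = DoomDetector()
for step in range(3):
    print(detector.observe("run_tests", score=0))
```

Whether a flagged run is terminated or rescued (for example, by escalating to a stronger tier) is left to the caller in this sketch.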

Quick Start

```python
from aco import ACOPipeline

# v1 DistilBERT specialists are the production models (v2 regressed; do not use).
aco = ACOPipeline(use_v2=False)

result = aco.optimize(
    request="Fix the connection pool timeout in database.py and run the tests",
    available_tools=["edit_file", "run_tests", "git_diff"],
    context={"system_prompt": "...", "tool_definitions": "..."}
)

print(result.routing)        # Which model to use
print(result.tool_decisions) # Which tools to call
print(result.needs_verify)   # Whether to verify output
print(result.context_budget) # How much context to include
print(result.cost_estimate)  # Expected cost
```
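The `context` argument in the Quick Start separates static material (system prompt, tool definitions) from the per-request text. The Cache-Aware Layout module's idea is to keep static segments at the start of the prompt so that provider-side prefix caching can reuse them across calls. A minimal sketch of that static-at-prefix ordering, plus a helper that measures how much of two prompts a prefix cache could share; the segment format and function names are hypothetical, not ACO's implementation:

```python
def layout_prompt(segments: list[tuple[str, str, bool]]) -> str:
    """Static-at-prefix layout: static segments first, then dynamic ones.

    Each segment is (name, text, is_static); relative order within each
    group is preserved.
    """
    static = [text for _, text, is_static in segments if is_static]
    dynamic = [text for _, text, is_static in segments if not is_static]
    return "\n\n".join(static + dynamic)


def naive_prompt(segments: list[tuple[str, str, bool]]) -> str:
    """Baseline: concatenate segments in the order given."""
    return "\n\n".join(text for _, text, _ in segments)


def shared_prefix_len(a: str, b: str) -> int:
    """Characters two prompts share at the start (what a prefix cache could reuse)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


system = ("system_prompt", "You are a coding agent.", True)
tools = ("tool_definitions", "edit_file, run_tests, git_diff", True)
req1 = ("request", "Fix the pool timeout in database.py", False)
req2 = ("request", "Add retries to the HTTP client", False)

# Request-first ordering shares nothing; static-first shares the whole static block.
print(shared_prefix_len(naive_prompt([req1, system, tools]), naive_prompt([req2, system, tools])))
print(shared_prefix_len(layout_prompt([req1, system, tools]), layout_prompt([req2, system, tools])))
```

Real provider caches match on tokens rather than characters and usually require a minimum prefix length before caching applies, so the character count here is only a rough proxy.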

Install

```bash
pip install transformers torch datasets
git clone https://huggingface.co/narcolepticchicken/agent-cost-optimizer
cd agent-cost-optimizer
pip install -e .
```

Specialist Models (v1 β€” production)

Metrics below are independently verified on held-out test splits (not training logs):

| Model | Base | Params | Accuracy | F1 Macro | Test N |
|---|---|---|---|---|---|
| Tier Router | DistilBERT | 67M | 0.70 | 0.67 | 1260 |
| Tool Gater | DistilBERT | 67M | 0.98 | 0.92 | 1695 |
| Verifier Gater | DistilBERT | 67M | 0.65 | 0.65 | 863 |
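For reference, the Accuracy and F1 Macro columns follow the standard definitions: macro F1 is the unweighted mean of per-class F1, so it penalizes a model that ignores a rare class even when accuracy stays high. A minimal dependency-free sketch of that computation (this is not the repo's evaluation script, and the function names are ours):

```python
from collections import Counter


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.

    Per class, F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (#predicted + #actual);
    a class that is never predicted correctly scores 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = [2 * tp[c] / (n_pred[c] + n_true[c]) for c in labels]
    return sum(scores) / len(scores)


# Always-negative baseline on 1000 examples with 56 positives (5.6%).
y_true = [1] * 56 + [0] * 944
y_pred = [0] * 1000
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

With a 5.6% positive rate (the minority-class share this card cites for the tool gater), an always-negative baseline reaches 0.944 accuracy but only about 0.49 macro F1, so macro F1 is the more informative column for imbalanced tasks like tool gating.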

v2 models (ModernBERT-base, 149M): trained but not recommended. The architecture upgrade plus focal loss did not help: tier-router was flat (+0.7pp), verifier-gater was flat (+1.3pp), and tool-gater regressed by 19.6pp F1 (ModernBERT overfits the 5.6% minority class). The raw comparison is in v2_verification_results.json. Use the v1 models above.
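Focal loss (Lin et al., 2017) down-weights examples the model already classifies confidently, so training concentrates on hard ones; that is why it is often tried for imbalanced labels such as the tool gater's 5.6% minority class. Its binary form is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. A minimal plain-Python sketch; the defaults α = 0.25 and γ = 2 are the values commonly used since that paper, not the settings of the v2 training runs, which this card does not state:

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for a single binary example.

    p: predicted probability of the positive class; y: true label (0 or 1).
    alpha weights positives (1 - alpha weights negatives); gamma > 0
    shrinks the loss on examples the model already gets right confidently.
    With gamma = 0 this reduces to alpha-weighted cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.95) is penalised far less than a hard one (p = 0.3).
print(binary_focal_loss(0.95, 1), binary_focal_loss(0.3, 1))
```

With γ = 2, a correctly classified example at p_t = 0.9 contributes (1 - 0.9)^2 = 0.01 times its α-weighted cross-entropy, i.e. 100x less.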

Key Papers

| Finding | Paper |
|---|---|
| BERT classifier routing → 2x cost reduction | RouteLLM (2406.18665) |
| Guideline compression → 26-54% token reduction | ACON (2510.00615) |
| Critics cause collapse when overused | Intervention Paradox |
| Systematic component optimization → lower cost | Efficient Agents (2508.02694) |
| Cascade + self-consistency → cheaper | In-Context Distillation (2512.02543) |

Documentation

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
