FINOPS · REINFORCEMENT LEARNING · Q1 2025

Teach agents to spend wisely.

CloudSense is an OpenEnv-compatible RL benchmark that simulates real AWS accounts with authentic pricing, utilization, and dependency graphs. Agents must identify waste, optimize spend, and reason about blast radius — the cascading infrastructure impact of every action — without breaking production.

Issue No. 001
Region US-EAST-1
Pricing ON-DEMAND
Model QWEN 2.5 72B

Tasks: Easy / Medium / Hard
Resources: 61 total (6 · 15 · 40 per task)
Monthly spend: $18.3k, summed across accounts
Blast levels: None → Critical

§ 01 · CURRICULUM

Three escalating scenarios

EASY · STARTUP

Startup Cleanup

A 6-resource dev/staging account. Obvious waste, no production, no dependencies. Tests basic cost-optimization fundamentals.

Steps: 10
Spend: $627
Baseline: 0.94

MEDIUM · MID-SIZE

Mid-Size Audit

15 resources mixing prod and non-prod. Must distinguish seasonal spikes, failover replicas, and expiring reservations from genuine waste.

Steps: 20
Spend: $3.5k
Baseline: 0.78

HARD · ENTERPRISE

Enterprise FinOps

40 interdependent resources. Cross-region replication, oversized Elasticsearch, NAT Gateway traps. Requires blast-radius reasoning.

Steps: 45
Spend: $14.2k
Baseline: 0.76
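Blast-radius reasoning can be pictured as a walk over the dependency graph: before acting on a resource, collect everything that depends on it, directly or transitively. The sketch below is only an illustration under assumed data. The resource names and the `DEPENDS_ON` mapping are hypothetical and are not taken from the benchmark's actual observation format.

```python
from collections import deque

# Hypothetical dependency edges: each key depends on the resources in its list.
DEPENDS_ON = {
    "web-asg": ["nat-gateway", "app-db"],
    "batch-worker": ["nat-gateway"],
    "search-cluster": ["app-db"],
    "app-db": [],
    "nat-gateway": [],
}


def blast_radius(target, depends_on):
    """Return every resource that directly or transitively depends on target."""
    # Invert the edges so we can walk from a resource to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first search over the inverted graph.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # Terminating the NAT gateway would affect both workloads routed through it.
    print(sorted(blast_radius("nat-gateway", DEPENDS_ON)))
```

An empty result suggests an action is isolated; a large one is a signal to prefer `skip_resource` or `request_more_info` over a destructive verb.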

§ 02 · REFERENCE

Action space & API

Action Space / 09 verbs

rightsize_resource: shrink to cheaper instance type
terminate_resource: remove unused infrastructure
add_lifecycle_policy: S3 tiering · ~70% savings
enable_autoscaling: dynamic capacity · ~20% savings
purchase_reservation: steady workloads · ~30% savings
change_storage_class: Glacier / IA tiers
schedule_uptime: business-hours only
request_more_info: defer, gather context
skip_resource: safe for critical prod
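As a rough illustration, the nine verbs can be modeled as an enum, with the approximate savings rates quoted above attached to the three verbs that list one. The class name `Action`, the `APPROX_SAVINGS` table, and `estimate_monthly_savings` are hypothetical helpers for this sketch. How the environment itself computes savings or reward is not stated here.

```python
from enum import Enum


class Action(str, Enum):
    """The nine verbs listed in the action space."""
    RIGHTSIZE_RESOURCE = "rightsize_resource"
    TERMINATE_RESOURCE = "terminate_resource"
    ADD_LIFECYCLE_POLICY = "add_lifecycle_policy"
    ENABLE_AUTOSCALING = "enable_autoscaling"
    PURCHASE_RESERVATION = "purchase_reservation"
    CHANGE_STORAGE_CLASS = "change_storage_class"
    SCHEDULE_UPTIME = "schedule_uptime"
    REQUEST_MORE_INFO = "request_more_info"
    SKIP_RESOURCE = "skip_resource"


# Approximate rates quoted in the list above; the other verbs quote no rate.
APPROX_SAVINGS = {
    Action.ADD_LIFECYCLE_POLICY: 0.70,
    Action.ENABLE_AUTOSCALING: 0.20,
    Action.PURCHASE_RESERVATION: 0.30,
}


def estimate_monthly_savings(action, monthly_cost):
    """Return a rough monthly saving, or None when no rate is quoted.

    Raises ValueError if the action name is not one of the nine verbs.
    """
    rate = APPROX_SAVINGS.get(Action(action))
    if rate is None:
        return None
    return round(monthly_cost * rate, 2)


if __name__ == "__main__":
    print(estimate_monthly_savings("add_lifecycle_policy", 200.0))
    print(estimate_monthly_savings("skip_resource", 200.0))
```

Validating the verb through `Action(...)` before sending it keeps typos from reaching the environment as malformed steps.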

HTTP Endpoints / OpenEnv

GET   /health               liveness probe
GET   /version              name · version · api level
GET   /tasks                list available scenarios
POST  /reset?task_id=<id>   begin a new episode
POST  /step                 execute action · JSON body
GET   /state                current observation
POST  /close                end current episode
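A minimal client sketch for the endpoints above, using only the Python standard library. Several details are assumptions, not documented facts: the base URL `http://localhost:8000`, the task id `"easy"`, the field names in the `/step` body, and the expectation that responses are JSON. The task list and API docs give the real values.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: wherever the server is running


def build_request(method, path, params=None, body=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the listed endpoints."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


def call(method, path, params=None, body=None):
    """Send a request and decode the response, assuming it is JSON."""
    with urlopen(build_request(method, path, params, body)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(task_id, actions):
    """Reset, submit each action via /step, then close the episode."""
    call("POST", "/reset", params={"task_id": task_id})
    results = [call("POST", "/step", body=action) for action in actions]
    call("POST", "/close")
    return results


if __name__ == "__main__":
    # Hypothetical payload shape; the real /step schema may differ.
    step = {"action_type": "skip_resource", "resource_id": "example-resource"}
    req = build_request("POST", "/step", body=step)
    print(req.get_method(), req.full_url)
```

Separating `build_request` from `call` lets the URL and payload construction be checked without a running server.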

Run the benchmark yourself.

OPEN API DOCS → | LIST TASKS → | SOURCE · GITHUB ↗ | SPACE · HUGGING FACE ↗
