feat: implement Kubernetes executor for automated cluster scaling and infrastructure management cf2697b div18 committed on 21 days ago
feat(curriculum): add progressive training curriculum management 52a986a div18 committed on 24 days ago
feat: Add observability setup guide and integrate Prometheus and Grafana dfe5268 Keshav051 committed on Apr 5
feat: implement core SRE simulation environment, Pydantic schemas, and physics models for task-based cluster management 5144b7e div18 committed on Apr 1
update cost model and observation descriptions for clarity and accuracy f656047 Keshav051 committed on Mar 31
added dropout, probability, normalization, and limits to values. Making the environment more challenging and balanced bba6f8a PranavKK1201 committed on Mar 30