cff-version: 1.2.0
message: "If you use ChargebackOps in your research, please cite it as below."
title: "ChargebackOps: A cost-asymmetric multi-round adversarial environment for training LLM agents on B2B dispute workflows"
abstract: |
  ChargebackOps is an OpenEnv-compatible reinforcement learning environment
  that simulates the merchant side of a credit-card chargeback dispute. The
  environment exposes a decision-theoretic primitive (multi-round adjudication
  with cost-asymmetric terminal economics, partial observability, and a
  procedurally constrained adversary) that is rare in current RL benchmarks
  and generalizes beyond chargebacks to insurance claims, tax audits,
  content-moderation appeals, and patent disputes. The repository ships an
  8-dimension decomposable Rubric system, a parametric task generator, an
  ISO 20022 adapter, a Stripe sandbox connector, and a reproducible single-T4
  SFT + GRPO training pipeline that documents and remedies a previously
  undescribed post-SFT GRPO collapse failure mode on token-deterministic
  tasks.
type: software
authors:
  - family-names: Dutta
    given-names: Mitudru
    email: mitudrudutta72@gmail.com
repository-code: "https://github.com/MitudruDutta/ChargeBackOps"
url: "https://huggingface.co/spaces/mitudrudutta/ChargeBackOps"
license: MIT
keywords:
  - reinforcement learning
  - large language models
  - multi-round adjudication
  - chargeback disputes
  - cost-asymmetric environments
  - GRPO
  - RLVR
  - OpenEnv
preferred-citation:
  type: software
  title: "ChargebackOps: A cost-asymmetric multi-round adversarial environment for training LLM agents on B2B dispute workflows"
  authors:
    - family-names: Dutta
      given-names: Mitudru
  url: "https://github.com/MitudruDutta/ChargeBackOps"
  year: 2026