cff-version: 1.2.0
message: "If you use ChargebackOps in your research, please cite it as below."
title: "ChargebackOps: A cost-asymmetric multi-round adversarial environment for training LLM agents on B2B dispute workflows"
abstract: |
  ChargebackOps is an OpenEnv-compatible reinforcement learning environment that
  simulates the merchant side of a credit-card chargeback dispute. The environment
  exposes a decision-theoretic primitive that is rare in current RL benchmarks:
  multi-round adjudication with cost-asymmetric terminal economics, partial
  observability, and a procedurally constrained adversary. This primitive
  generalizes beyond chargebacks to insurance claims, tax audits,
  content-moderation appeals, and patent disputes. The repository ships an
  8-dimension decomposable Rubric system, a parametric task generator, an
  ISO 20022 adapter, a Stripe sandbox connector, and a reproducible SFT + GRPO
  training pipeline that runs on a single T4 GPU. The pipeline documents and
  remedies a previously undescribed failure mode: post-SFT GRPO collapse on
  token-deterministic tasks.
type: software
authors:
  - family-names: Dutta
    given-names: Mitudru
    email: mitudrudutta72@gmail.com
repository-code: "https://github.com/MitudruDutta/ChargeBackOps"
url: "https://huggingface.co/spaces/mitudrudutta/ChargeBackOps"
license: MIT
keywords:
  - reinforcement learning
  - large language models
  - multi-round adjudication
  - chargeback disputes
  - cost-asymmetric environments
  - GRPO
  - RLVR
  - OpenEnv
preferred-citation:
  type: software
  title: "ChargebackOps: A cost-asymmetric multi-round adversarial environment for training LLM agents on B2B dispute workflows"
  authors:
    - family-names: Dutta
      given-names: Mitudru
  url: "https://github.com/MitudruDutta/ChargeBackOps"
  year: 2026