Benchmarking and Learning Real-World Customer Service Dialogue
Abstract
OlaBench and OlaMind address the gap between industrial customer service benchmarks and real-world deployment by introducing a comprehensive evaluation framework and a reinforcement learning approach that improves dialogue quality and operational efficiency.
Existing benchmarks and training pipelines for industrial intelligent customer service (ICS) remain misaligned with real-world dialogue requirements: they overemphasize verifiable task success while under-measuring subjective service quality and realistic failure modes, leaving a gap between offline gains and deployable dialogue behavior. We close this gap with a benchmark-to-optimization loop. First, we introduce OlaBench, an ICS benchmark spanning retrieval-augmented generation, workflow-based systems, and agentic settings, which evaluates service capability, safety, and latency sensitivity. Second, motivated by OlaBench results showing that state-of-the-art LLMs still fall short, we propose OlaMind, which distills reusable reasoning patterns and service strategies from expert dialogues and applies rubric-aware, staged exploration-exploitation reinforcement learning to improve model capability. OlaMind surpasses GPT-5.2 and Gemini 3 Pro on OlaBench (78.72 vs. 70.58/70.84). In online A/B tests, it delivers an average +23.67% in issue resolution and -6.6% in human transfer rate relative to the baseline, bridging offline gains to deployment. Together, OlaBench and OlaMind advance ICS systems toward more anthropomorphic, professional, and reliable deployment.
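For reference, the reported OlaBench scores imply the following margins for OlaMind over the two baselines. This reads 70.58 and 70.84 as the GPT-5.2 and Gemini 3 Pro scores in the order listed, which is an assumption. Swapping the attribution changes only which baseline each margin belongs to, not the margins themselves.

```latex
% Point margins implied by the reported scores (78.72 vs. 70.58/70.84)
78.72 - 70.58 = 8.14, \qquad 78.72 - 70.84 = 7.88
```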
Community
Hi everyone, we are the authors of this paper.
Official project page:
https://olamind-olabench.github.io/
We bridge the gap between offline evaluation and real-world deployment in industrial customer service with OlaBench, a multi-dimensional benchmark. We also propose OlaMind, a staged, rubric-aware RL framework that leverages expert reasoning patterns and achieves state-of-the-art benchmark results along with gains validated in online A/B tests.
Thanks for your interest!
impressive
Get this paper in your agent:
hf papers read 2510.22143
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash