Spaces:
Sleeping
Sleeping
| spec_version: 1 | |
| name: helpdesk_env | |
| version: "0.1.0" | |
| description: > | |
| An OpenEnv RL environment simulating UPI banking customer support workflows. | |
| An AI agent classifies issues, retrieves the correct FAQ or escalation path, | |
| and completes a safe multi-turn support flow across three graded tasks of | |
| increasing difficulty. | |
| author: Freakdivi | |
| tags: | |
| - openenv | |
| - banking | |
| - upi | |
| - customer-support | |
| - rl-environment | |
| type: space | |
| runtime: fastapi | |
| app: server.app:app | |
| port: 8000 | |
| tasks: | |
| - id: easy | |
| difficulty: easy | |
| description: Classify the customer's issue into the correct support category | |
| max_steps: 1 | |
| reward_range: [0.0, 1.0] | |
| grader: | |
| type: llm | |
| prompt_template: > | |
| Score the agent's performance for the easy helpdesk task on a scale from | |
| 0.001 to 0.999. Reward correct issue classification, safe behavior, and | |
| efficient completion. Penalize incorrect categories, unsafe requests for | |
| sensitive information, or invalid actions. Return only a numeric score. | |
| - id: medium | |
| difficulty: medium | |
| description: Select the correct FAQ or escalate cases that require manual handling | |
| max_steps: 3 | |
| reward_range: [0.0, 1.0] | |
| grader: | |
| type: llm | |
| prompt_template: > | |
| Score the agent's performance for the medium helpdesk task on a scale | |
| from 0.001 to 0.999. Reward selecting the correct FAQ or making the | |
| correct escalation decision, while maintaining safe guidance and good | |
| efficiency. Penalize incorrect retrieval, missed escalation, unsafe | |
| behavior, or unnecessary extra steps. Return only a numeric score. | |
| - id: hard | |
| difficulty: hard | |
| description: Run a multi-turn support conversation with clarification, guidance, and safe closure | |
| max_steps: 8 | |
| reward_range: [0.0, 1.0] | |
| grader: | |
| type: llm | |
| prompt_template: > | |
| Score the agent's performance for the hard helpdesk task on a scale from | |
| 0.001 to 0.999. Reward appropriate clarification, correct FAQ retrieval, | |
| safe and useful guidance, and closing the case only when the issue is | |
| actually resolved. Penalize unsafe behavior, premature closure, missing | |
| clarification, or poor multi-turn handling. Return only a numeric score. | |
| observation_space: | |
| type: object | |
| fields: | |
| case_id: string | |
| track: string | |
| customer_message: string | |
| conversation_history: array | |
| known_facts: object | |
| required_slots: array | |
| available_actions: array | |
| turn_number: integer | |
| action_space: | |
| type: object | |
| fields: | |
| action_type: "classify | lookup_faq | ask_clarification | reply | escalate | resolve_ticket" | |
| category: string (optional) | |
| faq_id: string (optional) | |
| message: string (optional) | |
| fields_requested: array (optional) | |
| target: string (optional) | |
| operation: string (optional) | |
| reward: | |
| type: float | |
| range: [0.0, 1.0] | |
| description: > | |
| Partial reward is produced at each step and normalized by the environment. | |
| The final reward combines correctness, safety, resolution, efficiency, and | |
| penalties, with score outputs constrained to the open interval (0, 1) for | |
| submission compatibility. | |
| endpoints: | |
| reset: POST /reset | |
| step: POST /step | |
| state: GET /state | |
| health: GET /health | |
| runtime_config: | |
| framework: fastapi | |
| python: "3.10" | |
| port: 8000 | |