Spaces:

Freakdivi
/

helpdesk_env

Sleeping

App Files Files Community

helpdesk_env / openenv.yaml

Freakdivi

Upload folder using huggingface_hub

456ff39 verified about 2 months ago

raw

history blame contribute delete

3.34 kB

	spec_version: 1
	name: helpdesk_env
	version: "0.1.0"
	description: >
	An OpenEnv RL environment simulating UPI banking customer support workflows.
	An AI agent classifies issues, retrieves the correct FAQ or escalation path,
	and completes a safe multi-turn support flow across three graded tasks of
	increasing difficulty.
	author: Freakdivi
	tags:
	- openenv
	- banking
	- upi
	- customer-support
	- rl-environment

	type: space
	runtime: fastapi
	app: server.app:app
	port: 8000

	tasks:
	- id: easy
	difficulty: easy
	description: Classify the customer's issue into the correct support category
	max_steps: 1
	reward_range: [0.0, 1.0]
	grader:
	type: llm
	prompt_template: >
	Score the agent's performance for the easy helpdesk task on a scale from
	0.001 to 0.999. Reward correct issue classification, safe behavior, and
	efficient completion. Penalize incorrect categories, unsafe requests for
	sensitive information, or invalid actions. Return only a numeric score.

	- id: medium
	difficulty: medium
	description: Select the correct FAQ or escalate cases that require manual handling
	max_steps: 3
	reward_range: [0.0, 1.0]
	grader:
	type: llm
	prompt_template: >
	Score the agent's performance for the medium helpdesk task on a scale
	from 0.001 to 0.999. Reward selecting the correct FAQ or making the
	correct escalation decision, while maintaining safe guidance and good
	efficiency. Penalize incorrect retrieval, missed escalation, unsafe
	behavior, or unnecessary extra steps. Return only a numeric score.

	- id: hard
	difficulty: hard
	description: Run a multi-turn support conversation with clarification, guidance, and safe closure
	max_steps: 8
	reward_range: [0.0, 1.0]
	grader:
	type: llm
	prompt_template: >
	Score the agent's performance for the hard helpdesk task on a scale from
	0.001 to 0.999. Reward appropriate clarification, correct FAQ retrieval,
	safe and useful guidance, and closing the case only when the issue is
	actually resolved. Penalize unsafe behavior, premature closure, missing
	clarification, or poor multi-turn handling. Return only a numeric score.

	observation_space:
	type: object
	fields:
	case_id: string
	track: string
	customer_message: string
	conversation_history: array
	known_facts: object
	required_slots: array
	available_actions: array
	turn_number: integer

	action_space:
	type: object
	fields:
	action_type: "classify \| lookup_faq \| ask_clarification \| reply \| escalate \| resolve_ticket"
	category: string (optional)
	faq_id: string (optional)
	message: string (optional)
	fields_requested: array (optional)
	target: string (optional)
	operation: string (optional)

	reward:
	type: float
	range: [0.0, 1.0]
	description: >
	Partial reward is produced at each step and normalized by the environment.
	The final reward combines correctness, safety, resolution, efficiency, and
	penalties, with score outputs constrained to the open interval (0, 1) for
	submission compatibility.

	endpoints:
	reset: POST /reset
	step: POST /step
	state: GET /state
	health: GET /health

	runtime_config:
	framework: fastapi
	python: "3.10"
	port: 8000