name: emotional-support-conversations
version: 0.1.0
description: >
  An OpenEnv environment for training and evaluating agents on open-ended
  emotional support conversations. Agents converse with a deterministic seeker
  simulator with hidden internal state (distress, trust, openness) and are
  graded with a hybrid immediate + future-oriented reward signal inspired by
  RLFF-ESC (Yang et al., 2025, arXiv:2508.12935).
author: meta-hack-submission
license: MIT
tags:
  - openenv
  - conversation
  - emotional-support
  - mental-health
  - rl-native
  - partial-observability
entrypoint: server.app:app
port: 7860
runtime:
  python: "3.11"
  vcpu: 2
  memory_gb: 8
tasks:
  - id: work_stress_venting
    difficulty: easy
    description: >
      Cooperative seeker venting about workplace stress. Agent must validate
      feelings, explore the concern, and guide to a light action plan.
  - id: guarded_relationship
    difficulty: medium
    description: >
      Guarded seeker who only reveals the real relationship issue after trust
      is built. Premature advice is penalised.
  - id: crisis_fragile_trust
    difficulty: hard
    description: >
      High-distress seeker with multiple interleaved concerns and fragile
      trust. Any dismissive or interrogative turn collapses trust; recovery is
      possible but costly.
action_space:
  type: text
  description: Free-text conversational reply from the agent to the seeker.
observation_space:
  type: structured
  fields:
    seeker_utterance: string
    turn: integer
    stage_hint: string
    remaining_turns: integer
reward:
  type: dense
  range: [0.0, 1.0]
  shaping:
    - immediate_turn_reward
    - future_oriented_trajectory_reward
    - anti_repetition_penalty
success:
  type: hard_gated
  description: >
    Success requires both a high final score and task-specific completion
    conditions (resolved closing stage, trust/distress targets, reveal, and
    safety reference for the crisis task).
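
The manifest names three reward shaping terms and a hard-gated success check but does not say how they are combined. The following is a minimal Python sketch of one plausible reading, not the environment's actual grader. The function names, the equal weights, the thresholds, the `EpisodeSummary` fields, and the stage label `"resolved"` are all assumptions made for illustration; only the [0.0, 1.0] range, the three shaping terms, and the list of gating conditions come from the manifest.

```python
from dataclasses import dataclass


def clamp01(x: float) -> float:
    """Clamp a value into the declared reward range [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def shaped_reward(immediate: float, future: float, repetition_penalty: float,
                  w_immediate: float = 0.5, w_future: float = 0.5) -> float:
    """Blend the immediate and future-oriented terms, then subtract the
    anti-repetition penalty. The weights are assumed, not from the manifest."""
    blended = w_immediate * immediate + w_future * future
    return clamp01(blended - repetition_penalty)


@dataclass
class EpisodeSummary:
    """Hypothetical end-of-episode record used only by this sketch."""
    final_score: float
    stage: str
    trust: float
    distress: float
    revealed: bool = False           # did the seeker disclose the hidden issue?
    safety_referenced: bool = False  # did the agent give a safety reference?


def is_success(ep: EpisodeSummary, *, min_score: float = 0.7,
               min_trust: float = 0.6, max_distress: float = 0.4,
               needs_reveal: bool = False, needs_safety: bool = False) -> bool:
    """Hard gate: every condition must hold, so a high score alone is not enough.
    All thresholds here are placeholder values."""
    gates = [
        ep.final_score >= min_score,
        ep.stage == "resolved",
        ep.trust >= min_trust,
        ep.distress <= max_distress,
        ep.revealed or not needs_reveal,
        ep.safety_referenced or not needs_safety,
    ]
    return all(gates)
```

Under these assumptions, the crisis task would call `is_success(ep, needs_reveal=True, needs_safety=True)`, so an episode with a high final score still fails if the agent never gave a safety reference. Modelling the gate as a conjunction rather than a weighted sum is what keeps a strong score from compensating for a missed completion condition.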