---
title: Text Adventure Agent Submission
emoji: 🗺
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---

# Text Adventure Agent Submission

I started from example_submission and iterated on it to make the agent easier to debug and more structured overall. My focus was to reduce invalid actions, track progress across rooms, and give the model clearer instructions so it could behave more consistently under the step limit.

On top of the baseline, I made several changes:

- Expanded the system prompt to enforce a strict response schema that includes the last action's status and a list of potential directions, and updated the parser to read those fields reliably.
- Strengthened validation of tool names and action verbs.
- Added debug printing for the raw LLM response and the parsed tool call.
- Put extra emphasis in the prompt on movement actions such as climbing objects (trees, ladders, elevators) instead of only examining them.
- Increased the context window to include the last 5 actions.
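As a rough illustration of the schema parsing, here is a minimal sketch; the field names (`STATUS`, `DIRECTIONS`, `ACTION`) are assumptions for the example, not the exact labels used in the submission:

```python
import re

def parse_response(text: str) -> dict:
    """Extract last-action status, candidate directions, and the chosen action
    from a model reply that follows the strict schema (field names assumed)."""
    status = re.search(r"STATUS:\s*(\w+)", text)
    directions = re.search(r"DIRECTIONS:\s*(.+)", text)
    action = re.search(r"ACTION:\s*(.+)", text)
    return {
        "last_action_status": status.group(1).lower() if status else "unknown",
        "potential_directions": (
            [d.strip() for d in directions.group(1).split(",")] if directions else []
        ),
        "action": action.group(1).strip() if action else None,
    }

raw = "STATUS: success\nDIRECTIONS: north, up, east\nACTION: climb tree"
parsed = parse_response(raw)
```

Forcing every reply through a fixed schema like this is what makes the debug printing useful: a malformed reply fails loudly at the parser instead of silently becoming an invalid game action.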

I reimplemented the map because the agent rarely used get_map in practice. Instead of relying on that tool, I maintain my own world structure and include it at each step as part of the agent’s context. The world variable is a dictionary keyed by location name (the first line of the observation). Each value is a list of exits, and each exit stores a direction string, a speculative boolean, and an optional leads_to string. A speculative exit is inferred from model output or room text; once I actually move and confirm it, I set speculative to false and fill leads_to with the destination location. This structure lets the agent remember which directions are known, which are untested, and where confirmed exits go.
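The world structure above can be sketched as follows; the helper names (`note_exit`, `confirm_exit`) are illustrative, not the actual function names in agent.py:

```python
# Location name (first line of the observation) -> list of exit records.
world: dict[str, list[dict]] = {}

def note_exit(location: str, direction: str) -> None:
    """Record a speculative exit inferred from model output or room text."""
    exits = world.setdefault(location, [])
    if not any(e["direction"] == direction for e in exits):
        exits.append({"direction": direction, "speculative": True, "leads_to": None})

def confirm_exit(location: str, direction: str, destination: str) -> None:
    """After actually moving, mark the exit as confirmed and fill in leads_to."""
    exits = world.setdefault(location, [])
    for e in exits:
        if e["direction"] == direction:
            e["speculative"] = False
            e["leads_to"] = destination
            return
    exits.append({"direction": direction, "speculative": False, "leads_to": destination})

note_exit("West of House", "north")
confirm_exit("West of House", "north", "North of House")
```

Serializing this dictionary into the prompt at each step gives the model a persistent memory of known, untested, and confirmed exits without ever calling get_map.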

I wanted to rely on Jericho’s get_valid_actions, but it requires spaCy and I couldn’t get it working in my setup (it hangs). To keep moving, I built a replacement in agent.py that asks the same Qwen 72B model to propose valid actions from the observation text. It is slower than the built‑in approach, but it works in my environment and keeps everything inside the same model call pipeline.
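A minimal sketch of that replacement is below; `call_llm` stands in for the Qwen 72B call pipeline and the prompt wording is an assumption, not the exact text in agent.py:

```python
# Hypothetical prompt for the get_valid_actions replacement.
VALID_ACTIONS_PROMPT = (
    "You are playing a text adventure. Given the observation below, "
    "list the valid actions, one per line, with no extra commentary.\n\n"
    "Observation:\n{observation}"
)

def propose_valid_actions(observation: str, call_llm) -> list[str]:
    """Ask the model for candidate actions instead of Jericho's get_valid_actions."""
    reply = call_llm(VALID_ACTIONS_PROMPT.format(observation=observation))
    actions = []
    for line in reply.splitlines():
        # Strip list markers ("1.", "-", "*") the model sometimes adds.
        line = line.strip().lstrip("-*0123456789. ")
        if line:
            actions.append(line.lower())
    return actions

# Stubbed model call for demonstration.
fake_llm = lambda prompt: "1. open mailbox\n2. go north\n3. climb tree"
actions = propose_valid_actions("West of House\nThere is a mailbox here.", fake_llm)
```

This costs one extra model call per step, which is the slowdown mentioned above, but it avoids the spaCy dependency entirely.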