Spaces: Mayank022/api-testing-env


Mayank022 committed on Apr 8

Commit 5936836 · verified · 1 Parent(s): 49a2923

Upload folder using huggingface_hub


Files changed (4)

Dockerfile +1 -1
README.md +42 -18
client.py +7 -1
inference.py +10 -1

Dockerfile CHANGED

@@ -54,7 +54,7 @@ ENV PYTHONPATH="/app/env:$PYTHONPATH"
 HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
     CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
-# Enable web interface
 ENV ENABLE_WEB_INTERFACE=true
 # Run the server

 HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
     CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
+# Enable web interface (default OpenEnv UI at /web; custom Gradio at /ui)
 ENV ENABLE_WEB_INTERFACE=true
 # Run the server
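
Note on the health check: the `HEALTHCHECK` above probes `http://localhost:8000/health` with a 3 s timeout and 3 retries. The same probe can be sketched as a standalone Python helper, which may be handy for waiting on the container from a test script. This is a minimal sketch, not code from the repository; the function name `wait_for_health` and its parameters are hypothetical, and the defaults only mirror the Dockerfile flags.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, retries: int = 3, interval: float = 30.0,
                    timeout: float = 3.0) -> bool:
    """Return True once `url` answers with HTTP 2xx, else False after `retries` attempts.

    Hypothetical helper: defaults mirror --retries, --interval and --timeout
    from the HEALTHCHECK line, but Docker's own scheduling is not reproduced.
    """
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, refused connection, or timed out
        if attempt < retries - 1:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print(wait_for_health("http://localhost:8000/health", interval=1.0))
```
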

README.md CHANGED

@@ -162,38 +162,62 @@ docker run -p 8000:8000 api-testing-env
 curl -X POST http://localhost:8000/reset -H 'Content-Type: application/json' -d '{}'
 ```
-### Inference (`inference.py`)
-The submission entry point. Uses an OpenAI-compatible LLM to play all 3 tasks
-and prints the mandatory `[START] / [STEP] / [END]` log lines that the
-OpenEnv judging pipeline parses.
 ```bash
-# 1. Set required env vars (see .env.example)
-export API_BASE_URL=https://router.huggingface.co/v1
-export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
-export HF_TOKEN=hf_xxx
-# 2. Choose how to attach to the environment (pick ONE):
-#    (a) in-process (default, fastest, no Docker)
 python inference.py
-#    (b) against a built docker image (matches the OpenEnv sample)
-IMAGE_NAME=api-testing-env:latest python inference.py
-#    (c) against a running server / deployed HF Space
-ENV_BASE_URL=https://your-username-api-testing-env.hf.space python inference.py
 ```
-The script makes **one LLM call per task** in plan mode, executes the returned
-JSON action plan against the env, and emits exactly:
 ```
-[START] task=basic_validation env=api_testing_env model=Qwen/Qwen2.5-72B-Instruct
 [STEP]  step=1 action=GET_/tasks reward=0.33 done=false error=null
 [STEP]  step=2 action=POST_/tasks reward=0.28 done=false error=null
 ...
-[END]   success=true steps=17 score=0.820 rewards=0.33,0.28,...
 ```
 Each per-task `score` is normalized to **[0, 1]** as

 curl -X POST http://localhost:8000/reset -H 'Content-Type: application/json' -d '{}'
 ```
+### Inference (`inference.py`) — SUBMISSION ENTRY POINT
+The script judges run to evaluate this environment. It uses an OpenAI-compatible
+client, makes **one LLM call per task** in plan mode, executes the returned JSON
+action plan against the env, and emits the mandatory `[START] / [STEP] / [END]`
+log lines.
+#### Required Environment Variables
+| Variable | Purpose |
+|----------|---------|
+| `API_BASE_URL` | OpenAI-compatible LLM endpoint (default: HuggingFace router) |
+| `MODEL_NAME` | Model identifier to use for inference |
+| `HF_TOKEN` | HuggingFace token (used as API key) |
+#### Run Command (the format judges use)
 ```bash
+API_BASE_URL=https://router.huggingface.co/v1 \
+MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct \
+HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
+python inference.py
+```
+#### Optional — Choose How to Attach to the Environment
+```bash
+# (a) In-process — default, fastest, no Docker
+API_BASE_URL=https://router.huggingface.co/v1 \
+MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct \
+HF_TOKEN=hf_xxx \
 python inference.py
+# (b) Against a built Docker image
+API_BASE_URL=https://router.huggingface.co/v1 \
+MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct \
+HF_TOKEN=hf_xxx \
+IMAGE_NAME=api-testing-env:latest \
+python inference.py
+# (c) Against a deployed HuggingFace Space
+API_BASE_URL=https://router.huggingface.co/v1 \
+MODEL_NAME=meta-llama/Llama-3.3-70B-Instruct \
+HF_TOKEN=hf_xxx \
+ENV_BASE_URL=https://Mayank022-api-testing-env.hf.space \
+python inference.py
 ```
+#### Mandatory Output Format (parsed by the OpenEnv judge)
 ```
+[START] task=basic_validation env=api_testing_env model=meta-llama/Llama-3.3-70B-Instruct
 [STEP]  step=1 action=GET_/tasks reward=0.33 done=false error=null
 [STEP]  step=2 action=POST_/tasks reward=0.28 done=false error=null
 ...
+[END]   success=true steps=21 score=0.820 rewards=0.33,0.28,...
 ```
 Each per-task `score` is normalized to **[0, 1]** as
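
Note on the attach options: the README hunk lists three ways to attach to the environment, selected by `IMAGE_NAME`, `ENV_BASE_URL`, or neither. The selection logic is not part of this diff, so the following is only a sketch of how such a switch might look. The function name `pick_attach_mode`, the mode labels, and the choice to reject both variables being set at once are all assumptions, not behavior confirmed by `inference.py`.

```python
import os
from typing import Mapping, Tuple


def pick_attach_mode(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (mode, target) based on the optional attach variables.

    Assumption: setting both IMAGE_NAME and ENV_BASE_URL is treated as an
    error here; the real script may resolve that case differently.
    """
    image = env.get("IMAGE_NAME", "").strip()
    base_url = env.get("ENV_BASE_URL", "").strip()
    if image and base_url:
        raise ValueError("set only one of IMAGE_NAME or ENV_BASE_URL")
    if image:
        return ("docker", image)  # option (b): built Docker image
    if base_url:
        return ("remote", base_url.rstrip("/"))  # option (c): running server or Space
    return ("in_process", "")  # option (a): default, no Docker


if __name__ == "__main__":
    print(pick_attach_mode())
```

Passing the environment as a mapping keeps the function easy to exercise without mutating `os.environ`.
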

client.py CHANGED

@@ -5,7 +5,13 @@ from typing import Dict
 from openenv.core.client_types import StepResult
 from openenv.core import EnvClient
-from .models import APITestAction, APITestObservation, APITestState
 class APITestEnv(

 from openenv.core.client_types import StepResult
 from openenv.core import EnvClient
+# Support both package import (`from api_testing_env.client import ...`)
+# and flat-module import (`from client import ...` from inference.py).
+# `inference.py` injects its own directory into sys.path so the fallback works.
+try:
+    from .models import APITestAction, APITestObservation, APITestState
+except ImportError:  # pragma: no cover - flat-module fallback for inference.py
+    from models import APITestAction, APITestObservation, APITestState  # type: ignore[no-redef,import-not-found]
 class APITestEnv(
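
Note on the import fallback: a relative import such as `from .models import ...` raises `ImportError` when the file is loaded as a top-level module, because there is no parent package. That is the case the `except ImportError` branch covers. The sketch below reproduces the pattern in a throwaway directory. The module names `demo_client` and `demo_models` and the `IMPORT_STYLE` marker are hypothetical, chosen only to avoid clashing with real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Same try/except shape as the client.py hunk, with a marker recording which branch ran.
CLIENT_SRC = '''
try:
    from .demo_models import APITestAction  # works only inside a package
    IMPORT_STYLE = "package"
except ImportError:
    from demo_models import APITestAction   # flat-module fallback
    IMPORT_STYLE = "flat"
'''


def import_flat_client() -> str:
    """Import the demo client as a flat module and report which branch was taken."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_models.py").write_text("class APITestAction:\n    pass\n")
        Path(tmp, "demo_client.py").write_text(CLIENT_SRC)
        sys.path.insert(0, tmp)  # mimics inference.py putting its own directory on sys.path
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_client")
            return module.IMPORT_STYLE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_client", None)
            sys.modules.pop("demo_models", None)


if __name__ == "__main__":
    print(import_flat_client())
```
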

inference.py CHANGED

@@ -126,10 +126,19 @@ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[
 def log_end(success: bool, steps: int, score: float, rewards: list[float]) -> None:
     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
     print(
         f"[END] success={str(success).lower()} steps={steps} "
-        f"score={score:.3f} rewards={rewards_str}",
         flush=True,
     )

 def log_end(success: bool, steps: int, score: float, rewards: list[float]) -> None:
+    """Emit the [END] line in the EXACT format expected by the OpenEnv judge.
+    Spec format (from problem statement):
+        [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+    Spec example:
+        [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
+    All numeric fields use 2-decimal format to match the spec example.
+    """
     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
     print(
         f"[END] success={str(success).lower()} steps={steps} "
+        f"score={score:.2f} rewards={rewards_str}",
         flush=True,
     )
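
Note on the `[END]` format: the change above switches `score` from three decimals to two, matching the spec example quoted in the docstring. The README example on the new side of this commit still appears to show `score=0.820`, so a strict parser would reject it. The sketch below pairs a string-returning variant of `log_end` with a strict parser to make that visible. `format_end`, `parse_end`, and the regular expression are hypothetical helpers, not part of the repository or of the judge.

```python
import re
from typing import List, Optional, Tuple


def format_end(success: bool, steps: int, score: float, rewards: List[float]) -> str:
    """Build the [END] line with the same f-string layout as the patched log_end."""
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    return (
        f"[END] success={str(success).lower()} steps={steps} "
        f"score={score:.2f} rewards={rewards_str}"
    )


# Assumed strict pattern: exactly two decimals for score, any spacing after [END].
END_RE = re.compile(
    r"^\[END\]\s+success=(true|false) steps=(\d+) score=(\d+\.\d{2}) rewards=(.*)$"
)


def parse_end(line: str) -> Optional[Tuple[bool, int, float, List[float]]]:
    """Parse an [END] line; return None if it does not match the two-decimal layout."""
    match = END_RE.match(line.strip())
    if match is None:
        return None
    success, steps, score, rewards = match.groups()
    reward_values = [float(r) for r in rewards.split(",")] if rewards else []
    return (success == "true", int(steps), float(score), reward_values)


if __name__ == "__main__":
    line = format_end(True, 3, 1.0, [0.0, 0.0, 1.0])
    print(line)
    print(parse_end(line))
```
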