# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is
An OpenEnv environment for the Scaler/Meta PyTorch hackathon. AI agents connect via WebSocket, write Python trading-strategy code against NIFTY/BANKNIFTY market data, backtest it, and iterate. Three tasks (easy/medium/hard), each scored in [0.0, 1.0].
## Commands

```bash
# Run server locally (must set PYTHONPATH since the package maps to the repo root)
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000

# Validate environment spec compliance
openenv validate .                             # local file checks
openenv validate --url http://localhost:8000   # runtime endpoint checks (server must be running)

# Run baseline inference (server must be running)
API_BASE_URL=http://localhost:8000/v1 MODEL_NAME=gpt-4o HF_TOKEN=<key> ENV_URL=http://localhost:8000 python inference.py

# Quick smoke test (no LLM needed)
PYTHONPATH=. python -c "
from server.quant_research_env_environment import QuantResearchEnvironment
from models import QuantAction
env = QuantResearchEnvironment()
obs = env.reset(task_id='easy')
print(f'Reset OK: {obs.task_id}, steps={obs.steps_remaining}')
"

# Deploy to HuggingFace Spaces
openenv push --repo-id <username>/quant-research-env

# Regenerate lockfile after dependency changes
uv lock
```
## Architecture

Request flow: `inference.py` → `client.py` (`EnvClient`) → WebSocket → `server/app.py` (FastAPI, auto-generated endpoints) → `server/quant_research_env_environment.py` (`reset`/`step`/`state`)
`step()` routing by `action_type`:

- `explore_data` → `sandbox.execute_exploration_query()` → runs a pandas query in a subprocess
- `submit_code` → `sandbox.check_syntax()` + `constraints.scan_for_dangerous_code()` + `constraints.scan_for_lookahead()` → validates and stores the code
- `run_backtest` → `sandbox.execute_strategy_code()` → `backtester.replay_trades_*()` → `grader.grade_*()` → executes, backtests, grades
- `submit_final` → final grading; for the hard task: `constraints.detect_runtime_lookahead()` + `data_loader.get_test_data()` + OOS evaluation
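A minimal sketch of that dispatch, assuming the module and method names from the table above; the payload field `action.code`, the internal attributes, and the helper methods are guesses at the real environment's shape, not its actual implementation:

```python
from server import sandbox, constraints, backtester, grader  # modules named above

class _DispatchSketch:
    """Shape of the step() routing only; attribute and payload names are assumptions."""

    def step(self, action):
        if action.action_type == "explore_data":
            return sandbox.execute_exploration_query(action.code, self._train)
        if action.action_type == "submit_code":
            sandbox.check_syntax(action.code)
            constraints.scan_for_dangerous_code(action.code)
            constraints.scan_for_lookahead(action.code)
            self._code = action.code          # stored for later backtests
            return "code accepted"
        if action.action_type == "run_backtest":
            trades = sandbox.execute_strategy_code(self._code, self._train)
            results = backtester.replay_trades_easy(trades)  # or the medium/hard variant
            return grader.grade_easy(results)
        if action.action_type == "submit_final":
            return self._final_grade()        # hard task adds lookahead + OOS checks here
```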
Key design decisions:

- Agent code always runs in a subprocess via `sandbox.py` (tempfile pickle for data transfer, 30s timeout), never in-process; see the sketch after this list
- Exception: `constraints.detect_runtime_lookahead()` loads agent code in-process via `importlib.util` for the truncation test
- Data is loaded once and cached in the module-level `_cache` dict in `data_loader.py`
- `reward` in observations defaults to `self._best_score` (the best-ever score in the episode), not the current step's score
- Tasks are plain dicts in `server/tasks/*.py`, not classes
- Easy task signature: `generate_trades(df)` (merged DataFrame). Medium/Hard: `generate_trades(nifty_df, banknifty_df)` (two separate DataFrames)
- Hard task OOS evaluation happens only at `submit_final`, not during `run_backtest`
- Step exhaustion auto-submits via `_force_submit()`
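The subprocess bullet compresses a lot, so here is a minimal sketch of that pattern under stated assumptions: the function name `run_agent_code` and the `df` variable injected into the child are hypothetical; only the tempfile-pickle transfer and the 30s timeout come from the text, and the real logic lives in `sandbox.py`.

```python
import pickle
import subprocess
import sys
import tempfile

def run_agent_code(code: str, df) -> subprocess.CompletedProcess:
    """Illustrative sandbox pattern: pickle the data to a temp file, then run
    the agent code in a fresh interpreter with a hard timeout."""
    # Transfer data to the child via a tempfile pickle, as described above.
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
        pickle.dump(df, f)
        data_path = f.name
    # The child loads the pickle, then executes the agent code with `df` in scope.
    runner = f"import pickle\ndf = pickle.load(open({data_path!r}, 'rb'))\n" + code
    # 30s timeout matching the limit above; raises TimeoutExpired on overrun.
    return subprocess.run([sys.executable, "-c", runner],
                          capture_output=True, text=True, timeout=30)
```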
Package quirk: `pyproject.toml` maps `quant_research_env` to `.` (the repo root), so you must either `pip install -e .` or run with `PYTHONPATH=.`. Both `server/app.py` and `server/quant_research_env_environment.py` use a try/except `ImportError` dual-import pattern to handle both cases.
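That dual-import pattern looks like this (the imported name `QuantAction` is taken from the smoke test above; other modules follow the same shape):

```python
try:
    from quant_research_env.models import QuantAction  # after `pip install -e .`
except ImportError:
    from models import QuantAction                     # when run with PYTHONPATH=.
```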
## Grading

Each grader returns a score in [0.0, 1.0] with graduated partial credit. Ground-truth values come from the RAETH Trading Eval:
- Easy: 16,639 trades, Sharpe -1.3693 (tolerances: trades ±5%, PnL ±2%, Sharpe ±0.05)
- Medium: 18,815 trades/leg, Sharpe -2.8732 (same tolerance structure)
- Hard: no ground truth; scored on hidden 2023-2026 OOS Sharpe via a piecewise linear mapping
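Neither the partial-credit curve nor the hard task's breakpoints are spelled out above, so the following is only a sketch under stated assumptions: `metric_credit`, its decay shape, and the `BREAKPOINTS` anchors are all illustrative, not the repo's actual grading constants.

```python
def metric_credit(actual: float, truth: float, tol_frac: float) -> float:
    """One plausible 'graduated partial credit' shape (assumption): full credit
    inside the tolerance band, linearly decaying to 0 at twice the band."""
    err = abs(actual - truth) / abs(truth)
    if err <= tol_frac:
        return 1.0
    return max(0.0, 1.0 - (err - tol_frac) / tol_frac)

# Hard task: piecewise linear map from hidden OOS Sharpe to [0.0, 1.0].
BREAKPOINTS = [(-1.0, 0.0), (0.0, 0.2), (1.0, 0.6), (2.0, 1.0)]  # made-up anchors

def score_oos_sharpe(sharpe: float) -> float:
    if sharpe <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    if sharpe >= BREAKPOINTS[-1][0]:
        return BREAKPOINTS[-1][1]
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if sharpe <= x1:
            return y0 + (y1 - y0) * (sharpe - x0) / (x1 - x0)
    return BREAKPOINTS[-1][1]  # unreachable; keeps the return total
```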
The backtester uses the population standard deviation (`ddof=0`) and annualizes with `sqrt(252 * 375)`. Exposure violations are recorded only above 0.85 (the 0.80 limit plus 0.05 tolerance).
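As a concrete reading of that formula, a sketch of the Sharpe computation (the function name is hypothetical; only `ddof=0` and the `sqrt(252 * 375)` factor come from the text above):

```python
import numpy as np

def annualized_sharpe(minute_returns: np.ndarray) -> float:
    """Sharpe as stated above: population std, annualized over
    252 trading days x 375 minutes per session."""
    sigma = minute_returns.std(ddof=0)  # population std, not sample std
    if sigma == 0.0:
        return 0.0
    return float(minute_returns.mean() / sigma * np.sqrt(252 * 375))
```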