CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

What This Is

An OpenEnv environment for the Scaler/Meta PyTorch hackathon. AI agents connect via WebSocket, write Python trading-strategy code against NIFTY/BANKNIFTY market data, backtest it, and iterate. There are three tasks (easy/medium/hard), each scored in [0.0, 1.0].

Commands

# Run server locally (must set PYTHONPATH since package maps to repo root)
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000

# Validate environment spec compliance
openenv validate .                          # local file checks
openenv validate --url http://localhost:8000  # runtime endpoint checks (server must be running)

# Run baseline inference (server must be running)
API_BASE_URL=http://localhost:8000/v1 MODEL_NAME=gpt-4o HF_TOKEN=<key> ENV_URL=http://localhost:8000 python inference.py

# Quick smoke test (no LLM needed)
PYTHONPATH=. python -c "
from server.quant_research_env_environment import QuantResearchEnvironment
from models import QuantAction
env = QuantResearchEnvironment()
obs = env.reset(task_id='easy')
print(f'Reset OK: {obs.task_id}, steps={obs.steps_remaining}')
"

# Deploy to HuggingFace Spaces
openenv push --repo-id <username>/quant-research-env

# Regenerate lockfile after dependency changes
uv lock

Architecture

Request flow: inference.py → client.py (EnvClient) → WebSocket → server/app.py (FastAPI, auto-generated endpoints) → server/quant_research_env_environment.py (reset/step/state)

step() routing by action_type:

  • explore_data → sandbox.execute_exploration_query(): runs a pandas query in a subprocess
  • submit_code → sandbox.check_syntax() + constraints.scan_for_dangerous_code() + constraints.scan_for_lookahead(): validates and stores code
  • run_backtest → sandbox.execute_strategy_code() → backtester.replay_trades_*() → grader.grade_*(): executes, backtests, grades
  • submit_final → final grading; for the hard task: constraints.detect_runtime_lookahead() + data_loader.get_test_data() + OOS evaluation
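The routing above can be sketched as a dispatch on action_type. This is a minimal illustration, not the repo's implementation: handler names come from the list above, but the `QuantAction` fields, the `env` attributes, and the single `replay_trades`/`grade` names (standing in for the `replay_trades_*`/`grade_*` families) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class QuantAction:
    action_type: str
    payload: str = ""


def step(env, action: QuantAction):
    """Dispatch one environment step by action_type (sketch)."""
    if action.action_type == "explore_data":
        # Pandas query runs in a subprocess, never in-process.
        return env.sandbox.execute_exploration_query(action.payload)
    elif action.action_type == "submit_code":
        # Validate, then store the strategy code for later backtests.
        env.sandbox.check_syntax(action.payload)
        env.constraints.scan_for_dangerous_code(action.payload)
        env.constraints.scan_for_lookahead(action.payload)
        env.stored_code = action.payload
        return "code accepted"
    elif action.action_type == "run_backtest":
        # Execute, replay, grade.
        trades = env.sandbox.execute_strategy_code(env.stored_code)
        replay = env.backtester.replay_trades(trades)
        return env.grader.grade(replay)
    elif action.action_type == "submit_final":
        return env.grade_final()
    raise ValueError(f"unknown action_type: {action.action_type!r}")
```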

Key design decisions:

  • Agent code always runs in a subprocess via sandbox.py (tempfile pickle for data transfer, 30s timeout), never in-process
  • Exception: constraints.detect_runtime_lookahead() loads agent code in-process via importlib.util for the truncation test
  • Data loaded once and cached in module-level _cache dict in data_loader.py
  • reward in observations defaults to self._best_score (best-ever score in the episode), not the current step's score
  • Tasks are plain dicts in server/tasks/*.py, not classes
  • Easy task signature: generate_trades(df) (merged DataFrame). Medium/Hard: generate_trades(nifty_df, banknifty_df) (two separate DataFrames)
  • Hard task OOS evaluation only happens at submit_final, not during run_backtest
  • Step exhaustion auto-submits via _force_submit()
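The subprocess pattern from the first bullet (tempfile pickle handoff, timeout kill) might look roughly like this. A sketch only: the function name `run_in_sandbox`, the runner protocol, and the payload shape are assumptions; only the tempfile-pickle transfer and the 30s timeout come from the source.

```python
import pickle
import subprocess
import sys
import tempfile

# Child-process runner: load pickled inputs, exec the agent code in an
# isolated namespace, call generate_trades, pickle the result back out.
RUNNER = """
import pickle, sys
with open(sys.argv[1], "rb") as f:
    payload = pickle.load(f)
ns = {}
exec(payload["code"], ns)
result = ns["generate_trades"](payload["df"])
with open(sys.argv[2], "wb") as f:
    pickle.dump(result, f)
"""


def run_in_sandbox(code: str, df, timeout: float = 30.0):
    """Run agent code in a child process; data crosses via pickled tempfiles."""
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fin, \
         tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fout:
        pickle.dump({"code": code, "df": df}, fin)
        fin.flush()
        # With `python -c`, extra args land in sys.argv[1:] of the child.
        subprocess.run(
            [sys.executable, "-c", RUNNER, fin.name, fout.name],
            timeout=timeout,  # TimeoutExpired kills a hung strategy
            check=True,
        )
        with open(fout.name, "rb") as f:
            return pickle.load(f)
```

Keeping the agent code in a child process means a crash, hang, or memory blowup in strategy code cannot take down the server.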

Package quirk: pyproject.toml maps the quant_research_env package to . (the repo root). You must either pip install -e . or run with PYTHONPATH=. set. Both app.py and environment.py use a try/except ImportError dual-import pattern to handle both cases.
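The dual-import pattern can be generalized into a tiny helper. This `dual_import` function is hypothetical (the repo uses inline try/except blocks, not a helper), but it shows the same fallback logic.

```python
import importlib


def dual_import(pkg_name: str, module_name: str):
    """Try the installed-package import first, fall back to the repo-root one.

    Mirrors the inline try/except ImportError pattern: after `pip install -e .`
    modules live under the package; with PYTHONPATH=. they sit at top level.
    """
    try:
        return importlib.import_module(f"{pkg_name}.{module_name}")
    except ImportError:
        return importlib.import_module(module_name)
```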

Grading

Each grader returns [0.0, 1.0] with graduated partial credit. Ground truth values from the RAETH Trading Eval:

  • Easy: 16,639 trades, Sharpe -1.3693 (tolerances: trades ±5%, PnL ±2%, Sharpe ±0.05)
  • Medium: 18,815 trades/leg, Sharpe -2.8732 (same tolerance structure)
  • Hard: no ground truth; scored on hidden 2023-2026 OOS Sharpe via piecewise linear mapping
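"Graduated partial credit" against the tolerances above could be modeled like this. The shape of the curve (linear decay from full credit at the tolerance edge to zero at twice the tolerance) and the equal weighting are assumptions for illustration; only the ground-truth numbers and tolerance bands come from the table above.

```python
def band_score(err: float, tol: float) -> float:
    """Full credit inside the tolerance band, linear decay to 0 at 2x tolerance.

    (Assumed curve -- the repo's actual partial-credit shape may differ.)
    """
    if err <= tol:
        return 1.0
    if err >= 2 * tol:
        return 0.0
    return (2 * tol - err) / tol


def grade_easy(n_trades: int, sharpe: float) -> float:
    """Grade the easy task against its ground truth (equal weights assumed)."""
    s_trades = band_score(abs(n_trades - 16_639) / 16_639, 0.05)  # ±5% relative
    s_sharpe = band_score(abs(sharpe - (-1.3693)), 0.05)          # ±0.05 absolute
    return (s_trades + s_sharpe) / 2
```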

Backtester uses population std (ddof=0) and annualizes with sqrt(252 * 375). Exposure violations recorded only above 0.85 (0.80 limit + 0.05 tolerance).
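The Sharpe and exposure conventions above can be reproduced in a few lines. The function names are assumptions; the ddof=0 population std, the sqrt(252 * 375) annualization (252 trading days of 375 minute bars), and the 0.80 + 0.05 exposure threshold come from the source.

```python
import math
from statistics import fmean, pstdev

EXPOSURE_LIMIT = 0.80
EXPOSURE_TOLERANCE = 0.05


def annualized_sharpe(minute_returns: list[float]) -> float:
    """Sharpe per the backtester's convention: population std (ddof=0),
    annualized by sqrt(252 days * 375 minute bars per day)."""
    mu = fmean(minute_returns)
    sigma = pstdev(minute_returns)  # population std, i.e. ddof=0
    if sigma == 0.0:
        return 0.0
    return (mu / sigma) * math.sqrt(252 * 375)


def is_exposure_violation(exposure: float) -> bool:
    """Violations are recorded only above limit + tolerance (0.85)."""
    return exposure > EXPOSURE_LIMIT + EXPOSURE_TOLERANCE
```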