# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is
An OpenEnv environment for the Scaler/Meta PyTorch hackathon. AI agents connect via WebSocket, write Python trading-strategy code against NIFTY/BANKNIFTY market data, backtest it, and iterate. Three tasks (easy/medium/hard), each scored in [0.0, 1.0].
## Commands

```bash
# Run server locally (must set PYTHONPATH since the package maps to the repo root)
PYTHONPATH=. uvicorn server.app:app --host 0.0.0.0 --port 8000

# Validate environment spec compliance
openenv validate .                             # local file checks
openenv validate --url http://localhost:8000   # runtime endpoint checks (server must be running)

# Run baseline inference (server must be running)
API_BASE_URL=http://localhost:8000/v1 MODEL_NAME=gpt-4o HF_TOKEN=<key> ENV_URL=http://localhost:8000 python inference.py

# Quick smoke test (no LLM needed)
PYTHONPATH=. python -c "
from server.quant_research_env_environment import QuantResearchEnvironment
from models import QuantAction
env = QuantResearchEnvironment()
obs = env.reset(task_id='easy')
print(f'Reset OK: {obs.task_id}, steps={obs.steps_remaining}')
"

# Deploy to HuggingFace Spaces
openenv push --repo-id <username>/quant-research-env

# Regenerate lockfile after dependency changes
uv lock
```
## Architecture

Request flow: `inference.py` → `client.py` (`EnvClient`) → WebSocket → `server/app.py` (FastAPI, auto-generated endpoints) → `server/quant_research_env_environment.py` (`reset`/`step`/`state`)
`step()` routing by `action_type`:

- `explore_data` → `sandbox.execute_exploration_query()` → runs a pandas query in a subprocess
- `submit_code` → `sandbox.check_syntax()` + `constraints.scan_for_dangerous_code()` + `constraints.scan_for_lookahead()` → validates and stores the code
- `run_backtest` → `sandbox.execute_strategy_code()` → `backtester.replay_trades_*()` → `grader.grade_*()` → executes, backtests, grades
- `submit_final` → final grading; for the hard task: `constraints.detect_runtime_lookahead()` + `data_loader.get_test_data()` + OOS evaluation
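A minimal sketch of that dispatch, assuming the module and method names from the table above; the payload field `action.code`, the internal attributes, and the helper methods are guesses at the real environment's shape, not its actual implementation:

```python
from server import sandbox, constraints, backtester, grader  # modules named above

class _DispatchSketch:
    """Shape of the step() routing only; attribute and payload names are assumptions."""

    def step(self, action):
        if action.action_type == "explore_data":
            return sandbox.execute_exploration_query(action.code, self._train)
        if action.action_type == "submit_code":
            sandbox.check_syntax(action.code)
            constraints.scan_for_dangerous_code(action.code)
            constraints.scan_for_lookahead(action.code)
            self._code = action.code          # stored for later backtests
            return "code accepted"
        if action.action_type == "run_backtest":
            trades = sandbox.execute_strategy_code(self._code, self._train)
            results = backtester.replay_trades_easy(trades)  # or the medium/hard variant
            return grader.grade_easy(results)
        if action.action_type == "submit_final":
            return self._final_grade()        # hard task adds lookahead + OOS checks here
```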
Key design decisions:

- Agent code always runs in a subprocess via `sandbox.py` (tempfile pickle for data transfer, 30s timeout), never in-process; see the sketch after this list
- Exception: `constraints.detect_runtime_lookahead()` loads agent code in-process via `importlib.util` for the truncation test
- Data is loaded once and cached in the module-level `_cache` dict in `data_loader.py`
- `reward` in observations defaults to `self._best_score` (the best-ever score in the episode), not the current step's score
- Tasks are plain dicts in `server/tasks/*.py`, not classes
- Easy task signature: `generate_trades(df)` (merged DataFrame). Medium/Hard: `generate_trades(nifty_df, banknifty_df)` (two separate DataFrames)
- Hard task OOS evaluation happens only at `submit_final`, not during `run_backtest`
- Step exhaustion auto-submits via `_force_submit()`
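The subprocess bullet compresses a lot, so here is a minimal sketch of that pattern under stated assumptions: the function name `run_agent_code` and the `df` variable injected into the child are hypothetical; only the tempfile-pickle transfer and the 30s timeout come from the text, and the real logic lives in `sandbox.py`.

```python
import pickle
import subprocess
import sys
import tempfile

def run_agent_code(code: str, df) -> subprocess.CompletedProcess:
    """Illustrative sandbox pattern: pickle the data to a temp file, then run
    the agent code in a fresh interpreter with a hard timeout."""
    # Transfer data to the child via a tempfile pickle, as described above.
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
        pickle.dump(df, f)
        data_path = f.name
    # The child loads the pickle, then executes the agent code with `df` in scope.
    runner = f"import pickle\ndf = pickle.load(open({data_path!r}, 'rb'))\n" + code
    # 30s timeout matching the limit above; raises TimeoutExpired on overrun.
    return subprocess.run([sys.executable, "-c", runner],
                          capture_output=True, text=True, timeout=30)
```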
Package quirk: `pyproject.toml` maps `quant_research_env` to `.` (the repo root), so you must either `pip install -e .` or run with `PYTHONPATH=.`. Both `server/app.py` and `server/quant_research_env_environment.py` use a try/except `ImportError` dual-import pattern to handle both cases.
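That dual-import pattern looks like this (the imported name `QuantAction` is taken from the smoke test above; other modules follow the same shape):

```python
try:
    from quant_research_env.models import QuantAction  # after `pip install -e .`
except ImportError:
    from models import QuantAction                     # when run with PYTHONPATH=.
```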
## Grading

Each grader returns a score in [0.0, 1.0] with graduated partial credit. Ground-truth values come from the RAETH Trading Eval:
- Easy: 16,639 trades, Sharpe -1.3693 (tolerances: trades ±5%, PnL ±2%, Sharpe ±0.05)
- Medium: 18,815 trades/leg, Sharpe -2.8732 (same tolerance structure)
- Hard: no ground truth; scored on hidden 2023-2026 OOS Sharpe via a piecewise linear mapping
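Neither the partial-credit curve nor the hard task's breakpoints are spelled out above, so the following is only a sketch under stated assumptions: `metric_credit`, its decay shape, and the `BREAKPOINTS` anchors are all illustrative, not the repo's actual grading constants.

```python
def metric_credit(actual: float, truth: float, tol_frac: float) -> float:
    """One plausible 'graduated partial credit' shape (assumption): full credit
    inside the tolerance band, linearly decaying to 0 at twice the band."""
    err = abs(actual - truth) / abs(truth)
    if err <= tol_frac:
        return 1.0
    return max(0.0, 1.0 - (err - tol_frac) / tol_frac)

# Hard task: piecewise linear map from hidden OOS Sharpe to [0.0, 1.0].
BREAKPOINTS = [(-1.0, 0.0), (0.0, 0.2), (1.0, 0.6), (2.0, 1.0)]  # made-up anchors

def score_oos_sharpe(sharpe: float) -> float:
    if sharpe <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    if sharpe >= BREAKPOINTS[-1][0]:
        return BREAKPOINTS[-1][1]
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if sharpe <= x1:
            return y0 + (y1 - y0) * (sharpe - x0) / (x1 - x0)
    return BREAKPOINTS[-1][1]  # unreachable; keeps the return total
```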
The backtester uses the population standard deviation (`ddof=0`) and annualizes with `sqrt(252 * 375)`. Exposure violations are recorded only above 0.85 (the 0.80 limit plus 0.05 tolerance).
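As a concrete reading of that formula, a sketch of the Sharpe computation (the function name is hypothetical; only `ddof=0` and the `sqrt(252 * 375)` factor come from the text above):

```python
import numpy as np

def annualized_sharpe(minute_returns: np.ndarray) -> float:
    """Sharpe as stated above: population std, annualized over
    252 trading days x 375 minutes per session."""
    sigma = minute_returns.std(ddof=0)  # population std, not sample std
    if sigma == 0.0:
        return 0.0
    return float(minute_returns.mean() / sigma * np.sqrt(252 * 375))
```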