Backend Architecture

Project Structure

MLops-Openenvhack/
├── app.py                 # FastAPI server - main entry point
├── inference.py           # Baseline LLM agent for evaluation
├── models.py              # Pydantic models (Action, Observation, State)
├── mlops_environment.py   # Core environment logic
├── artifact_generator.py  # Procedural bug/artifact generation
├── client.py              # Python client library
├── openenv.yaml           # OpenEnv specification
├── Dockerfile             # Container configuration
├── requirements.txt       # Python dependencies
└── README.md              # Documentation

How It Works

1. Server (app.py)

  • Runs FastAPI on port 7860
  • Provides REST endpoints:
    • GET /health - Health check
    • POST /reset - Initialize new task
    • POST /step - Execute action
    • GET /state - Get current state
    • GET /tasks - List available tasks
    • GET /openenv/state - OpenEnv state
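
As a rough illustration of how a caller might address these endpoints, the sketch below builds (but does not send) JSON requests with the standard library. The base URL assumes a local server on the documented port, and the `{"task_id": ...}` body shape is an assumption drawn from the Task Flow section, not a confirmed schema.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the server runs locally on the documented port 7860.
BASE_URL = "http://localhost:7860"


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build, without sending, a JSON request for one of the endpoints above."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# GET endpoints carry no body.
health = build_request("GET", "/health")

# POST /reset: the body shape here is assumed, not taken from the real schema.
reset = build_request("POST", "/reset", {"task_id": "easy"})
```

Passing either object to `urllib.request.urlopen` would perform the call against a running server.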

2. Environment (mlops_environment.py)

  • Manages task state
  • Processes actions through _handle_* methods
  • Generates rewards based on agent behavior
  • Tracks artifacts read and sanity checks
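
The `_handle_*` convention suggests name-based dispatch. The sketch below shows one common way to do that with `getattr`; the class name `MiniEnvironment`, the return values, and the `check` parameter are hypothetical and do not reflect the actual contents of mlops_environment.py.

```python
class MiniEnvironment:
    """Hypothetical stand-in for the environment class (illustrative only)."""

    def __init__(self):
        # Bookkeeping mirroring "tracks artifacts read and sanity checks".
        self.artifacts_read = set()
        self.sanity_checks = set()

    def step(self, action_type: str, **params) -> dict:
        # Look up a method named _handle_<action_type>; unknown actions
        # return an error payload instead of raising.
        handler = getattr(self, f"_handle_{action_type}", None)
        if handler is None:
            return {"error": f"unknown action: {action_type}"}
        return handler(**params)

    def _handle_read_config(self) -> dict:
        self.artifacts_read.add("config")
        return {"artifact": "config"}

    def _handle_run_sanity_check(self, check: str) -> dict:
        self.sanity_checks.add(check)
        return {"check": check}
```

With this pattern, supporting a new action type only requires adding another `_handle_*` method.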

3. Artifact Generator (artifact_generator.py)

  • Procedurally generates training artifacts with planted bugs
  • Creates realistic logs, configs, preprocessing code, and eval results
  • Supports 9 bug types across 3 difficulty levels

4. Inference Agent (inference.py)

  • LLM-powered agent using the OpenAI API
  • Reads artifacts, runs sanity checks
  • Submits diagnosis with confidence scoring
  • Implements rate limiting and fallback
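
The source does not say how inference.py implements rate limiting or fallback. The sketch below is one plausible shape, assuming a minimum interval between calls, a bounded number of retries, and a caller-supplied fallback function. `RateLimitedCaller` and all of its parameters are hypothetical names.

```python
import time


class RateLimitedCaller:
    """Hypothetical wrapper: space out calls, retry, then fall back."""

    def __init__(self, call, fallback, min_interval=1.0, max_retries=2,
                 sleep=time.sleep, clock=time.monotonic):
        self.call = call              # e.g. a function that queries the LLM
        self.fallback = fallback      # used when every attempt fails
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.sleep = sleep            # injectable for testing
        self.clock = clock            # injectable for testing
        self._last_call = None

    def _wait_for_slot(self):
        # Sleep until at least min_interval has passed since the last attempt.
        if self._last_call is not None:
            wait = self.min_interval - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)
        self._last_call = self.clock()

    def __call__(self, prompt):
        for _ in range(self.max_retries + 1):
            self._wait_for_slot()
            try:
                return self.call(prompt)
            except Exception:
                # Sketch only: real code would catch specific API errors.
                continue
        return self.fallback(prompt)
```

Injecting `sleep` and `clock` keeps the wrapper testable without real delays.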

API Flow

Client -> app.py (FastAPI)
           |
           +-> mlops_environment.py (core logic)
                    |
                    +-> artifact_generator.py (bug generation)
                    |
                    +-> models.py (data validation)
                    |
                    +-> Returns Observation, Reward, Done, Info

Task Flow

1. Client sends POST /reset with a task_id (easy, medium, or hard)
2. Environment generates artifacts with a planted bug
3. Client sends POST /step with an action
4. Environment processes the action and returns an observation
5. Agent keeps investigating until it submits a diagnosis
6. Grader scores the diagnosis against the planted bug (0.0 - 1.0)
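
The flow above can be sketched as a small episode loop. This is a minimal sketch under stated assumptions: `post` is any callable that sends a JSON body to a path and returns the decoded response, and the response keys `observation`, `reward`, and `done` are assumed from the API Flow diagram, not from the real schema. `run_episode` and `policy` are hypothetical names.

```python
def run_episode(post, policy, task_id="easy", max_steps=20):
    """Drive one task from /reset until the environment reports done.

    post(path, payload) -> dict   performs the HTTP call (injected)
    policy(observation) -> dict   chooses the next action payload
    """
    # Step 1: initialize a task at the requested difficulty.
    result = post("/reset", {"task_id": task_id})
    observation = result.get("observation", result)

    total_reward = 0.0
    for _ in range(max_steps):
        # Steps 3-4: send an action, receive the next observation.
        action = policy(observation)
        result = post("/step", action)
        observation = result.get("observation")
        total_reward += result.get("reward", 0.0)
        # Steps 5-6: stop once the environment marks the episode done.
        if result.get("done"):
            break
    return total_reward
```

Keeping the transport injectable lets the same loop run against a live server or a stub in tests.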

Data Models

Action Types

  • read_config, read_logs, check_dataset_stats
  • inspect_preprocessing, read_eval_results
  • run_sanity_check, query_artifact
  • submit_diagnosis
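
To make the set of action types concrete, the sketch below models them with a standard-library `Enum` and a dataclass. The project itself uses Pydantic models in models.py. The field names `action_type` and `params` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionType(str, Enum):
    """The eight action types listed above."""
    READ_CONFIG = "read_config"
    READ_LOGS = "read_logs"
    CHECK_DATASET_STATS = "check_dataset_stats"
    INSPECT_PREPROCESSING = "inspect_preprocessing"
    READ_EVAL_RESULTS = "read_eval_results"
    RUN_SANITY_CHECK = "run_sanity_check"
    QUERY_ARTIFACT = "query_artifact"
    SUBMIT_DIAGNOSIS = "submit_diagnosis"


@dataclass
class Action:
    # Field names are hypothetical; the real Pydantic model may differ.
    action_type: ActionType
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        # Coerce plain strings and reject unknown action types (ValueError).
        self.action_type = ActionType(self.action_type)
```

Validating at construction time means an unknown action type fails fast, before it reaches the environment.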

Reward Structure

  • +0.02 per new artifact read
  • -0.02 per duplicate read
  • +0.01 per new sanity check
  • Terminal (on submit_diagnosis): +0.15 category, +0.25 file, +0.30 field, +0.30 fix (maximum 1.0)
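
The arithmetic above can be sketched as follows. The numeric values come from the list. Exact-match comparison per component and a zero reward for repeated sanity checks are assumptions, since the source does not say how the grader matches a fix or treats repeats. All function names are hypothetical.

```python
ARTIFACT_BONUS = 0.02
DUPLICATE_PENALTY = -0.02
SANITY_CHECK_BONUS = 0.01
TERMINAL_WEIGHTS = {"category": 0.15, "file": 0.25, "field": 0.30, "fix": 0.30}


def read_reward(artifact: str, seen: set) -> float:
    """+0.02 for a new artifact, -0.02 for one that was already read."""
    if artifact in seen:
        return DUPLICATE_PENALTY
    seen.add(artifact)
    return ARTIFACT_BONUS


def sanity_check_reward(check: str, seen: set) -> float:
    """+0.01 for a new sanity check; repeats earn 0.0 (assumed)."""
    if check in seen:
        return 0.0
    seen.add(check)
    return SANITY_CHECK_BONUS


def terminal_score(diagnosis: dict, planted: dict) -> float:
    """Sum the weights of components matching the planted bug.

    Exact equality is an assumption; the real grader may be more lenient.
    """
    score = sum(
        weight
        for key, weight in TERMINAL_WEIGHTS.items()
        if key in diagnosis and diagnosis[key] == planted[key]
    )
    return round(score, 2)  # avoid float noise such as 0.39999999999999997
```

The four terminal weights sum to 1.0, which matches the 0.0 - 1.0 grader range described in the Task Flow.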