Spaces:
Sleeping
Sleeping
Backend Architecture
Project Structure
MLops-Openenvhack/
βββ app.py # FastAPI server - main entry point
βββ inference.py # Baseline LLM agent for evaluation
βββ models.py # Pydantic models (Action, Observation, State)
βββ mlops_environment.py # Core environment logic
βββ artifact_generator.py # Procedural bug/artifact generation
βββ client.py # Python client library
βββ openenv.yaml # OpenEnv specification
βββ Dockerfile # Container configuration
βββ requirements.txt # Python dependencies
βββ README.md # Documentation
How It Works
1. Server (app.py)
- Runs FastAPI on port 7860
- Provides REST endpoints:
GET /health- Health checkPOST /reset- Initialize new taskPOST /step- Execute actionGET /state- Get current stateGET /tasks- List available tasksGET /openenv/state- OpenEnv state
2. Environment (mlops_environment.py)
- Manages task state
- Processes actions through
_handle_*methods - Generates rewards based on agent behavior
- Tracks artifacts read and sanity checks
3. Artifact Generator (artifact_generator.py)
- Procedurally generates training artifacts with planted bugs
- Creates realistic: logs, configs, preprocessing code, eval results
- Supports 9 bug types across 3 difficulty levels
4. Inference Agent (inference.py)
- LLM-powered agent using OpenAI API
- Reads artifacts, runs sanity checks
- Submits diagnosis with confidence scoring
- Implements rate limiting and fallback
API Flow
Client -> app.py (FastAPI)
|
+-> mlops_environment.py (core logic)
|
+-> artifact_generator.py (bug generation)
|
+-> models.py (data validation)
|
+-> Returns Observation, Reward, Done, Info
Task Flow
1. Client POST /reset with task_id (easy/medium/hard)
2. Environment generates artifacts with planted bug
3. Client POST /step with action
4. Environment processes action, returns observation
5. Agent investigates until diagnosis submitted
6. Grader scores against planted bug (0.0 - 1.0)
Data Models
Action Types
- read_config, read_logs, check_dataset_stats
- inspect_preprocessing, read_eval_results
- run_sanity_check, query_artifact
- submit_diagnosis
Reward Structure
- +0.02 per new artifact read
- -0.02 per duplicate read
- +0.01 per new sanity check
- Terminal: +0.15 category + 0.25 file + 0.30 field + 0.30 fix