# Backend Architecture

## Project Structure

```
MLops-Openenvhack/
├── app.py                 # FastAPI server - main entry point
├── inference.py           # Baseline LLM agent for evaluation
├── models.py              # Pydantic models (Action, Observation, State)
├── mlops_environment.py   # Core environment logic
├── artifact_generator.py  # Procedural bug/artifact generation
├── client.py              # Python client library
├── openenv.yaml           # OpenEnv specification
├── Dockerfile             # Container configuration
├── requirements.txt       # Python dependencies
└── README.md              # Documentation
```

## How It Works

### 1. Server (app.py)
- Runs FastAPI on port 7860
- Provides REST endpoints:
  - `GET /health` - Health check
  - `POST /reset` - Initialize new task
  - `POST /step` - Execute action
  - `GET /state` - Get current state
  - `GET /tasks` - List available tasks
  - `GET /openenv/state` - OpenEnv state
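
As a rough client-side view of the endpoints above, the sketch below builds (but does not send) matching requests with the standard library. The base URL assumes a server running locally on port 7860, and the JSON payload fields (`task_id`, `action_type`) are assumptions for illustration; the real schema is defined in `models.py`.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:7860"  # assumption: server running locally


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build an HTTP request for one of the endpoints; nothing is sent."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


health = build_request("GET", "/health")
# Hypothetical payloads: the actual field names may differ.
reset = build_request("POST", "/reset", {"task_id": "easy"})
step = build_request("POST", "/step", {"action_type": "read_config"})
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a running server.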

### 2. Environment (mlops_environment.py)
- Manages task state
- Processes actions through `_handle_*` methods
- Generates rewards based on agent behavior
- Tracks which artifacts have been read and which sanity checks have been run
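
One common way to route actions to `_handle_*` methods is name-based dispatch with `getattr`. The sketch below illustrates that pattern only; `ToyEnvironment`, its handlers, and the returned strings are hypothetical and not the code in `mlops_environment.py`.

```python
class ToyEnvironment:
    """Hypothetical stand-in that dispatches actions to _handle_* methods."""

    def __init__(self) -> None:
        self.artifacts_read = set()  # names of artifacts the agent has seen

    def step(self, action_type: str) -> str:
        # Look up a method named _handle_<action_type>; unknown actions get a message.
        handler = getattr(self, f"_handle_{action_type}", None)
        if handler is None:
            return f"unknown action: {action_type}"
        return handler()

    def _handle_read_config(self) -> str:
        self.artifacts_read.add("config")
        return "config contents"

    def _handle_read_logs(self) -> str:
        self.artifacts_read.add("logs")
        return "log contents"


env = ToyEnvironment()
first = env.step("read_config")
unknown = env.step("delete_everything")
```

Adding a new action under this pattern only requires defining another `_handle_<name>` method; no central routing table has to change.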

### 3. Artifact Generator (artifact_generator.py)
- Procedurally generates training artifacts with planted bugs
- Creates realistic logs, configs, preprocessing code, and eval results
- Supports 9 bug types across 3 difficulty levels
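
To make the idea of procedural generation with a planted bug concrete, here is a minimal sketch. The bug catalogue, config fields, and values are invented for illustration (the real generator supports 9 bug types and produces more artifact kinds); the point is only that a seeded generator yields a reproducible artifact plus a ground-truth record of what was broken.

```python
import random
from typing import Tuple

# Hypothetical catalogue: one illustrative config bug per difficulty level.
BUGS = {
    "easy": ("learning_rate", 50.0),
    "medium": ("warmup_steps", -1),
    "hard": ("weight_decay", 0.9),
}


def generate_artifacts(difficulty: str, seed: int) -> Tuple[dict, dict]:
    """Return (artifacts, planted_bug) for a toy training run."""
    rng = random.Random(seed)  # seeded so the same task can be regenerated
    config = {
        "learning_rate": 0.001,
        "warmup_steps": 100,
        "weight_decay": 0.01,
        "batch_size": rng.choice([16, 32, 64]),
    }
    field, bad_value = BUGS[difficulty]
    # Record the ground truth before overwriting the healthy value.
    planted_bug = {"file": "config", "field": field, "expected": config[field]}
    config[field] = bad_value
    return {"config": config}, planted_bug


artifacts, truth = generate_artifacts("easy", seed=7)
```

Keeping the ground truth next to the generated artifacts is what lets a grader later compare a submitted diagnosis against the planted bug.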

### 4. Inference Agent (inference.py)
- LLM-powered agent using the OpenAI API
- Reads artifacts and runs sanity checks
- Submits diagnosis with confidence scoring
- Implements rate limiting and fallback
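
The rate limiting and fallback behaviour could look roughly like the sketch below. This is an assumption about the general shape, not the logic in `inference.py`: `RateLimitedCaller`, the minimum-interval strategy, and the fixed fallback action are all hypothetical.

```python
import time
from typing import Callable


class RateLimitedCaller:
    """Spaces calls by a minimum interval and falls back when they keep failing."""

    def __init__(self, min_interval_s: float, max_attempts: int = 3) -> None:
        self.min_interval_s = min_interval_s
        self.max_attempts = max_attempts
        self._last_call = float("-inf")  # no call has been made yet

    def call(self, primary: Callable[[], str], fallback: Callable[[], str]) -> str:
        for _ in range(self.max_attempts):
            # Sleep just long enough to respect the minimum spacing between calls.
            wait = self.min_interval_s - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()
            try:
                return primary()
            except Exception:
                continue  # retry until attempts are exhausted
        return fallback()


def flaky_llm() -> str:
    # Stand-in for an API call that always fails.
    raise RuntimeError("simulated API failure")


caller = RateLimitedCaller(min_interval_s=0.01)
result = caller.call(flaky_llm, lambda: "read_config")
```

Here the fallback returns a fixed action name so the episode can continue even when the model is unavailable.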

## API Flow

```
Client -> app.py (FastAPI)
           |
           +-> mlops_environment.py (core logic)
                    |
                    +-> artifact_generator.py (bug generation)
                    |
                    +-> models.py (data validation)
                    |
                    +-> Returns Observation, Reward, Done, Info
```

## Task Flow

```
1. Client POST /reset with task_id (easy/medium/hard)
2. Environment generates artifacts with planted bug
3. Client POST /step with action
4. Environment processes action, returns observation
5. Agent investigates until a diagnosis is submitted
6. Grader scores the diagnosis against the planted bug (0.0 - 1.0)
```
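
The steps above can be sketched as a small episode loop. To keep the example runnable offline, the HTTP transport is injected as a `post` callable and replaced here with a fake; the response keys (`observation`, `reward`, `done`) and the `action_type` payload field are assumptions based on the flow described in this document.

```python
from typing import Callable, List

Post = Callable[[str, dict], dict]  # (path, payload) -> decoded JSON response


def run_episode(post: Post, task_id: str, actions: List[dict]) -> dict:
    """Reset the environment, then step through actions until the episode ends."""
    result = post("/reset", {"task_id": task_id})
    for action in actions:
        result = post("/step", action)
        if result.get("done"):
            break  # the diagnosis was submitted, so stop sending actions
    return result


calls = []


def fake_post(path: str, payload: dict) -> dict:
    # Stand-in for the HTTP server; records calls for inspection.
    calls.append(path)
    if path == "/reset":
        return {"observation": "task ready", "done": False}
    done = payload.get("action_type") == "submit_diagnosis"
    return {"observation": "ok", "reward": 0.02, "done": done}


final = run_episode(
    fake_post,
    "easy",
    [
        {"action_type": "read_config"},
        {"action_type": "submit_diagnosis"},
        {"action_type": "read_logs"},  # never sent: the episode is already done
    ],
)
```

Swapping `fake_post` for a function that issues real HTTP requests would run the same loop against the server.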

## Data Models

### Action Types
- read_config, read_logs, check_dataset_stats
- inspect_preprocessing, read_eval_results
- run_sanity_check, query_artifact
- submit_diagnosis
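
The action names above map naturally onto an enumeration. The project defines its models with Pydantic in `models.py`; the sketch below uses only the standard library to stay dependency-free, and the `params` field is a hypothetical placeholder for action arguments.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionType(str, Enum):
    READ_CONFIG = "read_config"
    READ_LOGS = "read_logs"
    CHECK_DATASET_STATS = "check_dataset_stats"
    INSPECT_PREPROCESSING = "inspect_preprocessing"
    READ_EVAL_RESULTS = "read_eval_results"
    RUN_SANITY_CHECK = "run_sanity_check"
    QUERY_ARTIFACT = "query_artifact"
    SUBMIT_DIAGNOSIS = "submit_diagnosis"


@dataclass
class Action:
    action_type: ActionType
    params: dict = field(default_factory=dict)  # hypothetical free-form arguments

    def __post_init__(self) -> None:
        # Coerce plain strings; unknown names raise ValueError.
        self.action_type = ActionType(self.action_type)


action = Action("read_logs")
```

Validating the action type at construction time means a misspelled action fails immediately instead of reaching the environment.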

### Reward Structure
- +0.02 per new artifact read
- -0.02 per duplicate read
- +0.01 per new sanity check
- Terminal (on diagnosis): +0.15 category, +0.25 file, +0.30 field, +0.30 fix (1.0 maximum)
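
A worked sketch of these numbers follows. The function names and the boolean `matches` input are assumptions; how the actual grader decides that a category, file, field, or fix matches is not shown here, and neither is how step rewards and the terminal score are combined.

```python
TERMINAL_WEIGHTS = {"category": 0.15, "file": 0.25, "field": 0.30, "fix": 0.30}


def artifact_read_reward(artifact: str, already_read: set) -> float:
    """+0.02 for a first read, -0.02 for re-reading the same artifact."""
    if artifact in already_read:
        return -0.02
    already_read.add(artifact)
    return 0.02


def terminal_score(matches: dict) -> float:
    """Sum the weights of every diagnosis component that matched."""
    return sum(w for key, w in TERMINAL_WEIGHTS.items() if matches.get(key))


seen = set()
# Two new reads followed by one duplicate read of the config.
step_total = sum(artifact_read_reward(a, seen) for a in ["config", "logs", "config"])
# Correct category and file, wrong field and fix.
partial = terminal_score({"category": True, "file": True, "field": False, "fix": False})
perfect = terminal_score({key: True for key in TERMINAL_WEIGHTS})
```

With these weights a fully correct diagnosis sums to 1.0, which lines up with the 0.0 - 1.0 grading range, while identifying only the category and file yields 0.40.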