sairaj2 committed on
Commit
4869ca3
·
1 Parent(s): fdf2e15
.DS_Store ADDED
Binary file (8.2 kB).
 
Dockerfile CHANGED
@@ -4,7 +4,19 @@ FROM python:3.11-slim
4
  ENV PYTHONDONTWRITEBYTECODE=1 \
5
  PYTHONUNBUFFERED=1 \
6
  PIP_NO_CACHE_DIR=1 \
7
- PORT=7860
8
 
9
  WORKDIR /app
10
 
 
4
  ENV PYTHONDONTWRITEBYTECODE=1 \
5
  PYTHONUNBUFFERED=1 \
6
  PIP_NO_CACHE_DIR=1 \
7
+ PORT=7860 \
8
+ OPENENV_NAME=data-cleaning-env \
9
+ OPENENV_VERSION=1.0.0 \
10
+ OPENENV_DESCRIPTION="OpenEnv Data Cleaning and Validation Environment" \
11
+ OPENENV_RUNTIME_TYPE=docker \
12
+ OPENENV_HEALTH_CHECK=/health \
13
+ OPENENV_ENV_TYPE=data_cleaning \
14
+ OPENENV_MAX_STEPS=50 \
15
+ OPENENV_PREVIEW_ROWS=10 \
16
+ OPENENV_DEFAULT_TASK_LEVEL=easy \
17
+ OPENENV_GRADING_METHOD=deterministic \
18
+ OPENENV_RESOLUTION_BONUS=0.2 \
19
+ OPENENV_ERROR_PENALTY=0.05
20
 
21
  WORKDIR /app
22
 
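The `ENV` block above bakes values into the image, and `env/openenv_config.py` reads each of them with `os.getenv(name, default)`. The sketch below illustrates the resulting precedence under that assumption: a value passed with `docker run -e` replaces the image-level value, which in turn takes priority over the in-code default. `resolve_setting` and `IMAGE_ENV` are hypothetical names used only for illustration and are not part of this commit. Note that the image sets `PORT` but not `OPENENV_PORT`, so the configured port comes from the code default.

```python
from typing import Mapping, Optional

# Subset of the values baked into the image by the Dockerfile ENV block.
IMAGE_ENV = {"PORT": "7860", "OPENENV_MAX_STEPS": "50", "OPENENV_PREVIEW_ROWS": "10"}


def resolve_setting(name: str, code_default: str,
                    run_overrides: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: mimic what os.getenv(name, code_default) sees in the container."""
    # `docker run -e NAME=value` replaces the image-level ENV value for that name.
    container_env = {**IMAGE_ENV, **(run_overrides or {})}
    # Names absent from the container environment fall back to the in-code default.
    return container_env.get(name, code_default)


# Image value is used when nothing overrides it.
print(resolve_setting("OPENENV_MAX_STEPS", "50"))
# A run-time override wins over the image value.
print(resolve_setting("OPENENV_MAX_STEPS", "50", {"OPENENV_MAX_STEPS": "100"}))
# OPENENV_PORT is not set in the image, so the code default applies.
print(resolve_setting("OPENENV_PORT", "7860"))
```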
OPENENV_SETUP.md ADDED
@@ -0,0 +1,200 @@
1
+ # OpenEnv Configuration Setup
2
+
3
+ ## Overview
4
+
5
+ This document describes how to configure OpenEnv for the Data Cleaning Environment. The "OpenENV is not set" error is resolved by defining the `OPENENV_*` environment variables listed below.
6
+
7
+ **Note:** This configuration applies to both the main project directory and the `DataCleanser/` subdirectory. Both environments have been configured identically to ensure consistency.
8
+
9
+ ## Configuration Files
10
+
11
+ ### 1. OpenEnv Configuration Module (`env/openenv_config.py`)
12
+
13
+ This module provides:
14
+ - `OpenEnvConfig` class for managing configuration
15
+ - Environment variable detection and validation
16
+ - YAML configuration generation
17
+ - Configuration status reporting
18
+
19
+ ### 2. Docker Environment Variables (`Dockerfile`)
20
+
21
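+ The Dockerfile now sets the OpenEnv environment variables shown here. `OPENENV_PORT` is not among them (the Dockerfile sets `PORT` instead), so the configuration module falls back to its default of `7860`: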
22
+
23
+ ```dockerfile
24
+ ENV OPENENV_NAME=data-cleaning-env \
25
+ OPENENV_VERSION=1.0.0 \
26
+ OPENENV_DESCRIPTION="OpenEnv Data Cleaning and Validation Environment" \
27
+ OPENENV_RUNTIME_TYPE=docker \
28
+ OPENENV_HEALTH_CHECK=/health \
29
+ OPENENV_ENV_TYPE=data_cleaning \
30
+ OPENENV_MAX_STEPS=50 \
31
+ OPENENV_PREVIEW_ROWS=10 \
32
+ OPENENV_DEFAULT_TASK_LEVEL=easy \
33
+ OPENENV_GRADING_METHOD=deterministic \
34
+ OPENENV_RESOLUTION_BONUS=0.2 \
35
+ OPENENV_ERROR_PENALTY=0.05
36
+ ```
37
+
38
+ ### 3. FastAPI Endpoints (`app.py`)
39
+
40
+ Added new endpoints for OpenEnv configuration:
41
+ - `GET /openenv/status` - Get OpenEnv status and configuration
42
+ - `GET /openenv/config` - Get detailed OpenEnv configuration
43
+
44
+ ## Environment Variables
45
+
46
+ | Variable | Default | Description |
47
+ |----------|---------|-------------|
48
+ | `OPENENV_NAME` | `data-cleaning-env` | Environment name |
49
+ | `OPENENV_VERSION` | `1.0.0` | Environment version |
50
+ | `OPENENV_DESCRIPTION` | `OpenEnv Data Cleaning and Validation Environment` | Environment description |
51
+ | `OPENENV_RUNTIME_TYPE` | `docker` | Runtime type |
52
+ | `OPENENV_PORT` | `7860` | Server port |
53
+ | `OPENENV_HEALTH_CHECK` | `/health` | Health check endpoint |
54
+ | `OPENENV_ENV_TYPE` | `data_cleaning` | Environment type |
55
+ | `OPENENV_MAX_STEPS` | `50` | Maximum steps per episode |
56
+ | `OPENENV_PREVIEW_ROWS` | `10` | Number of rows to preview |
57
+ | `OPENENV_DEFAULT_TASK_LEVEL` | `easy` | Default task difficulty |
58
+ | `OPENENV_GRADING_METHOD` | `deterministic` | Grading method |
59
+ | `OPENENV_RESOLUTION_BONUS` | `0.2` | Resolution bonus weight |
60
+ | `OPENENV_ERROR_PENALTY` | `0.05` | Error penalty weight |
61
+
62
+ ## Testing OpenEnv Configuration
63
+
64
+ ### 1. Check Configuration Status
65
+
66
+ ```bash
67
+ curl http://localhost:7860/openenv/status
68
+ ```
69
+
70
+ Expected response:
71
+ ```json
72
+ {
73
+ "openenv_configured": true,
74
+ "name": "data-cleaning-env",
75
+ "version": "1.0.0",
76
+ "port": 7860,
77
+ "environment_type": "data_cleaning",
78
+ "max_steps": 50,
79
+ "preview_rows": 10,
80
+ "environment_variables": {
81
+ "OPENENV_NAME": "data-cleaning-env",
81
+ "OPENENV_VERSION": "1.0.0",
83
+ ...
84
+ },
85
+ "status": "healthy"
86
+ }
87
+ ```
88
+
89
+ ### 2. Test Reset Endpoint
90
+
91
+ ```bash
92
+ curl -X POST http://localhost:7860/reset \
93
+ -H "Content-Type: application/json" \
94
+ -d '{
95
+ "task_id": "easy_001",
96
+ "session_id": "test_session"
97
+ }'
98
+ ```
99
+
100
+ Expected response:
101
+ ```json
102
+ {
103
+ "success": true,
104
+ "message": "Environment reset with task easy_001",
105
+ "data": {
106
+ "session_id": "test_session",
107
+ "observation": {...},
108
+ "state": {...}
109
+ }
110
+ }
111
+ ```
112
+
113
+ ### 3. Check Health
114
+
115
+ ```bash
116
+ curl http://localhost:7860/health
117
+ ```
118
+
119
+ Expected response:
120
+ ```json
121
+ {"status": "healthy"}
122
+ ```
123
+
124
+ ## Docker Setup
125
+
126
+ ### Building the Container
127
+
128
+ ```bash
129
+ docker build -t data-cleanser-openenv .
130
+ ```
131
+
132
+ ### Running the Container
133
+
134
+ ```bash
135
+ docker run -p 7860:7860 data-cleanser-openenv
136
+ ```
137
+
138
+ ### Custom Environment Variables
139
+
140
+ You can override any OpenEnv variable when running the container:
141
+
142
+ ```bash
143
+ docker run -p 7860:7860 \
144
+ -e OPENENV_MAX_STEPS=100 \
145
+ -e OPENENV_PREVIEW_ROWS=20 \
146
+ data-cleanser-openenv
147
+ ```
148
+
149
+ ## OpenEnv YAML Configuration
150
+
151
+ `OpenEnvConfig.get_openenv_yaml()` generates an OpenEnv YAML configuration from the current settings. The `/openenv/config` endpoint exposes the same settings as JSON:
152
+
153
+ ```bash
154
+ curl http://localhost:7860/openenv/config
155
+ ```
156
+
157
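+ The response is a JSON object with three keys: `config` (the settings as a nested dictionary), `environment_variables`, and `valid`. The YAML form is produced by `OpenEnvConfig.get_openenv_yaml()` and is not returned by this endpoint.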
158
+
159
+ ## Troubleshooting
160
+
161
+ ### "OpenENV is not set" Error
162
+
163
+ This error occurs when OpenEnv environment variables are not properly configured. To fix:
164
+
165
+ 1. Ensure Docker environment variables are set in the Dockerfile
166
+ 2. Verify the environment variables are being loaded correctly
167
+ 3. Check the OpenEnv status endpoint for configuration validation
168
+
169
+ ### Configuration Validation
170
+
171
+ Use the status endpoint to validate your configuration:
172
+
173
+ ```bash
174
+ curl http://localhost:7860/openenv/status | jq '.openenv_configured'
175
+ ```
176
+
177
+ Should return `true` for a properly configured environment.
178
+
179
+ ### Common Issues
180
+
181
+ 1. **Port conflicts**: Ensure port 7860 is available
182
+ 2. **Missing datasets**: Run `POST /generate-datasets` to create test data
183
+ 3. **Invalid task IDs**: Use `GET /tasks` to list available tasks
184
+
185
+ ## Integration with OpenEnv Tools
186
+
187
+ The environment is designed to work with OpenEnv tooling, although it has not yet been tested against actual OpenEnv tools (see Next Steps). The intended targets are:
188
+
189
+ - OpenEnv CLI tools
190
+ - OpenEnv evaluation frameworks
191
+ - OpenEnv benchmarking tools
192
+
193
+ The `/openenv/status` and `/openenv/config` endpoints provide the necessary information for OpenEnv tools to interact with this environment.
194
+
195
+ ## Next Steps
196
+
197
+ 1. Test the environment with actual OpenEnv tools
198
+ 2. Configure additional task types if needed
199
+ 3. Customize the grading system for specific use cases
200
+ 4. Set up monitoring and logging for production use
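The troubleshooting steps in the setup document rely on `openenv_configured`, but every setting has an in-code default, so that flag can be `true` even when a variable was never exported. A minimal sketch of a stricter check follows, assuming the response shape documented for `GET /openenv/status`. `fetch_status` and `unset_variables` are hypothetical helper names, not part of this commit.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_status(base_url: str = "http://localhost:7860") -> Dict[str, Any]:
    """Fetch the JSON payload from GET /openenv/status (needs a running server)."""
    with urlopen(f"{base_url}/openenv/status", timeout=5) as response:
        return json.load(response)


def unset_variables(status: Dict[str, Any]) -> List[str]:
    """Return the names of variables the server reports as NOT_SET."""
    env_vars = status.get("environment_variables", {})
    return sorted(name for name, value in env_vars.items() if value == "NOT_SET")


# Hand-written payload shaped like the documented response, so no server is needed.
sample = {
    "openenv_configured": True,
    "environment_variables": {
        "OPENENV_NAME": "data-cleaning-env",
        "OPENENV_PORT": "NOT_SET",
    },
}
print(unset_variables(sample))
```

Against a live container, `unset_variables(fetch_status())` would list the variables that are falling back to defaults.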
__pycache__/app.cpython-314.pyc CHANGED
Binary files a/__pycache__/app.cpython-314.pyc and b/__pycache__/app.cpython-314.pyc differ
 
app.py CHANGED
@@ -20,6 +20,7 @@ from env.models import (
20
  QualityMetrics, SchemaInfo, IssueSummary
21
  )
22
  from env.tasks import TaskManager
 
23
 
24
  logging.basicConfig(level=logging.INFO)
25
  logger = logging.getLogger(__name__)
@@ -677,6 +678,36 @@ async def list_sessions():
677
  }
678
 
679
 
680
  @app.post("/generate-datasets")
681
  async def generate_datasets():
682
  """Generate all task datasets"""
 
20
  QualityMetrics, SchemaInfo, IssueSummary
21
  )
22
  from env.tasks import TaskManager
23
+ from env.openenv_config import create_openenv_config, check_openenv_env_vars, print_openenv_status
24
 
25
  logging.basicConfig(level=logging.INFO)
26
  logger = logging.getLogger(__name__)
 
678
  }
679
 
680
 
681
+ @app.get("/openenv/config")
682
+ async def get_openenv_config():
683
+ """Get OpenEnv configuration"""
684
+ config = create_openenv_config()
685
+ return {
686
+ "config": config.to_dict(),
687
+ "environment_variables": check_openenv_env_vars(),
688
+ "valid": config.validate()
689
+ }
690
+
691
+
692
+ @app.get("/openenv/status")
693
+ def get_openenv_status():
694
+ """Get OpenEnv status and configuration"""
695
+ config = create_openenv_config()
696
+ env_vars = check_openenv_env_vars()
697
+
698
+ return {
699
+ "openenv_configured": config.validate(),
700
+ "name": config.name,
701
+ "version": config.version,
702
+ "port": config.port,
703
+ "environment_type": config.env_type,
704
+ "max_steps": config.max_steps,
705
+ "preview_rows": config.preview_rows,
706
+ "environment_variables": env_vars,
707
+ "status": "healthy" if config.validate() else "configuration_error"
708
+ }
709
+
710
+
711
  @app.post("/generate-datasets")
712
  async def generate_datasets():
713
  """Generate all task datasets"""
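The `/openenv/status` handler added above calls `config.validate()` twice, once for `openenv_configured` and once for `status`. One possible tidy-up is to validate once and reuse the result, sketched below. `build_status` is a hypothetical helper, and the `SimpleNamespace` stub only stands in for `OpenEnvConfig` so the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Dict


def build_status(config: Any, env_vars: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: assemble the status payload, validating only once."""
    is_valid = config.validate()
    return {
        "openenv_configured": is_valid,
        "name": config.name,
        "version": config.version,
        "port": config.port,
        "environment_type": config.env_type,
        "max_steps": config.max_steps,
        "preview_rows": config.preview_rows,
        "environment_variables": env_vars,
        "status": "healthy" if is_valid else "configuration_error",
    }


# Stand-in for OpenEnvConfig with the attributes the handler reads.
stub = SimpleNamespace(
    name="data-cleaning-env", version="1.0.0", port=7860,
    env_type="data_cleaning", max_steps=50, preview_rows=10,
    validate=lambda: True,
)
print(build_status(stub, {})["status"])
```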
env/__pycache__/openenv_config.cpython-314.pyc ADDED
Binary file (9.96 kB).
 
env/openenv_config.py ADDED
@@ -0,0 +1,155 @@
1
+ """
2
+ OpenEnv Configuration Module
3
+ """
4
+
5
+ import os
6
+ from typing import Optional, Dict, Any
7
+
8
+
9
+ class OpenEnvConfig:
10
+ """Configuration for OpenEnv environment"""
11
+
12
+ def __init__(self):
13
+ self.name = os.getenv("OPENENV_NAME", "data-cleaning-env")
14
+ self.version = os.getenv("OPENENV_VERSION", "1.0.0")
15
+ self.description = os.getenv("OPENENV_DESCRIPTION", "OpenEnv Data Cleaning and Validation Environment")
16
+ self.runtime_type = os.getenv("OPENENV_RUNTIME_TYPE", "docker")
17
+ self.port = int(os.getenv("OPENENV_PORT", "7860"))
18
+ self.health_check = os.getenv("OPENENV_HEALTH_CHECK", "/health")
19
+
20
+ # Environment settings
21
+ self.env_type = os.getenv("OPENENV_ENV_TYPE", "data_cleaning")
22
+ self.max_steps = int(os.getenv("OPENENV_MAX_STEPS", "50"))
23
+ self.preview_rows = int(os.getenv("OPENENV_PREVIEW_ROWS", "10"))
24
+
25
+ # Task settings
26
+ self.task_levels = ["easy", "medium", "hard"]
27
+ self.default_task_level = os.getenv("OPENENV_DEFAULT_TASK_LEVEL", "easy")
28
+
29
+ # Grading settings
30
+ self.grading_method = os.getenv("OPENENV_GRADING_METHOD", "deterministic")
31
+ self.resolution_bonus = float(os.getenv("OPENENV_RESOLUTION_BONUS", "0.2"))
32
+ self.error_penalty = float(os.getenv("OPENENV_ERROR_PENALTY", "0.05"))
33
+
34
+ def to_dict(self) -> Dict[str, Any]:
35
+ """Convert configuration to dictionary"""
36
+ return {
37
+ "name": self.name,
38
+ "version": self.version,
39
+ "description": self.description,
40
+ "runtime": {
41
+ "type": self.runtime_type,
42
+ "port": self.port,
43
+ "health_check": self.health_check
44
+ },
45
+ "environment": {
46
+ "type": self.env_type,
47
+ "max_steps": self.max_steps,
48
+ "preview_rows": self.preview_rows
49
+ },
50
+ "grading": {
51
+ "method": self.grading_method,
52
+ "resolution_bonus": self.resolution_bonus,
53
+ "error_penalty": self.error_penalty
54
+ }
55
+ }
56
+
57
+ def validate(self) -> bool:
58
+ """Validate configuration"""
59
+ if self.port <= 0 or self.port > 65535:
60
+ return False
61
+ if self.max_steps <= 0:
62
+ return False
63
+ if self.preview_rows <= 0:
64
+ return False
65
+ if self.default_task_level not in self.task_levels:
66
+ return False
67
+ return True
68
+
69
+ def get_openenv_yaml(self) -> str:
70
+ """Generate OpenEnv YAML configuration"""
71
+ config = self.to_dict()
72
+
73
+ yaml_content = f"""name: {config['name']}
74
+ version: {config['version']}
75
+ description: {config['description']}
76
+
77
+ runtime:
78
+ type: {config['runtime']['type']}
79
+ port: {config['runtime']['port']}
80
+ health_check: {config['runtime']['health_check']}
81
+
82
+ environment:
83
+ type: {config['environment']['type']}
84
+ max_steps: {config['environment']['max_steps']}
85
+ preview_rows: {config['environment']['preview_rows']}
86
+
87
+ grading:
88
+ method: {config['grading']['method']}
89
+ resolution_bonus: {config['grading']['resolution_bonus']}
90
+ error_penalty: {config['grading']['error_penalty']}
91
+
92
+ # Environment variables used:
93
+ # OPENENV_NAME: {self.name}
94
+ # OPENENV_VERSION: {self.version}
95
+ # OPENENV_DESCRIPTION: {self.description}
96
+ # OPENENV_RUNTIME_TYPE: {self.runtime_type}
97
+ # OPENENV_PORT: {self.port}
98
+ # OPENENV_HEALTH_CHECK: {self.health_check}
99
+ # OPENENV_ENV_TYPE: {self.env_type}
100
+ # OPENENV_MAX_STEPS: {self.max_steps}
101
+ # OPENENV_PREVIEW_ROWS: {self.preview_rows}
102
+ # OPENENV_DEFAULT_TASK_LEVEL: {self.default_task_level}
103
+ # OPENENV_GRADING_METHOD: {self.grading_method}
104
+ # OPENENV_RESOLUTION_BONUS: {self.resolution_bonus}
105
+ # OPENENV_ERROR_PENALTY: {self.error_penalty}
106
+ """
107
+ return yaml_content
108
+
109
+
110
+ def create_openenv_config() -> OpenEnvConfig:
111
+ """Get OpenEnv configuration instance"""
112
+ return OpenEnvConfig()
113
+
114
+
115
+ def check_openenv_env_vars() -> Dict[str, str]:
116
+ """Check which OpenEnv environment variables are set"""
117
+ env_vars = {
118
+ 'OPENENV_NAME': os.getenv('OPENENV_NAME', 'NOT_SET'),
119
+ 'OPENENV_VERSION': os.getenv('OPENENV_VERSION', 'NOT_SET'),
120
+ 'OPENENV_DESCRIPTION': os.getenv('OPENENV_DESCRIPTION', 'NOT_SET'),
121
+ 'OPENENV_RUNTIME_TYPE': os.getenv('OPENENV_RUNTIME_TYPE', 'NOT_SET'),
122
+ 'OPENENV_PORT': os.getenv('OPENENV_PORT', 'NOT_SET'),
123
+ 'OPENENV_HEALTH_CHECK': os.getenv('OPENENV_HEALTH_CHECK', 'NOT_SET'),
124
+ 'OPENENV_ENV_TYPE': os.getenv('OPENENV_ENV_TYPE', 'NOT_SET'),
125
+ 'OPENENV_MAX_STEPS': os.getenv('OPENENV_MAX_STEPS', 'NOT_SET'),
126
+ 'OPENENV_PREVIEW_ROWS': os.getenv('OPENENV_PREVIEW_ROWS', 'NOT_SET'),
127
+ 'OPENENV_DEFAULT_TASK_LEVEL': os.getenv('OPENENV_DEFAULT_TASK_LEVEL', 'NOT_SET'),
128
+ 'OPENENV_GRADING_METHOD': os.getenv('OPENENV_GRADING_METHOD', 'NOT_SET'),
129
+ 'OPENENV_RESOLUTION_BONUS': os.getenv('OPENENV_RESOLUTION_BONUS', 'NOT_SET'),
130
+ 'OPENENV_ERROR_PENALTY': os.getenv('OPENENV_ERROR_PENALTY', 'NOT_SET'),
131
+ }
132
+ return env_vars
133
+
134
+
135
+ def print_openenv_status():
136
+ """Print OpenEnv configuration status"""
137
+ config = create_openenv_config()
138
+ env_vars = check_openenv_env_vars()
139
+
140
+ print("=== OpenEnv Configuration Status ===")
141
+ print(f"Configuration valid: {config.validate()}")
142
+ print(f"Name: {config.name}")
143
+ print(f"Version: {config.version}")
144
+ print(f"Port: {config.port}")
145
+ print(f"Environment Type: {config.env_type}")
146
+ print(f"Max Steps: {config.max_steps}")
147
+ print(f"Preview Rows: {config.preview_rows}")
148
+ print()
149
+ print("Environment Variables:")
150
+ for key, value in env_vars.items():
151
+ status = "✓" if value != "NOT_SET" else "✗"
152
+ print(f" {status} {key}: {value}")
153
+ print()
154
+ print("OpenEnv YAML Configuration:")
155
+ print(config.get_openenv_yaml())
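`OpenEnvConfig.__init__` converts values with `int(...)` and `float(...)` directly, so a malformed value such as `OPENENV_MAX_STEPS=fifty` raises `ValueError` while the object is being constructed, before `validate()` has a chance to return `False`. A minimal sketch of a more forgiving reader follows. `env_int` is a hypothetical helper, not part of this commit, and falling back to the default is only one possible policy.

```python
import os


def env_int(name: str, default: int) -> int:
    """Hypothetical helper: read an integer setting, using the default on bad input."""
    raw = os.getenv(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Malformed value: fall back rather than crash at construction time.
        return default


os.environ["OPENENV_MAX_STEPS"] = "fifty"  # malformed on purpose
print(env_int("OPENENV_MAX_STEPS", 50))    # falls back to the default
os.environ["OPENENV_MAX_STEPS"] = "100"
print(env_int("OPENENV_MAX_STEPS", 50))    # parses the valid override
del os.environ["OPENENV_MAX_STEPS"]
```

An alternative would be to record the parse failure and have `validate()` return `False`, which would surface the problem through `/openenv/status` instead of hiding it.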