Spaces:

Arry7868
/

Kaggle_Simulation_Environment

Sleeping

App Files Files Community

Kaggle_Simulation_Environment / README.md

Arry7868

Added training Script

a0e77e6 verified 2 months ago

preview code

Raw

History Blame Contribute Delete

9.53 kB

	---
	title: KaggleSimEnv
	emoji: 🚀
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_file: server/app.py
	pinned: false
	---

	# KaggleSimEnv v3

	Production-grade OpenEnv RL environment simulating Kaggle competitions with hierarchical action categories, causal dataset properties, failure-mode traps, contextual strategy scoring, and 50+ advanced strategies.

	> 🤗 Live on Hugging Face Spaces: [https://huggingface.co/spaces/aadi-gupta/kaggle-sim-env](https://huggingface.co/spaces/aadi-gupta/kaggle-sim-env)
	>
	> 📓 Training Notebook (Colab): [train_grpo.ipynb](train_grpo.ipynb) — GRPO training with Unsloth + TRL on a free T4 GPU
	>
	> 📝 Writeup: (link your HF blog post or YouTube video here once published)

	---

	## Training Results

	We trained a `Qwen2.5-0.5B-Instruct` agent using GRPO (Group Relative Policy Optimisation) via TRL + Unsloth.
	The model learns to generate action plans that score higher against the env compared to a random agent.

	### Episode Reward Curve

	![Reward curve — random vs expert baseline over 30 episodes](plots/reward_curve.png)
	X-axis: episode number. Y-axis: final grade score (0–1). Smoothed with a rolling window of 8.

	### Loss Curve (score gap to optimal)

	![Loss curve — score gap (1 − score) per episode](plots/loss_curve.png)
	Lower is better. Expert baseline (blue) consistently closes the gap faster than the random agent (red).

	### Per-task Score: Random vs Expert Baseline

	![Baseline vs trained comparison per task](plots/baseline_vs_trained.png)
	Expert baseline outperforms random agent across all 5 tasks (30-episode mean scores).

	### Quantitative Results (30-episode run)

	\| Task \| Random agent \| Expert baseline \| Delta \|
	\|---\|---\|---\|---\|
	\| easy_churn \| 0.41 \| 1.00 \| +0.59 \|
	\| medium_fraud \| 0.26 \| 0.78 \| +0.52 \|
	\| hard_leaky_noisy \| 0.13 \| 0.64 \| +0.51 \|
	\| image_quality \| 0.05 \| 0.48 \| +0.43 \|
	\| trajectory_pred \| 0.06 \| 0.48 \| +0.42 \|
	\| Mean \| 0.18 \| 0.68 \| +0.50 \|

	> To reproduce plots: `python generate_training_plots_stub.py --episodes 30`
	> To reproduce full GRPO training: open `train_grpo.ipynb` in Google Colab (T4 GPU, ~25 min).

	---

	## Quick Start

	```bash
	pip install -r requirements.txt
	uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
	```

	### Baseline agent

	```bash
	export OPENAI_API_KEY=sk-...
	python -m baseline.run_baseline --mode local
	```

	---

	## Architecture

	```
	openenvHackathon/
	├── kaggle_sim_env/
	│ ├── models.py # Hierarchical categories, DatasetProperties, FailureMode
	│ ├── environment.py # Causal logic, trap detection, mitigation tracking
	│ ├── tasks.py # 5 tasks with properties, traps, context relevance
	│ ├── grader.py # 4-axis grading (perf + strategy + combo + trap)
	│ ├── leaderboard.py # Ghost competitor leaderboard
	│ ├── hints.py # Per-task hint dispensing
	│ └── rewards.py # 9-component dense reward
	├── api/server.py # FastAPI (8 endpoints)
	├── baseline/run_baseline.py # Structured phase-based agent
	├── openenv.yaml / Dockerfile / requirements.txt
	```

	---

	## Hierarchical Action Space

	Actions use `category` to reduce search space:

	```json
	{
	"action_type": "feature_engineering",
	"parameters": {
	"category": "distribution",
	"technique": "log_transform"
	}
	}
	```

	\| Action Type \| Categories → Techniques \|
	\|---\|---\|
	\| `set_cv` \| standard(kfold, repeated_kfold) · group(group_kfold, stratified_group_kfold) · temporal(time_split, combined_group_time) \|
	\| `feature_engineering` \| distribution(log_transform, normalize, quantile_features) · interaction(interaction_terms, domain_ratios) · encoding(sin_cos_encoding, target_encoding, spatial_encoding, tfidf_features) · spatial(relative_coordinates, distance_features) · signal(frequency_features, multi_layer_features, fourier_resampling) \|
	\| `detect_shift` \| detection(adversarial_validation, feature_importance_shift) · mitigation(remove_identifiers, domain_invariant_features) \|
	\| `train_model` \| tree(xgboost, lightgbm, catboost, random_forest) · linear(linear) · neural(neural_network, pretrained_backbone, temporal_cnn, transformer_encoder) \|
	\| `handle_imbalance` \| weighting(scale_pos_weight, class_weighted_loss) · calibration(calibrate_probabilities, optimize_threshold) · hierarchy(hierarchical_labels, lower_thresholds_recall) \|
	\| `clean_data` \| removal(remove_corrupted, remove_outliers, remove_leaky_features) · reconstruction(analytical_reconstruction, nan_native_model, domain_augmentation, clean_subset_training) \|
	\| `augmentation` \| geometric(geometric, rotation_invariant, image_rectification) · color(color_transform, clahe) · noise(gaussian_noise, robustness_augmentation) · domain(camera_simulation, temporal_augmentation, symmetry_augmentation, multi_view_processing) \|
	\| `ensemble` \| averaging(weighted_average, multi_seed_averaging, swa) · stacking(stacking) · diversity(diverse_features, heterogeneous) \|
	\| `postprocess` \| calibration(bias_correction, prediction_shrinkage, per_group_calibration) · domain(domain_rules, physics_constraints) · inference(tta) \|
	\| `tune_loss` \| asymmetric(asymmetric_loss, epsilon_insensitive) · uncertainty(gaussian_nll) · multi_objective(multi_task, interval_regression, quantile_regression) · weighting(sample_weighted, auxiliary_physics_loss) \|
	\| `regularize` \| weight(strong_regularization, ema, dropout) · transfer(freeze_backbone) \|

	Plus: `pseudo_label` (iterations), `inspect_top_solution`, `submit`

	---

	## Causal Dataset Properties

	Each task has ground-truth properties that drive causal reward logic:

	```python
	DatasetProperties(
	has_shift=True, # Actions addressing shift are rewarded
	has_leakage=True, # Cleaning leaky features is critical
	has_noise_features=True, # Interaction terms on noise amplify it
	has_missing_data=True, # Reconstruction strategies get bonus
	has_imbalance=True, # Scale_pos_weight becomes relevant
	has_images=False, # Image augmentation is irrelevant → penalty
	needs_physics=False, # Physics loss is irrelevant → penalty
	)
	```

	Actions are scored based on whether they match the dataset:

	```
	if dataset.has_shift and action == "adversarial_validation":
	reward += context_bonus # Relevant!
	elif not dataset.has_images and action == "geometric_augmentation":
	reward += irrelevant_penalty # Wrong domain!
	```

	---

	## Failure-Mode Traps

	The environment contains traps that punish common mistakes:

	\| Trap \| Trigger \| Effect \| Mitigation \|
	\|---\|---\|---\|---\|
	\| kfold_on_temporal_data \| Using kfold when has_shift \| CV +0.08, test -0.04 \| Use time_split instead \|
	\| ignoring_shift \| Training without addressing shift \| test -0.06 \| Detect shift first \|
	\| keeping_leaky_feature \| Training when has_leakage \| test -0.08 \| Clean leaky features first \|
	\| target_encoding_leakage \| target_encoding on shifted data \| CV +0.05, test -0.06 \| Don't use it \|
	\| interaction_terms_on_noise \| interaction_terms when noise \| CV +0.05, test -0.04 \| Avoid on noisy data \|
	\| tree_model_on_images \| xgboost on image data \| CV +0.04, test -0.02 \| Use pretrained_backbone \|
	\| no_augmentation_on_images \| Submit without augmentation \| test -0.04 \| Apply augmentation \|
	\| raw_heading_without_sincos \| Submit without sin_cos_encoding \| test -0.03 \| Encode angles properly \|

	Traps can be mitigated by taking the correct action first. The environment tracks mitigations.

	---

	## Grading (4 Axes)

	```
	final = 0.40×performance + 0.25×strategy + 0.20×combo + 0.15×trap_avoidance
	```

	\| Component \| Description \|
	\|---\|---\|
	\| performance \| Test score vs ghost competitors \|
	\| strategy \| Contextual — penalises irrelevant strategies used \|
	\| combo \| Fraction of synergy combos activated \|
	\| trap_avoidance \| 1.0 minus fraction of traps triggered \|

	---

	## Reward Function (9 Components)

	\| Component \| Description \|
	\|---\|---\|
	\| `cv_improvement` \| Δ CV score \|
	\| `strategy_bonus` \| +0.05 for expected strategy \|
	\| `context_bonus` \| +0.03×relevance (positive) or -0.04×relevance (negative) \|
	\| `combo_bonus` \| +0.08 per combo completed \|
	\| `redundancy_penalty` \| -0.03 × repeat count \|
	\| `irrelevant_penalty` \| -0.05 for actions with relevance ≤ -0.8 \|
	\| `trap_penalty` \| -0.08 per trap triggered \|
	\| `overfitting_penalty` \| -0.5 × gap when CV-test > 0.05 \|
	\| `submission_bonus` \| 0.5 × test_score \|

	---

	## Tasks (5)

	\| Task \| Difficulty \| Traps \| Combos \| Key Challenge \|
	\|---\|---\|---\|---\|---\|
	\| `easy_churn` \| Easy \| 3 \| 2 \| Clean tabular, mild imbalance \|
	\| `medium_fraud` \| Medium \| 3 \| 3 \| Shift, heavy imbalance, safety-critical \|
	\| `hard_leaky_noisy` \| Hard \| 4 \| 4 \| Leakage, noise, missing data, shift \|
	\| `image_quality` \| Hard \| 2 \| 4 \| Heavy-tailed, camera bias, augmentation \|
	\| `trajectory_pred` \| Hard \| 2 \| 4 \| Multi-agent, physics, spatial-temporal \|

	---

	## Baseline Agent

	Structured multi-phase approach:
	1. Inspect hints (1-2)
	2. Diagnose dataset properties
	3. Clean if needed
	4. CV appropriate for domain
	5. Features domain-relevant only
	6. Train right model family
	7. Tune imbalance/loss
	8. Ensemble (1-2 techniques)
	9. Submit

	Keeps actions to 8-15 total. Uses hints to inform decisions.

	---

	## Docker

	```bash
	docker build -t kaggle-sim-env .
	docker run -p 7860:7860 kaggle-sim-env
	```

	---

	## License

	MIT