	---
	license: agpl-3.0
	tags:
	- learning-augmented-algorithms
	- automated-algorithm-design
	- crispo
	- code-generation
	- ai-code-generation
	---

	# Crispo: Autonomous Co-Design of ML Predictors and Learning-Augmented Algorithms

	Crispo is a production-ready, research-grade system for the automated co-design of Learning-Augmented Algorithms (LAA). It transforms high-level objectives into complete, two-part "Solution Packages" containing a machine learning predictor and a specialized algorithm that consumes its predictions.

	## 🎯 System Overview

	The core innovation of Crispo is its ability to bridge the gap between machine learning and classical algorithm design. For a given online problem (e.g., ski rental), it generates:

	1. A Predictor Script: An ML model (e.g., ARIMA) that learns from historical data to predict future values and quantifies its own uncertainty.
	2. An Algorithm Script: A Learning-Augmented Algorithm that takes the ML prediction as input and intelligently balances it against a robust worst-case strategy using a trust parameter (λ).
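
As a minimal sketch of how a trust parameter can balance a prediction against a worst-case strategy, the block below follows a standard deterministic scheme for ski rental from the learning-augmented algorithms literature. It is not taken from Crispo's generated code: the function names `buy_day` and `total_cost` are hypothetical, renting is assumed to cost 1 per day, and the orientation of the parameter (smaller values mean more trust in the prediction) is an assumption that may differ from Crispo's convention.

```python
import math


def buy_day(predicted_days: float, buy_cost: int, trust: float) -> int:
    """Return the day on which to buy skis.

    `trust` lies in (0, 1]; smaller values follow the prediction more
    closely (assumed orientation, not confirmed by Crispo).
    """
    if not 0 < trust <= 1:
        raise ValueError("trust must be in (0, 1]")
    if predicted_days >= buy_cost:
        # The prediction says the season is long: buy early.
        return math.ceil(trust * buy_cost)
    # The prediction says the season is short: delay the purchase.
    return math.ceil(buy_cost / trust)


def total_cost(actual_days: int, buy_cost: int, day_to_buy: int) -> int:
    """Rent at 1 per day until `day_to_buy`, then pay `buy_cost` once."""
    if actual_days < day_to_buy:
        return actual_days  # the season ended before the purchase
    return (day_to_buy - 1) + buy_cost
```

Under this scheme the cost is at most (1 + λ) times the offline optimum when the prediction is exact, and at most (1 + 1/λ) times the optimum however wrong the prediction is, which is the trade-off the trust parameter controls.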

	The entire system is designed to be autonomous, optimizing its own components and learning from past performance to improve future solutions.

	## 🏗️ Core Architecture

	Crispo is built on a three-tier optimization stack, ensuring a clear separation of concerns:

	1. Genetic Algorithm (Strategic): The `GAOptimizer` evolves high-level parameters for code generation, searching for the best overall strategy. It now features adaptive population sizing for improved efficiency.
	2. Reinforcement Learning (Tactical): The `RLAgent` fine-tunes the parameters for a specific layer, using a Q-table to learn optimal, context-aware adjustments. The Q-table is now pruned to prevent unbounded memory growth.
	3. Attention Mechanism (Coordination): The `AttentionRouter` allows different layers of the generated pipeline to share information, ensuring a cohesive and well-coordinated final output.
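
The Q-table pruning mentioned for the tactical tier can be illustrated with a small tabular Q-learning sketch. The class name `PrunedQTable`, the entry cap, and the "drop the least-visited entries" criterion are all assumptions made for illustration; the README does not say which criterion `RLAgent` actually uses.

```python
class PrunedQTable:
    """Tabular Q-values with a hard cap on the number of stored entries."""

    def __init__(self, max_entries: int, alpha: float = 0.1, gamma: float = 0.9):
        self.max_entries = max_entries
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = {}         # (state, action) -> estimated value
        self.visits = {}    # (state, action) -> number of updates

    def update(self, state, action, reward, next_state, next_actions):
        """Apply one standard Q-learning update."""
        best_next = max(
            (self.q.get((next_state, a), 0.0) for a in next_actions),
            default=0.0,
        )
        key = (state, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward + self.gamma * best_next - old)
        self.visits[key] = self.visits.get(key, 0) + 1

    def prune(self) -> int:
        """Drop the least-visited entries until the table fits the cap."""
        excess = len(self.q) - self.max_entries
        if excess <= 0:
            return 0
        coldest = sorted(self.q, key=lambda k: self.visits[k])[:excess]
        for key in coldest:
            del self.q[key]
            del self.visits[key]
        return excess
```

Calling `prune()` at the end of each episode keeps memory bounded by `max_entries`, at the cost of forgetting rarely visited state-action pairs.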

	This stack feeds into an intent-driven `CodeGenerator` that selects and parameterizes code templates based on the user's objective.

	## ✨ Key Features & Innovations

	### 1. Learning-Augmented Algorithm (LAA) Co-Design

	Crispo's primary feature is its end-to-end framework for generating and evaluating LAAs. The system automatically co-designs a predictor and an algorithm that work in tandem.

	### 2. Two-Stage "Live" Evaluation

	To ensure solutions are robust, the `Verifier` performs a rigorous, two-stage evaluation that simulates a real-world deployment:

	* Stage 1: Prediction: The generated predictor is run on historical data to produce a "live" prediction interval.
	* Stage 2: Execution: The generated algorithm is run with the live prediction, and its performance (e.g., `competitive_ratio`) is measured.

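	Because the algorithm is scored on the predictor's actual output rather than on a mocked prediction, the measured performance includes the effect of real prediction error, which gives a more realistic assessment than mock evaluations.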
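
The two stages can be sketched as a single function that chains a predictor and an algorithm and reports a competitive ratio. Everything here is an illustrative assumption rather than the `Verifier`'s real interface: the names `naive_predictor`, `two_stage_evaluate`, `follow_prediction`, and `offline_optimum` are hypothetical, and the predictor is a simple mean-and-spread stand-in, not an ARIMA model.

```python
import statistics
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]


def naive_predictor(history: Sequence[float]) -> Interval:
    """Stand-in predictor: mean plus or minus two sample standard deviations."""
    mean = statistics.mean(history)
    spread = 2 * statistics.stdev(history)
    return (mean - spread, mean + spread)


def two_stage_evaluate(
    history: Sequence[float],
    actual: float,
    predictor: Callable[[Sequence[float]], Interval],
    algorithm: Callable[[Interval, float], float],
    optimal_cost: Callable[[float], float],
) -> float:
    """Return the algorithm's cost divided by the offline optimum."""
    interval = predictor(history)        # Stage 1: live prediction interval
    cost = algorithm(interval, actual)   # Stage 2: run with that prediction
    return cost / optimal_cost(actual)


BUY_COST = 10  # hypothetical ski purchase price; renting costs 1 per day


def follow_prediction(interval: Interval, actual_days: float) -> float:
    """Buy on day one if the interval midpoint reaches BUY_COST, else rent."""
    midpoint = sum(interval) / 2
    return BUY_COST if midpoint >= BUY_COST else actual_days


def offline_optimum(actual_days: float) -> float:
    return min(actual_days, BUY_COST)


good = two_stage_evaluate([12, 15, 14, 13], 14, naive_predictor,
                          follow_prediction, offline_optimum)
bad = two_stage_evaluate([2, 3, 2, 3], 30, naive_predictor,
                         follow_prediction, offline_optimum)
```

In this toy setup a history that matches the real season yields a ratio of 1.0, while a misleading history yields 3.0, which is the kind of degradation a mocked prediction would hide.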

	### 3. Solution Registry

	Verified solutions are automatically versioned and saved to the `solution_registry/` directory. This creates a persistent, queryable knowledge base of high-quality solutions.

	Example Query:
	```bash
	python3 crispo.py --query-registry "competitive_ratio:1.2"
	```

	### 4. Meta-Learning with UCB1

	The `MetaLearner` allows Crispo to learn from its own performance. It has been upgraded from a simple epsilon-greedy strategy to an Upper Confidence Bound (UCB1) algorithm, which provides a more principled and efficient balance between exploring new strategies and exploiting known good ones.
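
For readers unfamiliar with UCB1, the sketch below shows the standard selection rule: play every option once, then pick the option with the highest mean reward plus the exploration bonus sqrt(2 ln t / n). The class name `UCB1` and the idea of treating each strategy as an "arm" are illustrative assumptions, not the `MetaLearner`'s actual interface; rewards are assumed to lie in [0, 1].

```python
import math


class UCB1:
    """Minimal UCB1 bandit over a fixed set of named arms."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}    # pulls per arm
        self.totals = {arm: 0.0 for arm in arms}  # summed reward per arm

    def select(self):
        # Play each arm once before applying the confidence bound.
        for arm, pulls in self.counts.items():
            if pulls == 0:
                return arm
        t = sum(self.counts.values())

        def score(arm):
            pulls = self.counts[arm]
            mean = self.totals[arm] / pulls
            bonus = math.sqrt(2 * math.log(t) / pulls)
            return mean + bonus

        return max(self.counts, key=score)

    def update(self, arm, reward: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += reward
```

Unlike epsilon-greedy, the exploration bonus shrinks automatically for arms that have been tried often, so no exploration rate needs to be tuned by hand.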

	## ⚙️ Component Analysis & Recent Improvements

	- `GAOptimizer`: Now uses adaptive population sizing, scaling the population with problem complexity to improve search efficiency. It also evaluates fitness in parallel using a `ProcessPoolExecutor`.
	- `RLAgent`: The Q-table is now pruned after each training episode to prevent memory exhaustion in long-running sessions.
	- `Verifier`: Now includes a `PredictorEvaluator` that calculates Uncertainty Quantification (UQ) metrics (`coverage_rate` and `interval_sharpness`) for the generated predictor, providing a more complete picture of the solution's quality.
	- Security: Subprocess execution is now sandboxed with resource limits to prevent runaway processes, and file writes are validated to prevent directory traversal attacks.
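
The two UQ metrics named above can be computed as follows, assuming their common definitions: `coverage_rate` as the fraction of true values that fall inside their prediction interval, and `interval_sharpness` as the mean interval width. Whether `PredictorEvaluator` uses exactly these definitions is an assumption; the function signatures are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def coverage_rate(intervals: Sequence[Interval], actuals: Sequence[float]) -> float:
    """Fraction of actual values that fall inside their interval (inclusive)."""
    if not intervals or len(intervals) != len(actuals):
        raise ValueError("need one non-empty interval list matching actuals")
    hits = sum(low <= y <= high for (low, high), y in zip(intervals, actuals))
    return hits / len(actuals)


def interval_sharpness(intervals: Sequence[Interval]) -> float:
    """Mean interval width; smaller values mean sharper predictions."""
    if not intervals:
        raise ValueError("intervals must not be empty")
    return sum(high - low for low, high in intervals) / len(intervals)
```

The two metrics pull in opposite directions: very wide intervals reach high coverage trivially, so a predictor should be judged on both together.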

	## 🚀 Advanced Features

	### Bayesian Neural Architecture Search (NAS)

	The NAS pipeline has been upgraded from a random search to a Bayesian Optimization strategy, using Gaussian Processes to intelligently search for optimal neural network architectures. This results in a ~10x speedup over the previous random search in finding near-optimal architectures.

	### Federated Optimizer

	The placeholder `FederatedOptimizer` has been replaced with a functional Federated Averaging (FedAvg) implementation, enabling true federated learning across multiple clients.
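
The core aggregation step of Federated Averaging is a weighted mean of client parameters, with each client weighted by its number of training samples. The sketch below shows only that step, on flat lists of floats; the function name `federated_average` is hypothetical, and the way `FederatedOptimizer` represents model weights is not stated in this README.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by client sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != len(averaged):
            raise ValueError("all clients must share the same parameter shape")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged
```

In a full training loop, the server would broadcast the averaged parameters back to the clients and repeat this step after each round of local training.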

	### Transfer Learning

	A production-ready, three-step transfer learning pipeline (`load_model`, `apply_model`, `log_to_registry`) is available to transfer knowledge from previously trained models.

	## Usage

	Crispo is a command-line tool. The main entry point is `crispo.py`.

	### Basic Example

	```bash
	python3 crispo.py --project "MyDataPipeline" --objective "Fetch data from an API, process it with pandas, and analyze with numpy"
	```

	### LAA Co-Design Example

	To generate a Learning-Augmented Algorithm for the ski rental problem:

	```bash
	python3 crispo.py --project "SkiRentalLAA" --objective "Generate a learning-augmented algorithm for the ski rental problem" --trust-parameter 0.7
	```

	Note: This requires a `ski_rental_history.csv` file in the root directory.

	### Enabling Advanced Features

	```bash
	python3 crispo.py --objective "Optimize a deep learning model" \
	--enable-nas \
	--enable-transfer-learning \
	--enable-federated-optimization
	```

	### Saving and Loading Meta-Knowledge

	You can persist the `MetaLearner`'s state across runs:

	```bash
	# Save the learned state
	python3 crispo.py --objective "My first run" --save-metaknowledge knowledge.pkl

	# Load the state for a new run
	python3 crispo.py --objective "My second run, building on the first" --load-metaknowledge knowledge.pkl
	```

	## Licensing

	Crispo is licensed under the GNU Affero General Public License v3.0 (AGPLv3). You are free to use, modify, and distribute this software under the terms of that license, which requires that modified versions, including those made available to users over a network, be released with their source code under the AGPLv3.

	To use Crispo in a closed-source or proprietary application without these obligations, a separate commercial license is required. Please contact us at `your.email@example.com` to inquire about obtaining a commercial license.

	## Testing

	The project uses the built-in `unittest` framework. To run the full test suite:

	```bash
	python3 -m unittest test_crispo.py
	```