license: agpl-3.0
tags:
  - learning-augmented-algorithms
  - automated-algorithm-design
  - crispo
  - code-generation
  - ai-code-generation

Crispo: Autonomous Co-Design of ML Predictors and Learning-Augmented Algorithms

Crispo is a production-ready, research-grade system for the automated co-design of Learning-Augmented Algorithms (LAA). It transforms high-level objectives into complete, two-part "Solution Packages" containing a machine learning predictor and a specialized algorithm that consumes its predictions.

🎯 System Overview

The core innovation of Crispo is its ability to bridge the gap between machine learning and classical algorithm design. For a given online problem (e.g., ski rental), it generates:

  1. A Predictor Script: An ML model (e.g., ARIMA) that learns from historical data to predict future values and quantifies its own uncertainty.
  2. An Algorithm Script: A Learning-Augmented Algorithm that takes the ML prediction as input and intelligently balances it against a robust worst-case strategy using a trust parameter (λ).
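
As a rough illustration of how a trust parameter can balance a prediction against a worst-case strategy, the sketch below follows the deterministic learning-augmented ski rental rule known from the literature (Purohit, Svitkina and Kumar, 2018). It is not Crispo's generated code: the function names are hypothetical, rental is assumed to cost 1 per day, and in this formulation a smaller λ places more trust in the prediction, which may differ from how Crispo interprets --trust-parameter.

```python
import math


def buy_day(predicted_days: float, buy_cost: int, lam: float) -> int:
    """Return the day on which to buy skis (hypothetical helper).

    If the prediction says the season is long, buy early (day ceil(lam * b));
    otherwise delay the purchase (day ceil(b / lam)) as a safeguard.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    if predicted_days >= buy_cost:
        return math.ceil(lam * buy_cost)
    return math.ceil(buy_cost / lam)


def ski_rental_cost(actual_days: int, buy_cost: int, day_to_buy: int) -> int:
    """Total cost when renting at 1 per day and buying at the start of day_to_buy."""
    if actual_days < day_to_buy:
        return actual_days  # the season ended before we bought
    return (day_to_buy - 1) + buy_cost


def competitive_ratio(actual_days: int, buy_cost: int,
                      predicted_days: float, lam: float) -> float:
    """Algorithm cost divided by the offline optimum min(actual_days, buy_cost)."""
    if actual_days < 1:
        raise ValueError("actual_days must be at least 1")
    cost = ski_rental_cost(actual_days, buy_cost,
                           buy_day(predicted_days, buy_cost, lam))
    return cost / min(actual_days, buy_cost)
```

With buy_cost=10 and lam=0.5, an accurate prediction of a 20-day season gives a ratio within the consistency bound 1 + λ, while a badly wrong prediction stays within the robustness bound 1 + 1/λ.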

The entire system is designed to be autonomous, optimizing its own components and learning from past performance to improve future solutions.

πŸ—οΈ Core Architecture

Crispo is built on a three-tier optimization stack, ensuring a clear separation of concerns:

  1. Genetic Algorithm (Strategic): The GAOptimizer evolves high-level parameters for code generation, searching for the best overall strategy. It now features adaptive population sizing for improved efficiency.
  2. Reinforcement Learning (Tactical): The RLAgent fine-tunes the parameters for a specific layer, using a Q-table to learn optimal, context-aware adjustments. The Q-table is now pruned to prevent unbounded memory growth.
  3. Attention Mechanism (Coordination): The AttentionRouter allows different layers of the generated pipeline to share information, ensuring a cohesive and well-coordinated final output.
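
To make the idea of a bounded Q-table concrete, here is a minimal sketch of tabular Q-learning with pruning. The README does not say which entries the RLAgent discards, so the eviction policy here (drop the least recently updated entries) is an assumption, and the class and parameter names are hypothetical.

```python
from collections import OrderedDict


class PrunedQTable:
    """Tabular Q-learning store with a cap on the number of entries (illustrative)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, max_entries=1000):
        self.actions = list(actions)
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.max_entries = max_entries  # assumed cap; not taken from Crispo
        self.q = OrderedDict()          # (state, action) -> value, oldest first

    def value(self, state, action):
        # Unseen pairs default to 0.0 without being stored.
        return self.q.get((state, action), 0.0)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max_a Q(next_state, a).
        best_next = max(self.value(next_state, a) for a in self.actions)
        old = self.value(state, action)
        key = (state, action)
        self.q[key] = old + self.alpha * (reward + self.gamma * best_next - old)
        self.q.move_to_end(key)  # mark as most recently updated

    def prune(self):
        # Evict least recently updated entries until the cap is respected.
        while len(self.q) > self.max_entries:
            self.q.popitem(last=False)
```

Calling prune() at the end of each training episode keeps memory bounded at the price of forgetting values for states that have not been visited recently.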

This stack feeds into an intent-driven CodeGenerator that selects and parameterizes code templates based on the user's objective.

✨ Key Features & Innovations

1. Learning-Augmented Algorithm (LAA) Co-Design

Crispo's primary feature is its end-to-end framework for generating and evaluating LAAs. The system automatically co-designs a predictor and an algorithm that work in tandem.

2. Two-Stage "Live" Evaluation

To ensure solutions are robust, the Verifier performs a rigorous, two-stage evaluation that simulates a real-world deployment:

  • Stage 1: Prediction: The generated predictor is run on historical data to produce a "live" prediction interval.
  • Stage 2: Execution: The generated algorithm is run with the live prediction, and its performance (e.g., competitive_ratio) is measured.

Because the algorithm is scored on a prediction produced by the generated predictor itself rather than on a mocked value, the measured performance reflects how the two scripts behave together.
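
The two stages can be pictured with a small harness like the one below. This is only a sketch under stated assumptions: the predictor is a stand-in (mean plus or minus z sample standard deviations, not ARIMA), the algorithm is a simple interval-aware ski rental rule invented for illustration, and none of the function names come from Crispo's Verifier.

```python
import statistics


def predict_interval(history, z=1.96):
    """Stage 1 stand-in: an interval of mean +/- z * sample standard deviation."""
    center = statistics.mean(history)
    half_width = z * statistics.stdev(history)
    return center - half_width, center + half_width


def run_algorithm(interval, actual_days, buy_cost):
    """Stage 2 stand-in: buy on day 1 if even the lower bound reaches the buy
    cost; otherwise fall back to the break-even rule (buy on day buy_cost)."""
    low, _high = interval
    day_to_buy = 1 if low >= buy_cost else buy_cost
    if actual_days < day_to_buy:
        return actual_days  # rented every day, never bought
    return (day_to_buy - 1) + buy_cost


def evaluate(history, actual_days, buy_cost, z=1.96):
    """Run both stages and report the competitive ratio against the offline optimum."""
    interval = predict_interval(history, z)               # Stage 1: prediction
    cost = run_algorithm(interval, actual_days, buy_cost)  # Stage 2: execution
    return {
        "interval": interval,
        "cost": cost,
        "competitive_ratio": cost / min(actual_days, buy_cost),
    }
```

For example, evaluate([12, 14, 16], actual_days=15, buy_cost=10, z=1.0) buys on day 1 because the whole interval lies above the buy cost, whereas a history of short seasons falls back to the break-even rule.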

3. Solution Registry

Verified solutions are automatically versioned and saved to the solution_registry/ directory. This creates a persistent, queryable knowledge base of high-quality solutions.

Example Query:

python3 crispo.py --query-registry "competitive_ratio:1.2"

4. Meta-Learning with UCB1

The MetaLearner allows Crispo to learn from its own performance. It has been upgraded from a simple epsilon-greedy strategy to an Upper Confidence Bound (UCB1) algorithm, which provides a more principled and efficient balance between exploring new strategies and exploiting known good ones.
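
For readers unfamiliar with UCB1, the following is a minimal sketch of the standard algorithm, assuming rewards in [0, 1]. The class name and the idea of treating each strategy as an "arm" are illustrative; the MetaLearner's actual interface is not shown in this README.

```python
import math


class UCB1:
    """Textbook UCB1 bandit over a fixed set of arms (illustrative)."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}    # times each arm was played
        self.totals = {arm: 0.0 for arm in arms}  # summed reward per arm

    def select(self):
        # Play every arm once before applying the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        t = sum(self.counts.values())

        def score(arm):
            n = self.counts[arm]
            mean = self.totals[arm] / n
            bonus = math.sqrt(2.0 * math.log(t) / n)  # exploration term
            return mean + bonus

        return max(self.counts, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward
```

Unlike epsilon-greedy, the exploration bonus shrinks as an arm is sampled more often, so rarely tried strategies are revisited without a fixed random exploration rate.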

βš™οΈ Component Analysis & Recent Improvements

  • GAOptimizer: Now uses adaptive population sizing to scale its search space based on problem complexity, improving performance. It also evaluates fitness in parallel using a ProcessPoolExecutor.
  • RLAgent: The Q-table is now pruned after each training episode to prevent memory exhaustion in long-running sessions.
  • Verifier: Now includes a PredictorEvaluator that calculates Uncertainty Quantification (UQ) metrics (coverage_rate and interval_sharpness) for the generated predictor, providing a more complete picture of the solution's quality.
  • Security: Subprocess execution is now sandboxed with resource limits to prevent runaway processes, and file writes are validated to prevent directory traversal attacks.
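
The two UQ metrics named above can be sketched as follows. The definitions used here are the common ones (coverage as the fraction of true values falling inside their intervals, sharpness as the mean interval width) and are assumed rather than taken from the PredictorEvaluator source; the function signatures are hypothetical.

```python
def coverage_rate(actuals, intervals):
    """Fraction of actual values that fall inside their (low, high) interval."""
    if not actuals or len(actuals) != len(intervals):
        raise ValueError("actuals and intervals must be non-empty and equal in length")
    hits = sum(1 for y, (low, high) in zip(actuals, intervals) if low <= y <= high)
    return hits / len(actuals)


def interval_sharpness(intervals):
    """Mean interval width; smaller means sharper (more informative) intervals."""
    if not intervals:
        raise ValueError("intervals must be non-empty")
    return sum(high - low for low, high in intervals) / len(intervals)
```

The two numbers are best read together: very wide intervals achieve high coverage trivially, so a good predictor combines coverage near its nominal level with small width.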

🚀 Advanced Features

Bayesian Neural Architecture Search (NAS)

The NAS pipeline has been upgraded from a random search to a Bayesian Optimization strategy, using Gaussian Processes to intelligently search for optimal neural network architectures. This results in a ~10x speedup in finding near-optimal architectures.

Federated Optimizer

The placeholder FederatedOptimizer has been replaced with a functional Federated Averaging (FedAvg) implementation, enabling true federated learning across multiple clients.
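
The aggregation step at the heart of FedAvg is a sample-weighted average of client parameters. The sketch below shows only that step, with parameters represented as flat lists of floats; the function name and data layout are assumptions and do not describe the FederatedOptimizer's real interface.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

For instance, two clients with parameters [1.0, 2.0] and [3.0, 4.0] and dataset sizes 1 and 3 yield [2.5, 3.5], so the larger client pulls the global model toward its own parameters.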

Transfer Learning

A production-ready, three-step transfer learning pipeline (load_model, apply_model, log_to_registry) is available to transfer knowledge from previously trained models.

Usage

Crispo is a command-line tool. The main entry point is crispo.py.

Basic Example

python3 crispo.py --project "MyDataPipeline" --objective "Fetch data from an API, process it with pandas, and analyze with numpy"

LAA Co-Design Example

To generate a Learning-Augmented Algorithm for the ski rental problem:

python3 crispo.py --project "SkiRentalLAA" --objective "Generate a learning-augmented algorithm for the ski rental problem" --trust-parameter 0.7

Note: This requires a ski_rental_history.csv file in the root directory.

Enabling Advanced Features

python3 crispo.py --objective "Optimize a deep learning model" \
                  --enable-nas \
                  --enable-transfer-learning \
                  --enable-federated-optimization

Saving and Loading Meta-Knowledge

You can persist the MetaLearner's state across runs:

# Save the learned state
python3 crispo.py --objective "My first run" --save-metaknowledge knowledge.pkl

# Load the state for a new run
python3 crispo.py --objective "My second run, building on the first" --load-metaknowledge knowledge.pkl

Licensing

Crispo is licensed under the GNU Affero General Public License v3.0 (AGPLv3). You are free to use, modify, and distribute this software, provided that any modified version you distribute or make available to users over a network is also released under the AGPLv3 with its source code available.

For use in a closed-source, proprietary, or commercial application, a separate commercial license is required. Please contact us at your.email@example.com to inquire about obtaining a commercial license.

Testing

The project uses the built-in unittest framework. To run the full test suite:

python3 -m unittest test_crispo.py