# Shinka Configuration Guide ⚙️
This guide covers Shinka's configuration system, including all parameters, file structures, and advanced configuration patterns.
## Table of Contents
1. [Core Configuration Components](#core-configuration-components)
2. [Configuration Parameters](#configuration-parameters)
3. [Pre-configured Variants](#pre-configured-variants)
4. [Configuration Structure](#configuration-structure)
5. [Creating Custom Configurations](#creating-custom-configurations)
6. [Advanced Configuration Patterns](#advanced-configuration-patterns)
7. [Configuration Examples](#configuration-examples)
8. [Configuration Best Practices](#configuration-best-practices)
## Core Configuration Components
### 1. Evolution Config (`evo_config`)
Controls the core evolutionary algorithm parameters:
```yaml
evo_config:
  _target_: shinka.core.EvolutionConfig
  num_generations: 20          # Number of evolution generations
  max_parallel_jobs: 1         # Maximum parallel evaluations
  max_patch_attempts: 10       # Max attempts to generate valid patches

  # LLM Configuration
  llm_models:                  # List of LLM models for mutations
    - "azure-gpt-4.1"
  llm_dynamic_selection: null  # Dynamic model selection strategy
  embedding_model: "text-embedding-3-small"

  # Patch Configuration
  patch_types:                 # Types of code modifications
    - "diff"                   # Diff-based patches
    - "full"                   # Full code replacement
  patch_type_probs:            # Probabilities for each patch type
    - 0.5
    - 0.5

  # Task Configuration
  language: "python"           # Programming language
  init_program_path: "???"     # Path to initial program
  task_sys_msg: "???"          # System message for LLM
  job_type: "local"            # Job execution type
  results_dir: ${output_dir}   # Results directory
```
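Hydra resolves the `_target_` key to a Python class and passes the remaining keys as constructor arguments. The stand-in below sketches what the resulting object looks like; the real `EvolutionConfig` lives in `shinka.core` and its exact fields and defaults may differ.

```python
from dataclasses import dataclass, field

# Illustrative stand-in for shinka.core.EvolutionConfig (hypothetical
# field set mirroring the YAML above; not the actual class).
@dataclass
class EvolutionConfig:
    num_generations: int = 20
    max_parallel_jobs: int = 1
    max_patch_attempts: int = 10
    llm_models: list = field(default_factory=lambda: ["azure-gpt-4.1"])
    patch_types: list = field(default_factory=lambda: ["diff", "full"])
    patch_type_probs: list = field(default_factory=lambda: [0.5, 0.5])
    language: str = "python"

# What Hydra effectively does with the YAML: instantiate with overrides.
cfg = EvolutionConfig(num_generations=50, max_parallel_jobs=4)
print(cfg.num_generations, cfg.patch_types)
```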
### 2. Database Config (`db_config`)
Manages the evolutionary database and island topology:
```yaml
db_config:
  _target_: shinka.database.DatabaseConfig
  db_path: "evolution_db.sqlite"  # SQLite database path

  # Island Configuration
  num_islands: 2               # Number of evolutionary islands
  island_elitism: true         # Enable elite preservation per island

  # Archive Configuration
  archive_size: 20             # Size of elite solution archive
  num_archive_inspirations: 4  # Solutions drawn from archive
  num_top_k_inspirations: 2    # Solutions from current generation

  # Selection and Migration
  exploitation_ratio: 0.2      # Exploitation vs exploration balance
  elite_selection_ratio: 0.3   # Fraction of elites for selection
  migration_interval: 10       # Generations between migrations
  migration_rate: 0.1          # Fraction of population migrated
```
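To build intuition for `migration_rate` and `num_islands`: every `migration_interval` generations, a fraction of each island's population moves to another island. The sketch below shows one common scheme (ring migration); it is illustrative only, and Shinka's actual migration logic in `shinka.database` may differ.

```python
import random

def ring_migrate(islands, migration_rate=0.1, rng=random):
    """Move a migration_rate fraction of each island's population to the
    next island in a ring. Hypothetical helper for illustration."""
    moved = []
    for pop in islands:
        k = max(1, int(len(pop) * migration_rate))
        migrants = rng.sample(pop, k)
        for m in migrants:
            pop.remove(m)
        moved.append(migrants)
    # Each batch of migrants lands on the neighboring island.
    for i, migrants in enumerate(moved):
        islands[(i + 1) % len(islands)].extend(migrants)
    return islands

islands = [[f"island{j}_sol{k}" for k in range(10)] for j in range(2)]
ring_migrate(islands, migration_rate=0.1)
```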
### 3. Job Config (`job_config`)
Defines the execution environment and resource requirements:
#### Local Execution
```yaml
job_config:
  _target_: shinka.launch.LocalJobConfig
  eval_program_path: "shinka/evaluate.py"
```
#### Slurm Cluster Execution
```yaml
job_config:
  _target_: shinka.launch.SlurmCondaJobConfig
  modules:                 # Environment modules
    - "cuda/12.4"
    - "cudnn/8.9.7"
    - "hpcx/2.20"
  eval_program_path: "shinka/utils/eval_hydra.py"
  conda_env: "shinka"      # Conda environment name
  time: "01:00:00"         # Maximum job runtime
  cpus: 4                  # CPU cores per job
  gpus: 1                  # GPUs per job
  mem: "16G"               # Memory per job
```
### 4. Task Config
Defines problem-specific settings and evaluation functions:
```yaml
# Task-specific evaluation function
evaluate_function:
  _target_: examples.my_task.evaluate.main
  program_path: ???   # Filled by runner
  results_dir: ???    # Filled by runner

# Job configuration for this task
distributed_job_config:
  _target_: shinka.launch.SlurmCondaJobConfig
  # ... resource requirements ...

# Evolution settings specific to this task
evo_config:
  task_sys_msg: |
    You are an expert in [domain].
    Key insights: [domain knowledge]
  language: "python"
  init_program_path: "examples/my_task/initial.py"
  job_type: "slurm_conda"

exp_name: "shinka_my_task"
```
## Configuration Parameters
### Evolution Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `num_generations` | int | 20 | Number of evolutionary generations |
| `max_parallel_jobs` | int | 1 | Maximum concurrent evaluations |
| `max_patch_attempts` | int | 10 | Maximum attempts to generate valid patches |
| `llm_models` | list | `["azure-gpt-4.1"]` | LLM models for mutations |
| `patch_types` | list | `["diff", "full"]` | Types of code modifications |
| `patch_type_probs` | list | `[0.5, 0.5]` | Probabilities for patch types |
| `language` | str | `"python"` | Programming language |
| `embedding_model` | str | `"text-embedding-3-small"` | Model for code embeddings |
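The `patch_types` / `patch_type_probs` pair behaves like a weighted draw: at each mutation, one modification type is sampled according to the given probabilities. A minimal sketch of that behavior (a hypothetical helper, not Shinka's actual sampling code):

```python
import random

patch_types = ["diff", "full"]
patch_type_probs = [0.5, 0.5]

def sample_patch_type(rng=random):
    """Draw one patch type with probability proportional to its weight."""
    return rng.choices(patch_types, weights=patch_type_probs, k=1)[0]

# Skewing the weights shifts how often each patch type is used, e.g.
# [0.8, 0.2] would favor diff-based patches over full rewrites.
chosen = sample_patch_type()
```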
### Database Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `num_islands` | int | 2 | Number of evolutionary islands |
| `archive_size` | int | 20 | Size of elite solution archive |
| `num_archive_inspirations` | int | 4 | Solutions drawn from archive |
| `num_top_k_inspirations` | int | 2 | Solutions from current generation |
| `exploitation_ratio` | float | 0.2 | Balance between exploitation/exploration |
| `elite_selection_ratio` | float | 0.3 | Fraction of elites for selection |
| `migration_interval` | int | 10 | Generations between island migrations |
| `migration_rate` | float | 0.1 | Fraction of population migrated |
| `island_elitism` | bool | true | Preserve elites per island |
### Resource Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `time` | str | `"01:00:00"` | Maximum job runtime (HH:MM:SS) |
| `cpus` | int | 4 | CPU cores per job |
| `gpus` | int | 0 | GPUs per job |
| `mem` | str | `"8G"` | Memory per job |
| `conda_env` | str | `"shinka"` | Conda environment name |
| `modules` | list | `[]` | Environment modules to load |
## Pre-configured Variants
Shinka uses [Hydra](https://hydra.cc/) for flexible, hierarchical configuration management. The system is designed around composable configuration files that can be mixed and matched to create different experimental setups.
Variants provide pre-configured combinations of settings for common use cases:
### Circle Packing Example
```yaml
# configs/variant/circle_packing_example.yaml
defaults:
  - override /database@_global_: island_large
  - override /evolution@_global_: large_budget
  - override /task@_global_: circle_packing
  - override /cluster@_global_: local
  - _self_

variant_suffix: "_example"
```
### Agent Design Example
```yaml
# configs/variant/agent_design_example.yaml
defaults:
  - override /database@_global_: island_medium
  - override /evolution@_global_: medium_budget
  - override /task@_global_: agent_design
  - override /cluster@_global_: local
  - _self_

evo_config:
  num_generations: 15

variant_suffix: "_agent_example"
```
## Configuration Structure
```
configs/
├── config.yaml               # Main config file with defaults
├── cluster/                  # Execution environments
│   ├── local.yaml            # Local execution
│   ├── gcp.yaml              # Google Cloud Platform
│   └── remote.yaml           # Remote Slurm clusters
├── database/                 # Evolution database settings
│   ├── island_small.yaml     # Small-scale evolution (2 islands)
│   ├── island_medium.yaml    # Medium-scale evolution (4 islands)
│   └── island_large.yaml     # Large-scale evolution (8+ islands)
├── evolution/                # Evolution parameters
│   ├── small_budget.yaml     # Few generations, quick runs
│   ├── medium_budget.yaml    # Moderate computational budget
│   └── large_budget.yaml     # Extensive evolution runs
├── task/                     # Problem definitions
│   ├── circle_packing.yaml
│   ├── agent_design.yaml
│   ├── bbo_search.yaml
│   ├── cifar10.yaml
│   ├── cuda_optim.yaml
│   ├── mad_moe.yaml
│   └── novelty_generator.yaml
└── variant/                  # Pre-configured combinations
    ├── circle_packing_example.yaml
    ├── agent_design_example.yaml
    ├── mad_moe_example.yaml
    └── default.yaml
```
## Creating Custom Configurations
### Method 1: Custom Variant File
Create a new variant file combining existing components:
```yaml
# configs/variant/my_custom_variant.yaml
defaults:
  - override /database@_global_: island_small
  - override /evolution@_global_: small_budget
  - override /task@_global_: my_task
  - override /cluster@_global_: local
  - _self_

# Override specific parameters
evo_config:
  num_generations: 25
  max_parallel_jobs: 2

db_config:
  archive_size: 30
  migration_interval: 5

variant_suffix: "_custom"
```
Launch with:
```bash
shinka_launch variant=my_custom_variant
```
### Method 2: Command Line Overrides
Override parameters directly on the command line:
```bash
shinka_launch \
  task=circle_packing \
  database=island_large \
  evolution=medium_budget \
  cluster=local \
  evo_config.num_generations=50 \
  evo_config.max_parallel_jobs=4 \
  db_config.num_islands=6 \
  variant_suffix="_custom_run"
```
### Method 3: Custom Task Configuration
Create a new task configuration:
```yaml
# configs/task/my_optimization_task.yaml
evaluate_function:
  _target_: examples.my_optimization.evaluate.main
  program_path: ???
  results_dir: ???

distributed_job_config:
  _target_: shinka.launch.LocalJobConfig
  eval_program_path: "shinka/utils/eval_hydra.py"

evo_config:
  task_sys_msg: |
    You are an expert optimization researcher working on [specific problem].
    Key insights to consider:
    1. [Domain-specific insight 1]
    2. [Domain-specific insight 2]
    3. [Domain-specific insight 3]
    Focus on [specific optimization goals].
  language: "python"
  init_program_path: "examples/my_optimization/initial.py"
  job_type: "local"

exp_name: "shinka_my_optimization"
```
## Advanced Configuration Patterns
### Multi-Model Evolution
Use multiple LLM models with different strengths:
```yaml
evo_config:
  llm_models:
    - "azure-gpt-4.1"      # Strong reasoning
    - "claude-3-sonnet"    # Good at code
    - "azure-gpt-4o-mini"  # Fast iterations

  # Optional: Dynamic model selection
  llm_dynamic_selection:
    strategy: "performance_based"
    window_size: 10
```
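One way a `performance_based` strategy with a `window_size` could work: track each model's recent fitness improvements in a sliding window and sample models proportionally to their recent payoff. This is a hypothetical sketch of that idea, not Shinka's actual `llm_dynamic_selection` implementation.

```python
import random
from collections import deque

class PerformanceBasedSelector:
    """Sample LLM models in proportion to their recent improvements,
    tracked over a sliding window (illustrative only)."""

    def __init__(self, models, window_size=10):
        self.models = models
        # One fixed-length history per model; old entries fall off.
        self.history = {m: deque(maxlen=window_size) for m in models}

    def record(self, model, improvement):
        # Only credit positive fitness improvements.
        self.history[model].append(max(improvement, 0.0))

    def select(self, rng=random):
        # Small epsilon keeps unproven models selectable (exploration).
        scores = [sum(self.history[m]) + 1e-6 for m in self.models]
        return rng.choices(self.models, weights=scores, k=1)[0]

sel = PerformanceBasedSelector(
    ["azure-gpt-4.1", "claude-3-sonnet", "azure-gpt-4o-mini"],
    window_size=10,
)
sel.record("azure-gpt-4.1", 0.3)
model = sel.select()
```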
## Configuration Examples
### Quick Prototyping Setup
```yaml
# Fast iteration for development
defaults:
  - override /database@_global_: island_small
  - override /evolution@_global_: small_budget
  - override /cluster@_global_: local

evo_config:
  num_generations: 5
  max_parallel_jobs: 1

db_config:
  num_islands: 1
  archive_size: 10

variant_suffix: "_prototype"
```
### Production Research Setup
```yaml
# Large-scale research experiment
defaults:
  - override /database@_global_: island_large
  - override /evolution@_global_: large_budget
  - override /cluster@_global_: remote

evo_config:
  num_generations: 100
  max_parallel_jobs: 8

db_config:
  num_islands: 8
  archive_size: 50
  migration_interval: 5

variant_suffix: "_production"
```
### Multi-Task Comparison
```yaml
# Configuration for comparing across tasks
defaults:
  - override /database@_global_: island_medium
  - override /evolution@_global_: medium_budget
  - override /cluster@_global_: local

# Standardized parameters for fair comparison
evo_config:
  num_generations: 30
  max_parallel_jobs: 2
  llm_models: ["azure-gpt-4.1"]

db_config:
  num_islands: 4
  archive_size: 25

variant_suffix: "_comparison"
```
## Configuration Best Practices
### 1. Start Small, Scale Up
- Begin with `island_small` and `small_budget` configurations
- Increase complexity as you understand the problem better
### 2. Use Meaningful Variant Suffixes
- Include key parameters in the suffix: `_gen50_islands4_gpt4`
- This helps identify experiments in results directories
### 3. Document Custom Configurations
- Add comments explaining parameter choices
- Include expected runtime and resource usage
### 4. Version Control Configurations
- Keep variant files in version control
- Tag configurations used for important results
### 5. Monitor Resource Usage
- Start with conservative resource allocations
- Monitor actual usage and adjust accordingly
For more examples and detailed parameter explanations, see the configuration files in the `configs/` directory and the [Getting Started Guide](getting_started.md).