# Shinka Configuration Guide ⚙️
This guide covers Shinka's configuration system, including all parameters, file structures, and advanced configuration patterns.
## Table of Contents
1. [Core Configuration Components](#core-configuration-components)
2. [Configuration Parameters](#configuration-parameters)
3. [Pre-configured Variants](#pre-configured-variants)
4. [Configuration Structure](#configuration-structure)
5. [Creating Custom Configurations](#creating-custom-configurations)
6. [Advanced Configuration Patterns](#advanced-configuration-patterns)
7. [Configuration Examples](#configuration-examples)
8. [Configuration Best Practices](#configuration-best-practices)
## Core Configuration Components
### 1. Evolution Config (`evo_config`)
Controls the core evolutionary algorithm parameters:
```yaml
evo_config:
  _target_: shinka.core.EvolutionConfig
  num_generations: 20          # Number of evolution generations
  max_parallel_jobs: 1         # Maximum parallel evaluations
  max_patch_attempts: 10       # Max attempts to generate valid patches

  # LLM Configuration
  llm_models:                  # List of LLM models for mutations
    - "azure-gpt-4.1"
  llm_dynamic_selection: null  # Dynamic model selection strategy
  embedding_model: "text-embedding-3-small"

  # Patch Configuration
  patch_types:                 # Types of code modifications
    - "diff"                   # Diff-based patches
    - "full"                   # Full code replacement
  patch_type_probs:            # Probabilities for each patch type
    - 0.5
    - 0.5

  # Task Configuration
  language: "python"           # Programming language
  init_program_path: "???"     # Path to initial program
  task_sys_msg: "???"          # System message for LLM
  job_type: "local"            # Job execution type
  results_dir: ${output_dir}   # Results directory
```
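Hydra resolves the `_target_` key to a Python class and passes the remaining keys as constructor arguments. The stand-in below sketches what the resulting object looks like; the real `EvolutionConfig` lives in `shinka.core` and its exact fields and defaults may differ.

```python
from dataclasses import dataclass, field

# Illustrative stand-in for shinka.core.EvolutionConfig (hypothetical
# field set mirroring the YAML above; not the actual class).
@dataclass
class EvolutionConfig:
    num_generations: int = 20
    max_parallel_jobs: int = 1
    max_patch_attempts: int = 10
    llm_models: list = field(default_factory=lambda: ["azure-gpt-4.1"])
    patch_types: list = field(default_factory=lambda: ["diff", "full"])
    patch_type_probs: list = field(default_factory=lambda: [0.5, 0.5])
    language: str = "python"

# What Hydra effectively does with the YAML: instantiate with overrides.
cfg = EvolutionConfig(num_generations=50, max_parallel_jobs=4)
print(cfg.num_generations, cfg.patch_types)
```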
### 2. Database Config (`db_config`)
Manages the evolutionary database and island topology:
```yaml
db_config:
  _target_: shinka.database.DatabaseConfig
  db_path: "evolution_db.sqlite"  # SQLite database path

  # Island Configuration
  num_islands: 2               # Number of evolutionary islands
  island_elitism: true         # Enable elite preservation per island

  # Archive Configuration
  archive_size: 20             # Size of elite solution archive
  num_archive_inspirations: 4  # Solutions drawn from archive
  num_top_k_inspirations: 2    # Solutions from current generation

  # Selection and Migration
  exploitation_ratio: 0.2      # Exploitation vs exploration balance
  elite_selection_ratio: 0.3   # Fraction of elites for selection
  migration_interval: 10       # Generations between migrations
  migration_rate: 0.1          # Fraction of population migrated
```
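To build intuition for `migration_rate` and `num_islands`: every `migration_interval` generations, a fraction of each island's population moves to another island. The sketch below shows one common scheme (ring migration); it is illustrative only, and Shinka's actual migration logic in `shinka.database` may differ.

```python
import random

def ring_migrate(islands, migration_rate=0.1, rng=random):
    """Move a migration_rate fraction of each island's population to the
    next island in a ring. Hypothetical helper for illustration."""
    moved = []
    for pop in islands:
        k = max(1, int(len(pop) * migration_rate))
        migrants = rng.sample(pop, k)
        for m in migrants:
            pop.remove(m)
        moved.append(migrants)
    # Each batch of migrants lands on the neighboring island.
    for i, migrants in enumerate(moved):
        islands[(i + 1) % len(islands)].extend(migrants)
    return islands

islands = [[f"island{j}_sol{k}" for k in range(10)] for j in range(2)]
ring_migrate(islands, migration_rate=0.1)
```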
### 3. Job Config (`job_config`)
Defines the execution environment and resource requirements:
#### Local Execution
```yaml
job_config:
  _target_: shinka.launch.LocalJobConfig
  eval_program_path: "shinka/evaluate.py"
```
#### Slurm Cluster Execution
```yaml
job_config:
  _target_: shinka.launch.SlurmCondaJobConfig
  modules:                 # Environment modules
    - "cuda/12.4"
    - "cudnn/8.9.7"
    - "hpcx/2.20"
  eval_program_path: "shinka/utils/eval_hydra.py"
  conda_env: "shinka"      # Conda environment name
  time: "01:00:00"         # Maximum job runtime
  cpus: 4                  # CPU cores per job
  gpus: 1                  # GPUs per job
  mem: "16G"               # Memory per job
```
### 4. Task Config
Defines problem-specific settings and evaluation functions:
```yaml
# Task-specific evaluation function
evaluate_function:
  _target_: examples.my_task.evaluate.main
  program_path: ???   # Filled by runner
  results_dir: ???    # Filled by runner

# Job configuration for this task
distributed_job_config:
  _target_: shinka.launch.SlurmCondaJobConfig
  # ... resource requirements ...

# Evolution settings specific to this task
evo_config:
  task_sys_msg: |
    You are an expert in [domain].
    Key insights: [domain knowledge]
  language: "python"
  init_program_path: "examples/my_task/initial.py"
  job_type: "slurm_conda"

exp_name: "shinka_my_task"
```
## Configuration Parameters
### Evolution Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `num_generations` | int | 20 | Number of evolutionary generations |
| `max_parallel_jobs` | int | 1 | Maximum concurrent evaluations |
| `max_patch_attempts` | int | 10 | Maximum attempts to generate valid patches |
| `llm_models` | list | `["azure-gpt-4.1"]` | LLM models for mutations |
| `patch_types` | list | `["diff", "full"]` | Types of code modifications |
| `patch_type_probs` | list | `[0.5, 0.5]` | Probabilities for patch types |
| `language` | str | `"python"` | Programming language |
| `embedding_model` | str | `"text-embedding-3-small"` | Model for code embeddings |
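The `patch_types` / `patch_type_probs` pair behaves like a weighted draw: at each mutation, one modification type is sampled according to the given probabilities. A minimal sketch of that behavior (a hypothetical helper, not Shinka's actual sampling code):

```python
import random

patch_types = ["diff", "full"]
patch_type_probs = [0.5, 0.5]

def sample_patch_type(rng=random):
    """Draw one patch type with probability proportional to its weight."""
    return rng.choices(patch_types, weights=patch_type_probs, k=1)[0]

# Skewing the weights shifts how often each patch type is used, e.g.
# [0.8, 0.2] would favor diff-based patches over full rewrites.
chosen = sample_patch_type()
```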
### Database Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `num_islands` | int | 2 | Number of evolutionary islands |
| `archive_size` | int | 20 | Size of elite solution archive |
| `num_archive_inspirations` | int | 4 | Solutions drawn from archive |
| `num_top_k_inspirations` | int | 2 | Solutions from current generation |
| `exploitation_ratio` | float | 0.2 | Balance between exploitation/exploration |
| `elite_selection_ratio` | float | 0.3 | Fraction of elites for selection |
| `migration_interval` | int | 10 | Generations between island migrations |
| `migration_rate` | float | 0.1 | Fraction of population migrated |
| `island_elitism` | bool | true | Preserve elites per island |
### Resource Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `time` | str | `"01:00:00"` | Maximum job runtime (HH:MM:SS) |
| `cpus` | int | 4 | CPU cores per job |
| `gpus` | int | 0 | GPUs per job |
| `mem` | str | `"8G"` | Memory per job |
| `conda_env` | str | `"shinka"` | Conda environment name |
| `modules` | list | `[]` | Environment modules to load |
## Pre-configured Variants
Shinka uses [Hydra](https://hydra.cc/) for flexible, hierarchical configuration management. The system is designed around composable configuration files that can be mixed and matched to create different experimental setups.
Variants provide pre-configured combinations of settings for common use cases:
### Circle Packing Example
```yaml
# configs/variant/circle_packing_example.yaml
defaults:
  - override /database@_global_: island_large
  - override /evolution@_global_: large_budget
  - override /task@_global_: circle_packing
  - override /cluster@_global_: local
  - _self_

variant_suffix: "_example"
```
### Agent Design Example
```yaml
# configs/variant/agent_design_example.yaml
defaults:
  - override /database@_global_: island_medium
  - override /evolution@_global_: medium_budget
  - override /task@_global_: agent_design
  - override /cluster@_global_: local
  - _self_

evo_config:
  num_generations: 15

variant_suffix: "_agent_example"
```
## Configuration Structure
```
configs/
├── config.yaml               # Main config file with defaults
├── cluster/                  # Execution environments
│   ├── local.yaml            # Local execution
│   ├── gcp.yaml              # Google Cloud Platform
│   └── remote.yaml           # Remote Slurm clusters
├── database/                 # Evolution database settings
│   ├── island_small.yaml     # Small-scale evolution (2 islands)
│   ├── island_medium.yaml    # Medium-scale evolution (4 islands)
│   └── island_large.yaml     # Large-scale evolution (8+ islands)
├── evolution/                # Evolution parameters
│   ├── small_budget.yaml     # Few generations, quick runs
│   ├── medium_budget.yaml    # Moderate computational budget
│   └── large_budget.yaml     # Extensive evolution runs
├── task/                     # Problem definitions
│   ├── circle_packing.yaml
│   ├── agent_design.yaml
│   ├── bbo_search.yaml
│   ├── cifar10.yaml
│   ├── cuda_optim.yaml
│   ├── mad_moe.yaml
│   └── novelty_generator.yaml
└── variant/                  # Pre-configured combinations
    ├── circle_packing_example.yaml
    ├── agent_design_example.yaml
    ├── mad_moe_example.yaml
    └── default.yaml
```
## Creating Custom Configurations
### Method 1: Custom Variant File
Create a new variant file combining existing components:
```yaml
# configs/variant/my_custom_variant.yaml
defaults:
  - override /database@_global_: island_small
  - override /evolution@_global_: small_budget
  - override /task@_global_: my_task
  - override /cluster@_global_: local
  - _self_

# Override specific parameters
evo_config:
  num_generations: 25
  max_parallel_jobs: 2

db_config:
  archive_size: 30
  migration_interval: 5

variant_suffix: "_custom"
```
Launch with:
```bash
shinka_launch variant=my_custom_variant
```
### Method 2: Command Line Overrides
Override parameters directly on the command line:
```bash
shinka_launch \
  task=circle_packing \
  database=island_large \
  evolution=medium_budget \
  cluster=local \
  evo_config.num_generations=50 \
  evo_config.max_parallel_jobs=4 \
  db_config.num_islands=6 \
  variant_suffix="_custom_run"
```
### Method 3: Custom Task Configuration
Create a new task configuration:
```yaml
# configs/task/my_optimization_task.yaml
evaluate_function:
  _target_: examples.my_optimization.evaluate.main
  program_path: ???
  results_dir: ???

distributed_job_config:
  _target_: shinka.launch.LocalJobConfig
  eval_program_path: "shinka/utils/eval_hydra.py"

evo_config:
  task_sys_msg: |
    You are an expert optimization researcher working on [specific problem].
    Key insights to consider:
    1. [Domain-specific insight 1]
    2. [Domain-specific insight 2]
    3. [Domain-specific insight 3]
    Focus on [specific optimization goals].
  language: "python"
  init_program_path: "examples/my_optimization/initial.py"
  job_type: "local"

exp_name: "shinka_my_optimization"
```
## Advanced Configuration Patterns
### Multi-Model Evolution
Use multiple LLM models with different strengths:
```yaml
evo_config:
  llm_models:
    - "azure-gpt-4.1"      # Strong reasoning
    - "claude-3-sonnet"    # Good at code
    - "azure-gpt-4o-mini"  # Fast iterations

  # Optional: Dynamic model selection
  llm_dynamic_selection:
    strategy: "performance_based"
    window_size: 10
```
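One way a `performance_based` strategy with a `window_size` could work: track each model's recent fitness improvements in a sliding window and sample models proportionally to their recent payoff. This is a hypothetical sketch of that idea, not Shinka's actual `llm_dynamic_selection` implementation.

```python
import random
from collections import deque

class PerformanceBasedSelector:
    """Sample LLM models in proportion to their recent improvements,
    tracked over a sliding window (illustrative only)."""

    def __init__(self, models, window_size=10):
        self.models = models
        # One fixed-length history per model; old entries fall off.
        self.history = {m: deque(maxlen=window_size) for m in models}

    def record(self, model, improvement):
        # Only credit positive fitness improvements.
        self.history[model].append(max(improvement, 0.0))

    def select(self, rng=random):
        # Small epsilon keeps unproven models selectable (exploration).
        scores = [sum(self.history[m]) + 1e-6 for m in self.models]
        return rng.choices(self.models, weights=scores, k=1)[0]

sel = PerformanceBasedSelector(
    ["azure-gpt-4.1", "claude-3-sonnet", "azure-gpt-4o-mini"],
    window_size=10,
)
sel.record("azure-gpt-4.1", 0.3)
model = sel.select()
```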
## Configuration Examples
### Quick Prototyping Setup
```yaml
# Fast iteration for development
defaults:
  - override /database@_global_: island_small
  - override /evolution@_global_: small_budget
  - override /cluster@_global_: local

evo_config:
  num_generations: 5
  max_parallel_jobs: 1

db_config:
  num_islands: 1
  archive_size: 10

variant_suffix: "_prototype"
```
### Production Research Setup
```yaml
# Large-scale research experiment
defaults:
  - override /database@_global_: island_large
  - override /evolution@_global_: large_budget
  - override /cluster@_global_: remote

evo_config:
  num_generations: 100
  max_parallel_jobs: 8

db_config:
  num_islands: 8
  archive_size: 50
  migration_interval: 5

variant_suffix: "_production"
```
### Multi-Task Comparison
```yaml
# Configuration for comparing across tasks
defaults:
  - override /database@_global_: island_medium
  - override /evolution@_global_: medium_budget
  - override /cluster@_global_: local

# Standardized parameters for fair comparison
evo_config:
  num_generations: 30
  max_parallel_jobs: 2
  llm_models: ["azure-gpt-4.1"]

db_config:
  num_islands: 4
  archive_size: 25

variant_suffix: "_comparison"
```
## Configuration Best Practices
### 1. Start Small, Scale Up
- Begin with `island_small` and `small_budget` configurations
- Increase complexity as you understand the problem better
### 2. Use Meaningful Variant Suffixes
- Include key parameters in the suffix: `_gen50_islands4_gpt4`
- This helps identify experiments in results directories
### 3. Document Custom Configurations
- Add comments explaining parameter choices
- Include expected runtime and resource usage
### 4. Version Control Configurations
- Keep variant files in version control
- Tag configurations used for important results
### 5. Monitor Resource Usage
- Start with conservative resource allocations
- Monitor actual usage and adjust accordingly
For more examples and detailed parameter explanations, see the configuration files in the `configs/` directory and the [Getting Started Guide](getting_started.md).