# Shinka Configuration Guide ⚙️ This guide covers the comprehensive configuration system in Shinka, including all parameters, file structures, and advanced configuration patterns. ## Table of Contents 1. [Core Configuration Components](#core-configuration-components) 2. [Configuration Parameters](#configuration-parameters) 3. [Pre-configured Variants](#pre-configured-variants) 4. [Configuration Structure](#configuration-structure) 5. [Creating Custom Configurations](#creating-custom-configurations) 6. [Advanced Configuration Patterns](#advanced-configuration-patterns) 7. [Configuration Examples](#configuration-examples) 8. [Configuration Best Practices](#configuration-best-practices) ## Core Configuration Components ### 1. Evolution Config (`evo_config`) Controls the core evolutionary algorithm parameters: ```yaml evo_config: _target_: shinka.core.EvolutionConfig num_generations: 20 # Number of evolution generations max_parallel_jobs: 1 # Maximum parallel evaluations max_patch_attempts: 10 # Max attempts to generate valid patches # LLM Configuration llm_models: # List of LLM models for mutations - "azure-gpt-4.1" llm_dynamic_selection: null # Dynamic model selection strategy embedding_model: "text-embedding-3-small" # Patch Configuration patch_types: # Types of code modifications - "diff" # Diff-based patches - "full" # Full code replacement patch_type_probs: # Probabilities for each patch type - 0.5 - 0.5 # Task Configuration language: "python" # Programming language init_program_path: "???" # Path to initial program task_sys_msg: "???" # System message for LLM job_type: "local" # Job execution type results_dir: ${output_dir} # Results directory ``` ### 2. Database Config (`db_config`) Manages the evolutionary database and island topology: ```yaml db_config: _target_: shinka.database.DatabaseConfig db_path: "evolution_db.sqlite" # SQLite database path # Island Configuration num_islands: 2 # Number of evolutionary islands island_elitism: true # Enable elite preservation per island # Archive Configuration archive_size: 20 # Size of elite solution archive num_archive_inspirations: 4 # Solutions drawn from archive num_top_k_inspirations: 2 # Solutions from current generation # Selection and Migration exploitation_ratio: 0.2 # Exploitation vs exploration balance elite_selection_ratio: 0.3 # Fraction of elites for selection migration_interval: 10 # Generations between migrations migration_rate: 0.1 # Fraction of population migrated ``` ### 3. Job Config (`job_config`) Defines the execution environment and resource requirements: #### Local Execution ```yaml job_config: _target_: shinka.launch.LocalJobConfig eval_program_path: "shinka/evaluate.py" ``` #### Slurm Cluster Execution ```yaml job_config: _target_: shinka.launch.SlurmCondaJobConfig modules: # Environment modules - "cuda/12.4" - "cudnn/8.9.7" - "hpcx/2.20" eval_program_path: "shinka/utils/eval_hydra.py" conda_env: "shinka" # Conda environment name time: "01:00:00" # Maximum job runtime cpus: 4 # CPU cores per job gpus: 1 # GPUs per job mem: "16G" # Memory per job ``` ### 4. Task Config Defines problem-specific settings and evaluation functions: ```yaml # Task-specific evaluation function evaluate_function: _target_: examples.my_task.evaluate.main program_path: ??? # Filled by runner results_dir: ??? # Filled by runner # Job configuration for this task distributed_job_config: _target_: shinka.launch.SlurmCondaJobConfig # ... resource requirements ... # Evolution settings specific to this task evo_config: task_sys_msg: | You are an expert in [domain]. Key insights: [domain knowledge] language: "python" init_program_path: "examples/my_task/initial.py" job_type: "slurm_conda" exp_name: "shinka_my_task" ``` ## Configuration Parameters ### Evolution Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `num_generations` | int | 20 | Number of evolutionary generations | | `max_parallel_jobs` | int | 1 | Maximum concurrent evaluations | | `max_patch_attempts` | int | 10 | Maximum attempts to generate valid patches | | `llm_models` | list | `["azure-gpt-4.1"]` | LLM models for mutations | | `patch_types` | list | `["diff", "full"]` | Types of code modifications | | `patch_type_probs` | list | `[0.5, 0.5]` | Probabilities for patch types | | `language` | str | `"python"` | Programming language | | `embedding_model` | str | `"text-embedding-3-small"` | Model for code embeddings | ### Database Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `num_islands` | int | 2 | Number of evolutionary islands | | `archive_size` | int | 20 | Size of elite solution archive | | `num_archive_inspirations` | int | 4 | Solutions drawn from archive | | `num_top_k_inspirations` | int | 2 | Solutions from current generation | | `exploitation_ratio` | float | 0.2 | Balance between exploitation/exploration | | `elite_selection_ratio` | float | 0.3 | Fraction of elites for selection | | `migration_interval` | int | 10 | Generations between island migrations | | `migration_rate` | float | 0.1 | Fraction of population migrated | | `island_elitism` | bool | true | Preserve elites per island | ### Resource Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `time` | str | `"01:00:00"` | Maximum job runtime (HH:MM:SS) | | `cpus` | int | 4 | CPU cores per job | | `gpus` | int | 0 | GPUs per job | | `mem` | str | `"8G"` | Memory per job | | `conda_env` | str | `"shinka"` | Conda environment name | | `modules` | list | `[]` | Environment modules to load | ## Pre-configured Variants Shinka uses [Hydra](https://hydra.cc/) for flexible, hierarchical configuration management. The system is designed around composable configuration files that can be mixed and matched to create different experimental setups. Variants provide pre-configured combinations of settings for common use cases: ### Circle Packing Example ```yaml # configs/variant/circle_packing_example.yaml defaults: - override /database@_global_: island_large - override /evolution@_global_: large_budget - override /task@_global_: circle_packing - override /cluster@_global_: local - _self_ variant_suffix: "_example" ``` ### Agent Design Example ```yaml # configs/variant/agent_design_example.yaml defaults: - override /database@_global_: island_medium - override /evolution@_global_: medium_budget - override /task@_global_: agent_design - override /cluster@_global_: local - _self_ evo_config: num_generations: 15 variant_suffix: "_agent_example" ``` ## Configuration Structure ``` configs/ ├── config.yaml # Main config file with defaults ├── cluster/ # Execution environments │ ├── local.yaml # Local execution │ ├── gcp.yaml # Google Cloud Platform │ └── remote.yaml # Remote Slurm clusters ├── database/ # Evolution database settings │ ├── island_small.yaml # Small-scale evolution (2 islands) │ ├── island_medium.yaml# Medium-scale evolution (4 islands) │ └── island_large.yaml # Large-scale evolution (8+ islands) ├── evolution/ # Evolution parameters │ ├── small_budget.yaml # Few generations, quick runs │ ├── medium_budget.yaml# Moderate computational budget │ └── large_budget.yaml # Extensive evolution runs ├── task/ # Problem definitions │ ├── circle_packing.yaml │ ├── agent_design.yaml │ ├── bbo_search.yaml │ ├── cifar10.yaml │ ├── cuda_optim.yaml │ ├── mad_moe.yaml │ └── novelty_generator.yaml └── variant/ # Pre-configured combinations ├── circle_packing_example.yaml ├── agent_design_example.yaml ├── mad_moe_example.yaml └── default.yaml ``` ## Creating Custom Configurations ### Method 1: Custom Variant File Create a new variant file combining existing components: ```yaml # configs/variant/my_custom_variant.yaml defaults: - override /database@_global_: island_small - override /evolution@_global_: small_budget - override /task@_global_: my_task - override /cluster@_global_: local - _self_ # Override specific parameters evo_config: num_generations: 25 max_parallel_jobs: 2 db_config: archive_size: 30 migration_interval: 5 variant_suffix: "_custom" ``` Launch with: ```bash shinka_launch variant=my_custom_variant ``` ### Method 2: Command Line Overrides Override parameters directly on the command line: ```bash shinka_launch \ task=circle_packing \ database=island_large \ evolution=medium_budget \ cluster=local \ evo_config.num_generations=50 \ evo_config.max_parallel_jobs=4 \ db_config.num_islands=6 \ variant_suffix="_custom_run" ``` ### Method 3: Custom Task Configuration Create a new task configuration: ```yaml # configs/task/my_optimization_task.yaml evaluate_function: _target_: examples.my_optimization.evaluate.main program_path: ??? results_dir: ??? distributed_job_config: _target_: shinka.launch.LocalJobConfig eval_program_path: "shinka/utils/eval_hydra.py" evo_config: task_sys_msg: | You are an expert optimization researcher working on [specific problem]. Key insights to consider: 1. [Domain-specific insight 1] 2. [Domain-specific insight 2] 3. [Domain-specific insight 3] Focus on [specific optimization goals]. language: "python" init_program_path: "examples/my_optimization/initial.py" job_type: "local" exp_name: "shinka_my_optimization" ``` ## Advanced Configuration Patterns ### Multi-Model Evolution Use multiple LLM models with different strengths: ```yaml evo_config: llm_models: - "azure-gpt-4.1" # Strong reasoning - "claude-3-sonnet" # Good at code - "azure-gpt-4o-mini" # Fast iterations # Optional: Dynamic model selection llm_dynamic_selection: strategy: "performance_based" window_size: 10 ``` ## Configuration Examples ### Quick Prototyping Setup ```yaml # Fast iteration for development defaults: - override /database@_global_: island_small - override /evolution@_global_: small_budget - override /cluster@_global_: local evo_config: num_generations: 5 max_parallel_jobs: 1 db_config: num_islands: 1 archive_size: 10 variant_suffix: "_prototype" ``` ### Production Research Setup ```yaml # Large-scale research experiment defaults: - override /database@_global_: island_large - override /evolution@_global_: large_budget - override /cluster@_global_: remote evo_config: num_generations: 100 max_parallel_jobs: 8 db_config: num_islands: 8 archive_size: 50 migration_interval: 5 variant_suffix: "_production" ``` ### Multi-Task Comparison ```yaml # Configuration for comparing across tasks defaults: - override /database@_global_: island_medium - override /evolution@_global_: medium_budget - override /cluster@_global_: local # Standardized parameters for fair comparison evo_config: num_generations: 30 max_parallel_jobs: 2 llm_models: ["azure-gpt-4.1"] db_config: num_islands: 4 archive_size: 25 variant_suffix: "_comparison" ``` ## Configuration Best Practices ### 1. Start Small, Scale Up - Begin with `island_small` and `small_budget` configurations - Increase complexity as you understand the problem better ### 2. Use Meaningful Variant Suffixes - Include key parameters in the suffix: `_gen50_islands4_gpt4` - This helps identify experiments in results directories ### 3. Document Custom Configurations - Add comments explaining parameter choices - Include expected runtime and resource usage ### 4. Version Control Configurations - Keep variant files in version control - Tag configurations used for important results ### 5. Monitor Resource Usage - Start with conservative resource allocations - Monitor actual usage and adjust accordingly For more examples and detailed parameter explanations, see the configuration files in the `configs/` directory and the [Getting Started Guide](getting_started.md).