{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a8d3c8de",
   "metadata": {},
   "source": [
    "# `shinka` Tutorial — Interactive LLM-Driven Evolution 🧬 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/SakanaAI/ShinkaEvolve/blob/main/examples/shinka_tutorial.ipynb)\n",
    "\n",
    "\n",
    "This notebook gives a **hands-on** overview of `shinka`, a framework that combines LLMs with evolutionary search. We will focus on the **circle packing** and **novelty generator** examples, explaining the core functionality and how to quickly get started.\n",
    "\n",
    "We will cover:\n",
    "\n",
    "* setting up a `shinka` environment with API keys\n",
    "* giving an overview of the main `shinka` components (islands, archive, mutations, meta-learning)\n",
    "* writing a fully custom `shinka` configuration and running experiments\n",
    "* inspecting results and understanding how the evolutionary process unfolded\n",
    "* using the preconfigured `shinka` configs with the launcher and within Python\n",
    "* customizing the components of `shinka` and comparing results\n",
    "* obtaining fully interactive reports with the `shinka` WebUI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa0e2fcd",
   "metadata": {},
   "source": [
    "## Google Colab setup (optional)\n",
    "Skip this cell if you are running locally. It installs the `shinka` package inside a Colab instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9565aace",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import subprocess\n",
    "from pathlib import Path\n",
    "\n",
    "# detect colab\n",
    "try:\n",
    "    import google.colab  # type: ignore\n",
    "\n",
    "    in_colab = True\n",
    "except Exception:\n",
    "    in_colab = False\n",
    "\n",
    "repo_name = \"ShinkaEvolve\"\n",
    "ssh_url = \"git@github.com:SakanaAI/ShinkaEvolve.git\"\n",
    "https_url = \"https://github.com/SakanaAI/ShinkaEvolve.git\"\n",
    "\n",
    "cwd = Path.cwd()\n",
    "repo_root = cwd\n",
    "\n",
    "if not (repo_root / repo_name).exists():\n",
    "    for parent in cwd.resolve().parents:\n",
    "        if (parent / repo_name).exists():\n",
    "            repo_root = parent\n",
    "            break\n",
    "\n",
    "if in_colab:\n",
    "    root_candidate = Path(\"/content\") / repo_name\n",
    "    if not root_candidate.exists():\n",
    "        print(\"cloning repository...\")\n",
    "        subprocess.check_call(\n",
    "            [\"git\", \"clone\", \"--depth\", \"1\", https_url, str(root_candidate)]\n",
    "        )\n",
    "    repo_root = root_candidate\n",
    "    os.chdir(repo_root)\n",
    "    print(\"repo_root:\", repo_root)\n",
    "    print(\"installing package and deps...\")\n",
    "    subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-e\", \".\"])\n",
    "else:\n",
    "    print(\"repo_root:\", repo_root)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da186c96",
   "metadata": {},
   "source": [
    "## Environment setup\n",
    "\n",
    "This cell ensures the repo is importable and optionally loads `.env`\n",
    "credentials if `python-dotenv` is available. When using Google Colab, please insert your keys in the `additional_llm_keys` list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38185842",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "repo_root = Path.cwd()\n",
    "if not (repo_root / \"shinka\").exists():\n",
    "    for parent in Path.cwd().resolve().parents:\n",
    "        if (parent / \"shinka\").exists():\n",
    "            repo_root = parent\n",
    "            break\n",
    "\n",
    "sys.path.insert(0, str(repo_root))\n",
    "print(\"repo_root:\", repo_root)\n",
    "\n",
    "env_path = repo_root / \".env\"\n",
    "if env_path.exists():\n",
    "    try:\n",
    "        from dotenv import load_dotenv\n",
    "\n",
    "        load_dotenv(env_path)\n",
    "        print(\"loaded .env\")\n",
    "    except Exception as e:\n",
    "        print(\"could not load .env:\", e)\n",
    "else:\n",
    "    print(\".env not found, continuing without it\")\n",
    "\n",
    "# override with your own keys not present in .env file\n",
    "additional_llm_keys = [\n",
    "    # \"OPENAI_API_KEY=...\",\n",
    "    # \"GEMINI_API_KEY=...\",\n",
    "    # \"OPENROUTER_API_KEY=...\",\n",
    "    # \"DEEPSEEK_API_KEY=...\",\n",
    "    # \"AZURE_OPENAI_API_KEY=...\",\n",
    "    # \"ANTHROPIC_API_KEY=...\",\n",
    "]\n",
    "\n",
    "for kv in additional_llm_keys:\n",
    "    if \"=\" in kv:\n",
    "        key, val = kv.split(\"=\", 1)\n",
    "        if val:\n",
    "            os.environ[key] = val\n",
    "            print(f\"set {key}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "387f9817",
   "metadata": {},
   "source": [
    "## Shinka overview\n",
    "\n",
    "**Population & archive**\n",
    "- `ProgramDatabase` stores *all* candidates, scores, and metadata.\n",
    "- An **archive** keeps strong and diverse solutions across **islands**.\n",
    "- Islands evolve in parallel to avoid premature convergence.\n",
    "\n",
    "**Mutations**\n",
    "- `patch_types`: `diff` (targeted edits), `full` (rewrite), `cross`\n",
    "  (combine two parents).\n",
    "- LLMs are prompted with task context and inspiration from the archive.\n",
    "\n",
    "**Novelty**\n",
    "- Optional semantic checks with embeddings to avoid duplicating ideas:\n",
    "  `code_embed_sim_threshold`, `embedding_model`, `novelty_llm_models`.\n",
    "\n",
    "**Meta‑learning**\n",
    "- Periodic reviews (`meta_rec_interval`) extract patterns into a scratchpad.\n",
    "- Arm selection for multiple LLMs via bandits (e.g., `UCB1`).\n"
   ]
  },
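  {
   "cell_type": "markdown",
   "id": "c4f1a2b7",
   "metadata": {},
   "source": [
    "To make the bandit-based arm selection above more concrete, here is a minimal, self-contained sketch of the textbook UCB1 rule. It is not `shinka`'s implementation: the function `ucb1_select`, the model names, and the success probabilities are all hypothetical and only illustrate how a bandit can shift sampling toward the model whose mutations pay off more often.\n",
    "\n",
    "```py\n",
    "import math\n",
    "import random\n",
    "\n",
    "\n",
    "def ucb1_select(counts, rewards, c=math.sqrt(2)):\n",
    "    # play every arm once before using the confidence bound\n",
    "    for arm, n in enumerate(counts):\n",
    "        if n == 0:\n",
    "            return arm\n",
    "    total = sum(counts)\n",
    "    # mean reward plus an exploration bonus that shrinks with more pulls\n",
    "    scores = [\n",
    "        rewards[arm] / counts[arm] + c * math.sqrt(math.log(total) / counts[arm])\n",
    "        for arm in range(len(counts))\n",
    "    ]\n",
    "    return max(range(len(counts)), key=scores.__getitem__)\n",
    "\n",
    "\n",
    "# hypothetical chance per model that a mutation improves the score\n",
    "success_prob = {\"model_a\": 0.2, \"model_b\": 0.6}\n",
    "names = list(success_prob)\n",
    "counts = [0] * len(names)\n",
    "rewards = [0.0] * len(names)\n",
    "\n",
    "rng = random.Random(0)\n",
    "for _ in range(1000):\n",
    "    arm = ucb1_select(counts, rewards)\n",
    "    reward = 1.0 if rng.random() < success_prob[names[arm]] else 0.0\n",
    "    counts[arm] += 1\n",
    "    rewards[arm] += reward\n",
    "\n",
    "print(dict(zip(names, counts)))\n",
    "```\n",
    "\n",
    "After the initial exploration phase, most pulls should go to the arm with the higher success rate."
   ]
  },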
  {
   "cell_type": "markdown",
   "id": "185bffd6",
   "metadata": {},
   "source": [
    "## Writing a fully custom shinka configuration and running experiments on circle packing\n",
    "\n",
    "We mirror the example in `examples/circle_packing/run_evo.py` but default to\n",
    "a **small number of generations** and only using smaller models to keep this\n",
    "tutorial quick and inexpensive.\n",
    "\n",
    "Feel free to play around and customize each component!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "378bc430",
   "metadata": {},
   "outputs": [],
   "source": [
    "# standard shinka imports\n",
    "import os\n",
    "import datetime as dt\n",
    "from time import perf_counter\n",
    "from pathlib import Path\n",
    "from shinka.core import EvolutionRunner, EvolutionConfig\n",
    "from shinka.database import DatabaseConfig\n",
    "from shinka.launch import LocalJobConfig\n",
    "\n",
    "# default circle packing message - can be customized to your liking!\n",
    "search_task_sys_msg = (\n",
    "    \"You are an expert mathematician specializing in circle packing problems \"\n",
    "    \"and computational geometry. The best known result for the sum of radii \"\n",
    "    \"when packing 26 circles in a unit square is 2.635.\\n\\n\"\n",
    "    \"Key directions to explore:\\n\"\n",
    "    \"1. The optimal arrangement likely involves variable-sized circles\\n\"\n",
    "    \"2. A pure hexagonal arrangement may not be optimal due to edge effects\\n\"\n",
    "    \"3. The densest known circle packings often use a hybrid approach\\n\"\n",
    "    \"4. The optimization routine is critically important - simple physics-\"\n",
    "    \"based models with carefully tuned parameters\\n\"\n",
    "    \"5. Consider strategic placement of circles at square corners and edges\\n\"\n",
    "    \"6. Place larger circles near the center and smaller near the edges\\n\"\n",
    "    \"7. Math literature suggests special arrangements for specific n\\n\"\n",
    "    \"8. You can use scipy.optimize to refine radii given fixed centers and \"\n",
    "    \"constraints\\n\\n\"\n",
    "    \"Be creative and try to find a new solution.\"\n",
    ")"
   ]
  },
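  {
   "cell_type": "markdown",
   "id": "d2e9b6a3",
   "metadata": {},
   "source": [
    "The system message above describes the objective only in words: pack 26 circles into the unit square and maximize the sum of their radii. As a rough illustration of what such a check could look like, the sketch below validates a packing and reports its score. The helper `packing_is_valid` and the four-circle toy layout are assumptions made for this example and are not taken from `examples/circle_packing/evaluate.py`.\n",
    "\n",
    "```py\n",
    "import itertools\n",
    "import math\n",
    "\n",
    "\n",
    "def packing_is_valid(centers, radii, tol=1e-9):\n",
    "    # every circle must lie inside the unit square\n",
    "    for (x, y), r in zip(centers, radii):\n",
    "        inside = min(x, y) - r >= -tol and max(x, y) + r <= 1 + tol\n",
    "        if r < 0 or not inside:\n",
    "            return False\n",
    "    # no two circles may overlap (touching is allowed)\n",
    "    for i, j in itertools.combinations(range(len(radii)), 2):\n",
    "        if math.dist(centers[i], centers[j]) < radii[i] + radii[j] - tol:\n",
    "            return False\n",
    "    return True\n",
    "\n",
    "\n",
    "# toy example: four equal circles in a 2x2 grid (not the 26-circle task)\n",
    "centers = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]\n",
    "radii = [0.25] * 4\n",
    "print(\"valid:\", packing_is_valid(centers, radii), \"score:\", sum(radii))\n",
    "```\n",
    "\n",
    "An invalid packing, for example the same centers with radii of 0.3, fails both the boundary and the overlap check."
   ]
  },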
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45421c6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# pick llms based on available keys\n",
    "llm_models = []\n",
    "if os.getenv(\"GEMINI_API_KEY\"):\n",
    "    llm_models.append(\"gemini-2.5-flash\")\n",
    "if os.getenv(\"OPENAI_API_KEY\"):\n",
    "    llm_models.append(\"gpt-5-mini\")\n",
    "if os.getenv(\"ANTHROPIC_API_KEY\"):\n",
    "    llm_models.append(\"claude-3-7-sonnet\")\n",
    "elif os.getenv(\"AWS_ACCESS_KEY_ID\") and os.getenv(\"AWS_REGION\"):\n",
    "    llm_models.append(\"bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0\")\n",
    "if not llm_models:\n",
    "    llm_models = [\"gpt-5-mini\"]  # fallback if no keys detected\n",
    "\n",
    "# pick embedding model based on available keys\n",
    "embedding_model_name = \"\"\n",
    "if os.getenv(\"GEMINI_API_KEY\"):\n",
    "    embedding_model_name = \"gemini-embedding-001\"\n",
    "elif os.getenv(\"OPENAI_API_KEY\"):\n",
    "    embedding_model_name = \"text-embedding-3-small\"\n",
    "else:\n",
    "    embedding_model_name = \"text-embedding-3-small\"\n",
    "print(f\"✅ Embedding model selected: {embedding_model_name}\")\n",
    "\n",
    "\n",
    "# unique experiment directory\n",
    "timestamp = dt.datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
    "run_tag = f\"{timestamp}_weighted_fast\"\n",
    "\n",
    "evo_config = EvolutionConfig(\n",
    "    task_sys_msg=search_task_sys_msg,\n",
    "    # use all three mutation patch types and set prob for all\n",
    "    patch_types=[\"diff\", \"full\", \"cross\"],\n",
    "    patch_type_probs=[0.6, 0.3, 0.1],\n",
    "    # runs for 20 generations in sequence\n",
    "    num_generations=20,\n",
    "    max_parallel_jobs=1,  # only one job at a time\n",
    "    max_patch_resamples=3,  # resample 3 times if patch fails\n",
    "    max_patch_attempts=3,  # try 3 times to fix patch via reflection\n",
    "    # runs locally using the local environment (no loading of conda/docker)\n",
    "    job_type=\"local\",\n",
    "    language=\"python\",\n",
    "    # set LLMs for ensemble\n",
    "    llm_models=llm_models,\n",
    "    llm_kwargs=dict(\n",
    "        temperatures=[0.0, 0.5],  # uniform temperature sampling\n",
    "        max_tokens=16384,\n",
    "    ),\n",
    "    # no meta scratchpad\n",
    "    meta_rec_interval=None,  # e.g. every 5 generations\n",
    "    meta_llm_models=None,  # e.g. [\"gpt-4.1\"]\n",
    "    meta_llm_kwargs={},  # same as above\n",
    "    # path to the initial program, relative to the working directory\n",
    "    init_program_path=\"initial.py\",\n",
    "    results_dir=f\"results/circle_packing/{run_tag}\",\n",
    "    # each mutation has three chances of providing a novel solution\n",
    "    max_novelty_attempts=3,\n",
    "    # ensemble llm selection among candidates based on past performance\n",
    "    llm_dynamic_selection=None,  # e.g. \"ucb1\"\n",
    "    # set embedding model\n",
    "    embedding_model=embedding_model_name,\n",
    ")\n",
    "\n",
    "db_config = DatabaseConfig(\n",
    "    db_path=\"evolution_db.sqlite\",\n",
    "    num_islands=2,\n",
    "    archive_size=20,\n",
    "    elite_selection_ratio=0.3,\n",
    "    num_archive_inspirations=4,\n",
    "    num_top_k_inspirations=2,\n",
    "    migration_interval=10,\n",
    "    migration_rate=0.1,\n",
    "    island_elitism=True,\n",
    "    enforce_island_separation=True,\n",
    "    parent_selection_strategy=\"weighted\",\n",
    "    parent_selection_lambda=10.0,\n",
    "    \n",
    ")\n",
    "\n",
    "job_config = LocalJobConfig(eval_program_path=\"evaluate.py\")\n",
    "\n",
    "print(\"llm_models:\", llm_models)\n",
    "print(\"embedding_model:\", embedding_model_name)\n",
    "print(\"results_dir:\", evo_config.results_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74e75305",
   "metadata": {},
   "source": [
    "## Running a minimal circle packing experiment\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ab09cc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "circle_packing_path = repo_root / \"examples\" / \"circle_packing\"\n",
    "if os.getcwd() != str(circle_packing_path):\n",
    "    os.chdir(circle_packing_path)\n",
    "    print(\"changed working dir to:\", circle_packing_path)\n",
    "\n",
    "runner = EvolutionRunner(\n",
    "    evo_config=evo_config,\n",
    "    job_config=job_config,\n",
    "    db_config=db_config,\n",
    "    verbose=True,\n",
    ")\n",
    "\n",
    "tic = perf_counter()\n",
    "runner.run()\n",
    "toc = perf_counter()\n",
    "\n",
    "print(\"completed in\", round(toc - tic, 2), \"s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "474e88bb",
   "metadata": {},
   "source": [
    "### Inspecting the results\n",
    "\n",
    "Loading and plotting the evolution trajectory and the lineage tree of the best solution.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b79dd5a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from shinka.utils import load_programs_to_df\n",
    "from shinka.plots import plot_lineage_tree, plot_improvement\n",
    "\n",
    "\n",
    "results_root = Path(runner.results_dir)\n",
    "\n",
    "task_name = \"Circle Packing with shinka\"\n",
    "if os.path.exists(f\"{results_root}/{results_root}/evolution_db.sqlite\"):\n",
    "    db_root = results_root / results_root\n",
    "else:\n",
    "    db_root = results_root\n",
    "\n",
    "df = load_programs_to_df(f\"{db_root}/evolution_db.sqlite\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9edf503",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = plt.subplots(1, 2, figsize=(30, 10), gridspec_kw={\"width_ratios\": [1, 1.5]})\n",
    "\n",
    "fig.suptitle(\n",
    "    f\"{task_name}\",\n",
    "    fontsize=30,\n",
    "    weight=\"bold\",\n",
    "    y=1,\n",
    ")\n",
    "\n",
    "plot_improvement(df, f\"{task_name}: Improvements\", fig, axs[0])\n",
    "plot_lineage_tree(df, f\"{task_name}: Evolution Tree\", fig, axs[1])\n",
    "\n",
    "plt.tight_layout()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "613c76ef",
   "metadata": {},
   "source": [
    "### Comparing parent selection strategies\n",
    "\n",
    "Let's ablate one of the critical components of `shinka`, the parent selection\n",
    "strategy, and compare the results.\n"
   ]
  },
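  {
   "cell_type": "markdown",
   "id": "e7a3c915",
   "metadata": {},
   "source": [
    "As a rough intuition for what this ablation changes, the sketch below contrasts uniform sampling with a score-weighted scheme. The sigmoid weighting around the median score, the helper `selection_probs`, and the example scores are assumptions made for illustration; `shinka`'s actual `weighted` strategy and its use of `parent_selection_lambda` may differ.\n",
    "\n",
    "```py\n",
    "import math\n",
    "import random\n",
    "\n",
    "\n",
    "def selection_probs(scores, strategy=\"weighted\", lam=10.0):\n",
    "    # uniform: every candidate is equally likely to become a parent\n",
    "    if strategy == \"uniform\":\n",
    "        return [1.0 / len(scores)] * len(scores)\n",
    "    # weighted (illustrative): sigmoid of the score relative to the median\n",
    "    median = sorted(scores)[len(scores) // 2]\n",
    "    weights = [1.0 / (1.0 + math.exp(-lam * (s - median))) for s in scores]\n",
    "    total = sum(weights)\n",
    "    return [w / total for w in weights]\n",
    "\n",
    "\n",
    "scores = [1.8, 2.0, 2.3, 2.5]  # hypothetical sum-of-radii scores\n",
    "for strategy in (\"uniform\", \"weighted\"):\n",
    "    probs = selection_probs(scores, strategy)\n",
    "    print(strategy, [round(p, 3) for p in probs])\n",
    "\n",
    "# draw one parent according to the weighted probabilities\n",
    "parent = random.Random(0).choices(scores, weights=selection_probs(scores))[0]\n",
    "print(\"sampled parent score:\", parent)\n",
    "```\n",
    "\n",
    "A larger `lam` concentrates the probability mass on the best candidates, while `lam` close to zero approaches uniform sampling."
   ]
  },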
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef4d9a63",
   "metadata": {},
   "outputs": [],
   "source": [
    "import copy\n",
    "\n",
    "db_config_uniform = copy.deepcopy(db_config)\n",
    "db_config_uniform.parent_selection_strategy = \"uniform\"\n",
    "\n",
    "evo_config_uniform = copy.deepcopy(evo_config)\n",
    "timestamp = dt.datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
    "run_tag = f\"{timestamp}_unweighted_fast\"\n",
    "evo_config_uniform.results_dir = f\"results/circle_packing/{run_tag}\"\n",
    "\n",
    "\n",
    "circle_packing_path = repo_root / \"examples\" / \"circle_packing\"\n",
    "if os.getcwd() != str(circle_packing_path):\n",
    "    os.chdir(circle_packing_path)\n",
    "    print(\"changed working dir to:\", circle_packing_path)\n",
    "\n",
    "# use the uniform-selection configs defined above for the ablation\n",
    "runner = EvolutionRunner(\n",
    "    evo_config=evo_config_uniform,\n",
    "    job_config=job_config,\n",
    "    db_config=db_config_uniform,\n",
    "    verbose=True,\n",
    ")\n",
    "\n",
    "tic = perf_counter()\n",
    "runner.run()\n",
    "toc = perf_counter()\n",
    "\n",
    "print(\"completed in\", round(toc - tic, 2), \"s\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "276ed8e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "results_root_unweighted = Path(runner.results_dir)\n",
    "if os.path.exists(\n",
    "    f\"{results_root_unweighted}/{results_root_unweighted}/evolution_db.sqlite\"\n",
    "):\n",
    "    db_root = results_root_unweighted / results_root_unweighted\n",
    "else:\n",
    "    db_root = results_root_unweighted\n",
    "\n",
    "task_name = \"Shinka w/o parent weighting\"\n",
    "df = load_programs_to_df(f\"{db_root}/evolution_db.sqlite\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c13d0373",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = plt.subplots(1, 2, figsize=(30, 10), gridspec_kw={\"width_ratios\": [1, 1.5]})\n",
    "\n",
    "fig.suptitle(\n",
    "    f\"{task_name}\",\n",
    "    fontsize=30,\n",
    "    weight=\"bold\",\n",
    "    y=1,\n",
    ")\n",
    "\n",
    "plot_improvement(df, f\"{task_name}: Improvements\", fig, axs[0])\n",
    "plot_lineage_tree(df, f\"{task_name}: Evolution Tree\", fig, axs[1])\n",
    "\n",
    "plt.tight_layout()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15a1fa6c",
   "metadata": {},
   "source": [
    "## Launcher and preconfigured `shinka` configs\n",
    "\n",
    "\n",
    "`shinka` provides many effective preset configurations to run experiments.\n",
    "You can mix and match any of the task and `shinka` configurations using the \n",
    "`shinka` launcher:\n",
    "\n",
    "```bash\n",
    "shinka_launch \\\n",
    "    task=circle_packing \\\n",
    "    database=island_large \\\n",
    "    evolution=small_budget \\\n",
    "    cluster=local \\\n",
    "    evo_config.num_generations=10 \\\n",
    "    db_config.num_archive_inspirations=2 \\\n",
    "    variant_suffix=\"_fast\"\n",
    "```\n",
    "\n",
    "You can also define a new experiment variant, or use an existing one, in\n",
    "`configs/variant/`, e.g.:\n",
    "```bash\n",
    "shinka_launch variant=circle_packing_example\n",
    "```\n",
    "\n",
    "You can also load the presets and use them with the Python API in a few\n",
    "lines of code:\n",
    "\n",
    "```py\n",
    "from shinka.core import EvolutionRunner\n",
    "from shinka.utils.utils_hydra import build_cfgs_from_python\n",
    "\n",
    "launcher_args = [\n",
    "    \"variant=circle_packing_example\",\n",
    "]\n",
    "\n",
    "job_cfg, db_cfg, evo_cfg, cfg = build_cfgs_from_python(*launcher_args)\n",
    "\n",
    "evo_runner = EvolutionRunner(\n",
    "    evo_config=evo_cfg,\n",
    "    job_config=job_cfg,\n",
    "    db_config=db_cfg,\n",
    "    verbose=cfg.verbose,\n",
    ")\n",
    "evo_runner.run()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e50be1eb",
   "metadata": {},
   "source": [
    "## Novelty generator example modifying preconfigured configurations\n",
    "\n",
    "This example showcases how `shinka` can be extended with open-ended **LLM-as-a-judge evaluation** to go beyond predefined performance metrics. The evaluation function uses large language models as critical judges to output a **final novelty score** that quantifies how diverse, meaningful, and inspirational the outputs of an LLM-generated function are:\n",
    "\n",
    "```py\n",
    "def generate_novelty(rng: int) -> str:\n",
    "    ...\n",
    "```\n",
    "\n",
    "Let's load and customize the preset configurations for this task using the\n",
    "Python API:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b7667ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "from shinka.utils.utils_hydra import build_cfgs_from_python\n",
    "from shinka.core import EvolutionRunner\n",
    "\n",
    "\n",
    "# launcher configs can be overridden with a list of command-line style\n",
    "# arguments, a Python dictionary of keyword arguments, or both; the\n",
    "# dictionary entries are converted to the appropriate arguments\n",
    "\n",
    "launcher_args = [\n",
    "    \"variant=novelty_generator_example\",\n",
    "    \"database=island_small\",\n",
    "    \"evolution=small_budget\",\n",
    "    \"evo_config.num_generations=10\",\n",
    "]\n",
    "\n",
    "launcher_kwargs = {\n",
    "    \"evo_config.llm_models\": [\"gpt-5-mini\"],\n",
    "    \"evaluate_function.llm_judge_names\": [\n",
    "        \"bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0\"\n",
    "    ],\n",
    "}\n",
    "\n",
    "if os.getcwd() != str(repo_root):\n",
    "    os.chdir(repo_root)\n",
    "    print(\"changed working dir to:\", repo_root)\n",
    "\n",
    "job_cfg, db_cfg, evo_cfg, cfg = build_cfgs_from_python(\n",
    "    *launcher_args, **launcher_kwargs\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9de76d8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "evo_runner = EvolutionRunner(\n",
    "    evo_config=evo_cfg,\n",
    "    job_config=job_cfg,\n",
    "    db_config=db_cfg,\n",
    "    verbose=cfg.verbose,\n",
    ")\n",
    "evo_runner.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0190650",
   "metadata": {},
   "source": [
    "### Inspecting results and loading the final function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c6f9365",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from shinka.utils import load_programs_to_df\n",
    "from shinka.plots import plot_lineage_tree, plot_improvement\n",
    "\n",
    "results_root = Path(evo_runner.results_dir)\n",
    "\n",
    "task_name = \"Novelty generator\"\n",
    "if os.path.exists(f\"{results_root}/{results_root}/evolution_db.sqlite\"):\n",
    "    db_root = results_root / results_root\n",
    "else:\n",
    "    db_root = results_root\n",
    "\n",
    "df = load_programs_to_df(f\"{db_root}/evolution_db.sqlite\")\n",
    "fig, axs = plt.subplots(1, 2, figsize=(30, 10), gridspec_kw={\"width_ratios\": [1, 1.5]})\n",
    "\n",
    "fig.suptitle(\n",
    "    f\"{task_name}\",\n",
    "    fontsize=30,\n",
    "    weight=\"bold\",\n",
    "    y=1,\n",
    ")\n",
    "\n",
    "plot_improvement(df, f\"{task_name}: Improvements\", fig, axs[0])\n",
    "plot_lineage_tree(df, f\"{task_name}: Evolution Tree\", fig, axs[1])\n",
    "\n",
    "plt.tight_layout()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb756f99",
   "metadata": {},
   "outputs": [],
   "source": [
    "import importlib.util\n",
    "from rich.console import Console\n",
    "\n",
    "console = Console()\n",
    "\n",
    "program_path = results_root / \"best/main.py\"\n",
    "spec = importlib.util.spec_from_file_location(\"program\", program_path)\n",
    "if spec is None or spec.loader is None:\n",
    "    raise ImportError(f\"Could not load module at {program_path}\")\n",
    "\n",
    "module = importlib.util.module_from_spec(spec)\n",
    "spec.loader.exec_module(module)\n",
    "\n",
    "test_inputs = [1, 2, 3]\n",
    "novel_outputs = module.run_experiment(test_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f449ff8",
   "metadata": {},
   "source": [
    "### Customizing the novelty generator\n",
    "\n",
    "Let's do some customization of the novelty generator example:\n",
    "- let's provide more explicit guidance to the `shinka` agents with the system prompt\n",
    "- let's only use `full` mutations to incentivize more diversity\n",
    "- let's try using Claude instead of GPT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a89d351",
   "metadata": {},
   "outputs": [],
   "source": [
    "# new system prompt specifically asking the shinka agents to generate ASCII\n",
    "# art with procedural generation\n",
    "new_system_prompt = (\n",
    "    \"Make a python function that takes as input a random integer and produces \"\n",
    "    \"ASCII art that is cool, novel, and visually engaging. The art should be \"\n",
    "    \"generated procedurally, with the random input seed controlling structures, \"\n",
    "    \"patterns, and variations. Depending on its input, each output should be \"\n",
    "    \"diverse from all other outputs produced with different inputs. Please, call \"\n",
    "    'this function \"def generate_novelty(rng: int) -> str\"\\n\\n'\n",
    "    \"Different judges will evaluate how 1) diverse, 2) meaningful, and 3) \"\n",
    "    \"inspirational the generated ASCII art pieces are for different random seeds. \"\n",
    "    \"These three criteria will be used to assign your function a \"\n",
    "    '\"final_novelty_score\" for each judge. Only functions excelling across all '\n",
    "    'three dimensions will achieve a high \"final_novelty_score\".\\n\\n'\n",
    "    \"Now bring out your creativity, focus on procedural ASCII art, and surprise us!\"\n",
    ")\n",
    "\n",
    "launcher_args = [\n",
    "    \"variant=novelty_generator_example\",\n",
    "    \"database=island_small\",\n",
    "    \"evolution=small_budget\",\n",
    "    \"evo_config.num_generations=10\",\n",
    "]\n",
    "\n",
    "launcher_kwargs = {\n",
    "    \"evo_config.llm_models\": [\"bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0\"],\n",
    "    \"evaluate_function.llm_judge_names\": [\n",
    "        \"bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0\"\n",
    "    ],\n",
    "    \"evo_config.patch_types\": [\"full\"],\n",
    "    \"evo_config.patch_type_probs\": [1],\n",
    "}\n",
    "\n",
    "\n",
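    "job_cfg, db_cfg, evo_cfg, cfg = build_cfgs_from_python(\n",
    "    *launcher_args, **launcher_kwargs\n",
    ")\n",
    "\n",
    "# apply the custom system prompt defined above (it is otherwise unused)\n",
    "evo_cfg.task_sys_msg = new_system_prompt"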
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03521bf0",
   "metadata": {},
   "outputs": [],
   "source": [
    "evo_runner = EvolutionRunner(\n",
    "    evo_config=evo_cfg,\n",
    "    job_config=job_cfg,\n",
    "    db_config=db_cfg,\n",
    "    verbose=cfg.verbose,\n",
    ")\n",
    "evo_runner.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc9aa87a",
   "metadata": {},
   "source": [
    "### Inspecting results of our custom implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "488344a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "results_root = Path(evo_runner.results_dir)\n",
    "\n",
    "task_name = \"Novelty generator\"\n",
    "if os.path.exists(f\"{results_root}/{results_root}/evolution_db.sqlite\"):\n",
    "    db_root = results_root / results_root\n",
    "else:\n",
    "    db_root = results_root\n",
    "\n",
    "df = load_programs_to_df(f\"{db_root}/evolution_db.sqlite\")\n",
    "fig, axs = plt.subplots(1, 2, figsize=(30, 10), gridspec_kw={\"width_ratios\": [1, 1.5]})\n",
    "\n",
    "fig.suptitle(\n",
    "    f\"{task_name}\",\n",
    "    fontsize=30,\n",
    "    weight=\"bold\",\n",
    "    y=1,\n",
    ")\n",
    "\n",
    "plot_improvement(df, f\"{task_name}: Improvements\", fig, axs[0])\n",
    "plot_lineage_tree(df, f\"{task_name}: Evolution Tree\", fig, axs[1])\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98e38b82",
   "metadata": {},
   "outputs": [],
   "source": [
    "console = Console()\n",
    "\n",
    "program_path = results_root / \"best/main.py\"\n",
    "spec = importlib.util.spec_from_file_location(\"program\", program_path)\n",
    "if spec is None or spec.loader is None:\n",
    "    raise ImportError(f\"Could not load module at {program_path}\")\n",
    "\n",
    "module = importlib.util.module_from_spec(spec)\n",
    "spec.loader.exec_module(module)\n",
    "\n",
    "test_inputs = [1, 2, 3]\n",
    "novel_outputs = module.run_experiment(test_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d670e35",
   "metadata": {},
   "source": [
    "## Visualizing your runs with the WebUI for a full report of the evolutionary process\n",
    "\n",
    "### WebUI\n",
    "On the **remote** machine where the run is stored:\n",
    "```bash\n",
    "shinka_visualize --port 8888\n",
    "```\n",
    "\n",
    "On your **local** machine (if **remote** != **local**):\n",
    "```bash\n",
    "ssh -vvv -L 8888:localhost:8888 your_user@slurm-cluster\n",
    "```\n",
    "Then open <http://localhost:8888/> to browse runs.\n",
    "\n",
    "The following cells launch WebUI when using `shinka` on a local setup:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "048c4785",
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess\n",
    "import time\n",
    "\n",
    "# start the webui as a background process\n",
    "webui_proc = subprocess.Popen(\n",
    "    [\"shinka_visualize\", \"--port\", \"8888\", \"--open\"],\n",
    "    stdout=subprocess.PIPE,\n",
    "    stderr=subprocess.STDOUT,\n",
    "    text=True,\n",
    ")\n",
    "\n",
    "# wait briefly so server has time to start\n",
    "time.sleep(3)\n",
    "print(\"webui started on http://127.0.0.1:8888\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97df7213",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import IFrame, display\n",
    "\n",
    "display(IFrame(src=\"http://127.0.0.1:8888\", width=\"100%\", height=800))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fa6dc61",
   "metadata": {},
   "outputs": [],
   "source": [
    "webui_proc.terminate()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "scevo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}