diff --git "a/BioReason/data/BioReasoning_DataCuration_KEGG.ipynb" "b/BioReason/data/BioReasoning_DataCuration_KEGG.ipynb" new file mode 100644--- /dev/null +++ "b/BioReason/data/BioReasoning_DataCuration_KEGG.ipynb" @@ -0,0 +1,2575 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Genetic Variant Analysis with KEGG Pathway Data\n", + "\n", + "This notebook demonstrates the process of analyzing genetic variants using KEGG pathway data and the Anthropic Claude API. The analysis creates structured reasoning paths explaining the biological mechanisms and disease relationships for genetic variants.\n", + "\n", + "## Overview\n", + "\n", + "The notebook includes functions to:\n", + "1. Load genetic variant data from TSV files\n", + "2. Process variants in batches using the Anthropic API\n", + "3. Generate detailed biological reasoning for each variant\n", + "4. Combine results into a comprehensive dataset\n", + "\n", + "## Requirements\n", + "\n", + "- Python 3.7+\n", + "- anthropic library\n", + "- tqdm for progress tracking\n", + "- Access to Anthropic Claude API\n", + "\n", + "## Data Format\n", + "\n", + "The input TSV file should contain columns for:\n", + "- Var_ID: Variant identifier\n", + "- ENTRY: Gene entry\n", + "- Chr: Chromosome\n", + "- Start: Position\n", + "- RefAllele: Reference allele\n", + "- AltAllele: Alternative allele\n", + "- Network Definition: Pathway information\n", + "- Gene: Gene information (JSON format)\n", + "- Disease: Associated disease (JSON format)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup and Installation\n", + "\n", + "Install required packages and set up the environment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nPQxSEejKYkk", + "outputId": "d444f6f8-90ca-4f0a-e872-082e04154c7b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: anthropic in /usr/local/lib/python3.11/dist-packages (0.50.0)\n", + "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.11/dist-packages (from anthropic) (4.9.0)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.11/dist-packages (from anthropic) (1.9.0)\n", + "Requirement already satisfied: httpx<1,>=0.25.0 in /usr/local/lib/python3.11/dist-packages (from anthropic) (0.28.1)\n", + "Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from anthropic) (0.9.0)\n", + "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.11/dist-packages (from anthropic) (2.11.3)\n", + "Requirement already satisfied: sniffio in /usr/local/lib/python3.11/dist-packages (from anthropic) (1.3.1)\n", + "Requirement already satisfied: typing-extensions<5,>=4.10 in /usr/local/lib/python3.11/dist-packages (from anthropic) (4.13.2)\n", + "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.11/dist-packages (from anyio<5,>=3.5.0->anthropic) (3.10)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.25.0->anthropic) (2025.1.31)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.25.0->anthropic) (1.0.9)\n", + "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx<1,>=0.25.0->anthropic) (0.16.0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->anthropic) (0.7.0)\n", + "Requirement already 
satisfied: pydantic-core==2.33.1 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->anthropic) (2.33.1)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->anthropic) (0.4.0)\n" + ] + } + ], + "source": [ + "!pip install anthropic\n", + "\n", + "import os\n", + "import json\n", + "import time\n", + "import glob\n", + "import datetime\n", + "import re\n", + "from tqdm.notebook import tqdm\n", + "import anthropic\n", + "from anthropic.types.message_create_params import MessageCreateParamsNonStreaming\n", + "from anthropic.types.messages.batch_create_params import Request\n", + "\n", + "# Create directories\n", + "output_dir = \"processed_variants\"\n", + "os.makedirs(output_dir, exist_ok=True)\n", + "\n", + "# API key setup - replace with your preferred method\n", + "# Option 1: Set as environment variable (recommended for production)\n", + "api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "\n", + "# Option 2: For Google Colab, uncomment the following lines:\n", + "# from google.colab import userdata\n", + "# api_key = userdata.get('ANTHROPIC_API_KEY')\n", + "\n", + "# Option 3: Direct input (not recommended for production)\n", + "if not api_key:\n", + " api_key = input(\"Enter your Anthropic API key: \")\n", + "\n", + "# Create Anthropic client\n", + "client = anthropic.Anthropic(api_key=api_key)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "L8pHMeUrXfxW" + }, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import time\n", + "import glob\n", + "import datetime\n", + "import re\n", + "from tqdm.notebook import tqdm\n", + "import anthropic\n", + "from anthropic.types.message_create_params import MessageCreateParamsNonStreaming\n", + "from anthropic.types.messages.batch_create_params import Request\n", + "\n", + "# Create directories\n", + "output_dir = \"processed_variants\"\n", + "os.makedirs(output_dir, 
exist_ok=True)\n", + "\n", + "# Get API key from Google Colab secrets\n", + "from google.colab import userdata\n", + "api_key = userdata.get('api_key')\n", + "if not api_key:\n", + " api_key = input(\"Enter your Anthropic API key: \")\n", + "\n", + "# Create Anthropic client\n", + "client = anthropic.Anthropic(api_key=api_key)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Loading Functions\n", + "\n", + "Functions to load and process genetic variant data from TSV files." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "A5rPTgZWXf0z" + }, + "outputs": [], + "source": [ + "# Load the variant data\n", + "def load_variant_data(file_path):\n", + " \"\"\"Load variant data from a TSV file.\"\"\"\n", + " variants = []\n", + "\n", + " with open(file_path, 'r', encoding='utf-8') as f:\n", + " # Get header line\n", + " header = f.readline().strip().split('\\t')\n", + "\n", + " # Read each line and create a dictionary\n", + " for line in f:\n", + " values = line.strip().split('\\t')\n", + " variant = {header[i]: values[i] for i in range(len(header))}\n", + " variants.append(variant)\n", + "\n", + " return variants\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sdddg01zaQOo" + }, + "outputs": [], + "source": [ + "def create_variant_prompt(variant):\n", + " \"\"\"Create a prompt for analyzing a genetic variant.\"\"\"\n", + " prompt = f\"\"\"# Genetic Variant Analysis Prompt\n", + "\n", + " You are a genetics expert analyzing disease-causing mutations. 
For the following variant data, create a detailed reasoning path explaining the biological mechanism and disease relationship.\n", + "\n", + " ## Variant Data:\n", + " - Variant ID: {variant.get('Var_ID', 'Unknown')}\n", + " - Gene: {variant.get('ENTRY', 'Unknown')} ({\", \".join([f\"{k.split(';')[0]}\" for k in json.loads(variant.get('Gene', '{}')).values()])})\n", + " - Chromosome: {variant.get('Chr', 'Unknown')}\n", + " - Position: {variant.get('Start', 'Unknown')}\n", + " - Reference Allele: {variant.get('RefAllele', 'Unknown')}\n", + " - Alternative Allele: {variant.get('AltAllele', 'Unknown')}\n", + " - Network: {variant.get('Network Definition', 'Unknown')}\n", + " - Associated Disease: {list(json.loads(variant.get('Disease', '{}')).keys())[0] if variant.get('Disease') else 'Unknown'}\n", + "\n", + " ## Instructions\n", + " 1. Based on this variant data, provide a structured analysis in valid JSON format with the following components:\n", + " - Keep the complete raw_data object containing all original fields\n", + " - Generate one detailed question about the biological effect of this variant and what disease it might contribute to\n", + " - Provide a concise answer (2-3 sentences) summarizing the mechanism and disease relationship\n", + " - Develop a comprehensive reasoning path containing:\n", + " - The variant identifier\n", + " - The HGVS notation\n", + " - 8-12 sequential reasoning steps that trace the causal pathway from the genetic mutation to its cellular effects and disease manifestation\n", + " - Relevant labels for pathways, diseases, and genes\n", + "\n", + " ## Output Format\n", + " ```json\n", + " {{\n", + " \"raw_data\": {{\n", + " // Complete original data object with all fields\n", + " }},\n", + " \"question\": \"What is the biological effect of the [gene] mutation [id] ([ref]>[alt] at [position]) and what disease might it contribute to?\",\n", + " \"answer\": \"Concise 2-3 sentence answer summarizing mechanism and disease\",\n", + " 
\"reasoning\": {{\n", + " \"variant_id\": \"ID\",\n", + " \"hgvs\": \"Formal HGVS notation\",\n", + " \"reasoning_steps\": [\n", + " \"Step 1: Description of mutation at molecular level\",\n", + " \"Step 2: Effect on protein structure/function\",\n", + " \"Step 3: Effect on cellular pathway/process\",\n", + " // Additional steps showing causal chain\n", + " \"Final step: How this contributes to disease pathology\"\n", + " ],\n", + " \"labels\": {{\n", + " \"pathway\": [\"Pathway identifiers\"],\n", + " \"disease\": [\"Disease names\"],\n", + " \"gene\": [\"Gene names\"]\n", + " }}\n", + " }}\n", + " }}\n", + " Important notes:\n", + "\n", + " Ensure your response is VALID JSON without ANY explanatory text outside the JSON structure\n", + " Do not include markdown code blocks (```) in your response - just provide the raw JSON\n", + " Provide detailed, scientifically accurate reasoning steps that show the complete causal pathway\n", + " For HGVS notation, include both genomic (g.) and protein (p.) level changes\n", + "\n", + " Analyze this variant data and provide your complete analysis in valid JSON format:\n", + " \"\"\"\n", + " return prompt\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prompt Creation\n", + "\n", + "The function above builds the structured analysis prompt that is sent to the model for each variant."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1d8S4WHHaQSw" + }, + "outputs": [], + "source": [ + "def process_variants_in_batches(variants, batch_size=5, model=\"claude-3-7-sonnet-20250219\", max_tokens=6000):\n", + " \"\"\"Process variants in batches using the Anthropic SDK.\"\"\"\n", + " print(f\"Processing {len(variants)} variants in batches of {batch_size}\")\n", + " # Process in batches\n", + " for i in range(0, len(variants), batch_size):\n", + " batch_variants = variants[i:i+batch_size]\n", + " print(f\"Processing batch {i//batch_size + 1} with {len(batch_variants)} variants\")\n", + "\n", + " # Create batch requests\n", + " batch_requests = []\n", + " for variant in batch_variants:\n", + " # Create a custom_id (max 64 chars)\n", + " var_id = variant.get('Var_ID', 'variant')\n", + " gene = variant.get('ENTRY', '')\n", + " custom_id = f\"{var_id}_{gene}\"[:64]\n", + "\n", + " # Create the prompt\n", + " prompt = create_variant_prompt(variant)\n", + "\n", + " # Add to batch requests\n", + " batch_requests.append(\n", + " Request(\n", + " custom_id=custom_id,\n", + " params=MessageCreateParamsNonStreaming(\n", + " model=model,\n", + " max_tokens=max_tokens,\n", + " temperature=0.2, # Slightly higher temperature for reasoning variation\n", + " system=\"You are a genetics expert analyzing disease-causing mutations. Provide your analysis in VALID JSON format only, with no markdown formatting or explanatory text. 
Your JSON should contain raw_data, question, answer, and reasoning components.\",\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " )\n", + " )\n", + "\n", + " # Submit batch\n", + " print(f\"Submitting batch with {len(batch_requests)} requests...\")\n", + " batch = client.messages.batches.create(requests=batch_requests)\n", + " print(f\"Batch created with ID: {batch.id}\")\n", + " print(f\"Initial status: {batch.processing_status}\")\n", + "\n", + " # Poll for batch completion\n", + " polling_interval = 10 # seconds\n", + " while True:\n", + " # Get batch status\n", + " batch_status = client.messages.batches.retrieve(batch.id)\n", + "\n", + " # Print status\n", + " print(f\"Batch status: {batch_status.processing_status}\")\n", + " print(f\"Processing: {batch_status.request_counts.processing}, \"\n", + " f\"Succeeded: {batch_status.request_counts.succeeded}, \"\n", + " f\"Errored: {batch_status.request_counts.errored}\")\n", + "\n", + " # Exit loop if processing is complete\n", + " if batch_status.processing_status == \"ended\":\n", + " break\n", + "\n", + " # Wait before checking again\n", + " print(f\"Waiting {polling_interval} seconds...\")\n", + " time.sleep(polling_interval)\n", + "\n", + " # Process batch results\n", + " print(\"Processing batch results...\")\n", + " try:\n", + " for result in client.messages.batches.results(batch.id):\n", + " custom_id = result.custom_id\n", + "\n", + " # Recover the Var_ID from custom_id; Var_ID itself contains '_', so strip only the trailing gene suffix\n", + " variant_id = custom_id.rsplit('_', 1)[0]\n", + " output_file = os.path.join(output_dir, f\"{variant_id}_processed.json\")\n", + "\n", + " # Handle different result types\n", + " if result.result.type == \"succeeded\":\n", + " # Get the message content\n", + " message = result.result.message\n", + " content = message.content[0].text if message.content else \"\"\n", + "\n", + " # Extract and parse the JSON\n", + " try:\n", + " # Try direct parsing first\n", + " try:\n", + " parsed_json = 
json.loads(content)\n", + " except json.JSONDecodeError:\n", + " # Remove markdown code blocks if present\n", + " if \"```json\" in content or \"```\" in content:\n", + " content = re.sub(r'```json\\s*', '', content)\n", + " content = re.sub(r'```\\s*', '', content)\n", + "\n", + " # Extract just the JSON part\n", + " json_start = content.find('{')\n", + " json_end = content.rfind('}') + 1\n", + "\n", + " if json_start >= 0 and json_end > json_start:\n", + " json_text = content[json_start:json_end]\n", + " parsed_json = json.loads(json_text)\n", + "\n", + " # Save the parsed result\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump(parsed_json, f, indent=2)\n", + " print(f\"✓ Saved result for {variant_id}\")\n", + "\n", + " except Exception as e:\n", + " print(f\"✗ Error parsing result for {variant_id}: {e}\")\n", + " # Save the raw content\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump({\"error\": str(e), \"raw_content\": content}, f, indent=2)\n", + "\n", + " # Also save as text file for manual fixing\n", + " with open(f\"{output_file}_raw.txt\", 'w', encoding='utf-8') as f:\n", + " f.write(content)\n", + "\n", + " elif result.result.type == \"errored\":\n", + " error_message = \"Unknown error\"\n", + " if hasattr(result.result, 'error') and hasattr(result.result.error, 'message'):\n", + " error_message = result.result.error.message\n", + "\n", + " print(f\"✗ Error processing {variant_id}: {error_message}\")\n", + " # Save the error\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump({\"error\": error_message}, f, indent=2)\n", + "\n", + " except Exception as e:\n", + " print(f\"Error processing batch results: {str(e)}\")\n", + "\n", + " # Wait between batches\n", + " if i + batch_size < len(variants):\n", + " print(\"Waiting 5 seconds before next batch...\")\n", + " time.sleep(5)\n", + "\n", + " print(\"All batches processed!\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", 
+ "metadata": {}, + "source": [ + "## Batch Processing Functions\n", + "\n", + "Functions to process variants in batches using the Anthropic API." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qh3milYjUBhP" + }, + "source": [ + "## This is the version I used for the curation\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rNwRB7XK9LJ9" + }, + "source": [ + "## Run for a Batch ##" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Zt0TkTir4gpg" + }, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import time\n", + "import glob\n", + "import datetime\n", + "import re\n", + "from tqdm.notebook import tqdm\n", + "import anthropic\n", + "from anthropic.types.message_create_params import MessageCreateParamsNonStreaming\n", + "from anthropic.types.messages.batch_create_params import Request\n", + "\n", + "# Create directories\n", + "output_dir = \"processed_variants\"\n", + "os.makedirs(output_dir, exist_ok=True)\n", + "\n", + "# API key setup - multiple options for different environments\n", + "api_key = os.getenv('ANTHROPIC_API_KEY')\n", + "\n", + "# For Google Colab users, uncomment these lines:\n", + "# from google.colab import userdata\n", + "# api_key = userdata.get('ANTHROPIC_API_KEY')\n", + "\n", + "if not api_key:\n", + " api_key = input(\"Enter your Anthropic API key: \")\n", + "\n", + "# Create Anthropic client\n", + "client = anthropic.Anthropic(api_key=api_key)\n", + "\n", + "# Function to load variant data\n", + "def load_variant_data(file_path):\n", + " \"\"\"Load variant data from a TSV file.\"\"\"\n", + " variants = []\n", + "\n", + " with open(file_path, 'r', encoding='utf-8') as f:\n", + " # Get header line\n", + " header = f.readline().strip().split('\\t')\n", + "\n", + " # Read each line and create a dictionary\n", + " for line in f:\n", + " values = line.strip().split('\\t')\n", + " if len(values) == len(header):\n", + " variant = {header[i]: values[i] 
for i in range(len(header))}\n", + " variants.append(variant)\n", + " else:\n", + " print(f\"Skipping malformed line: {line[:50]}...\")\n", + "\n", + " return variants\n", + "\n", + "# Function to create the prompt\n", + "def create_variant_prompt(variant):\n", + " \"\"\"Create a prompt for analyzing a genetic variant.\"\"\"\n", + " # Parse Gene field\n", + " gene_info = {}\n", + " gene_names = \"Unknown\"\n", + " try:\n", + " # First try to parse as JSON\n", + " if variant.get('Gene') and variant.get('Gene').startswith('{'):\n", + " gene_info = json.loads(variant.get('Gene', '{}'))\n", + " gene_names = \", \".join([g.split(';')[0] for g in gene_info.values()]) if gene_info else \"Unknown\"\n", + " else:\n", + " # If not JSON, use as is\n", + " gene_names = variant.get('Gene', 'Unknown')\n", + " except (json.JSONDecodeError, AttributeError):\n", + " # Fall back to the raw field if parsing fails\n", + " gene_names = variant.get('Gene', 'Unknown')\n", + "\n", + " # Parse Disease field\n", + " disease_info = {}\n", + " disease_name = \"Unknown\"\n", + " try:\n", + " # First try to parse as JSON\n", + " if variant.get('Disease') and variant.get('Disease').startswith('{'):\n", + " disease_info = json.loads(variant.get('Disease', '{}'))\n", + " disease_name = list(disease_info.keys())[0] if disease_info else \"Unknown\"\n", + " else:\n", + " # If not JSON, use as is\n", + " disease_name = variant.get('Disease', 'Unknown')\n", + " except (json.JSONDecodeError, AttributeError):\n", + " # Fall back to the raw field if parsing fails\n", + " disease_name = variant.get('Disease', 'Unknown')\n", + "\n", + " prompt = f\"\"\"# Genetic Variant Analysis Prompt\n", + "\n", + " You are a genetics expert analyzing disease-causing mutations. 
For the following variant data, create a detailed reasoning path explaining the biological mechanism and disease relationship.\n", + "\n", + " ## Variant Data:\n", + " - Variant ID: {variant.get('Var_ID', 'Unknown')}\n", + " - Gene: {variant.get('ENTRY', 'Unknown')} ({gene_names})\n", + " - Chromosome: {variant.get('Chr', 'Unknown')}\n", + " - Position: {variant.get('Start', 'Unknown')}\n", + " - Reference Allele: {variant.get('RefAllele', 'Unknown')}\n", + " - Alternative Allele: {variant.get('AltAllele', 'Unknown')}\n", + " - Network: {variant.get('Network Definition', 'Unknown')}\n", + " - Associated Disease: {disease_name}\n", + "\n", + " ## Instructions\n", + " 1. Based on this variant data, provide a structured analysis in valid JSON format with the following components:\n", + " - Keep the complete raw_data object containing all original fields\n", + " - Generate one detailed question about the biological effect of this variant and what disease it might contribute to\n", + " - Provide a concise answer (2-3 sentences) summarizing the mechanism and disease relationship\n", + " - Develop a comprehensive reasoning path containing:\n", + " - The variant identifier\n", + " - The HGVS notation\n", + " - 8-12 sequential reasoning steps that trace the causal pathway from the genetic mutation to its cellular effects and disease manifestation\n", + " - Relevant labels for pathways, diseases, and genes\n", + "\n", + " ## Output Format\n", + " ```json\n", + " {{\n", + " \"raw_data\": {{\n", + " // Complete original data object with all fields\n", + " }},\n", + " \"question\": \"What is the biological effect of the [gene] mutation [id] ([ref]>[alt] at [position]) and what disease might it contribute to?\",\n", + " \"answer\": \"Concise 2-3 sentence answer summarizing mechanism and disease\",\n", + " \"reasoning\": {{\n", + " \"variant_id\": \"ID\",\n", + " \"hgvs\": \"Formal HGVS notation\",\n", + " \"reasoning_steps\": [\n", + " \"Step 1: Description of mutation at 
molecular level\",\n", + " \"Step 2: Effect on protein structure/function\",\n", + " \"Step 3: Effect on cellular pathway/process\",\n", + " // Additional steps showing causal chain\n", + " \"Final step: How this contributes to disease pathology\"\n", + " ],\n", + " \"labels\": {{\n", + " \"pathway\": [\"Pathway identifiers\"],\n", + " \"disease\": [\"Disease names\"],\n", + " \"gene\": [\"Gene names\"]\n", + " }}\n", + " }}\n", + " }}\n", + " Important notes:\n", + "\n", + " Ensure your response is VALID JSON without ANY explanatory text outside the JSON structure\n", + " Do not include markdown code blocks (```) in your response - just provide the raw JSON\n", + " Provide detailed, scientifically accurate reasoning steps that show the complete causal pathway\n", + " For HGVS notation, include both genomic (g.) and protein (p.) level changes\n", + "\n", + " Analyze this variant data and provide your complete analysis in valid JSON format:\n", + " \"\"\"\n", + " return prompt\n", + "\n", + "## Function to process variants in batches\n", + "def process_variants_in_batches(variants, batch_size=5, model=\"claude-3-7-sonnet-20250219\", max_tokens=6000):\n", + " \"\"\"Process variants in batches using the Anthropic SDK.\"\"\"\n", + " print(f\"Processing {len(variants)} variants in batches of {batch_size}\")\n", + "\n", + " # Process in batches\n", + " for i in range(0, len(variants), batch_size):\n", + " batch_variants = variants[i:i+batch_size]\n", + " print(f\"Processing batch {i//batch_size + 1} with {len(batch_variants)} variants\")\n", + "\n", + " # Create batch requests\n", + " batch_requests = []\n", + " for variant in batch_variants:\n", + " # Extract the Var_ID as the unique identifier\n", + " var_id = variant.get('Var_ID', f'variant_{i}_{len(batch_requests)}')\n", + "\n", + " # Create the prompt\n", + " prompt = create_variant_prompt(variant)\n", + "\n", + " # Add to batch requests\n", + " batch_requests.append(\n", + " Request(\n", + " custom_id=var_id, # Use 
Var_ID directly as the custom_id\n", + " params=MessageCreateParamsNonStreaming(\n", + " model=model,\n", + " max_tokens=max_tokens,\n", + " temperature=0.2,\n", + " system=\"You are a genetics expert analyzing disease-causing mutations. Provide your analysis in VALID JSON format only, with no markdown formatting or explanatory text. Your JSON should contain raw_data, question, answer, and reasoning components.\",\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ]\n", + " )\n", + " )\n", + " )\n", + "\n", + " # Submit batch\n", + " print(f\"Submitting batch with {len(batch_requests)} requests...\")\n", + " batch = client.messages.batches.create(requests=batch_requests)\n", + " print(f\"Batch created with ID: {batch.id}\")\n", + " print(f\"Initial status: {batch.processing_status}\")\n", + "\n", + " # Poll for batch completion\n", + " polling_interval = 10 # seconds\n", + " while True:\n", + " # Get batch status\n", + " batch_status = client.messages.batches.retrieve(batch.id)\n", + "\n", + " # Print status\n", + " print(f\"Batch status: {batch_status.processing_status}\")\n", + " print(f\"Processing: {batch_status.request_counts.processing}, \"\n", + " f\"Succeeded: {batch_status.request_counts.succeeded}, \"\n", + " f\"Errored: {batch_status.request_counts.errored}\")\n", + "\n", + " # Exit loop if processing is complete\n", + " if batch_status.processing_status == \"ended\":\n", + " break\n", + "\n", + " # Wait before checking again\n", + " print(f\"Waiting {polling_interval} seconds...\")\n", + " time.sleep(polling_interval)\n", + "\n", + " # Process batch results\n", + " print(\"Processing batch results...\")\n", + " try:\n", + " for result in client.messages.batches.results(batch.id):\n", + " # Get the variant ID from custom_id (which should be the Var_ID)\n", + " variant_id = result.custom_id\n", + " output_file = os.path.join(output_dir, f\"{variant_id}_processed.json\")\n", + "\n", + " # Handle different result types\n", + " if 
result.result.type == \"succeeded\":\n", + " # Get the message content\n", + " message = result.result.message\n", + " content = message.content[0].text if message.content else \"\"\n", + "\n", + " # Extract and parse the JSON\n", + " try:\n", + " # Try multiple approaches to extract and parse the JSON\n", + " json_text = None\n", + " parsed_json = None\n", + "\n", + " # Try direct parsing first\n", + " try:\n", + " parsed_json = json.loads(content)\n", + " print(f\"✓ Direct JSON parsing successful for {variant_id}\")\n", + " except json.JSONDecodeError:\n", + " # Try removing markdown code blocks if present\n", + " if \"```json\" in content or \"```\" in content:\n", + " cleaned_content = re.sub(r'```json\\s*', '', content)\n", + " cleaned_content = re.sub(r'```\\s*', '', cleaned_content)\n", + " try:\n", + " parsed_json = json.loads(cleaned_content)\n", + " print(f\"✓ JSON parsing after markdown removal successful for {variant_id}\")\n", + " except json.JSONDecodeError:\n", + " pass # Will try next method\n", + "\n", + " # Try extracting just the JSON part\n", + " if not parsed_json:\n", + " json_start = content.find('{')\n", + " json_end = content.rfind('}') + 1\n", + "\n", + " if json_start >= 0 and json_end > json_start:\n", + " json_text = content[json_start:json_end]\n", + " try:\n", + " parsed_json = json.loads(json_text)\n", + " print(f\"✓ JSON extraction and parsing successful for {variant_id}\")\n", + " except json.JSONDecodeError:\n", + " # Try fixing common JSON syntax issues\n", + " fixed_json = re.sub(r'\"\\s*\"', '\", \"', json_text)\n", + " fixed_json = re.sub(r'}\\s*{', '}, {', fixed_json)\n", + " fixed_json = re.sub(r']\\s*{', '], {', fixed_json)\n", + " fixed_json = re.sub(r'}\\s*\\[', '}, [', fixed_json)\n", + " fixed_json = re.sub(r']\\s*\\[', '], [', fixed_json)\n", + "\n", + " try:\n", + " parsed_json = json.loads(fixed_json)\n", + " print(f\"✓ JSON parsing after fixing syntax successful for {variant_id}\")\n", + " except 
json.JSONDecodeError as e:\n", + " print(f\"✗ All JSON parsing methods failed for {variant_id}: {e}\")\n", + "\n", + " # Save the parsed result or error\n", + " if parsed_json:\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump(parsed_json, f, indent=2)\n", + " print(f\"✓ Saved result for {variant_id}\")\n", + " else:\n", + " # Save the full raw response for manual fixing\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump({\n", + " \"error\": \"Invalid JSON in response\",\n", + " \"raw_response\": content\n", + " }, f, indent=2)\n", + " print(f\"✗ JSON parsing error for {variant_id}, saved full raw response\")\n", + "\n", + " # Also save raw content to a text file for easier manual fixing\n", + " with open(f\"{output_file}_raw.txt\", 'w', encoding='utf-8') as f:\n", + " f.write(content)\n", + "\n", + " except Exception as e:\n", + " print(f\"✗ Error processing result for {variant_id}: {e}\")\n", + " # Save the raw content\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump({\"error\": str(e), \"raw_content\": content}, f, indent=2)\n", + "\n", + " # Also save as text file for manual fixing\n", + " with open(f\"{output_file}_raw.txt\", 'w', encoding='utf-8') as f:\n", + " f.write(content)\n", + "\n", + " elif result.result.type == \"errored\":\n", + " error_message = \"Unknown error\"\n", + " if hasattr(result.result, 'error') and hasattr(result.result.error, 'message'):\n", + " error_message = result.result.error.message\n", + "\n", + " print(f\"✗ Error processing {variant_id}: {error_message}\")\n", + " # Save the error\n", + " with open(output_file, 'w', encoding='utf-8') as f:\n", + " json.dump({\"error\": error_message}, f, indent=2)\n", + "\n", + " except Exception as e:\n", + " print(f\"Error processing batch results: {str(e)}\")\n", + "\n", + " # Wait between batches\n", + " if i + batch_size < len(variants):\n", + " print(\"Waiting 5 seconds before next batch...\")\n", + " 
time.sleep(5)\n", + "\n", + " print(\"All batches processed!\")\n", + "\n", + "## Function to combine all results\n", + "def combine_all_results():\n", + " \"\"\"Combine all processed results into a single JSON file.\"\"\"\n", + " all_results = []\n", + " error_count = 0\n", + "\n", + " # List all JSON files in the output directory (excluding raw text files)\n", + " json_files = [f for f in glob.glob(os.path.join(output_dir, \"*.json\"))\n", + " if not f.endswith(\"_raw.txt\")]\n", + "\n", + " print(f\"Found {len(json_files)} JSON files to combine\")\n", + "\n", + " for file_path in json_files:\n", + " try:\n", + " with open(file_path, 'r', encoding='utf-8') as f:\n", + " data = json.load(f)\n", + "\n", + " # Skip files with errors\n", + " if \"error\" in data:\n", + " error_count += 1\n", + " print(f\"Skipping file with error: {os.path.basename(file_path)}\")\n", + " continue\n", + "\n", + " all_results.append(data)\n", + " print(f\"Added {os.path.basename(file_path)} to combined results\")\n", + "\n", + " except Exception as e:\n", + " print(f\"Error loading {os.path.basename(file_path)}: {e}\")\n", + " error_count += 1\n", + "\n", + " print(f\"Successfully combined {len(all_results)} results. {error_count} files had errors.\")\n", + "\n", + " # Save the combined collection\n", + " with open(\"all_variant_analyses.json\", 'w', encoding='utf-8') as f:\n", + " json.dump(all_results, f, indent=2)\n", + "\n", + " print(\"Saved all results to 'all_variant_analyses.json'\")\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Complete Processing Pipeline\n", + "\n", + "The integrated functions above make up the full pipeline; the driver function defined next runs it end to end."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ATfwtjWB4gr-" + }, + "outputs": [], + "source": [ + "## Main function that you will call\n", + "def process_genetic_variants(file_path, num_variants=20, batch_size=5, model=\"claude-3-7-sonnet-20250219\"):\n", + " \"\"\"\n", + " Process genetic variants from a TSV file.\n", + " Parameters:\n", + " file_path (str): Path to the TSV file containing variant data\n", + " num_variants (int): Number of variants to process (default: 20)\n", + " batch_size (int): Number of variants to process in each batch (default: 5)\n", + " model (str): Claude model to use (default: claude-3-7-sonnet-20250219)\n", + " \"\"\"\n", + " print(\"Genetic Variant Analysis Script\")\n", + " print(\"===============================\")\n", + " print(f\"Processing {num_variants} variants in batches of {batch_size} using {model}\")\n", + "\n", + " # Load data\n", + " print(f\"Loading variant data from {file_path}...\")\n", + " all_variants = load_variant_data(file_path)\n", + " print(f\"Loaded {len(all_variants)} variants in total\")\n", + "\n", + " # Limit to specified number of variants\n", + " variants = all_variants[:num_variants]\n", + " print(f\"Limited to the first {len(variants)} variants for processing\")\n", + "\n", + " # Process variants\n", + " process_variants_in_batches(\n", + " variants,\n", + " batch_size=batch_size,\n", + " model=model\n", + " )\n", + "\n", + " # Combine results\n", + " print(\"Combining results...\")\n", + " combine_all_results()\n", + "\n", + " print(\"Processing complete!\")\n", + " return f\"Results saved to {output_dir} and combined in all_variant_analyses.json\"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Main Processing Function\n", + "\n", + "The convenience function above runs the complete analysis pipeline; the next cell shows an example run."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BdCGzxeAFVU_", + "outputId": "5a3b6fc9-330f-4e38-b576-c3b9f1e64048" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Genetic Variant Analysis Script\n", + "===============================\n", + "Processing 440 variants in batches of 20 using claude-3-7-sonnet-20250219\n", + "Loading variant data from final_network_with_variant.tsv...\n", + "Loaded 289 variants in total\n", + "Limited to the first 289 variants for processing\n", + "Processing 289 variants in batches of 20\n", + "Processing batch 1 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_013VgvncRWMwgGiuSD3ZU1Ug\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, 
Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1161\n", + "✓ Saved result for KEGG_1161\n", + "✓ Direct JSON parsing successful for KEGG_1162\n", + "✓ Saved result for KEGG_1162\n", + "✓ Direct JSON parsing successful for KEGG_1163\n", + "✓ Saved result for KEGG_1163\n", + "✓ Direct JSON parsing successful for KEGG_1164\n", + "✓ Saved result for KEGG_1164\n", + "✓ Direct JSON parsing successful for KEGG_1165\n", + "✓ Saved result for KEGG_1165\n", + "✓ Direct JSON parsing successful for KEGG_1166\n", + "✓ Saved result for KEGG_1166\n", + "✓ Direct JSON parsing successful for KEGG_1167\n", + "✓ Saved result for KEGG_1167\n", + "✓ Direct JSON parsing successful for KEGG_1168\n", + "✓ Saved result for KEGG_1168\n", + "✓ Direct JSON parsing successful for KEGG_1169\n", + "✓ Saved result for KEGG_1169\n", + "✓ Direct JSON parsing successful for KEGG_1170\n", + "✓ Saved result for KEGG_1170\n", + "✓ Direct JSON parsing successful for KEGG_1171\n", + "✓ Saved result for KEGG_1171\n", + "✓ Direct JSON parsing successful for KEGG_1172\n", + "✓ Saved result for KEGG_1172\n", + "✓ Direct JSON parsing successful for KEGG_1173\n", + "✓ Saved result for KEGG_1173\n", + "✓ Direct JSON parsing successful for KEGG_1174\n", + "✓ Saved result for KEGG_1174\n", + "✓ Direct JSON parsing successful for KEGG_1175\n", + "✓ Saved result for KEGG_1175\n", + "✗ All JSON parsing methods failed for KEGG_1176: Extra data: line 1 column 4259 (char 4258)\n", + "✗ JSON parsing error for KEGG_1176, saved full raw response\n", + "✓ Direct JSON parsing successful for KEGG_1177\n", + "✓ Saved result for KEGG_1177\n", + "✓ Direct JSON parsing successful for KEGG_1178\n", + "✓ Saved result for KEGG_1178\n", + "✓ Direct JSON parsing successful for KEGG_1179\n", + "✓ Saved result for KEGG_1179\n", + "✓ Direct JSON parsing successful for KEGG_1180\n", + "✓ Saved result for KEGG_1180\n", + "Waiting 5 seconds before next batch...\n", + 
"Processing batch 2 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_01WyxbNt22ncwGbQ1TQe2N62\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1181\n", + "✓ Saved result for KEGG_1181\n", + "✓ Direct JSON parsing successful for KEGG_1182\n", + "✓ Saved result for 
KEGG_1182\n", + "✓ Direct JSON parsing successful for KEGG_1183\n", + "✓ Saved result for KEGG_1183\n", + "✓ Direct JSON parsing successful for KEGG_1184\n", + "✓ Saved result for KEGG_1184\n", + "✓ Direct JSON parsing successful for KEGG_1185\n", + "✓ Saved result for KEGG_1185\n", + "✓ Direct JSON parsing successful for KEGG_1186\n", + "✓ Saved result for KEGG_1186\n", + "✓ Direct JSON parsing successful for KEGG_1187\n", + "✓ Saved result for KEGG_1187\n", + "✓ Direct JSON parsing successful for KEGG_1188\n", + "✓ Saved result for KEGG_1188\n", + "✓ Direct JSON parsing successful for KEGG_1189\n", + "✓ Saved result for KEGG_1189\n", + "✓ Direct JSON parsing successful for KEGG_1190\n", + "✓ Saved result for KEGG_1190\n", + "✓ Direct JSON parsing successful for KEGG_1191\n", + "✓ Saved result for KEGG_1191\n", + "✓ Direct JSON parsing successful for KEGG_1192\n", + "✓ Saved result for KEGG_1192\n", + "✓ Direct JSON parsing successful for KEGG_1193\n", + "✓ Saved result for KEGG_1193\n", + "✓ Direct JSON parsing successful for KEGG_1194\n", + "✓ Saved result for KEGG_1194\n", + "✓ Direct JSON parsing successful for KEGG_1195\n", + "✓ Saved result for KEGG_1195\n", + "✓ Direct JSON parsing successful for KEGG_1196\n", + "✓ Saved result for KEGG_1196\n", + "✓ Direct JSON parsing successful for KEGG_1197\n", + "✓ Saved result for KEGG_1197\n", + "✓ Direct JSON parsing successful for KEGG_1198\n", + "✓ Saved result for KEGG_1198\n", + "✓ Direct JSON parsing successful for KEGG_1199\n", + "✓ Saved result for KEGG_1199\n", + "✓ Direct JSON parsing successful for KEGG_1200\n", + "✓ Saved result for KEGG_1200\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 3 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_01KuXnQxQkDhRrFdr2GoXJyN\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: 
in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1201\n", + "✓ Saved result for KEGG_1201\n", + "✓ Direct JSON parsing successful for KEGG_1202\n", + "✓ Saved result for KEGG_1202\n", + "✓ Direct JSON parsing successful for KEGG_1203\n", + "✓ Saved result for KEGG_1203\n", + "✓ Direct JSON parsing successful for KEGG_1204\n", + "✓ Saved result for KEGG_1204\n", + "✓ Direct JSON parsing successful for KEGG_1205\n", + "✓ Saved result for KEGG_1205\n", + "✓ Direct JSON parsing successful for KEGG_1206\n", + "✓ Saved result for KEGG_1206\n", + "✓ Direct JSON parsing successful for KEGG_1207\n", + "✓ Saved result for KEGG_1207\n", + "✓ Direct JSON parsing successful for KEGG_1208\n", + "✓ Saved result for KEGG_1208\n", + "✓ Direct JSON parsing successful for KEGG_1209\n", + "✓ Saved result for KEGG_1209\n", + "✓ Direct JSON parsing successful for KEGG_1210\n", + "✓ Saved result for KEGG_1210\n", + "✓ Direct JSON parsing successful for KEGG_1211\n", + "✓ Saved result for KEGG_1211\n", + "✓ Direct JSON parsing successful for KEGG_1212\n", + "✓ Saved result for KEGG_1212\n", + "✓ Direct JSON parsing successful for KEGG_1213\n", + 
"✓ Saved result for KEGG_1213\n", + "✓ Direct JSON parsing successful for KEGG_1214\n", + "✓ Saved result for KEGG_1214\n", + "✓ Direct JSON parsing successful for KEGG_1215\n", + "✓ Saved result for KEGG_1215\n", + "✓ Direct JSON parsing successful for KEGG_1216\n", + "✓ Saved result for KEGG_1216\n", + "✓ Direct JSON parsing successful for KEGG_1217\n", + "✓ Saved result for KEGG_1217\n", + "✓ Direct JSON parsing successful for KEGG_1218\n", + "✓ Saved result for KEGG_1218\n", + "✓ Direct JSON parsing successful for KEGG_1219\n", + "✓ Saved result for KEGG_1219\n", + "✓ Direct JSON parsing successful for KEGG_1220\n", + "✓ Saved result for KEGG_1220\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 4 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_01AvcVJWBaxzqKsHJQaQ3RVT\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + 
"Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1221\n", + "✓ Saved result for KEGG_1221\n", + "✓ Direct JSON parsing successful for KEGG_1222\n", + "✓ Saved result for KEGG_1222\n", + "✓ Direct JSON parsing successful for KEGG_1223\n", + "✓ Saved result for KEGG_1223\n", + "✓ Direct JSON parsing successful for KEGG_1224\n", + "✓ Saved result for KEGG_1224\n", + "✓ Direct JSON parsing successful for KEGG_1225\n", + "✓ Saved result for KEGG_1225\n", + "✓ Direct JSON parsing successful for KEGG_1226\n", + "✓ Saved result for KEGG_1226\n", + "✓ Direct JSON parsing successful for KEGG_1227\n", + "✓ Saved result for KEGG_1227\n", + "✓ Direct JSON parsing successful for KEGG_1228\n", + "✓ Saved result for KEGG_1228\n", + "✓ Direct JSON parsing successful for KEGG_1229\n", + "✓ Saved result for KEGG_1229\n", + "✓ Direct JSON parsing successful for KEGG_1230\n", + "✓ Saved result for KEGG_1230\n", + "✓ Direct JSON parsing successful for KEGG_1231\n", + "✓ Saved result for KEGG_1231\n", + "✓ Direct JSON parsing successful for KEGG_1232\n", + "✓ Saved result for KEGG_1232\n", + "✓ Direct JSON parsing successful for KEGG_1233\n", + "✓ Saved result for KEGG_1233\n", + "✓ Direct JSON parsing successful for KEGG_1234\n", + "✓ Saved result for KEGG_1234\n", + "✓ Direct JSON parsing successful for KEGG_1235\n", + "✓ Saved result for KEGG_1235\n", + "✓ Direct JSON parsing successful for KEGG_1236\n", + "✓ Saved result for KEGG_1236\n", + "✓ Direct JSON parsing successful for KEGG_1237\n", + "✓ Saved result for KEGG_1237\n", + "✓ Direct JSON parsing successful for KEGG_1238\n", + "✓ Saved result for KEGG_1238\n", + "✓ Direct JSON parsing successful for 
KEGG_1239\n", + "✓ Saved result for KEGG_1239\n", + "✓ Direct JSON parsing successful for KEGG_1240\n", + "✓ Saved result for KEGG_1240\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 5 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_01AYb2QqhD3rVnbgphZXpR74\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1241\n", + "✓ Saved result for KEGG_1241\n", + "✓ Direct JSON parsing successful for KEGG_1242\n", + "✓ Saved result for KEGG_1242\n", + "✓ Direct JSON parsing successful for KEGG_1243\n", + "✓ Saved result for KEGG_1243\n", + "✓ Direct JSON parsing successful 
for KEGG_1244\n", + "✓ Saved result for KEGG_1244\n", + "✓ Direct JSON parsing successful for KEGG_1245\n", + "✓ Saved result for KEGG_1245\n", + "✓ Direct JSON parsing successful for KEGG_1246\n", + "✓ Saved result for KEGG_1246\n", + "✓ Direct JSON parsing successful for KEGG_1247\n", + "✓ Saved result for KEGG_1247\n", + "✓ Direct JSON parsing successful for KEGG_1248\n", + "✓ Saved result for KEGG_1248\n", + "✓ Direct JSON parsing successful for KEGG_1249\n", + "✓ Saved result for KEGG_1249\n", + "✓ Direct JSON parsing successful for KEGG_1250\n", + "✓ Saved result for KEGG_1250\n", + "✓ Direct JSON parsing successful for KEGG_1251\n", + "✓ Saved result for KEGG_1251\n", + "✓ Direct JSON parsing successful for KEGG_1252\n", + "✓ Saved result for KEGG_1252\n", + "✓ Direct JSON parsing successful for KEGG_1253\n", + "✓ Saved result for KEGG_1253\n", + "✓ Direct JSON parsing successful for KEGG_1254\n", + "✓ Saved result for KEGG_1254\n", + "✓ Direct JSON parsing successful for KEGG_1255\n", + "✓ Saved result for KEGG_1255\n", + "✓ Direct JSON parsing successful for KEGG_1256\n", + "✓ Saved result for KEGG_1256\n", + "✓ Direct JSON parsing successful for KEGG_1257\n", + "✓ Saved result for KEGG_1257\n", + "✓ Direct JSON parsing successful for KEGG_1258\n", + "✓ Saved result for KEGG_1258\n", + "✓ Direct JSON parsing successful for KEGG_1259\n", + "✓ Saved result for KEGG_1259\n", + "✓ Direct JSON parsing successful for KEGG_1260\n", + "✓ Saved result for KEGG_1260\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 6 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_015ZnmQjCJc4DAtWMRLmrWow\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, 
Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1261\n", + "✓ Saved result for KEGG_1261\n", + "✓ Direct JSON parsing successful for KEGG_1262\n", + "✓ Saved result for KEGG_1262\n", + "✓ Direct JSON parsing successful for KEGG_1263\n", + "✓ Saved result for KEGG_1263\n", + "✓ Direct JSON parsing successful for KEGG_1264\n", + "✓ Saved result for KEGG_1264\n", + "✓ Direct JSON parsing successful for KEGG_1265\n", + "✓ Saved result for KEGG_1265\n", + "✓ Direct JSON parsing successful for KEGG_1266\n", + "✓ Saved result for KEGG_1266\n", + "✓ Direct JSON parsing successful for KEGG_1267\n", + "✓ Saved result for KEGG_1267\n", + "✓ Direct JSON parsing successful for KEGG_1268\n", + "✓ Saved result for KEGG_1268\n", + "✓ Direct JSON parsing successful for KEGG_1269\n", + "✓ Saved result for KEGG_1269\n", + "✓ Direct JSON parsing 
successful for KEGG_1270\n", + "✓ Saved result for KEGG_1270\n", + "✓ Direct JSON parsing successful for KEGG_1271\n", + "✓ Saved result for KEGG_1271\n", + "✓ Direct JSON parsing successful for KEGG_1272\n", + "✓ Saved result for KEGG_1272\n", + "✓ Direct JSON parsing successful for KEGG_1273\n", + "✓ Saved result for KEGG_1273\n", + "✓ Direct JSON parsing successful for KEGG_1274\n", + "✓ Saved result for KEGG_1274\n", + "✓ Direct JSON parsing successful for KEGG_1275\n", + "✓ Saved result for KEGG_1275\n", + "✓ Direct JSON parsing successful for KEGG_1276\n", + "✓ Saved result for KEGG_1276\n", + "✓ Direct JSON parsing successful for KEGG_1277\n", + "✓ Saved result for KEGG_1277\n", + "✓ Direct JSON parsing successful for KEGG_1278\n", + "✓ Saved result for KEGG_1278\n", + "✓ Direct JSON parsing successful for KEGG_1279\n", + "✓ Saved result for KEGG_1279\n", + "✓ Direct JSON parsing successful for KEGG_1280\n", + "✓ Saved result for KEGG_1280\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 7 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_013t9ZmJtiVrtsZ3jjZkw72i\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", 
+ "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1281\n", + "✓ Saved result for KEGG_1281\n", + "✓ Direct JSON parsing successful for KEGG_1282\n", + "✓ Saved result for KEGG_1282\n", + "✓ Direct JSON parsing successful for KEGG_1283\n", + "✓ Saved result for KEGG_1283\n", + "✓ Direct JSON parsing successful for KEGG_1284\n", + "✓ Saved result for KEGG_1284\n", + "✓ Direct JSON parsing successful for KEGG_1285\n", + "✓ Saved result for KEGG_1285\n", + "✓ Direct JSON parsing successful for KEGG_1286\n", + "✓ Saved result for KEGG_1286\n", + "✓ Direct JSON parsing successful for KEGG_1287\n", + "✓ Saved result for KEGG_1287\n", + "✓ Direct JSON parsing successful for KEGG_1288\n", + "✓ Saved result for KEGG_1288\n", + "✓ Direct JSON parsing successful for KEGG_1289\n", + "✓ Saved result for KEGG_1289\n", + "✓ Direct JSON parsing successful for KEGG_1290\n", + "✓ Saved result for KEGG_1290\n", + "✓ Direct JSON parsing successful for KEGG_1291\n", + "✓ Saved result for KEGG_1291\n", + "✗ All JSON parsing methods failed for KEGG_1292: Extra data: line 1 column 5072 (char 5071)\n", + "✗ JSON parsing error for KEGG_1292, saved full raw response\n", + "✓ Direct JSON parsing successful for KEGG_1293\n", + "✓ Saved result for KEGG_1293\n", + "✓ Direct JSON parsing successful for KEGG_1294\n", + "✓ Saved result for KEGG_1294\n", + "✓ Direct JSON parsing successful for KEGG_1295\n", + "✓ Saved result for KEGG_1295\n", + "✓ Direct JSON parsing successful for KEGG_1296\n", + "✓ Saved result for KEGG_1296\n", + "✓ Direct JSON parsing successful for KEGG_1297\n", + "✓ Saved result for 
KEGG_1297\n", + "✓ Direct JSON parsing successful for KEGG_1298\n", + "✓ Saved result for KEGG_1298\n", + "✓ Direct JSON parsing successful for KEGG_1299\n", + "✓ Saved result for KEGG_1299\n", + "✓ Direct JSON parsing successful for KEGG_1300\n", + "✓ Saved result for KEGG_1300\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 8 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_01MydwyHdbpKxdsYiHBAQWKy\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, 
Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + "Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✓ Direct JSON parsing successful for KEGG_1301\n", + "✓ Saved result for KEGG_1301\n", + "✓ Direct JSON parsing successful for KEGG_1302\n", + "✓ Saved result for KEGG_1302\n", + "✓ Direct JSON parsing successful for KEGG_1303\n", + "✓ Saved result for KEGG_1303\n", + "✓ Direct JSON parsing successful for KEGG_1304\n", + "✓ Saved result for KEGG_1304\n", + "✓ Direct JSON parsing successful for KEGG_1305\n", + "✓ Saved result for KEGG_1305\n", + "✓ Direct JSON parsing successful for KEGG_1306\n", + "✓ Saved result for KEGG_1306\n", + "✓ Direct JSON parsing successful for KEGG_1307\n", + "✓ Saved result for KEGG_1307\n", + "✓ Direct JSON parsing successful for KEGG_1308\n", + "✓ Saved result for KEGG_1308\n", + "✓ Direct JSON parsing successful for KEGG_1309\n", + "✓ Saved result for KEGG_1309\n", + "✓ Direct JSON parsing successful for KEGG_1310\n", + "✓ Saved result for KEGG_1310\n", + "✓ Direct JSON parsing successful for KEGG_1311\n", + "✓ Saved result for KEGG_1311\n", + "✓ Direct JSON parsing successful for KEGG_1312\n", + "✓ Saved result for KEGG_1312\n", + "✓ Direct JSON parsing successful for KEGG_1313\n", + "✓ Saved result for KEGG_1313\n", + "✓ Direct JSON parsing successful for KEGG_1314\n", + "✓ Saved result for KEGG_1314\n", + "✓ Direct JSON parsing successful for KEGG_1315\n", + "✓ Saved result for KEGG_1315\n", + "✓ Direct JSON parsing successful for 
KEGG_1316\n", + "✓ Saved result for KEGG_1316\n", + "✓ Direct JSON parsing successful for KEGG_1317\n", + "✓ Saved result for KEGG_1317\n", + "✓ Direct JSON parsing successful for KEGG_1318\n", + "✓ Saved result for KEGG_1318\n", + "✓ Direct JSON parsing successful for KEGG_1319\n", + "✓ Saved result for KEGG_1319\n", + "✓ Direct JSON parsing successful for KEGG_1320\n", + "✓ Saved result for KEGG_1320\n", + "Waiting 5 seconds before next batch...\n", + "Processing batch 9 with 20 variants\n", + "Submitting batch with 20 requests...\n", + "Batch created with ID: msgbatch_01DmhWnyT88xsC5qMMZpNRWX\n", + "Initial status: in_progress\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: in_progress\n", + "Processing: 20, Succeeded: 0, Errored: 0\n", + "Waiting 10 seconds...\n", + "Batch status: ended\n", + 
"Processing: 0, Succeeded: 20, Errored: 0\n", + "Processing batch results...\n", + "✗ All JSON parsing methods failed for KEGG_1324: Extra data: line 1 column 3176 (char 3175)\n", + "✗ JSON parsing error for KEGG_1324, saved full raw response\n", + "[... repetitive status-polling and per-variant success lines truncated: batches 9-15 (KEGG_1321 through KEGG_1449) each polled to status 'ended' with Errored: 0 ...]\n", + "✗ All JSON parsing methods failed for KEGG_1425: Extra data: line 1 column 4488 (char 4487)\n", + "✗ JSON parsing error for KEGG_1425, saved full raw response\n", + "✗ All JSON parsing methods failed for KEGG_1426: Extra data: line 1 column 4285 (char 4284)\n", + "✗ JSON parsing error for KEGG_1426, saved full raw response\n", + "All batches processed!\n", + "Combining results...\n", + "Found 289 JSON files to combine\n", + "[... per-file 'Added KEGG_xxxx_processed.json to combined results' lines truncated (KEGG_1161 through KEGG_1440); files skipped with errors: KEGG_1176, KEGG_1292, KEGG_1324, KEGG_1425, KEGG_1426 ...]\n", + 
"Added KEGG_1441_processed.json to combined results\n",
+ "... [Added lines for KEGG_1442 through KEGG_1448 omitted] ...\n",
+ "Added KEGG_1449_processed.json to combined results\n",
+ "Successfully combined 284 results. 5 files had errors.\n",
+ "Saved all results to 'all_variant_analyses.json'\n",
+ "Processing complete!\n",
+ "Results saved to processed_variants and combined in all_variant_analyses.json\n"
+ ]
+ }
+ ],
+ "source": [
+ "# 2. Define process_genetic_variants, then run it with your parameters:\n",
+ "def process_genetic_variants(file_path, num_variants=20, batch_size=5, model=\"claude-3-7-sonnet-20250219\"):\n",
+ "    \"\"\"\n",
+ "    Process genetic variants from a TSV file using the Anthropic Claude API.\n",
+ "\n",
+ "    Parameters:\n",
+ "    -----------\n",
+ "    file_path : str\n",
+ "        Path to the TSV file containing variant data (relative to the notebook location)\n",
+ "    num_variants : int, optional\n",
+ "        Number of variants to process (default: 20).\n",
+ "        Set to None to process all variants in the file.\n",
+ "    batch_size : int, optional\n",
+ "        Number of variants to process in each API batch (default: 5).\n",
+ "        Smaller batches provide better error handling but may be slower.\n",
+ "    model : str, optional\n",
+ "        Claude model to use (default: \"claude-3-7-sonnet-20250219\")\n",
+ "\n",
+ "    Returns:\n",
+ "    --------\n",
+ "    str\n",
+ "        Status message indicating completion and output locations\n",
+ "\n",
+ "    Output Files:\n",
+ "    -------------\n",
+ "    - Individual analyses: saved in the processed_variants/ directory\n",
+ "    - Combined results: saved as all_variant_analyses.json\n",
+ "    \"\"\"\n",
+ "    print(\"Genetic Variant Analysis Script\")\n",
+ "    print(\"===============================\")\n",
+ "    print(f\"Model: {model}\")\n",
+ "    print(f\"Batch size: {batch_size}\")\n",
+ "\n",
+ "    # Load data\n",
+ "    print(f\"Loading variant data from {file_path}...\")\n",
+ "    try:\n",
+ "        all_variants = load_variant_data(file_path)\n",
+ "        print(f\"Loaded {len(all_variants)} variants in total\")\n",
+ "    except FileNotFoundError:\n",
+ "        return f\"Error: File '{file_path}' not found. Please check the file path.\"\n",
+ "    except Exception as e:\n",
+ "        return f\"Error loading data: {str(e)}\"\n",
+ "\n",
+ "    # Limit to the specified number of variants\n",
+ "    if num_variants is None:\n",
+ "        variants = all_variants\n",
+ "        print(f\"Processing all {len(variants)} variants\")\n",
+ "    else:\n",
+ "        variants = all_variants[:num_variants]\n",
+ "        print(f\"Processing the first {len(variants)} variants\")\n",
+ "\n",
+ "    if not variants:\n",
+ "        return \"Error: No variants to process\"\n",
+ "\n",
+ "    # Process variants\n",
+ "    try:\n",
+ "        process_variants_in_batches(\n",
+ "            variants,\n",
+ "            batch_size=batch_size,\n",
+ "            model=model\n",
+ "        )\n",
+ "    except Exception as e:\n",
+ "        return f\"Error during processing: {str(e)}\"\n",
+ "\n",
+ "    # Combine results\n",
+ "    print(\"Combining results...\")\n",
+ "    try:\n",
+ "        combine_all_results()\n",
+ "    except Exception as e:\n",
+ "        print(f\"Warning: Error combining results: {str(e)}\")\n",
+ "\n",
+ "    print(\"Processing complete!\")\n",
+ "    return \"Analysis complete. Results saved to 'processed_variants/' directory and combined in 'all_variant_analyses.json'\"\n",
+ "\n",
+ "# Run the function with your parameters\n",
+ "result = process_genetic_variants(\n",
+ "    file_path=\"final_network_with_variant.tsv\",\n",
+ "    num_variants=20,\n",
+ "    batch_size=5,\n",
+ "    model=\"claude-3-7-sonnet-20250219\"\n",
+ ")\n",
+ "print(result)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Usage Examples\n",
+ "\n",
+ "Examples of how to run the genetic variant analysis with different parameters."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Notes and Considerations\n",
+ "\n",
+ "### API Usage\n",
+ "- This notebook uses the Anthropic Claude API, which requires an API key\n",
+ "- Processing large numbers of variants will consume significant API credits\n",
+ "- Consider rate limits and batch sizes based on your API plan\n",
+ "\n",
+ "### Data Requirements\n",
+ "- Input data should be in TSV format with the required columns\n",
+ "- Gene and Disease fields should contain valid JSON when structured data is available\n",
+ "- Ensure your input file path is correct relative to the notebook location\n",
+ "\n",
+ "### Output\n",
+ "- Individual variant analyses are saved in the `processed_variants/` directory\n",
+ "- Combined results are saved as `all_variant_analyses.json`\n",
+ "- Failed analyses are saved with error information for debugging\n",
+ "\n",
+ "### Customization\n",
+ "- Adjust the `num_variants` and `batch_size` parameters based on your needs\n",
+ "- Modify the prompt template in `create_variant_prompt()` for a different analysis focus\n",
+ "- Change the output directory by modifying the `output_dir` variable\n",
+ "\n",
+ "### Example 1: Basic usage with default parameters\n",
+ "Process the first 20 variants from the KEGG dataset:\n",
+ "```python\n",
+ "file_path = \"kegg_data/final_network_with_variant.tsv\"\n",
+ "\n",
+ "result = process_genetic_variants(\n",
+ "    file_path=file_path,\n",
+ "    num_variants=20,  # Process first 20 variants\n",
+ "    batch_size=5,     # Process 5 variants per batch\n",
+ "    model=\"claude-3-7-sonnet-20250219\"\n",
+ ")\n",
+ "print(result)\n",
+ "```\n",
+ "\n",
+ "### Example 2: Process more variants with larger 
batches\n",
+ "Process the first 100 variants in larger batches for efficiency:\n",
+ "```python\n",
+ "result = process_genetic_variants(\n",
+ "    file_path=file_path,\n",
+ "    num_variants=100,  # Process first 100 variants\n",
+ "    batch_size=10,     # Larger batches for efficiency\n",
+ "    model=\"claude-3-7-sonnet-20250219\"\n",
+ ")\n",
+ "print(result)\n",
+ "```\n",
+ "\n",
+ "### Example 3: Process all variants in the file\n",
+ "Set `num_variants=None` to process every variant (be aware of API costs):\n",
+ "```python\n",
+ "result = process_genetic_variants(\n",
+ "    file_path=file_path,\n",
+ "    num_variants=None,  # Process all variants\n",
+ "    batch_size=5,       # Conservative batch size\n",
+ "    model=\"claude-3-7-sonnet-20250219\"\n",
+ ")\n",
+ "print(result)\n",
+ "```"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}