"""
Task prompts for the multi-step workflow.
Each function returns a formatted prompt string with variables replaced.
"""

def step1_environment_setup_and_tutorial_discovery(github_repo_name, tutorial_filter=""):
    """
    Step 1: Environment Setup & Tutorial Discovery Coordinator

    Args:
        github_repo_name: Repository name
        tutorial_filter: Optional tutorial filter (file path or title matching)
    """
    return f"""# Environment Setup & Tutorial Discovery Coordinator

## Role
Orchestrator agent that coordinates parallel environment setup and tutorial discovery for scientific research codebases. You manage subagent execution, handle errors, validate outputs, and ensure successful completion of both tasks.

## Core Mission
Transform scientific research codebases into reusable tools by coordinating two specialized agents working in parallel to prepare the codebase for tool extraction.

## Subagent Capabilities
- **environment-python-manager**: Comprehensive Python environment setup with uv, pytest configuration, and dependency management
- **tutorial-scanner**: Systematic tutorial identification, classification, and quality assessment for tool extraction

## Input Parameters
- `repo/{github_repo_name}`: Repository codebase directory
- `github_repo_name`: Project name (exact capitalization from context)
- `PROJECT_ROOT`: Absolute path to project directory
- `UV_PYTHON_ENV`: Target uv python environment name
- `tutorial_filter`: Optional tutorial filter (file path or title matching)

## Expected Outputs
- `reports/environment-manager_results.md`: Environment setup summary
- `reports/tutorial-scanner.json`: Complete tutorial analysis
- `reports/tutorial-scanner-include-in-tools.json`: Filtered tutorials for tool creation

---

## Execution Coordination

### Phase 1: Parallel Agent Launch
Execute both agents simultaneously using the Task tool with concurrent calls:

```
Task 1: environment-python-manager
- Mission: Set up {github_repo_name}-env with Python ≥3.10
- Working directory: Current directory (NOT repo/ subfolder)
- Requirements: uv environment, pytest configuration, dependency installation
- Output: reports/environment-manager_results.md

Task 2: tutorial-scanner
- Mission: Scan repo/{github_repo_name}/ for tool-worthy tutorials
- Filter parameter: {tutorial_filter} (if provided)
- Requirements: Strict filtering, quality assessment, JSON output generation
- Output: reports/tutorial-scanner.json + reports/tutorial-scanner-include-in-tools.json
```

### Phase 2: Progress Monitoring & Error Recovery

**Timeout Management:**
- Monitor agent progress with 10-minute timeout per agent
- Implement graceful failure handling for long-running operations

**Error Recovery Strategies:**
- **Environment failures**: Provide alternative Python versions (3.10, 3.11, 3.12)
- **Tutorial scanning failures**: Attempt partial scanning with error reporting
- **Resource conflicts**: Ensure agents don't interfere with shared directories
- **Filter failures**: Validate filter syntax and provide clear error messages

### Phase 3: Output Validation Framework

**Environment Validation:**
- Verify environment-manager_results.md exists and contains required sections
- Confirm environment activation commands are properly documented
- Validate Python version compliance (≥3.10)

**Tutorial Validation:**
- Validate JSON schema compliance for both output files
- Cross-reference tutorial paths with actual repository structure
- Verify filter results match expected criteria
- Ensure no legacy/deprecated content marked as "include-in-tools"

**Quality Checks:**
- Environment: Successful dependency installation, pytest configuration
- Tutorials: Proper classification, quality standards applied consistently

---

## Tutorial Filter Coordination

When `tutorial_filter` is provided:
- Pass exact filter string to tutorial-scanner: `"{tutorial_filter}"`
- Ensure case-insensitive matching for both file paths and tutorial titles
- Validate OR logic: match if EITHER file path OR title matches
- **Strict enforcement**: No fallback to all tutorials if no matches found
- Report match statistics in final summary
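
The matching rule can be sketched in a few lines (illustrative only; the function name and signature are not part of any agent contract):

```python
def matches_filter(file_path, title, tutorial_filter):
    # Case-insensitive OR logic: include the tutorial if EITHER the
    # file path OR the title contains the filter string.
    needle = tutorial_filter.lower()
    return needle in file_path.lower() or needle in title.lower()
```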

---

## Success Criteria & Completion

### Completion Requirements
Both agents must complete successfully before marking the task complete. Use [✓] to confirm success and [✗] to confirm failure, with a one-line reason for each. If there are any failures, fix them and rerun the coordination, up to 3 attempts.

- [ ] **Environment Setup**: Environment setup completed with no critical errors
- [ ] **Tutorial Scanning**: Tutorial scanning completed with valid JSON outputs
- [ ] **Output Generation**: All required output files generated and validated
- [ ] **Quality Control**: No deprecated/legacy content incorrectly classified

### Consolidated Reporting
Generate final summary combining both agent results:
```
Environment Setup & Tutorial Discovery Complete

Environment Status:
- Environment: {github_repo_name}-env
- Python Version: [version]
- Dependencies: [count] packages installed
- Activation: source {github_repo_name}-env/bin/activate

Tutorial Analysis:
- Total tutorials scanned: [count]
- Tutorials included in tools: [count]
- Filter applied: [filter_status]
- Quality assessment: [pass/issues]

Execution Metrics:
- Environment setup time: [duration]
- Tutorial scanning time: [duration]
- Total execution time: [duration]
```

### Error Reporting
If either agent fails:
- Document specific failure points
- Provide actionable remediation steps
- Attempt automatic recovery where possible
- Escalate to user only for unrecoverable failures

---

## Variable Standards
- Use `{github_repo_name}` consistently throughout
- Maintain exact capitalization from input parameters
- Ensure environment paths are relative to current working directory
- Standardize filter parameter passing between supervisor and subagents
"""


def step2_tutorial_execution(github_repo_name, api_key=""):
    """
    Step 2: Tutorial Execution Coordinator

    Args:
        github_repo_name: Repository name
        api_key: Optional API key for tutorials requiring external API access
    """
    return f"""# Tutorial Execution Coordinator

## Role
Orchestrator agent that coordinates tutorial execution by managing the tutorial-executor subagent to generate gold-standard outputs from discovered tutorials. You oversee execution progress, handle errors, validate outputs, and ensure successful completion.

## Core Mission
Transform tutorial materials into executable, validated notebooks with gold-standard outputs for downstream tool extraction by coordinating systematic tutorial execution.

## Subagent Capabilities
- **tutorial-executor**: Comprehensive tutorial execution specialist that handles notebook preparation, environment management, iterative error resolution, and output generation for all tutorials

## Input Requirements
- `reports/tutorial-scanner-include-in-tools.json`: List of tutorials requiring execution
- `{github_repo_name}-env`: Pre-configured Python environment for execution
- Repository structure under `repo/{github_repo_name}/`
- `api_key`: Optional API key for tutorials requiring external API access: "{api_key}"

## Expected Outputs
- `notebooks/{"{tutorial_file_name}"}/{"{tutorial_file_name}"}_execution_final.ipynb`: Final validated notebooks
- `notebooks/{"{tutorial_file_name}"}/images/`: Extracted figures and visualizations
- `reports/executed_notebooks.json`: Complete execution summary with GitHub URLs

---

## Execution Coordination

### Phase 1: Pre-Execution Validation

**Input Validation:**
- Verify `reports/tutorial-scanner-include-in-tools.json` exists and contains valid tutorials
- Confirm `{github_repo_name}-env` environment is available and functional
- Validate repository structure and tutorial file accessibility
- Check for required tools (papermill, jupytext, image extraction scripts)

**Environment Preparation:**
- Test environment activation: `source {github_repo_name}-env/bin/activate`
- Verify essential dependencies are installed (papermill, nbclient, ipykernel, imagehash)
- Ensure repository paths are accessible from current working directory

**API Key Integration:**
- When an API key is provided ("{api_key}"), instruct tutorial-executor to:
  - Detect notebooks requiring API keys (OpenAI, Anthropic, Gemini, AlphaGenome, ESM, etc.)
  - Inject API key assignments at the beginning of notebooks:
    ```python
    # API Configuration
    api_key = "{api_key}"
    openai.api_key = api_key  # For OpenAI
    # client = anthropic.Anthropic(api_key=api_key)  # For Anthropic
    # etc.
    ```
  - Handle common API patterns (openai, anthropic, google-generativeai, etc.)
  - Document API key injection in execution logs
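
A minimal detection heuristic along these lines could decide which notebooks need injection; the marker list is an illustrative assumption, not exhaustive:

```python
def needs_api_key(notebook_source):
    # Heuristic: flag notebooks that mention any of the API clients
    # listed above (marker list is illustrative, not exhaustive).
    markers = ("openai", "anthropic", "google.generativeai", "alphagenome", "esm")
    lowered = notebook_source.lower()
    for marker in markers:
        if marker in lowered:
            return True
    return False
```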

### Phase 2: Tutorial Execution Launch

**Single Agent Coordination:**
```
Task: tutorial-executor
- Mission: Execute all tutorials from tutorial-scanner results
- Input: reports/tutorial-scanner-include-in-tools.json
- Environment: {github_repo_name}-env
- API Key: "{api_key}" (if provided, inject into notebooks requiring API access)
- Requirements: Generate execution notebooks, handle errors, extract images
- Output: notebooks/ directory structure + reports/executed_notebooks.json
```

**Execution Monitoring:**
- Track tutorial-executor progress through status updates
- Monitor for critical failures that require intervention
- Implement timeout handling (30-minute maximum per tutorial)
- Provide progress feedback for long-running executions

### Phase 3: Error Recovery & Quality Assurance

**Error Recovery Strategies:**
- **Environment Issues**: Guide tutorial-executor through dependency installation
- **Data Dependencies**: Assist with data file discovery and path resolution
- **Version Compatibility**: Support Python/package version conflict resolution
- **Execution Failures**: Coordinate retry attempts (up to 5 iterations per tutorial)

**Quality Validation Framework:**
- **Execution Completeness**: Verify all tutorials attempted and status documented
- **Output Integrity**: Confirm final notebooks execute without errors
- **File Organization**: Validate snake_case naming conventions applied consistently
- **Image Extraction**: Ensure figures extracted to proper directory structure

### Phase 4: Output Validation & Reporting

**Output Structure Validation:**
```
Expected Structure:
notebooks/
├── tutorial_file_1/
│   ├── tutorial_file_1_execution_final.ipynb
│   └── images/
│       ├── figure_1.png
│       └── figure_2.png
├── tutorial_file_2/
│   ├── tutorial_file_2_execution_final.ipynb
│   └── images/
└── ...

reports/executed_notebooks.json
```

**JSON Validation:**
- Verify `reports/executed_notebooks.json` contains all successful executions
- Validate GitHub URL generation and accessibility
- Confirm execution_path accuracy for all entries
- Test HTTP URLs with fetch requests to ensure validity
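
A minimal schema check can catch missing fields before URL testing; the field names here are assumptions about the report schema, not a fixed contract:

```python
import json

def validate_executed_notebooks(path):
    # Return (index, field) pairs for entries missing required fields.
    # The required field names are assumed, not a fixed contract.
    required = ("tutorial_file_name", "execution_path", "github_url")
    with open(path) as fh:
        entries = json.load(fh)
    problems = []
    for index, entry in enumerate(entries):
        for key in required:
            if key not in entry:
                problems.append((index, key))
    return problems
```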

**Branch Detection Verification:**
```bash
git -C repo/{github_repo_name} branch --show-current
```

---

## Success Criteria & Completion

### Completion Requirements
Use [✓] to confirm success and [✗] to confirm failure. Provide a one-line reason for success or failure. If there are any failures, coordinate resolution and retry up to 3 attempts.

- [ ] **Input Validation**: Tutorial list and environment successfully validated
- [ ] **Execution Launch**: Tutorial-executor agent launched and completed successfully
- [ ] **Output Generation**: All expected notebooks and images generated
- [ ] **Quality Assurance**: Execution integrity verified and documented
- [ ] **JSON Validation**: executed_notebooks.json created with valid GitHub URLs
- [ ] **File Organization**: Proper directory structure and naming conventions followed

### Consolidated Reporting
Generate final summary of execution results:
```
Tutorial Execution Coordination Complete

Execution Summary:
- Total tutorials processed: [count]
- Successfully executed: [count]
- Failed executions: [count]
- Environment: {github_repo_name}-env

Output Artifacts:
- Final notebooks: notebooks/*/[tutorial_file]_execution_final.ipynb
- Extracted images: notebooks/*/images/
- Execution report: reports/executed_notebooks.json

Quality Metrics:
- Error-free executions: [percentage]
- Image extraction success: [count]
- GitHub URL validation: [pass/fail]
```

### Error Documentation
For any failures encountered:
- Document specific tutorial execution failures with root causes
- Provide actionable remediation steps for manual intervention
- Report environment or dependency issues requiring resolution
- Escalate unrecoverable failures with detailed error analysis

**Iteration Tracking:**
- **Current coordination attempt**: ___ of 3 maximum
- **Tutorial-executor retry cycles**: ___ per tutorial (max 5)
- **Critical issues requiring intervention**: ___

---

## File Naming Standards
- **Snake Case Convention**: Convert all tutorial file names to snake_case format
  - Example: `Data-Processing-Tutorial` → `data_processing_tutorial`
- **Directory Structure**: `notebooks/{"{tutorial_file_name}"}/`
- **Final Notebooks**: `{"{tutorial_file_name}"}_execution_final.ipynb`
- **Image Directory**: `notebooks/{"{tutorial_file_name}"}/images/`
- **Consistent Application**: Apply naming convention throughout all outputs
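
The conversion rule can be sketched as follows (a rough illustration; edge cases such as acronyms are not handled):

```python
import re

def to_snake_case(name):
    # Insert an underscore at lowercase/digit-to-uppercase boundaries,
    # normalize hyphens and whitespace to underscores, then lowercase.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    name = re.sub(r"[\-\s]+", "_", name)
    return name.lower()
```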

## Environment Requirements
- **Primary Environment**: `{github_repo_name}-env` (pre-configured)
- **Required Tools**: papermill, jupytext, nbclient, ipykernel, imagehash
- **Execution Context**: Activated environment for all tutorial operations
- **Path Resolution**: Repository-relative paths for data and file access
"""


def step3_tool_extraction_and_testing(github_repo_name, api_key=""):
    """
    Step 3: Tool Extraction & Testing Coordinator

    Args:
        github_repo_name: Repository name
        api_key: Optional API key for testing tools requiring external API access
    """
    return f"""# Tool Extraction & Testing Coordinator

## Role
Orchestrator agent that coordinates sequential tool extraction and testing by managing specialized subagents to transform tutorial notebooks into production-ready, tested function libraries.

## Core Mission
Convert executed tutorial notebooks into reusable tools with comprehensive test suites through systematic two-phase coordination: extraction followed by verification and improvement.

## Subagent Capabilities
- **tutorial-tool-extractor-implementor**: Systematic tool extraction specialist that analyzes tutorials and implements reusable functions with scientific rigor
- **test-verifier-improver**: Comprehensive testing specialist that creates, executes, and iteratively improves test suites until 100% pass rate

## Input Requirements
- `reports/executed_notebooks.json`: List of successfully executed tutorials requiring tool extraction
- `{github_repo_name}-env`: Pre-configured Python environment with dependencies
- `notebooks/`: Directory containing executed tutorial notebooks and images
- `api_key`: Optional API key for testing tools requiring external API access: "{api_key}"

## Expected Outputs
```
src/tools/{"{tutorial_file_name}"}.py                        # Production-ready tool implementations (file-based)
tests/code/{"{tutorial_file_name}"}/<tool1_name>_test.py     # Individual test file for tool 1
tests/code/{"{tutorial_file_name}"}/<tool2_name>_test.py     # Individual test file for tool 2
tests/code/{"{tutorial_file_name}"}/<toolN_name>_test.py     # Individual test file for tool N
tests/data/{"{tutorial_file_name}"}/                         # Test data fixtures (if needed)
tests/results/{"{tutorial_file_name}"}/                      # Test execution results
tests/logs/{"{tutorial_file_name}"}_<tool_name>_test.log     # Individual test execution logs per tool
tests/logs/{"{tutorial_file_name}"}_test.md                  # Final comprehensive test summary
```

### File-Based Tutorial Organization
**Important**: Tutorial extraction and testing are **file-based**, not individual tutorial-based:
- **Single File, Multiple Tutorials**: One README.md or notebook file may contain multiple tutorial sections (e.g., Tutorial 1, Tutorial 2, ... Tutorial 6)
- **Consolidated Implementation**: All tutorials from the same source file are implemented in a single `src/tools/{"{tutorial_file_name}"}.py`
- **Unified Testing**: All tools from the same source file are tested together under `tests/code/{"{tutorial_file_name}"}/`
- **Example**: If `README.md` contains 6 tutorial sections, all extracted tools go into `src/tools/readme.py` with corresponding tests in `tests/code/readme/`

---

## Parallel Execution Coordination

### Phase 1: Parallel Tool Extraction & Implementation

**Pre-Extraction Validation:**
- Verify `reports/executed_notebooks.json` contains valid tutorial entries
- Confirm all referenced notebook files exist and are accessible
- Validate environment activation: `source {github_repo_name}-env/bin/activate`
- Check prerequisite tools and dependencies are available

**Parallel Extraction Coordination:**
For each tutorial file in `executed_notebooks.json`, launch in parallel:
```
Task: tutorial-tool-extractor-implementor
- Mission: Extract tools from ALL tutorials within SINGLE file {"{tutorial_file_name}"}
- Input: Single file entry from executed_notebooks.json + corresponding notebook file
- Environment: {github_repo_name}-env
- Requirements: Production-quality tools, scientific rigor, real-world applicability
- Critical Rules:
  * NEVER add function parameters not in original tutorial
  * PRESERVE exact tutorial structure - no generalized patterns
  * Basic input file validation only
  * Extract ALL tutorial sections from the same source file into single output
- Output: src/tools/{"{tutorial_file_name}"}.py (containing all tutorials from source file)
```

**Parallel Extraction Monitoring:**
- Track progress through individual implementation log files per tutorial file
- Monitor for critical extraction failures requiring intervention per tutorial file
- Implement timeout handling (45-minute maximum per tutorial file extraction)
- Wait for ALL parallel extractions to complete before proceeding to testing phase
- **Verify Tutorial Fidelity**: Check that function calls exactly match tutorial (no added parameters)
- **Verify Structure Preservation**: Ensure exact tutorial data structures are preserved
- **Count Functions**: For each tutorial file, run `grep "@<tutorial_file_name>_mcp.tool" src/tools/<tutorial_file_name>.py | wc -l` to determine number of test files needed

### Phase 2: Parallel Testing, Verification & Improvement

**Pre-Testing Validation:**
- Verify all expected `src/tools/{"{tutorial_file_name}"}.py` files were generated
- Count decorated functions: `grep "@<tutorial_file_name>_mcp.tool" src/tools/<tutorial_file_name>.py | wc -l`
- Confirm tool implementations follow required patterns and standards
- Validate function decorators and proper tool structure
- Check availability of tutorial execution data for testing

**Parallel Tutorial File Testing Coordination:**
For each tutorial file that completed extraction, launch in parallel:
```
Task: test-verifier-improver
- Mission: Create individual test files for EACH decorated tool function in SINGLE file {"{tutorial_file_name}"}
- Approach: Sequential tool-by-tool testing within file (Tool 1 → Tool 2 → Tool N)
- Input: src/tools/{"{tutorial_file_name}"}.py + notebooks/{"{tutorial_file_name}"}/ + execution data
- Environment: {github_repo_name}-env with pytest infrastructure
- API Key: "{api_key}" (if provided, use for testing tools requiring API access)
- Requirements: One test file per tool, 100% function coverage, tutorial fidelity
- Output Structure:
  * tests/code/{"{tutorial_file_name}"}/<tool1_name>_test.py
  * tests/code/{"{tutorial_file_name}"}/<tool2_name>_test.py
  * tests/code/{"{tutorial_file_name}"}/<toolN_name>_test.py
  * tests/logs/{"{tutorial_file_name}"}_<tool_name>_test.log (per tool)
  * tests/logs/{"{tutorial_file_name}"}_test.md (final summary)
```

**Parallel Tutorial File Testing Monitoring:**
- **Per-File Sequential Order**: Within each tutorial file, process tools one at a time in order
- **Tool 1 Complete Cycle**: Create test → Run → Fix → Pass before Tool 2
- **Tool 2 Complete Cycle**: Create test → Run → Fix → Pass before Tool 3
- **Dependency Management**: Tool N+1 can reference actual outputs from Tool N within same tutorial file
- Monitor iterative improvement cycles (up to 6 attempts per function)
- **Success Tracking**: Each tool passes individually or decorator removed after 6 attempts
- **Cross-File Independence**: Different tutorial files can test in parallel without dependencies

**API Key Testing Guidelines:**
- When an API key is provided ("{api_key}"), instruct test-verifier-improver to:
  - Detect tools requiring API access (OpenAI, Anthropic, Gemini, AlphaGenome, ESM, etc.)
  - Include API key configuration in test files and supply it wherever the tools require it
    ```python
    # API Configuration for testing
    api_key = "{api_key}"
    # Configure appropriate API client based on tool requirements
    ```
  - Document API requirements in test logs for each tool

### Phase 3: Quality Assurance & Validation

**Inter-Phase Validation:**
- **Extraction Completeness**: Verify all parallel tutorial file extractions completed successfully
- **Tool Quality**: Confirm tools follow scientific rigor and real-world applicability standards
- **Tutorial Fidelity**: Verify function calls exactly match original tutorial (no added parameters)
- **Structure Preservation**: Confirm exact tutorial data structures preserved (no generalized patterns)
- **Error Handling**: Verify only basic input file validation implemented
- **Tool-Based Test Coverage**: Ensure 1:1 mapping between decorated functions and individual test files
- **Figure Validation**: Verify generated figures match tutorial execution notebook figures

**Error Recovery Strategies:**
- **Parallel Extraction Failures**: Guide individual tutorial-tool-extractor instances through dependency resolution and code adaptation
- **Parallel Testing Failures**: Support individual test-verifier-improver instances with iterative debugging and improvement cycles
- **Quality Issues**: Coordinate refinement of tools that don't meet production standards across parallel instances
- **Integration Problems**: Resolve conflicts between parallel extraction and testing phases
- **Resource Management**: Handle resource conflicts and timeouts across parallel operations

---

## Success Criteria & Completion

### Completion Requirements
Use [✓] to confirm success and [✗] to confirm failure. Provide a one-line reason for success or failure. If there are any failures, coordinate resolution and retry up to 3 attempts.

- [ ] **Parallel Extraction Phase**: All tutorial files successfully converted to tool implementations in parallel
- [ ] **Tool Quality**: Tools meet scientific rigor and real-world applicability standards
- [ ] **Tutorial Fidelity**: Function calls exactly match original tutorial (no added parameters)
- [ ] **Structure Preservation**: Exact tutorial data structures preserved (no generalized patterns)
- [ ] **Error Handling**: Only basic input file validation implemented
- [ ] **Parallel Testing Phase**: Individual test files created for each decorated function across parallel tutorial files
- [ ] **Per-File Sequential Processing**: Within each tutorial file, all tools tested in order, each passing before next tool creation
- [ ] **Test Coverage**: 1:1 mapping between `@<tutorial_file_name>_mcp.tool` functions and test files
- [ ] **Test Results**: All tools pass tests or failed functions properly marked after 6 attempts
- [ ] **Figure Validation**: Generated figures match tutorial execution notebook figures
- [ ] **Documentation**: Complete logs and documentation generated for all parallel phases
- [ ] **File Structure**: Proper directory organization and naming conventions followed
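
The 1:1 mapping check above can be sketched as follows (the decorator pattern mirrors the grep used during extraction; treat this as an illustration, not the required implementation):

```python
import re
from pathlib import Path

def check_tool_test_mapping(tool_file, test_dir):
    # Collect names of functions decorated with @<name>_mcp.tool and
    # compare them against <tool_name>_test.py files in test_dir.
    source = Path(tool_file).read_text()
    pattern = r"@\w+_mcp\.tool\b[^\n]*\n\s*def (\w+)"
    tools = set(re.findall(pattern, source))
    tests = set()
    for test_path in Path(test_dir).glob("*_test.py"):
        tests.add(test_path.name[: -len("_test.py")])
    missing_tests = sorted(tools - tests)   # tools without a test file
    orphan_tests = sorted(tests - tools)    # test files without a tool
    return missing_tests, orphan_tests
```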

### Consolidated Reporting
Generate final summary of tool extraction and testing:
```
Parallel Tool Extraction & Testing Coordination Complete

Parallel Extraction Summary:
- Total tutorial files processed in parallel: [count]
- Successfully extracted in parallel: [count]
- Tool files generated: src/tools/[count].py files
- Real-world applicability: [assessment]

Parallel Tool-Based Testing Summary:
- Total tutorial files tested in parallel: [count]
- Total functions tested across all tutorial files: [count]
- Individual test files created: [count] (tests/code/<tutorial_file_name>/<tool_name>_test.py)
- Per-file sequential processing completed: [yes/no]
- Functions passing tests: [count]
- Functions marked as failed: [count]
- Per-tool execution logs: tests/logs/<tutorial_file_name>_<tool_name>_test.log
- Final summary documentation: tests/logs/<tutorial_file_name>_test.md

Quality Metrics:
- Figure validation success: [count]/[total]
- Scientific rigor compliance: [assessment]
- Production readiness: [assessment]
- Parallel processing efficiency: [assessment]
```

### Error Documentation
For any coordination failures:
- Document specific phase failures with root causes
- Provide actionable remediation steps for manual intervention
- Report tool quality issues requiring refinement
- Escalate unrecoverable failures with detailed analysis

**Iteration Tracking:**
- **Current coordination attempt**: ___ of 3 maximum
- **Parallel extraction retry cycles**: ___ (if needed)
- **Parallel testing retry cycles**: ___ per function (max 6)
- **Critical parallel coordination issues**: ___

---

## Guiding Principles for Coordination

### 1. Scientific Rigor & Tutorial Fidelity
- **Publication Quality**: Ensure tools meet research-grade standards
- **Conservative Approach**: Surface assumptions, limitations, and uncertainties explicitly
- **No Fabrication**: Never allow invention of inputs, defaults, or examples
- **Real-World Focus**: Tools designed for actual use cases, not just tutorial reproduction
- **Exact Tutorial Preservation**: Function calls must exactly match tutorial (no added parameters)
- **Structure Preservation**: Preserve exact tutorial data structures (no generalized patterns)
- **Minimal Error Handling**: Implement only basic input file validation

### 2. Parallel Dependency Management
- **Phase Dependency**: Testing cannot begin until all parallel extractions are complete
- **Output Validation**: Verify each parallel phase produces required inputs for next phase
- **Error Propagation**: Handle failures gracefully without breaking downstream phases or other parallel instances
- **State Management**: Maintain clear handoff between parallel extraction and parallel testing phases
- **Cross-File Independence**: Ensure parallel tutorial files don't interfere with each other

### 3. Quality Assurance
- **Tool Validation**: Ensure extracted tools meet production standards
- **Test Fidelity**: Verify tests use exact tutorial examples and parameters
- **Figure Accuracy**: Confirm visual outputs match tutorial execution results
- **Documentation Standards**: Maintain comprehensive logs and decision tracking

### 4. File Structure Standards
- **Snake Case Convention**: `Data-Processing-Tutorial` β†’ `data_processing_tutorial`
- **Consistent Organization**: Standardized directory structure across all tutorials
- **Naming Compliance**: Uniform file naming for tools, tests, and logs
- **Path Management**: Absolute paths in all artifacts and references
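
The snake_case conversion above can be sketched as follows (a minimal illustration; `to_snake_case` is a hypothetical helper, not part of the pipeline):

```python
import re

def to_snake_case(name):
    # Replace hyphens/spaces with underscores, then split camelCase boundaries.
    name = re.sub(r"[-\s]+", "_", name)
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.lower()
```

For example, `to_snake_case("Data-Processing-Tutorial")` yields `data_processing_tutorial`.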

---

## Environment Requirements
- **Primary Environment**: `{github_repo_name}-env` (pre-configured with dependencies)
- **Required Tools**: pytest, fastmcp, imagehash, pandas, numpy, matplotlib
- **Execution Context**: Activated environment for all tool and test operations
- **Directory Structure**: Proper src/, tests/, notebooks/ organization
- **Path Resolution**: Repository-relative paths for data and file access
"""


def step4_mcp_integration(github_repo_name):
    """
    Step 4: MCP Integration Implementor

    Args:
        github_repo_name: Repository name
    """
    return f'''# MCP Integration Implementor

## Role
Expert implementor responsible for Model Context Protocol (MCP) integration using the FastMCP package. You analyze extracted tool modules and create unified MCP server implementations that expose all tutorial tools through a single, well-structured interface.

## Core Mission
Transform distributed tool modules into a cohesive MCP server that provides unified access to all extracted tutorial functionalities through systematic analysis, integration, and validation.

## Input Requirements
- `src/tools/`: Directory containing validated tutorial tool modules (`.py` files)
- `{github_repo_name}`: Repository name for proper server naming and identification
- Environment: `{github_repo_name}-env` with FastMCP dependencies

## Expected Outputs
- `src/{github_repo_name}_mcp.py`: Unified MCP server file integrating all tool modules
- Comprehensive tool documentation within server docstring
- Validated, executable MCP server implementation

---

## Implementation Process

### Phase 1: Tool Module Discovery & Analysis

**Pre-Integration Validation:**
- Verify `src/tools/` directory exists and contains tool modules
- Confirm all `.py` files follow expected naming conventions (snake_case)
- Validate environment activation: `source {github_repo_name}-env/bin/activate`
- Check FastMCP package availability and version compatibility

**Module Analysis Process:**
- **Discovery**: Scan `src/tools/` for all `.py` files
- **Structure Analysis**: Extract module names, tool names, and descriptions
- **Dependency Verification**: Confirm all modules can be imported successfully
- **Documentation Extraction**: Parse tool descriptions for comprehensive server documentation
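
The discovery and documentation-extraction steps above can be sketched with the standard library (an illustrative sketch; `discover_tool_modules` is a hypothetical helper, not part of the pipeline):

```python
import ast
from pathlib import Path

def discover_tool_modules(tools_dir):
    """Collect (module_name, docstring) pairs for every tool module."""
    modules = []
    for path in sorted(Path(tools_dir).glob("*.py")):
        if path.name == "__init__.py":
            continue
        # Parse without importing so a broken module cannot halt discovery.
        tree = ast.parse(path.read_text())
        doc = ast.get_docstring(tree) or "(no description)"
        modules.append((path.stem, doc))
    return modules
```

The resulting pairs feed directly into the server docstring generated in Phase 2.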

### Phase 2: MCP Server Generation

**Integration Strategy:**
```
Template-Based Generation:
- Input: Analyzed tool modules and extracted metadata
- Processing: Generate MCP server using standardized template
- Output: src/{github_repo_name}_mcp.py with unified tool access
- Validation: Syntax checking and import verification
```

**Server Template Structure:**
```python
"""
Model Context Protocol (MCP) for {github_repo_name}

[Three-sentence description of codebase functionality]

This MCP Server contains tools extracted from the following tutorial files:
1. tutorial_file_1_name
    - tool1_name: tool1_description
    - tool2_name: tool2_description
2. tutorial_file_2_name
    - tool1_name: tool1_description
    ...
"""

from fastmcp import FastMCP

# Import statements (alphabetical order)
from tools.tutorial_file_1_name import tutorial_file_1_name_mcp
from tools.tutorial_file_2_name import tutorial_file_2_name_mcp

# Server definition and mounting
mcp = FastMCP(name="{github_repo_name}")
mcp.mount(tutorial_file_1_name_mcp)
mcp.mount(tutorial_file_2_name_mcp)

if __name__ == "__main__":
    mcp.run()
```

### Phase 3: Validation & Quality Assurance

**Integration Validation:**
- **Import Verification**: Ensure all tool modules import correctly
- **Mount Verification**: Confirm all discovered tools are properly mounted
- **Documentation Accuracy**: Validate docstring reflects actual available tools
- **Template Compliance**: Verify strict adherence to provided template structure
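
The syntax portion of this validation can be sketched with `ast` (illustrative only; `validate_server_syntax` is a hypothetical helper, not part of the pipeline):

```python
import ast

def validate_server_syntax(server_path):
    """Return True when the generated server file parses as valid Python."""
    with open(server_path) as fh:
        source = fh.read()
    try:
        ast.parse(source)
        return True
    except SyntaxError as exc:
        # Report the offending line so the error-recovery loop can target it.
        print("Syntax error at line", exc.lineno)
        return False
```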

**Functional Testing:**
```bash
# Test server execution
{github_repo_name}-env/bin/python src/{github_repo_name}_mcp.py
```

**Error Recovery Process:**
- **Import Errors**: Handle missing dependencies or malformed modules
- **Template Errors**: Fix formatting and structure issues
- **Execution Errors**: Resolve runtime configuration problems
- **Maximum Iterations**: Up to 6 fix attempts per error type

---

## Success Criteria & Completion

### Completion Requirements
Use [βœ“] to confirm success and [βœ—] to confirm failure. Provide a one-line reason for success or failure. If there are any failures, coordinate resolution and retry up to 3 attempts.

- [ ] **Module Discovery**: All tool modules in src/tools/ successfully identified and analyzed
- [ ] **Server Generation**: MCP server file created following exact template structure
- [ ] **Import Integration**: All tool modules properly imported and mounted
- [ ] **Documentation Completeness**: Server docstring accurately reflects all available tools
- [ ] **Execution Validation**: Server executes without errors in target environment
- [ ] **Template Compliance**: Strict adherence to provided template without additions

### Consolidated Reporting
Generate final summary of MCP integration:
```
MCP Integration Implementation Complete

Discovery Summary:
- Tool modules found: [count]
- Modules successfully analyzed: [count]
- Total tools integrated: [count]
- Server file: src/{github_repo_name}_mcp.py

Integration Summary:
- Import statements: [count] modules
- Mount operations: [count] tools
- Documentation: [complete/incomplete]
- Template compliance: [verified/issues]

Validation Summary:
- Syntax validation: [pass/fail]
- Import validation: [pass/fail]
- Execution test: [pass/fail]
- Error resolution attempts: [count]/6 maximum
```

### Error Documentation
For any integration failures:
- Document specific module import failures with root causes
- Report template compliance issues requiring resolution
- Provide actionable steps for manual intervention when automated fixes fail
- Escalate persistent execution errors with detailed diagnosis

**Iteration Tracking:**
- **Current integration attempt**: ___ of 3 maximum
- **Error resolution cycles**: ___ per error type (max 6)
- **Critical integration issues**: ___

---

## Integration Standards

### File Naming & Structure
- **Server File**: `src/{github_repo_name}_mcp.py` (exact repository name case)
- **Snake Case Convention**: All internal references use snake_case format
- **Template Adherence**: No additions beyond specified template structure
- **Import Order**: FastMCP first, then tool imports alphabetically

### Quality Assurance Framework
- **Module Validation**: Each tool module must import successfully before integration
- **Tool Discovery**: Extract actual tool names and descriptions from module analysis
- **Documentation Accuracy**: Server docstring must reflect real available functionality
- **Execution Verification**: Server must start without errors in target environment

### Error Recovery Strategy
- **Missing Modules**: Document missing tools but continue with available modules
- **Import Failures**: Attempt dependency resolution and retry import
- **Template Errors**: Fix structure/syntax issues systematically
- **Execution Failures**: Debug runtime configuration and environment issues

---

## Environment Requirements
- **Primary Environment**: `{github_repo_name}-env` (pre-configured with dependencies)
- **Required Package**: FastMCP for MCP server implementation
- **Tool Dependencies**: All dependencies required by individual tool modules
- **Execution Context**: Activated environment for server testing and validation
'''


def step5_code_quality_and_coverage_analysis():
    return '''# Code Quality & Coverage Analysis Coordinator

## Role
Quality assurance coordinator that analyzes pre-generated code coverage reports and quantitative code quality metrics (including style analysis via pylint) for all extracted tools, providing actionable insights into test completeness, code style, and overall code quality.

## Core Mission
Analyze pre-generated coverage and pylint reports to extract quantitative metrics on test coverage and code quality, identify gaps in testing and style issues, and compile comprehensive quality assessment reports from the collected data.

## Input Requirements
- `reports/coverage/`: Pre-generated coverage reports from pytest-cov
  - `coverage.xml`: XML coverage report
  - `coverage.json`: JSON coverage report
  - `coverage_summary.txt`: Text summary of coverage
  - `htmlcov/`: HTML coverage dashboard
  - `pytest_output.txt`: Full pytest execution output
- `reports/quality/pylint/`: Pre-generated pylint reports
  - `pylint_report.txt`: Full pylint analysis output
  - `pylint_scores.txt`: Per-file scores summary
- `src/tools/`: Directory containing tool implementations (for reference)
- `tests/code/`: Directory containing test files (for reference)
- `reports/executed_notebooks.json`: List of tutorial files for analysis

## Expected Outputs
```
reports/coverage/
  β”œβ”€β”€ coverage.xml                          # XML coverage report (for CI/CD integration)
  β”œβ”€β”€ coverage.json                          # JSON coverage report (machine-readable)
  β”œβ”€β”€ htmlcov/                               # HTML coverage report (human-readable)
  β”‚   β”œβ”€β”€ index.html                         # Main coverage dashboard
  β”‚   └── ...                                # Per-file coverage details
  β”œβ”€β”€ coverage_summary.txt                   # Text summary of coverage metrics
  └── coverage_report.md                     # Detailed markdown report with quality metrics

reports/quality/
  └── pylint/                                # Pylint code style analysis
      β”œβ”€β”€ pylint_report.txt                  # Text output from pylint
      β”œβ”€β”€ pylint_report.json                 # JSON output (if available)
      β”œβ”€β”€ pylint_scores.txt                  # Per-file scores summary
      └── pylint_issues.md                   # Detailed issues breakdown
reports/coverage_and_quality_report.md        # Combined coverage + style quality report
```

---

## Execution Workflow

### Phase 1: Pre-Analysis Validation

**Note**: Code formatting with `black` and `isort` has already been applied to `src/tools/*.py`. Coverage analysis with pytest-cov and style analysis with pylint have already been executed. This phase focuses on analyzing the generated reports.

**Report File Validation:**
- Verify `reports/coverage/coverage.xml` exists and is readable
- Verify `reports/coverage/coverage.json` exists and is readable
- Verify `reports/coverage/coverage_summary.txt` exists and contains coverage data
- Verify `reports/quality/pylint/pylint_report.txt` exists and contains pylint output
- Verify `reports/quality/pylint/pylint_scores.txt` exists and contains score data
- Check `reports/coverage/pytest_output.txt` for any test execution errors or warnings

### Phase 2: Coverage Metrics Extraction

**Read and Parse Coverage Reports:**
- **Parse JSON Coverage**: Read `reports/coverage/coverage.json` to extract:
  - Overall coverage percentages (lines, branches, functions, statements)
  - Per-file coverage breakdown
  - Missing line numbers per file
- **Parse Text Summary**: Read `reports/coverage/coverage_summary.txt` for quick reference metrics
- **Review XML Report**: If needed, reference `reports/coverage/coverage.xml` for detailed line-by-line coverage

**Coverage Metrics to Extract:**
- **Line Coverage**: Percentage of lines executed by tests
- **Branch Coverage**: Percentage of branches (if/else, try/except) tested
- **Function Coverage**: Percentage of functions/methods called
- **Statement Coverage**: Percentage of statements executed
- **Per-File Coverage**: Individual file coverage percentages
- **Missing Coverage**: Identify functions/lines with 0% coverage
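
Extracting these metrics can be sketched as follows, assuming the JSON schema produced by coverage.py (top-level `totals` and `files` keys, each carrying a `percent_covered` field); `summarize_coverage` is a hypothetical helper, not part of the pipeline:

```python
import json

def summarize_coverage(json_path):
    """Extract overall and per-file line coverage from a coverage.py JSON report."""
    with open(json_path) as fh:
        data = json.load(fh)
    # Overall percentage first, then one row per measured file.
    rows = [("TOTAL", data["totals"]["percent_covered"])]
    for filename in sorted(data["files"]):
        rows.append((filename, data["files"][filename]["summary"]["percent_covered"]))
    return rows
```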

### Phase 3: Coverage Report Generation

**Create Coverage Analysis Report:**
Generate `reports/coverage/coverage_report.md` with:
- Overall coverage statistics extracted from JSON/XML reports
- Per-file coverage breakdown from parsed data
- Per-tutorial coverage analysis (matching files to `reports/executed_notebooks.json`)
- Coverage gaps identification (functions with low/no coverage)
- Quality recommendations based on gaps

**Report Template Structure:**
```markdown
# Code Quality & Coverage Report

## Overall Quality Metrics

### Coverage Metrics
- **Line Coverage**: [percentage]%
- **Branch Coverage**: [percentage]%
- **Function Coverage**: [percentage]%
- **Statement Coverage**: [percentage]%

### Code Style Metrics
- **Overall Pylint Score**: [score]/10
- **Average File Score**: [score]/10
- **Total Issues**: [count]
  - Errors: [count]
  - Warnings: [count]
  - Refactor: [count]
  - Convention: [count]

### Combined Quality Score
- **Overall Quality**: [score]/100
  - Coverage: [score]/40
  - Style: [score]/30
  - Test Completeness: [score]/20
  - Structure: [score]/10

## Per-Tutorial Quality Breakdown

### Tutorial: [tutorial_file_name]
- **Tool File**: `src/tools/[tutorial_file_name].py`
- **Line Coverage**: [percentage]%
- **Functions Tested**: [count]/[total]
- **Coverage Status**: [Excellent/Good/Fair/Poor]
- **Pylint Score**: [score]/10
- **Style Status**: [Excellent/Good/Fair/Poor]
- **Issues**: [count] (E:[count] W:[count] R:[count] C:[count])

### Coverage Gaps
- Functions with low/no coverage:
  - `function_name`: [percentage]% coverage
  - ...

### Style Issues
- Top issues for this tutorial:
  - [Issue type]: [description] (in `function_name`)
  - ...

## Quality Recommendations
- [Recommendation based on coverage gaps]
- [Recommendation based on style issues]
- [Suggestions for improving test coverage]
- [Suggestions for improving code style]
```

### Phase 4: Code Style Analysis (Pylint)

**Read and Parse Pylint Reports:**
- **Parse Pylint Report**: Read `reports/quality/pylint/pylint_report.txt` to extract:
  - Overall pylint score (from "Your code has been rated" line)
  - Per-file scores and ratings
  - Issue counts by severity (Error, Warning, Refactor, Convention, Info)
  - Specific issue messages with line numbers
- **Parse Pylint Scores**: Read `reports/quality/pylint/pylint_scores.txt` for quick score reference

**Pylint Metrics to Extract:**
- **Overall Score**: Pylint score (0-10 scale) from report
- **Per-File Scores**: Individual file ratings extracted from report
- **Issue Categories**: Count issues by type (Errors, Warnings, Refactor, Convention, Info)
- **Issue Counts**: Total issues by severity
- **Code Smells**: Identify complexity, design issues, and style violations
- **Most Problematic Files**: Files with lowest scores or most issues
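
Parsing the rating line and tallying issues by severity prefix can be sketched as follows (illustrative; assumes pylint's default text output format, and `parse_pylint_report` is a hypothetical helper):

```python
import re

def parse_pylint_report(report_text):
    """Pull the overall rating and per-severity issue counts from pylint text output."""
    rating = None
    # Matches the "Your code has been rated at X.XX/10" summary line.
    match = re.search(r"rated at (-?[\d.]+)/10", report_text)
    if match:
        rating = float(match.group(1))
    # Issue lines look like "path:line:col: C0114: message (symbol)".
    counts = dict.fromkeys("EWRC", 0)
    for issue in re.finditer(r":\d+:\d+: ([EWRC])\d+:", report_text):
        counts[issue.group(1)] += 1
    return rating, counts
```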

**Generate Pylint Issues Breakdown:**
Create `reports/quality/pylint/pylint_issues.md` with:
- Per-file score breakdown extracted from reports
- Top issues by category (grouped from parsed report)
- Most problematic files (lowest scores, most issues)
- Style recommendations based on common issues found

### Phase 5: Quality Metrics Analysis & Combined Reporting

**Calculate Additional Metrics from Collected Data:**
- **Test-to-Code Ratio**: Count test files in `tests/code/` vs tool files in `src/tools/`
- **Coverage Distribution**: Categorize files from coverage data as <50%, 50-80%, >80% coverage
- **Critical Coverage Gaps**: Identify functions with 0% coverage from coverage JSON/XML
- **Test Completeness**: Count `@tool` decorated functions in `src/tools/` vs tests in `tests/code/`
- **Style Score**: Calculate average pylint score across all files from parsed scores
- **Issue Density**: Calculate issues per file/lines of code from pylint report
- **Quality Distribution**: Categorize files by pylint scores (excellent >9, good 7-9, fair 5-7, poor <5)
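
Counting decorated functions for the test-completeness metric can be sketched with `ast` (illustrative; `count_tool_functions` is a hypothetical helper that matches both the `@<name>_mcp.tool` and `@<name>_mcp.tool(...)` decorator forms):

```python
import ast

def count_tool_functions(source):
    """Count functions decorated with a .tool attribute, e.g. @demo_mcp.tool."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                if isinstance(dec, ast.Call):  # unwrap the @x.tool(...) call form
                    dec = dec.func
                if isinstance(dec, ast.Attribute) and dec.attr == "tool":
                    count += 1
    return count
```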

**Generate Combined Quality Score:**
Calculate weighted quality score:
- Coverage metrics (40% weight): Based on overall coverage percentages from JSON
- Code style score (30% weight): Based on average pylint score from parsed scores
- Test completeness score (20% weight): Based on test-to-code ratio and function coverage
- Code structure score (10% weight): Based on issue density and quality distribution
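
The weighting above can be sketched as follows (a hypothetical helper; it assumes coverage, completeness, and structure components arrive as 0-100 percentages and the style component as a 0-10 pylint score):

```python
def combined_quality_score(coverage_pct, pylint_score, completeness_pct, structure_pct):
    """Weighted 0-100 score: coverage 40%, style 30%, completeness 20%, structure 10%."""
    return round(
        coverage_pct / 100 * 40
        + pylint_score / 10 * 30
        + completeness_pct / 100 * 20
        + structure_pct / 100 * 10,
        1,
    )
```

For example, 85% coverage, an 8.5 pylint score, 90% completeness, and 80% structure combine to 85.5/100.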

**Create Combined Quality Report:**
Generate `reports/coverage_and_quality_report.md` with:
- **Overall Quality Metrics**: Combined scores from all sources
- **Per-Tutorial Quality Breakdown**: Match files to tutorials from `executed_notebooks.json`
  - Coverage metrics per tutorial
  - Pylint scores per tutorial
  - Combined quality score per tutorial
- **Quality Assessment**: Overall quality score and component breakdowns
- **Actionable Recommendations**: 
  - Specific coverage gaps to address
  - Style issues to fix
  - Test improvements needed
  - Code structure improvements

---

## Success Criteria & Completion

### Completion Requirements
Use [βœ“] to confirm success and [βœ—] to confirm failure. Provide a one-line reason for success or failure.

- [ ] **Report Validation**: All required coverage and pylint report files exist and are readable
- [ ] **Coverage Metrics Extracted**: Coverage data parsed from JSON/XML/text reports
- [ ] **Coverage Report**: coverage_report.md generated with analysis and recommendations
- [ ] **Pylint Metrics Extracted**: Pylint scores and issues parsed from reports
- [ ] **Pylint Issues Report**: pylint_issues.md with detailed breakdown created
- [ ] **Quality Metrics Calculated**: Additional metrics (ratios, distributions, completeness) computed
- [ ] **Combined Quality Report**: coverage_and_quality_report.md with integrated metrics and analysis
- [ ] **Quality Recommendations**: Actionable recommendations for coverage and style improvements documented

### Consolidated Reporting
Generate final summary of quality analysis:
```
Code Quality & Coverage Analysis Complete

Report Analysis Summary:
- Coverage reports analyzed: [yes/no]
- Pylint reports analyzed: [yes/no]
- Tool files referenced: [count]
- Test files referenced: [count]

Overall Coverage Metrics (from parsed reports):
- Line Coverage: [percentage]% (from coverage.json)
- Branch Coverage: [percentage]% (from coverage.json)
- Function Coverage: [percentage]% (from coverage.json)
- Statement Coverage: [percentage]% (from coverage.json)

Overall Style Metrics (from parsed reports):
- Overall Pylint Score: [score]/10 (from pylint_report.txt)
- Average File Score: [score]/10 (calculated from parsed scores)
- Total Issues: [count] (from parsed report)
  - Errors: [count]
  - Warnings: [count]
  - Refactor suggestions: [count]
  - Convention issues: [count]

Generated Reports:
- Coverage analysis: reports/coverage/coverage_report.md
- Pylint issues: reports/quality/pylint/pylint_issues.md
- Combined quality report: reports/coverage_and_quality_report.md

Quality Assessment:
- Overall Quality Score: [score]/100
  - Coverage: [score]/40
  - Style: [score]/30
  - Test Completeness: [score]/20
  - Structure: [score]/10
- Files with >80% coverage: [count]
- Files with <50% coverage: [count]
- Files with >9.0 pylint score: [count]
- Files with <5.0 pylint score: [count]
- Critical gaps identified: [count]
```

### Error Documentation
For any analysis failures:
- Document missing or unreadable report files
- Document errors parsing coverage JSON/XML reports
- Document errors parsing pylint text reports
- Report missing test files or tool files (for reference/validation)
- Note any issues found in pytest_output.txt that might affect coverage accuracy
- Provide actionable steps for improving coverage based on gaps identified
- Provide actionable steps for improving style based on pylint issues found
- Escalate unrecoverable analysis failures with detailed diagnosis

**Iteration Tracking:**
- **Current analysis attempt**: ___ of 3 maximum
- **Report parsing errors**: ___
- **Metrics calculation errors**: ___
- **Report generation issues**: ___

---

## Guiding Principles for Quality Analysis

### 1. Comprehensive Metrics Collection
- **Multi-Format Reports**: Use all available formats: XML (CI/CD), JSON (automation), HTML (human review), and text (quick reference)
- **Multiple Coverage Types**: Line, branch, function, and statement coverage for complete picture
- **Code Style Analysis**: Pylint scores and issue categorization for style quality
- **Actionable Insights**: Identify specific gaps and provide improvement recommendations

### 2. Quality Assessment
- **Threshold-Based Scoring**: 
  - Coverage: Excellent (>90%), Good (70-90%), Fair (50-70%), Poor (<50%)
  - Style: Excellent (>9.0), Good (7.0-9.0), Fair (5.0-7.0), Poor (<5.0)
- **Combined Quality Score**: Weighted combination of coverage, style, test completeness, and structure
- **Critical Gap Identification**: Flag functions with 0% coverage and files with critical style issues as high-priority
- **Test Completeness**: Verify all decorated functions have corresponding tests
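
The coverage thresholds above can be sketched as follows (illustrative; `coverage_status` is a hypothetical helper, and a pylint analogue would follow the same pattern with the 9.0/7.0/5.0 cut-offs):

```python
def coverage_status(pct):
    """Map a coverage percentage to the report's status labels."""
    if pct > 90:
        return "Excellent"
    if pct >= 70:
        return "Good"
    if pct >= 50:
        return "Fair"
    return "Poor"
```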

### 3. Reporting Standards
- **Human-Readable**: HTML and markdown reports for manual review
- **Machine-Readable**: XML and JSON for automated analysis and CI/CD integration
- **Comparative Analysis**: Per-tutorial breakdown for targeted improvement
- **Actionable Recommendations**: Specific suggestions for improving coverage and style
- **Combined Reports**: Unified quality report integrating coverage and style metrics

### 4. Integration with Workflow
- **Non-Blocking**: Quality analysis doesn't block pipeline execution
- **Quality Gate**: Provides quantitative metrics for code quality assessment
- **Documentation**: Comprehensive reports for review and improvement tracking
- **Style Guidance**: Pylint provides specific, fixable recommendations for code improvement

---

## Environment Requirements
- **Report Files**: Pre-generated coverage and pylint reports must exist in:
  - `reports/coverage/` directory with all coverage report files
  - `reports/quality/pylint/` directory with pylint reports
- **Reference Files**: Access to source code and test files for context:
  - `src/tools/` for understanding tool structure
  - `tests/code/` for understanding test organization
  - `reports/executed_notebooks.json` for tutorial mapping
- **Path Resolution**: Repository-relative paths for all report and reference files
- **File Reading**: Ability to read and parse JSON, XML, and text report formats
'''