OpenProblems Spatial Transcriptomics Agent Rules

Build & Development Commands

Viash Component Development

  • Use viash run for executing components: viash run src/methods/component_name/config.vsh.yaml -- --input_train data.h5ad --output result.h5ad
  • Build components with the Docker engine: Always specify --engine docker for consistent environments
  • Test individual components: Use viash test src/methods/component_name/config.vsh.yaml before integration
  • Run parallel testing: Execute viash ns test --parallel --engine docker for comprehensive validation (a scripted variant is sketched after this list)
  • Validate configurations: Every component must have a valid config.vsh.yaml file
  • Use test data: Always test with resources from the resources_test/ directory first
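
A minimal sketch of scripting these checks, assuming a standard checkout with component configs under src/methods/; the viash flags mirror the commands above:

```python
import subprocess
from pathlib import Path

def test_all_components(src_dir: str = "src/methods") -> None:
    """Run `viash test` for every component config found under src_dir."""
    for config in sorted(Path(src_dir).glob("*/config.vsh.yaml")):
        print(f"Testing {config} ...")
        result = subprocess.run(
            ["viash", "test", str(config), "--engine", "docker"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            print(result.stdout, result.stderr, sep="\n")
            raise RuntimeError(f"Component test failed: {config}")

if __name__ == "__main__":
    test_all_components()
```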

Nextflow Workflow Commands

  • Run workflows locally: Use nextflow run workflow.nf with proper parameters
  • Validate configuration: Execute nextflow config workflow.nf to inspect the resolved pipeline configuration
  • Use profiles: Specify appropriate profiles with -profile docker,test for development
  • Monitor execution: Use nextflow log to track workflow progress and debug issues
  • Resume failed runs: Apply the -resume flag to continue from the last successful checkpoint (see the sketch after this list)
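
A small wrapper illustrating these flags, assuming a pipeline entry point named workflow.nf (a placeholder for your pipeline):

```python
import subprocess

def run_workflow(resume: bool = False) -> None:
    """Launch the pipeline with the development profiles described above."""
    cmd = ["nextflow", "run", "workflow.nf", "-profile", "docker,test"]
    if resume:
        cmd.append("-resume")  # continue from the last successful checkpoint
    subprocess.run(cmd, check=True)

run_workflow(resume=True)
```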

Docker Integration Commands

  • Build component images: Use the Docker engine through Viash for consistency
  • Test containerized components: Verify all dependencies are included in containers
  • Push to registries: Use standardized tagging conventions for component images
  • Validate environments: Ensure Python/R environments match OpenProblems specifications

Testing Guidelines

Component Testing Strategy

  • Run unit tests first: Execute viash test on individual components before integration
  • Test with multiple datasets: Validate components work across different spatial datasets
  • Validate input/output formats: Ensure h5ad files maintain the expected structure and metadata (see the sketch after this list)
  • Test edge cases: Include empty datasets, single-cell data, and boundary conditions
  • Verify Docker builds: Confirm all components build successfully in containerized environments
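
A minimal output-format check in that spirit, assuming the component wrote output.h5ad and that the obsm key "spatial" matches your task's file format (both are placeholders):

```python
import anndata as ad
import numpy as np

adata = ad.read_h5ad("output.h5ad")

# Structure: observations and spatial coordinates must be present.
assert adata.n_obs > 0, "Output contains no cells/spots"
assert "spatial" in adata.obsm, "Missing spatial coordinates in .obsm"
assert adata.obsm["spatial"].shape == (adata.n_obs, 2)

# Values: finite entries, no all-zero expression matrix.
X = np.asarray(adata.X.todense()) if hasattr(adata.X, "todense") else np.asarray(adata.X)
assert np.isfinite(X).all(), "Non-finite values in expression matrix"
assert X.sum() > 0, "Expression matrix is all zeros"
```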

Integration Testing Approach

  • Test complete workflows: Run end-to-end pipelines with realistic data sizes
  • Validate metric calculations: Ensure accuracy metrics fall within expected ranges and distributions
  • Test control methods: Verify positive and negative controls behave as expected (a toy check follows this list)
  • Cross-validate results: Compare outputs across different methods for consistency
  • Performance benchmarking: Measure execution time and memory usage for scalability
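
A toy illustration of the control checks, with invented method names and scores standing in for real benchmark output:

```python
# Scores per method on a [0, 1] metric; all names and values are illustrative.
results = {
    "method_a": 0.82,
    "method_b": 0.79,
    "random_baseline": 0.05,  # negative control
    "perfect_oracle": 1.00,   # positive control
}

assert all(0.0 <= s <= 1.0 for s in results.values()), "Score out of range"
controls = ("random_baseline", "perfect_oracle")
assert all(
    results["random_baseline"] < s <= results["perfect_oracle"]
    for name, s in results.items() if name not in controls
), "A method scored outside the control bounds"
```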

Quality Assurance Checklist

  • Check GitHub Actions: Ensure all CI/CD checks pass before merging
  • Validate test coverage: Confirm critical code paths are tested
  • Review error handling: Test failure modes and error message clarity
  • Verify reproducibility: Ensure identical inputs produce identical outputs (a hashing sketch follows this list)
  • Test resource requirements: Validate memory and compute constraints are met
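
One way to probe reproducibility is hashing the outputs of two runs on identical inputs; run1.h5ad and run2.h5ad are placeholders. Byte-level identity is the strictest form of the check: HDF5 files can differ at the byte level even when values match, so a value-level comparison is a reasonable fallback.

```python
import hashlib

def file_digest(path: str) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

assert file_digest("run1.h5ad") == file_digest("run2.h5ad"), \
    "Outputs differ between identical runs"
```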

Code Style & Guidelines

Viash Component Structure

  • Follow standard layout: Organize components with config.vsh.yaml, script.py/R, and test.py/R
  • Use descriptive names: Component names should clearly indicate their function and scope
  • Define clear inputs/outputs: Specify all required and optional parameters with types
  • Include comprehensive metadata: Add author, description, keywords, and version information
  • Implement proper logging: Use structured logging for debugging and monitoring (a minimal setup follows this list)
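
A minimal structured-logging setup for a component script; the format string and component name are suggestions, not a project requirement:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("my_component")  # hypothetical component name

logger.info("Reading input: %s", "input.h5ad")
logger.info("Running with n_neighbors=%d", 15)
```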

Python Code Standards

  • Follow PEP 8: Use consistent indentation, naming, and formatting
  • Use type hints: Annotate function parameters and return types
  • Handle AnnData objects: Follow scanpy/squidpy conventions for spatial data manipulation
  • Implement error handling: Use try/except blocks with informative error messages (see the sketch after this list)
  • Document functions: Include docstrings with parameter descriptions and examples
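
A sketch pulling these conventions together (type hints, docstring, explicit error handling, AnnData manipulation); the function itself is illustrative, not a project API:

```python
import anndata as ad
import numpy as np

def filter_low_count_spots(adata: ad.AnnData, min_counts: int = 10) -> ad.AnnData:
    """Remove observations with fewer than `min_counts` total counts.

    Parameters
    ----------
    adata : annotated expression matrix with spatial coordinates.
    min_counts : minimum total counts required to keep an observation.
    """
    if "spatial" not in adata.obsm:
        raise ValueError("Expected spatial coordinates in adata.obsm['spatial']")
    totals = np.asarray(adata.X.sum(axis=1)).ravel()  # works for dense and sparse
    return adata[totals >= min_counts].copy()
```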

R Code Standards

  • Use tidyverse conventions: Apply consistent data manipulation and visualization patterns
  • Handle Seurat objects: Follow best practices for spatial transcriptomics analysis
  • Implement proper error handling: Use tryCatch with meaningful error messages
  • Document functions: Include roxygen2 documentation for all functions
  • Use consistent naming: Apply snake_case for functions and variables

Configuration Management

  • Use YAML for configs: Structure configuration files with clear hierarchies
  • Define resource requirements: Specify CPU, memory, and disk requirements accurately
  • Include version constraints: Pin software versions for reproducibility
  • Document parameters: Provide clear descriptions and default values
  • Validate inputs: Implement parameter validation and type checking (a loading-and-validation sketch follows this list)
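
A loading-and-validation sketch; the file name, keys, and expected types are placeholders for your component's parameters:

```python
import yaml

EXPECTED_TYPES = {"input": str, "n_neighbors": int, "normalize": bool}

with open("params.yaml") as f:
    params = yaml.safe_load(f)

for key, expected in EXPECTED_TYPES.items():
    if key not in params:
        raise KeyError(f"Missing required parameter: {key}")
    if not isinstance(params[key], expected):
        raise TypeError(
            f"{key} must be {expected.__name__}, got {type(params[key]).__name__}"
        )
```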

Documentation Guidelines

Component Documentation

  • Write clear descriptions: Explain the biological/computational problem being addressed
  • Document algorithm details: Describe the core methodology and implementation approach
  • Provide usage examples: Include concrete examples with sample data and parameters
  • List dependencies: Document all required software, packages, and versions
  • Include references: Cite relevant papers and methodological sources

Task Documentation Structure

  • Define task motivation: Explain the biological significance and research gaps addressed
  • Describe datasets: Detail input data types, formats, and expected characteristics
  • Outline methods: List implemented methods with brief algorithmic descriptions
  • Specify metrics: Define evaluation criteria and interpretation guidelines
  • Document controls: Explain positive and negative control implementations

Workflow Documentation

  • Create process diagrams: Visualize workflow steps and data flow
  • Document parameters: Explain all configurable options and their effects
  • Provide troubleshooting: Include common issues and resolution strategies
  • List output formats: Describe all generated files and their contents
  • Include performance notes: Document expected runtime and resource usage

API Documentation Standards

  • Use OpenAPI specifications: Document REST endpoints with complete schemas (a skeleton follows this list)
  • Provide request/response examples: Include realistic data samples
  • Document error codes: Explain all possible error conditions and responses
  • Include authentication: Detail security requirements and token usage
  • Maintain versioning: Document API changes and backwards compatibility
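
A skeleton of an OpenAPI 3 document expressed as a Python dict; the endpoint, parameter, and responses are illustrative placeholders:

```python
openapi_spec = {
    "openapi": "3.0.3",
    "info": {"title": "Benchmark API", "version": "0.0.1"},
    "paths": {
        "/results/{task_id}": {
            "get": {
                "summary": "Fetch benchmark results for a task",
                "parameters": [{
                    "name": "task_id",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {"description": "Metric scores per method"},
                    "404": {"description": "Unknown task_id"},
                },
            }
        }
    },
}
```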

Collaboration & Review Guidelines

Pull Request Standards

  • Create focused PRs: Address single features or bug fixes per request
  • Write descriptive titles: Clearly summarize changes and their purpose
  • Include comprehensive descriptions: Explain motivation, changes, and testing performed
  • Add reviewers: Tag appropriate domain experts and maintainers
  • Respond to feedback: Address review comments promptly and thoroughly

Code Review Process

  • Review for correctness: Verify algorithmic implementation and logic
  • Check for consistency: Ensure adherence to established patterns and conventions
  • Validate testing: Confirm adequate test coverage and quality
  • Assess documentation: Review clarity and completeness of documentation
  • Consider performance: Evaluate computational efficiency and scalability

Community Engagement

  • Use GitHub discussions: Engage in technical discussions and feature planning
  • Participate in Discord: Join real-time conversations and collaboration
  • Follow issue templates: Use structured formats for bug reports and feature requests
  • Share knowledge: Contribute to documentation and community resources
  • Mentor newcomers: Help onboard new contributors to the ecosystem

Quality Control & Validation

Data Quality Standards

  • Validate spatial coordinates: Ensure x,y coordinates are properly formatted and scaled (see the sketch after this list)
  • Check gene expression: Verify count matrices have appropriate ranges and distributions
  • Assess metadata completeness: Confirm required annotations and sample information
  • Test data integrity: Validate file formats and cross-reference identifiers
  • Monitor data provenance: Track data sources and processing steps
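
The coordinate and count checks as code, assuming a dataset at dataset.h5ad with coordinates in obsm["spatial"] and a hypothetical required sample_id annotation:

```python
import anndata as ad
import numpy as np

adata = ad.read_h5ad("dataset.h5ad")

# Spatial coordinates: two finite columns with non-degenerate spread.
coords = np.asarray(adata.obsm["spatial"])
assert coords.ndim == 2 and coords.shape[1] == 2
assert np.isfinite(coords).all(), "Non-finite coordinates"
assert coords.std(axis=0).min() > 0, "Degenerate coordinate axis"

# Count matrix: non-negative values.
X = np.asarray(adata.X.todense()) if hasattr(adata.X, "todense") else np.asarray(adata.X)
assert (X >= 0).all(), "Negative expression values"

# Metadata completeness.
assert "sample_id" in adata.obs.columns, "Missing obs annotation: sample_id"
```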

Results Validation Process

  • Cross-method comparison: Compare results across different algorithmic approaches
  • Statistical validation: Apply appropriate statistical tests and multiple-comparison corrections (a correction sketch follows this list)
  • Biological interpretation: Ensure results align with known biological principles
  • Reproducibility testing: Verify consistent results across multiple runs
  • External validation: Compare against published benchmarks and literature
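
A correction sketch using statsmodels; the p-values are simulated placeholders for per-gene test results:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.random.default_rng(0).uniform(size=1000)  # simulated p-values

rejected, pvals_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{int(rejected.sum())} / {pvals.size} tests significant after BH correction")
```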

Performance Monitoring

  • Track execution metrics: Monitor runtime, memory usage, and resource consumption (a measurement sketch follows this list)
  • Assess scalability: Test performance across different data sizes and complexities
  • Monitor quality metrics: Track accuracy, precision, recall, and domain-specific measures
  • Evaluate user experience: Gather feedback on usability and documentation quality
  • Continuous improvement: Regularly review and optimize component performance
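
A sketch capturing runtime and Python-level peak memory around a step; run_method() is a stand-in for the component under test, and tracemalloc sees only Python allocations, so container-level monitoring is still needed for native code:

```python
import time
import tracemalloc

def run_method() -> None:
    _ = [i * i for i in range(1_000_000)]  # stand-in workload

tracemalloc.start()
t0 = time.perf_counter()
run_method()
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"runtime: {elapsed:.2f}s, peak Python memory: {peak / 1e6:.1f} MB")
```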