OpenProblems Spatial Transcriptomics Agent Rules

Build & Development Commands

Viash Component Development

  • Use viash run for executing components: viash run src/methods/component_name/config.vsh.yaml -- --input_train data.h5ad --output result.h5ad
  • Build components with the Docker engine: Always specify --engine docker for consistent environments
  • Test individual components: Use viash test src/methods/component_name/config.vsh.yaml before integration
  • Run parallel testing: Execute viash ns test --parallel --engine docker for comprehensive validation (a scripted variant is sketched after this list)
  • Validate configurations: Every component must have a valid config.vsh.yaml file
  • Use test data: Always test with resources from the resources_test/ directory first
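
A minimal sketch of scripting these checks, assuming a standard checkout with component configs under src/methods/; the viash flags mirror the commands above:

```python
import subprocess
from pathlib import Path

def test_all_components(src_dir: str = "src/methods") -> None:
    """Run `viash test` for every component config found under src_dir."""
    for config in sorted(Path(src_dir).glob("*/config.vsh.yaml")):
        print(f"Testing {config} ...")
        result = subprocess.run(
            ["viash", "test", str(config), "--engine", "docker"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            print(result.stdout, result.stderr, sep="\n")
            raise RuntimeError(f"Component test failed: {config}")

if __name__ == "__main__":
    test_all_components()
```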

Nextflow Workflow Commands

  • Run workflows locally: Use nextflow run workflow.nf with proper parameters
  • Validate configuration: Execute nextflow config workflow.nf to inspect the resolved pipeline configuration
  • Use profiles: Specify appropriate profiles with -profile docker,test for development
  • Monitor execution: Use nextflow log to track workflow progress and debug issues
  • Resume failed runs: Apply the -resume flag to continue from the last successful checkpoint (see the sketch after this list)
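
A small wrapper illustrating these flags, assuming a pipeline entry point named workflow.nf (a placeholder for your pipeline):

```python
import subprocess

def run_workflow(resume: bool = False) -> None:
    """Launch the pipeline with the development profiles described above."""
    cmd = ["nextflow", "run", "workflow.nf", "-profile", "docker,test"]
    if resume:
        cmd.append("-resume")  # continue from the last successful checkpoint
    subprocess.run(cmd, check=True)

run_workflow(resume=True)
```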

Docker Integration Commands

  • Build component images: Use the Docker engine through Viash for consistency
  • Test containerized components: Verify all dependencies are included in containers
  • Push to registries: Use standardized tagging conventions for component images
  • Validate environments: Ensure Python/R environments match OpenProblems specifications

Testing Guidelines

Component Testing Strategy

  • Run unit tests first: Execute viash test on individual components before integration
  • Test with multiple datasets: Validate components work across different spatial datasets
  • Validate input/output formats: Ensure h5ad files maintain the expected structure and metadata (see the sketch after this list)
  • Test edge cases: Include empty datasets, single-cell data, and boundary conditions
  • Verify Docker builds: Confirm all components build successfully in containerized environments
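
A minimal output-format check in that spirit, assuming the component wrote output.h5ad and that the obsm key "spatial" matches your task's file format (both are placeholders):

```python
import anndata as ad
import numpy as np

adata = ad.read_h5ad("output.h5ad")

# Structure: observations and spatial coordinates must be present.
assert adata.n_obs > 0, "Output contains no cells/spots"
assert "spatial" in adata.obsm, "Missing spatial coordinates in .obsm"
assert adata.obsm["spatial"].shape == (adata.n_obs, 2)

# Values: finite entries, no all-zero expression matrix.
X = np.asarray(adata.X.todense()) if hasattr(adata.X, "todense") else np.asarray(adata.X)
assert np.isfinite(X).all(), "Non-finite values in expression matrix"
assert X.sum() > 0, "Expression matrix is all zeros"
```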

Integration Testing Approach

  • Test complete workflows: Run end-to-end pipelines with realistic data sizes
  • Validate metric calculations: Ensure accuracy metrics fall within expected ranges and distributions
  • Test control methods: Verify positive and negative controls behave as expected (a toy check follows this list)
  • Cross-validate results: Compare outputs across different methods for consistency
  • Performance benchmarking: Measure execution time and memory usage for scalability
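
A toy illustration of the control checks, with invented method names and scores standing in for real benchmark output:

```python
# Scores per method on a [0, 1] metric; all names and values are illustrative.
results = {
    "method_a": 0.82,
    "method_b": 0.79,
    "random_baseline": 0.05,  # negative control
    "perfect_oracle": 1.00,   # positive control
}

assert all(0.0 <= s <= 1.0 for s in results.values()), "Score out of range"
controls = ("random_baseline", "perfect_oracle")
assert all(
    results["random_baseline"] < s <= results["perfect_oracle"]
    for name, s in results.items() if name not in controls
), "A method scored outside the control bounds"
```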

Quality Assurance Checklist

  • Check GitHub Actions: Ensure all CI/CD checks pass before merging
  • Validate test coverage: Confirm critical code paths are tested
  • Review error handling: Test failure modes and error message clarity
  • Verify reproducibility: Ensure identical inputs produce identical outputs (a hashing sketch follows this list)
  • Test resource requirements: Validate memory and compute constraints are met
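
One way to probe reproducibility is hashing the outputs of two runs on identical inputs; run1.h5ad and run2.h5ad are placeholders. Byte-level identity is the strictest form of the check: HDF5 files can differ at the byte level even when values match, so a value-level comparison is a reasonable fallback.

```python
import hashlib

def file_digest(path: str) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

assert file_digest("run1.h5ad") == file_digest("run2.h5ad"), \
    "Outputs differ between identical runs"
```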

Code Style & Guidelines

Viash Component Structure

  • Follow standard layout: Organize components with config.vsh.yaml, script.py/R, and test.py/R
  • Use descriptive names: Component names should clearly indicate their function and scope
  • Define clear inputs/outputs: Specify all required and optional parameters with types
  • Include comprehensive metadata: Add author, description, keywords, and version information
  • Implement proper logging: Use structured logging for debugging and monitoring (a minimal setup follows this list)
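
A minimal structured-logging setup for a component script; the format string and component name are suggestions, not a project requirement:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("my_component")  # hypothetical component name

logger.info("Reading input: %s", "input.h5ad")
logger.info("Running with n_neighbors=%d", 15)
```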

Python Code Standards

  • Follow PEP 8: Use consistent indentation, naming, and formatting
  • Use type hints: Annotate function parameters and return types
  • Handle AnnData objects: Follow scanpy/squidpy conventions for spatial data manipulation
  • Implement error handling: Use try/except blocks with informative error messages (see the sketch after this list)
  • Document functions: Include docstrings with parameter descriptions and examples
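
A sketch pulling these conventions together (type hints, docstring, explicit error handling, AnnData manipulation); the function itself is illustrative, not a project API:

```python
import anndata as ad
import numpy as np

def filter_low_count_spots(adata: ad.AnnData, min_counts: int = 10) -> ad.AnnData:
    """Remove observations with fewer than `min_counts` total counts.

    Parameters
    ----------
    adata : annotated expression matrix with spatial coordinates.
    min_counts : minimum total counts required to keep an observation.
    """
    if "spatial" not in adata.obsm:
        raise ValueError("Expected spatial coordinates in adata.obsm['spatial']")
    totals = np.asarray(adata.X.sum(axis=1)).ravel()  # works for dense and sparse
    return adata[totals >= min_counts].copy()
```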

R Code Standards

  • Use tidyverse conventions: Apply consistent data manipulation and visualization patterns
  • Handle Seurat objects: Follow best practices for spatial transcriptomics analysis
  • Implement proper error handling: Use tryCatch with meaningful error messages
  • Document functions: Include roxygen2 documentation for all functions
  • Use consistent naming: Apply snake_case for functions and variables

Configuration Management

  • Use YAML for configs: Structure configuration files with clear hierarchies
  • Define resource requirements: Specify CPU, memory, and disk requirements accurately
  • Include version constraints: Pin software versions for reproducibility
  • Document parameters: Provide clear descriptions and default values
  • Validate inputs: Implement parameter validation and type checking (a loading-and-validation sketch follows this list)
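
A loading-and-validation sketch; the file name, keys, and expected types are placeholders for your component's parameters:

```python
import yaml

EXPECTED_TYPES = {"input": str, "n_neighbors": int, "normalize": bool}

with open("params.yaml") as f:
    params = yaml.safe_load(f)

for key, expected in EXPECTED_TYPES.items():
    if key not in params:
        raise KeyError(f"Missing required parameter: {key}")
    if not isinstance(params[key], expected):
        raise TypeError(
            f"{key} must be {expected.__name__}, got {type(params[key]).__name__}"
        )
```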

Documentation Guidelines

Component Documentation

  • Write clear descriptions: Explain the biological/computational problem being addressed
  • Document algorithm details: Describe the core methodology and implementation approach
  • Provide usage examples: Include concrete examples with sample data and parameters
  • List dependencies: Document all required software, packages, and versions
  • Include references: Cite relevant papers and methodological sources

Task Documentation Structure

  • Define task motivation: Explain the biological significance and research gaps addressed
  • Describe datasets: Detail input data types, formats, and expected characteristics
  • Outline methods: List implemented methods with brief algorithmic descriptions
  • Specify metrics: Define evaluation criteria and interpretation guidelines
  • Document controls: Explain positive and negative control implementations

Workflow Documentation

  • Create process diagrams: Visualize workflow steps and data flow
  • Document parameters: Explain all configurable options and their effects
  • Provide troubleshooting: Include common issues and resolution strategies
  • List output formats: Describe all generated files and their contents
  • Include performance notes: Document expected runtime and resource usage

API Documentation Standards

  • Use OpenAPI specifications: Document REST endpoints with complete schemas (a skeleton follows this list)
  • Provide request/response examples: Include realistic data samples
  • Document error codes: Explain all possible error conditions and responses
  • Include authentication: Detail security requirements and token usage
  • Maintain versioning: Document API changes and backwards compatibility
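
A skeleton of an OpenAPI 3 document expressed as a Python dict; the endpoint, parameter, and responses are illustrative placeholders:

```python
openapi_spec = {
    "openapi": "3.0.3",
    "info": {"title": "Benchmark API", "version": "0.0.1"},
    "paths": {
        "/results/{task_id}": {
            "get": {
                "summary": "Fetch benchmark results for a task",
                "parameters": [{
                    "name": "task_id",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {"description": "Metric scores per method"},
                    "404": {"description": "Unknown task_id"},
                },
            }
        }
    },
}
```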

Collaboration & Review Guidelines

Pull Request Standards

  • Create focused PRs: Address single features or bug fixes per request
  • Write descriptive titles: Clearly summarize changes and their purpose
  • Include comprehensive descriptions: Explain motivation, changes, and testing performed
  • Add reviewers: Tag appropriate domain experts and maintainers
  • Respond to feedback: Address review comments promptly and thoroughly

Code Review Process

  • Review for correctness: Verify algorithmic implementation and logic
  • Check for consistency: Ensure adherence to established patterns and conventions
  • Validate testing: Confirm adequate test coverage and quality
  • Assess documentation: Review clarity and completeness of documentation
  • Consider performance: Evaluate computational efficiency and scalability

Community Engagement

  • Use GitHub discussions: Engage in technical discussions and feature planning
  • Participate in Discord: Join real-time conversations and collaboration
  • Follow issue templates: Use structured formats for bug reports and feature requests
  • Share knowledge: Contribute to documentation and community resources
  • Mentor newcomers: Help onboard new contributors to the ecosystem

Quality Control & Validation

Data Quality Standards

  • Validate spatial coordinates: Ensure x,y coordinates are properly formatted and scaled (see the sketch after this list)
  • Check gene expression: Verify count matrices have appropriate ranges and distributions
  • Assess metadata completeness: Confirm required annotations and sample information
  • Test data integrity: Validate file formats and cross-reference identifiers
  • Monitor data provenance: Track data sources and processing steps
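
The coordinate and count checks as code, assuming a dataset at dataset.h5ad with coordinates in obsm["spatial"] and a hypothetical required sample_id annotation:

```python
import anndata as ad
import numpy as np

adata = ad.read_h5ad("dataset.h5ad")

# Spatial coordinates: two finite columns with non-degenerate spread.
coords = np.asarray(adata.obsm["spatial"])
assert coords.ndim == 2 and coords.shape[1] == 2
assert np.isfinite(coords).all(), "Non-finite coordinates"
assert coords.std(axis=0).min() > 0, "Degenerate coordinate axis"

# Count matrix: non-negative values.
X = np.asarray(adata.X.todense()) if hasattr(adata.X, "todense") else np.asarray(adata.X)
assert (X >= 0).all(), "Negative expression values"

# Metadata completeness.
assert "sample_id" in adata.obs.columns, "Missing obs annotation: sample_id"
```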

Results Validation Process

  • Cross-method comparison: Compare results across different algorithmic approaches
  • Statistical validation: Apply appropriate statistical tests and multiple-comparison corrections (a correction sketch follows this list)
  • Biological interpretation: Ensure results align with known biological principles
  • Reproducibility testing: Verify consistent results across multiple runs
  • External validation: Compare against published benchmarks and literature
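
A correction sketch using statsmodels; the p-values are simulated placeholders for per-gene test results:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.random.default_rng(0).uniform(size=1000)  # simulated p-values

rejected, pvals_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{int(rejected.sum())} / {pvals.size} tests significant after BH correction")
```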

Performance Monitoring

  • Track execution metrics: Monitor runtime, memory usage, and resource consumption (a measurement sketch follows this list)
  • Assess scalability: Test performance across different data sizes and complexities
  • Monitor quality metrics: Track accuracy, precision, recall, and domain-specific measures
  • Evaluate user experience: Gather feedback on usability and documentation quality
  • Continuous improvement: Regularly review and optimize component performance
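
A sketch capturing runtime and Python-level peak memory around a step; run_method() is a stand-in for the component under test, and tracemalloc sees only Python allocations, so container-level monitoring is still needed for native code:

```python
import time
import tracemalloc

def run_method() -> None:
    _ = [i * i for i in range(1_000_000)]  # stand-in workload

tracemalloc.start()
t0 = time.perf_counter()
run_method()
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"runtime: {elapsed:.2f}s, peak Python memory: {peak / 1e6:.1f} MB")
```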