Spaces:
Sleeping
Sleeping
A newer version of the Gradio SDK is available: 6.12.0
OpenProblems Spatial Transcriptomics Agent Rules
Build & Development Commands
Viash Component Development
- Use
viash runfor executing components:viash run src/methods/component_name/config.vsh.yaml -- --input_train data.h5ad --output result.h5ad - Build components with Docker engine: Always specify
--engine dockerfor consistent environments - Test individual components: Use
viash test src/methods/component_name/config.vsh.yamlbefore integration - Run parallel testing: Execute
viash ns test --parallel --engine dockerfor comprehensive validation - Validate configurations: Every component must have a valid
config.vsh.yamlfile - Use test data: Always test with resources from
resources_test/directory first
Nextflow Workflow Commands
- Run workflows locally: Use
nextflow run workflow.nfwith proper parameters - Validate pipeline syntax: Execute
nextflow config workflow.nfto check configuration - Use profiles: Specify appropriate profiles with
-profile docker,testfor development - Monitor execution: Use
nextflow logto track workflow progress and debug issues - Resume failed runs: Apply
-resumeflag to continue from last successful checkpoint
Docker Integration Commands
- Build component images: Use Docker engine through Viash for consistency
- Test containerized components: Verify all dependencies are included in containers
- Push to registries: Use standardized tagging conventions for component images
- Validate environments: Ensure Python/R environments match OpenProblems specifications
Testing Guidelines
Component Testing Strategy
- Run unit tests first: Execute
viash teston individual components before integration - Test with multiple datasets: Validate components work across different spatial datasets
- Validate input/output formats: Ensure h5ad files maintain proper structure and metadata
- Test edge cases: Include empty datasets, single-cell data, and boundary conditions
- Verify Docker builds: Confirm all components build successfully in containerized environments
Integration Testing Approach
- Test complete workflows: Run end-to-end pipelines with realistic data sizes
- Validate metric calculations: Ensure accuracy metrics produce expected ranges and distributions
- Test control methods: Verify positive and negative controls behave as expected
- Cross-validate results: Compare outputs across different methods for consistency
- Performance benchmarking: Measure execution time and memory usage for scalability
Quality Assurance Checklist
- Check GitHub Actions: Ensure all CI/CD checks pass before merging
- Validate test coverage: Confirm critical code paths are tested
- Review error handling: Test failure modes and error message clarity
- Verify reproducibility: Ensure identical inputs produce identical outputs
- Test resource requirements: Validate memory and compute constraints are met
Code Style & Guidelines
Viash Component Structure
- Follow standard layout: Organize components with
config.vsh.yaml,script.py/R, andtest.py/R - Use descriptive names: Component names should clearly indicate their function and scope
- Define clear inputs/outputs: Specify all required and optional parameters with types
- Include comprehensive metadata: Add author, description, keywords, and version information
- Implement proper logging: Use structured logging for debugging and monitoring
Python Code Standards
- Follow PEP 8: Use consistent indentation, naming, and formatting
- Use type hints: Annotate function parameters and return types
- Handle AnnData objects: Follow scanpy/squidpy conventions for spatial data manipulation
- Implement error handling: Use try-catch blocks with informative error messages
- Document functions: Include docstrings with parameter descriptions and examples
R Code Standards
- Use tidyverse conventions: Apply consistent data manipulation and visualization patterns
- Handle Seurat objects: Follow best practices for spatial transcriptomics analysis
- Implement proper error handling: Use tryCatch with meaningful error messages
- Document functions: Include roxygen2 documentation for all functions
- Use consistent naming: Apply snake_case for functions and variables
Configuration Management
- Use YAML for configs: Structure configuration files with clear hierarchies
- Define resource requirements: Specify CPU, memory, and disk requirements accurately
- Include version constraints: Pin software versions for reproducibility
- Document parameters: Provide clear descriptions and default values
- Validate inputs: Implement parameter validation and type checking
Documentation Guidelines
Component Documentation
- Write clear descriptions: Explain the biological/computational problem being addressed
- Document algorithm details: Describe the core methodology and implementation approach
- Provide usage examples: Include concrete examples with sample data and parameters
- List dependencies: Document all required software, packages, and versions
- Include references: Cite relevant papers and methodological sources
Task Documentation Structure
- Define task motivation: Explain the biological significance and research gaps addressed
- Describe datasets: Detail input data types, formats, and expected characteristics
- Outline methods: List implemented methods with brief algorithmic descriptions
- Specify metrics: Define evaluation criteria and interpretation guidelines
- Document controls: Explain positive and negative control implementations
Workflow Documentation
- Create process diagrams: Visualize workflow steps and data flow
- Document parameters: Explain all configurable options and their effects
- Provide troubleshooting: Include common issues and resolution strategies
- List output formats: Describe all generated files and their contents
- Include performance notes: Document expected runtime and resource usage
API Documentation Standards
- Use OpenAPI specifications: Document REST endpoints with complete schemas
- Provide request/response examples: Include realistic data samples
- Document error codes: Explain all possible error conditions and responses
- Include authentication: Detail security requirements and token usage
- Maintain versioning: Document API changes and backwards compatibility
Collaboration & Review Guidelines
Pull Request Standards
- Create focused PRs: Address single features or bug fixes per request
- Write descriptive titles: Clearly summarize changes and their purpose
- Include comprehensive descriptions: Explain motivation, changes, and testing performed
- Add reviewers: Tag appropriate domain experts and maintainers
- Respond to feedback: Address review comments promptly and thoroughly
Code Review Process
- Review for correctness: Verify algorithmic implementation and logic
- Check for consistency: Ensure adherence to established patterns and conventions
- Validate testing: Confirm adequate test coverage and quality
- Assess documentation: Review clarity and completeness of documentation
- Consider performance: Evaluate computational efficiency and scalability
Community Engagement
- Use GitHub discussions: Engage in technical discussions and feature planning
- Participate in Discord: Join real-time conversations and collaboration
- Follow issue templates: Use structured formats for bug reports and feature requests
- Share knowledge: Contribute to documentation and community resources
- Mentor newcomers: Help onboard new contributors to the ecosystem
Quality Control & Validation
Data Quality Standards
- Validate spatial coordinates: Ensure x,y coordinates are properly formatted and scaled
- Check gene expression: Verify count matrices have appropriate ranges and distributions
- Assess metadata completeness: Confirm required annotations and sample information
- Test data integrity: Validate file formats and cross-reference identifiers
- Monitor data provenance: Track data sources and processing steps
Results Validation Process
- Cross-method comparison: Compare results across different algorithmic approaches
- Statistical validation: Apply appropriate statistical tests and multiple comparison corrections
- Biological interpretation: Ensure results align with known biological principles
- Reproducibility testing: Verify consistent results across multiple runs
- External validation: Compare against published benchmarks and literature
Performance Monitoring
- Track execution metrics: Monitor runtime, memory usage, and resource consumption
- Assess scalability: Test performance across different data sizes and complexities
- Monitor quality metrics: Track accuracy, precision, recall, and domain-specific measures
- Evaluate user experience: Gather feedback on usability and documentation quality
- Continuous improvement: Regularly review and optimize component performance