Training Tutorial - Tool Extraction Summary
Source Information
- Tutorial: GLUE model training workflow
- Source URL: https://github.com/gao-lab/GLUE/blob/master/docs/training.ipynb
- Notebook: notebooks/training/training_execution_final.ipynb
- Output File: src/tools/training.py
Extracted Tools
1. glue_configure_datasets
Purpose: Configure RNA-seq and ATAC-seq datasets for GLUE model training
When to use: First step in GLUE workflow after preprocessing; prepares datasets for model training
Inputs:
- rna_path: Preprocessed RNA-seq h5ad file
- atac_path: Preprocessed ATAC-seq h5ad file
- guidance_path: Guidance graph file
- Configuration parameters (prob_model, use_highly_variable, etc.)
Outputs:
- Configured RNA h5ad file
- Configured ATAC h5ad file
- HVF-filtered guidance graph
Tutorial Section: "Configure data"
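The "HVF-filtered guidance graph" output can be illustrated with a minimal sketch. It assumes the filtering keeps only guidance edges whose two endpoints are both highly variable features (HVF). The sketch uses plain edge tuples as a stand-in for a real graph object, and the function name, feature names, and data are hypothetical; it is not the code in src/tools/training.py.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def filter_guidance_edges(edges: Iterable[Edge], hvf: Set[str]) -> List[Edge]:
    """Keep only edges whose two endpoints are both highly variable features."""
    return [(u, v) for u, v in edges if u in hvf and v in hvf]


# Hypothetical toy data: gene and peak names are made up for illustration.
edges = [
    ("GeneA", "chr1:100-200"),
    ("GeneA", "chr1:900-1000"),  # peak is not highly variable, so it is dropped
    ("GeneB", "chr2:50-150"),    # peak is not highly variable, so it is dropped
]
hvf = {"GeneA", "GeneB", "chr1:100-200"}

hvf_edges = filter_guidance_edges(edges, hvf)
print(hvf_edges)
```

Restricting the graph to the retained features keeps the guidance graph consistent with the features the model is configured to use.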
2. glue_train_model
Purpose: Train GLUE model for multi-omics integration
When to use: After configuring datasets; core model training step
Inputs:
- rna_path: Configured RNA-seq h5ad file
- atac_path: Configured ATAC-seq h5ad file
- guidance_hvf_path: HVF-filtered guidance graph
- training_dir: Directory for model snapshots and logs (optional)
Outputs:
- Trained GLUE model (.dill file)
- Training logs directory
Tutorial Section: "Train GLUE model"
3. glue_check_integration_consistency
Purpose: Evaluate integration quality with consistency scores
When to use: After model training to validate integration quality
Inputs:
- model_path: Trained GLUE model file
- rna_path: Configured RNA-seq h5ad file
- atac_path: Configured ATAC-seq h5ad file
- guidance_hvf_path: HVF-filtered guidance graph
Outputs:
- Consistency scores table (CSV)
- Consistency plot (PNG)
Tutorial Section: "Check integration diagnostics"
Interpretation: As an empirical rule of thumb, integration is considered reliable when the consistency scores stay above 0.05
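The 0.05 rule can be checked programmatically against the consistency scores table. The sketch below is a minimal illustration that assumes the CSV has a `consistency` column alongside an `n_meta` column; those column names, the function name, and the sample values are assumptions for illustration and are not confirmed by this summary.

```python
import csv
import io

THRESHOLD = 0.05  # cutoff quoted in the interpretation above


def integration_is_reliable(csv_text: str, threshold: float = THRESHOLD) -> bool:
    """Return True only if every consistency score is above the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    scores = [float(row["consistency"]) for row in reader]
    # An empty table gives no evidence of reliability, so treat it as a failure.
    return bool(scores) and min(scores) > threshold


# Hypothetical score tables; the values are made up for illustration.
good_table = "n_meta,consistency\n10,0.21\n50,0.12\n200,0.07\n"
bad_table = "n_meta,consistency\n10,0.21\n50,0.08\n200,0.03\n"

print(integration_is_reliable(good_table))
print(integration_is_reliable(bad_table))
```

Checking the minimum rather than the mean reflects the requirement that the whole curve, not just its average, stays above the line.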
4. glue_generate_embeddings
Purpose: Generate cell and feature embeddings from trained GLUE model and visualize alignment
When to use: After successful model training and validation; produces final embeddings for downstream analysis
Inputs:
- model_path: Trained GLUE model file
- rna_path: Configured RNA-seq h5ad file
- atac_path: Configured ATAC-seq h5ad file
- guidance_hvf_path: HVF-filtered guidance graph
- color_vars: Variables to color UMAP by (default: ["cell_type", "domain"])
Outputs:
- RNA h5ad with cell and feature embeddings
- ATAC h5ad with cell and feature embeddings
- HVF guidance graph
- UMAP visualization (PNG)
Tutorial Section: "Apply model for cell and feature embedding"
Typical Workflow
1. glue_configure_datasets
↓ (produces configured h5ad files + HVF guidance graph)
2. glue_train_model
↓ (produces trained model)
3. glue_check_integration_consistency
↓ (validates integration quality)
4. glue_generate_embeddings
↓ (produces final embeddings for downstream analysis)
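The ordering constraint in this workflow can be made explicit with a small dependency check. The sketch below is a hedged illustration: the tool names come from this summary, but the artifact labels (such as `rna_configured`) are hypothetical shorthand for the files listed under each tool, and `missing_inputs` is an illustrative helper, not part of src/tools/training.py.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, Set[str], Set[str]]

# Each step: (tool name, artifacts it needs, artifacts it produces).
# Artifact labels are hypothetical shorthand for the files listed per tool.
WORKFLOW: List[Step] = [
    ("glue_configure_datasets",
     {"rna_preprocessed", "atac_preprocessed", "guidance"},
     {"rna_configured", "atac_configured", "guidance_hvf"}),
    ("glue_train_model",
     {"rna_configured", "atac_configured", "guidance_hvf"},
     {"model"}),
    ("glue_check_integration_consistency",
     {"model", "rna_configured", "atac_configured", "guidance_hvf"},
     {"consistency_csv", "consistency_png"}),
    ("glue_generate_embeddings",
     {"model", "rna_configured", "atac_configured", "guidance_hvf"},
     {"rna_embedded", "atac_embedded", "umap_png"}),
]


def missing_inputs(available: Set[str]) -> Dict[str, List[str]]:
    """Walk the steps in order and report which inputs each step still lacks."""
    have = set(available)
    problems: Dict[str, List[str]] = {}
    for name, needs, produces in WORKFLOW:
        missing = sorted(needs - have)
        if missing:
            problems[name] = missing
        else:
            # A step only contributes its outputs if it could actually run.
            have |= produces
    return problems


complete = missing_inputs({"rna_preprocessed", "atac_preprocessed", "guidance"})
incomplete = missing_inputs({"rna_preprocessed", "atac_preprocessed"})
print(complete)
print(incomplete["glue_configure_datasets"])
```

Starting from the three preprocessed inputs, every step is satisfiable in order; dropping the guidance graph blocks the first step and therefore every step after it.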
Key Design Decisions
- Parameter Preservation: All function calls exactly match the tutorial - no additional parameters added
- Structure Preservation: Data structures like lists are preserved exactly as in tutorial
- Input Design: All tools use file paths as primary inputs for maximum reusability
- Workflow Integration: Tools designed for sequential execution matching tutorial flow
- Output Completeness: All code-generated figures and essential data are saved automatically
Quality Validation
All 4 tools passed comprehensive quality review on first iteration:
- ✓ Tool design validation
- ✓ Input/output validation
- ✓ Tutorial logic adherence validation
- ✓ Implementation quality checks
- ✓ Syntax and import verification
Testing Readiness
The implementation follows all extraction guidelines and is ready for the testing phase:
- Conservative approach with exact tutorial fidelity
- Scientific rigor maintained throughout
- Real-world applicability for user data
- No mock data or demonstration code