---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---

# Cancer@Home v2
## 🧬 Overview

Cancer@Home v2 is a distributed computing platform for cancer genomics research. It combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** in a single system.

Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform aims to make cancer genomics research accessible, distributed, and visual.

## 🎯 Key Features

- 🌐 **Interactive Web Dashboard** - Modern UI with real-time visualizations
- 🔍 **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
- ⚡ **BOINC Integration** - Distributed computing for intensive analyses
- 📊 **GraphQL API** - Flexible data querying
- 🧪 **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
- 📚 **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
- 🚀 **Quick Setup** - Up and running in under 5 minutes

## 🏗️ Architecture

```
┌────────────────────────────────────────────┐
│      Web Dashboard (D3.js + Chart.js)      │
├────────────────────────────────────────────┤
│      FastAPI Backend (REST + GraphQL)      │
├──────┬──────┬──────┬──────┬────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
│Graph │Client│ API  │  QC  │    Calling     │
└──────┴──────┴──────┴──────┴────────────────┘
```

## 📦 Installation

### Prerequisites

- Python 3.8+
- Docker Desktop
- 8 GB RAM (16 GB recommended)

### Quick Start

**Windows:**

```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```

**Linux/macOS:**

```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```

Then open: **http://localhost:5000**

## 🚀 Usage

### Web Dashboard

Access the interactive dashboard at http://localhost:5000. It provides:

- **Dashboard Tab**: Overview statistics and mutation charts
- **Neo4j Visualization**: Interactive graph of cancer relationships
- **BOINC Tasks**: Submit and monitor distributed computing tasks
- **GDC Data**: Browse and download cancer datasets
- **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling

### GraphQL API

Query cancer data at http://localhost:5000/graphql

**Example: Get mutations in the TP53 gene**

```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```

**Example: Get patient statistics**

```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```

### REST API

**Database Summary:**

```bash
curl http://localhost:5000/api/neo4j/summary
```

**Submit BOINC Task:**

```bash
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```

### Python API

**FASTQ Processing:**

```python
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()

# Compute statistics for the reads in the file
stats = processor.calculate_statistics("input.fastq")

# Apply the quality filter to the same file
filtered = processor.quality_filter("input.fastq")
```

**Variant Calling:**

```python
from backend.pipeline import VariantCaller, VariantAnalyzer

# Call variants from an alignment against a reference, then filter them
caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

# Identify cancer variants and compute tumor mutation burden (TMB)
analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```

**Neo4j Queries:**

```python
from backend.neo4j import DatabaseManager

db = DatabaseManager()

# Find every mutation that affects TP53
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
results = db.execute_query(query)

# Release the database connection when done
db.close()
```

## 📊 Data Model

### Neo4j Graph Schema

**Nodes:**

- **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- **Mutation**: Genetic variants with position and consequence
- **Patient**: Individual cases with demographics
- **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)

**Relationships:**

- `Gene ← AFFECTS ← Mutation`
- `Patient → HAS_MUTATION → Mutation`
- `Patient → DIAGNOSED_WITH → CancerType`

### Sample Data Included

- **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- **5 Mutations**: Cancer-associated variants
- **5 Patients**: Representative TCGA cases
- **4 Cancer Types**: BRCA, LUAD, COAD, GBM

## 🔧 Technology Stack

- **Backend**: FastAPI, Python 3.8+
- **Database**: Neo4j 5.13 (Graph Database)
- **API**: GraphQL (Strawberry), REST
- **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
- **Bioinformatics**: Biopython, BLAST+
- **Data Source**: GDC Portal API (TCGA/TARGET)
- **Infrastructure**: Docker, Docker Compose
- **Distributed Computing**: BOINC Framework

## 📚 Documentation

- [README.md](README.md) - Complete project overview
- [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
- [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
- [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview

## 🎓 Use Cases

1. **Cancer Research**: Analyze genomics data with distributed computing
2. **Education**: Learn cancer genetics and bioinformatics
3. **Data Visualization**: Explore gene-mutation-patient relationships
4. **Pipeline Development**: Test bioinformatics workflows
5. **Graph Analytics**: Query complex biological networks

## 🔬 Supported Cancer Projects

- **TCGA-BRCA**: Breast Cancer (1,098 cases)
- **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
- **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
- **TCGA-GBM**: Glioblastoma (617 cases)
- **TARGET-AML**: Acute Myeloid Leukemia (238 cases)

## 📈 Bioinformatics Pipeline

### FASTQ Processing

- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation

### BLAST Alignment

- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection

### Variant Calling

- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation

## 🌐 Access Points

- **Application**: http://localhost:5000
- **API Docs**: http://localhost:5000/docs (Swagger UI)
- **GraphQL**: http://localhost:5000/graphql
- **Neo4j Browser**: http://localhost:7474 (username `neo4j`, password `cancer123`)

## 🛠️ Configuration

Edit `config.yml` to customize:

```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"

gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]

pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```

## 🤝 Contributing

Contributions are welcome! This project is open source under the MIT License.
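
Contributors working on the FASTQ stage may find it useful to see what the `quality_threshold` and `min_length` settings in `config.yml` plausibly control. The sketch below is not the project's `FASTQProcessor` implementation. It is a minimal, standard-library illustration that assumes Phred+33 quality encoding and assumes the threshold applies to the mean quality of each read. The helper names (`parse_fastq`, `mean_phred`, `quality_filter`) are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical helpers for illustration; not part of backend.pipeline.
Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    clean = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # Step through complete 4-line records; a trailing partial record is ignored.
    for i in range(0, len(clean) - 3, 4):
        header, seq, _plus, qual = clean[i:i + 4]
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def quality_filter(
    records: Iterable[Record],
    quality_threshold: int = 20,  # default mirrors config.yml
    min_length: int = 50,         # default mirrors config.yml
) -> List[Record]:
    """Keep reads that are long enough and whose mean quality meets the threshold."""
    return [
        rec for rec in records
        if len(rec[1]) >= min_length and mean_phred(rec[2]) >= quality_threshold
    ]


sample = [
    "@read_good", "ACGT" * 15, "+", "I" * 60,   # 60 bp, Phred 40
    "@read_lowq", "ACGT" * 15, "+", "#" * 60,   # 60 bp, Phred 2
    "@read_short", "ACGT" * 5, "+", "I" * 20,   # 20 bp, Phred 40
]
kept = quality_filter(parse_fastq(sample))
print([header for header, _seq, _qual in kept])  # ['@read_good']
```

With the defaults from `config.yml`, only the first read survives: the second fails the quality threshold and the third is shorter than 50 bases. A real pipeline may trim low-quality ends or filter per base instead of using the mean.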
### Development Setup

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```

## 📄 License

MIT License - See [LICENSE](LICENSE) file

Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal

## 🙏 Acknowledgments

### Inspiration

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
- [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)

### Data Sources

- [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)

### Technologies

- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework

## 👥 Authors

- **OpenPeer AI** - Core development and architecture
- **Riemann Computing Inc.** - Distributed computing integration
- **Bleunomics** - Bioinformatics pipeline and genomics expertise
- **Andrew Magdy Kamal** - Graph database design and visualization

## 📞 Support

- **Documentation**: See the project documentation files
- **Issues**: Check the logs in `logs/cancer_at_home.log`
- **Configuration**: Review `config.yml`
- **Health Check**: http://localhost:5000/api/health

## 🔮 Roadmap

### Planned Features

- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization

## 📊 Statistics

- **Lines of Code**: 5,000+
- **Modules**: 9 Python modules
- **API Endpoints**: 15+ (REST and GraphQL)
- **Documentation**: 2,500+ lines
- **Setup Time**: < 5 minutes
- **Sample Data**: 7 genes, 5 mutations, 5 patients

## 🎯 Citation

If you use Cancer@Home v2 in your research, please cite:

```bibtex
@software{cancer_at_home_v2,
  title   = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author  = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year    = {2025},
  url     = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```

## 🏷️ Tags

`cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`

---

**Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**

**For cancer research, by researchers, accessible to all.**