---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---
# Cancer@Home v2
A distributed computing platform for cancer genomics research. It combines BOINC task distribution, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
## 🚀 Quick Start (5 minutes)
### Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM minimum
### Installation
1. **Clone the repository and set up a virtual environment**
```bash
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS/Linux
pip install -r requirements.txt
```
2. **Start Neo4j Database**
```bash
docker-compose up -d
```
3. **Run the application**
```bash
python run.py
```
4. **Open your browser**
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
## 🎯 Features
### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking
### 2. **GDC Data Integration**
- Download cancer genomics data from GDC Portal
- Support for various cancer types (TCGA, TARGET projects)
- Automatic data parsing and normalization
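As a rough illustration of the download step, the sketch below builds a query URL for the public GDC API `files` endpoint, filtered by project and data category. It is not taken from this repository: the function names are hypothetical, and the project ID and data category are only example values.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_params(project_id, data_category, size=10):
    """Return query parameters selecting files for one project and data category.

    Hypothetical helper, not part of this repository's GDC client.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_category", "value": [data_category]}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # GDC expects the filter as a JSON string
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }


def build_gdc_files_url(project_id, data_category, size=10):
    """Encode the parameters into a GET URL (nothing is sent here)."""
    params = build_gdc_files_params(project_id, data_category, size)
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


# Example values only: a TCGA project and a mutation-related data category.
url = build_gdc_files_url("TCGA-BRCA", "Simple Nucleotide Variation", size=5)
```

The resulting URL can be fetched with any HTTP client. Controlled-access files additionally require a GDC authentication token, which this sketch does not handle.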
### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
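To make the FASTQ processing step concrete, here is a minimal parser sketch. It assumes the common four-line record layout (header, sequence, `+` separator, quality) and Phred+33 quality encoding. The names `FastqRecord`, `parse_fastq`, and `mean_phred` are illustrative and may not match the repository's pipeline code.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")
        fields = header[1:].split()
        yield FastqRecord(fields[0] if fields else "", sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Usage with an in-memory sample instead of a real file.
sample = "@read1 example\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n"
records = list(parse_fastq(io.StringIO(sample)))
```

For real data, pass an open file handle (or `gzip.open(path, "rt")` for compressed input) in place of the `StringIO` object.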
### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
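The relationship chain above could be written to Neo4j with a parameterized Cypher statement along these lines. This is a sketch under stated assumptions: the node labels follow the chain, but the property names and the relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) are hypothetical and may differ from the project's actual schema.

```python
# Hypothetical schema: labels mirror the Gene -> Mutation -> Patient -> Cancer Type
# chain; property names and relationship types are assumptions.
MERGE_CHAIN_QUERY = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
"""


def chain_params(gene, mutation_id, patient_id, cancer_type):
    """Bundle the values referenced as $parameters in MERGE_CHAIN_QUERY."""
    return {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }


# Example values only.
params = chain_params("TP53", "mut-001", "patient-42", "Breast Invasive Carcinoma")

# With the official `neo4j` Python driver installed and the container running,
# the statement could be executed like this (the Bolt port is assumed to be the
# Neo4j default, 7687):
#
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "cancer123")) as driver:
#     with driver.session() as session:
#         session.run(MERGE_CHAIN_QUERY, params)
```

`MERGE` is used instead of `CREATE` so that re-running the load does not duplicate nodes or relationships. Passing values as parameters keeps them out of the query text.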
### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics
### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis
## 📊 Architecture
```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```
## 🗂️ Project Structure
```
CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point
```
## 🧬 Data Flow
1. **Data Ingestion**: Download cancer genomics data from GDC Portal
2. **Processing**: Run FASTQ/BLAST analysis on distributed BOINC network
3. **Storage**: Store results in Neo4j graph database
4. **Visualization**: Query and visualize via web dashboard
## 🔧 Configuration
Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
## 📖 Usage Examples
### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
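One way to send this query from a script is a plain JSON POST, sketched below with only the standard library. The `/graphql` path and the `String!` variable type are assumptions, since this README gives only the application URL. The helper names are illustrative.

```python
import json
from urllib import request

# Assumed endpoint: only the application URL (port 5000) is documented.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Same fields as the query above, with the gene passed as a variable.
# The `String!` type is an assumption about the schema.
MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(url, query, variables):
    """Build (but do not send) a JSON POST request for a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(req, timeout=10.0):
    """Send the request and decode the JSON response (needs a running backend)."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_graphql_request(GRAPHQL_URL, MUTATIONS_BY_GENE, {"gene": "TP53"})
# result = run_query(req)  # uncomment once the backend is running
```

Passing the gene as a variable rather than interpolating it into the query string avoids quoting problems when the value comes from user input.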
### Submit Analysis Task
```python
from backend.boinc import BOINCClient

# Submit a variant-calling work unit for one FASTQ input file.
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
```
## 🀝 Inspired By
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling
## 📄 License
MIT License
## 🛟 Support
For issues or questions, please open an issue on Hugging Face or GitHub.