---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---

# Cancer@Home v2

A distributed computing platform for cancer genomics research that combines BOINC task distribution, analysis of cancer data from the Genomic Data Commons (GDC), sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

## Quick Start (5 minutes)

### Prerequisites
- Python 3.8+
- Docker Desktop
- 8 GB RAM minimum
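
### Installation

1. **Clone and setup**
```bash
# Run from the directory that contains your clone of the repository
cd CancerAtHome2
python -m venv venv
# Activate the virtual environment (use the line for your OS)
venv\Scripts\activate        # Windows
source venv/bin/activate     # macOS/Linux
pip install -r requirements.txt
```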

2. **Start Neo4j Database**
```bash
docker-compose up -d
```

3. **Run the application**
```bash
python run.py
```

4. **Open your browser**
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)

## Features

### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking

### 2. **GDC Data Integration**
- Download cancer genomics data from GDC Portal
- Support for various cancer types (TCGA, TARGET projects)
- Automatic data parsing and normalization
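
As a rough illustration of the download step, the sketch below builds a request URL for the `files` endpoint of the public GDC API, filtering for open-access files from a single project. It only constructs the URL and never contacts the network. The function name `build_gdc_files_query` and the selection of returned fields are assumptions made for this example; they are not taken from this project's `backend/gdc` module.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata searches.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_query(project_id, data_category, size=10):
    """Return a GDC `files` search URL (hypothetical helper, not project code)."""
    # GDC filters are nested JSON objects: an operator plus its content.
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category",
                                     "value": [data_category]}},
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),          # filters travel as a JSON string
        "fields": "file_id,file_name,data_type",  # illustrative field selection
        "format": "JSON",
        "size": str(size),                        # maximum number of hits
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


# Example: open-access mutation files from the TCGA breast cancer project.
url = build_gdc_files_query("TCGA-BRCA", "Simple Nucleotide Variation", size=5)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response lists matching files whose `file_id` values can then be used for download.
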

### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
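
To make the FASTQ step concrete, here is a minimal sketch of a parser for the common four-line FASTQ layout (header, sequence, `+` separator, quality string) with Phred+33 quality encoding. It is an illustration under those assumptions, not the project's own parser; `FastqRecord`, `parse_fastq`, and `mean_phred` are names invented for this example, and multi-line records are not handled.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(record: FastqRecord, offset: int = 33) -> float:
    """Average Phred score of a read, assuming Phred+33 encoding."""
    if not record.quality:
        return 0.0
    return sum(ord(ch) - offset for ch in record.quality) / len(record.quality)


# Two made-up reads: 'I' encodes Phred 40, '!' encodes Phred 0.
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGATTACA\n+\n!!!!!!!\n")
records = list(parse_fastq(sample))
for rec in records:
    print(rec.read_id, len(rec.sequence), mean_phred(rec))
```

A quality filter could then keep only reads whose `mean_phred` exceeds a chosen threshold before alignment.
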

### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
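
One way to picture the Gene → Mutation → Patient → Cancer Type chain is as a single parameterized Cypher statement. The sketch below only prepares the statement and its parameters, so it runs without a database. The node labels, property names, and relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) are assumptions chosen for illustration; the project's actual schema may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema: labels and relationship types are illustrative only.
# MERGE creates each node or relationship only if it does not exist yet.
MERGE_OBSERVATION = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
""".strip()


def observation_to_cypher(
    gene: str, mutation_id: str, patient_id: str, cancer_type: str
) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and parameter map for one observed mutation."""
    params = {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }
    return MERGE_OBSERVATION, params


# Placeholder identifiers, not real patient data.
query, params = observation_to_cypher(
    "TP53", "mut-001", "patient-001", "Breast Cancer"
)
print(query)
print(params)
```

With the official `neo4j` Python driver, the pair could be executed inside a session via `session.run(query, params)`. Passing values as parameters rather than formatting them into the query string avoids injection issues and lets Neo4j reuse the query plan.
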

### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics

### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis

## Architecture

```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```

## Project Structure

```
CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point
```

## Data Flow

1. **Data Ingestion**: Download cancer genomics data from GDC Portal
2. **Processing**: Run FASTQ/BLAST analysis on distributed BOINC network
3. **Storage**: Store results in Neo4j graph database
4. **Visualization**: Query and visualize via web dashboard

## Configuration

Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options

## Usage Examples

### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
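
A query like this is typically sent to a GraphQL server as a JSON `POST` body. The sketch below prepares such a request with the Python standard library and stops short of sending it. The endpoint path `/graphql` on port 5000 and the `String!` type of the `$gene` variable are assumptions; check the running application for the actual endpoint and schema.

```python
import json
import urllib.request

# Assumed endpoint: the README gives the host and port but not the GraphQL path.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Same selection as the query above, with the gene passed as a variable.
MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
  }
}
"""


def build_graphql_request(query, variables, url=GRAPHQL_URL):
    """Build (but do not send) a JSON POST request for a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(MUTATIONS_BY_GENE, {"gene": "TP53"})
print(request.get_method(), request.full_url)
```

With the application running, the request can be sent using `urllib.request.urlopen(request)` and the response decoded with `json.load`.
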

### Submit Analysis Task
```python
from backend.boinc import BOINCClient

# Submit a variant-calling work unit for a local FASTQ file
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq"
)
```

## Inspired By

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling

## License

MIT License

## Support

For issues or questions, please open an issue on Hugging Face or GitHub.