# BioReasoning Data Curation
Jupyter notebooks for processing genetic variant data and creating ML datasets for biological reasoning tasks.
## Notebooks
**Core Analysis**

- `BioReasoning_DataCuration_KEGG.ipynb` - KEGG pathway analysis using the Claude API
- `Clinvar_Coding.ipynb` - ClinVar variant processing and gene mapping
- `Clinvar_SNV_Non_SNV.ipynb` - SNV/structural variant datasets with VEP annotations

**KEGG Pipeline**

- `KEGG_Data_1.ipynb` - KEGG network data processing and variant identification
- `KEGG_Data_2.ipynb` - Variant parsing and sequence generation
- `KEGG_Data_3.ipynb` - Final ML dataset creation with Q&A pairs

**Variant Prediction**

- `VEP.ipynb` - Variant effect prediction datasets (ClinVar, OMIM, eQTL)
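The ClinVar notebooks depend on NCBI EDirect (installed under Setup). As a rough illustration of what an EDirect ClinVar lookup looks like, here is a minimal sketch that lists ClinVar record IDs for one gene. The function name `clinvar_uids_for_gene` is hypothetical, and this is not necessarily how the notebooks themselves query ClinVar; it assumes `esearch` and `efetch` are on PATH and NCBI is reachable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ClinVar UIDs for one gene symbol via NCBI EDirect.
# Assumes EDirect (esearch, efetch) is installed and NCBI is reachable.
# Not necessarily how the notebooks query ClinVar.

clinvar_uids_for_gene() {
  local gene="$1"
  # The [gene] field tag limits the Entrez search to the gene field;
  # efetch -format uid prints the matching record IDs, one per line.
  esearch -db clinvar -query "${gene}[gene]" | efetch -format uid
}
```

For example, `clinvar_uids_for_gene BRCA1 | head` would show the first few matching UIDs.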
## Setup
```bash
brew install brewsci/bio/edirect # NCBI Entrez Direct (EDirect), for ClinVar (macOS)
export ANTHROPIC_API_KEY="your-key" # For KEGG analysis
```
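Before opening the notebooks, it can help to confirm both setup steps took effect. The following is a minimal sketch, assuming EDirect places the `esearch` command on PATH and the API key is exported as shown above; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the setup steps.
# Assumes EDirect puts `esearch` on PATH and the API key is exported.

# Succeeds if the named command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 found on PATH"
  else
    echo "missing: $1 is not on PATH" >&2
    return 1
  fi
}

# Succeeds if the named environment variable is exported and non-empty.
require_env() {
  if [ -n "$(printenv "$1")" ]; then
    echo "ok: $1 is set"
  else
    echo "missing: $1 is not set" >&2
    return 1
  fi
}

# Runs every check (without stopping at the first failure) and returns overall status.
check_setup() {
  local status=0
  require_cmd esearch || status=1            # EDirect, for ClinVar
  require_env ANTHROPIC_API_KEY || status=1  # Claude API, for KEGG analysis
  return "$status"
}
```

Running `check_setup || echo "fix the items above first"` reports every missing piece at once, rather than only the first.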
## Usage
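Each notebook has a configuration section: update the paths and API keys there as needed, then run the cells in order. Run the KEGG pipeline notebooks in sequence (`KEGG_Data_1`, then `KEGG_Data_2`, then `KEGG_Data_3`).

**Key Outputs:**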
- KEGG biological reasoning datasets
- ClinVar variant-disease associations
- VEP prediction task datasets
- Genomic sequences with variant context
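If you prefer to run the notebooks headlessly instead of cell by cell, a minimal sketch using `jupyter nbconvert --execute` might look like the following. It assumes Jupyter with nbconvert is installed and that each configuration section has already been edited; `run_notebooks`, `KEGG_NOTEBOOKS`, and `DRY_RUN` are hypothetical names, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: execute notebooks headlessly, in the order given.
# Assumes Jupyter with nbconvert is installed and each notebook's
# configuration section has already been edited.
# DRY_RUN=1 prints the commands instead of running them.

# KEGG pipeline order: network processing, variant parsing, dataset creation.
KEGG_NOTEBOOKS=(KEGG_Data_1.ipynb KEGG_Data_2.ipynb KEGG_Data_3.ipynb)

run_notebooks() {
  local nb
  for nb in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "jupyter nbconvert --to notebook --execute --inplace $nb"
    else
      # --inplace overwrites the notebook with its executed version (outputs included).
      # Stop at the first failure so later notebooks do not run after a failed step.
      jupyter nbconvert --to notebook --execute --inplace "$nb" || return 1
    fi
  done
}
```

Preview the plan with `DRY_RUN=1 run_notebooks "${KEGG_NOTEBOOKS[@]}"`, then drop `DRY_RUN` to execute. Stopping on the first failure is a deliberate choice, since a later notebook may depend on an earlier one's outputs.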