---
title: K R&D Lab — Cancer Research Suite
short_description: Real‑time data integration for precision oncology
emoji: 🧬
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "5.22.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
# K R&D Lab — Cancer Research Suite
**Author:** Oksana Kolisnyk | [kosatiks-group.pp.ua](https://kosatiks-group.pp.ua)
**DEMO Repo:** [github.com/TEZv/K-RnD-Lab-PHYLO-DEMO_03-2026](https://github.com/TEZv/K-RnD-Lab-PHYLO-DEMO_03-2026)
**ORCID:** 0009-0003-5780-2290
**Generated:** 2026-03-07
---
## Overview
A Gradio-based research suite combining live cancer data APIs with educational simulation tools. Designed for researchers, ML engineers, and students working at the intersection of cancer biology, drug delivery, and precision oncology.
**Two tab groups:**
- **Group A — Real Data Tools** (5 + 1 tabs): Live APIs, real results, never hallucinated
- **Group B — Learning Sandbox** (5 tabs): Rule-based simulations, clearly labeled ⚠️ SIMULATED
---
## File Structure
```
K-RnD-Lab/
├── app.py # Main Gradio application (all 11 tabs + Lab Journal)
├── chatbot.py # RAG chatbot module (sentence-transformers + FAISS)
├── requirements.txt # Python dependencies
├── README.md # This file
├── research_gaps.md # Part 2: 10 underexplored research directions
├── learning_cases.md # Part 3: 5 guided investigation cases
└── data_sources.md # All API endpoints and data sources
```
Runtime-generated:
```
├── cache/ # API response cache (JSON, 24h TTL)
└── lab_journal.csv # Auto-logged research journal
```
---
## Quick Start
### Local
```bash
# 1. Clone
git clone https://github.com/TEZv/K-RnD-Lab-PHYLO-DEMO_03-2026
cd K-RnD-Lab-PHYLO-DEMO_03-2026
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run
python app.py
# → Opens at http://localhost:7860
```
### HuggingFace Spaces
1. Create a new Space: **Gradio** SDK, Python 3.10+
2. Upload `app.py`, `chatbot.py`, `requirements.txt`
3. Space auto-deploys — no secrets or API keys needed
> The RAG chatbot downloads the `all-MiniLM-L6-v2` model (~80 MB) on first run.
> Subsequent runs use the HF cache. Allow ~60s for first startup.
---
## Tab Reference
### Group A — Real Data Tools
| Tab | Function | APIs Used |
|-----|----------|-----------|
| **A1 — Gray Zones Explorer** | Heatmap of biological process × cancer type paper counts; top 5 gaps | PubMed E-utilities |
| **A2 — Understudied Target Finder** | Essential genes with high gap index (essentiality / log(papers+1)) | OpenTargets GraphQL, PubMed, ClinicalTrials.gov v2 |
| **A3 — Real Variant Lookup** | ClinVar classification + gnomAD allele frequency for any HGVS variant | ClinVar E-utilities, gnomAD GraphQL |
| **A4 — Literature Gap Finder** | Papers/year chart with gap detection (zero and low-activity years) | PubMed E-utilities |
| **A5 — Druggable Orphans** | Essential cancer genes with no approved drug and no active trial | OpenTargets GraphQL, ClinicalTrials.gov v2 |
| **A6 — Research Assistant** | RAG chatbot indexed on 20 curated papers; confidence-flagged answers | sentence-transformers + FAISS (local) |
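
The gap detection described for Tab A4 (zero and low-activity years) can be sketched as below. This is a minimal illustration, not the app's actual code: the function name `find_gap_years`, the `low_fraction` threshold, and the sample counts are all assumptions made for the example.

```python
from statistics import median


def find_gap_years(counts_by_year, low_fraction=0.25):
    """Classify years as zero-activity or low-activity.

    counts_by_year: dict mapping year -> number of papers.
    low_fraction: hypothetical threshold; a non-zero year counts as "low"
    when it falls below this fraction of the median non-zero count.
    """
    nonzero = [c for c in counts_by_year.values() if c > 0]
    if not nonzero:
        # No papers at all: every year is a zero-activity gap.
        return {"zero": sorted(counts_by_year), "low": []}
    cutoff = low_fraction * median(nonzero)
    zero = sorted(y for y, c in counts_by_year.items() if c == 0)
    low = sorted(y for y, c in counts_by_year.items() if 0 < c < cutoff)
    return {"zero": zero, "low": low}


# Made-up counts, for illustration only (not real PubMed results).
counts = {2018: 40, 2019: 0, 2020: 44, 2021: 3, 2022: 52}
gaps = find_gap_years(counts)
print(gaps)
```

A median-based cutoff is only one possible definition of "low activity"; a fixed minimum count would work just as well if the app uses one.
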
### Group B — Learning Sandbox ⚠️ SIMULATED
| Tab | Function | Model |
|-----|----------|-------|
| **B1 — miRNA Explorer** | Predicted miRNA binding energy + expression in BRCA1/BRCA2/TP53-mutant tumors | Curated lookup table |
| **B2 — siRNA Targets** | siRNA efficacy + off-target risk for LUAD/BRCA/COAD | Curated efficacy estimates |
| **B3 — LNP Corona** | Protein corona composition from formulation sliders (PEG, ionizable lipid, size) | Langmuir adsorption model |
| **B4 — Flow Corona** | Vroman effect kinetics (competitive albumin/ApoE adsorption) | Competitive Langmuir ODE |
| **B5 — Variant Concepts** | ACMG/AMP classification criteria and codes by tier | ACMG 2015 rule set |
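
To illustrate the competitive Langmuir model named for Tab B4, here is a minimal sketch that integrates the coverage equations with forward Euler. The rate constants, the species dictionary, and the function name `simulate_competitive_langmuir` are hypothetical values chosen to make the Vroman-style overshoot visible; they are not taken from the app and, like everything in Group B, are SIMULATED.

```python
def simulate_competitive_langmuir(species, t_end=2000.0, dt=0.05):
    """Forward-Euler integration of competitive Langmuir adsorption.

    species: dict name -> (adsorption_rate, k_off), where adsorption_rate
    stands for k_on * concentration (1/s) and k_off is in 1/s.
    Each coverage obeys:
        d(theta_i)/dt = rate_i * (1 - sum(theta)) - k_off_i * theta_i
    Returns a list of (time, {name: coverage}) samples.
    """
    theta = {name: 0.0 for name in species}
    history = [(0.0, dict(theta))]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        free = 1.0 - sum(theta.values())  # fraction of unoccupied sites
        theta = {
            name: theta[name] + dt * (rate * free - k_off * theta[name])
            for name, (rate, k_off) in species.items()
        }
        history.append((step * dt, dict(theta)))
    return history


# Hypothetical parameters: albumin arrives fast but binds weakly,
# ApoE arrives slowly but binds tightly.
params = {"albumin": (1.0, 0.1), "apoe": (0.05, 0.001)}
history = simulate_competitive_langmuir(params)
peak_albumin = max(sample["albumin"] for _, sample in history)
final = history[-1][1]
print(f"albumin peak {peak_albumin:.2f}, final {final['albumin']:.2f}")
print(f"ApoE final {final['apoe']:.2f}")
```

With these assumed constants, albumin coverage rises quickly, then falls as ApoE gradually takes over the surface, which is the qualitative behavior the Vroman effect describes.
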
### Shared — Lab Journal (sidebar)
- Auto-logs every tab run with timestamp, action, and result summary
- Manual note field for researcher observations
- Exports to `lab_journal.csv`
- Click **Refresh Journal** to view last 20 entries
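
The journal behavior listed above (append a timestamped row per run, show the last 20 entries) could look roughly like this. It is a sketch under assumptions: the column names in `JOURNAL_FIELDS` and the helper names `log_entry` and `last_entries` are hypothetical and may not match `app.py`.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for lab_journal.csv.
JOURNAL_FIELDS = ["timestamp", "tab", "action", "result_summary", "note"]


def log_entry(path, tab, action, result_summary, note=""):
    """Append one row to the journal, writing a header for a new file."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=JOURNAL_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "tab": tab,
            "action": action,
            "result_summary": result_summary,
            "note": note,
        })


def last_entries(path, n=20):
    """Return the most recent n rows as dicts (oldest first)."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    return rows[-n:]


# Demo in a temporary directory so no real journal is touched.
with tempfile.TemporaryDirectory() as tmp:
    journal = Path(tmp) / "lab_journal.csv"
    for i in range(25):
        log_entry(journal, tab="A1", action="run", result_summary=f"result {i}")
    recent = last_entries(journal)
print(len(recent), recent[0]["result_summary"])
```
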
---
## Supported Cancer Types
| Code | Full Name | EFO ID |
|------|-----------|--------|
| GBM | Glioblastoma multiforme | EFO_0000519 |
| PDAC | Pancreatic ductal adenocarcinoma | EFO_0002618 |
| SCLC | Small cell lung cancer | EFO_0000702 |
| UVM | Uveal melanoma | EFO_0004339 |
| DIPG | Diffuse intrinsic pontine glioma | EFO_0009708 |
| ACC | Adrenocortical carcinoma | EFO_0003060 |
| MCC | Merkel cell carcinoma | EFO_0005558 |
| PCNSL | Primary CNS lymphoma | EFO_0005543 |
| Pediatric AML | Pediatric acute myeloid leukemia | EFO_0000222 |
---
## Biological Processes Screened (Tab A1)
autophagy · ferroptosis · protein corona · RNA splicing · phase separation · m6A · circRNA · synthetic lethality · immune exclusion · enhancer hijacking · lncRNA regulation · metabolic reprogramming · exosome biogenesis · senescence · mitophagy · liquid-liquid phase separation · cryptic splicing · proteostasis · redox biology · translation regulation
---
## RAG Chatbot Details (Tab A6)
- **Model:** `sentence-transformers/all-MiniLM-L6-v2` (80 MB, CPU-only, no GPU needed)
- **Index:** FAISS `IndexFlatIP` with L2-normalized embeddings (cosine similarity)
- **Corpus:** 20 curated paper abstracts on LNP delivery, protein corona, cancer variants, liquid biopsy
- **Confidence flags:**
  - 🟢 HIGH — retrieval score ≥ 0.55, ≥ 2 matching papers
  - 🟡 MEDIUM — score 0.35–0.55
  - 🔴 SPECULATIVE — score < 0.35
- **Out-of-scope:** Returns explicit "not in indexed papers" message — never fabricates
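
The confidence thresholds above can be expressed as a small mapping function. This is a sketch, not the code in `chatbot.py`: the name `confidence_flag` is hypothetical, and the handling of a score of 0.55 or more with fewer than two matching papers (treated here as MEDIUM) is an assumption, since the list does not cover that case.

```python
def confidence_flag(top_score, n_matching):
    """Map a retrieval score and match count to a confidence label.

    top_score: best cosine similarity returned by the index.
    n_matching: number of retrieved papers considered relevant.
    """
    if top_score >= 0.55 and n_matching >= 2:
        return "HIGH"
    if top_score >= 0.35:
        # Assumption: a strong score backed by fewer than 2 papers
        # is downgraded to MEDIUM rather than reported as HIGH.
        return "MEDIUM"
    return "SPECULATIVE"


for score, n in [(0.62, 3), (0.62, 1), (0.40, 2), (0.20, 5)]:
    print(score, n, confidence_flag(score, n))
```
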
---
## Caching & Rate Limiting
- All API responses cached in `./cache/` as JSON files (24h TTL)
- PubMed: `time.sleep(0.34)` between requests (≤3 req/sec, NCBI policy)
- All API calls wrapped in `try/except` → returns `"Data unavailable"` on failure, never fake data
- Cache can be cleared by deleting `./cache/` directory
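
A minimal sketch of the caching and failure behavior described above, assuming one JSON file per request keyed by a hash of the query. The helper name `cached_call`, the hashing scheme, and the stand-in fetchers are hypothetical; the real app may organize its cache differently.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as stated above


def cached_call(key, fetch, cache_dir="./cache", ttl=CACHE_TTL_SECONDS):
    """Return cached JSON for `key` if still fresh, otherwise call `fetch()`."""
    # Hash the key so arbitrary query strings map to safe file names.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{digest}.json"
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text(encoding="utf-8"))
    try:
        data = fetch()
    except Exception:
        # Failures are reported as such, never replaced with made-up values.
        return "Data unavailable"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Demo with stand-in fetchers (no network access, made-up payload).
calls = []


def fake_fetch():
    calls.append(1)
    return {"count": 42}


def failing_fetch():
    raise ConnectionError("simulated outage")


with tempfile.TemporaryDirectory() as tmp:
    first = cached_call("pubmed:autophagy AND GBM", fake_fetch, cache_dir=tmp)
    second = cached_call("pubmed:autophagy AND GBM", fake_fetch, cache_dir=tmp)
    failed = cached_call("pubmed:other query", failing_fetch, cache_dir=tmp)
print(first, len(calls), failed)
```

In this layout the `time.sleep(0.34)` pause for PubMed would belong inside the real fetch function, so cache hits are not slowed down.
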
---
## Data Attribution
Every result panel displays a source note:
```
Source: [API name] | Date: YYYY-MM-DD
```
Full API documentation: see `data_sources.md`
---
## Technical Notes
### DepMap Essentiality Scores
Per DepMap convention, **negative scores indicate essential genes** (knockout kills cells).
The app inverts scores before display, `essentiality_displayed = -raw_score`,
so that **positive values mean more essential** (the intuitive direction).
Gap index = `essentiality_displayed / log(papers + 1)`
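
As a worked example of the formula above, the following sketch computes the gap index from a raw DepMap score and a paper count. It assumes the natural logarithm (the base is not stated) and returns `None` when the paper count is zero, because `log(0 + 1) = 0` makes the ratio undefined; both choices, and the function name `gap_index`, are assumptions rather than documented app behavior.

```python
import math


def gap_index(raw_depmap_score, n_papers):
    """Gap index = inverted essentiality / log(papers + 1).

    raw_depmap_score: DepMap score, where negative means essential.
    n_papers: number of publications found for the gene.
    """
    essentiality = -raw_depmap_score  # flip sign: positive = more essential
    denom = math.log(n_papers + 1)    # assumption: natural log
    if denom == 0:
        return None  # zero papers: ratio undefined (assumed handling)
    return essentiality / denom


# Same essentiality, different literature coverage (illustrative numbers).
print(gap_index(-1.2, 9))
print(gap_index(-1.2, 999))
```

For a fixed essentiality, fewer papers give a larger index, which is what makes the metric point at understudied targets.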
### Variant Lookup Policy
Tab A3 strictly follows a no-hallucination policy:
- If a variant is not found in ClinVar → displays: *"Not in database. Do not interpret."*
- If gnomAD API fails → displays: *"Data unavailable — API error."*
- Never infers, guesses, or extrapolates variant classifications
### SIMULATED Data Policy
All Group B tabs display a prominent ⚠️ SIMULATED banner.
Simulated results must not be used for:
- Clinical decision-making
- Publication without independent experimental validation
- Drug development or patient care
---
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| gradio | ≥4.20.0 | UI framework |
| numpy | ≥1.24.0 | Numerical computing |
| pandas | ≥2.0.0 | Data tables |
| matplotlib | ≥3.7.0 | Visualizations |
| Pillow | ≥10.0.0 | Image handling |
| requests | ≥2.31.0 | HTTP API calls |
| sentence-transformers | ≥2.6.0 | RAG embeddings |
| faiss-cpu | ≥1.7.4 | Vector similarity search |
| torch | ≥2.0.0 | sentence-transformers backend |
---
## License
Research and educational use. All real-data results sourced from public APIs (PubMed, OpenTargets, ClinVar, gnomAD, ClinicalTrials.gov) under their respective open-access licenses. See `data_sources.md` for details.
---
## Citation
```
Kolisnyk O. K R&D Lab — Cancer Research Suite. 2026.
GitHub: github.com/TEZv/K-RnD-Lab-PHYLO-DEMO_03-2026
ORCID: 0009-0003-5780-2290
```