# Data Sources & API Endpoints
**K R&D Lab — Cancer Research Suite**
Author: Oksana Kolisnyk | kosatiks-group.pp.ua
Repo: github.com/TEZv/K-RnD-Lab-PHYLO-03_2026
Generated: 2026-03-07

---

## Real Data APIs (Group A Tabs)

### 1. PubMed E-utilities (NCBI)
| Property | Value |
|----------|-------|
| **Base URL** | `https://eutils.ncbi.nlm.nih.gov/entrez/eutils` |
| **Auth** | None required (free, no API key) |
| **Rate limit** | 3 requests/sec without key; enforced via `time.sleep(0.34)` |
| **Endpoints used** | `esearch.fcgi` — search & count; `esummary.fcgi` — fetch metadata |
| **Used in tabs** | A1 (paper counts per process), A2 (gene paper counts), A4 (papers per year) |
| **Docs** | https://www.ncbi.nlm.nih.gov/books/NBK25501/ |
| **Terms of use** | https://www.ncbi.nlm.nih.gov/home/about/policies/ |

**Example call (paper count):**
```
GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
  ?db=pubmed
  &term="ferroptosis" AND "GBM"[tiab]
  &rettype=count
  &retmode=json
```
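
The count query above can be sketched in Python with only the standard library. This is a minimal illustration, not the code in `app.py`: the helper names (`build_esearch_url`, `parse_count`, `fetch_count`) are hypothetical, and the 0.34 s pause simply mirrors the rate-limit row in the table.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, db: str = "pubmed") -> str:
    """Build an esearch URL that asks only for the hit count."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    # urlencode percent-escapes the quotes, brackets, and spaces in the term.
    return f"{EUTILS_BASE}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def parse_count(payload: dict) -> int:
    """Extract the hit count; E-utilities returns it as a string."""
    return int(payload["esearchresult"]["count"])


def fetch_count(term: str, pause: float = 0.34) -> int:
    """Run the query, then sleep to stay under 3 requests/sec (hypothetical helper)."""
    with urllib.request.urlopen(build_esearch_url(term), timeout=30) as resp:
        payload = json.load(resp)
    time.sleep(pause)
    return parse_count(payload)
```

Calling `fetch_count('"ferroptosis" AND "GBM"[tiab]')` would issue the request shown in the example call; the URL builder and parser can be exercised offline.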

---

### 2. ClinVar E-utilities (NCBI)
| Property | Value |
|----------|-------|
| **Base URL** | `https://eutils.ncbi.nlm.nih.gov/entrez/eutils` |
| **Auth** | None required |
| **Rate limit** | Same as PubMed (3 req/sec) |
| **Endpoints used** | `esearch.fcgi?db=clinvar` — variant search; `esummary.fcgi?db=clinvar` — classification |
| **Used in tabs** | A3 (Real Variant Lookup) |
| **Docs** | https://www.ncbi.nlm.nih.gov/clinvar/docs/api_http/ |
| **Data policy** | All ClinVar data is public domain |

**Example call:**
```
GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
  ?db=clinvar
  &term=NM_007294.4:c.5266dupC
  &retmode=json
  &retmax=5
```
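
The variant lookup is a two-step flow: `esearch` returns ClinVar record IDs, and `esummary` returns the classification for each ID. The sketch below covers only the parsing side. The JSON field names (`germline_classification`, with `clinical_significance` as an older fallback) are assumptions about the esummary payload and should be checked against a live response; the function names are hypothetical.

```python
from typing import Optional


def parse_id_list(esearch_payload: dict) -> list[str]:
    """Return the ClinVar record IDs from an esearch JSON response."""
    return esearch_payload.get("esearchresult", {}).get("idlist", [])


def parse_classification(esummary_payload: dict, uid: str) -> Optional[str]:
    """Return the classification text for one record, or None if absent.

    Assumption: the classification sits under 'germline_classification'
    (newer responses) or 'clinical_significance' (older responses), each
    holding a 'description' string.
    """
    record = esummary_payload.get("result", {}).get(uid, {})
    for key in ("germline_classification", "clinical_significance"):
        block = record.get(key)
        if isinstance(block, dict) and block.get("description"):
            return block["description"]
    return None
```

Returning `None` rather than raising keeps a missing classification distinguishable from a failed request.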

---

### 3. OpenTargets Platform GraphQL API
| Property | Value |
|----------|-------|
| **Base URL** | `https://api.platform.opentargets.org/api/v4/graphql` |
| **Auth** | None required (free, open access) |
| **Rate limit** | No hard limit; reasonable use expected |
| **Endpoints used** | GraphQL POST — disease associations, tractability, known drugs |
| **Used in tabs** | A1 (process associations), A2 (target gap index), A5 (druggable orphans) |
| **Docs** | https://platform-docs.opentargets.org/data-access/graphql-api |
| **Data release** | Updated quarterly; cite as "Open Targets Platform [release date]" |
| **License** | CC0 (public domain) |

**Example query (disease-associated targets):**
```graphql
query AssocTargets($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { approvedSymbol approvedName }
        score
      }
    }
  }
}
```

**EFO IDs used:**
| Cancer | EFO ID |
|--------|--------|
| GBM | EFO_0000519 |
| PDAC | EFO_0002618 |
| SCLC | EFO_0000702 |
| UVM | EFO_0004339 |
| DIPG | EFO_0009708 |
| ACC | EFO_0003060 |
| MCC | EFO_0005558 |
| PCNSL | EFO_0005543 |
| Pediatric AML | EFO_0000222 |
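
A minimal sketch of sending the query above with one of the listed EFO IDs. It builds the POST request and flattens the response rows; `build_request` and `extract_rows` are illustrative names, and the response shape is assumed to mirror the query's selection set.

```python
import json
import urllib.request

OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"

ASSOC_QUERY = """
query AssocTargets($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { approvedSymbol approvedName }
        score
      }
    }
  }
}
"""


def build_request(efo_id: str, size: int = 25) -> urllib.request.Request:
    """Wrap the GraphQL query and its variables in a JSON POST request."""
    body = json.dumps(
        {"query": ASSOC_QUERY, "variables": {"efoId": efo_id, "size": size}}
    )
    return urllib.request.Request(
        OT_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_rows(payload: dict) -> list[tuple[str, float]]:
    """Return (symbol, score) pairs; empty if the disease was not found."""
    disease = (payload.get("data") or {}).get("disease") or {}
    rows = (disease.get("associatedTargets") or {}).get("rows") or []
    return [(row["target"]["approvedSymbol"], row["score"]) for row in rows]
```

To run it against the live API, pass the request to `urllib.request.urlopen` and feed the decoded JSON to `extract_rows`, for example with `build_request("EFO_0000519")` for GBM.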

---

### 4. gnomAD GraphQL API
| Property | Value |
|----------|-------|
| **Base URL** | `https://gnomad.broadinstitute.org/api` |
| **Auth** | None required |
| **Rate limit** | No hard limit; reasonable use expected |
| **Endpoints used** | GraphQL POST — `variantSearch` query |
| **Dataset** | `gnomad_r4` (v4, 807,162 individuals) |
| **Used in tabs** | A3 (Real Variant Lookup — allele frequency) |
| **Docs** | https://gnomad.broadinstitute.org/api |
| **License** | ODC Open Database License (ODbL) |

**Example query:**
```graphql
query VariantSearch($query: String!, $dataset: DatasetId!) {
  variantSearch(query: $query, dataset: $dataset) {
    variant_id
    rsids
    exome { af }
    genome { af }
  }
}
```
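
As a rough sketch, the request body and allele-frequency extraction for this query might look like the following. The response shape is assumed to mirror the query shown above, and both helper names are hypothetical. A variant seen in only one callset is expected to come back with the other block set to `null`, which the code treats as "no frequency available".

```python
from typing import Optional

VARIANT_QUERY = """
query VariantSearch($query: String!, $dataset: DatasetId!) {
  variantSearch(query: $query, dataset: $dataset) {
    variant_id
    rsids
    exome { af }
    genome { af }
  }
}
"""


def build_payload(variant_query: str, dataset: str = "gnomad_r4") -> dict:
    """JSON body for a POST to the gnomAD GraphQL endpoint."""
    return {
        "query": VARIANT_QUERY,
        "variables": {"query": variant_query, "dataset": dataset},
    }


def allele_frequencies(hit: dict) -> dict[str, Optional[float]]:
    """Return exome and genome AF for one result; None where a block is missing."""
    result = {}
    for source in ("exome", "genome"):
        block = hit.get(source) or {}  # a null block becomes an empty dict
        result[source] = block.get("af")
    return result
```

Keeping the two frequencies separate leaves the choice of which one to display to the caller.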

---

### 5. ClinicalTrials.gov API v2
| Property | Value |
|----------|-------|
| **Base URL** | `https://clinicaltrials.gov/api/v2` |
| **Auth** | None required |
| **Rate limit** | No hard limit documented; polite use recommended |
| **Endpoints used** | `GET /studies` — trial search by gene + cancer type |
| **Used in tabs** | A2 (trial counts per gene), A5 (orphan target trial check) |
| **Docs** | https://clinicaltrials.gov/data-api/api |
| **Data policy** | Public domain (US government) |

**Example call:**
```
GET https://clinicaltrials.gov/api/v2/studies
  ?query.term=KRAS GBM
  &pageSize=1
  &format=json
```
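
A possible way to turn the call above into a trial count, sketched with the standard library. Note the assumption: `countTotal=true` is not part of the example call, and the `totalCount` response field it is expected to enable should be verified against the v2 documentation before relying on it. The function names are illustrative.

```python
import urllib.parse

CTGOV_BASE = "https://clinicaltrials.gov/api/v2"


def build_studies_url(gene: str, cancer: str) -> str:
    """Build a /studies search URL for a gene + cancer-type pair."""
    params = {
        "query.term": f"{gene} {cancer}",
        "pageSize": 1,           # the study records are not needed, only the count
        "format": "json",
        "countTotal": "true",    # assumption: asks the API to include totalCount
    }
    return f"{CTGOV_BASE}/studies?{urllib.parse.urlencode(params)}"


def parse_total(payload: dict) -> int:
    """Read the total number of matching trials; 0 if the field is absent."""
    return int(payload.get("totalCount", 0))
```

With `pageSize=1` the response stays small, which suits tabs that only need a count per gene.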

---

### 6. DepMap Public Data
| Property | Value |
|----------|-------|
| **Source** | Broad Institute DepMap Portal |
| **URL** | https://depmap.org/portal/download/all/ |
| **File** | `CRISPR_gene_effect.csv` (Chronos scores) |
| **Auth** | None required (public download) |
| **Used in tabs** | A2 (essentiality scores for gap index) |
| **Score convention** | **Negative = essential** (0 = no effect; −1 = median effect of common essential genes); inverted in the app per the know-how guide |
| **License** | CC BY 4.0 |
| **Citation** | Broad Institute DepMap, [release]. DepMap Public [release]. figshare. |

> **Implementation note:** The app uses a curated reference gene set with representative scores as a lightweight proxy. For full analysis, download the complete CRISPR_gene_effect.csv (~500 MB) from depmap.org and replace `_load_depmap_sample()` in `app.py`.
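
For readers replacing the proxy with the full file, the sketch below shows one way to read gene-effect scores and apply the inversion. It rests on two assumptions that are not confirmed by this document: that the CSV has one row per cell line with gene columns named like `SYMBOL (EntrezID)`, and that "inverted" means a simple sign flip. `mean_essentiality` is a hypothetical helper, not the body of `_load_depmap_sample()`.

```python
import csv
import io
from statistics import mean


def gene_symbol(column: str) -> str:
    """'KRAS (3845)' -> 'KRAS' (assumes 'SYMBOL (EntrezID)' column headers)."""
    return column.split(" (")[0]


def mean_essentiality(handle, genes: set[str]) -> dict[str, float]:
    """Average each gene's score across cell lines, then flip the sign.

    After the flip, larger values mean more essential (assumed convention).
    """
    reader = csv.DictReader(handle)
    # Map only the columns of interest to their bare gene symbols.
    wanted = {
        col: gene_symbol(col)
        for col in (reader.fieldnames or [])
        if gene_symbol(col) in genes
    }
    values: dict[str, list[float]] = {symbol: [] for symbol in wanted.values()}
    for row in reader:
        for column, symbol in wanted.items():
            cell = row.get(column)
            if cell not in (None, "", "NA"):  # skip missing measurements
                values[symbol].append(float(cell))
    return {symbol: -mean(scores) for symbol, scores in values.items() if scores}
```

The function accepts any file-like object, so it can be tried on a few rows via `io.StringIO` before pointing it at the full download.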

---

## Simulated Data Sources (Group B Tabs)

All Group B tabs use **rule-based computational models** — no external APIs.

| Tab | Model Type | Basis |
|-----|-----------|-------|
| B1 — miRNA Explorer | Curated lookup table | Published miRNA-target databases (miRDB, TargetScan concepts) |
| B2 — siRNA Targets | Curated efficacy estimates | Published siRNA screen literature |
| B3 — LNP Corona | Langmuir adsorption model | Corona proteomics literature (Monopoli et al. 2012; Lundqvist et al. 2017) |
| B4 — Flow Corona | Competitive Langmuir kinetics | Vroman effect literature (Vroman 1962; Hirsh et al. 2013) |
| B5 — Variant Concepts | ACMG/AMP 2015 rule set | Richards et al. 2015 ACMG guidelines |

> ⚠️ All Group B outputs are labeled **SIMULATED** in the UI and must not be used for clinical or research decisions.
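
To make the Langmuir rows in the table concrete, here is a minimal sketch of the textbook equilibrium forms: single-species coverage θ = KC / (1 + KC) and competitive coverage θᵢ = KᵢCᵢ / (1 + Σⱼ KⱼCⱼ). It is not the app's implementation, it ignores the time-dependent exchange that B4 models, and the species names and constants used with it are placeholders.

```python
def langmuir_coverage(concentration: float, k_affinity: float) -> float:
    """Fractional surface coverage for one species: K*C / (1 + K*C)."""
    kc = k_affinity * concentration
    return kc / (1.0 + kc)


def competitive_coverage(
    species: dict[str, tuple[float, float]]
) -> dict[str, float]:
    """Equilibrium coverage when several species compete for the same sites.

    `species` maps a name to (concentration, affinity constant).
    Each coverage is K_i*C_i divided by (1 + sum of all K_j*C_j).
    """
    denominator = 1.0 + sum(c * k for c, k in species.values())
    return {name: (c * k) / denominator for name, (c, k) in species.items()}
```

With placeholder inputs such as `{"protein_a": (1.0, 1.0), "protein_b": (1.0, 2.0)}`, the higher-affinity species takes the larger share of the surface, which is the equilibrium end state the Vroman effect describes.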

---

## RAG Chatbot (Tab A6)

| Property | Value |
|----------|-------|
| **Embedding model** | `all-MiniLM-L6-v2` (sentence-transformers) |
| **Model size** | ~80 MB, CPU-compatible |
| **Vector index** | FAISS `IndexFlatIP` (cosine similarity on L2-normalized vectors) |
| **Corpus** | 20 curated paper abstracts (see `chatbot.py` `PAPER_CORPUS`) |
| **Source** | PubMed abstracts (public domain) |
| **External API** | None; fully offline after model download |
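
The "cosine similarity on L2-normalized vectors" row relies on the fact that the inner product of two unit-length vectors equals their cosine similarity. The dependency-free sketch below illustrates that idea with plain lists; it stands in for what an inner-product index computes and is not the code in `chatbot.py`. All names are hypothetical.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], corpus: list[list[float]], k: int = 3):
    """Exhaustively score every corpus vector and return the k best.

    Returns (corpus_index, cosine_similarity) pairs, best first.
    """
    q = l2_normalize(query)
    scored = [
        (inner_product(q, l2_normalize(doc)), idx)
        for idx, doc in enumerate(corpus)
    ]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]
```

A brute-force scan like this is reasonable at the scale of a 20-abstract corpus, which is also why a flat (non-approximate) index is a natural fit.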

**20 Indexed PMIDs** *(all verified against PubMed esummary + efetch, 2026-03-07):*
| PMID | First Author | Topic | Journal | Year |
|------|-------------|-------|---------|------|
| 34394960 | Hou X | LNP mRNA delivery review | Nat Rev Mater | 2021 |
| 32251383 | Cheng Q | SORT LNPs organ selectivity | Nat Nanotechnol | 2020 |
| 29653760 | Sabnis S | Novel amino lipid series for mRNA | Mol Ther | 2018 |
| 22782619 | Jayaraman M | Ionizable lipid siRNA LNP potency | Angew Chem Int Ed | 2012 |
| 33208369 | Rosenblum D | CRISPR-Cas9 LNP cancer therapy | Sci Adv | 2020 |
| 18809927 | Lundqvist M | Nanoparticle size/surface protein corona | PNAS | 2008 |
| 22086677 | Walkey CD | Nanomaterial-protein interactions | Chem Soc Rev | 2012 |
| 31565943 | Park M | Accessible surface area nanoparticle corona | Nano Lett | 2019 |
| 33754708 | Sebastiani F | ApoE binding drives LNP rearrangement | ACS Nano | 2021 |
| 20461061 | Akinc A | Endogenous ApoE-mediated LNP liver delivery | Mol Ther | 2010 |
| 30096302 | Bailey MH | Cancer driver genes TCGA pan-cancer | Cell | 2018 |
| 30311387 | Landrum MJ | ClinVar at five years | Hum Mutat | 2018 |
| 32461654 | Karczewski KJ | gnomAD mutational constraint 141,456 humans | Nature | 2020 |
| 27328919 | Bouaoun L | TP53 variations IARC database | Hum Mutat | 2016 |
| 31820981 | Lanman BA | KRAS G12C covalent inhibitor AMG 510 | J Med Chem | 2020 |
| 28678784 | Sahin U | Personalized RNA mutanome vaccines | Nature | 2017 |
| 31348638 | Kozma GT | Anti-PEG IgM complement activation LNP | ACS Nano | 2019 |
| 33016924 | Cafri G | mRNA neoantigen T cell immunity GI cancer | J Clin Invest | 2020 |
| 31142840 | Cristiano S | Genome-wide cfDNA fragmentation in cancer | Nature | 2019 |
| 33883548 | Larson MH | Cell-free transcriptome tissue biomarkers | Nat Commun | 2021 |

---

## Caching System

All real API calls are cached locally to reduce latency and respect rate limits.

| Property | Value |
|----------|-------|
| **Cache directory** | `./cache/` |
| **TTL** | 24 hours |
| **Key format** | `{endpoint}_{md5(query)}.json` |
| **Format** | JSON |
| **Invalidation** | Automatic on TTL expiry; manual by deleting `./cache/` |
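
A minimal sketch of a cache that follows the table above: keys of the form `{endpoint}_{md5(query)}.json` under `./cache/`, with a 24-hour TTL. Using the file's modification time to measure age is an assumption, and the function names are illustrative rather than those used in `app.py`.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("./cache")
TTL_SECONDS = 24 * 60 * 60  # 24 hours


def cache_path(endpoint: str, query: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map an endpoint + query to its cache file."""
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return cache_dir / f"{endpoint}_{digest}.json"


def cache_get(endpoint, query, cache_dir: Path = CACHE_DIR, ttl: float = TTL_SECONDS):
    """Return the cached value, or None if missing or older than the TTL."""
    path = cache_path(endpoint, query, cache_dir)
    if not path.exists():
        return None
    # Assumption: entry age is taken from the file's modification time.
    if time.time() - path.stat().st_mtime > ttl:
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def cache_set(endpoint, query, value, cache_dir: Path = CACHE_DIR) -> None:
    """Write a JSON-serializable value to the cache."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_path(endpoint, query, cache_dir)
    path.write_text(json.dumps(value), encoding="utf-8")
```

A caller would try `cache_get` first and fall back to the live API plus `cache_set` on a miss. MD5 serves here only to produce a short, filesystem-safe key, not for security.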

---

## Lab Journal

| Property | Value |
|----------|-------|
| **File** | `./lab_journal.csv` |
| **Format** | CSV (timestamp, tab, action, result_summary, note) |
| **Auto-logged** | Every tab run automatically logs an entry |
| **Manual notes** | Via sidebar note field |
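
Appending an entry with the five columns listed above could look like the sketch below. The ISO 8601 UTC timestamp format and the `log_entry` name are assumptions; the actual logger in the app may differ.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

JOURNAL_FIELDS = ["timestamp", "tab", "action", "result_summary", "note"]


def log_entry(
    tab: str,
    action: str,
    result_summary: str,
    note: str = "",
    path: Path = Path("./lab_journal.csv"),
) -> None:
    """Append one row to the journal, writing the header on first use."""
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=JOURNAL_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(
            {
                # Assumption: timestamps are stored as ISO 8601 in UTC.
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "tab": tab,
                "action": action,
                "result_summary": result_summary,
                "note": note,
            }
        )
```

Opening the file in append mode keeps earlier entries intact, and `csv.DictWriter` handles quoting when a note contains commas or line breaks.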

---
*Data Sources documented by K R&D Lab Cancer Research Suite | 2026-03-07*