| # Data Sources & API Endpoints |
| **K R&D Lab — Cancer Research Suite** |
| Author: Oksana Kolisnyk | kosatiks-group.pp.ua |
| Repo: github.com/TEZv/K-RnD-Lab-PHYLO-03_2026 |
| Generated: 2026-03-07 |
| |
| --- |
| |
| ## Real Data APIs (Group A Tabs) |
| |
| ### 1. PubMed E-utilities (NCBI) |
| | Property | Value | |
| |----------|-------| |
| | **Base URL** | `https://eutils.ncbi.nlm.nih.gov/entrez/eutils` | |
| | **Auth** | None required (free, no API key) | |
| | **Rate limit** | 3 requests/sec without key; enforced via `time.sleep(0.34)` | |
| | **Endpoints used** | `esearch.fcgi` — search & count; `esummary.fcgi` — fetch metadata | |
| | **Used in tabs** | A1 (paper counts per process), A2 (gene paper counts), A4 (papers per year) | |
| | **Docs** | https://www.ncbi.nlm.nih.gov/books/NBK25501/ | |
| | **Terms of use** | https://www.ncbi.nlm.nih.gov/home/about/policies/ | |
| |
| **Example call (paper count):** |
| ``` |
| GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi |
| ?db=pubmed |
| &term="ferroptosis" AND "GBM"[tiab] |
| &rettype=count |
| &retmode=json |
| ``` |
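The call above can be sketched in Python as follows. This is a minimal illustration, not the code in `app.py`: the helper names `build_esearch_url` and `throttle` are hypothetical, and no network request is made. It shows the URL encoding the query term needs and how the 0.34 s pause from the table keeps calls under 3 requests/sec.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
MIN_INTERVAL = 0.34  # seconds between calls, just under 3 requests/sec


def build_esearch_url(term: str, db: str = "pubmed") -> str:
    """Return an esearch URL that asks only for the hit count, as JSON."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    # urlencode percent-encodes the quotes and brackets and turns spaces into "+".
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def throttle(last_call: float, now: float) -> float:
    """Return how long to sleep so consecutive calls stay MIN_INTERVAL apart."""
    return max(0.0, MIN_INTERVAL - (now - last_call))


url = build_esearch_url('"ferroptosis" AND "GBM"[tiab]')
print(url)
```

In a real client the value returned by `throttle` would be passed to `time.sleep` before each request.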
| |
| --- |
| |
| ### 2. ClinVar E-utilities (NCBI) |
| | Property | Value | |
| |----------|-------| |
| | **Base URL** | `https://eutils.ncbi.nlm.nih.gov/entrez/eutils` | |
| | **Auth** | None required | |
| | **Rate limit** | Same as PubMed (3 req/sec) | |
| | **Endpoints used** | `esearch.fcgi?db=clinvar` — variant search; `esummary.fcgi?db=clinvar` — classification | |
| | **Used in tabs** | A3 (Real Variant Lookup) | |
| | **Docs** | https://www.ncbi.nlm.nih.gov/clinvar/docs/api_http/ | |
| | **Data policy** | All ClinVar data is public domain | |
| |
| **Example call:** |
| ``` |
| GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi |
| ?db=clinvar |
| &term=NM_007294.4:c.5266dupC |
| &retmode=json |
| &retmax=5 |
| ``` |
| |
| --- |
| |
| ### 3. OpenTargets Platform GraphQL API |
| | Property | Value | |
| |----------|-------| |
| | **Base URL** | `https://api.platform.opentargets.org/api/v4/graphql` | |
| | **Auth** | None required (free, open access) | |
| | **Rate limit** | No hard limit; reasonable use expected | |
| | **Endpoints used** | GraphQL POST — disease associations, tractability, known drugs | |
| | **Used in tabs** | A1 (process associations), A2 (target gap index), A5 (druggable orphans) | |
| | **Docs** | https://platform-docs.opentargets.org/data-access/graphql-api | |
| | **Data release** | Updated quarterly; cite as "Open Targets Platform [release date]" | |
| | **License** | CC0 (public domain) | |
| |
| **Example query (disease-associated targets):** |
| ```graphql |
| query AssocTargets($efoId: String!, $size: Int!) { |
|   disease(efoId: $efoId) { |
|     associatedTargets(page: {index: 0, size: $size}) { |
|       rows { |
|         target { approvedSymbol approvedName } |
|         score |
|       } |
|     } |
|   } |
| } |
| ``` |
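A hedged Python sketch of how this query could be packaged for a GraphQL POST and how the response could be unpacked. The function names `build_payload` and `extract_rows` and the default `size=25` are illustrative assumptions, not taken from the app. The sketch builds the request body only and does not contact the API.

```python
import json

OPENTARGETS_URL = "https://api.platform.opentargets.org/api/v4/graphql"

ASSOC_TARGETS_QUERY = """
query AssocTargets($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { approvedSymbol approvedName }
        score
      }
    }
  }
}
"""


def build_payload(efo_id: str, size: int = 25) -> bytes:
    """Serialize the query and its variables as a JSON request body."""
    body = {"query": ASSOC_TARGETS_QUERY, "variables": {"efoId": efo_id, "size": size}}
    return json.dumps(body).encode("utf-8")


def extract_rows(response: dict) -> list[tuple[str, float]]:
    """Return (symbol, score) pairs; empty list if the disease is null or missing."""
    disease = (response.get("data") or {}).get("disease") or {}
    rows = (disease.get("associatedTargets") or {}).get("rows", [])
    return [(r["target"]["approvedSymbol"], r["score"]) for r in rows]


payload = build_payload("EFO_0000519")  # GBM, from the EFO table in this section
```

The payload would be sent to `OPENTARGETS_URL` with a `Content-Type: application/json` header, and the decoded JSON reply passed to `extract_rows`.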
| |
| **EFO IDs used:** |
| | Cancer | EFO ID | |
| |--------|--------| |
| | GBM | EFO_0000519 | |
| | PDAC | EFO_0002618 | |
| | SCLC | EFO_0000702 | |
| | UVM | EFO_0004339 | |
| | DIPG | EFO_0009708 | |
| | ACC | EFO_0003060 | |
| | MCC | EFO_0005558 | |
| | PCNSL | EFO_0005543 | |
| | Pediatric AML | EFO_0000222 | |
| |
| --- |
| |
| ### 4. gnomAD GraphQL API |
| | Property | Value | |
| |----------|-------| |
| | **Base URL** | `https://gnomad.broadinstitute.org/api` | |
| | **Auth** | None required | |
| | **Rate limit** | No hard limit; reasonable use expected | |
| | **Endpoints used** | GraphQL POST — `variantSearch` query | |
| | **Dataset** | `gnomad_r4` (v4, 807,162 individuals) | |
| | **Used in tabs** | A3 (Real Variant Lookup — allele frequency) | |
| | **Docs** | https://gnomad.broadinstitute.org/api | |
| | **License** | ODC Open Database License (ODbL) | |
| |
| **Example query:** |
| ```graphql |
| query VariantSearch($query: String!, $dataset: DatasetId!) { |
|   variantSearch(query: $query, dataset: $dataset) { |
|     variant_id |
|     rsids |
|     exome { af } |
|     genome { af } |
|   } |
| } |
| ``` |
| |
| --- |
| |
| ### 5. ClinicalTrials.gov API v2 |
| | Property | Value | |
| |----------|-------| |
| | **Base URL** | `https://clinicaltrials.gov/api/v2` | |
| | **Auth** | None required | |
| | **Rate limit** | No hard limit documented; polite use recommended | |
| | **Endpoints used** | `GET /studies` — trial search by gene + cancer type | |
| | **Used in tabs** | A2 (trial counts per gene), A5 (orphan target trial check) | |
| | **Docs** | https://clinicaltrials.gov/data-api/api | |
| | **Data policy** | Public domain (US government) | |
| |
| **Example call:** |
| ``` |
| GET https://clinicaltrials.gov/api/v2/studies |
| ?query.term=KRAS GBM |
| &pageSize=1 |
| &format=json |
| ``` |
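A small Python sketch of building this request and reading a trial count from the reply. The helper names are hypothetical. The v2 API documents a `countTotal` parameter that adds a `totalCount` field to the first page; whether the app sets it is an assumption made here so that `pageSize=1` still yields a usable count.

```python
from urllib.parse import urlencode

CTGOV_BASE = "https://clinicaltrials.gov/api/v2"


def build_studies_url(gene: str, cancer: str) -> str:
    """Return a /studies URL searching for the gene and cancer type together."""
    params = {
        "query.term": f"{gene} {cancer}",
        "pageSize": 1,
        "format": "json",
        "countTotal": "true",  # assumption: request totalCount alongside page 1
    }
    return f"{CTGOV_BASE}/studies?{urlencode(params)}"


def trial_count(response: dict) -> int:
    """Read totalCount from a decoded response, defaulting to 0 if absent."""
    return int(response.get("totalCount", 0))


print(build_studies_url("KRAS", "GBM"))
```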
| |
| --- |
| |
| ### 6. DepMap Public Data |
| | Property | Value | |
| |----------|-------| |
| | **Source** | Broad Institute DepMap Portal | |
| | **URL** | https://depmap.org/portal/download/all/ | |
| | **File** | `CRISPR_gene_effect.csv` (Chronos scores) | |
| | **Auth** | None required (public download) | |
| | **Used in tabs** | A2 (essentiality scores for gap index) | |
| | **Score convention** | **Negative = essential** (0 = no effect; −1 = median effect of common essential genes); inverted in app per know-how guide | |
| | **License** | CC BY 4.0 | |
| | **Citation** | Broad Institute DepMap, [release]. DepMap Public [release]. figshare. | |
| |
| > **Implementation note:** The app uses a curated reference gene set with representative scores as a lightweight proxy. For full analysis, download the complete CRISPR_gene_effect.csv (~500 MB) from depmap.org and replace `_load_depmap_sample()` in `app.py`. |
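One way a replacement loader could read the full file is sketched below. This is not the app's `_load_depmap_sample()`: the function name `mean_essentiality` is hypothetical, the file layout (first column is the cell-line ID, gene columns named like `KRAS (3845)`) is an assumption, and the sign flip is only one plausible reading of "inverted in app". The sample values are made up for illustration.

```python
import csv
import io
from statistics import mean


def mean_essentiality(handle) -> dict[str, float]:
    """Average each gene's Chronos score over cell lines and flip the sign."""
    reader = csv.reader(handle)
    header = next(reader)
    # Column 0 is the cell-line ID; strip the " (EntrezID)" suffix from gene columns.
    genes = [name.split(" (")[0] for name in header[1:]]
    columns: list[list[float]] = [[] for _ in genes]
    for row in reader:
        for i, cell in enumerate(row[1:]):
            if cell:  # skip missing values
                columns[i].append(float(cell))
    # After the flip, a higher value means a more essential gene (assumed inversion).
    return {g: -mean(col) for g, col in zip(genes, columns) if col}


# Made-up two-line sample in the assumed layout.
sample = io.StringIO(
    "ModelID,KRAS (3845),TP53 (7157)\n"
    "LINE-A,-1.0,0.2\n"
    "LINE-B,-0.5,\n"
)
scores = mean_essentiality(sample)
```

For the real download, the same function could be given an open file handle, for example `open("CRISPR_gene_effect.csv", newline="")`, since it reads one row at a time.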
| |
| --- |
| |
| ## Simulated Data Sources (Group B Tabs) |
| |
| All Group B tabs use **rule-based computational models** — no external APIs. |
| |
| | Tab | Model Type | Basis | |
| |-----|-----------|-------| |
| | B1 — miRNA Explorer | Curated lookup table | Published miRNA-target databases (miRDB, TargetScan concepts) | |
| | B2 — siRNA Targets | Curated efficacy estimates | Published siRNA screen literature | |
| | B3 — LNP Corona | Langmuir adsorption model | Corona proteomics literature (Monopoli et al. 2012; Lundqvist et al. 2017) | |
| | B4 — Flow Corona | Competitive Langmuir kinetics | Vroman effect literature (Vroman 1962; Hirsh et al. 2013) | |
| | B5 — Variant Concepts | ACMG/AMP 2015 rule set | Richards et al. 2015 ACMG guidelines | |
| |
| > ⚠️ All Group B outputs are labeled **SIMULATED** in the UI and must not be used for clinical or research decisions. |
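For orientation, the textbook equilibrium forms behind the B3 and B4 model names can be sketched as below. For a single species, fractional coverage is K·c / (1 + K·c). With competition, species i covers K_i·c_i / (1 + Σ K_j·c_j). The function names, protein labels and constants are illustrative assumptions rather than the app's parameters, and the time-dependent kinetics used for the Vroman effect in B4 are not shown.

```python
def langmuir_coverage(k: float, conc: float) -> float:
    """Single-species Langmuir isotherm: fraction of surface sites occupied."""
    return k * conc / (1.0 + k * conc)


def competitive_coverage(k: dict[str, float], conc: dict[str, float]) -> dict[str, float]:
    """Equilibrium coverage per species when all compete for the same sites."""
    denom = 1.0 + sum(k[p] * conc[p] for p in k)
    return {p: k[p] * conc[p] / denom for p in k}


# Illustrative affinity constants and concentrations in arbitrary units.
single = langmuir_coverage(k=2.0, conc=0.5)
mixed = competitive_coverage(
    {"albumin": 1.0, "ApoE": 4.0},
    {"albumin": 1.0, "ApoE": 0.5},
)
```

In the competitive case the coverages always sum to less than 1, with the remainder being free surface.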
| |
| --- |
| |
| ## RAG Chatbot (Tab A6) |
| |
| | Property | Value | |
| |----------|-------| |
| | **Embedding model** | `all-MiniLM-L6-v2` (sentence-transformers) | |
| | **Model size** | ~80 MB, CPU-compatible | |
| | **Vector index** | FAISS `IndexFlatIP` (cosine similarity on L2-normalized vectors) | |
| | **Corpus** | 20 curated paper abstracts (see `chatbot.py` `PAPER_CORPUS`) | |
| | **Source** | PubMed abstracts (publicly accessible; copyright may remain with the publishers) | |
| | **External API** | None; fully offline after model download | |
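The note on `IndexFlatIP` rests on one fact: for L2-normalized vectors, the inner product equals cosine similarity. The pure-Python sketch below illustrates this with toy 2-D vectors. It is a stand-in for the idea, not FAISS or sentence-transformers code, and `top_k` is a hypothetical helper.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], corpus: list[list[float]], k: int = 2) -> list[tuple[float, int]]:
    """Return the k best (score, index) pairs by inner product of unit vectors."""
    q = l2_normalize(query)
    scored = [(inner_product(q, l2_normalize(doc)), i) for i, doc in enumerate(corpus)]
    return sorted(scored, reverse=True)[:k]


# Toy "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
corpus = [[1.0, 0.0], [3.0, 4.0], [0.0, 2.0]]
hits = top_k([2.0, 0.0], corpus)
```

Vector length has no effect on the ranking: the query `[2.0, 0.0]` matches `[1.0, 0.0]` with a score of 1 because only direction matters after normalization.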
| |
| **20 Indexed PMIDs** *(all verified against PubMed esummary + efetch, 2026-03-07):* |
| | PMID | First Author | Topic | Journal | Year | |
| |------|-------------|-------|---------|------| |
| | 34394960 | Hou X | LNP mRNA delivery review | Nat Rev Mater | 2021 | |
| | 32251383 | Cheng Q | SORT LNPs organ selectivity | Nat Nanotechnol | 2020 | |
| | 29653760 | Sabnis S | Novel amino lipid series for mRNA | Mol Ther | 2018 | |
| | 22782619 | Jayaraman M | Ionizable lipid siRNA LNP potency | Angew Chem Int Ed | 2012 | |
| | 33208369 | Rosenblum D | CRISPR-Cas9 LNP cancer therapy | Sci Adv | 2020 | |
| | 18809927 | Lundqvist M | Nanoparticle size/surface protein corona | PNAS | 2008 | |
| | 22086677 | Walkey CD | Nanomaterial-protein interactions | Chem Soc Rev | 2012 | |
| | 31565943 | Park M | Accessible surface area nanoparticle corona | Nano Lett | 2019 | |
| | 33754708 | Sebastiani F | ApoE binding drives LNP rearrangement | ACS Nano | 2021 | |
| | 20461061 | Akinc A | Endogenous ApoE-mediated LNP liver delivery | Mol Ther | 2010 | |
| | 30096302 | Bailey MH | Cancer driver genes TCGA pan-cancer | Cell | 2018 | |
| | 30311387 | Landrum MJ | ClinVar at five years | Hum Mutat | 2018 | |
| | 32461654 | Karczewski KJ | gnomAD mutational constraint 141,456 humans | Nature | 2020 | |
| | 27328919 | Bouaoun L | TP53 variations IARC database | Hum Mutat | 2016 | |
| | 31820981 | Lanman BA | KRAS G12C covalent inhibitor AMG 510 | J Med Chem | 2020 | |
| | 28678784 | Sahin U | Personalized RNA mutanome vaccines | Nature | 2017 | |
| | 31348638 | Kozma GT | Anti-PEG IgM complement activation LNP | ACS Nano | 2019 | |
| | 33016924 | Cafri G | mRNA neoantigen T cell immunity GI cancer | J Clin Invest | 2020 | |
| | 31142840 | Cristiano S | Genome-wide cfDNA fragmentation in cancer | Nature | 2019 | |
| | 33883548 | Larson MH | Cell-free transcriptome tissue biomarkers | Nat Commun | 2021 | |
| |
| --- |
| |
| ## Caching System |
| |
| All real API calls are cached locally to reduce latency and respect rate limits. |
| |
| | Property | Value | |
| |----------|-------| |
| | **Cache directory** | `./cache/` | |
| | **TTL** | 24 hours | |
| | **Key format** | `{endpoint}_{md5(query)}.json` | |
| | **Format** | JSON | |
| | **Invalidation** | Automatic on TTL expiry; manual by deleting `./cache/` | |
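The table above can be read as the following minimal sketch. The function names are hypothetical, and measuring the TTL from the cache file's modification time is an assumption; the app may store a timestamp differently.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("./cache")
TTL_SECONDS = 24 * 60 * 60  # 24 hours


def cache_path(endpoint: str, query: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Build the {endpoint}_{md5(query)}.json file name."""
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return cache_dir / f"{endpoint}_{digest}.json"


def cache_get(endpoint: str, query: str, cache_dir: Path = CACHE_DIR, now=None):
    """Return the cached payload, or None if missing or older than the TTL."""
    path = cache_path(endpoint, query, cache_dir)
    if not path.exists():
        return None
    now = time.time() if now is None else now
    if now - path.stat().st_mtime > TTL_SECONDS:
        return None  # expired: caller should refetch and overwrite
    return json.loads(path.read_text(encoding="utf-8"))


def cache_set(endpoint: str, query: str, payload, cache_dir: Path = CACHE_DIR) -> None:
    """Write the payload as JSON, creating the cache directory if needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_path(endpoint, query, cache_dir).write_text(json.dumps(payload), encoding="utf-8")
```

Hashing the query keeps file names short and filesystem-safe even when the query contains quotes, brackets or spaces.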
| |
| --- |
| |
| ## Lab Journal |
| |
| | Property | Value | |
| |----------|-------| |
| | **File** | `./lab_journal.csv` | |
| | **Format** | CSV (timestamp, tab, action, result_summary, note) | |
| | **Auto-logged** | Every tab run appends an entry | |
| | **Manual notes** | Via sidebar note field | |
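Appending a journal row with the five columns listed above could look like the sketch below. The function name `log_entry`, the header row and the ISO-8601 UTC timestamp format are assumptions, not confirmed details of the app.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

JOURNAL_PATH = Path("./lab_journal.csv")
FIELDS = ["timestamp", "tab", "action", "result_summary", "note"]


def log_entry(tab: str, action: str, result_summary: str,
              note: str = "", path: Path = JOURNAL_PATH) -> None:
    """Append one row, writing the header first if the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "tab": tab,
            "action": action,
            "result_summary": result_summary,
            "note": note,
        })
```

Using the `csv` module rather than string concatenation keeps notes that contain commas or quotes intact.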
| |
| --- |
| *Data Sources documented by K R&D Lab Cancer Research Suite | 2026-03-07* |
| |