TEZv's picture
Upload 7 files
35e6a9d verified

A newer version of the Gradio SDK is available: 6.11.0

Upgrade

Data Sources & API Endpoints

K R&D Lab — Cancer Research Suite Author: Oksana Kolisnyk | kosatiks-group.pp.ua Repo: github.com/TEZv/K-RnD-Lab-PHYLO-03_2026 Generated: 2026-03-07


Real Data APIs (Group A Tabs)

1. PubMed E-utilities (NCBI)

Property Value
Base URL https://eutils.ncbi.nlm.nih.gov/entrez/eutils
Auth None required (free, no API key)
Rate limit 3 requests/sec without key; enforced via time.sleep(0.34)
Endpoints used esearch.fcgi — search & count; esummary.fcgi — fetch metadata
Used in tabs A1 (paper counts per process), A4 (papers per year), A2 (gene paper counts)
Docs https://www.ncbi.nlm.nih.gov/books/NBK25501/
Terms of use https://www.ncbi.nlm.nih.gov/home/about/policies/

Example call (paper count):

GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
  ?db=pubmed
  &term="ferroptosis" AND "GBM"[tiab]
  &rettype=count
  &retmode=json

2. ClinVar E-utilities (NCBI)

Property Value
Base URL https://eutils.ncbi.nlm.nih.gov/entrez/eutils
Auth None required
Rate limit Same as PubMed (3 req/sec)
Endpoints used esearch.fcgi?db=clinvar — variant search; esummary.fcgi?db=clinvar — classification
Used in tabs A3 (Real Variant Lookup)
Docs https://www.ncbi.nlm.nih.gov/clinvar/docs/api_http/
Data policy All ClinVar data is public domain

Example call:

GET https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
  ?db=clinvar
  &term=NM_007294.4:c.5266dupC
  &retmode=json
  &retmax=5

3. OpenTargets Platform GraphQL API

Property Value
Base URL https://api.platform.opentargets.org/api/v4/graphql
Auth None required (free, open access)
Rate limit No hard limit; reasonable use expected
Endpoints used GraphQL POST — disease associations, tractability, known drugs
Used in tabs A1 (process associations), A2 (target gap index), A5 (druggable orphans)
Docs https://platform-docs.opentargets.org/data-access/graphql-api
Data release Updated quarterly; cite as "Open Targets Platform [release date]"
License CC0 (public domain)

Example query (disease-associated targets):

query AssocTargets($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { approvedSymbol approvedName }
        score
      }
    }
  }
}

EFO IDs used:

Cancer EFO ID
GBM EFO_0000519
PDAC EFO_0002618
SCLC EFO_0000702
UVM EFO_0004339
DIPG EFO_0009708
ACC EFO_0003060
MCC EFO_0005558
PCNSL EFO_0005543
Pediatric AML EFO_0000222

4. gnomAD GraphQL API

Property Value
Base URL https://gnomad.broadinstitute.org/api
Auth None required
Rate limit No hard limit; reasonable use expected
Endpoints used GraphQL POST — variantSearch query
Dataset gnomad_r4 (v4, 807,162 individuals)
Used in tabs A3 (Real Variant Lookup — allele frequency)
Docs https://gnomad.broadinstitute.org/api
License ODC Open Database License (ODbL)

Example query:

query VariantSearch($query: String!, $dataset: DatasetId!) {
  variantSearch(query: $query, dataset: $dataset) {
    variant_id
    rsids
    exome { af }
    genome { af }
  }
}

5. ClinicalTrials.gov API v2

Property Value
Base URL https://clinicaltrials.gov/api/v2
Auth None required
Rate limit No hard limit documented; polite use recommended
Endpoints used GET /studies — trial search by gene + cancer type
Used in tabs A2 (trial counts per gene), A5 (orphan target trial check)
Docs https://clinicaltrials.gov/data-api/api
Data policy Public domain (US government)

Example call:

GET https://clinicaltrials.gov/api/v2/studies
  ?query.term=KRAS GBM
  &pageSize=1
  &format=json

6. DepMap Public Data

Property Value
Source Broad Institute DepMap Portal
URL https://depmap.org/portal/download/all/
File CRISPR_gene_effect.csv (Chronos scores)
Auth None required (public download)
Used in tabs A2 (essentiality scores for gap index)
Score convention Negative = essential (−1 = median essential gene effect); inverted in app per know-how guide
License CC BY 4.0
Citation Broad Institute DepMap, [release]. DepMap Public [release]. figshare.

Implementation note: The app uses a curated reference gene set with representative scores as a lightweight proxy. For full analysis, download the complete CRISPR_gene_effect.csv (~500 MB) from depmap.org and replace _load_depmap_sample() in app.py.


Simulated Data Sources (Group B Tabs)

All Group B tabs use rule-based computational models — no external APIs.

Tab Model Type Basis
B1 — miRNA Explorer Curated lookup table Published miRNA-target databases (miRDB, TargetScan concepts)
B2 — siRNA Targets Curated efficacy estimates Published siRNA screen literature
B3 — LNP Corona Langmuir adsorption model Corona proteomics literature (Monopoli et al. 2012; Lundqvist et al. 2017)
B4 — Flow Corona Competitive Langmuir kinetics Vroman effect literature (Vroman 1962; Hirsh et al. 2013)
B5 — Variant Concepts ACMG/AMP 2015 rule set Richards et al. 2015 ACMG guidelines

⚠️ All Group B outputs are labeled SIMULATED in the UI and must not be used for clinical or research decisions.


RAG Chatbot (Tab A6)

Property Value
Embedding model all-MiniLM-L6-v2 (sentence-transformers)
Model size ~80 MB, CPU-compatible
Vector index FAISS IndexFlatIP (cosine similarity on L2-normalized vectors)
Corpus 20 curated paper abstracts (see chatbot.py PAPER_CORPUS)
Source PubMed abstracts (public domain)
No external API Fully offline after model download

20 Indexed PMIDs (all verified against PubMed esummary + efetch, 2026-03-07):

PMID First Author Topic Journal Year
34394960 Hou X LNP mRNA delivery review Nat Rev Mater 2021
32251383 Cheng Q SORT LNPs organ selectivity Nat Nanotechnol 2020
29653760 Sabnis S Novel amino lipid series for mRNA Mol Ther 2018
22782619 Jayaraman M Ionizable lipid siRNA LNP potency Angew Chem Int Ed 2012
33208369 Rosenblum D CRISPR-Cas9 LNP cancer therapy Sci Adv 2020
18809927 Lundqvist M Nanoparticle size/surface protein corona PNAS 2008
22086677 Walkey CD Nanomaterial-protein interactions Chem Soc Rev 2012
31565943 Park M Accessible surface area nanoparticle corona Nano Lett 2019
33754708 Sebastiani F ApoE binding drives LNP rearrangement ACS Nano 2021
20461061 Akinc A Endogenous ApoE-mediated LNP liver delivery Mol Ther 2010
30096302 Bailey MH Cancer driver genes TCGA pan-cancer Cell 2018
30311387 Landrum MJ ClinVar at five years Hum Mutat 2018
32461654 Karczewski KJ gnomAD mutational constraint 141,456 humans Nature 2020
27328919 Bouaoun L TP53 variations IARC database Hum Mutat 2016
31820981 Lanman BA KRAS G12C covalent inhibitor AMG 510 J Med Chem 2020
28678784 Sahin U Personalized RNA mutanome vaccines Nature 2017
31348638 Kozma GT Anti-PEG IgM complement activation LNP ACS Nano 2019
33016924 Cafri G mRNA neoantigen T cell immunity GI cancer J Clin Invest 2020
31142840 Cristiano S Genome-wide cfDNA fragmentation in cancer Nature 2019
33883548 Larson MH Cell-free transcriptome tissue biomarkers Nat Commun 2021

Caching System

All real API calls are cached locally to reduce latency and respect rate limits.

Property Value
Cache directory ./cache/
TTL 24 hours
Key format {endpoint}_{md5(query)}.json
Format JSON
Invalidation Automatic on TTL expiry; manual by deleting ./cache/

Lab Journal

Property Value
File ./lab_journal.csv
Format CSV (timestamp, tab, action, result_summary, note)
Auto-logged Every tab run automatically logs an entry
Manual notes Via sidebar note field

Data Sources documented by K R&D Lab Cancer Research Suite | 2026-03-07