BioFlow Ingestion Guide (Phase 3)
This guide explains how to ingest data from PubMed, UniProt, and ChEMBL into Qdrant.
1) FastAPI Endpoints (Recommended)
PubMed
POST /api/ingest/pubmed
{
"query": "EGFR lung cancer",
"limit": 100,
"batch_size": 50,
"rate_limit": 0.4,
"collection": "bioflow_memory",
"email": "you@example.com",
"api_key": "NCBI_API_KEY",
"sync": false
}
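As a minimal sketch of submitting the request above from Python, the snippet below builds the JSON POST with the standard library only. The base URL `http://localhost:8000` and the helper name `build_ingest_request` are assumptions for illustration, not part of BioFlow. The same helper would apply to the other ingest endpoints by changing the source name and payload.

```python
import json
import urllib.request

# Assumption: the FastAPI backend is reachable here; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_ingest_request(source: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /api/ingest/<source>.

    Hypothetical helper, not part of the BioFlow codebase.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/ingest/{source}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payload copied from the PubMed example above.
pubmed_request = build_ingest_request(
    "pubmed",
    {
        "query": "EGFR lung cancer",
        "limit": 100,
        "batch_size": 50,
        "rate_limit": 0.4,
        "collection": "bioflow_memory",
        "email": "you@example.com",
        "api_key": "NCBI_API_KEY",
        "sync": False,
    },
)

# To actually submit the job (requires a running backend):
# with urllib.request.urlopen(pubmed_request) as response:
#     print(json.load(response))
```

The request is built separately from sending it so the payload can be inspected or logged before any network call is made.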
UniProt
POST /api/ingest/uniprot
{
"query": "EGFR AND organism_id:9606",
"limit": 50,
"batch_size": 50,
"rate_limit": 0.2,
"collection": "bioflow_memory",
"sync": false
}
ChEMBL
POST /api/ingest/chembl
{
"query": "EGFR",
"limit": 30,
"batch_size": 50,
"rate_limit": 0.3,
"collection": "bioflow_memory",
"search_mode": "target",
"sync": false
}
All Sources
POST /api/ingest/all
{
"query": "EGFR lung cancer",
"pubmed_limit": 100,
"uniprot_limit": 50,
"chembl_limit": 30,
"batch_size": 50,
"rate_limit": 0.3,
"collection": "bioflow_memory",
"sync": false
}
Job Status
GET /api/ingest/jobs/{job_id}
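A job submitted with "sync": false can be followed by polling this endpoint. The sketch below is hedged: the response schema is not documented in this guide, so the `status` field and the `completed` / `failed` values are assumptions, as are the base URL and the helper names. Check the actual response before relying on them.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: the FastAPI backend is reachable here; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def fetch_job(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /api/ingest/jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/ingest/jobs/{job_id}") as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    terminal_states: tuple = ("completed", "failed"),  # assumed status values
    interval: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Poll until the job reports a terminal status or max_polls is reached."""
    for _ in range(max_polls):
        job = fetch(job_id)
        # Assumption: the job payload exposes a "status" field.
        if job.get("status") in terminal_states:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling loop testable without a running backend.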
2) Next.js Proxy Routes (Optional)
To call the backend through Next.js instead, use the proxy routes below, which mirror the FastAPI paths:
/api/ingest/pubmed
/api/ingest/uniprot
/api/ingest/chembl
/api/ingest/all
/api/ingest/jobs/{job_id}
3) CLI Ingestion
python -m bioflow.ingestion.ingest_all --query "EGFR lung cancer" --limit 100
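If the CLI needs to be driven from a script, one possible approach is to assemble the same command as an argument list. This sketch uses only the `--query` and `--limit` flags shown above; the helper name `build_cli_command` is hypothetical.

```python
import sys


def build_cli_command(query: str, limit: int) -> list:
    """Return the argv list for the ingest_all CLI command shown above.

    Hypothetical helper; only the --query and --limit flags from this guide are used.
    """
    return [
        sys.executable,  # run the module with the current interpreter
        "-m",
        "bioflow.ingestion.ingest_all",
        "--query",
        query,
        "--limit",
        str(limit),
    ]


command = build_cli_command("EGFR lung cancer", 100)

# To run it (requires the bioflow package to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the query as a single list element avoids shell quoting issues with multi-word queries.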
4) Environment Variables
- INGEST_BATCH_SIZE
- PUBMED_RATE_LIMIT
- UNIPROT_RATE_LIMIT
- CHEMBL_RATE_LIMIT
- NCBI_EMAIL
- NCBI_API_KEY
- CHEMBL_SEARCH_MODE
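A rough sketch of how these variables might be read into typed settings follows. The guide does not state the real defaults, so the fallback values here are assumptions borrowed from the request examples above, and `IngestSettings` / `load_settings` are hypothetical names rather than BioFlow's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IngestSettings:
    batch_size: int
    pubmed_rate_limit: float
    uniprot_rate_limit: float
    chembl_rate_limit: float
    ncbi_email: Optional[str]
    ncbi_api_key: Optional[str]
    chembl_search_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    """Read the ingestion variables, falling back to assumed defaults.

    The defaults mirror the request examples in this guide; they are not
    confirmed to be the values BioFlow itself uses.
    """
    return IngestSettings(
        batch_size=int(env.get("INGEST_BATCH_SIZE", "50")),
        pubmed_rate_limit=float(env.get("PUBMED_RATE_LIMIT", "0.4")),
        uniprot_rate_limit=float(env.get("UNIPROT_RATE_LIMIT", "0.2")),
        chembl_rate_limit=float(env.get("CHEMBL_RATE_LIMIT", "0.3")),
        ncbi_email=env.get("NCBI_EMAIL"),      # None when unset
        ncbi_api_key=env.get("NCBI_API_KEY"),  # None when unset
        chembl_search_mode=env.get("CHEMBL_SEARCH_MODE", "target"),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes it easy to check the parsing with a plain dictionary.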
5) Recommended Minimums
- PubMed: 100 records
- UniProt: 50 records
- ChEMBL: 30 records
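These minimums line up with the limit fields of the /api/ingest/all payload. As an illustrative sketch, a payload could be checked against them before submission. The helper `below_minimum` is hypothetical, and treating a missing limit as 0 is an assumption.

```python
# Recommended minimums from this guide, keyed by the /api/ingest/all field names.
RECOMMENDED_MINIMUMS = {
    "pubmed_limit": 100,
    "uniprot_limit": 50,
    "chembl_limit": 30,
}


def below_minimum(payload: dict) -> dict:
    """Return {field: (requested, minimum)} for limits under the recommended minimum.

    Assumption: a limit missing from the payload is treated as 0.
    """
    return {
        field: (payload.get(field, 0), minimum)
        for field, minimum in RECOMMENDED_MINIMUMS.items()
        if payload.get(field, 0) < minimum
    }


shortfalls = below_minimum(
    {"query": "EGFR lung cancer", "pubmed_limit": 20, "uniprot_limit": 50, "chembl_limit": 30}
)
```

An empty result means every limit in the payload meets its recommended minimum.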