# BioFlow Ingestion Guide (Phase 3)
This guide explains how to ingest data from **PubMed**, **UniProt**, and **ChEMBL** into Qdrant.
## 1) FastAPI Endpoints (Recommended)
### PubMed
`POST /api/ingest/pubmed`
```json
{
  "query": "EGFR lung cancer",
  "limit": 100,
  "batch_size": 50,
  "rate_limit": 0.4,
  "collection": "bioflow_memory",
  "email": "you@example.com",
  "api_key": "NCBI_API_KEY",
  "sync": false
}
```
### UniProt
`POST /api/ingest/uniprot`
```json
{
  "query": "EGFR AND organism_id:9606",
  "limit": 50,
  "batch_size": 50,
  "rate_limit": 0.2,
  "collection": "bioflow_memory",
  "sync": false
}
```
### ChEMBL
`POST /api/ingest/chembl`
```json
{
  "query": "EGFR",
  "limit": 30,
  "batch_size": 50,
  "rate_limit": 0.3,
  "collection": "bioflow_memory",
  "search_mode": "target",
  "sync": false
}
```
### All Sources
`POST /api/ingest/all`
```json
{
  "query": "EGFR lung cancer",
  "pubmed_limit": 100,
  "uniprot_limit": 50,
  "chembl_limit": 30,
  "batch_size": 50,
  "rate_limit": 0.3,
  "collection": "bioflow_memory",
  "sync": false
}
```
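The four ingestion endpoints above share the same shape: a JSON body sent with `POST` to `/api/ingest/<source>`. A minimal client sketch using only the Python standard library might look like the following. The base URL `http://localhost:8000` and the helper names `build_ingest_request` and `send` are assumptions made for illustration; they are not part of BioFlow, and the response format is not specified in this guide.

```python
import json
import urllib.request

# Assumption: the FastAPI backend is reachable here; adjust for your deployment.
BASE_URL = "http://localhost:8000"

# Sources taken from the endpoint paths documented above.
KNOWN_SOURCES = {"pubmed", "uniprot", "chembl", "all"}


def build_ingest_request(source: str, payload: dict,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/ingest/<source>."""
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown ingestion source: {source!r}")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ingest/{source}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running backend):
# request = build_ingest_request("all", {
#     "query": "EGFR lung cancer",
#     "pubmed_limit": 100,
#     "uniprot_limit": 50,
#     "chembl_limit": 30,
#     "collection": "bioflow_memory",
#     "sync": False,
# })
# print(send(request))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything goes over the network.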
### Job Status
`GET /api/ingest/jobs/{job_id}`
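A caller would typically poll the job status endpoint until the job finishes. The sketch below shows one way to do that. This guide does not document the status payload, so the `"status"` field and the terminal values `"completed"` and `"failed"` are assumptions, as is the helper name `wait_for_job`. The HTTP call is passed in as `fetch_status` so the loop does not depend on any particular client.

```python
import time
from typing import Callable

# Assumption: the status payload carries a "status" field, and these
# values mark a finished job. Check the real response and adjust.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(job_id: str,
                 fetch_status: Callable[[str], dict],
                 interval: float = 2.0,
                 max_polls: int = 150,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_status(job_id) until a terminal state or max_polls is reached.

    fetch_status should perform GET /api/ingest/jobs/{job_id} and return
    the decoded JSON body.
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The poll cap keeps a stuck job from blocking the caller indefinitely; with the defaults shown, the loop gives up after about five minutes.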
## 2) Next.js Proxy Routes (Optional)
If you want to call the backend through Next.js:
```
/api/ingest/pubmed
/api/ingest/uniprot
/api/ingest/chembl
/api/ingest/all
/api/ingest/jobs/{job_id}
```
## 3) CLI Ingestion
```
python -m bioflow.ingestion.ingest_all --query "EGFR lung cancer" --limit 100
```
## 4) Environment Variables
- `INGEST_BATCH_SIZE`
- `PUBMED_RATE_LIMIT`
- `UNIPROT_RATE_LIMIT`
- `CHEMBL_RATE_LIMIT`
- `NCBI_EMAIL`
- `NCBI_API_KEY`
- `CHEMBL_SEARCH_MODE`
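As a rough illustration of how these variables could be read into typed settings, the sketch below parses them from an environment mapping. The `IngestSettings` class and `load_ingest_settings` function are hypothetical, and the fallback defaults are assumptions copied from the example request bodies in this guide; BioFlow's actual defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IngestSettings:
    batch_size: int
    pubmed_rate_limit: float
    uniprot_rate_limit: float
    chembl_rate_limit: float
    ncbi_email: Optional[str]
    ncbi_api_key: Optional[str]
    chembl_search_mode: str


def load_ingest_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    """Read ingestion settings from env, falling back to assumed defaults."""
    return IngestSettings(
        # Defaults below are assumptions taken from the example payloads.
        batch_size=int(env.get("INGEST_BATCH_SIZE", "50")),
        pubmed_rate_limit=float(env.get("PUBMED_RATE_LIMIT", "0.4")),
        uniprot_rate_limit=float(env.get("UNIPROT_RATE_LIMIT", "0.2")),
        chembl_rate_limit=float(env.get("CHEMBL_RATE_LIMIT", "0.3")),
        # Unset or empty values become None.
        ncbi_email=env.get("NCBI_EMAIL") or None,
        ncbi_api_key=env.get("NCBI_API_KEY") or None,
        chembl_search_mode=env.get("CHEMBL_SEARCH_MODE", "target"),
    )
```

Passing the environment as a mapping keeps the function easy to exercise with a plain `dict`, for example `load_ingest_settings({"INGEST_BATCH_SIZE": "25"})`.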
## 5) Recommended Minimums
- PubMed: 100 records
- UniProt: 50 records
- ChEMBL: 30 records