# BioFlow Ingestion Guide (Phase 3)

This guide explains how to ingest data from **PubMed**, **UniProt**, and **ChEMBL** into Qdrant.

## 1) FastAPI Endpoints (Recommended)

### PubMed
`POST /api/ingest/pubmed`
```json
{
  "query": "EGFR lung cancer",
  "limit": 100,
  "batch_size": 50,
  "rate_limit": 0.4,
  "collection": "bioflow_memory",
  "email": "you@example.com",
  "api_key": "NCBI_API_KEY",
  "sync": false
}
```

### UniProt
`POST /api/ingest/uniprot`
```json
{
  "query": "EGFR AND organism_id:9606",
  "limit": 50,
  "batch_size": 50,
  "rate_limit": 0.2,
  "collection": "bioflow_memory",
  "sync": false
}
```

### ChEMBL
`POST /api/ingest/chembl`
```json
{
  "query": "EGFR",
  "limit": 30,
  "batch_size": 50,
  "rate_limit": 0.3,
  "collection": "bioflow_memory",
  "search_mode": "target",
  "sync": false
}
```

### All Sources
`POST /api/ingest/all`
```json
{
  "query": "EGFR lung cancer",
  "pubmed_limit": 100,
  "uniprot_limit": 50,
  "chembl_limit": 30,
  "batch_size": 50,
  "rate_limit": 0.3,
  "collection": "bioflow_memory",
  "sync": false
}
```
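The request bodies above can be sent from any HTTP client. The following is a minimal Python sketch using only the standard library. The base URL `http://localhost:8000` and the helper names `build_ingest_request` and `submit` are assumptions made for illustration and are not part of BioFlow. The response format is not documented here, so `submit` simply returns whatever JSON the server sends back.

```python
import json
import urllib.request

# Assumption: the FastAPI backend listens here; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_ingest_request(source, payload, base_url=BASE_URL):
    """Build a POST request for /api/ingest/{source} with a JSON body.

    `source` is one of "pubmed", "uniprot", "chembl", or "all".
    """
    url = f"{base_url}/api/ingest/{source}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and decode the JSON response (schema not assumed)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Building a request does not touch the network; only submit() does.
req = build_ingest_request(
    "pubmed",
    {"query": "EGFR lung cancer", "limit": 100, "sync": False},
)
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any data is ingested.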

### Job Status
`GET /api/ingest/jobs/{job_id}`
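
When a request is submitted with `"sync": false`, the job status endpoint can be polled until the job finishes. The sketch below shows one way to do that. The response schema is not documented in this guide, so the `status` field and the terminal values `completed` and `failed` are assumptions; check them against the actual API response. `wait_for_job` and `fetch_status` are hypothetical names, and `fetch_status` is passed in as a callable so that the HTTP client is up to you.

```python
import time


def wait_for_job(fetch_status, job_id, interval=2.0, timeout=600.0,
                 done_states=("completed", "failed")):
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status: callable returning the decoded JSON dict from
                  GET /api/ingest/jobs/{job_id}.
    done_states:  assumed terminal values of the assumed "status" field.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status.get("status") in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```

Injecting `fetch_status` also makes the loop straightforward to exercise with a stub before pointing it at a live backend.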

## 2) Next.js Proxy Routes (Optional)
To call the backend through Next.js instead of directly, use the proxy routes below, which mirror the FastAPI paths:
```
/api/ingest/pubmed
/api/ingest/uniprot
/api/ingest/chembl
/api/ingest/all
/api/ingest/jobs/{job_id}
```

## 3) CLI Ingestion
```bash
python -m bioflow.ingestion.ingest_all --query "EGFR lung cancer" --limit 100
```

## 4) Environment Variables
- `INGEST_BATCH_SIZE`
- `PUBMED_RATE_LIMIT`
- `UNIPROT_RATE_LIMIT`
- `CHEMBL_RATE_LIMIT`
- `NCBI_EMAIL`
- `NCBI_API_KEY`
- `CHEMBL_SEARCH_MODE`
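
As an illustration of how these variables might be consumed, the sketch below reads them into a single settings object. This is not BioFlow's actual loader: the `IngestSettings` class and `load_settings` function are hypothetical, and the fallback values are assumptions copied from the request examples in section 1, not documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IngestSettings:
    batch_size: int
    pubmed_rate_limit: float
    uniprot_rate_limit: float
    chembl_rate_limit: float
    ncbi_email: Optional[str]
    ncbi_api_key: Optional[str]
    chembl_search_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    """Read ingestion settings from environment variables.

    Fallbacks mirror the example request bodies (assumed, not documented).
    """
    return IngestSettings(
        batch_size=int(env.get("INGEST_BATCH_SIZE", "50")),
        pubmed_rate_limit=float(env.get("PUBMED_RATE_LIMIT", "0.4")),
        uniprot_rate_limit=float(env.get("UNIPROT_RATE_LIMIT", "0.2")),
        chembl_rate_limit=float(env.get("CHEMBL_RATE_LIMIT", "0.3")),
        ncbi_email=env.get("NCBI_EMAIL"),
        ncbi_api_key=env.get("NCBI_API_KEY"),
        chembl_search_mode=env.get("CHEMBL_SEARCH_MODE", "target"),
    )
```

Passing a plain dictionary instead of `os.environ` lets you check a configuration without changing the process environment.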

## 5) Recommended Minimums
- PubMed: 100 records
- UniProt: 50 records
- ChEMBL: 30 records
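
The minimums above can be checked before submitting a job. The following sketch compares requested limits against them; `RECOMMENDED_MINIMUMS` and `below_minimum` are hypothetical names introduced for this example, and the guide does not say whether the backend enforces these values.

```python
# Values taken from the list above.
RECOMMENDED_MINIMUMS = {"pubmed": 100, "uniprot": 50, "chembl": 30}


def below_minimum(limits):
    """Return {source: recommended_minimum} for each source whose requested
    limit is missing or lower than the recommended minimum."""
    return {
        source: minimum
        for source, minimum in RECOMMENDED_MINIMUMS.items()
        if limits.get(source, 0) < minimum
    }


shortfalls = below_minimum({"pubmed": 100, "uniprot": 20, "chembl": 30})
print(shortfalls)
```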