# Data Setup Guide
The API requires Church Fathers commentary embeddings to be placed in the `data/` directory.
## Option 1: Use Existing Embeddings (if available)
If you have access to pre-generated embeddings from the church-fathers repository:
```bash
python prepare_data.py --source /path/to/church-fathers/commentary_embeddings
```
## Option 2: Generate Embeddings from Database
If you have the SQLite database from the [Historical Christian Faith Commentaries Database](https://github.com/HistoricalChristianFaith/Commentaries-Database):
1. Clone the Commentaries-Database repository:
   ```bash
   git clone https://github.com/HistoricalChristianFaith/Commentaries-Database.git
   ```
2. Generate embeddings with the `util/commentary.py` script from the church-fathers repository:
   ```bash
   # From the church-fathers directory
   python util/commentary.py \
     -db /path/to/Commentaries-Database/data.sqlite \
     -m "BAAI/bge-large-en-v1.5" \
     -o /path/to/biblos-cf-api/data
   ```
   This creates JSON files in the `data/` directory, organized into one subdirectory per book.
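The conversion itself happens inside `util/commentary.py`, whose internals are not documented here. As a rough illustration only, the sketch below shows how a single commentary entry could be embedded and written to the layout described under Expected Data Structure. It assumes the `sentence-transformers` package for loading `BAAI/bge-large-en-v1.5` and assumes normalized embeddings; the helpers `record_path`, `write_record`, and `bge_embed`, and the numeric `entry_id` in the file name, are hypothetical.

```python
from __future__ import annotations

import json
from functools import lru_cache
from pathlib import Path
from typing import Callable, Sequence

# Any function that maps a text to a vector of floats.
Embedder = Callable[[str], Sequence[float]]


@lru_cache(maxsize=1)
def _load_model():
    # Imported lazily and cached so the model is loaded at most once,
    # and only when the default embedder is actually used.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("BAAI/bge-large-en-v1.5")


def bge_embed(text: str) -> list[float]:
    """Embed one passage with BAAI/bge-large-en-v1.5 (assumes normalization)."""
    return _load_model().encode(text, normalize_embeddings=True).tolist()


def record_path(data_dir: Path, book: str, father_name: str, entry_id: int) -> Path:
    """Hypothetical naming scheme: data/<book>/<book>_<Father_Name>_<id>.json."""
    father_slug = father_name.replace(" ", "_")
    return data_dir / book / f"{book}_{father_slug}_{entry_id}.json"


def write_record(
    data_dir: Path,
    entry_id: int,
    content: str,
    metadata: dict,
    embed: Embedder = bge_embed,
) -> Path:
    """Embed `content` and write one record in the expected JSON shape."""
    path = record_path(data_dir, metadata["book"], metadata["father_name"], entry_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "content": content,
        "metadata": metadata,
        # Cast to plain floats so numpy scalars serialize cleanly.
        "embedding": [float(x) for x in embed(content)],
    }
    path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    return path
```

Passing a different `embed` callable (for example a stub in tests) avoids downloading the model.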
## Option 3: Use prepare_data.py with Database
Alternatively, run this repository's `prepare_data.py` script in generate mode against the database:
```bash
python prepare_data.py \
  --generate \
  --db /path/to/data.sqlite \
  --model "BAAI/bge-large-en-v1.5"
```
## Expected Data Structure
After preparation, the `data/` directory should look like this:
```text
data/
├── matthew/
│   ├── matthew_Augustine_of_Hippo_123.json
│   ├── matthew_Origen_of_Alexandria_456.json
│   └── ...
├── john/
│   ├── john_Augustine_of_Hippo_789.json
│   └── ...
└── ...
```
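To sanity-check the layout before starting the API, a small script like the following (a hypothetical helper, not part of this repository) counts the JSON files in each book directory and flags any whose name lacks the `<book>_` prefix seen in the tree above.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def summarize_data_dir(data_dir: Path) -> tuple[Counter, list[Path]]:
    """Count JSON files per book directory and collect misplaced files.

    A file counts as misplaced if its name does not start with "<book>_",
    which assumes the naming convention shown in the directory tree.
    """
    counts: Counter = Counter()
    misplaced: list[Path] = []
    # Each immediate subdirectory of data/ is treated as one book.
    for book_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        for json_file in sorted(book_dir.glob("*.json")):
            if json_file.name.startswith(f"{book_dir.name}_"):
                counts[book_dir.name] += 1
            else:
                misplaced.append(json_file)
    return counts, misplaced
```

From the project root, `summarize_data_dir(Path("data"))` returns the per-book counts and any misplaced files.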
Each JSON file should have this structure:
```json
{
  "content": "Commentary text...",
  "metadata": {
    "father_name": "Augustine of Hippo",
    "book": "matthew",
    "source_title": "Tractates on the Gospel of Matthew",
    "location_start": "Mt 5:1",
    "location_end": "Mt 5:12"
  },
  "embedding": [0.123, -0.456, ...]
}
```
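Because a malformed file may only surface when the API loads it, it can help to validate records up front. The checks below are a sketch that treats every field in the example above as required; that is an assumption drawn from the example, and the actual loader may be more lenient. `validate_record` and `validate_file` are hypothetical helpers. `validate_file` defaults to 1024 dimensions, the output size of `BAAI/bge-large-en-v1.5`; pass a different `expected_dim` (or `None`) for another model.

```python
from __future__ import annotations

import json
import math
from pathlib import Path

# Assumed required metadata keys, taken from the example record.
REQUIRED_METADATA = ("father_name", "book", "source_title", "location_start", "location_end")


def validate_record(record: dict, expected_dim: int | None = None) -> list[str]:
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems: list[str] = []
    content = record.get("content")
    if not isinstance(content, str) or not content.strip():
        problems.append("content must be a non-empty string")
    metadata = record.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata must be an object")
    else:
        for key in REQUIRED_METADATA:
            if not isinstance(metadata.get(key), str):
                problems.append(f"metadata.{key} must be a string")
    embedding = record.get("embedding")
    if not isinstance(embedding, list) or not embedding:
        problems.append("embedding must be a non-empty list")
    else:
        # bool is a subclass of int, so exclude it explicitly.
        if not all(
            isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)
            for x in embedding
        ):
            problems.append("embedding must contain only finite numbers")
        if expected_dim is not None and len(embedding) != expected_dim:
            problems.append(f"embedding has {len(embedding)} dims, expected {expected_dim}")
    return problems


def validate_file(path: Path, expected_dim: int | None = 1024) -> list[str]:
    """Load one JSON file and validate it."""
    with path.open(encoding="utf-8") as f:
        return validate_record(json.load(f), expected_dim)
```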
## Testing Locally
Once the data is prepared, start the development server:
```bash
uvicorn app:app --reload
```
Visit http://localhost:8000/docs to test the API.
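If the app is built with FastAPI, which the `/docs` page suggests but this guide does not state, its OpenAPI schema is also served at `/openapi.json` by default. The sketch below uses only the standard library to fetch that schema and list the available routes; `fetch_openapi` and `list_routes` are hypothetical helpers, and the base URL assumes uvicorn's default port 8000.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# Keys in an OpenAPI path item that denote operations (others, like
# "parameters" or "summary", are skipped).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_routes(openapi: dict) -> list[str]:
    """Flatten an OpenAPI document into sorted "METHOD /path" strings."""
    routes = []
    for path, operations in openapi.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema from a running server (assumes FastAPI's default URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `print("\n".join(list_routes(fetch_openapi())))` prints one `METHOD /path` line per endpoint.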