Data Setup Guide

The API requires Church Fathers commentary embeddings to be placed in the data/ directory.

Option 1: Use Existing Embeddings (if available)

If you have access to pre-generated embeddings from the church-fathers repository:

python prepare_data.py --source /path/to/church-fathers/commentary_embeddings

Option 2: Generate Embeddings from Database

To generate embeddings from the SQLite database of the Historical Christian Faith Commentaries Database:

  1. Clone the Commentaries-Database repository:
git clone https://github.com/HistoricalChristianFaith/Commentaries-Database.git
  2. Generate embeddings using the utility script from church-fathers:
# From the church-fathers directory
python util/commentary.py \
  -db /path/to/Commentaries-Database/data.sqlite \
  -m "BAAI/bge-large-en-v1.5" \
  -o /path/to/biblos-cf-api/data

This will create JSON files organized by book in the data/ directory.

Option 3: Use prepare_data.py with Database

Alternatively, use the prepare_data.py script directly:

python prepare_data.py \
  --generate \
  --db /path/to/data.sqlite \
  --model "BAAI/bge-large-en-v1.5"

Expected Data Structure

After preparation, your data/ directory should look like:

data/
├── matthew/
│   ├── matthew_Augustine_of_Hippo_123.json
│   ├── matthew_Origen_of_Alexandria_456.json
│   └── ...
├── john/
│   ├── john_Augustine_of_Hippo_789.json
│   └── ...
└── ...
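
A quick way to confirm that the layout above was produced is to count the JSON files in each book directory. The following is a minimal sketch, not part of the repository; the function name `count_files_per_book` is hypothetical, and it assumes only the `data/<book>/<file>.json` layout shown above.

```python
from collections import Counter
from pathlib import Path


def count_files_per_book(data_dir):
    """Return {book_directory_name: number_of_json_files} for data_dir.

    Assumes the layout data/<book>/<file>.json described in this guide.
    """
    counts = Counter()
    for path in Path(data_dir).glob("*/*.json"):
        # The parent directory name is the book, e.g. data/matthew/ -> "matthew".
        counts[path.parent.name] += 1
    return dict(counts)
```

Calling `count_files_per_book("data")` from the project root should return one entry per book; an empty result suggests the preparation step wrote its output somewhere else.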

Each JSON file should have this structure:

{
  "content": "Commentary text...",
  "metadata": {
    "father_name": "Augustine of Hippo",
    "book": "matthew",
    "source_title": "Tractates on the Gospel of Matthew",
    "location_start": "Mt 5:1",
    "location_end": "Mt 5:12"
  },
  "embedding": [0.123, -0.456, ...]
}
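
The structure above can be checked before starting the API. The sketch below is illustrative only: `validate_record` and `validate_file` are hypothetical helpers, and treating all five metadata keys as required is an assumption based on the example, not something the API is known to enforce. The optional `expected_dim` argument checks the embedding length; `BAAI/bge-large-en-v1.5` produces 1024-dimensional vectors, so `expected_dim=1024` fits data generated with that model.

```python
import json

# Assumption: every metadata key shown in the example record is required.
REQUIRED_METADATA_KEYS = {
    "father_name",
    "book",
    "source_title",
    "location_start",
    "location_end",
}


def validate_record(record, expected_dim=None):
    """Return a list of problems found in one parsed commentary record."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    problems = []
    if not isinstance(record.get("content"), str):
        problems.append("'content' must be a string")
    metadata = record.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("'metadata' must be an object")
    else:
        missing = sorted(REQUIRED_METADATA_KEYS - metadata.keys())
        if missing:
            problems.append("metadata is missing: " + ", ".join(missing))
    embedding = record.get("embedding")
    is_numeric_list = (
        isinstance(embedding, list)
        and len(embedding) > 0
        and all(isinstance(x, (int, float)) for x in embedding)
    )
    if not is_numeric_list:
        problems.append("'embedding' must be a non-empty list of numbers")
    elif expected_dim is not None and len(embedding) != expected_dim:
        problems.append(
            f"embedding has {len(embedding)} values, expected {expected_dim}"
        )
    return problems


def validate_file(path, expected_dim=None):
    """Load one JSON file and validate its contents."""
    with open(path, encoding="utf-8") as f:
        return validate_record(json.load(f), expected_dim)
```

An empty list means the file matches the expected shape; any other result names the fields that need attention.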

Testing Locally

Once data is prepared:

uvicorn app:app --reload

Visit http://localhost:8000/docs to test the API.
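
For a scripted check instead of the browser, the sketch below lists the routes the running server exposes. It assumes the app is built with FastAPI (suggested by the `/docs` page), which serves its schema at `/openapi.json` by default; the helper names `fetch_openapi` and `list_routes` are hypothetical and not part of this repository.

```python
import json
from urllib.request import urlopen

# HTTP methods that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_routes(openapi_schema):
    """Return sorted (METHOD, path) pairs from a parsed OpenAPI schema."""
    routes = []
    for path, operations in openapi_schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_openapi(base_url="http://localhost:8000"):
    """Download the schema; assumes FastAPI's default /openapi.json path."""
    with urlopen(base_url + "/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the server running, `list_routes(fetch_openapi())` returns the available endpoints; a connection error means uvicorn is not listening on port 8000.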