# Data Setup Guide

The API requires Church Fathers commentary embeddings to be placed in the `data/` directory.

## Option 1: Use Existing Embeddings (if available)

If you have access to pre-generated embeddings from the church-fathers repository:

```bash
python prepare_data.py --source /path/to/church-fathers/commentary_embeddings
```

## Option 2: Generate Embeddings from Database

If you have the SQLite database from the [Historical Christian Faith Commentaries Database](https://github.com/HistoricalChristianFaith/Commentaries-Database):

1. Clone the Commentaries-Database repository:

   ```bash
   git clone https://github.com/HistoricalChristianFaith/Commentaries-Database.git
   ```

2. Generate embeddings using the utility script from church-fathers:

   ```bash
   # From the church-fathers directory
   python util/commentary.py \
     -db /path/to/Commentaries-Database/data.sqlite \
     -m "BAAI/bge-large-en-v1.5" \
     -o /path/to/biblos-cf-api/data
   ```

This will create JSON files organized by book in the `data/` directory.

## Option 3: Use prepare_data.py with Database

Alternatively, use the `prepare_data.py` script directly:

```bash
python prepare_data.py \
  --generate \
  --db /path/to/data.sqlite \
  --model "BAAI/bge-large-en-v1.5"
```

## Expected Data Structure

After preparation, your `data/` directory should look like:

```
data/
├── matthew/
│   ├── matthew_Augustine_of_Hippo_123.json
│   ├── matthew_Origen_of_Alexandria_456.json
│   └── ...
├── john/
│   ├── john_Augustine_of_Hippo_789.json
│   └── ...
└── ...
```

Each JSON file should have this structure:

```json
{
  "content": "Commentary text...",
  "metadata": {
    "father_name": "Augustine of Hippo",
    "book": "matthew",
    "source_title": "Tractates on the Gospel of Matthew",
    "location_start": "Mt 5:1",
    "location_end": "Mt 5:12"
  },
  "embedding": [0.123, -0.456, ...]
}
```

## Testing Locally

Once data is prepared:

```bash
uvicorn app:app --reload
```

Visit http://localhost:8000/docs to test the API.
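
Before starting the server, it can help to sanity-check that the files match the layout described under Expected Data Structure. The following is a minimal sketch, not part of the repository: the helper name `validate_data_dir` is hypothetical, and it assumes every metadata field shown in the example JSON is required and that each book directory sits directly under `data/`.

```python
import json
from pathlib import Path

# Field names copied from the example JSON in this guide.
# Treating all of them as mandatory is an assumption of this sketch.
REQUIRED_METADATA = {
    "father_name",
    "book",
    "source_title",
    "location_start",
    "location_end",
}


def validate_data_dir(data_dir):
    """Return a list of problems found in <data_dir>/<book>/*.json (hypothetical helper)."""
    problems = []
    for path in sorted(Path(data_dir).glob("*/*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            problems.append(f"{path}: unreadable or invalid JSON ({exc})")
            continue

        if not isinstance(record, dict):
            problems.append(f"{path}: top-level value is not a JSON object")
            continue

        if not isinstance(record.get("content"), str):
            problems.append(f"{path}: 'content' missing or not a string")

        metadata = record.get("metadata")
        if not isinstance(metadata, dict):
            problems.append(f"{path}: 'metadata' missing or not an object")
        else:
            missing = REQUIRED_METADATA - metadata.keys()
            if missing:
                problems.append(f"{path}: metadata missing {sorted(missing)}")

        embedding = record.get("embedding")
        is_vector = (
            isinstance(embedding, list)
            and len(embedding) > 0
            and all(isinstance(x, (int, float)) for x in embedding)
        )
        if not is_vector:
            problems.append(f"{path}: 'embedding' missing, empty, or non-numeric")

    return problems
```

Calling `validate_data_dir("data")` from the project root returns an empty list when every file passes these checks. The sketch does not verify the embedding dimension, since the expected vector length depends on the model used to generate the data.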