---
license: apache-2.0
base_model: google/gemma-4-e4b
tags:
- bible
- theology
- gemma
- gguf
- ollama
- cpt
- sft
- dpo
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# BibleAI
BibleAI is a Gemma 4 E4B model fine-tuned for Bible, theology, church history, and faith Q&A through a three-stage pipeline: continued pretraining (CPT), supervised fine-tuning (SFT), and direct preference optimization (DPO).
## Identity
- Hugging Face repo: `rhemabible/BibleAI`
- Model name: `BibleAI`
- Ollama model names:
  - `bibleaiq8`
  - `bibleaibf16`
## Training Summary
### Stage 1: CPT Foundation
- Base architecture: `Gemma4ForConditionalGeneration`
- Model type: `gemma4`
- Verified CPT merged weight size: `15,992,595,884` bytes
- CPT merged SHA256 (recorded in training logs):
`419aab18717ea792b128e2ea10bd9e313232d627e3bc3c4f9c0d19311ef6ed9c`
### Stage 2: SFT (Instruction Tuning)
- Data source: `combined_train.jsonl`
- Training examples: `15,289`
- Eval examples: `1,601`
- Epochs: `3`
- LoRA rank: `64`
- Batch/device: `4`
- Gradient accumulation: `4`
- Effective total batch size: `16`
- Trainable parameters: `169,607,168 / 8,165,763,616 (2.08%)`
- Final eval loss: `0.4368`
- Final train loss: `0.1852`
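The derived figures in the SFT list follow directly from the raw settings. Below is a minimal sketch of that arithmetic. It assumes a single training device: the README does not state the device count, but 4 × 4 = 16 only works out with one. The helper names are illustrative.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    # Samples that contribute to one optimizer update.
    return per_device * grad_accum * num_devices

def trainable_percent(trainable: int, total: int) -> float:
    # Share of parameters updated by the LoRA adapter.
    return 100.0 * trainable / total

def approx_optimizer_steps(num_examples: int, batch: int, epochs: int) -> int:
    # Approximate only: trainers differ in how they count a final partial batch.
    return math.ceil(num_examples / batch) * epochs

sft_batch = effective_batch_size(per_device=4, grad_accum=4)  # assumes 1 device
print(sft_batch)                                                # 16
print(round(trainable_percent(169_607_168, 8_165_763_616), 2))  # 2.08
print(approx_optimizer_steps(15_289, sft_batch, epochs=3))      # 2868
```

The DPO settings (2 per device, 4 accumulation steps, 84,803,584 / 8,080,960,032) reproduce that stage's figures of 8 and 1.05% in the same way.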
### Stage 3: DPO (Preference Optimization)
- Data source: `dpo_pairs.jsonl`
- Preference pairs: `967`
- Epochs: `2`
- DPO beta: `0.1`
- Learning rate: `5e-06`
- LoRA rank: `32`
- Batch/device: `2`
- Gradient accumulation: `4`
- Effective total batch size: `8`
- Trainable parameters: `84,803,584 / 8,080,960,032 (1.05%)`
- Final train loss: `0.06077`
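For reference, here is a per-pair sketch of the standard sigmoid DPO objective with β = 0.1. The README does not say which trainer or loss variant produced the logged loss. Treat this as an illustration of the objective, not the project's training code.

```python
import math

def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen - ref_chosen        # beta * this = implicit reward
    rejected_ratio = policy_rejected - ref_rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) == softplus(-z)
    return _softplus(-logits)

print(dpo_loss(-40.0, -40.0, -40.0, -40.0))  # policy == reference: ln 2
print(dpo_loss(-30.0, -60.0, -40.0, -40.0))  # chosen up, rejected down: ~0.0486
```

When the policy still matches the reference, every pair scores ln 2 ≈ 0.693. The reported final train loss of 0.06077 is far below that, which indicates that on the training pairs the policy's log-ratio margins strongly favor the chosen responses.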
## System Prompt
```text
You are BibleAI.
Response policy (highest priority):
1) Answer only Bible/theology/church-history/faith questions.
2) Be concise by default.
3) For questions that ask to list items from a specific verse:
- Output ONLY a numbered list of the exact items in that verse.
- Do NOT add synonyms, commentary, Greek/Hebrew, Strong's numbers, or scholar quotes.
- Add one final line with the verse reference.
4) Do not fabricate verses, facts, or language details. If uncertain, say so.
5) If the user asks for deeper analysis, then provide it.
```
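Rule 3 is strict enough to check mechanically. The sketch below is a hypothetical helper, not part of this release. It tests only the shape of a reply: a numbered list running 1..n, followed by one reference line. It cannot confirm that the listed items are the exact items of the cited verse.

```python
import re

# Hypothetical shape checker for rule 3 of the system prompt.
_ITEM = re.compile(r"^(\d+)[.)]\s+\S")  # "1. love" or "1) love"
_CH_VERSE = re.compile(r"\b\d+:\d+")    # chapter:verse, e.g. "5:22"

def follows_verse_list_format(response: str) -> bool:
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    *items, reference = lines
    numbers = []
    for line in items:
        m = _ITEM.match(line)
        if m is None:  # preamble or commentary line
            return False
        numbers.append(int(m.group(1)))
    if numbers != list(range(1, len(items) + 1)):
        return False
    # Final line: a reference such as "Galatians 5:22-23", not another item.
    return _ITEM.match(reference) is None and _CH_VERSE.search(reference) is not None
```

For example, `"1. love\n2. joy\n3. peace\nGalatians 5:22"` passes, while a reply that opens with "Here are the items:" fails.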
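## Chat Template
Ollama form of the template (Go template syntax, using Gemma's `<start_of_turn>` / `<end_of_turn>` turn markers):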
```text
{{- if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{- end }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
```
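To see what the model actually receives, the single-turn template above can be mirrored in Python. This is an illustrative re-implementation of the template's output, not Ollama's renderer. One detail is worth noticing: `{{- end }}` trims the newline after the system turn's `<end_of_turn>`, so `<start_of_turn>user` follows it directly.

```python
from typing import Optional

def render_prompt(prompt: str, system: Optional[str] = None) -> str:
    """Mirror the single-turn Go template shown above (illustrative only)."""
    out = ""
    if system:  # Go templates treat an empty string as false
        # `{{- end }}` strips the newline after this <end_of_turn>.
        out += f"<start_of_turn>system\n{system}<end_of_turn>"
    out += f"<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model"
    # Whether a newline follows "<start_of_turn>model" depends on how the
    # Modelfile's TEMPLATE string is closed, which is not shown here; omitted.
    return out

print(render_prompt("Who wrote Romans?", "You are BibleAI."))
```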
Template files in this release:
- `ollama/Modelfile.q8`
- `ollama/Modelfile.bf16`
- `adapters/sft_final/chat_template.jinja`
- `adapters/dpo_final/chat_template.jinja`
- `ollama/Modelfile.canonical_project_reference`
## Model Variants
- `model.safetensors` (merged HF weights)
- `gguf/final_merged.Q8_0.gguf`
- `gguf/final_merged.BF16.gguf`
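The two GGUF variants differ mainly in bytes per weight. The rough sketch below makes two assumptions. First, the merged model keeps the pre-LoRA parameter count, since merging LoRA deltas adds no parameters. Second, it uses llama.cpp's Q8_0 layout of 32 int8 weights plus one fp16 scale per block. Under these assumptions, expected sizes can be estimated from the training logs.

The BF16 estimate lands within about 0.3 MB of the verified CPT merged size of 15,992,595,884 bytes, which is consistent with 2 bytes per parameter. Real GGUF files also carry metadata and may keep some tensors at higher precision, so treat the Q8_0 figure as a ballpark.

```python
# Rough, hypothetical size estimate; not measured file sizes.
BF16_BYTES_PER_WEIGHT = 2  # 16-bit brain float
Q8_0_BLOCK_WEIGHTS = 32    # llama.cpp Q8_0: 32 int8 weights per block...
Q8_0_BLOCK_BYTES = 34      # ...plus one fp16 scale = 34 bytes per block

def estimate_bytes(n_params: int) -> dict:
    return {
        "BF16": n_params * BF16_BYTES_PER_WEIGHT,
        "Q8_0": n_params * Q8_0_BLOCK_BYTES // Q8_0_BLOCK_WEIGHTS,
    }

# Frozen (non-LoRA) parameter count implied by the SFT log
# (8,165,763,616 total - 169,607,168 trainable); the DPO log gives the same value.
n_params = 8_165_763_616 - 169_607_168
sizes = estimate_bytes(n_params)
for name, size in sizes.items():
    print(f"{name}: ~{size / 1e9:.2f} GB")
```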
## Checksums
- `model.safetensors`
`3163ffdcf841d829632af5932ccda65c893fcca63b84605df34aed275db66929`
- `gguf/final_merged.Q8_0.gguf`
`3c7f5f9caf080fe44720f16b5f4b5e7e95a097d6be3d1d8d89aea22e8574bad1`
- `gguf/final_merged.BF16.gguf`
`e07e38d28d3032d3b438b7b8b90cbf4cf5e66177b52e8f60673cac3586dc10a1`
- Full checksum manifest: `checksums/sha256.txt`
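Downloads can be checked against the manifest with a small script. This sketch assumes that `checksums/sha256.txt` uses the coreutils `sha256sum` line format (`<hash>  <path>`) with paths relative to the repository root. The README does not show the file's contents, so adjust the parsing if it differs.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB weights never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest, root=".") -> dict:
    """Check every '<sha256>  <relative path>' line; returns {path: matched}."""
    root = Path(root)
    results = {}
    for line in Path(manifest).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expected, rel = line.split(maxsplit=1)
        rel = rel.lstrip("*")  # coreutils marks binary-mode entries with '*'
        path = root / rel
        results[rel] = path.is_file() and sha256_of(path) == expected.lower()
    return results

# Hypothetical usage from the repository root:
# for name, ok in verify_manifest("checksums/sha256.txt").items():
#     print("OK  " if ok else "FAIL", name)
```

If the manifest does use that format, running `sha256sum -c checksums/sha256.txt` from the repository root performs the same check.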
## Quickstart
### Ollama
```bash
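# Run from the repository root (the -f paths are relative to it)
ollama create bibleaiq8 -f ollama/Modelfile.q8
ollama create bibleaibf16 -f ollama/Modelfile.bf16

# Start an interactive chat with the 8-bit build
ollama run bibleaiq8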
```
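Once created, the models can also be queried programmatically through Ollama's local REST endpoint (`/api/chat`, default port 11434). This is a minimal standard-library sketch; the function names are illustrative. The Modelfiles may already embed the system prompt via a `SYSTEM` line, which this README does not show, so passing `system` here is optional.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_chat_payload(model: str, question: str, system: Optional[str] = None) -> dict:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": question})
    # stream=False asks Ollama for a single JSON response instead of chunks.
    return {"model": model, "messages": messages, "stream": False}

def ask(model: str, question: str, system: Optional[str] = None,
        url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    data = json.dumps(build_chat_payload(model, question, system)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the Ollama server running (`ollama serve` or the desktop app), `ask("bibleaiq8", "List the fruit of the Spirit in Galatians 5:22-23.")` returns the reply text.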
### Transformers
```python
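from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "rhemabible/BibleAI"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",  # load in the dtype stored in the checkpoint
    device_map="auto",   # needs `accelerate`; spreads weights over available devices
)

# Format a single user turn with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "List the fruit of the Spirit in Galatians 5:22-23."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))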
```
## Included Release Artifacts
- Root model files: `config.json`, `model.safetensors`, `tokenizer.json`, `tokenizer_config.json`
- GGUF exports: `gguf/`
- Ollama packaging: `ollama/`
- Final adapters: `adapters/sft_final/`, `adapters/dpo_final/`
- Training logs: `logs/`
- Integrity hashes: `checksums/`
- Release docs: `docs/`
## Intended Scope
- Bible study and scripture-centered theological support
- Church history and faith-oriented Q&A
- Citation-oriented responses designed to avoid fabricated references