# AliveAI Medical RAG Chatbot

Fully local medical RAG chatbot built with FastAPI, ChromaDB, SentenceTransformers, and Ollama.
## Stack

- Knowledge-base embeddings: `BAAI/bge-base-en-v1.5`
- NLP routing embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Vector database: ChromaDB with cosine similarity
- Optional vector backend: Pinecone (auto-selected when Pinecone env vars are set)
- Health chat model: Hugging Face (`aaditya/Llama3-OpenBioLLM-8B` by default)
- Dataset: `keivalya/MedQuad-MedicalQnADataset`
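Both vector backends in this stack are configured for cosine similarity. As a reminder of what that metric computes, here is a minimal dependency-free sketch on toy vectors (real `bge-base-en-v1.5` embeddings are 768-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings", not real model output.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 0.0
```

Because embeddings from models like these are compared by direction rather than magnitude, cosine is the natural metric for both ChromaDB and Pinecone here.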
## Project Layout

```
app/
agent/
db/
nlp/
scripts/
data/
chroma_db/
test_rag.py
```
## Setup

```shell
python3.12 -m pip install -r requirements.txt
python3.12 scripts/download_dataset.py
python3.12 scripts/download_models.py  # optional fallback if Hugging Face TLS fails in Python
python3.12 scripts/prepare_dataset.py
python3.12 scripts/ingest.py
python3.12 -c "from test_rag import run_all_tests; run_all_tests()"
python3.12 -m uvicorn app.main:app --reload --port 8000
```
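Once the server is up, the API can be smoke-tested from Python. The request body below (`message` plus `session_id`) is an assumed shape, not confirmed by this README; check the FastAPI models in `app/main.py` for the actual `/chat` schema:

```python
import json
import urllib.request

def build_chat_payload(message: str, session_id: str) -> bytes:
    # NOTE: field names here are an assumption about the /chat request schema.
    return json.dumps({"message": message, "session_id": session_id}).encode("utf-8")

def send_chat(base_url: str, message: str, session_id: str = "demo") -> dict:
    """POST a chat message to the running server and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=build_chat_payload(message, session_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the uvicorn server from the setup steps to be running.
    print(send_chat("http://localhost:8000", "What are symptoms of diabetes?"))
```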
## Pinecone Switch

Set these env vars to use Pinecone instead of local ChromaDB:

```shell
export ALIVEAI_PINECONE_API_KEY=your_key
export ALIVEAI_PINECONE_INDEX_NAME=your_index_name
# optional if your key requires host-level targeting:
export ALIVEAI_PINECONE_INDEX_HOST=your_index_host
# optional to isolate AliveAI data inside a shared index:
export ALIVEAI_PINECONE_NAMESPACE=aliveai
# optional backend selection: auto | pinecone | chroma
export ALIVEAI_VECTOR_BACKEND=auto
# optional explicit embedding dimension validation for Pinecone
export ALIVEAI_EMBEDDING_DIMENSION=768
```
When Pinecone vars are present, the DB adapter uses Pinecone automatically.
When they are not present, it falls back to local ChromaDB.
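The auto-selection rule above can be sketched as a small pure function. This is an illustration of the documented behavior, not the project's actual adapter code:

```python
import os

def select_vector_backend(env=os.environ) -> str:
    """Sketch of the documented rule: an explicit ALIVEAI_VECTOR_BACKEND wins,
    otherwise Pinecone is chosen only when its required env vars are present."""
    explicit = env.get("ALIVEAI_VECTOR_BACKEND", "auto").lower()
    if explicit in ("pinecone", "chroma"):
        return explicit
    has_pinecone = bool(
        env.get("ALIVEAI_PINECONE_API_KEY") and env.get("ALIVEAI_PINECONE_INDEX_NAME")
    )
    return "pinecone" if has_pinecone else "chroma"

print(select_vector_backend({}))  # -> chroma
print(select_vector_backend({
    "ALIVEAI_PINECONE_API_KEY": "k",
    "ALIVEAI_PINECONE_INDEX_NAME": "idx",
}))  # -> pinecone
```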
Pinecone index requirements for this project:

- `metric=cosine`
- `dimension` must match the active embedding model output dimension (`ALIVEAI_EMBEDDING_DIMENSION` can be set explicitly; otherwise it is inferred from the model)
After enabling Pinecone, run:

```shell
python3.12 scripts/ingest.py --force
python3.12 -c "from test_rag import run_all_tests; run_all_tests()"
python3.12 -m uvicorn app.main:app --reload --port 8000
```
## Endpoints

- `POST /chat`
- `GET /health`
- `POST /reset/{session_id}`
- `GET /validate?text1=chest+hurts&text2=chest+pain`
- `GET /ingest/schema`
- `POST /ingest/text`
- `POST /ingest/file` (multipart; supports `.txt`, `.pdf`, `.doc`, `.docx`)
- `GET /ingest/task/{task_id}`

Note: ingestion endpoints store uploaded content into local ChromaDB (`medical_kb` collection).
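The `/validate` endpoint takes its inputs as query parameters, so calling it is just URL construction. A small helper using only the standard library:

```python
from urllib.parse import urlencode

def validate_url(base_url: str, text1: str, text2: str) -> str:
    """Build the GET /validate URL; the text1/text2 query parameters come
    straight from the endpoint list above."""
    return f"{base_url}/validate?" + urlencode({"text1": text1, "text2": text2})

print(validate_url("http://localhost:8000", "chest hurts", "chest pain"))
# -> http://localhost:8000/validate?text1=chest+hurts&text2=chest+pain
```

`urlencode` handles the space-to-`+` escaping shown in the endpoint example, so callers never build query strings by hand.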
## Background Ingestion (Celery)

```shell
export ALIVEAI_CELERY_BROKER_URL=redis://localhost:6379/0
export ALIVEAI_CELERY_RESULT_BACKEND=redis://localhost:6379/0
celery -A app.celery_app.celery_app worker --loglevel=info
```

Use `POST /ingest/text` or `POST /ingest/file` with `async_process=true` (the default) to queue ingestion in the background.
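A queued ingestion returns a task id that can be polled via `GET /ingest/task/{task_id}`. The loop below injects the fetch function so it can be tested without a server; the `status` field name and its terminal values (`success`/`failure`) are assumptions about the task endpoint's response shape:

```python
import time

def wait_for_task(fetch_status, task_id: str, poll_s: float = 1.0, timeout_s: float = 60.0) -> dict:
    """Poll a task-status endpoint until the task reaches a terminal state.

    fetch_status: callable taking a task id and returning the parsed JSON
    status dict (e.g. a thin urllib wrapper around GET /ingest/task/{task_id}).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status.get("status") in ("success", "failure"):
            return status
        time.sleep(poll_s)
    raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")

# Example with a stubbed fetcher standing in for the HTTP call:
responses = iter([{"status": "pending"}, {"status": "success", "result": "ok"}])
print(wait_for_task(lambda tid: next(responses), "abc123", poll_s=0.0))
```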
## RAG Parameters (Env)

```shell
export ALIVEAI_CHUNK_SIZE=700
export ALIVEAI_CHUNK_OVERLAP=150
export ALIVEAI_RAG_TOP_K=5
export ALIVEAI_LLM_TOP_P=0.9
export ALIVEAI_LLM_TOP_K=40
export ALIVEAI_HEALTH_MODEL=aaditya/Llama3-OpenBioLLM-8B
export ALIVEAI_HEALTH_MODEL_PROVIDER=hf
export ALIVEAI_HF_MAX_NEW_TOKENS=220
export ALIVEAI_LLM_TEMPERATURE=0.2
export ALIVEAI_OCR_ENABLED=true
export ALIVEAI_OCR_LANG=en
export ALIVEAI_OCR_MIN_PDF_TEXT=120
```
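To make `ALIVEAI_CHUNK_SIZE` and `ALIVEAI_CHUNK_OVERLAP` concrete, here is a sliding-window character chunker using the documented defaults. This is a sketch of the general technique; the project's actual splitter may differ (for example, it may respect sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 700, overlap: int = 150) -> list[str]:
    """Split text into fixed-size windows where each chunk repeats the last
    `overlap` characters of the previous one, so no boundary context is lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # 550 with the defaults above
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2000)
print(len(chunks), [len(c) for c in chunks])  # 4 [700, 700, 700, 350]
```

Larger overlap improves recall for answers that straddle chunk boundaries at the cost of more vectors to store and search.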
## Notes

- All models are loaded lazily and instantiated as singletons.
- Emergency intent is short-circuited before retrieval.
- ChromaDB persists to the local `chroma_db/` directory.
- If Python cannot download Hugging Face assets in your environment, `scripts/download_models.py` mirrors the two SentenceTransformer model repos into `models/`.
- `HealthAgent` uses Hugging Face generation first (`aaditya/Llama3-OpenBioLLM-8B` by default), then falls back to extractive context answers if generation is unavailable.
- PDF OCR fallback is enabled for scanned PDFs when PaddleOCR + pypdfium2 are installed.
- If your default `python3` is a different interpreter (for example Python 3.13 via conda), use `python3.12` so installed dependencies match this project.
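The "loaded lazily and instantiated as singletons" pattern from the first note can be expressed compactly with `functools.lru_cache`. The names below are illustrative, not the project's actual classes:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_embedder():
    """Construct the (expensive) model on first call only; every later call
    returns the cached instance."""
    # In the real app this would be something like:
    #   SentenceTransformer("BAAI/bge-base-en-v1.5")
    # A stand-in object keeps this sketch dependency-free.
    return object()

assert get_embedder() is get_embedder()  # same instance on every call
```

This keeps server startup fast (no model weights touched until the first request needs them) while guaranteeing at most one copy of each model in memory.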