# AliveAI Medical RAG Chatbot

Fully local medical RAG chatbot built with FastAPI, ChromaDB, SentenceTransformers, and Ollama.
## Stack

- Knowledge-base embeddings: `BAAI/bge-base-en-v1.5`
- NLP routing embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Vector database: ChromaDB with cosine similarity
- Optional vector backend: Pinecone (auto-selected when Pinecone env vars are set)
- Health chat model: Hugging Face (`aaditya/Llama3-OpenBioLLM-8B` by default)
- Dataset: `keivalya/MedQuad-MedicalQnADataset`
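Both vector backends in this stack are configured for cosine similarity. As a reminder of what that metric computes, here is a minimal dependency-free sketch on toy vectors (real `bge-base-en-v1.5` embeddings are 768-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings", not real model output.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 0.0
```

Because embeddings from models like these are compared by direction rather than magnitude, cosine is the natural metric for both ChromaDB and Pinecone here.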
## Project Layout

```
app/
agent/
db/
nlp/
scripts/
data/
chroma_db/
test_rag.py
```
## Setup

```shell
python3.12 -m pip install -r requirements.txt
python3.12 scripts/download_dataset.py
python3.12 scripts/download_models.py  # optional fallback if Hugging Face TLS fails in Python
python3.12 scripts/prepare_dataset.py
python3.12 scripts/ingest.py
python3.12 -c "from test_rag import run_all_tests; run_all_tests()"
python3.12 -m uvicorn app.main:app --reload --port 8000
```
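Once the server is up, the API can be smoke-tested from Python. The request body below (`message` plus `session_id`) is an assumed shape, not confirmed by this README; check the FastAPI models in `app/main.py` for the actual `/chat` schema:

```python
import json
import urllib.request

def build_chat_payload(message: str, session_id: str) -> bytes:
    # NOTE: field names here are an assumption about the /chat request schema.
    return json.dumps({"message": message, "session_id": session_id}).encode("utf-8")

def send_chat(base_url: str, message: str, session_id: str = "demo") -> dict:
    """POST a chat message to the running server and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=build_chat_payload(message, session_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the uvicorn server from the setup steps to be running.
    print(send_chat("http://localhost:8000", "What are symptoms of diabetes?"))
```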
## Pinecone Switch

Set these env vars to use Pinecone instead of local ChromaDB:

```shell
export ALIVEAI_PINECONE_API_KEY=your_key
export ALIVEAI_PINECONE_INDEX_NAME=your_index_name
# optional if your key requires host-level targeting:
export ALIVEAI_PINECONE_INDEX_HOST=your_index_host
# optional to isolate AliveAI data inside a shared index:
export ALIVEAI_PINECONE_NAMESPACE=aliveai
# optional backend selection: auto | pinecone | chroma
export ALIVEAI_VECTOR_BACKEND=auto
# optional explicit embedding dimension validation for Pinecone
export ALIVEAI_EMBEDDING_DIMENSION=768
```
When Pinecone vars are present, the DB adapter uses Pinecone automatically.
When they are not present, it falls back to local ChromaDB.
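The auto-selection rule above can be sketched as a small pure function. This is an illustration of the documented behavior, not the project's actual adapter code:

```python
import os

def select_vector_backend(env=os.environ) -> str:
    """Sketch of the documented rule: an explicit ALIVEAI_VECTOR_BACKEND wins,
    otherwise Pinecone is chosen only when its required env vars are present."""
    explicit = env.get("ALIVEAI_VECTOR_BACKEND", "auto").lower()
    if explicit in ("pinecone", "chroma"):
        return explicit
    has_pinecone = bool(
        env.get("ALIVEAI_PINECONE_API_KEY") and env.get("ALIVEAI_PINECONE_INDEX_NAME")
    )
    return "pinecone" if has_pinecone else "chroma"

print(select_vector_backend({}))  # -> chroma
print(select_vector_backend({
    "ALIVEAI_PINECONE_API_KEY": "k",
    "ALIVEAI_PINECONE_INDEX_NAME": "idx",
}))  # -> pinecone
```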
Pinecone index requirements for this project:

- `metric=cosine`
- `dimension` must match the active embedding model output dimension (`ALIVEAI_EMBEDDING_DIMENSION` can be set explicitly; otherwise it is inferred from the model)
After enabling Pinecone, run:

```shell
python3.12 scripts/ingest.py --force
python3.12 -c "from test_rag import run_all_tests; run_all_tests()"
python3.12 -m uvicorn app.main:app --reload --port 8000
```
## Endpoints

- `POST /chat`
- `GET /health`
- `POST /reset/{session_id}`
- `GET /validate?text1=chest+hurts&text2=chest+pain`
- `GET /ingest/schema`
- `POST /ingest/text`
- `POST /ingest/file` (multipart; supports `.txt`, `.pdf`, `.doc`, `.docx`)
- `GET /ingest/task/{task_id}`

Note: ingestion endpoints store uploaded content into local ChromaDB (`medical_kb` collection).
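The `/validate` endpoint takes its inputs as query parameters, so calling it is just URL construction. A small helper using only the standard library:

```python
from urllib.parse import urlencode

def validate_url(base_url: str, text1: str, text2: str) -> str:
    """Build the GET /validate URL; the text1/text2 query parameters come
    straight from the endpoint list above."""
    return f"{base_url}/validate?" + urlencode({"text1": text1, "text2": text2})

print(validate_url("http://localhost:8000", "chest hurts", "chest pain"))
# -> http://localhost:8000/validate?text1=chest+hurts&text2=chest+pain
```

`urlencode` handles the space-to-`+` escaping shown in the endpoint example, so callers never build query strings by hand.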
## Background Ingestion (Celery)

```shell
export ALIVEAI_CELERY_BROKER_URL=redis://localhost:6379/0
export ALIVEAI_CELERY_RESULT_BACKEND=redis://localhost:6379/0
celery -A app.celery_app.celery_app worker --loglevel=info
```

Use `POST /ingest/text` or `POST /ingest/file` with `async_process=true` (the default) to queue ingestion in the background.
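A queued ingestion returns a task id that can be polled via `GET /ingest/task/{task_id}`. The loop below injects the fetch function so it can be tested without a server; the `status` field name and its terminal values (`success`/`failure`) are assumptions about the task endpoint's response shape:

```python
import time

def wait_for_task(fetch_status, task_id: str, poll_s: float = 1.0, timeout_s: float = 60.0) -> dict:
    """Poll a task-status endpoint until the task reaches a terminal state.

    fetch_status: callable taking a task id and returning the parsed JSON
    status dict (e.g. a thin urllib wrapper around GET /ingest/task/{task_id}).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status.get("status") in ("success", "failure"):
            return status
        time.sleep(poll_s)
    raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")

# Example with a stubbed fetcher standing in for the HTTP call:
responses = iter([{"status": "pending"}, {"status": "success", "result": "ok"}])
print(wait_for_task(lambda tid: next(responses), "abc123", poll_s=0.0))
```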
## RAG Parameters (Env)

```shell
export ALIVEAI_CHUNK_SIZE=700
export ALIVEAI_CHUNK_OVERLAP=150
export ALIVEAI_RAG_TOP_K=5
export ALIVEAI_LLM_TOP_P=0.9
export ALIVEAI_LLM_TOP_K=40
export ALIVEAI_HEALTH_MODEL=aaditya/Llama3-OpenBioLLM-8B
export ALIVEAI_HEALTH_MODEL_PROVIDER=hf
export ALIVEAI_HF_MAX_NEW_TOKENS=220
export ALIVEAI_LLM_TEMPERATURE=0.2
export ALIVEAI_OCR_ENABLED=true
export ALIVEAI_OCR_LANG=en
export ALIVEAI_OCR_MIN_PDF_TEXT=120
```
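To make `ALIVEAI_CHUNK_SIZE` and `ALIVEAI_CHUNK_OVERLAP` concrete, here is a sliding-window character chunker using the documented defaults. This is a sketch of the general technique; the project's actual splitter may differ (for example, it may respect sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 700, overlap: int = 150) -> list[str]:
    """Split text into fixed-size windows where each chunk repeats the last
    `overlap` characters of the previous one, so no boundary context is lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # 550 with the defaults above
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2000)
print(len(chunks), [len(c) for c in chunks])  # 4 [700, 700, 700, 350]
```

Larger overlap improves recall for answers that straddle chunk boundaries at the cost of more vectors to store and search.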
## Notes

- All models are loaded lazily and instantiated as singletons.
- Emergency intent is short-circuited before retrieval.
- ChromaDB persists to the local `chroma_db/` directory.
- If Python cannot download Hugging Face assets in your environment, `scripts/download_models.py` mirrors the two SentenceTransformer model repos into `models/`.
- `HealthAgent` uses Hugging Face generation first (`aaditya/Llama3-OpenBioLLM-8B` by default), then falls back to extractive context answers if generation is unavailable.
- PDF OCR fallback is enabled for scanned PDFs when PaddleOCR + pypdfium2 are installed.
- If your default `python3` is a different interpreter (for example Python 3.13 via conda), use `python3.12` so installed dependencies match this project.
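The "loaded lazily and instantiated as singletons" pattern from the first note can be expressed compactly with `functools.lru_cache`. The names below are illustrative, not the project's actual classes:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_embedder():
    """Construct the (expensive) model on first call only; every later call
    returns the cached instance."""
    # In the real app this would be something like:
    #   SentenceTransformer("BAAI/bge-base-en-v1.5")
    # A stand-in object keeps this sketch dependency-free.
    return object()

assert get_embedder() is get_embedder()  # same instance on every call
```

This keeps server startup fast (no model weights touched until the first request needs them) while guaranteeing at most one copy of each model in memory.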