Build & Rebuild the RAG Dataset (Deterministic)
Quickstart
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
bash scripts/rebuild_all.sh
streamlit run app.py
Inputs
- Books: data/raw_pdfs/ and sources.json
- Articles: sources_articles.json
- MCP docs (optional): mcp/
Outputs (default: data/normalized/)
- chunks_books.jsonl, manifest_books.json
- chunks_articles.jsonl, manifest_articles.json
- chunks.jsonl, manifest.json (merged)
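Since the rebuild is meant to be deterministic, one way to check that is to hash the output files, rebuild, and hash them again. The following is a minimal sketch, not part of the repository: the helper name `digest_outputs` is hypothetical, and it assumes only that the outputs are the `.jsonl` and `.json` files in the output directory.

```python
import hashlib
from pathlib import Path


def digest_outputs(out_dir="data/normalized"):
    """Map each .json/.jsonl file name in out_dir to its SHA-256 hex digest.

    Hypothetical helper for comparing two rebuilds; it makes no assumption
    about the record schema inside the files.
    """
    digests = {}
    # "*.json*" matches both manifest files (.json) and chunk files (.jsonl).
    for path in sorted(Path(out_dir).glob("*.json*")):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

Calling `digest_outputs()` before and after `bash scripts/rebuild_all.sh` and comparing the two dictionaries should show identical digests if the rebuild is byte-for-byte reproducible.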
Clean re-index
FAISS indexes are built by the app. To force a rebuild:
make clean-index
Adding sources
Add a book
- Add PDF to data/raw_pdfs/
- Add entry to sources.json
- Rebuild: bash scripts/rebuild_all.sh
Add an article
- Add entry to sources_articles.json
- Rebuild: bash scripts/rebuild_all.sh
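Both procedures involve hand-editing a JSON file, so it can help to confirm the file still parses before starting a rebuild. This is a minimal sketch with hypothetical helper names (`check_json`, `check_all`); it checks only that each file is well-formed JSON and assumes nothing about the entry schema, which is not described here.

```python
import json


def check_json(path):
    """Return None if the file parses as JSON, else a short error message."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and bad UTF-8 input.
        return f"{path}: {exc}"
    return None


def check_all(paths):
    """Return the error messages for every path that fails to parse."""
    return [msg for msg in map(check_json, paths) if msg]
```

For example, `check_all([...])` can be called with the paths to sources.json and sources_articles.json as they exist in your checkout; an empty list means both files parsed.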