Auto-Save Index Feature 🔄

Your QA bot now automatically saves the FAISS index after building it!

How It Works

First Startup (No Index Files)

  1. ✅ Builds index from documents (~30 seconds)
  2. 💾 Automatically saves siauto_qa.index + siauto_qa.pkl
  3. 🚀 Next restart loads in ~5 seconds!

Subsequent Startups (Index Exists)

  1. 📦 Loads pre-built index (~5 seconds)
  2. ✅ Ready to answer questions immediately

If Load Fails

  1. 🔨 Rebuilds from documents
  2. 💾 Saves new index
  3. ✅ Ready for next time
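
The three startup paths above can be sketched as one load-or-build function. This is a minimal sketch, not the bot's actual code: the function name load_or_build and the injected build, load and save callables are hypothetical. In a real bot, load and save would presumably wrap faiss.read_index and faiss.write_index plus pickle for the .pkl file.

```python
import os
from typing import Any, Callable, Tuple

INDEX_PATH = "siauto_qa.index"  # file names taken from this guide
META_PATH = "siauto_qa.pkl"


def load_or_build(
    build: Callable[[], Any],
    load: Callable[[str, str], Any],
    save: Callable[[Any, str, str], None],
    index_path: str = INDEX_PATH,
    meta_path: str = META_PATH,
) -> Tuple[Any, str]:
    """Return (index, how); how is 'loaded', 'built+saved' or 'built'."""
    # Path 1: both files exist, so try the fast load first.
    if os.path.exists(index_path) and os.path.exists(meta_path):
        try:
            return load(index_path, meta_path), "loaded"
        except Exception as exc:
            # Path 3: corrupt or incompatible files, fall through to a rebuild.
            print(f"Load failed ({exc}); rebuilding from documents")

    # Path 2 (and the fallback): build from documents, then try to save.
    index = build()
    try:
        save(index, index_path, meta_path)
    except OSError as exc:
        # For example a read-only filesystem: keep running without a saved copy.
        print(f"Could not save index: {exc}")
        return index, "built"
    return index, "built+saved"
```

Passing the three callables in keeps the control flow easy to test without FAISS or an embedding model installed.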

🎯 Three Ways to Use

Option 1: Let HF Auto-Build (Easiest)

Upload 5 files to Hugging Face:

app.py
qa_bot.py
siauto_documents.py
requirements.txt
README.md

What happens:

  • First deploy: Builds index, saves it automatically
  • Future deploys: Loads saved index (fast!)

Note: If the HF filesystem is read-only, or files written at runtime do not persist across restarts, the index is rebuilt on each startup (the bot still works fine!)

Option 2: Pre-Build Locally (Fastest)

Build index on your computer first:

python build_index.py

Upload 7 files to Hugging Face:

app.py
qa_bot.py
siauto_documents.py
requirements.txt
README.md
siauto_qa.index      ← Pre-built
siauto_qa.pkl        ← Pre-built

What happens:

  • First deploy: Loads pre-built index immediately (~5 sec)
  • No need to build on HF

Option 3: Build Once on HF, Download for Local

  1. Deploy to HF without index files
  2. Let it build and save automatically
  3. Download the generated files from HF
  4. Use them locally or re-upload to other instances

📊 Performance Comparison

| Scenario | First Startup | Next Startup | Notes |
|---|---|---|---|
| No index files | ~30 sec (build) | ~5 sec (loads saved) | Auto-saves after build |
| Pre-built index | ~5 sec | ~5 sec | No build needed |
| Read-only filesystem | ~30 sec | ~30 sec | Can't save, rebuilds each time |

🔍 Check the Logs

In Hugging Face → Your Space → Logs, you'll see:

First run (no index):

📚 No pre-built index found. Building from 10 documents...
📚 Indexing 10 documents...
✅ FAISS index built with 45 vectors
💾 Saving index to siauto_qa.index for faster future startups...
✅ Index saved! Next startup will be ~5x faster.
📊 Created files: siauto_qa.index + siauto_qa.pkl
✅ QA Bot ready!

Second run (index exists):

📦 Loading pre-built FAISS index from siauto_qa.index...
✅ Loaded index with 45 vectors
✅ QA Bot ready!

⚠️ Important Notes

Persistent Storage

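  • HF Spaces default disk: Ephemeral. Files written at runtime are generally lost when the Space restarts or rebuilds, unless persistent storage is attached ⚠️
  • HF Spaces Sleep: Waking from sleep restarts the Space, so index files written at runtime may not survive sleep either ⚠️
  • Git Repo: Index files can be committed to the Space repository (optional); committed files are available on every startup ✅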

When Index is Rebuilt

The bot rebuilds (and re-saves) the index when:

  • Documents are updated and the old index files have been removed (see Updating Documents)
  • Index files are deleted/corrupted
  • Load fails for any reason
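
One way to detect the "documents are updated" case without deleting files by hand is to store a fingerprint of the documents next to the index and compare it at startup. The sketch below is an assumption, not something this guide says the bot does: the helper names and the JSON fingerprint file are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Sequence


def documents_fingerprint(documents: Sequence[str]) -> str:
    """SHA-256 over the document texts; sensitive to content and order."""
    digest = hashlib.sha256()
    for text in documents:
        data = text.encode("utf-8")
        # A length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def write_fingerprint(documents: Sequence[str], fingerprint_path: Path) -> None:
    """Record the fingerprint of the documents the index was built from."""
    payload = {"sha256": documents_fingerprint(documents)}
    fingerprint_path.write_text(json.dumps(payload), encoding="utf-8")


def index_is_stale(documents: Sequence[str], fingerprint_path: Path) -> bool:
    """True if no usable fingerprint is stored or it differs from the documents."""
    try:
        stored = json.loads(fingerprint_path.read_text(encoding="utf-8"))["sha256"]
    except (OSError, ValueError, KeyError, TypeError):
        return True  # missing or unreadable fingerprint: treat as stale
    return stored != documents_fingerprint(documents)
```

A startup routine could call index_is_stale before loading and rebuild when it returns True, then call write_fingerprint after a successful save.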

File Permissions

If you see:

⚠️  Could not save index: [Permission denied]

This means the filesystem is read-only (rare on HF Spaces). The bot still works, but rebuilds each time.
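
To check up front whether saving can succeed, a small probe can try to create a temporary file in the target directory. This is a sketch under the assumption that the index is written to the working directory; directory_is_writable is a hypothetical helper, not part of the bot.

```python
import tempfile


def directory_is_writable(directory: str = ".") -> bool:
    """Try to create (and automatically delete) a temporary file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        # Covers PermissionError, a read-only filesystem, and a missing directory.
        return False
    return True
```

Actually attempting the write tends to be more reliable than inspecting permission bits, since a mount can be read-only regardless of file modes.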

💡 Best Practice

For most users: Just upload 5 files and let it auto-save!

For fastest deployment: Pre-build locally with build_index.py

For version control: Add to .gitignore:

*.index
*.pkl

Then upload index files separately to HF.

🔄 Updating Documents

When you update siauto_documents.py:

Option A (Automatic):

  1. Delete siauto_qa.index and siauto_qa.pkl from HF
  2. Restart the Space
  3. It rebuilds with new documents and saves
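
For a local checkout, step 1 can be scripted. A minimal sketch, assuming the two file names used throughout this guide sit in one directory; remove_saved_index is a hypothetical helper.

```python
from pathlib import Path
from typing import List

INDEX_FILES = ("siauto_qa.index", "siauto_qa.pkl")  # names from this guide


def remove_saved_index(directory: str = ".") -> List[str]:
    """Delete saved index files if present; return the names that were removed."""
    removed = []
    for name in INDEX_FILES:
        path = Path(directory) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

Removing both files together avoids a state where a fresh .index is paired with an outdated .pkl.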

Option B (Manual):

  1. Run build_index.py locally with updated documents
  2. Upload new siauto_qa.index and siauto_qa.pkl to HF

🎉 Benefits

  • ✅ No manual work: Saves automatically
  • ✅ Faster restarts: Loads saved index
  • ✅ Resilient: Rebuilds if needed
  • ✅ Flexible: Works with or without pre-built files
  • ✅ Smart: Only rebuilds when necessary

TL;DR: Just deploy and forget! The bot handles index management automatically. 🚀