ChromaDB Collection Recovery Guide
Problem
After deleting chroma.sqlite3:
- Collection UUID folders still exist with all data files
- chroma.sqlite3 was recreated automatically
- Collections don't appear in the dropdown
- ChromaDB can't see the collections
Root Cause: The new chroma.sqlite3 is empty - it doesn't have the metadata about which collections exist. ChromaDB doesn't auto-scan existing collection folders; it only knows about collections registered in sqlite3.
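To confirm this state before choosing a solution, a small check along the following lines may help. It is a minimal sketch using only the Python standard library: it reports whether chroma.sqlite3 is present and which UUID-named folders sit next to it. The function names `find_uuid_folders` and `describe_state` are illustrative, not part of ChromaDB.

```python
import re
from pathlib import Path

# Canonical 36-character lowercase UUID, e.g. 123e4567-e89b-12d3-a456-426614174000
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def find_uuid_folders(db_path):
    """Return the sorted names of UUID-named subfolders under db_path."""
    root = Path(db_path)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and UUID_RE.match(p.name)
    )


def describe_state(db_path):
    """Summarize what is on disk: index file presence and UUID folder names."""
    root = Path(db_path)
    return {
        "index_exists": (root / "chroma.sqlite3").is_file(),
        "uuid_folders": find_uuid_folders(root),
    }


if __name__ == "__main__":
    # "./chroma_db" is the folder name used throughout this guide.
    print(describe_state("./chroma_db"))
```

If `index_exists` is true and `uuid_folders` is non-empty while the app shows no collections, the situation matches the problem described above.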
Solution 1: Restore from Backup (EASIEST)
If you created a backup before deleting sqlite3:
Step 1: Locate Backup
# List backups
Get-ChildItem ".\chroma_db.backup_*" -Directory
# Find the most recent one
Get-ChildItem ".\chroma_db.backup_*" -Directory | Sort-Object LastWriteTime -Descending | Select-Object -First 1
Step 2: Restore Backup
# Stop Streamlit (Ctrl+C)
# Remove current chroma_db
Remove-Item -Path ".\chroma_db" -Recurse -Force
# Restore from backup
$latestBackup = Get-ChildItem ".\chroma_db.backup_*" -Directory | Sort-Object LastWriteTime -Descending | Select-Object -First 1
Copy-Item -Path $latestBackup.FullName -Destination ".\chroma_db" -Recurse
# Restart Streamlit
streamlit run streamlit_app.py
Step 3: Verify
Collections should now appear in the dropdown.
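For setups where PowerShell is not available, the same restore can be sketched in Python. This is a rough cross-platform equivalent of the steps above, assuming the backups are folders named chroma_db.backup_* next to chroma_db; the helper names `latest_backup` and `restore_latest_backup` are made up for this example. Stop Streamlit before running it.

```python
import shutil
from pathlib import Path


def latest_backup(root="."):
    """Return the most recently modified chroma_db.backup_* folder, or None."""
    backups = [p for p in Path(root).glob("chroma_db.backup_*") if p.is_dir()]
    return max(backups, key=lambda p: p.stat().st_mtime, default=None)


def restore_latest_backup(root="."):
    """Replace <root>/chroma_db with a copy of the newest backup folder."""
    backup = latest_backup(root)
    if backup is None:
        raise FileNotFoundError("no chroma_db.backup_* folder found")
    target = Path(root) / "chroma_db"
    if target.exists():
        # Same effect as: Remove-Item -Path ".\chroma_db" -Recurse -Force
        shutil.rmtree(target)
    # Same effect as: Copy-Item ... -Destination ".\chroma_db" -Recurse
    shutil.copytree(backup, target)
    return backup


if __name__ == "__main__":
    print(f"Restored from {restore_latest_backup('.')}")
```

The backup folder itself is copied rather than moved, so it stays available if the restore has to be repeated.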
Solution 2: Manually Rebuild SQLite Index (COMPLEX)
This requires directly using ChromaDB's internal APIs. Not recommended unless you're comfortable with Python.
Why it's complex:
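- ChromaDB uses internal data structures
- Need to parse collection folder structure
- No public API to bulk import without re-embedding
- In ChromaDB 0.4 and later, document text and metadata are stored inside chroma.sqlite3 itself; the UUID folders hold only the vector index files, so the folders alone may not be enough to fully restore a collection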
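Before attempting any manual rebuild, it can be useful to look at what the recreated chroma.sqlite3 actually contains. The sketch below uses only Python's built-in sqlite3 module to list every table and its row count; it opens the file read-only and makes no assumption about ChromaDB's schema, which is internal and may differ between versions. The function name `table_row_counts` is illustrative.

```python
import sqlite3
from pathlib import Path


def table_row_counts(sqlite_path):
    """Map each table name in the SQLite file to its row count.

    A value of None means the table could not be read (for example a
    virtual table whose extension is not available in this Python build).
    """
    # Open read-only via a file: URI so inspection cannot modify the file.
    uri = Path(sqlite_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            try:
                counts[name] = conn.execute(
                    f'SELECT COUNT(*) FROM "{name}"'
                ).fetchone()[0]
            except sqlite3.Error:
                counts[name] = None
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    for table, rows in table_row_counts("./chroma_db/chroma.sqlite3").items():
        print(f"{table}: {rows}")
```

Running the same function against the chroma.sqlite3 inside a backup folder gives a quick comparison: tables that are empty in the new file but populated in the backup show what the deletion lost.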
Solution 3: Accept the Current State and Move Forward
Since the collections are lost from sqlite3's index:
Option A: Re-create Collections from Scratch
- Delete ./chroma_db completely
- Use Streamlit UI to create new collections
- This is clean and ensures everything is consistent
Option B: Try ChromaDB Reset
# Stop Streamlit (Ctrl+C)
# Delete chroma_db completely
Remove-Item -Path ".\chroma_db" -Recurse -Force
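# Optional: delete Streamlit's user-level folder (note: this also removes config.toml and credentials.toml, not only cached data)
Remove-Item -Path "$env:USERPROFILE\.streamlit" -Recurse -Force -ErrorAction SilentlyContinue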
# Restart
streamlit run streamlit_app.py
# Create new collections using UI
Solution 4: Check Backup Directory
Step 1: List All Backups
cd "d:\CapStoneProject\RAG Capstone Project"
Get-ChildItem -Filter "chroma_db.backup_*" -Directory | Select-Object Name, LastWriteTime
Step 2: Check If Backup Has Collections
# List collections in a specific backup
$backupPath = ".\chroma_db.backup_20251220_083000"
Get-ChildItem -Path $backupPath -Directory | Where-Object {$_.Name -match "^[a-f0-9\-]{36}$"} | Measure-Object
Step 3: Restore That Backup
# Stop Streamlit
# Remove current
Remove-Item -Path ".\chroma_db" -Recurse -Force
# Restore backup
Copy-Item -Path ".\chroma_db.backup_20251220_083000" -Destination ".\chroma_db" -Recurse
# Restart Streamlit
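When several backups exist, a short summary can help pick the one worth restoring. The following sketch lists each chroma_db.backup_* folder with the size of its chroma.sqlite3 and the number of UUID-named subfolders, mirroring the PowerShell check in Step 2. The function name `summarize_backups` and the dictionary keys are assumptions made for this example.

```python
import re
from pathlib import Path

# Canonical 36-character lowercase UUID folder name
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def summarize_backups(root="."):
    """Return one dict per chroma_db.backup_* folder, latest name first."""
    rows = []
    # Timestamped names sort chronologically, so reverse order is newest first.
    for backup in sorted(Path(root).glob("chroma_db.backup_*"), reverse=True):
        if not backup.is_dir():
            continue
        index = backup / "chroma.sqlite3"
        rows.append({
            "name": backup.name,
            # 0 means the backup has no chroma.sqlite3 at all
            "index_bytes": index.stat().st_size if index.is_file() else 0,
            "uuid_folders": sum(
                1 for p in backup.iterdir()
                if p.is_dir() and UUID_RE.match(p.name)
            ),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_backups("."):
        print(row)
```

A backup with a non-trivial `index_bytes` value and at least one UUID folder is the most promising candidate for Step 3.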
Why This Happens
ChromaDB Architecture:
chroma.sqlite3 (Metadata Index)
├── Collection 1 metadata
├── Collection 2 metadata
└── Collection 3 metadata
        ↓ (references)
./chroma_db/
├── UUID-folder-1/ (actual data files)
├── UUID-folder-2/ (actual data files)
└── UUID-folder-3/ (actual data files)
When you delete chroma.sqlite3:
- UUID folders remain (their data files are untouched)
- Index is gone (relationships are broken)
- ChromaDB creates a new, empty chroma.sqlite3
- The new file has no reference to the UUID folders
Prevention for Next Time
Don't Just Delete sqlite3
Instead, let ChromaDB handle cleanup properly:
# WRONG - causes this issue:
Remove-Item -Path ".\chroma_db\chroma.sqlite3" -Force
# RIGHT - use ChromaDB API:
# (See below)
Use Proper Reset Method
Create a reset_chromadb.py script:
import chromadb
from chromadb.config import Settings


def reset_chromadb(keep_data=False):
    """Properly reset ChromaDB by deleting collections through the client API."""
    client = chromadb.PersistentClient(
        path="./chroma_db",
        settings=Settings(
            anonymized_telemetry=False,
            allow_reset=True,
        ),
    )
    if keep_data:
        # Nothing is deleted in this mode; recovery has to be done by hand.
        print("Warning: manual data recovery needed - see docs/CHROMADB_RECOVERY.md")
        return
    print("Resetting ChromaDB (will delete all collections)...")
    try:
        # Delete all collections properly
        for collection in client.list_collections():
            # Depending on the ChromaDB version, list_collections() yields
            # Collection objects or plain collection names.
            name = collection if isinstance(collection, str) else collection.name
            client.delete_collection(name)
        print("ChromaDB reset successfully")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    reset_chromadb()
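Since every recovery path in this guide depends on a backup existing, it may be worth taking one before any reset or manual change. The sketch below copies the database folder to a sibling named with the chroma_db.backup_YYYYMMDD_HHMMSS pattern used elsewhere in this guide. The function name `backup_chroma_db` and its `now` parameter are hypothetical, and the app should be stopped first so the SQLite file is not copied mid-write.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_chroma_db(db_path="./chroma_db", now=None):
    """Copy db_path to a sibling folder named <name>.backup_YYYYMMDD_HHMMSS."""
    source = Path(db_path)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    # `now` can be passed explicitly to get a predictable folder name.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    target = source.with_name(f"{source.name}.backup_{stamp}")
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    print(f"Backup written to {backup_chroma_db('./chroma_db')}")
```

Calling `backup_chroma_db()` at the top of a reset script would make the restore steps in Solution 1 available the next time something goes wrong.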
Immediate Action Plan
Choose one:
Option 1 (Fastest): If you have a backup
# Restore backup
# Restart app
Option 2 (Clean restart): If no backup or backup damaged
# Delete entire chroma_db
# Restart Streamlit
# Create new collections using UI
Option 3 (Keep trying): For debugging
# Try Solution 2 (complex recovery)
# Run recover_collections.py for diagnostics
Files Provided
- recover_collections.py - Diagnostic script (tells you what's recoverable)
- This guide - Recovery procedures
Bottom Line
The safest approach: Use a backup or start fresh with new collections.
To proceed:
- Do you have a chroma_db.backup_* folder? If yes, use it
- If no, delete ./chroma_db and recreate collections
- Always back up before making changes to chroma_db
Let me know which option you want to pursue!