Safe Direct Directory Copy Guide for ChromaDB Merge
ChromaDB File Structure
Your current ChromaDB directory contains:
./chroma_db/
├── chroma.sqlite3                 # Main database: collection registry, record IDs, documents, metadata
└── [Collection UUID folders]      # One vector-index folder per collection, named by its segment ID
    ├── 0808e537-cf80-4b64-a337-0f7ae74dc9d5/
    ├── 24b5ff30-002d-43ff-ad39-44c87a8ac6d0/
    ├── 40bcc7c3-167c-499e-b520-57cf8c723e28/
    ├── 57a7a07d-2746-4b75-85f1-844a334871ba/
    ├── 757cee3b-442d-4ebf-8fe7-ccd27a869786/
    ├── 8d4abc47-3860-4eab-bcda-42aa64156c63/
    ├── 98152a87-2e94-4077-be26-b9305747289f/
    ├── c91615ef-3bc8-4667-a51f-9400741c7591/
    ├── ec92fa49-44c7-4af8-8eb6-2d49a4ca6a82/
    └── f333d54b-ad48-4ede-9479-1aab4d56f332/
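ChromaDB only loads a UUID folder that chroma.sqlite3 references. For that reason, it helps to see which collection each folder belongs to before copying anything. Below is a minimal Python sketch (Python being the app's own language). It assumes the ChromaDB 0.4+ schema, in which each folder is named after a row in the `segments` table and `segments.collection` points at `collections.id`. The helpers `open_readonly` and `folder_to_collection` are hypothetical names, and the database is opened read-only so inspecting it cannot change it.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open a SQLite file read-only, so inspecting it can never modify it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def folder_to_collection(chroma_dir):
    """Map every subfolder of chroma_dir to the name of the collection that owns it.

    Assumes the ChromaDB 0.4+ schema: segments.id is the folder name and
    segments.collection points at collections.id. Folders with no matching
    segment row map to None, meaning ChromaDB will not load them.
    """
    chroma_dir = Path(chroma_dir)
    conn = open_readonly(chroma_dir / "chroma.sqlite3")
    try:
        rows = conn.execute(
            "SELECT s.id, c.name FROM segments AS s "
            "JOIN collections AS c ON c.id = s.collection"
        ).fetchall()
    finally:
        conn.close()
    owner = dict(rows)
    return {
        folder.name: owner.get(folder.name)
        for folder in sorted(chroma_dir.iterdir())
        if folder.is_dir()
    }


# Usage (works on either the project or the external database):
#   for folder, name in folder_to_collection("./chroma_db").items():
#       print(folder, name or "(not registered: ignored by ChromaDB)")
```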
Critical Files to Copy
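1. Main Database File (MUST BACK UP; never overwrite or delete)
- File: chroma.sqlite3
- Purpose: Stores the collection registry (collection names, collection and segment IDs), record IDs, documents, and metadata. ChromaDB only loads a UUID folder that this file references.
- Size: ~500 MB (in your case)
- Status: ⚠️ CRITICAL - DO NOT SKIP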
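2. Collection Folders (MUST COPY)
- Location: UUID-named directories inside ./chroma_db/
- Each folder is one collection's HNSW vector index. Its name is the ID of the collection's vector segment (not the collection ID), and it is only used when chroma.sqlite3 contains a matching segment row.
- Files per collection:
  - data_level0.bin - Vector embeddings and level-0 graph links
  - header.bin - Index header information
  - index_metadata.pickle - Index metadata (dimension, mapping from record IDs to index labels)
  - length.bin - Per-element sizes of the upper-level link lists
  - link_lists.bin - HNSW graph links for the upper levels (similarity search structure)
- Status: ⚠️ CRITICAL - DO NOT SKIP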
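Before copying, you can confirm that each external folder is complete. Below is a minimal Python sketch. It assumes the five file names listed above are the full set that your ChromaDB version writes. `missing_index_files` is a hypothetical helper name.

```python
from pathlib import Path

# The five HNSW index files listed above (assumed complete for your ChromaDB version)
HNSW_FILES = {
    "data_level0.bin",
    "header.bin",
    "index_metadata.pickle",
    "length.bin",
    "link_lists.bin",
}


def missing_index_files(folder):
    """Return the expected index files absent from one UUID folder (empty set = complete)."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    return HNSW_FILES - present


# Usage:
#   for folder in Path(r"C:\path\to\external\chroma_db").iterdir():
#       if folder.is_dir():
#           print(folder.name, missing_index_files(folder) or "complete")
```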
Safe Copy Strategy
Phase 1: Backup (Protect Current Collections)
Step 1: Stop the application
# Stop Streamlit if running
# Close terminal or press Ctrl+C
Step 2: Create backup of current collections
# Navigate to project directory
cd "d:\CapStoneProject\RAG Capstone Project"
# Create timestamped backup
$timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
Copy-Item -Path ".\chroma_db" -Destination ".\chroma_db.backup_$timestamp" -Recurse
# Verify backup was created
Get-ChildItem -Path ".\chroma_db.backup_$timestamp" | Measure-Object | Select-Object Count
Step 3: Verify backup integrity
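# List your current collections
Get-ChildItem ".\chroma_db" -Directory | Where-Object {$_.Name -match "^[a-f0-9\-]{36}$"} | Measure-Object
# Compare file count and total size of the original and the backup (they should match)
".\chroma_db", ".\chroma_db.backup_$timestamp" | ForEach-Object { Get-ChildItem $_ -Recurse -File | Measure-Object -Property Length -Sum | Select-Object Count, Sum }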
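To verify the backup byte for byte instead of by counts, here is a minimal Python sketch. It copies the directory to a timestamped sibling, using the same naming as above, and then compares SHA-256 digests of every file. `backup_and_verify` and `file_digests` are hypothetical helpers. Run it only while Streamlit is stopped, because files that change during the copy would make the digests differ.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Hash in 1 MiB chunks so a large chroma.sqlite3 is not read into memory at once
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def backup_and_verify(chroma_dir):
    """Copy chroma_dir to <name>.backup_YYYYMMDD_HHMMSS and confirm every file matches."""
    src = Path(chroma_dir)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dst = src.with_name(f"{src.name}.backup_{stamp}")
    shutil.copytree(src, dst)  # raises if dst already exists, so nothing is overwritten
    if file_digests(src) != file_digests(dst):
        raise RuntimeError(f"Backup {dst} does not match {src}")
    return dst


# Usage: print(backup_and_verify("./chroma_db"))
```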
Phase 2: Copy External Collections
Step 4: Copy external collection folders ONLY
Option A: Copy Specific Collections (RECOMMENDED - Safest)
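If the external ChromaDB has UUID folders, identify which ones to copy. A copied folder is only loaded once chroma.sqlite3 also contains its collection and segment rows (Step 5):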
# Copy only external collection folders (not chroma.sqlite3)
$externalPath = "C:\path\to\external\chroma_db"
$targetPath = ".\chroma_db"
# Folder names to copy, e.g. @("uuid-1", "uuid-2"); leave empty to copy every UUID folder
$selected = @()
# Get external collection folder UUIDs
$externalCollections = Get-ChildItem -Path $externalPath -Directory |
    Where-Object { $_.Name -match "^[a-f0-9\-]{36}$" -and ($selected.Count -eq 0 -or $selected -contains $_.Name) }
# Copy each one
foreach ($collection in $externalCollections) {
    $sourceFolder = Join-Path $externalPath $collection.Name
    $destFolder = Join-Path $targetPath $collection.Name
    # Never overwrite an existing folder
    if (Test-Path $destFolder) {
        Write-Host "⚠️ Collection $($collection.Name) already exists - SKIPPING"
    } else {
        Write-Host "Copying collection $($collection.Name)..."
        Copy-Item -Path $sourceFolder -Destination $destFolder -Recurse -Force
        Write-Host "✅ Copied successfully"
    }
}
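The same copy-without-overwrite rule can also be written in Python, for projects that prefer to script the merge in the app's own language. Below is a minimal sketch. It assumes standard 8-4-4-4-12 UUID folder names. `copy_new_uuid_folders` and its `selected` parameter are hypothetical. As in the PowerShell version, chroma.sqlite3 is never touched.

```python
import re
import shutil
from pathlib import Path

# Strict 8-4-4-4-12 UUID pattern for the index folder names
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)


def copy_new_uuid_folders(external_dir, target_dir, selected=None):
    """Copy UUID folders from external_dir into target_dir without overwriting.

    selected: optional set of folder names; when given, only those are considered.
    Returns (copied, skipped) lists of folder names.
    """
    copied, skipped = [], []
    for folder in sorted(Path(external_dir).iterdir()):
        if not folder.is_dir() or not UUID_RE.match(folder.name):
            continue  # chroma.sqlite3 and any non-UUID entry are left alone
        if selected is not None and folder.name not in selected:
            continue
        dest = Path(target_dir) / folder.name
        if dest.exists():
            skipped.append(folder.name)
        else:
            shutil.copytree(folder, dest)
            copied.append(folder.name)
    return copied, skipped
```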
Option B: Copy Entire External ChromaDB (If confident)
# Copy all folders from external (NOT the sqlite3 file initially)
$externalPath = "C:\path\to\external\chroma_db"
$targetPath = ".\chroma_db"
# Copy all subdirectories
Get-ChildItem -Path $externalPath -Directory | ForEach-Object {
    if ($_.Name -match "^[a-f0-9\-]{36}$") { # UUID format
        $destFolder = Join-Path $targetPath $_.Name
        if (-not (Test-Path $destFolder)) {
            Copy-Item -Path $_.FullName -Destination $destFolder -Recurse -Force
            Write-Host "✅ Copied $($_.Name)"
        } else {
            Write-Host "⚠️ Skipped $($_.Name) (already exists)"
        }
    }
}
Phase 3: Handle the SQLite Database
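⚠️ CRITICAL: DO NOT simply copy chroma.sqlite3
ChromaDB reads chroma.sqlite3 at startup as its registry: the file lists every collection and the segment (UUID folder) that holds its vectors. Overwriting your file with the external one replaces that registry, so your existing collections disappear from ChromaDB even though their folders are still on disk.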
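Step 5: Register the External Collections (Choose ONE approach)
Option A: Re-import Through the ChromaDB Client (SAFEST)
⚠️ Do NOT delete chroma.sqlite3 expecting ChromaDB to rebuild it. ChromaDB does not scan the UUID folders at startup. If chroma.sqlite3 is missing, ChromaDB creates a new, empty database, and every collection (old and new) disappears from the app. The folders stay on disk but are ignored until you restore the backup.
Instead, keep chroma.sqlite3 and move the records rather than the files. Open the external directory with chromadb.PersistentClient and read each collection with collection.get(include=["embeddings", "documents", "metadatas"]). Then write the records into your project's database with collection.add(). ChromaDB creates both the SQLite rows and the UUID folders itself, so this option does not need the folders copied in Step 4.
# Restart the app afterwards:
streamlit run streamlit_app.py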
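Re-importing through the ChromaDB client, instead of copying files, lets ChromaDB write both the SQLite rows and the UUID folders itself. Below is a minimal Python sketch. It assumes that the chromadb version installed in your project can open both directories and that Streamlit is stopped. `copy_collection` is a hypothetical helper. `get_collection`, `create_collection`, `count`, `get`, and `add` are standard chromadb client and collection methods. Embeddings are copied as stored and no embedding function is re-run, so queries against the copy must use the same embedding model that produced them.

```python
def copy_collection(src_client, dst_client, src_name, dst_name=None, batch_size=500):
    """Copy one collection's records between two ChromaDB clients via the public API.

    Returns the number of records in the new target collection.
    """
    src = src_client.get_collection(name=src_name)
    # create_collection fails if the name is already taken, so records are never
    # mixed into an existing collection (empty metadata is passed as None)
    dst = dst_client.create_collection(name=dst_name or src_name, metadata=src.metadata or None)
    total = src.count()
    for offset in range(0, total, batch_size):
        batch = src.get(
            limit=batch_size,
            offset=offset,
            include=["embeddings", "documents", "metadatas"],
        )
        if len(batch["ids"]) == 0:
            break
        dst.add(
            ids=batch["ids"],
            embeddings=batch["embeddings"],
            documents=batch["documents"],
            metadatas=batch["metadatas"],
        )
    return dst.count()


# Usage sketch ("some_collection" is a hypothetical name; use your own):
#   import chromadb
#   src = chromadb.PersistentClient(path=r"C:\path\to\external\chroma_db")
#   dst = chromadb.PersistentClient(path=r"d:\CapStoneProject\RAG Capstone Project\chroma_db")
#   copy_collection(src, dst, "some_collection", dst_name="some_collection_external")
```

Passing `dst_name` also resolves collection-name conflicts: the external collection simply arrives under a new name.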
Option B: Merge SQLite Files (ADVANCED)
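Only if you need the copied UUID folders themselves to work. Their collection and segment rows exist only in the external chroma.sqlite3, so those rows must be merged into yours: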
# This requires SQLite tools - install if needed
# choco install sqlite # or: winget install sqlite
# 1. Backup both sqlite3 files
Copy-Item ".\chroma_db\chroma.sqlite3" -Destination ".\chroma_db\chroma.sqlite3.backup"
Copy-Item "C:\path\to\external\chroma_db\chroma.sqlite3" -Destination ".\chroma_db\chroma.sqlite3.external.backup"
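# 2. Copy each external collection's rows (tables such as collections, segments,
#    embeddings, and embedding_metadata) into the target database without ID or name conflicts
# This is complex and version-specific - recommended only if you're familiar with SQL and ChromaDB's schema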
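For the backups in step 1 of this option, SQLite's online backup API (available in Python as `sqlite3.Connection.backup`) writes a transactionally consistent copy of the database. A plain file copy does not guarantee that if a process still has the file open. Below is a minimal sketch. `snapshot_sqlite` is a hypothetical helper. The source is opened read-only, and an existing backup file is never overwritten.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(db_path, dest_path):
    """Write a consistent copy of a SQLite database using SQLite's online backup API."""
    db_path, dest_path = Path(db_path), Path(dest_path)
    if not db_path.is_file():
        raise FileNotFoundError(db_path)
    if dest_path.exists():
        raise FileExistsError(dest_path)  # never overwrite an earlier backup
    src = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return dest_path


# Usage:
#   snapshot_sqlite(r".\chroma_db\chroma.sqlite3", r".\chroma_db\chroma.sqlite3.backup")
#   snapshot_sqlite(r"C:\path\to\external\chroma_db\chroma.sqlite3",
#                   r".\chroma_db\chroma.sqlite3.external.backup")
```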
Step-by-Step Safe Copy Process
Complete Workflow:
1. STOP APPLICATION
   └─ Close Streamlit
2. BACKUP CURRENT STATE
   └─ Copy entire ./chroma_db to ./chroma_db.backup_YYYYMMDD_HHMMSS
3. IDENTIFY EXTERNAL COLLECTIONS
   └─ Determine which collection UUID folders to copy
4. COPY EXTERNAL COLLECTION FOLDERS
   └─ Copy only UUID folders (NOT chroma.sqlite3)
   └─ Skip any folder whose UUID already exists in ./chroma_db
5. REGISTER EXTERNAL COLLECTIONS
   └─ Check that no external collection name is already used in your database
   └─ Re-import through the ChromaDB client, OR merge the chroma.sqlite3 files (advanced)
   └─ Never delete ./chroma_db/chroma.sqlite3: ChromaDB does not rebuild it from the folders
6. START APPLICATION
   └─ streamlit run streamlit_app.py
7. VERIFY IN UI
   └─ Check "Existing Collections" dropdown
   └─ Should show original + new external collections
8. TEST COLLECTIONS
   └─ Load each collection
   └─ Run test queries
   └─ Verify retrieval works
9. CLEANUP (Optional)
   └─ Delete backup after verification
Files to Copy Summary
| File/Folder | Copy? | Reason | Notes |
|---|---|---|---|
| chroma.sqlite3 | ❌ NO | Conflicts | Overwriting it replaces your collection registry; never delete it either |
| UUID folders | ✅ YES | Collection data | Usable only after their collections are registered in chroma.sqlite3 (Step 5) |
| Other files | MAYBE | System files | Only if present in external |
What NOT to Copy
❌ Do NOT copy:
- chroma.sqlite3 directly
- System/temporary files
- Old backup files from external ChromaDB
- Configuration files from external project
Verification Checklist
After merge, verify:
- Streamlit starts without errors
- Old collections still appear in the dropdown
- New collections appear in the dropdown
- Any collection loads without error
- Queries retrieve relevant documents
- Querying with a stored document's embedding returns that document (embeddings survived the merge)
- Evaluation runs without errors
- chroma.sqlite3 still exists and was never deleted or overwritten
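The embedding check in the list above can be scripted. Below is a minimal Python sketch of a self-retrieval test. It takes one stored embedding, queries the collection with it, and expects that record among the top results. `check_collection` is a hypothetical helper that uses the standard chromadb collection methods `count`, `get`, and `query`. HNSW search is approximate and duplicate vectors can outrank the original, so treat a miss as a reason to investigate rather than proof of damage.

```python
def check_collection(collection, n_results=5):
    """Return a list of problems found in one ChromaDB collection (empty list = looks healthy)."""
    total = collection.count()
    if total == 0:
        return ["collection is empty"]
    # Take one stored record together with its embedding
    sample = collection.get(limit=1, include=["embeddings"])
    record_id = sample["ids"][0]
    embedding = sample["embeddings"][0]
    # Query with that exact vector; the record itself should be among the nearest hits
    result = collection.query(query_embeddings=[embedding], n_results=min(n_results, total))
    if record_id not in result["ids"][0]:
        return [f"self-query did not return {record_id}"]
    return []


# Usage sketch (stop Streamlit first; "your_collection_name" is hypothetical):
#   import chromadb
#   client = chromadb.PersistentClient(path="./chroma_db")
#   for name in ["your_collection_name"]:
#       print(name, check_collection(client.get_collection(name=name)) or "OK")
```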
Troubleshooting
Problem: New collections don't appear
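Cause: The copied UUID folders have no matching collection and segment rows in chroma.sqlite3, so ChromaDB ignores them.
Solution:
# Do NOT delete chroma.sqlite3: that hides every collection, old and new
# Register the external collections instead (Step 5: re-import through the ChromaDB client
# or merge the SQLite files), then restart Streamlit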
Problem: Old collections disappeared
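Restore from backup (stop Streamlit first so no files are locked):
$timestamp = "YYYYMMDD_HHMMSS" # Use your backup timestamp
# Move the current directory aside instead of deleting it
Rename-Item -Path ".\chroma_db" -NewName "chroma_db.failed_merge"
# Copy (rather than rename) so the backup itself is kept
Copy-Item -Path ".\chroma_db.backup_$timestamp" -Destination ".\chroma_db" -Recurse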
Problem: Collection name conflicts
Resolution:
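# Collection names must be unique within a ChromaDB database. UUID folder names are random
# and practically never collide, so a conflict means an external collection shares a name
# with one of yours.
# Do NOT rename the UUID folders: chroma.sqlite3 references them by segment ID, so a
# renamed folder no longer matches its segment row.
# BETTER: Import the external collection under a different name
# (for example, by re-importing it through the ChromaDB client with a new name)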
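You can detect name conflicts before copying anything by comparing the `collections` tables of the two files. Below is a minimal Python sketch. It assumes the ChromaDB 0.4+ schema (a `collections` table with a `name` column) and a single tenant/database per file. `collection_names` and `conflicting_names` are hypothetical helpers. Both files are opened read-only.

```python
import sqlite3
from pathlib import Path


def collection_names(db_path):
    """Return the set of collection names registered in one chroma.sqlite3 (read-only)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return {row[0] for row in conn.execute("SELECT name FROM collections")}
    finally:
        conn.close()


def conflicting_names(target_db, external_db):
    """Collection names present in both databases, sorted."""
    return sorted(collection_names(target_db) & collection_names(external_db))


# Usage:
#   print(conflicting_names(r".\chroma_db\chroma.sqlite3",
#                           r"C:\path\to\external\chroma_db\chroma.sqlite3"))
```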
Problem: File permission errors
Solution:
# Run PowerShell as Administrator
# Or check if files are locked by Streamlit process
# Restart PowerShell in admin mode:
Start-Process powershell -Verb RunAs
Safe Copy Command (Ready to Use)
For copying external collections safely:
# Set paths
$externalPath = "C:\path\to\external\chroma_db" # Update this
$projectPath = "d:\CapStoneProject\RAG Capstone Project"
$targetPath = "$projectPath\chroma_db"
# Backup current
$timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
Copy-Item -Path $targetPath -Destination "$projectPath\chroma_db.backup_$timestamp" -Recurse
Write-Host "✅ Backup created: chroma_db.backup_$timestamp"
# Copy external collections
$count = 0
Get-ChildItem -Path $externalPath -Directory | Where-Object {$_.Name -match "^[a-f0-9\-]{36}$"} | ForEach-Object {
$destFolder = Join-Path $targetPath $_.Name
if (-not (Test-Path $destFolder)) {
Copy-Item -Path $_.FullName -Destination $destFolder -Recurse -Force
$count++
Write-Host "✅ Copied: $($_.Name)"
} else {
Write-Host "⏭️ Skipped: $($_.Name) (already exists)"
}
}
Write-Host ""
Write-Host "✅ Copy complete! Copied $count new collection folders"
Write-Host "Next: register the copied collections in chroma.sqlite3 (Step 5). Do NOT delete chroma.sqlite3."
Summary
To safely merge with Direct Directory Copy:
- ✅ Back up your current ./chroma_db
- ✅ Copy only the external collection UUID folders
- ❌ DO NOT copy chroma.sqlite3
- ❌ DO NOT delete chroma.sqlite3 (ChromaDB does not rebuild it from the folders)
- ✅ Register the external collections (re-import through the ChromaDB client, or merge the SQLite files)
- ✅ Restart the application
- ✅ Verify all collections appear
Risk Level: 🟢 Low for your existing collections (if you follow this guide)
Your current collections are protected because:
- You back up before starting
- You never overwrite or delete chroma.sqlite3
- Copied folders that chroma.sqlite3 does not reference are ignored, not loaded
- You can restore from backup anytime