cr-agent / docs /STORAGE_MANAGEMENT.md
Sibi Krishnamoorthy
fix workflow
48a5851
# πŸ“ Storage Management Guide
## Overview
Your system uses three storage locations for organization and persistence:
```
Project Root
β”œβ”€β”€ uploads/ # Temporary files (auto-cleanup after 24h)
β”œβ”€β”€ persistent_docs/ # Permanent files (company policies, etc.)
└── chroma_db/ # Vector embeddings (independent of files)
```
## Storage Types
### uploads/
- Temporary chat uploads, one-time document queries
- Auto-deleted after 24 hours
### persistent_docs/
- Permanent storage for company policies, reference docs
- Manual cleanup only
### chroma_db/
- Persistent semantic embeddings for fast search
- Vectors remain even if source files are deleted
## Key Features
- **Automatic Cleanup:** Temporary uploads deleted after 24h (on startup or via API)
- **Persistent Documents:** Upload with `persistent=true` to store forever
- **Vector Store:** ChromaDB vectors always persist, even if files are deleted
## API Usage
### Upload File (Temporary)
```bash
curl -X POST "http://localhost:8000/upload" -F "file=@file.pdf"
# File goes to uploads/ and will be deleted after 24h
```
### Upload File (Persistent)
```bash
curl -X POST "http://localhost:8000/upload" -F "file=@file.pdf" -F "persistent=true"
# File goes to persistent_docs/ and stays forever
```
### Get Storage Info
```bash
curl http://localhost:8000/storage/info
```
### Manual Cleanup
```bash
curl -X POST "http://localhost:8000/storage/cleanup?max_age_hours=12"
# Removes temporary files older than 12 hours
```
## Vector Store Behavior
- Upload file β†’ Vectors created in chroma_db/
- Delete source file β†’ Vectors remain in chroma_db/
- Search works even if original file is gone
- To remove vectors, clear chroma_db/ manually
## Best Practices
- Use temporary storage for one-time analysis, personal uploads, testing
- Use persistent storage for policies, handbooks, SOPs, knowledge base
- Periodically clean chroma_db/ to free disk space
## Troubleshooting
- **Why can I still search deleted files?**
- Vectors persist in ChromaDB by design
- **How do I free up disk space?**
- Temporary files auto-delete; clear chroma_db/ for vectors
- **Change cleanup time?**
- Edit `cleanup_old_uploads(max_age_hours=24)` in main.py
- **Duplicate uploads?**
- Each upload gets a unique UUID filename; vectors stored by document_id
## Monitoring
Check usage regularly:
```bash
curl http://localhost:8000/storage/info
ls -lh uploads/
ls -lh persistent_docs/
du -sh chroma_db/
```
## Summary
- uploads/: Temporary, auto-cleanup (24h)
- persistent_docs/: Permanent, manual cleanup
- chroma_db/: Persistent vectors, independent of files
- Automatic and manual cleanup supported
- Storage info API for monitoring
Your multi-agent system now has production-ready storage management! πŸš€