# Documents & Images Storage Module - Implementation Summary
## ✅ Completed
### 1. Configuration (Settings Module)
- [app/core/config.py](app/core/config.py#L89-L108): Added MinIO & storage settings
  - MinIO endpoint, credentials, region
  - Storage buckets (private docs, private images, public images, temp exports)
  - File size limits (images 10MB, docs 50MB)
  - MIME type allowlists (images: png, jpeg, webp, svg; docs: pdf, docx, xlsx, csv, txt, json)
  - Presigned URL expiry (900 seconds default)
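The settings above can be pictured as one frozen object that also answers "what is the size limit for this MIME type?". This is a minimal sketch, not the contents of `app/core/config.py`. Several parts are assumptions:

- the class name;
- the `STORAGE_PRESIGN_EXPIRY_SECONDS` variable;
- the exact MIME strings;
- treating 1 MB as 1024 * 1024 bytes.

Only `STORAGE_MAX_IMAGE_MB` and `STORAGE_MAX_DOC_MB` appear in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MB = 1024 * 1024  # assumption: limits are binary megabytes

# Assumed MIME strings for the allowlisted formats (svg, docx, xlsx map to these types)
IMAGE_MIME_TYPES = frozenset({"image/png", "image/jpeg", "image/webp", "image/svg+xml"})
DOC_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "text/csv",
    "text/plain",
    "application/json",
})


@dataclass(frozen=True)
class StorageSettings:
    max_image_mb: int = 10
    max_doc_mb: int = 50
    presign_expiry_seconds: int = 900

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "StorageSettings":
        """Read overrides from the environment (or a supplied mapping), falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            max_image_mb=int(env.get("STORAGE_MAX_IMAGE_MB", "10")),
            max_doc_mb=int(env.get("STORAGE_MAX_DOC_MB", "50")),
            # Hypothetical variable name; the document only states the 900 s default.
            presign_expiry_seconds=int(env.get("STORAGE_PRESIGN_EXPIRY_SECONDS", "900")),
        )

    def max_bytes_for(self, mime_type: str) -> int:
        """Return the size limit for an allowlisted MIME type, or raise ValueError."""
        if mime_type in IMAGE_MIME_TYPES:
            return self.max_image_mb * MB
        if mime_type in DOC_MIME_TYPES:
            return self.max_doc_mb * MB
        raise ValueError(f"MIME type not allowed: {mime_type}")
```

For example, `StorageSettings.from_env({"STORAGE_MAX_IMAGE_MB": "5"}).max_bytes_for("image/png")` returns `5242880`. A type outside both allowlists is rejected before any presigned URL is issued.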
### 2. Database Model (PostgreSQL)
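- [app/documents/models.py](app/documents/models.py#L20-L49): `StoredObject` SQLAlchemy model
  - Defined in the `trans` schema (table `trans.stored_objects`)
  - Fields: id, tenant_id, domain, entity_id, category, bucket, object_key, file_name, mime_type, file_size, checksum, visibility, created_by, created_at, deleted_at, legal_hold
  - Indexes on tenant+domain+entity (active records) and tenant+checksum (dedup)
  - Soft delete support via `deleted_at`
  - Unique constraint on (tenant, domain, entity, object_key, deleted_at). PostgreSQL treats NULLs as distinct in unique constraints, so this alone does not stop two active rows (`deleted_at IS NULL`) from sharing a key. A partial unique index with `WHERE deleted_at IS NULL` (or `UNIQUE NULLS NOT DISTINCT` on PostgreSQL 15+) closes that gap.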
### 3. Persistence Layer (Repository)
- [app/documents/repository.py](app/documents/repository.py#L12-L94): `DocumentRepository`
  - Async SQLAlchemy operations
  - Create upload placeholder
  - Finalize upload with checksum
  - Look up active and deleted objects
  - Resolve by ID or by composite key (domain, entity, category, filename)
  - Checksum deduplication lookup
  - Soft delete
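The repository contract above can be made concrete with an in-memory stand-in, similar in spirit to the fake repository used by the tests. It is a sketch under stated assumptions:

- it is synchronous for brevity, whereas the real `DocumentRepository` uses async SQLAlchemy;
- the class and method names are hypothetical.

It shows three behaviors: tenant scoping on every lookup, the checksum dedup lookup, and soft delete hiding a row without removing it.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class StoredObject:
    tenant_id: str
    domain: str
    entity_id: str
    object_key: str
    checksum_sha256: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    deleted_at: Optional[datetime] = None


class InMemoryDocumentRepository:
    """Dict-backed stand-in: every lookup is scoped to tenant_id and skips soft-deleted rows."""

    def __init__(self) -> None:
        self._rows: Dict[str, StoredObject] = {}

    def create_placeholder(self, obj: StoredObject) -> StoredObject:
        self._rows[obj.id] = obj
        return obj

    def get_active(self, tenant_id: str, object_id: str) -> Optional[StoredObject]:
        obj = self._rows.get(object_id)
        if obj is None or obj.tenant_id != tenant_id or obj.deleted_at is not None:
            return None
        return obj

    def finalize(self, tenant_id: str, object_id: str, checksum: str) -> StoredObject:
        obj = self.get_active(tenant_id, object_id)
        if obj is None:
            raise KeyError(object_id)
        obj.checksum_sha256 = checksum
        return obj

    def find_by_checksum(self, tenant_id: str, checksum: str,
                         exclude_id: Optional[str] = None) -> Optional[StoredObject]:
        for obj in self._rows.values():
            if (obj.tenant_id == tenant_id and obj.deleted_at is None
                    and obj.checksum_sha256 == checksum and obj.id != exclude_id):
                return obj
        return None

    def soft_delete(self, tenant_id: str, object_id: str) -> bool:
        obj = self.get_active(tenant_id, object_id)
        if obj is None:
            return False
        obj.deleted_at = datetime.now(timezone.utc)  # row is kept, only hidden from lookups
        return True
```

Because the tenant check lives inside the lookups, another tenant gets "not found" rather than "forbidden". A soft-deleted object stops matching dedup lookups.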
### 4. Storage Adapter (MinIO)
- [app/documents/storage_adapter.py](app/documents/storage_adapter.py): `MinioStorageAdapter`
  - Presigned PUT URLs (single-part uploads)
  - Presigned GET URLs (downloads)
  - Checksum verification (ETag comparison)
  - Async wrapper using executor for blocking MinIO client
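The "async wrapper using executor" pattern can be sketched as follows. The minio-py client is synchronous, and generating a presigned URL can involve a network call to look up the bucket region when none is configured. Running it in a thread keeps the event loop free.

The sketch relies on `presigned_put_object` and `presigned_get_object` from minio-py 7.x, each taking an `expires` `timedelta`. The wrapper class and the fake client are hypothetical.

```python
import asyncio
import functools
from datetime import timedelta
from typing import Any


class AsyncPresigner:
    """Offloads a blocking presign client to the default thread-pool executor."""

    def __init__(self, client: Any, expiry_seconds: int = 900) -> None:
        self._client = client
        self._expires = timedelta(seconds=expiry_seconds)

    async def _run(self, fn: Any, *args: Any, **kwargs: Any) -> Any:
        loop = asyncio.get_running_loop()
        # run_in_executor only forwards positional args, hence functools.partial
        return await loop.run_in_executor(None, functools.partial(fn, *args, **kwargs))

    async def presigned_put(self, bucket: str, key: str) -> str:
        return await self._run(self._client.presigned_put_object, bucket, key, expires=self._expires)

    async def presigned_get(self, bucket: str, key: str) -> str:
        return await self._run(self._client.presigned_get_object, bucket, key, expires=self._expires)


class FakeMinioClient:
    """Test double exposing the two minio.Minio methods used above; URLs are not really signed."""

    def presigned_put_object(self, bucket_name: str, object_name: str, expires: timedelta) -> str:
        return f"http://localhost:9000/{bucket_name}/{object_name}?X-Amz-Expires={int(expires.total_seconds())}"

    def presigned_get_object(self, bucket_name: str, object_name: str, expires: timedelta) -> str:
        return f"http://localhost:9000/{bucket_name}/{object_name}?X-Amz-Expires={int(expires.total_seconds())}"
```

Usage: `asyncio.run(AsyncPresigner(FakeMinioClient()).presigned_put("documents-private", "t1/po/po-123/invoice/invoice.pdf"))`. On Python 3.9+, `asyncio.to_thread` is an equivalent, shorter alternative to `run_in_executor`.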
### 5. RBAC (Role-Based Access Control)
- [app/documents/rbac.py](app/documents/rbac.py): `DocumentRBAC`
  - Tenant isolation enforcement
  - Role-based permissions:
    - buyer: po, grn, inventory domains
    - supplier: dispatch, returns domains
    - ops/operations: all domains, but cannot bypass legal_hold
    - admin: full access
  - Read/write/delete checks with domain enforcement
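The RBAC rules above reduce to a role-to-domain table plus a tenant check. This is a minimal sketch, not `DocumentRBAC` itself, and several details are assumptions:

- the full domain set, which adds `promotions` from the test suite's unauthorized-upload case;
- tenant isolation applying to admin as well;
- reading "all except legal_hold bypass" as meaning only admin may override `legal_hold`.

The function and class names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical domain universe; per-role domains follow the rules listed above.
ALL_DOMAINS = frozenset({"po", "grn", "inventory", "dispatch", "returns", "promotions"})

ROLE_DOMAINS = {
    "buyer": frozenset({"po", "grn", "inventory"}),
    "supplier": frozenset({"dispatch", "returns"}),
    "ops": ALL_DOMAINS,
    "operations": ALL_DOMAINS,
    "admin": ALL_DOMAINS,
}


class PermissionDenied(Exception):
    pass


@dataclass(frozen=True)
class CurrentUser:
    tenant_id: str
    role: str


def check_access(user: CurrentUser, resource_tenant_id: str, domain: str) -> None:
    """Raise PermissionDenied unless the user may touch this tenant's domain."""
    # Tenant isolation is checked first and applies to every role (assumption: admin included).
    if user.tenant_id != resource_tenant_id:
        raise PermissionDenied("cross-tenant access")
    if domain not in ROLE_DOMAINS.get(user.role, frozenset()):
        raise PermissionDenied(f"role {user.role!r} may not access domain {domain!r}")


def can_bypass_legal_hold(user: CurrentUser) -> bool:
    # Assumed reading of the rules above: ops/operations cannot bypass legal_hold, admin can.
    return user.role == "admin"
```

Unknown roles fall through to an empty domain set, so they are denied by default rather than allowed.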
### 6. Service Layer (Business Logic)
- [app/documents/service.py](app/documents/service.py#L24-L125): `DocumentService`
  - Init upload: RBAC → validation → presigned URL generation → metadata storage
  - Complete upload: checksum verification → deduplication → metadata finalization
  - Generate download URL: RBAC → presigned URL issuance
  - Get metadata: RBAC → record fetch
  - Soft delete: RBAC → legal_hold check → deletion
  - MIME validation (images vs docs)
  - Bucket selection by category/visibility
  - Object key building: `<tenant>/<domain>/<entity>/<category>/<filename>`
  - Filename sanitization (strips path separators, regex substitution)
  - Cache integration (optional Redis with TTL)
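Object key building and filename sanitization can be sketched as follows. The sketch assumes:

- NFKC normalization, since the Security section mentions unicode normalization;
- keeping only the last path component;
- an `[A-Za-z0-9._-]` character allowlist.

The real regex in `service.py` may differ, and both function names are hypothetical.

```python
import re
import unicodedata

# Assumed allowlist: each run of other characters collapses to a single underscore.
_UNSAFE_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_filename(name: str) -> str:
    """Normalize, drop any directory part, and replace unsafe characters."""
    name = unicodedata.normalize("NFKC", name)
    # Treat both separators as path separators and keep only the last component.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    name = _UNSAFE_RUN.sub("_", name).strip("._")
    if not name:
        raise ValueError("filename is empty after sanitization")
    return name


def build_object_key(tenant_id: str, domain: str, entity_id: str,
                     category: str, file_name: str) -> str:
    """Return <tenant>/<domain>/<entity>/<category>/<filename>."""
    segments = [tenant_id, domain, entity_id, category]
    for seg in segments:
        if not seg or "/" in seg or seg in (".", ".."):
            raise ValueError(f"invalid key segment: {seg!r}")
    return "/".join(segments + [sanitize_filename(file_name)])
```

Normalizing before splitting matters. NFKC maps look-alikes such as the fullwidth solidus (U+FF0F) to `/`, so they are stripped like ordinary separators. For example, `build_object_key("t1", "po", "po-123", "invoice", "../invoice.pdf")` yields `t1/po/po-123/invoice/invoice.pdf`.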
### 7. Schemas (Pydantic DTOs)
- [app/documents/schemas.py](app/documents/schemas.py): Request/response models
  - `UploadInitRequest`: domain, entity_id, category, file_name, mime_type, file_size, visibility
  - `UploadInitResponse`: upload_id, bucket, object_key, presigned_urls
  - `UploadCompleteRequest`: upload_id, checksum_sha256, parts
  - `DownloadUrlRequest`: object_id OR composite lookup (domain, entity_id, category, file_name)
  - `DownloadUrlResponse`: url, expires_in
  - `ObjectMetadata`: full record DTO with `Config.from_attributes=True`
  - `DeleteResponse`: status
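The "object_id OR composite lookup" rule in `DownloadUrlRequest` implies a cross-field check. The sketch below uses a stdlib dataclass, and the class name `DownloadUrlLookup` is hypothetical. In the Pydantic schema, the same check would naturally live in a `@model_validator(mode="after")` method. It requires exactly one lookup mode, and a complete composite key when that mode is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DownloadUrlLookup:
    object_id: Optional[str] = None
    domain: Optional[str] = None
    entity_id: Optional[str] = None
    category: Optional[str] = None
    file_name: Optional[str] = None

    def __post_init__(self) -> None:
        composite = (self.domain, self.entity_id, self.category, self.file_name)
        has_id = self.object_id is not None
        has_composite = all(v is not None for v in composite)
        # A half-filled composite key is always an error.
        if any(v is not None for v in composite) and not has_composite:
            raise ValueError("composite lookup needs domain, entity_id, category and file_name")
        # Exactly one of the two lookup modes must be used.
        if has_id == has_composite:
            raise ValueError("provide either object_id or the composite fields, not both or neither")

    @property
    def by_id(self) -> bool:
        return self.object_id is not None
```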
### 8. FastAPI Router (API Endpoints)
- [app/documents/router.py](app/documents/router.py): `/scm/storage` routes
  - `POST /upload/init` → `UploadInitResponse`
  - `POST /upload/complete` → `{id, deduplicated?}`
  - `POST /download-url` → `DownloadUrlResponse`
  - `GET /{object_id}` → `ObjectMetadata`
  - `DELETE /{object_id}` → `DeleteResponse`
  - Dependency injection: MinIO adapter (lazy-loaded singleton), DocumentService via session factory, RBAC, cache
  - Auth guard: `get_current_user` required on all routes
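A "lazy-loaded singleton" dependency is commonly written as a factory wrapped in `functools.lru_cache`. This is a sketch of that pattern, not the router's actual code. `StorageAdapterStub` and `get_storage_adapter` are hypothetical names; only `MINIO_ENDPOINT` and `MINIO_SECURE` come from this document's `.env`.

```python
import os
from functools import lru_cache


class StorageAdapterStub:
    """Placeholder for MinioStorageAdapter; counts how often it is constructed."""

    instances = 0

    def __init__(self, endpoint: str, secure: bool) -> None:
        StorageAdapterStub.instances += 1
        self.endpoint = endpoint
        self.secure = secure


@lru_cache(maxsize=1)
def get_storage_adapter() -> StorageAdapterStub:
    """Build the adapter on first use, then return the same instance every time."""
    return StorageAdapterStub(
        endpoint=os.environ.get("MINIO_ENDPOINT", "localhost:9000"),
        secure=os.environ.get("MINIO_SECURE", "false").lower() == "true",
    )
```

In FastAPI, a route would take `adapter = Depends(get_storage_adapter)`. Tests can replace it through `app.dependency_overrides[get_storage_adapter]`, or reset it with `get_storage_adapter.cache_clear()`.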
### 9. App Integration
- [app/main.py](app/main.py#L39): Imported documents router
- [app/main.py](app/main.py#L146-L152): Registered `documents_router` in FastAPI
### 10. Tests (Unit & Integration)
- [tests/test_documents_storage.py](tests/test_documents_storage.py): **5 tests, all passing** ✅
  - Fake repository with in-memory storage
  - Fake MinIO adapter with presigned URL simulation
  - Test fixture with dependency overrides for service/auth
  - `test_presign_and_upload_flow`: Upload init → complete → finalize
  - `test_rbac_blocks_unauthorized_role`: Supplier cannot upload to promotions
  - `test_tenant_isolation_on_fetch`: Tenant-2 cannot see tenant-1 objects
  - `test_checksum_mismatch_raises_conflict`: 409 on checksum failure
  - `test_soft_delete_hides_object`: Deleted objects return 404
### 11. Documentation
- [app/documents/README.md](app/documents/README.md): Setup guide
  - MinIO environment variables
  - Local MinIO Docker setup with bucket creation
  - Database migration SQL
  - API endpoints overview
  - RBAC rules
  - Test execution
### 12. Environment Configuration
- [.env.example](.env.example): Updated with storage section
  - MinIO credentials, endpoint, secure flag
  - Bucket names
  - Size limits
  - MIME allowlists
---
## 📦 Dependencies Added
- minio>=7.1.0,<8.0.0 (updated in requirements.txt)
---
## 🚀 Next Steps (Manual)
### 1. Set Up MinIO (Local Development)
```bash
docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"
# Give the server a few seconds to start before configuring it
sleep 5
# Create buckets (--ignore-existing makes the loop safe to re-run)
docker exec minio mc alias set local http://localhost:9000 minioadmin minioadmin
for bucket in documents-private images-private images-public exports-temp; do
  docker exec minio mc mb --ignore-existing "local/$bucket"
done
# Make images-public publicly readable (anonymous download)
docker exec minio mc anonymous set download local/images-public
```
### 2. Create Database Table
```bash
# psql does not understand SQLAlchemy driver suffixes, so strip "+asyncpg" from the URL if present
psql "${DATABASE_URL/+asyncpg/}" << 'EOF'
CREATE SCHEMA IF NOT EXISTS trans;
CREATE TABLE IF NOT EXISTS trans.stored_objects (
  id UUID PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  domain TEXT NOT NULL,
  entity_id TEXT NOT NULL,
  bucket_name TEXT NOT NULL,
  object_key TEXT NOT NULL,
  category TEXT NOT NULL,
  file_name TEXT NOT NULL,
  mime_type TEXT NOT NULL,
  file_size BIGINT,
  checksum_sha256 TEXT,
  visibility VARCHAR(16) NOT NULL DEFAULT 'private',
  created_by TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  deleted_at TIMESTAMP,
  legal_hold BOOLEAN NOT NULL DEFAULT FALSE
);
-- NULLs are distinct in PostgreSQL unique constraints, so UNIQUE (..., deleted_at)
-- would not block duplicate active rows; enforce uniqueness on active rows only.
CREATE UNIQUE INDEX IF NOT EXISTS uq_stored_object_active_key
  ON trans.stored_objects (tenant_id, domain, entity_id, object_key)
  WHERE deleted_at IS NULL;
CREATE INDEX IF NOT EXISTS ix_stored_objects_active ON trans.stored_objects (tenant_id, domain, entity_id) WHERE deleted_at IS NULL;
CREATE INDEX IF NOT EXISTS ix_stored_objects_checksum ON trans.stored_objects (tenant_id, checksum_sha256) WHERE deleted_at IS NULL;
EOF
```
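A unique constraint that includes `deleted_at` does not prevent duplicate active rows. SQL treats NULLs as distinct, so two rows with `deleted_at IS NULL` never collide. A partial unique index restricted to active rows does enforce the rule.

This sketch uses Python's built-in `sqlite3` as a stand-in, because SQLite applies the same NULL rule to UNIQUE as PostgreSQL does by default. The table is trimmed to the relevant columns, and the names are illustrative.

```python
import sqlite3


def active_rows_after_two_inserts(use_partial_index: bool) -> int:
    """Insert the same active (deleted_at IS NULL) key twice; return how many rows were accepted."""
    con = sqlite3.connect(":memory:")
    if use_partial_index:
        con.execute("CREATE TABLE stored_objects (tenant_id TEXT, object_key TEXT, deleted_at TEXT)")
        con.execute(
            "CREATE UNIQUE INDEX uq_active_key ON stored_objects (tenant_id, object_key) "
            "WHERE deleted_at IS NULL"
        )
    else:
        # Mirrors UNIQUE (..., deleted_at): NULLs compare as distinct, so nothing is enforced
        con.execute(
            "CREATE TABLE stored_objects (tenant_id TEXT, object_key TEXT, deleted_at TEXT, "
            "UNIQUE (tenant_id, object_key, deleted_at))"
        )
    accepted = 0
    for _ in range(2):
        try:
            con.execute(
                "INSERT INTO stored_objects VALUES ('t1', 't1/po/po-123/invoice/invoice.pdf', NULL)"
            )
            accepted += 1
        except sqlite3.IntegrityError:
            pass
    con.close()
    return accepted
```

Without the partial index, both inserts succeed; with it, the second is rejected. On PostgreSQL 15+, `UNIQUE NULLS NOT DISTINCT` is an alternative that keeps the original column list.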
### 3. Configure .env
```env
# MinIO
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false
# Optional: Storage customizations
STORAGE_MAX_IMAGE_MB=10
STORAGE_MAX_DOC_MB=50
```
### 4. Run Application
```bash
uvicorn app.main:app --reload
```
### 5. Test API
```bash
# Request presigned URL
curl -X POST http://localhost:8000/scm/storage/upload/init \
  -H "Authorization: Bearer <your_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "po",
    "entity_id": "po-123",
    "category": "invoice",
    "file_name": "invoice.pdf",
    "mime_type": "application/pdf",
    "file_size": 102400,
    "visibility": "private"
  }'
```
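After `/upload/init`, the client uploads the bytes to the returned presigned URL, for example with `curl -X PUT --upload-file invoice.pdf "<presigned_url>"`. It then calls `/scm/storage/upload/complete`.

This sketch builds that second request body from the `UploadCompleteRequest` fields (`upload_id`, `checksum_sha256`, `parts`). Two details are assumptions: a hex-encoded digest and an empty `parts` list for a single-part upload. The helper names are hypothetical.

```python
import hashlib
import io
import json
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def upload_complete_body(upload_id: str, stream: BinaryIO) -> str:
    # Assumptions: hex-encoded digest, and an empty "parts" list for a single-part upload.
    return json.dumps({"upload_id": upload_id, "checksum_sha256": sha256_hex(stream), "parts": []})


if __name__ == "__main__":
    # In practice: open("invoice.pdf", "rb") on the same bytes that were PUT to the presigned URL.
    print(upload_complete_body("<upload_id from /upload/init>", io.BytesIO(b"%PDF-1.7 demo")))
```

The checksum must be computed over exactly the bytes that were uploaded. Otherwise the server's verification step responds with 409 Conflict.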
### 6. Troubleshooting Database Connection
If you get DNS resolution errors for the Neon database:
**Option A: Use Local PostgreSQL**
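```bash
# macOS
brew install postgresql@16
brew services start postgresql@16
# postgresql@16 is keg-only, so call createdb from its prefix if it is not on PATH
"$(brew --prefix postgresql@16)/bin/createdb" cuatrolabs
# Then update .env, replacing <your_username> with the output of `whoami`
# (.env loaders generally do not run shell commands, so $(whoami) would stay literal)
DB_HOST=localhost
DB_PORT=5432
DB_USER=<your_username>
DATABASE_URL=postgresql+asyncpg://<your_username>@localhost:5432/cuatrolabs
```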
**Option B: Resume Neon Database**
- Go to https://console.neon.tech
- Open the project whose compute endpoint is `ep-sweet-surf-a1qeduoy`
- Click "Resume" if the compute is paused
- Wait about 30 seconds and retry
---
## 📊 Test Coverage
| Test | Purpose | Status |
|------|---------|--------|
| `test_presign_and_upload_flow` | Upload init → complete → finalize | ✅ PASS |
| `test_rbac_blocks_unauthorized_role` | Unauthorized role denied | ✅ PASS |
| `test_tenant_isolation_on_fetch` | Tenant-2 cannot access tenant-1 | ✅ PASS |
| `test_checksum_mismatch_raises_conflict` | Checksum validation | ✅ PASS |
| `test_soft_delete_hides_object` | Soft delete enforcement | ✅ PASS |
---
## 🏗️ Architecture
```
FastAPI Router (/scm/storage)
    ↓ (auth, DI)
DocumentService (business logic, RBAC, validation)
    ├─→ DocumentRepository (persistence)
    ├─→ MinioStorageAdapter (object storage)
    ├─→ DocumentRBAC (authorization)
    └─→ Redis Cache (metadata caching, optional)
```
Clean separation of concerns: router β†’ service β†’ repository + adapter + RBAC.
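To show how the service orchestrates its collaborators, here is a stripped-down sketch of the complete-upload path. It follows the documented order: checksum verification, then deduplication, then metadata finalization. It returns the `{id, deduplicated}` shape listed for `POST /upload/complete`.

Several parts are assumptions:

- plain dicts stand in for the repository and for the checksum the storage adapter reports;
- `ConflictError` and `UploadCompletion` are hypothetical names;
- soft-delete filtering is omitted.

How the real service disposes of the redundant placeholder and object on a dedup hit is not described here, so the sketch only returns the existing id.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ConflictError(Exception):
    """Raised on checksum mismatch; a router would translate this to HTTP 409."""


@dataclass
class Record:
    id: str
    tenant_id: str
    checksum_sha256: Optional[str] = None


class UploadCompletion:
    def __init__(self, records: Dict[str, Record], stored_checksums: Dict[str, str]) -> None:
        self.records = records                    # stand-in for DocumentRepository
        self.stored_checksums = stored_checksums  # stand-in for what the storage adapter reports

    def complete(self, tenant_id: str, upload_id: str, client_checksum: str) -> dict:
        rec = self.records.get(upload_id)
        if rec is None or rec.tenant_id != tenant_id:
            raise KeyError(upload_id)  # unknown or other tenant's upload: treat as not found
        # 1. Verify: the stored object must match what the client says it uploaded.
        if self.stored_checksums.get(upload_id) != client_checksum:
            raise ConflictError("checksum mismatch")
        # 2. Deduplicate against finalized objects of the same tenant.
        for other in self.records.values():
            if (other.id != upload_id and other.tenant_id == tenant_id
                    and other.checksum_sha256 == client_checksum):
                return {"id": other.id, "deduplicated": True}
        # 3. Finalize the metadata record.
        rec.checksum_sha256 = client_checksum
        return {"id": rec.id, "deduplicated": False}
```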
---
## 🔒 Security
- **Tenant isolation**: All queries scoped to tenant_id
- **RBAC**: Role-based domain access (buyer, supplier, ops, admin)
- **Soft deletes**: No physical data removal; legal_hold flag prevents deletion
- **Presigned URLs**: Time-limited (900 s by default) read/write access issued only through authenticated API calls; clients never receive MinIO credentials
- **Checksum validation**: SHA-256 verification for upload integrity
- **Filename sanitization**: Path traversal prevention, unicode normalization
- **MIME validation**: Allowlist-based file type checking
---
## 📝 Notes
- **Future extraction**: The module is self-contained and its APIs already follow service-boundary patterns, so it can later be extracted into a separate microservice.
- **Cache optional**: Redis integration degrades gracefully; operations work without the cache.
- **Presigned URLs**: Downloads go directly from MinIO to the client, bypassing FastAPI and reducing latency for large files.
- **Multipart support**: The schemas already carry a `presigned_urls` list and a `parts` field, but the current adapter issues a single presigned PUT; large files will need an upgrade to multipart uploads.
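The "cache optional" behavior amounts to a read-through cache that treats every cache error as a miss. This is a minimal async sketch, assuming a client with `get(key)` and `set(key, value, ex=ttl)` coroutines like `redis.asyncio.Redis`. `InMemoryCache`, `cached_fetch` and the key format are hypothetical.

```python
import asyncio
import json
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

log = logging.getLogger(__name__)


class InMemoryCache:
    """Async stand-in exposing the get/set(ex=...) subset of redis.asyncio.Redis used below."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    async def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    async def set(self, key: str, value: str, ex: Optional[int] = None) -> None:
        self.data[key] = value  # TTL ignored in this stand-in


async def cached_fetch(cache: Optional[Any], key: str, ttl_seconds: int,
                       loader: Callable[[], Awaitable[dict]]) -> dict:
    """Read-through cache where any cache error falls back to the loader."""
    if cache is not None:
        try:
            hit = await cache.get(key)
            if hit is not None:
                return json.loads(hit)
        except Exception:  # a cache outage must not fail the request
            log.warning("cache read failed for %s; using the database", key)
    value = await loader()
    if cache is not None:
        try:
            await cache.set(key, json.dumps(value), ex=ttl_seconds)
        except Exception:
            log.warning("cache write failed for %s", key)
    return value


if __name__ == "__main__":
    async def load_from_db() -> dict:
        return {"id": "obj-1", "file_name": "invoice.pdf"}

    print(asyncio.run(cached_fetch(InMemoryCache(), "doc:t1:obj-1", 300, load_from_db)))
```

Passing `cache=None` gives the "no Redis configured" path. A failing Redis only costs the extra database read.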