# Documents & Images Storage Module - Implementation Summary

## ✅ Completed

### 1. Configuration (Settings Module)

- [app/core/config.py](app/core/config.py#L89-L108): Added MinIO & storage settings
  - MinIO endpoint, credentials, region
  - Storage buckets (private docs, private images, public images, temp exports)
  - File size limits (images 10MB, docs 50MB)
  - MIME type allowlists (images: png, jpeg, webp, svg; docs: pdf, docx, xlsx, csv, txt, json)
  - Presigned URL expiry (900 seconds default)

### 2. Database Model (PostgreSQL)

- [app/documents/models.py](app/documents/models.py#L20-L49): `StoredObject` SQLAlchemy model
  - Trans schema compliance
  - Fields: id, tenant_id, domain, entity_id, category, bucket, object_key, file_name, mime_type, file_size, checksum, visibility, created_by, created_at, deleted_at, legal_hold
  - Indexes on tenant+domain+entity (active records), tenant+checksum (dedup)
  - Soft delete support via deleted_at
  - Unique constraint on (tenant, domain, entity, object_key, deleted_at)

### 3. Persistence Layer (Repository)

- [app/documents/repository.py](app/documents/repository.py#L12-L94): `DocumentRepository`
  - Async SQLAlchemy operations
  - Create upload placeholder
  - Finalize upload with checksum
  - Get active/deleted object lookups
  - Resolve by ID or by composite (domain, entity, category, filename)
  - Checksum deduplication lookup
  - Soft delete

### 4. Storage Adapter (MinIO)

- [app/documents/storage_adapter.py](app/documents/storage_adapter.py): `MinioStorageAdapter`
  - Presigned PUT URLs (single-part uploads)
  - Presigned GET URLs (downloads)
  - Checksum verification (ETag comparison)
  - Async wrapper using executor for blocking MinIO client

### 5. RBAC (Role-Based Access Control)

- [app/documents/rbac.py](app/documents/rbac.py): `DocumentRBAC`
  - Tenant isolation enforcement
  - Role-based permissions:
    - buyer: po, grn, inventory domains
    - supplier: dispatch, returns domains
    - ops/operations: all except legal_hold bypass
    - admin: full access
  - Check read/write/delete with domain enforcement

### 6. Service Layer (Business Logic)

- [app/documents/service.py](app/documents/service.py#L24-L125): `DocumentService`
  - Init upload: RBAC → validation → presigned URL generation → metadata storage
  - Complete upload: checksum verification → deduplication → metadata finalization
  - Generate download URL: RBAC → presigned URL issuance
  - Get metadata: RBAC → record fetch
  - Soft delete: RBAC → legal_hold check → deletion
  - MIME validation (images vs docs)
  - Bucket selection by category/visibility
  - Object key building: `////`
  - Filename sanitization (strips path separators, regex substitution)
  - Cache integration (optional Redis with TTL)

### 7. Schemas (Pydantic DTOs)

- [app/documents/schemas.py](app/documents/schemas.py): Request/response models
  - `UploadInitRequest`: domain, entity_id, category, file_name, mime_type, file_size, visibility
  - `UploadInitResponse`: upload_id, bucket, object_key, presigned_urls
  - `UploadCompleteRequest`: upload_id, checksum_sha256, parts
  - `DownloadUrlRequest`: object_id OR composite lookup (domain, entity_id, category, file_name)
  - `DownloadUrlResponse`: url, expires_in
  - `ObjectMetadata`: full record DTO with Config.from_attributes=True
  - `DeleteResponse`: status

### 8. FastAPI Router (API Endpoints)

- [app/documents/router.py](app/documents/router.py): `/scm/storage` routes
  - `POST /upload/init` → UploadInitResponse
  - `POST /upload/complete` → {id, deduplicated?}
  - `POST /download-url` → DownloadUrlResponse
  - `GET /{object_id}` → ObjectMetadata
  - `DELETE /{object_id}` → DeleteResponse
  - Dependency injection: MinIO adapter (lazy-loaded singleton), DocumentService via session factory, RBAC, cache
  - Auth guard: `get_current_user` required on all routes

### 9. App Integration

- [app/main.py](app/main.py#L39): Imported documents router
- [app/main.py](app/main.py#L146-L152): Registered `documents_router` in FastAPI

### 10. Tests (Unit & Integration)

- [tests/test_documents_storage.py](tests/test_documents_storage.py): **5 tests, all passing** ✅
  - Fake repository with in-memory storage
  - Fake MinIO adapter with presigned URL simulation
  - Test fixture with dependency overrides for service/auth
  - `test_presign_and_upload_flow`: Upload init → complete → finalize
  - `test_rbac_blocks_unauthorized_role`: Supplier cannot upload to promotions
  - `test_tenant_isolation_on_fetch`: Tenant-2 cannot see tenant-1 objects
  - `test_checksum_mismatch_raises_conflict`: 409 on checksum failure
  - `test_soft_delete_hides_object`: Deleted objects return 404

### 11. Documentation

- [app/documents/README.md](app/documents/README.md): Setup guide
  - MinIO environment variables
  - Local MinIO Docker setup with bucket creation
  - Database migration SQL
  - API endpoints overview
  - RBAC rules
  - Test execution

### 12. Environment Configuration

- [.env.example](.env.example): Updated with storage section
  - MinIO credentials, endpoint, secure flag
  - Bucket names
  - Size limits
  - MIME allowlists

---

## 📦 Dependencies Added

- `minio>=7.1.0,<8.0.0` (updated in requirements.txt)

---

## 🚀 Next Steps (Manual)

### 1. Set Up MinIO (Local Development)

```bash
docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"

# Create buckets
docker exec minio mc alias set local http://localhost:9000 minioadmin minioadmin
for bucket in documents-private images-private images-public exports-temp; do
  docker exec minio mc mb local/$bucket
done

# Make images-public public
docker exec minio mc anonymous set download local/images-public
```

### 2. Create Database Table

```bash
psql $DATABASE_URL << 'EOF'
CREATE TABLE IF NOT EXISTS trans.stored_objects (
    id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    domain TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    bucket_name TEXT NOT NULL,
    object_key TEXT NOT NULL,
    category TEXT NOT NULL,
    file_name TEXT NOT NULL,
    mime_type TEXT NOT NULL,
    file_size BIGINT,
    checksum_sha256 TEXT,
    visibility VARCHAR(16) NOT NULL DEFAULT 'private',
    created_by TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMP,
    legal_hold BOOLEAN NOT NULL DEFAULT FALSE,
    CONSTRAINT uq_stored_object_active_key
        UNIQUE (tenant_id, domain, entity_id, object_key, deleted_at)
);

CREATE INDEX ix_stored_objects_active
    ON trans.stored_objects(tenant_id, domain, entity_id)
    WHERE deleted_at IS NULL;

CREATE INDEX ix_stored_objects_checksum
    ON trans.stored_objects(tenant_id, checksum_sha256)
    WHERE deleted_at IS NULL;
EOF
```

### 3. Configure .env

```env
# MinIO
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false

# Optional: Storage customizations
STORAGE_MAX_IMAGE_MB=10
STORAGE_MAX_DOC_MB=50
```

### 4. Run Application

```bash
uvicorn app.main:app --reload
```

### 5. Test API

```bash
# Request presigned URL
curl -X POST http://localhost:8000/scm/storage/upload/init \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "po",
    "entity_id": "po-123",
    "category": "invoice",
    "file_name": "invoice.pdf",
    "mime_type": "application/pdf",
    "file_size": 102400,
    "visibility": "private"
  }'
```

### 6. Troubleshooting Database Connection

If you get DNS resolution errors for the Neon database:

**Option A: Use Local PostgreSQL**

```bash
# macOS
brew install postgresql@16
brew services start postgresql@16
createdb cuatrolabs

# Update .env
DB_HOST=localhost
DB_PORT=5432
DB_USER=$(whoami)
DATABASE_URL=postgresql+asyncpg://$(whoami)@localhost:5432/cuatrolabs
```

**Option B: Resume Neon Database**

- Go to https://console.neon.tech
- Find project ep-sweet-surf-a1qeduoy
- Click "Resume" if paused
- Wait 30 seconds and retry

---

## 📊 Test Coverage

| Test | Purpose | Status |
|------|---------|--------|
| `test_presign_and_upload_flow` | Upload init → complete → finalize | ✅ PASS |
| `test_rbac_blocks_unauthorized_role` | Unauthorized role denied | ✅ PASS |
| `test_tenant_isolation_on_fetch` | Tenant-2 cannot access tenant-1 | ✅ PASS |
| `test_checksum_mismatch_raises_conflict` | Checksum validation | ✅ PASS |
| `test_soft_delete_hides_object` | Soft delete enforcement | ✅ PASS |

---

## 🏗️ Architecture

```
FastAPI Router (/scm/storage)
    ↓ (auth, DI)
DocumentService (business logic, RBAC, validation)
    ├─→ DocumentRepository (persistence)
    ├─→ MinioStorageAdapter (object storage)
    ├─→ DocumentRBAC (authorization)
    └─→ Redis Cache (metadata caching, optional)
```

Clean separation of concerns: router → service → repository + adapter + RBAC.
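
To make the layering concrete, here is a minimal, standard-library-only sketch of the init-upload path (RBAC → validation → presigned URL → metadata storage). It is not the module's actual code: every class and method name (`SketchDocumentService`, `SketchRBAC`, `InMemoryRepository`, `FakeStorageAdapter`, `create_upload_placeholder`, `presigned_put_url`) is hypothetical, the MIME set is an assumed subset of the allowlist, and the object key is a placeholder rather than the module's real key layout.

```python
import asyncio
import uuid
from dataclasses import dataclass, field

# Role -> writable domains, from the RBAC summary. None means "every domain".
# The legal_hold distinction between ops and admin is not modeled here.
ROLE_DOMAINS = {
    "buyer": {"po", "grn", "inventory"},
    "supplier": {"dispatch", "returns"},
    "ops": None,
    "operations": None,
    "admin": None,
}
# Assumed subset of the document MIME allowlist (pdf, csv, txt, json).
ALLOWED_DOC_MIME = {"application/pdf", "text/csv", "text/plain", "application/json"}
MAX_DOC_BYTES = 50 * 1024 * 1024  # 50MB document limit
PRESIGN_EXPIRY_SECONDS = 900


class PermissionDenied(Exception):
    """Raised when the role may not write to the requested domain."""


class ValidationFailed(Exception):
    """Raised when MIME type or size is outside the configured limits."""


@dataclass
class User:
    tenant_id: str
    role: str


class SketchRBAC:
    def check_write(self, user: User, domain: str) -> None:
        allowed = ROLE_DOMAINS.get(user.role, set())  # unknown role -> no domains
        if allowed is not None and domain not in allowed:
            raise PermissionDenied(f"{user.role} cannot write to {domain}")


@dataclass
class InMemoryRepository:
    rows: dict = field(default_factory=dict)

    async def create_upload_placeholder(self, **metadata) -> str:
        upload_id = str(uuid.uuid4())
        self.rows[upload_id] = metadata
        return upload_id


class FakeStorageAdapter:
    async def presigned_put_url(self, bucket: str, object_key: str) -> str:
        # A real adapter would ask MinIO; this only builds a URL-shaped string.
        return f"http://localhost:9000/{bucket}/{object_key}?expires={PRESIGN_EXPIRY_SECONDS}"


class SketchDocumentService:
    def __init__(self, repo, storage, rbac):
        self.repo, self.storage, self.rbac = repo, storage, rbac

    async def init_upload(self, user, domain, entity_id, file_name, mime_type, file_size):
        self.rbac.check_write(user, domain)  # 1. RBAC
        if mime_type not in ALLOWED_DOC_MIME:  # 2. validation
            raise ValidationFailed(f"MIME type not allowed: {mime_type}")
        if file_size > MAX_DOC_BYTES:
            raise ValidationFailed("file exceeds the document size limit")
        # Placeholder key; the module's real key layout is not reproduced here.
        object_key = f"sketch/{uuid.uuid4()}/{file_name}"
        # 3. presigned URL generation
        url = await self.storage.presigned_put_url("documents-private", object_key)
        # 4. metadata storage
        upload_id = await self.repo.create_upload_placeholder(
            tenant_id=user.tenant_id, domain=domain, entity_id=entity_id,
            object_key=object_key, file_name=file_name, mime_type=mime_type,
        )
        return {"upload_id": upload_id, "object_key": object_key, "url": url}


if __name__ == "__main__":
    service = SketchDocumentService(InMemoryRepository(), FakeStorageAdapter(), SketchRBAC())
    buyer = User(tenant_id="tenant-1", role="buyer")
    result = asyncio.run(
        service.init_upload(buyer, "po", "po-123", "invoice.pdf", "application/pdf", 102400)
    )
    print(result["url"])
```

Because the service only depends on the three collaborators passed to its constructor, each one can be swapped for a fake, which is the same idea the test suite uses with its fake repository and fake MinIO adapter.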

---

## 🔒 Security

- **Tenant isolation**: All queries scoped to tenant_id
- **RBAC**: Role-based domain access (buyer, supplier, ops, admin)
- **Soft deletes**: No physical data removal; legal_hold flag prevents deletion
- **Presigned URLs**: Time-limited (900s) read/write access; no direct MinIO exposure
- **Checksum validation**: SHA-256 verification for upload integrity
- **Filename sanitization**: Path traversal prevention, unicode normalization
- **MIME validation**: Allowlist-based file type checking

---

## 📝 Notes

- **Future extraction**: The module is designed as a self-contained service and can be extracted into a separate microservice later (the APIs already follow service boundary patterns).
- **Cache optional**: Redis integration degrades gracefully; operations work without the cache.
- **Presigned URLs**: Direct downloads bypass FastAPI, reducing latency for large files.
- **Multipart support**: Infrastructure is in place; the current adapter uses a simple PUT and can be upgraded to multipart for large files.
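
As a rough illustration of the filename sanitization and SHA-256 checksum validation listed under Security, the sketch below uses only the Python standard library. The function names (`sanitize_filename`, `checksum_matches`), the character allowlist, and the `"unnamed"` fallback are assumptions made for this example, not the module's actual implementation; the real service may also verify checksums differently (the adapter summary mentions ETag comparison).

```python
import hashlib
import hmac
import re
import unicodedata


def sanitize_filename(name: str, max_length: int = 255) -> str:
    """Reduce an untrusted filename to a single safe path component."""
    # Normalize first so compatibility characters (e.g. a fullwidth slash)
    # collapse to their ASCII forms before separators are handled.
    name = unicodedata.normalize("NFKC", name)
    # Keep only the final component, treating "\" and "/" alike.
    name = name.replace("\\", "/").split("/")[-1]
    # Replace anything outside a conservative allowlist with "_".
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Drop leading dots so ".." and hidden-file names cannot survive.
    name = name.lstrip(".")
    return name[:max_length] or "unnamed"


def checksum_matches(data: bytes, expected_sha256_hex: str) -> bool:
    """Compare the SHA-256 of data with a client-supplied hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    expected = expected_sha256_hex.strip().lower()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(actual.encode(), expected.encode())


if __name__ == "__main__":
    print(sanitize_filename("../../etc/passwd"))
    payload = b"example payload"
    print(checksum_matches(payload, hashlib.sha256(payload).hexdigest()))
```

Hashing the whole payload in memory is only reasonable for small files; for large uploads a streaming hash over chunks would be the more likely choice.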