# Documents & Images Storage Module - Implementation Summary

## Completed
### 1. Configuration (Settings Module)

- [app/core/config.py](app/core/config.py#L89-L108): Added MinIO & storage settings
  - MinIO endpoint, credentials, region
  - Storage buckets (private docs, private images, public images, temp exports)
  - File size limits (images 10MB, docs 50MB)
  - MIME type allowlists (images: png, jpeg, webp, svg; docs: pdf, docx, xlsx, csv, txt, json)
  - Presigned URL expiry (900 seconds default)
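
A minimal sketch of how the size limits and MIME allowlists above might be enforced. `StorageLimits` and `validate_upload` are hypothetical names, the full MIME strings are assumed expansions of the short names listed above, and 1 MB is assumed to mean 1024 × 1024 bytes; the real settings live in `app/core/config.py` and may differ.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass(frozen=True)
class StorageLimits:
    """Hypothetical container mirroring the limits listed above."""

    max_image_bytes: int = 10 * MB
    max_doc_bytes: int = 50 * MB
    image_mime_types: frozenset = frozenset(
        {"image/png", "image/jpeg", "image/webp", "image/svg+xml"}
    )
    doc_mime_types: frozenset = frozenset(
        {
            "application/pdf",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "text/csv",
            "text/plain",
            "application/json",
        }
    )


def validate_upload(
    mime_type: str, file_size: int, limits: StorageLimits = StorageLimits()
) -> str:
    """Return "image" or "document"; raise ValueError if the upload is not allowed."""
    if mime_type in limits.image_mime_types:
        kind, max_bytes = "image", limits.max_image_bytes
    elif mime_type in limits.doc_mime_types:
        kind, max_bytes = "document", limits.max_doc_bytes
    else:
        raise ValueError(f"MIME type not allowed: {mime_type}")
    if file_size <= 0 or file_size > max_bytes:
        raise ValueError(f"{kind} size {file_size} outside 1..{max_bytes} bytes")
    return kind
```

Checking the allowlist before the size keeps the error specific: an unknown type is rejected outright rather than being measured against the wrong limit.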
### 2. Database Model (PostgreSQL)

- [app/documents/models.py](app/documents/models.py#L20-L49): `StoredObject` SQLAlchemy model
  - Follows the `trans` schema convention (table `trans.stored_objects`)
  - Fields: id, tenant_id, domain, entity_id, category, bucket, object_key, file_name, mime_type, file_size, checksum, visibility, created_by, created_at, deleted_at, legal_hold
  - Indexes on tenant+domain+entity (active records), tenant+checksum (dedup)
  - Soft delete support via deleted_at
  - Unique constraint on (tenant, domain, entity, object_key, deleted_at)
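
One detail worth checking with this constraint: PostgreSQL treats NULLs as distinct in unique constraints by default, so two active rows (both with `deleted_at` NULL) would not collide. A partial unique index is a common alternative. The sketch below illustrates the idea with SQLite from the standard library as a stand-in for PostgreSQL; the trimmed-down table and the index name `uq_active_key` are hypothetical, not the module's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stored_objects (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        object_key TEXT NOT NULL,
        deleted_at TEXT
    );
    -- Only rows that are still active take part in the uniqueness check.
    CREATE UNIQUE INDEX uq_active_key
        ON stored_objects (tenant_id, object_key)
        WHERE deleted_at IS NULL;
    """
)

KEY = "t1/po/po-123/invoice/invoice.pdf"


def insert(tenant_id: str, object_key: str) -> None:
    conn.execute(
        "INSERT INTO stored_objects (tenant_id, object_key) VALUES (?, ?)",
        (tenant_id, object_key),
    )


insert("t1", KEY)

# A second active row with the same key is rejected.
try:
    insert("t1", KEY)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# After a soft delete, the key can be reused by a new active row.
conn.execute(
    "UPDATE stored_objects SET deleted_at = '2024-01-01T00:00:00Z' WHERE tenant_id = 't1'"
)
insert("t1", KEY)

active_rows = conn.execute(
    "SELECT COUNT(*) FROM stored_objects WHERE deleted_at IS NULL"
).fetchone()[0]
total_rows = conn.execute("SELECT COUNT(*) FROM stored_objects").fetchone()[0]
```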
### 3. Persistence Layer (Repository)

- [app/documents/repository.py](app/documents/repository.py#L12-L94): `DocumentRepository`
  - Async SQLAlchemy operations
  - Create upload placeholder
  - Finalize upload with checksum
  - Look up active or deleted objects
  - Resolve by ID or by composite (domain, entity, category, filename)
  - Checksum deduplication lookup
  - Soft delete
### 4. Storage Adapter (MinIO)

- [app/documents/storage_adapter.py](app/documents/storage_adapter.py): `MinioStorageAdapter`
  - Presigned PUT URLs (single-part uploads)
  - Presigned GET URLs (downloads)
  - Checksum verification (ETag comparison)
  - Async wrapper using executor for blocking MinIO client
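
The executor-based async wrapper can be sketched as follows. This is an illustration only: `FakeBlockingClient`, its `presign_get_blocking` method, and `AsyncStorageAdapter` are hypothetical stand-ins, not the MinIO SDK or the actual `MinioStorageAdapter`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class FakeBlockingClient:
    """Hypothetical stand-in for a synchronous object-storage client."""

    def presign_get_blocking(self, bucket: str, key: str, expires_seconds: int) -> str:
        time.sleep(0.05)  # simulate blocking network I/O
        return f"http://localhost:9000/{bucket}/{key}?expires={expires_seconds}"


class AsyncStorageAdapter:
    """Runs blocking client calls in a thread pool so the event loop stays free."""

    def __init__(self, client: FakeBlockingClient, max_workers: int = 4) -> None:
        self._client = client
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    async def presign_get(self, bucket: str, key: str, expires_seconds: int = 900) -> str:
        loop = asyncio.get_running_loop()
        call = partial(self._client.presign_get_blocking, bucket, key, expires_seconds)
        return await loop.run_in_executor(self._executor, call)


async def main() -> list:
    adapter = AsyncStorageAdapter(FakeBlockingClient())
    # Both calls run concurrently in worker threads.
    return await asyncio.gather(
        adapter.presign_get("documents-private", "t1/a.pdf"),
        adapter.presign_get("documents-private", "t1/b.pdf"),
    )


urls = asyncio.run(main())
```

`run_in_executor` only accepts positional arguments for the callable, which is why the call is bound with `functools.partial` first.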
### 5. RBAC (Role-Based Access Control)

- [app/documents/rbac.py](app/documents/rbac.py): `DocumentRBAC`
  - Tenant isolation enforcement
  - Role-based permissions:
    - buyer: po, grn, inventory domains
    - supplier: dispatch, returns domains
    - ops/operations: all domains, but cannot bypass legal_hold
    - admin: full access
  - Check read/write/delete with domain enforcement
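
A rough sketch of how these rules could be expressed. The `User` class and the `can_access` / `can_delete` functions are hypothetical, not the actual `DocumentRBAC` API. Two assumptions are baked in: tenant isolation applies to every role, including admin, and "full access" for admin is read as the ability to delete objects under legal hold.

```python
from dataclasses import dataclass

ALL_DOMAINS = "*"

# Role -> allowed domains, taken from the list above.
ROLE_DOMAINS = {
    "buyer": {"po", "grn", "inventory"},
    "supplier": {"dispatch", "returns"},
    "ops": {ALL_DOMAINS},
    "operations": {ALL_DOMAINS},
    "admin": {ALL_DOMAINS},
}


@dataclass(frozen=True)
class User:
    tenant_id: str
    role: str


def can_access(user: User, object_tenant_id: str, domain: str) -> bool:
    """Read/write check: same tenant, and the role covers the domain."""
    if user.tenant_id != object_tenant_id:
        return False  # assumption: tenant isolation applies to every role
    allowed = ROLE_DOMAINS.get(user.role, set())
    return ALL_DOMAINS in allowed or domain in allowed


def can_delete(user: User, object_tenant_id: str, domain: str, legal_hold: bool) -> bool:
    """Delete check: access rules plus the legal-hold restriction."""
    if not can_access(user, object_tenant_id, domain):
        return False
    if legal_hold:
        return user.role == "admin"  # assumption: only admin may bypass legal hold
    return True
```

Unknown roles fall through to an empty set, so the default is to deny.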
### 6. Service Layer (Business Logic)

- [app/documents/service.py](app/documents/service.py#L24-L125): `DocumentService`
  - Init upload: RBAC → validation → presigned URL generation → metadata storage
  - Complete upload: checksum verification → deduplication → metadata finalization
  - Generate download URL: RBAC → presigned URL issuance
  - Get metadata: RBAC → record fetch
  - Soft delete: RBAC → legal_hold check → deletion
  - MIME validation (images vs docs)
  - Bucket selection by category/visibility
  - Object key building: `<tenant>/<domain>/<entity>/<category>/<filename>`
  - Filename sanitization (strips path separators, regex substitution)
  - Cache integration (optional Redis with TTL)
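
Filename sanitization and object key building could look roughly like this. The function names and the exact regex are assumptions; the service's real rules may be stricter or looser. Only the filename is sanitized here, and the other key components are assumed to be validated elsewhere.

```python
import re
import unicodedata

# Anything outside this conservative set is replaced with "_" (assumed policy).
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_filename(file_name: str) -> str:
    """Normalize unicode, drop directory components, and replace unsafe characters."""
    normalized = unicodedata.normalize("NFKC", file_name)
    # Keep only the last path component, for both "/" and "\" separators.
    base = re.split(r"[\\/]", normalized)[-1]
    cleaned = _UNSAFE.sub("_", base).strip("._")
    if not cleaned:
        raise ValueError("file name is empty after sanitization")
    return cleaned


def build_object_key(
    tenant_id: str, domain: str, entity_id: str, category: str, file_name: str
) -> str:
    """Build <tenant>/<domain>/<entity>/<category>/<filename>."""
    return "/".join(
        [tenant_id, domain, entity_id, category, sanitize_filename(file_name)]
    )


key = build_object_key("t1", "po", "po-123", "invoice", "../../invoice.pdf")
```

Normalizing before splitting matters: NFKC folds look-alike characters such as the full-width slash into their ASCII forms, so they are caught by the separator split.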
### 7. Schemas (Pydantic DTOs)

- [app/documents/schemas.py](app/documents/schemas.py): Request/response models
  - `UploadInitRequest`: domain, entity_id, category, file_name, mime_type, file_size, visibility
  - `UploadInitResponse`: upload_id, bucket, object_key, presigned_urls
  - `UploadCompleteRequest`: upload_id, checksum_sha256, parts
  - `DownloadUrlRequest`: object_id OR composite lookup (domain, entity_id, category, file_name)
  - `DownloadUrlResponse`: url, expires_in
  - `ObjectMetadata`: full record DTO with Config.from_attributes=True
  - `DeleteResponse`: status
### 8. FastAPI Router (API Endpoints)

- [app/documents/router.py](app/documents/router.py): `/scm/storage` routes
  - `POST /upload/init` → UploadInitResponse
  - `POST /upload/complete` → {id, deduplicated?}
  - `POST /download-url` → DownloadUrlResponse
  - `GET /{object_id}` → ObjectMetadata
  - `DELETE /{object_id}` → DeleteResponse
  - Dependency injection: MinIO adapter (lazy-loaded singleton), DocumentService via session factory, RBAC, cache
  - Auth guard: `get_current_user` required on all routes
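
One common way to get a lazy-loaded singleton for dependency injection is to cache a provider function. The sketch below uses only the standard library; `StorageAdapter` and `get_storage_adapter` are hypothetical names, and the router may build its adapter differently.

```python
from functools import lru_cache


class StorageAdapter:
    """Hypothetical adapter; counts constructions to show it is built only once."""

    instances_created = 0

    def __init__(self, endpoint: str) -> None:
        StorageAdapter.instances_created += 1
        self.endpoint = endpoint


@lru_cache(maxsize=1)
def get_storage_adapter() -> StorageAdapter:
    # Nothing is constructed at import time; the first call creates the adapter
    # and every later call returns the same cached instance.
    return StorageAdapter(endpoint="localhost:9000")


first = get_storage_adapter()
second = get_storage_adapter()
```

With FastAPI, a provider like this would typically be passed to `Depends(...)`, and tests can swap it out through `app.dependency_overrides`.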
### 9. App Integration

- [app/main.py](app/main.py#L39): Imported documents router
- [app/main.py](app/main.py#L146-L152): Registered `documents_router` in FastAPI
### 10. Tests (Unit & Integration)

- [tests/test_documents_storage.py](tests/test_documents_storage.py): **5 tests, all passing**
  - Fake repository with in-memory storage
  - Fake MinIO adapter with presigned URL simulation
  - Test fixture with dependency overrides for service/auth
  - `test_presign_and_upload_flow`: Upload init → complete → finalize
  - `test_rbac_blocks_unauthorized_role`: Supplier cannot upload to promotions
  - `test_tenant_isolation_on_fetch`: Tenant-2 cannot see tenant-1 objects
  - `test_checksum_mismatch_raises_conflict`: 409 on checksum failure
  - `test_soft_delete_hides_object`: Deleted objects return 404
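
For illustration, an in-memory fake repository in the spirit of the one used by these tests might look like the sketch below. It is not the code in `tests/test_documents_storage.py`: the class and method names are hypothetical, and raising `PermissionError` on legal hold is an assumed behavior.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class StoredObject:
    id: str
    tenant_id: str
    object_key: str
    legal_hold: bool = False
    deleted_at: Optional[datetime] = None


class InMemoryRepository:
    """Hypothetical fake: keeps rows in a dict instead of PostgreSQL."""

    def __init__(self) -> None:
        self._rows: Dict[str, StoredObject] = {}

    def create(self, tenant_id: str, object_key: str, legal_hold: bool = False) -> StoredObject:
        row = StoredObject(str(uuid.uuid4()), tenant_id, object_key, legal_hold)
        self._rows[row.id] = row
        return row

    def get_active(self, tenant_id: str, object_id: str) -> Optional[StoredObject]:
        # Rows from other tenants and soft-deleted rows are both invisible.
        row = self._rows.get(object_id)
        if row is None or row.tenant_id != tenant_id or row.deleted_at is not None:
            return None
        return row

    def soft_delete(self, tenant_id: str, object_id: str) -> bool:
        row = self.get_active(tenant_id, object_id)
        if row is None:
            return False
        if row.legal_hold:
            raise PermissionError("object is under legal hold")
        row.deleted_at = datetime.now(timezone.utc)  # the row itself is kept
        return True
```

Because `get_active` returns `None` for both cases, a router built on it can answer 404 for deleted objects and for objects belonging to another tenant without revealing which one applies.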
### 11. Documentation

- [app/documents/README.md](app/documents/README.md): Setup guide
  - MinIO environment variables
  - Local MinIO Docker setup with bucket creation
  - Database migration SQL
  - API endpoints overview
  - RBAC rules
  - Test execution
### 12. Environment Configuration

- [.env.example](.env.example): Updated with storage section
  - MinIO credentials, endpoint, secure flag
  - Bucket names
  - Size limits
  - MIME allowlists

---
## Dependencies Added

- minio>=7.1.0,<8.0.0 (updated in requirements.txt)

---
## Next Steps (Manual)

### 1. Set Up MinIO (Local Development)

```bash
docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"

# Create buckets
docker exec minio mc alias set local http://localhost:9000 minioadmin minioadmin
for bucket in documents-private images-private images-public exports-temp; do
  docker exec minio mc mb "local/$bucket"
done

# Make images-public public
docker exec minio mc anonymous set download local/images-public
```
### 2. Create Database Table

```bash
# psql expects a plain postgresql:// URL, so drop any "+asyncpg" driver suffix first
psql "${DATABASE_URL/+asyncpg/}" << 'EOF'
-- No-op if the schema already exists
CREATE SCHEMA IF NOT EXISTS trans;

CREATE TABLE IF NOT EXISTS trans.stored_objects (
  id UUID PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  domain TEXT NOT NULL,
  entity_id TEXT NOT NULL,
  bucket_name TEXT NOT NULL,
  object_key TEXT NOT NULL,
  category TEXT NOT NULL,
  file_name TEXT NOT NULL,
  mime_type TEXT NOT NULL,
  file_size BIGINT,
  checksum_sha256 TEXT,
  visibility VARCHAR(16) NOT NULL DEFAULT 'private',
  created_by TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  deleted_at TIMESTAMP,
  legal_hold BOOLEAN NOT NULL DEFAULT FALSE,
  -- Caveat: PostgreSQL treats NULLs as distinct by default, so this constraint does not
  -- stop two active rows (deleted_at IS NULL) from sharing a key; a partial unique index
  -- with WHERE deleted_at IS NULL would be needed to enforce that.
  CONSTRAINT uq_stored_object_active_key UNIQUE (tenant_id, domain, entity_id, object_key, deleted_at)
);

-- IF NOT EXISTS keeps the script safe to re-run
CREATE INDEX IF NOT EXISTS ix_stored_objects_active ON trans.stored_objects(tenant_id, domain, entity_id) WHERE deleted_at IS NULL;
CREATE INDEX IF NOT EXISTS ix_stored_objects_checksum ON trans.stored_objects(tenant_id, checksum_sha256) WHERE deleted_at IS NULL;
EOF
```
### 3. Configure .env

```env
# MinIO
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false

# Optional: Storage customizations
STORAGE_MAX_IMAGE_MB=10
STORAGE_MAX_DOC_MB=50
```
### 4. Run Application

```bash
uvicorn app.main:app --reload
```
### 5. Test API

```bash
# Request presigned URL
curl -X POST http://localhost:8000/scm/storage/upload/init \
  -H "Authorization: Bearer <your_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "po",
    "entity_id": "po-123",
    "category": "invoice",
    "file_name": "invoice.pdf",
    "mime_type": "application/pdf",
    "file_size": 102400,
    "visibility": "private"
  }'
```
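
After uploading the file to the presigned URL, the client calls `POST /upload/complete` with a SHA-256 checksum. A small sketch of computing that checksum and building the request body follows. It assumes the checksum is sent as a lowercase hex digest and omits the `parts` field; both details should be checked against `UploadCompleteRequest`. The helper names are hypothetical.

```python
import hashlib
import io
import json
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file-like object in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def build_complete_payload(upload_id: str, stream: BinaryIO) -> str:
    """JSON body for POST /upload/complete (hex encoding is an assumption)."""
    return json.dumps({"upload_id": upload_id, "checksum_sha256": sha256_hex(stream)})


# In practice the stream would be open("invoice.pdf", "rb"); BytesIO keeps this runnable.
payload = build_complete_payload("upload-123", io.BytesIO(b"abc"))
```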
### 6. Troubleshooting Database Connection

If you get DNS resolution errors for the Neon database:

**Option A: Use Local PostgreSQL**

```bash
# macOS
brew install postgresql@16
brew services start postgresql@16
createdb cuatrolabs

# Then set these values in .env (they are not shell commands). Most .env loaders
# do not run $(...) command substitution, so replace <your_user> with the output of `whoami`.
DB_HOST=localhost
DB_PORT=5432
DB_USER=<your_user>
DATABASE_URL=postgresql+asyncpg://<your_user>@localhost:5432/cuatrolabs
```

**Option B: Resume Neon Database**

- Go to https://console.neon.tech
- Find project ep-sweet-surf-a1qeduoy
- Click "Resume" if paused
- Wait 30 seconds and retry

---
## Test Coverage

| Test | Purpose | Status |
|------|---------|--------|
| `test_presign_and_upload_flow` | Upload init → complete → finalize | PASS |
| `test_rbac_blocks_unauthorized_role` | Unauthorized role denied | PASS |
| `test_tenant_isolation_on_fetch` | Tenant-2 cannot access tenant-1 | PASS |
| `test_checksum_mismatch_raises_conflict` | Checksum validation | PASS |
| `test_soft_delete_hides_object` | Soft delete enforcement | PASS |

---
## Architecture

```
FastAPI Router (/scm/storage)
    ↓ (auth, DI)
DocumentService (business logic, RBAC, validation)
    ├── DocumentRepository (persistence)
    ├── MinioStorageAdapter (object storage)
    ├── DocumentRBAC (authorization)
    └── Redis Cache (metadata caching, optional)
```

Clean separation of concerns: router → service → repository + adapter + RBAC.

---
## Security

- **Tenant isolation**: All queries scoped to tenant_id
- **RBAC**: Role-based domain access (buyer, supplier, ops, admin)
- **Soft deletes**: No physical data removal; legal_hold flag prevents deletion
- **Presigned URLs**: Time-limited (900 s by default) read/write access; clients never receive MinIO credentials
- **Checksum validation**: SHA-256 verification for upload integrity
- **Filename sanitization**: Path traversal prevention, unicode normalization
- **MIME validation**: Allowlist-based file type checking

---
## Notes

- **Future extraction**: The module is self-contained and its APIs already follow service-boundary patterns, so it can be extracted into a separate microservice later.
- **Cache optional**: Redis integration degrades gracefully; all operations work without a cache.
- **Presigned URLs**: Downloads go directly to MinIO and bypass FastAPI, which reduces latency for large files.
- **Multipart support**: The groundwork is in place, but the current adapter uses a single presigned PUT; multipart uploads for large files can be added later.