Documents & Images Storage Module - Implementation Summary
Completed
1. Configuration (Settings Module)
- app/core/config.py: Added MinIO & storage settings
- MinIO endpoint, credentials, region
- Storage buckets (private docs, private images, public images, temp exports)
- File size limits (images 10MB, docs 50MB)
- MIME type allowlists (images: png, jpeg, webp, svg; docs: pdf, docx, xlsx, csv, txt, json)
- Presigned URL expiry (900 seconds default)
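The limits above can be pictured as a small settings object plus one validation helper. This is only a sketch under assumptions: `StorageLimits` and `check_upload` are hypothetical names, and the full MIME strings are the standard types for the short names listed above, not values taken from app/core/config.py.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class StorageLimits:
    """Hypothetical container mirroring the limits listed above."""

    max_image_bytes: int = 10 * MB
    max_doc_bytes: int = 50 * MB
    presign_expiry_seconds: int = 900
    image_mime_types: frozenset = frozenset(
        {"image/png", "image/jpeg", "image/webp", "image/svg+xml"}
    )
    doc_mime_types: frozenset = frozenset(
        {
            "application/pdf",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "text/csv",
            "text/plain",
            "application/json",
        }
    )


def check_upload(limits: StorageLimits, mime_type: str, file_size: int) -> str:
    """Return "image" or "document"; raise ValueError if the upload is not allowed."""
    if mime_type in limits.image_mime_types:
        kind, max_bytes = "image", limits.max_image_bytes
    elif mime_type in limits.doc_mime_types:
        kind, max_bytes = "document", limits.max_doc_bytes
    else:
        raise ValueError(f"MIME type not allowed: {mime_type}")
    if file_size <= 0 or file_size > max_bytes:
        raise ValueError(f"{kind} size {file_size} is outside 1..{max_bytes} bytes")
    return kind
```

Calling `check_upload(StorageLimits(), "application/pdf", 102400)` classifies the file as a document, while an 11 MB PNG raises `ValueError`.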
2. Database Model (PostgreSQL)
- app/documents/models.py: StoredObject SQLAlchemy model
- Trans schema compliance
- Fields: id, tenant_id, domain, entity_id, category, bucket, object_key, file_name, mime_type, file_size, checksum, visibility, created_by, created_at, deleted_at, legal_hold
- Indexes on tenant+domain+entity (active records), tenant+checksum (dedup)
- Soft delete support via deleted_at
- Unique constraint on (tenant, domain, entity, object_key, deleted_at)
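One caveat about including deleted_at in the unique constraint: PostgreSQL treats NULLs as distinct in UNIQUE constraints by default, so two active rows (deleted_at IS NULL) with the same key do not collide. The sketch below demonstrates the effect with Python's built-in sqlite3 module, which handles NULLs in UNIQUE constraints the same way. It stands in for PostgreSQL here, and the reduced table is an assumption for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stored_objects ("
    " tenant_id TEXT NOT NULL, object_key TEXT NOT NULL, deleted_at TEXT,"
    " UNIQUE (tenant_id, object_key, deleted_at))"
)
row = ("tenant-1", "tenant-1/po/po-123/invoice/invoice.pdf")
insert = "INSERT INTO stored_objects (tenant_id, object_key) VALUES (?, ?)"

# Both inserts succeed: the two NULL deleted_at values are treated as distinct.
conn.execute(insert, row)
conn.execute(insert, row)
active_duplicates = conn.execute("SELECT COUNT(*) FROM stored_objects").fetchone()[0]

# A partial unique index over active rows only does reject the duplicate.
conn.execute("DELETE FROM stored_objects")
conn.execute(
    "CREATE UNIQUE INDEX uq_active_key ON stored_objects (tenant_id, object_key)"
    " WHERE deleted_at IS NULL"
)
conn.execute(insert, row)
try:
    conn.execute(insert, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

If uniqueness among active records is the goal, a partial unique index of this shape (or NULLS NOT DISTINCT on PostgreSQL 15 and later) is one way to get it.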
3. Persistence Layer (Repository)
- app/documents/repository.py: DocumentRepository
- Async SQLAlchemy operations
- Create upload placeholder
- Finalize upload with checksum
- Get active/deleted object lookups
- Resolve by ID or by composite (domain, entity, category, filename)
- Checksum deduplication lookup
- Soft delete
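The repository contract listed above can be illustrated with an in-memory stand-in, similar in spirit to the fake repository used by the tests. The class, method names, and reduced field set below are assumptions for illustration and are not the actual DocumentRepository API.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class StoredObject:
    tenant_id: str
    object_key: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    checksum: Optional[str] = None  # set when the upload is finalized
    deleted_at: Optional[datetime] = None


class InMemoryDocumentRepository:
    """Hypothetical stand-in; a real repository would issue async SQLAlchemy queries."""

    def __init__(self) -> None:
        self._rows: Dict[str, StoredObject] = {}

    async def create_placeholder(self, tenant_id: str, object_key: str) -> StoredObject:
        obj = StoredObject(tenant_id=tenant_id, object_key=object_key)
        self._rows[obj.id] = obj
        return obj

    async def get_active(self, tenant_id: str, object_id: str) -> Optional[StoredObject]:
        obj = self._rows.get(object_id)
        # Scope every lookup to the tenant and hide soft-deleted rows.
        if obj is None or obj.tenant_id != tenant_id or obj.deleted_at is not None:
            return None
        return obj

    async def finalize(self, tenant_id: str, object_id: str, checksum: str) -> Optional[StoredObject]:
        obj = await self.get_active(tenant_id, object_id)
        if obj is not None:
            obj.checksum = checksum
        return obj

    async def find_by_checksum(self, tenant_id: str, checksum: str) -> Optional[StoredObject]:
        # Deduplication lookup: first active object of this tenant with the same checksum.
        for obj in self._rows.values():
            if obj.tenant_id == tenant_id and obj.deleted_at is None and obj.checksum == checksum:
                return obj
        return None

    async def soft_delete(self, tenant_id: str, object_id: str) -> bool:
        obj = await self.get_active(tenant_id, object_id)
        if obj is None:
            return False
        obj.deleted_at = datetime.now(timezone.utc)
        return True


async def demo():
    repo = InMemoryDocumentRepository()
    obj = await repo.create_placeholder("tenant-1", "tenant-1/po/po-123/invoice/invoice.pdf")
    await repo.finalize("tenant-1", obj.id, "abc123")
    other_tenant_view = await repo.get_active("tenant-2", obj.id)
    deleted = await repo.soft_delete("tenant-1", obj.id)
    after_delete = await repo.get_active("tenant-1", obj.id)
    return other_tenant_view, deleted, after_delete


result = asyncio.run(demo())
```

The demo shows the two behaviours the tests rely on: another tenant cannot see the object, and a soft-deleted object disappears from active lookups.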
4. Storage Adapter (MinIO)
- app/documents/storage_adapter.py: MinioStorageAdapter
- Presigned PUT URLs (single-part uploads)
- Presigned GET URLs (downloads)
- Checksum verification (ETag comparison)
- Async wrapper using executor for blocking MinIO client
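The executor-based wrapper mentioned in the last bullet might look roughly like the sketch below. `FakeBlockingClient` and `AsyncStorageAdapter` are hypothetical; the fake only mimics the shape of a synchronous `presigned_put_object(bucket, key, expires)` call and returns an unsigned URL, so this shows the threading pattern and not real URL signing.

```python
import asyncio
import functools
from datetime import timedelta


class FakeBlockingClient:
    """Stand-in for a synchronous object-storage client (hypothetical)."""

    def presigned_put_object(self, bucket: str, key: str, expires: timedelta) -> str:
        seconds = int(expires.total_seconds())
        return f"http://localhost:9000/{bucket}/{key}?X-Amz-Expires={seconds}"


class AsyncStorageAdapter:
    def __init__(self, client, expiry_seconds: int = 900) -> None:
        self._client = client
        self._expiry = timedelta(seconds=expiry_seconds)

    async def _run(self, func, *args, **kwargs):
        # Run the blocking call on the default thread pool so the event loop stays free.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))

    async def presigned_put_url(self, bucket: str, key: str) -> str:
        return await self._run(
            self._client.presigned_put_object, bucket, key, expires=self._expiry
        )


url = asyncio.run(
    AsyncStorageAdapter(FakeBlockingClient()).presigned_put_url(
        "documents-private", "tenant-1/po/po-123/invoice/invoice.pdf"
    )
)
```

`functools.partial` is used because `run_in_executor` forwards positional arguments only.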
5. RBAC (Role-Based Access Control)
- app/documents/rbac.py: DocumentRBAC
- Tenant isolation enforcement
- Role-based permissions:
- buyer: po, grn, inventory domains
- supplier: dispatch, returns domains
- ops/operations: all domains, but cannot bypass legal_hold
- admin: full access
- Check read/write/delete with domain enforcement
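The role rules above reduce to a tenant check followed by a domain lookup. The following is a minimal sketch, not the DocumentRBAC implementation: the function names are hypothetical, and the rule that only admin may delete an object under legal hold is an assumption inferred from the ops bullet.

```python
from typing import Dict, FrozenSet, Optional

ALL_DOMAINS = None  # sentinel: the role may access every domain

ROLE_DOMAINS: Dict[str, Optional[FrozenSet[str]]] = {
    "buyer": frozenset({"po", "grn", "inventory"}),
    "supplier": frozenset({"dispatch", "returns"}),
    "ops": ALL_DOMAINS,
    "operations": ALL_DOMAINS,
    "admin": ALL_DOMAINS,
}


def can_access(role: str, user_tenant: str, object_tenant: str, domain: str) -> bool:
    """Tenant isolation first, then the role's domain allowlist."""
    if user_tenant != object_tenant:
        return False
    if role not in ROLE_DOMAINS:
        return False
    allowed = ROLE_DOMAINS[role]
    return allowed is ALL_DOMAINS or domain in allowed


def can_delete(
    role: str, user_tenant: str, object_tenant: str, domain: str, legal_hold: bool
) -> bool:
    if not can_access(role, user_tenant, object_tenant, domain):
        return False
    # Assumption: only admin may delete an object that is under legal hold.
    return not legal_hold or role == "admin"
```

With these rules a supplier is denied access to a "promotions" object, and an ops user can delete in any domain unless the object is under legal hold.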
6. Service Layer (Business Logic)
- app/documents/service.py: DocumentService
- Init upload: RBAC → validation → presigned URL generation → metadata storage
- Complete upload: checksum verification → deduplication → metadata finalization
- Generate download URL: RBAC → presigned URL issuance
- Get metadata: RBAC → record fetch
- Soft delete: RBAC → legal_hold check → deletion
- MIME validation (images vs docs)
- Bucket selection by category/visibility
- Object key building: <tenant>/<domain>/<entity>/<category>/<filename>
- Filename sanitization (strips path separators, regex substitution)
- Cache integration (optional Redis with TTL)
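Filename sanitization and object key building could be sketched as follows. The exact regex and normalization form used by DocumentService are not given in this summary, so the allowlist pattern, the NFKC normalization, and the `unnamed` fallback are assumptions. Only the file name is sanitized here, and the other key parts are assumed to be validated elsewhere.

```python
import re
import unicodedata


def sanitize_filename(file_name: str) -> str:
    """Reduce a client-supplied name to a safe final path component."""
    # Normalize Unicode, then keep only the last path component.
    name = unicodedata.normalize("NFKC", file_name)
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a conservative allowlist with "_".
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Drop leading dots so names such as ".." or ".hidden" cannot survive.
    name = name.lstrip(".")
    return name or "unnamed"


def build_object_key(tenant: str, domain: str, entity: str, category: str, file_name: str) -> str:
    """Build <tenant>/<domain>/<entity>/<category>/<filename>."""
    return "/".join([tenant, domain, entity, category, sanitize_filename(file_name)])


key = build_object_key("tenant-1", "po", "po-123", "invoice", "../invoice.pdf")
```

Here `key` ends up as `tenant-1/po/po-123/invoice/invoice.pdf`: the traversal prefix is discarded before the key is assembled.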
7. Schemas (Pydantic DTOs)
- app/documents/schemas.py: Request/response models
- UploadInitRequest: domain, entity_id, category, file_name, mime_type, file_size, visibility
- UploadInitResponse: upload_id, bucket, object_key, presigned_urls
- UploadCompleteRequest: upload_id, checksum_sha256, parts
- DownloadUrlRequest: object_id OR composite lookup (domain, entity_id, category, file_name)
- DownloadUrlResponse: url, expires_in
- ObjectMetadata: full record DTO with Config.from_attributes=True
- DeleteResponse: status
8. FastAPI Router (API Endpoints)
- app/documents/router.py: /scm/storage routes
- POST /upload/init → UploadInitResponse
- POST /upload/complete → {id, deduplicated?}
- POST /download-url → DownloadUrlResponse
- GET /{object_id} → ObjectMetadata
- DELETE /{object_id} → DeleteResponse
- Dependency injection: MinIO adapter (lazy-loaded singleton), DocumentService via session factory, RBAC, cache
- Auth guard: get_current_user required on all routes
9. App Integration
- app/main.py: Imported documents router
- app/main.py: Registered documents_router in FastAPI
10. Tests (Unit & Integration)
- tests/test_documents_storage.py: 5 tests, all passing
- Fake repository with in-memory storage
- Fake MinIO adapter with presigned URL simulation
- Test fixture with dependency overrides for service/auth
- test_presign_and_upload_flow: Upload init → complete → finalize
- test_rbac_blocks_unauthorized_role: Supplier cannot upload to promotions
- test_tenant_isolation_on_fetch: Tenant-2 cannot see tenant-1 objects
- test_checksum_mismatch_raises_conflict: 409 on checksum failure
- test_soft_delete_hides_object: Deleted objects return 404
11. Documentation
- app/documents/README.md: Setup guide
- MinIO environment variables
- Local MinIO Docker setup with bucket creation
- Database migration SQL
- API endpoints overview
- RBAC rules
- Test execution
12. Environment Configuration
- .env.example: Updated with storage section
- MinIO credentials, endpoint, secure flag
- Bucket names
- Size limits
- MIME allowlists
Dependencies Added
- minio>=7.1.0,<8.0.0 (updated in requirements.txt)
Next Steps (Manual)
1. Set Up MinIO (Local Development)
docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"
# Create buckets
docker exec minio mc alias set local http://localhost:9000 minioadmin minioadmin
for bucket in documents-private images-private images-public exports-temp; do
  docker exec minio mc mb "local/$bucket"
done
# Make images-public public
docker exec minio mc anonymous set download local/images-public
2. Create Database Table
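# psql expects a plain postgresql:// URL; strip any SQLAlchemy driver suffix such as +asyncpg first
psql "$DATABASE_URL" << 'EOF'
CREATE TABLE IF NOT EXISTS trans.stored_objects (
  id UUID PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  domain TEXT NOT NULL,
  entity_id TEXT NOT NULL,
  bucket_name TEXT NOT NULL,
  object_key TEXT NOT NULL,
  category TEXT NOT NULL,
  file_name TEXT NOT NULL,
  mime_type TEXT NOT NULL,
  file_size BIGINT,
  checksum_sha256 TEXT,
  visibility VARCHAR(16) NOT NULL DEFAULT 'private',
  created_by TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  deleted_at TIMESTAMP,
  legal_hold BOOLEAN NOT NULL DEFAULT FALSE,
  -- Caveat: PostgreSQL treats NULLs as distinct in UNIQUE constraints, so this
  -- does not stop two active rows (deleted_at IS NULL) from sharing a key.
  -- A partial unique index with WHERE deleted_at IS NULL would enforce that.
  CONSTRAINT uq_stored_object_active_key UNIQUE (tenant_id, domain, entity_id, object_key, deleted_at)
);
-- IF NOT EXISTS keeps the script re-runnable, matching the CREATE TABLE above
CREATE INDEX IF NOT EXISTS ix_stored_objects_active ON trans.stored_objects(tenant_id, domain, entity_id) WHERE deleted_at IS NULL;
CREATE INDEX IF NOT EXISTS ix_stored_objects_checksum ON trans.stored_objects(tenant_id, checksum_sha256) WHERE deleted_at IS NULL;
EOF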
3. Configure .env
# MinIO
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false
# Optional: Storage customizations
STORAGE_MAX_IMAGE_MB=10
STORAGE_MAX_DOC_MB=50
4. Run Application
uvicorn app.main:app --reload
5. Test API
# Request presigned URL
curl -X POST http://localhost:8000/scm/storage/upload/init \
  -H "Authorization: Bearer <your_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "po",
    "entity_id": "po-123",
    "category": "invoice",
    "file_name": "invoice.pdf",
    "mime_type": "application/pdf",
    "file_size": 102400,
    "visibility": "private"
  }'
6. Troubleshooting Database Connection
If you get DNS resolution errors for the Neon database:
Option A: Use Local PostgreSQL
# macOS
brew install postgresql@16
brew services start postgresql@16
createdb cuatrolabs
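# Update .env (values are not shell-expanded, so write your OS username, i.e. the output of whoami, literally)
DB_HOST=localhost
DB_PORT=5432
DB_USER=<your_os_username>
DATABASE_URL=postgresql+asyncpg://<your_os_username>@localhost:5432/cuatrolabs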
Option B: Resume Neon Database
- Go to https://console.neon.tech
- Find project ep-sweet-surf-a1qeduoy
- Click "Resume" if paused
- Wait 30 seconds and retry
Test Coverage
| Test | Purpose | Status |
|---|---|---|
| test_presign_and_upload_flow | Upload init → complete → finalize | PASS |
| test_rbac_blocks_unauthorized_role | Unauthorized role denied | PASS |
| test_tenant_isolation_on_fetch | Tenant-2 cannot access tenant-1 | PASS |
| test_checksum_mismatch_raises_conflict | Checksum validation | PASS |
| test_soft_delete_hides_object | Soft delete enforcement | PASS |
Architecture
FastAPI Router (/scm/storage)
    ↓ (auth, DI)
DocumentService (business logic, RBAC, validation)
├── DocumentRepository (persistence)
├── MinioStorageAdapter (object storage)
├── DocumentRBAC (authorization)
└── Redis Cache (metadata caching, optional)
Clean separation of concerns: router → service → repository + adapter + RBAC.
Security
- Tenant isolation: All queries scoped to tenant_id
- RBAC: Role-based domain access (buyer, supplier, ops, admin)
- Soft deletes: No physical data removal; legal_hold flag prevents deletion
- Presigned URLs: Time-limited (900 s by default) read/write access; MinIO credentials are never exposed to clients
- Checksum validation: SHA-256 verification for upload integrity
- Filename sanitization: Path traversal prevention, unicode normalization
- MIME validation: Allowlist-based file type checking
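For the checksum item above, a client or service can compute the SHA-256 digest of a file in chunks and compare it with the declared value. This is a minimal sketch using only the standard library. The helper names are hypothetical, and how the service obtains the uploaded bytes or digest from MinIO is not covered here.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(expected_hex: str, stream: BinaryIO) -> bool:
    """Compare the declared checksum with the computed one in constant time."""
    actual = sha256_hex(stream)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


declared = sha256_hex(io.BytesIO(b"abc"))
ok = checksum_matches(declared.upper(), io.BytesIO(b"abc"))
tampered = checksum_matches(declared, io.BytesIO(b"abd"))
```

A mismatch such as `tampered` is the case the API reports as a 409 conflict.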
Notes
- Future extraction: The module is designed as a self-contained service and can be extracted into a separate microservice later (the APIs already follow service-boundary patterns).
- Cache optional: Redis integration degrades gracefully; all operations work without a cache.
- Presigned URLs: Downloads go directly to MinIO and bypass FastAPI, which reduces latency for large files.
- Multipart support: The infrastructure is in place; the current adapter uses a simple single-part PUT and can be upgraded to multipart for large files.