cuatrolabs-scm-ms / DOCUMENTS_IMPLEMENTATION.md (commit 9b11567, "minio doc store", by MukeshKapoor25)

Documents & Images Storage Module - Implementation Summary

✅ Completed

1. Configuration (Settings Module)

  • app/core/config.py: Added MinIO & storage settings
    • MinIO endpoint, credentials, region
    • Storage buckets (private docs, private images, public images, temp exports)
    • File size limits (images 10MB, docs 50MB)
    • MIME type allowlists (images: png, jpeg, webp, svg; docs: pdf, docx, xlsx, csv, txt, json)
    • Presigned URL expiry (900 seconds default)
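A minimal sketch of how these settings might be loaded. It assumes plain environment variables rather than whatever settings class app/core/config.py actually uses. MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_SECURE, STORAGE_MAX_IMAGE_MB and STORAGE_MAX_DOC_MB come from the .env section under Next Steps. StorageSettings, from_env and STORAGE_PRESIGN_EXPIRY_SECONDS are hypothetical, and "MB" is taken to mean MiB.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MIB = 1024 * 1024


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical settings holder; defaults mirror the values listed above."""

    minio_endpoint: str = "localhost:9000"
    minio_access_key: str = ""
    minio_secret_key: str = ""
    minio_secure: bool = False
    max_image_bytes: int = 10 * MIB    # images: 10 MB
    max_doc_bytes: int = 50 * MIB      # documents: 50 MB
    presign_expiry_seconds: int = 900  # presigned URL lifetime

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "StorageSettings":
        env = os.environ if env is None else env
        return cls(
            minio_endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
            minio_access_key=env.get("MINIO_ACCESS_KEY", ""),
            minio_secret_key=env.get("MINIO_SECRET_KEY", ""),
            # Common truthy strings become True; anything else (e.g. "false") is False
            minio_secure=env.get("MINIO_SECURE", "false").strip().lower() in {"1", "true", "yes"},
            max_image_bytes=int(env.get("STORAGE_MAX_IMAGE_MB", "10")) * MIB,
            max_doc_bytes=int(env.get("STORAGE_MAX_DOC_MB", "50")) * MIB,
            # Hypothetical variable name; only the 900 s default is documented
            presign_expiry_seconds=int(env.get("STORAGE_PRESIGN_EXPIRY_SECONDS", "900")),
        )


if __name__ == "__main__":
    print(StorageSettings.from_env({"STORAGE_MAX_DOC_MB": "20"}))
```

Freezing the dataclass keeps settings read-only after startup. Passing an explicit mapping makes the loader easy to exercise in tests without touching os.environ.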

2. Database Model (PostgreSQL)

  • app/documents/models.py: StoredObject SQLAlchemy model
    • Stored in the trans schema (trans.stored_objects)
    • Fields: id, tenant_id, domain, entity_id, category, bucket, object_key, file_name, mime_type, file_size, checksum, visibility, created_by, created_at, deleted_at, legal_hold
    • Indexes on tenant+domain+entity (active records), tenant+checksum (dedup)
    • Soft delete support via deleted_at
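    • Unique constraint on (tenant, domain, entity, object_key, deleted_at). PostgreSQL treats NULLs as distinct, so this constraint alone does not prevent two active rows (deleted_at IS NULL) with the same key; a partial unique index with WHERE deleted_at IS NULL does.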
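A unique key that includes deleted_at does not stop two active rows (deleted_at NULL) from sharing the same object_key, because NULLs count as distinct. A small experiment makes this concrete. This sketch uses Python's built-in sqlite3 rather than PostgreSQL: SQLite also treats NULLs as distinct in UNIQUE constraints and supports partial indexes. The trimmed-down table and the helper name active_rows_inserted are illustrative only.

```python
import sqlite3


def active_rows_inserted(use_partial_index: bool) -> int:
    """Insert the same active (deleted_at NULL) row twice; return how many inserts succeed."""
    conn = sqlite3.connect(":memory:")
    # Variant A: UNIQUE over the key *including* deleted_at, like the constraint described above
    unique_clause = "" if use_partial_index else ", UNIQUE (tenant_id, domain, entity_id, object_key, deleted_at)"
    conn.execute(
        "CREATE TABLE stored_objects (tenant_id TEXT, domain TEXT, entity_id TEXT,"
        f" object_key TEXT, deleted_at TEXT{unique_clause})"
    )
    if use_partial_index:
        # Variant B: uniqueness enforced only among active rows
        conn.execute(
            "CREATE UNIQUE INDEX uq_active_key ON stored_objects"
            " (tenant_id, domain, entity_id, object_key) WHERE deleted_at IS NULL"
        )
    row = ("t1", "po", "po-123", "t1/po/po-123/invoice/invoice.pdf", None)
    inserted = 0
    for _ in range(2):
        try:
            conn.execute("INSERT INTO stored_objects VALUES (?, ?, ?, ?, ?)", row)
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate active row rejected
    conn.close()
    return inserted


if __name__ == "__main__":
    print(active_rows_inserted(False))  # → 2: both rows accepted, NULLs are distinct
    print(active_rows_inserted(True))   # → 1: second active duplicate rejected
```

On PostgreSQL, variant B corresponds to CREATE UNIQUE INDEX ... WHERE deleted_at IS NULL. PostgreSQL 15+ also offers UNIQUE NULLS NOT DISTINCT.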

3. Persistence Layer (Repository)

  • app/documents/repository.py: DocumentRepository
    • Async SQLAlchemy operations
    • Create upload placeholder
    • Finalize upload with checksum
    • Lookups for active and soft-deleted objects
    • Resolve by ID or by composite (domain, entity, category, filename)
    • Checksum deduplication lookup
    • Soft delete
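A minimal in-memory sketch of the repository's shape, similar in spirit to the fake repository the tests use. It assumes nothing about the real DocumentRepository. The method names (create_placeholder, finalize, get_active, find_by_checksum, soft_delete) and the trimmed StoredObject fields are hypothetical. The sketch shows three rules:

- lookups are tenant-scoped;
- soft-deleted rows are invisible;
- deduplication searches only active rows with the same checksum.

```python
import asyncio
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StoredObject:
    id: str
    tenant_id: str
    domain: str
    entity_id: str
    category: str
    bucket: str
    object_key: str
    file_name: str
    checksum: Optional[str] = None
    deleted_at: Optional[datetime] = None
    legal_hold: bool = False


class InMemoryDocumentRepository:
    """Hypothetical stand-in with the same async shape as a DB-backed repository."""

    def __init__(self) -> None:
        self._rows: dict[str, StoredObject] = {}

    async def create_placeholder(self, **fields) -> StoredObject:
        # Row exists before the bytes land in MinIO; checksum is filled in later
        obj = StoredObject(id=str(uuid.uuid4()), **fields)
        self._rows[obj.id] = obj
        return obj

    async def finalize(self, object_id: str, checksum: str) -> StoredObject:
        obj = self._rows[object_id]
        obj.checksum = checksum
        return obj

    async def get_active(self, tenant_id: str, object_id: str) -> Optional[StoredObject]:
        obj = self._rows.get(object_id)
        # Other tenants' rows and soft-deleted rows look exactly like missing rows
        if obj is None or obj.tenant_id != tenant_id or obj.deleted_at is not None:
            return None
        return obj

    async def find_by_checksum(self, tenant_id: str, checksum: str,
                               exclude_id: Optional[str] = None) -> Optional[StoredObject]:
        for obj in self._rows.values():
            if (obj.tenant_id == tenant_id and obj.checksum == checksum
                    and obj.deleted_at is None and obj.id != exclude_id):
                return obj
        return None

    async def soft_delete(self, tenant_id: str, object_id: str) -> bool:
        obj = await self.get_active(tenant_id, object_id)
        if obj is None:
            return False
        obj.deleted_at = datetime.now(timezone.utc)  # mark only; nothing is removed
        return True


if __name__ == "__main__":
    async def demo() -> None:
        repo = InMemoryDocumentRepository()
        obj = await repo.create_placeholder(
            tenant_id="t1", domain="po", entity_id="po-123", category="invoice",
            bucket="documents-private", object_key="t1/po/po-123/invoice/a.pdf", file_name="a.pdf")
        await repo.finalize(obj.id, "abc123")
        print(await repo.find_by_checksum("t1", "abc123"))

    asyncio.run(demo())
```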

4. Storage Adapter (MinIO)

  • app/documents/storage_adapter.py: MinioStorageAdapter
    • Presigned PUT URLs (single-part uploads)
    • Presigned GET URLs (downloads)
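    • Checksum verification (ETag comparison; for single-part uploads without server-side encryption, the ETag MinIO reports is typically the object's MD5 hex digest, not a SHA-256)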
    • Async wrapper using executor for blocking MinIO client
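The executor pattern can be sketched as follows. The sketch assumes minio-py 7.x, whose Minio client is synchronous, so each blocking call is pushed onto the event loop's default thread pool via loop.run_in_executor. presigned_put_object, presigned_get_object and stat_object are real minio-py methods. The AsyncMinioAdapter class and its method names are hypothetical and are not the module's MinioStorageAdapter. The ETag helper compares against an MD5 hex digest, because that is typically what MinIO reports as the ETag for a single-part upload without server-side encryption.

```python
import asyncio
from datetime import timedelta
from functools import partial


class AsyncMinioAdapter:
    """Hypothetical async facade over a blocking minio-py client."""

    def __init__(self, client, expiry_seconds: int = 900):
        # client is e.g. minio.Minio("localhost:9000", access_key="...", secret_key="...", secure=False)
        self._client = client
        self._expires = timedelta(seconds=expiry_seconds)

    async def _run(self, func, *args, **kwargs):
        loop = asyncio.get_running_loop()
        # run_in_executor() only forwards positional args, so bind keywords with partial()
        return await loop.run_in_executor(None, partial(func, *args, **kwargs))

    async def presign_put(self, bucket: str, key: str) -> str:
        return await self._run(self._client.presigned_put_object, bucket, key, expires=self._expires)

    async def presign_get(self, bucket: str, key: str) -> str:
        return await self._run(self._client.presigned_get_object, bucket, key, expires=self._expires)

    async def etag_matches(self, bucket: str, key: str, expected_md5_hex: str) -> bool:
        stat = await self._run(self._client.stat_object, bucket, key)
        # Normalise quoting and case before comparing hex digests
        return stat.etag.strip('"').lower() == expected_md5_hex.lower()
```

Using the default executor keeps the event loop responsive while MinIO calls block in worker threads. A dedicated ThreadPoolExecutor could be passed instead to cap concurrent MinIO calls.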

5. RBAC (Role-Based Access Control)

  • app/documents/rbac.py: DocumentRBAC
    • Tenant isolation enforcement
    • Role-based permissions:
      • buyer: po, grn, inventory domains
      • supplier: dispatch, returns domains
      • ops/operations: all domains, without the legal_hold bypass
      • admin: full access
    • Read/write/delete checks enforced per domain
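The role rules can be sketched as a table-driven check. This is an illustration, not DocumentRBAC itself. The "*" all-domains marker, the RBACSketch and CurrentUser names, and the assumption that only admin may delete an object under legal_hold (one reading of "full access") are all hypothetical. The sketch folds the legal-hold check into the RBAC call for brevity, whereas the service description places it as a separate step.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when a user may not touch a document."""


ALL_DOMAINS = "*"  # hypothetical marker for "every domain"

ROLE_DOMAINS = {
    "buyer": {"po", "grn", "inventory"},
    "supplier": {"dispatch", "returns"},
    "ops": {ALL_DOMAINS},
    "operations": {ALL_DOMAINS},
    "admin": {ALL_DOMAINS},
}

LEGAL_HOLD_BYPASS_ROLES = {"admin"}  # assumption: ops/operations cannot bypass legal_hold


@dataclass(frozen=True)
class CurrentUser:
    tenant_id: str
    role: str


class RBACSketch:
    def check(self, user: CurrentUser, tenant_id: str, domain: str,
              action: str, legal_hold: bool = False) -> None:
        # 1. Tenant isolation comes first, whatever the role
        if user.tenant_id != tenant_id:
            raise PermissionDenied("cross-tenant access")
        # 2. Domain check; unknown roles get no domains at all
        allowed = ROLE_DOMAINS.get(user.role, set())
        if ALL_DOMAINS not in allowed and domain not in allowed:
            raise PermissionDenied(f"role {user.role!r} may not access domain {domain!r}")
        # 3. Deleting an object under legal hold needs the bypass role
        if action == "delete" and legal_hold and user.role not in LEGAL_HOLD_BYPASS_ROLES:
            raise PermissionDenied("object is under legal hold")
```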

6. Service Layer (Business Logic)

  • app/documents/service.py: DocumentService
    • Init upload: RBAC → validation → presigned URL generation → metadata storage
    • Complete upload: checksum verification → deduplication → metadata finalization
    • Generate download URL: RBAC → presigned URL issuance
    • Get metadata: RBAC → record fetch
    • Soft delete: RBAC → legal_hold check → deletion
    • MIME validation (images vs docs)
    • Bucket selection by category/visibility
    • Object key building: <tenant>/<domain>/<entity>/<category>/<filename>
    • Filename sanitization (strips path separators, regex substitution)
    • Cache integration (optional Redis with TTL)
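The validation, bucket-selection and key-building steps can be sketched in plain Python. This illustrates the steps under stated assumptions and is not the service's code. The exact MIME strings, the bucket names (taken from the local MinIO setup under Next Steps), the hypothetical "export" category routed to exports-temp, and the choice to sanitize every key segment rather than only the filename are all assumptions. The sanitizer applies NFKC normalization, keeps only the last path component, and replaces anything outside [A-Za-z0-9._-].

```python
import re
import unicodedata

MIB = 1024 * 1024
IMAGE_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/svg+xml"}
DOC_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # .docx
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",        # .xlsx
    "text/csv",
    "text/plain",
    "application/json",
}
LIMITS = {"image": 10 * MIB, "document": 50 * MIB}
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


class UploadValidationError(ValueError):
    pass


def classify(mime_type: str, file_size: int) -> str:
    """Return "image" or "document", or raise if the type or size is not allowed."""
    if mime_type in IMAGE_MIME_TYPES:
        kind = "image"
    elif mime_type in DOC_MIME_TYPES:
        kind = "document"
    else:
        raise UploadValidationError(f"MIME type not allowed: {mime_type}")
    if not 0 < file_size <= LIMITS[kind]:
        raise UploadValidationError(f"{kind} size {file_size} not in 1..{LIMITS[kind]} bytes")
    return kind


def select_bucket(kind: str, category: str, visibility: str) -> str:
    if category == "export":  # hypothetical category name for temporary exports
        return "exports-temp"
    if kind == "image":
        return "images-public" if visibility == "public" else "images-private"
    return "documents-private"


def sanitize_segment(name: str) -> str:
    name = unicodedata.normalize("NFKC", name)
    # Keep only the last path component, whichever separator the client used
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Collapse disallowed characters to "_" and trim leading/trailing dots and underscores
    name = _UNSAFE.sub("_", name).strip("._")
    return name or "file"


def build_object_key(tenant: str, domain: str, entity_id: str, category: str, file_name: str) -> str:
    """<tenant>/<domain>/<entity>/<category>/<filename>, each segment sanitized."""
    segments = (tenant, domain, entity_id, category, file_name)
    return "/".join(sanitize_segment(s) for s in segments)
```

Stripping leading dots turns inputs such as ".." into the fallback name "file". Taking the last path component defeats "../" traversal before the regex runs.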

7. Schemas (Pydantic DTOs)

  • app/documents/schemas.py: Request/response models
    • UploadInitRequest: domain, entity_id, category, file_name, mime_type, file_size, visibility
    • UploadInitResponse: upload_id, bucket, object_key, presigned_urls
    • UploadCompleteRequest: upload_id, checksum_sha256, parts
    • DownloadUrlRequest: object_id OR composite lookup (domain, entity_id, category, file_name)
    • DownloadUrlResponse: url, expires_in
    • ObjectMetadata: full record DTO with Config.from_attributes=True
    • DeleteResponse: status
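The "object_id OR composite lookup" rule for DownloadUrlRequest can be sketched with a dependency-free dataclass stand-in. In Pydantic v2 the same check would live in a @model_validator(mode="after") method. The field names come from the schema list above. The error messages and the exact rule are assumptions: exactly one of the two forms must be used, and the composite fields are all-or-nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadUrlRequestSketch:
    """Stand-in for the Pydantic DownloadUrlRequest: object_id OR the full composite key."""

    object_id: Optional[str] = None
    domain: Optional[str] = None
    entity_id: Optional[str] = None
    category: Optional[str] = None
    file_name: Optional[str] = None

    def __post_init__(self) -> None:
        composite = (self.domain, self.entity_id, self.category, self.file_name)
        if self.object_id is not None:
            # Direct lookup: mixing in composite fields is ambiguous, so reject it
            if any(v is not None for v in composite):
                raise ValueError("send either object_id or the composite key, not both")
        elif not all(v is not None for v in composite):
            # Composite lookup: every part is needed to identify one object
            raise ValueError("without object_id, domain, entity_id, category and file_name are all required")
```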

8. FastAPI Router (API Endpoints)

  • app/documents/router.py: /scm/storage routes
    • POST /upload/init → UploadInitResponse
    • POST /upload/complete → {id, deduplicated?}
    • POST /download-url → DownloadUrlResponse
    • GET /{object_id} → ObjectMetadata
    • DELETE /{object_id} → DeleteResponse
    • Dependency injection: MinIO adapter (lazy-loaded singleton), DocumentService via session factory, RBAC, cache
    • Auth guard: get_current_user required on all routes
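A lazily created, process-wide adapter is commonly implemented in FastAPI by caching a dependency function. This sketch assumes functools.lru_cache is the mechanism and stubs the adapter with a hypothetical class so it runs without MinIO. In a route it would be injected with Depends(get_storage_adapter), and tests can swap it through app.dependency_overrides.

```python
from functools import lru_cache


class StubStorageAdapter:
    """Hypothetical stand-in for MinioStorageAdapter; counts constructions."""

    created = 0

    def __init__(self, endpoint: str):
        StubStorageAdapter.created += 1
        self.endpoint = endpoint


@lru_cache(maxsize=1)
def get_storage_adapter() -> StubStorageAdapter:
    # Built on the first request that needs it, then reused for the life of the process
    return StubStorageAdapter("localhost:9000")
```

For the auth guard, one common way to require get_current_user on every route is APIRouter(prefix="/scm/storage", dependencies=[Depends(get_current_user)]). This document does not say whether the module does it that way.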

9. App Integration

10. Tests (Unit & Integration)

  • tests/test_documents_storage.py: 5 tests, all passing ✅
    • Fake repository with in-memory storage
    • Fake MinIO adapter with presigned URL simulation
    • Test fixture with dependency overrides for service/auth
    • test_presign_and_upload_flow: Upload init → complete → finalize
    • test_rbac_blocks_unauthorized_role: Supplier cannot upload to promotions
    • test_tenant_isolation_on_fetch: Tenant-2 cannot see tenant-1 objects
    • test_checksum_mismatch_raises_conflict: 409 on checksum failure
    • test_soft_delete_hides_object: Deleted objects return 404

11. Documentation

  • app/documents/README.md: Setup guide
    • MinIO environment variables
    • Local MinIO Docker setup with bucket creation
    • Database migration SQL
    • API endpoints overview
    • RBAC rules
    • Test execution

12. Environment Configuration

  • .env.example: Updated with storage section
    • MinIO credentials, endpoint, secure flag
    • Bucket names
    • Size limits
    • MIME allowlists

📦 Dependencies Added

  • minio>=7.1.0,<8.0.0 (updated in requirements.txt)

🚀 Next Steps (Manual)

1. Set Up MinIO (Local Development)

docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"

# Create buckets
docker exec minio mc alias set local http://localhost:9000 minioadmin minioadmin
for bucket in documents-private images-private images-public exports-temp; do
  docker exec minio mc mb local/$bucket
done

# Make images-public public
docker exec minio mc anonymous set download local/images-public

2. Create Database Table

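# psql does not understand SQLAlchemy's "+asyncpg" driver suffix, so strip it from the URL first
psql "${DATABASE_URL/+asyncpg/}" << 'EOF'
CREATE SCHEMA IF NOT EXISTS trans;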
CREATE TABLE IF NOT EXISTS trans.stored_objects (
    id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    domain TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    bucket_name TEXT NOT NULL,
    object_key TEXT NOT NULL,
    category TEXT NOT NULL,
    file_name TEXT NOT NULL,
    mime_type TEXT NOT NULL,
    file_size BIGINT,
    checksum_sha256 TEXT,
    visibility VARCHAR(16) NOT NULL DEFAULT 'private',
    created_by TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    deleted_at TIMESTAMP,
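    legal_hold BOOLEAN NOT NULL DEFAULT FALSE
);

-- One active row per key; a plain UNIQUE that includes deleted_at would not enforce this, because NULLs are distinct
CREATE UNIQUE INDEX IF NOT EXISTS uq_stored_object_active_key ON trans.stored_objects(tenant_id, domain, entity_id, object_key) WHERE deleted_at IS NULL;
CREATE INDEX IF NOT EXISTS ix_stored_objects_active ON trans.stored_objects(tenant_id, domain, entity_id) WHERE deleted_at IS NULL;
CREATE INDEX IF NOT EXISTS ix_stored_objects_checksum ON trans.stored_objects(tenant_id, checksum_sha256) WHERE deleted_at IS NULL;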
EOF

3. Configure .env

# MinIO
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_SECURE=false

# Optional: Storage customizations
STORAGE_MAX_IMAGE_MB=10
STORAGE_MAX_DOC_MB=50

4. Run Application

uvicorn app.main:app --reload

5. Test API

# Request presigned URL
curl -X POST http://localhost:8000/scm/storage/upload/init \
  -H "Authorization: Bearer <your_jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "po",
    "entity_id": "po-123",
    "category": "invoice",
    "file_name": "invoice.pdf",
    "mime_type": "application/pdf",
    "file_size": 102400,
    "visibility": "private"
  }'
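The curl call above covers only the first step. A full client round trip might look like the following standard-library sketch. It assumes two things this document does not confirm: presigned_urls is a list whose first entry is the single-part PUT URL, and parts may be sent as an empty list for a single-part upload. upload_file and complete_payload are hypothetical helpers.

```python
import hashlib
import json
import urllib.request

API_BASE = "http://localhost:8000/scm/storage"  # matches the curl example


def _post_json(url: str, payload: dict, token: str) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def complete_payload(upload_id: str, data: bytes) -> dict:
    # SHA-256 of the exact bytes uploaded; "parts" left empty for a single PUT (assumption)
    return {"upload_id": upload_id, "checksum_sha256": hashlib.sha256(data).hexdigest(), "parts": []}


def upload_file(token: str, data: bytes, *, domain: str, entity_id: str, category: str,
                file_name: str, mime_type: str, visibility: str = "private") -> dict:
    # 1. Ask the API for a presigned PUT URL
    init = _post_json(f"{API_BASE}/upload/init", {
        "domain": domain, "entity_id": entity_id, "category": category,
        "file_name": file_name, "mime_type": mime_type,
        "file_size": len(data), "visibility": visibility,
    }, token)
    # 2. PUT the bytes straight to MinIO; no bearer token, the signature is in the URL
    put = urllib.request.Request(init["presigned_urls"][0], data=data, method="PUT",
                                 headers={"Content-Type": mime_type})
    with urllib.request.urlopen(put):
        pass
    # 3. Tell the API the upload finished so it can verify and finalize the record
    return _post_json(f"{API_BASE}/upload/complete", complete_payload(init["upload_id"], data), token)
```

The explicit Content-Type on the PUT matters because urllib otherwise sends application/x-www-form-urlencoded for requests with a body.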

6. Troubleshooting Database Connection

If you get DNS resolution errors for the Neon database:

Option A: Use Local PostgreSQL

# macOS
brew install postgresql@16
brew services start postgresql@16
createdb cuatrolabs

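# Update .env (replace <your_username> with the output of whoami; .env files do not run $(...) substitutions)
DB_HOST=localhost
DB_PORT=5432
DB_USER=<your_username>
DATABASE_URL=postgresql+asyncpg://<your_username>@localhost:5432/cuatrolabs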

Option B: Resume Neon Database


📊 Test Coverage

| Test | Purpose | Status |
|------|---------|--------|
| test_presign_and_upload_flow | Upload init → complete → finalize | ✅ PASS |
| test_rbac_blocks_unauthorized_role | Unauthorized role denied | ✅ PASS |
| test_tenant_isolation_on_fetch | Tenant-2 cannot access tenant-1 | ✅ PASS |
| test_checksum_mismatch_raises_conflict | Checksum validation | ✅ PASS |
| test_soft_delete_hides_object | Soft delete enforcement | ✅ PASS |

🏗️ Architecture

FastAPI Router (/scm/storage)
    ↓ (auth, DI)
DocumentService (business logic, RBAC, validation)
    ├─→ DocumentRepository (persistence)
    ├─→ MinioStorageAdapter (object storage)
    ├─→ DocumentRBAC (authorization)
    └─→ Redis Cache (metadata caching, optional)

Clean separation of concerns: router β†’ service β†’ repository + adapter + RBAC.


🔒 Security

  • Tenant isolation: All queries scoped to tenant_id
  • RBAC: Role-based domain access (buyer, supplier, ops, admin)
  • Soft deletes: No physical data removal; legal_hold flag prevents deletion
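  • Presigned URLs: Time-limited (900s) read/write access; clients reach MinIO only through signed, expiring URLs and never see MinIO credentials (images-public is the exception, with anonymous download enabled)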
  • Checksum validation: SHA-256 verification for upload integrity
  • Filename sanitization: Path traversal prevention, unicode normalization
  • MIME validation: Allowlist-based file type checking

📝 Notes

  • Future extraction: The module is self-contained and could later be extracted into a separate microservice (its APIs already follow service-boundary patterns).
  • Cache optional: Redis integration is graceful; operations work without cache.
  • Presigned URLs: Direct downloads bypass FastAPI; reduced latency for large files.
  • Multipart support: Infrastructure in place; current adapter uses simple PUT; easy upgrade to multipart for large files.
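The "operations work without cache" behavior corresponds to a cache-aside read that treats Redis as optional and cache failures as misses. This is a minimal sketch. It assumes an async client exposing get(key) and set(key, value, ex=ttl), as redis.asyncio.Redis does, and JSON-serializable metadata. cached_metadata is a hypothetical helper.

```python
import asyncio
import json
import logging
from typing import Any, Awaitable, Callable, Optional

log = logging.getLogger(__name__)


async def cached_metadata(cache: Optional[Any], key: str, ttl_seconds: int,
                          loader: Callable[[], Awaitable[dict]]) -> dict:
    """Cache-aside read: try the cache, fall back to loader(), then best-effort write-back."""
    if cache is not None:
        try:
            cached = await cache.get(key)
            if cached is not None:
                return json.loads(cached)
        except Exception:  # a cache outage must not fail the request
            log.warning("cache read failed for %s", key, exc_info=True)
    value = await loader()
    if cache is not None:
        try:
            await cache.set(key, json.dumps(value), ex=ttl_seconds)
        except Exception:
            log.warning("cache write failed for %s", key, exc_info=True)
    return value


if __name__ == "__main__":
    async def load_from_db() -> dict:
        return {"id": "obj-1", "file_name": "invoice.pdf"}  # stand-in for a repository call

    print(asyncio.run(cached_metadata(None, "doc:obj-1", 300, load_from_db)))
```

On soft delete, the corresponding cache key would also need to be removed, or the TTL kept short, so deleted objects do not linger in cached metadata.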