videoscriber-backend / docs /hf-atlas-docs-analysis.md

# HF Spaces + MongoDB Atlas Documentation Analysis

## Scope

Analysis target: deploy the Videoscriber post-processing backend on a Hugging Face Space and use MongoDB Atlas as the canonical archive DB.

## Key findings from official docs

### Hugging Face Spaces

  1. Secrets and environment variables are managed in Space settings (Variables and Secrets), which fits this repo's env-based config (MONGODB_URI, HF_SHARED_SECRET, OPENAI_API_KEY, etc.).
  2. Docker Spaces are supported via sdk: docker, with app port configured in Space metadata (app_port).
  3. Space storage is ephemeral by default; persistent storage is a paid add-on. For this architecture, canonical data is in Atlas, so HF local disk should not be treated as source of truth.
  4. Networking docs state that outbound requests are available on common web ports (80/443/8080). This is a deployment risk for direct Mongo driver traffic, since standard Atlas connections target port 27017, which may be blocked in your network path.
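Point (2) above maps to the Space's README.md YAML front matter. A minimal sketch for a Docker Space; the title and port values are placeholders for this repo, not confirmed settings:

```yaml
---
title: Videoscriber Post-Processing
sdk: docker
app_port: 7860   # must match the port the container actually listens on
---
```

Secrets from point (1) (MONGODB_URI, HF_SHARED_SECRET, OPENAI_API_KEY) are set in the Space's settings UI, not in this file.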

### MongoDB Atlas

  1. Atlas + Node driver docs recommend SRV connection strings (mongodb+srv://...) and standard connection reuse (a single client per process/runtime).
  2. Atlas requires network access configuration (IP access list / network rules). HF runtime egress must be allowed by Atlas network policy.
  3. Atlas connection troubleshooting highlights firewall/network port constraints as a common root cause.
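The single-client recommendation in (1) is usually implemented as a module-scoped cached connection promise, so concurrent callers share one client. A minimal sketch of the pattern; the factory parameter is an illustration to keep it self-contained, and the commented `MongoClient` usage is assumed from the Node driver docs, not taken from this repo:

```typescript
// One-client-per-process pattern: cache the connect promise at module scope
// so every caller reuses the same in-flight or completed connection.
type Factory<T> = () => Promise<T>;

export function makeClientCache<T>(connect: Factory<T>): Factory<T> {
  let cached: Promise<T> | undefined;
  return () => {
    // First caller triggers the real connect; later callers reuse the promise.
    if (!cached) cached = connect();
    return cached;
  };
}

// Intended usage with the Node driver (assumption, not verified here):
//   import { MongoClient } from "mongodb";
//   const getClient = makeClientCache(() =>
//     new MongoClient(process.env.MONGODB_URI!).connect()); // mongodb+srv://... SRV string
```

Caching the promise (rather than the client) also prevents a thundering herd of parallel connects on a cold start.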

## Architecture implications for this repo

  1. The current bridge implementation is aligned with the docs:
    - Vercel /api/transcribe stays primary.
    - Transcript fan-out to the HF backend is authenticated (HF_SHARED_SECRET + HMAC signature).
    - The HF backend persists and post-processes artifacts.
  2. Atlas remains the canonical source-of-truth store; HF disk remains non-canonical.
  3. Before go-live, validate HF -> Atlas connectivity in the actual Space runtime. If connectivity fails due to outbound port/network policy, choose one of:
    - adjust the Atlas network setup (if compatible), or
    - move the post-processing backend to a runtime with unrestricted egress, or
    - add an Atlas Data API (HTTPS) path as a fallback architecture.
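The HF_SHARED_SECRET + HMAC handshake in point (1) can be sketched with Node's built-in crypto. The SHA-256 choice, hex encoding, and header name are illustrative assumptions, not confirmed details of this repo:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender (Vercel side): sign the raw request body with the shared secret.
export function signBody(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Receiver (HF backend): recompute and compare in constant time.
export function verifyBody(body: string, secret: string, signature: string): boolean {
  const expected = Buffer.from(signBody(body, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

The sender would carry the signature in a request header (e.g. a hypothetical `x-signature`) alongside the transcript payload, and the receiver rejects the request when `verifyBody` returns false.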

## Source links