somratpro Claude Opus 4.7 committed on
Commit
866a8e6
·
1 Parent(s): a1ea643

fix: move next build to container startup — HF builder cgroup too small

next build for Postiz needs ~4 GB RSS. The HF Space builder has a ~4 GB
cgroup limit, making it impossible to complete in docker build regardless
of heap tuning, single-thread flags, or multi-stage tricks (exit 137).

Solution: don't run next build in the Dockerfile at all. Run it in start.sh
at container startup where the runtime has 16 GB RAM.

Bonuses over the docker-build approach:
- NEXT_PUBLIC_BACKEND_URL is baked with the real SPACE_HOST URL (correct)
instead of localhost (broken) because we now know it at build time
- First boot ~5-8 min; all subsequent boots restore .next from HF Dataset
backup and skip the build entirely

Changes:
- Dockerfile: remove all frontend build steps; back to clean 2-stage;
HEALTHCHECK start-period raised to 600s for first-boot window
- start.sh: build frontend at startup if .next/BUILD_ID absent;
8 GB heap (runtime has 16 GB); shows progress in logs
- postiz-sync.py: include .next (minus webpack cache) in backup tarball;
restore on boot so subsequent starts skip the build;
raise SYNC_MAX_FILE_BYTES default 100→300 MB
- README: document first-boot behavior

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
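The memory argument in the commit message can be checked from inside a container. Below is a minimal sketch, assuming the environment exposes one of the standard cgroup limit files. Whether the HF builder does is not confirmed by this commit. The helper names and the 25% headroom factor are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Standard cgroup v2 / v1 locations; their presence in the HF builder is an assumption.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
GIB = 1024 ** 3


def parse_cgroup_limit(raw: str) -> int | None:
    """Return the limit in bytes, or None when the cgroup reports "max" (unlimited)."""
    value = raw.strip()
    if value == "max":
        return None
    return int(value)


def read_memory_limit() -> int | None:
    """Read the first limit file that exists; None if unknown or unlimited."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            return parse_cgroup_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None


def build_fits(limit_bytes: int | None, needed_bytes: int, headroom: float = 0.25) -> bool:
    """True when the limit leaves `headroom` (a fraction) above the build's peak RSS."""
    if limit_bytes is None:
        return True
    return limit_bytes >= needed_bytes * (1 + headroom)


print(build_fits(4 * GIB, 4 * GIB))   # builder-sized limit from the commit message
print(build_fits(16 * GIB, 4 * GIB))  # runtime-sized limit from the commit message
```

With the figures quoted in the message (~4 GB needed, ~4 GB limit in the builder, 16 GB at runtime), the first call reports no room and the second reports ample room, which is the reasoning behind moving the build to startup.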

Files changed (4)
  1. Dockerfile +30 -78
  2. README.md +4 -1
  3. postiz-sync.py +26 -1
  4. start.sh +28 -0
Dockerfile CHANGED
@@ -1,35 +1,31 @@
1
  # ============================================================================
2
  # HuggingPost — Postiz v2.11.3 on Hugging Face Spaces
3
  #
4
- # Three-stage build to beat the HF Space builder memory limit:
 
5
  #
6
- # Stage 1 (postiz-builder): clone → patch → full install
7
- # build backend + workers + cron
8
- # Stage 2 (postiz-frontend): fresh clone → patch → FILTERED install
9
- # (frontend dep tree only, skips NestJS/
10
- # Prisma/bcrypt/etc.) → build Next.js
11
- # Stage 3 (runtime): COPY server build from Stage 1
12
- # overlay .next from Stage 2
13
  #
14
- # Why fresh clone in Stage 2 (not COPY from Stage 1):
15
- # COPY --from=stage1 /build /build copies ~2 GB of node_modules (3817
16
- # packages). BuildKit decompresses that as a layer; the OS page-caches it.
17
- # Then next build loads its own module graph on top. Combined RSS exceeds
18
- # the builder cgroup limit → exit 137 OOMKilled.
19
- # A filtered pnpm install in a fresh Stage 2 pulls only the frontend
20
- # package's npm dependency tree (maybe 30-50% of the full install)
21
- # so peak RSS stays within limits.
22
  #
23
  # Container layout at runtime:
24
  # - nginx (port 5000, internal) — Postiz frontend + backend + uploads
25
  # - PM2 → 4 Postiz procs (backend / frontend / workers / cron)
26
  # - postgres (port 5432, internal)
27
  # - redis (port 6379, internal)
28
- # - postiz-sync.py loop — backup DB + uploads to HF Dataset
29
  # - health-server.js (port 7860, public) — dashboard + reverse proxy
30
  # ============================================================================
31
 
32
- # ── Stage 1: Clone, patch, full install, build server apps ───────────────────
33
  FROM node:22.20-alpine AS postiz-builder
34
 
35
  WORKDIR /build
@@ -49,15 +45,14 @@ RUN npm install -g pnpm@10.6.1
49
  # Pinned to v2.11.3 — last release before Temporal became a hard requirement.
50
  RUN git clone --depth=1 --branch v2.11.3 https://github.com/gitroomhq/postiz-app.git .
51
 
52
- # Patch Next.js config (applied here so Stage 2's fresh clone also patches).
53
- # Stage 2 re-applies the same sed commands on its own clone.
54
- # 1. basePath/assetPrefix=/app → Postiz UI at /app; dashboard owns /
55
- # 2. productionBrowserSourceMaps: false → shaves ~500 MB RSS during emit
56
- # 3. Sentry sourcemap plugin: disable: true → saves ~300 MB
57
- # 4. swcMinify: false → forces Terser (pure JS, V8-heap-bounded) instead
58
- # of the native SWC binary that adds RSS outside the V8 heap limit
59
- # 5. experimental.cpus=1 + workerThreads=false → single-thread webpack;
60
- # no parallel worker copies of the module graph eating extra RAM
61
  RUN sed -i "s|const nextConfig = {|const nextConfig = {\n basePath: '/app',\n assetPrefix: '/app',\n swcMinify: false,|" apps/frontend/next.config.js \
62
  && sed -i "s|productionBrowserSourceMaps: true|productionBrowserSourceMaps: false|" apps/frontend/next.config.js \
63
  && sed -i "s|disable: false,|disable: true,|" apps/frontend/next.config.js \
@@ -76,60 +71,20 @@ ENV SENTRY_DSN="" \
76
  NEXT_TELEMETRY_DISABLED=1 \
77
  NEXT_PRIVATE_SKIP_SIZE_MINIMIZATION=true
78
 
79
- # Full install — backend, workers, cron all need the complete dep tree.
80
  RUN pnpm install --frozen-lockfile=false
81
 
82
- # Build server-side apps only. Frontend is built in its own isolated stage.
 
83
  RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:backend
84
  RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:workers
85
  RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:cron
86
 
87
- # Remove dev artefacts before Stage 3 copies this tree into the runtime image.
88
  RUN find . -name ".git" -type d -prune -exec rm -rf {} + 2>/dev/null || true \
89
  && rm -rf .github reports Jenkins .devcontainer 2>/dev/null || true
90
 
91
 
92
- # ── Stage 2: Build Next.js frontend with minimal dep tree ────────────────────
93
- FROM node:22.20-alpine AS postiz-frontend
94
-
95
- WORKDIR /build
96
-
97
- RUN apk add --no-cache git bash
98
- RUN npm install -g pnpm@10.6.1
99
-
100
- # Fresh clone — gives a clean slate with no Stage 1 memory residue.
101
- RUN git clone --depth=1 --branch v2.11.3 https://github.com/gitroomhq/postiz-app.git .
102
-
103
- # Apply the same patches as Stage 1.
104
- RUN sed -i "s|const nextConfig = {|const nextConfig = {\n basePath: '/app',\n assetPrefix: '/app',\n swcMinify: false,|" apps/frontend/next.config.js \
105
- && sed -i "s|productionBrowserSourceMaps: true|productionBrowserSourceMaps: false|" apps/frontend/next.config.js \
106
- && sed -i "s|disable: false,|disable: true,|" apps/frontend/next.config.js \
107
- && sed -i "s|experimental: {|experimental: {\n cpus: 1,\n workerThreads: false,|" apps/frontend/next.config.js \
108
- && grep -q "basePath: '/app'" apps/frontend/next.config.js \
109
- && grep -q "swcMinify: false" apps/frontend/next.config.js \
110
- && grep -q "cpus: 1" apps/frontend/next.config.js \
111
- || (echo "PATCH FAILED — next.config.js shape changed upstream"; exit 1)
112
-
113
- ENV SENTRY_DSN="" \
114
- SENTRY_AUTH_TOKEN="" \
115
- SENTRY_ORG="" \
116
- SENTRY_PROJECT="" \
117
- NEXT_PUBLIC_SENTRY_DSN="" \
118
- NEXT_TELEMETRY_DISABLED=1 \
119
- NEXT_PRIVATE_SKIP_SIZE_MINIMIZATION=true
120
-
121
- # Filtered install — pulls only packages in the frontend's dependency tree.
122
- # Skips NestJS, Prisma, bcrypt, Bull, and other server-only packages.
123
- # Results in a much smaller node_modules → less OS page cache pressure
124
- # → lower peak RSS during next build.
125
- RUN pnpm install --filter "./apps/frontend..." --frozen-lockfile=false
126
-
127
- # Build Next.js frontend in isolation.
128
- # Stage 1's processes are dead; Stage 2 starts with a clean address space.
129
- RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:frontend
130
-
131
-
132
- # ── Stage 3: Runtime ──────────────────────────────────────────────────────────
133
  FROM node:22.20-alpine
134
 
135
  WORKDIR /app
@@ -159,14 +114,11 @@ RUN pip install --no-cache-dir --break-system-packages \
159
  huggingface_hub \
160
  PyYAML
161
 
162
- # Copy server build (backend + workers + cron + full node_modules, cleaned).
 
163
  COPY --from=postiz-builder /build /app
164
 
165
- # Overlay the compiled Next.js frontend from the isolated build stage.
166
- # This overwrites the empty apps/frontend/.next placeholder in the tree above.
167
- COPY --from=postiz-frontend /build/apps/frontend/.next /app/apps/frontend/.next
168
-
169
- # Use upstream's nginx.conf — routes /api→3000, /uploads→fs, /→4200.
170
  COPY --from=postiz-builder /build/var/docker/nginx.conf /etc/nginx/nginx.conf
171
 
172
  # Health-server outside /app to avoid pnpm workspace collisions.
@@ -192,7 +144,7 @@ RUN chmod +x /opt/start.sh /opt/setup-uptimerobot.sh
192
 
193
  EXPOSE 7860
194
 
195
- HEALTHCHECK --interval=30s --timeout=10s --start-period=240s --retries=3 \
196
  CMD curl -f http://localhost:7860/health || exit 1
197
 
198
  CMD ["/opt/start.sh"]
 
1
  # ============================================================================
2
  # HuggingPost — Postiz v2.11.3 on Hugging Face Spaces
3
  #
4
+ # Two-stage build: compile only the server-side apps (backend/workers/cron)
5
+ # during docker build. The Next.js frontend is intentionally NOT built here.
6
  #
7
+ # Why: `next build` for Postiz needs ~4 GB RSS. The HF Space builder has a
8
+ # ~4 GB cgroup limit, so it always OOMKills the process (exit 137) regardless
9
+ # of heap tuning, parallel/single-thread settings, or multi-stage tricks.
10
  #
11
+ # Solution: build the frontend in start.sh at container startup, where the
12
+ # runtime has 16 GB RAM. The compiled .next is included in the HF Dataset
13
+ # backup so subsequent restarts skip the build entirely.
14
+ #
15
+ # First boot: ~5-8 min (server apps start immediately; frontend compiles in
16
+ # background, then Postiz frontend process starts when done).
17
+ # Later boots: .next restored from backup → Postiz starts normally (~90 s).
 
18
  #
19
  # Container layout at runtime:
20
  # - nginx (port 5000, internal) — Postiz frontend + backend + uploads
21
  # - PM2 → 4 Postiz procs (backend / frontend / workers / cron)
22
  # - postgres (port 5432, internal)
23
  # - redis (port 6379, internal)
24
+ # - postiz-sync.py loop — backup DB + uploads + .next
25
  # - health-server.js (port 7860, public) — dashboard + reverse proxy
26
  # ============================================================================
27
 
28
+ # ── Stage 1: Clone, patch, install deps, build server apps ───────────────────
29
  FROM node:22.20-alpine AS postiz-builder
30
 
31
  WORKDIR /build
 
45
  # Pinned to v2.11.3 — last release before Temporal became a hard requirement.
46
  RUN git clone --depth=1 --branch v2.11.3 https://github.com/gitroomhq/postiz-app.git .
47
 
48
+ # Patch Next.js config. Applied now so the patched file is in the image and
49
+ # `pnpm run build:frontend` in start.sh picks up all settings automatically.
50
+ # 1. basePath/assetPrefix=/app → Postiz UI at /app; HuggingPost dashboard owns /
51
+ # 2. productionBrowserSourceMaps: false → smaller build output
52
+ # 3. Sentry sourcemap plugin disabled → no network calls during build
53
+ # 4. swcMinify: false → Terser (pure JS) instead of native SWC binary;
54
+ # avoids extra RSS outside the V8 heap
55
+ # 5. experimental.cpus=1 + workerThreads=false → single-thread webpack
 
56
  RUN sed -i "s|const nextConfig = {|const nextConfig = {\n basePath: '/app',\n assetPrefix: '/app',\n swcMinify: false,|" apps/frontend/next.config.js \
57
  && sed -i "s|productionBrowserSourceMaps: true|productionBrowserSourceMaps: false|" apps/frontend/next.config.js \
58
  && sed -i "s|disable: false,|disable: true,|" apps/frontend/next.config.js \
 
71
  NEXT_TELEMETRY_DISABLED=1 \
72
  NEXT_PRIVATE_SKIP_SIZE_MINIMIZATION=true
73
 
 
74
  RUN pnpm install --frozen-lockfile=false
75
 
76
+ # Build server-side apps. Sequential + 3 GB heap each.
77
+ # Frontend is NOT built here — see start.sh.
78
  RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:backend
79
  RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:workers
80
  RUN NODE_OPTIONS="--max-old-space-size=3072" pnpm run build:cron
81
 
82
+ # Clean up dev artefacts before Stage 2 copies this tree into the runtime image.
83
  RUN find . -name ".git" -type d -prune -exec rm -rf {} + 2>/dev/null || true \
84
  && rm -rf .github reports Jenkins .devcontainer 2>/dev/null || true
85
 
86
 
87
+ # ── Stage 2: Runtime ──────────────────────────────────────────────────────────
88
  FROM node:22.20-alpine
89
 
90
  WORKDIR /app
 
114
  huggingface_hub \
115
  PyYAML
116
 
117
+ # Copy fully-built Postiz server apps + node_modules + patched next.config.js.
118
+ # .next is intentionally absent here; start.sh builds or restores it at boot.
119
  COPY --from=postiz-builder /build /app
120
 
121
+ # nginx.conf: routes /api→3000, /uploads→fs, /→4200.
122
  COPY --from=postiz-builder /build/var/docker/nginx.conf /etc/nginx/nginx.conf
123
 
124
  # Health-server outside /app to avoid pnpm workspace collisions.
 
144
 
145
  EXPOSE 7860
146
 
147
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=600s --retries=5 \
148
  CMD curl -f http://localhost:7860/health || exit 1
149
 
150
  CMD ["/opt/start.sh"]
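The HEALTHCHECK change above can be read as simple arithmetic. The sketch below is a rough estimate, not Docker's exact scheduling: it relies on the documented behavior that probe failures during the start period do not count toward `retries`, and it ignores probe timeouts. It also assumes `/health` keeps failing until the frontend build is done, which this diff does not confirm. The function name is hypothetical.

```python
def unhealthy_after_seconds(start_period: int, interval: int, retries: int) -> int:
    """Rough earliest time a container can be marked unhealthy if every probe fails.

    Failures inside the start period are not counted; after it, `retries`
    consecutive failures are required. Probe timeouts are ignored here.
    """
    return start_period + interval * retries


old_window = unhealthy_after_seconds(start_period=240, interval=30, retries=3)
new_window = unhealthy_after_seconds(start_period=600, interval=30, retries=5)
first_boot_worst_case = 8 * 60  # upper end of the "5-8 min" first-boot estimate

print(old_window, new_window, first_boot_worst_case)  # 330 750 480
```

Under these assumptions the old settings (330 s) would expire before a worst-case first boot (480 s) finishes, while the new settings (750 s) leave some margin.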
README.md CHANGED
@@ -227,8 +227,11 @@ Total resident set ~3–6 GB under typical load — well within HF free tier's 1
227
 
228
  ## 🐛 Troubleshooting
229
 
230
  **"Postiz backend unavailable" on first load**
231
- First boot takes 30–90s after the build finishes. Wait for the dashboard to show green badges for both backend and frontend.
232
 
233
  **Data lost after restart**
234
  `HF_TOKEN` is not set, or it doesn't have write access. Add it and the next restart will restore from backup. The backup must have run at least once before the restart.
 
227
 
228
  ## 🐛 Troubleshooting
229
 
230
+ **First boot takes 5–8 minutes**
231
+ The Next.js frontend is not compiled during the Docker build (the HF builder's ~4 GB memory limit is less than `next build` needs). Instead it compiles on first container startup where 16 GB is available. Watch `[frontend-build]` lines in the Logs tab. Postiz starts automatically when done. All subsequent restarts are fast — the compiled `.next` is stored in the HF Dataset backup and restored at boot.
232
+
233
  **"Postiz backend unavailable" on first load**
234
+ On restarts after the first boot, wait 30–90 s for PM2 processes to come up. Check the dashboard status badges.
235
 
236
  **Data lost after restart**
237
  `HF_TOKEN` is not set, or it doesn't have write access. Add it and the next restart will restore from backup. The backup must have run at least once before the restart.
postiz-sync.py CHANGED
@@ -46,10 +46,11 @@ HF_TOKEN = os.environ.get("HF_TOKEN")
46
  HF_USERNAME = os.environ.get("HF_USERNAME")
47
  DATABASE_URL = os.environ.get("DATABASE_URL", "postgresql://postiz:postiz@localhost:5432/postiz")
48
  BACKUP_DATASET_NAME = os.environ.get("BACKUP_DATASET_NAME", "huggingpost-backup")
49
- SYNC_MAX_FILE_BYTES = int(os.environ.get("SYNC_MAX_FILE_BYTES", str(100 * 1024 * 1024))) # 100 MB
50
  POSTIZ_HOME = Path(os.environ.get("POSTIZ_HOME", "/postiz"))
51
  UPLOADS_DIR = Path(os.environ.get("UPLOAD_DIRECTORY", str(POSTIZ_HOME / "uploads")))
52
  SECRETS_DIR = POSTIZ_HOME / ".secrets"
 
53
  STATUS_FILE = Path("/tmp/sync-status.json")
54
 
55
 
@@ -141,6 +142,13 @@ def backup_database() -> tuple[str | None, bool]:
141
  return None, False
142
 
143
 
144
  def create_backup_tarball(dump_file: str) -> tuple[str | None, bool]:
145
  temp_dir = tempfile.mkdtemp()
146
  tarball = Path(temp_dir) / "huggingpost-backup.tar.gz"
@@ -151,6 +159,11 @@ def create_backup_tarball(dump_file: str) -> tuple[str | None, bool]:
151
  tar.add(str(UPLOADS_DIR), arcname="uploads")
152
  if SECRETS_DIR.exists():
153
  tar.add(str(SECRETS_DIR), arcname=".secrets")
154
  size = tarball.stat().st_size
155
  size_mb = size / 1024 / 1024
156
  logger.debug(f"Tarball created ({size_mb:.2f} MB)")
@@ -306,6 +319,18 @@ def download_and_restore() -> bool | None:
306
  except Exception as e:
307
  logger.warning(f"Failed to restore upload {item.name}: {e}")
308
 
309
  return restore_database(str(sql))
310
  except Exception as e:
311
  logger.error(f"Restore from HF failed: {e}")
 
46
  HF_USERNAME = os.environ.get("HF_USERNAME")
47
  DATABASE_URL = os.environ.get("DATABASE_URL", "postgresql://postiz:postiz@localhost:5432/postiz")
48
  BACKUP_DATASET_NAME = os.environ.get("BACKUP_DATASET_NAME", "huggingpost-backup")
49
+ SYNC_MAX_FILE_BYTES = int(os.environ.get("SYNC_MAX_FILE_BYTES", str(300 * 1024 * 1024))) # 300 MB
50
  POSTIZ_HOME = Path(os.environ.get("POSTIZ_HOME", "/postiz"))
51
  UPLOADS_DIR = Path(os.environ.get("UPLOAD_DIRECTORY", str(POSTIZ_HOME / "uploads")))
52
  SECRETS_DIR = POSTIZ_HOME / ".secrets"
53
+ NEXT_DIR = Path("/app/apps/frontend/.next") # compiled frontend; backed up to skip rebuild
54
  STATUS_FILE = Path("/tmp/sync-status.json")
55
 
56
 
 
142
  return None, False
143
 
144
 
145
+ def _exclude_next_cache(tarinfo: tarfile.TarInfo) -> tarfile.TarInfo | None:
146
+ """Filter for tarfile.add — drops .next/cache (webpack build cache, large and unneeded at runtime)."""
147
+ if tarinfo.name == "frontend-next/cache" or tarinfo.name.startswith("frontend-next/cache/"):
148
+ return None
149
+ return tarinfo
150
+
151
+
152
  def create_backup_tarball(dump_file: str) -> tuple[str | None, bool]:
153
  temp_dir = tempfile.mkdtemp()
154
  tarball = Path(temp_dir) / "huggingpost-backup.tar.gz"
 
159
  tar.add(str(UPLOADS_DIR), arcname="uploads")
160
  if SECRETS_DIR.exists():
161
  tar.add(str(SECRETS_DIR), arcname=".secrets")
162
+ # Include compiled frontend so subsequent restarts skip the 5-min build.
163
+ # Exclude .next/cache (webpack cache) — large and not needed to run.
164
+ if NEXT_DIR.exists() and (NEXT_DIR / "BUILD_ID").exists():
165
+ tar.add(str(NEXT_DIR), arcname="frontend-next", filter=_exclude_next_cache)
166
+ logger.debug("Included .next in tarball (webpack cache excluded)")
167
  size = tarball.stat().st_size
168
  size_mb = size / 1024 / 1024
169
  logger.debug(f"Tarball created ({size_mb:.2f} MB)")
 
319
  except Exception as e:
320
  logger.warning(f"Failed to restore upload {item.name}: {e}")
321
 
322
+ # Restore compiled Next.js frontend (.next without cache).
323
+ # If present, start.sh will skip the 5-min `pnpm run build:frontend`.
324
+ next_src = Path(temp_dir) / "frontend-next"
325
+ if next_src.exists():
326
+ try:
327
+ if NEXT_DIR.exists():
328
+ shutil.rmtree(NEXT_DIR)
329
+ shutil.copytree(next_src, NEXT_DIR)
330
+ logger.info(f"Restored .next from backup ({sum(f.stat().st_size for f in NEXT_DIR.rglob('*') if f.is_file()) / 1024 / 1024:.1f} MB)")
331
+ except Exception as e:
332
+ logger.warning(f"Failed to restore .next (will rebuild on start): {e}")
333
+
334
  return restore_database(str(sql))
335
  except Exception as e:
336
  logger.error(f"Restore from HF failed: {e}")
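One detail of `tarfile` matters for the cache exclusion in this file: `TarFile.add` applies `arcname` before calling the filter, so the filter sees archive names such as `frontend-next/cache/...`, not the on-disk `.next/cache` path. The standalone sketch below shows a filter written against the archive prefix. The directory contents and helper names are made up for the demonstration and are not taken from postiz-sync.py.

```python
from __future__ import annotations

import tarfile
import tempfile
from pathlib import Path

ARCNAME = "frontend-next"  # archive prefix, matching the arcname used in the diff


def exclude_cache(tarinfo: tarfile.TarInfo) -> tarfile.TarInfo | None:
    """Drop the cache directory; returning None for a directory skips its contents too."""
    if tarinfo.name == f"{ARCNAME}/cache" or tarinfo.name.startswith(f"{ARCNAME}/cache/"):
        return None
    return tarinfo


def archive_members(next_dir: Path, tarball: Path) -> list[str]:
    """Pack next_dir under ARCNAME with the filter applied and return the member names."""
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(str(next_dir), arcname=ARCNAME, filter=exclude_cache)
    with tarfile.open(tarball, "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a compiled .next directory (hypothetical contents).
    next_dir = Path(tmp) / ".next"
    (next_dir / "cache" / "webpack").mkdir(parents=True)
    (next_dir / "cache" / "webpack" / "0.pack").write_bytes(b"x" * 1024)
    (next_dir / "BUILD_ID").write_text("demo-build")
    members = archive_members(next_dir, Path(tmp) / "backup.tar.gz")
    print(members)  # ['frontend-next', 'frontend-next/BUILD_ID']
```

Listing the member names after a backup run is a cheap way to confirm that the webpack cache really stayed out of the tarball.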
start.sh CHANGED
@@ -157,6 +157,34 @@ else
157
  echo " Add HF_TOKEN as a Space secret to enable DB+uploads backup."
158
  fi
159
 
160
  # ── Cloudflare proxy bootstrap ───────────────────────────────────────────────
161
  if [ -n "${CLOUDFLARE_WORKERS_TOKEN:-}" ]; then
162
  echo "Setting up Cloudflare proxy..."
 
157
  echo " Add HF_TOKEN as a Space secret to enable DB+uploads backup."
158
  fi
159
 
160
+ # ── Build Next.js frontend (first boot or after a fresh deploy) ───────────────
161
+ # next build is NOT run during docker build — the HF builder's ~4 GB cgroup
162
+ # limit is less than what next build needs. We run it here where the runtime
163
+ # has 16 GB. On subsequent starts the .next directory is restored from the
164
+ # HF Dataset backup, so this block only executes once (or after a version bump).
165
+ FRONTEND_NEXT="${POSTIZ_DIR}/apps/frontend/.next"
166
+ if [ ! -f "${FRONTEND_NEXT}/BUILD_ID" ]; then
167
+ echo ""
168
+ echo " ┌─────────────────────────────────────────────────────────────────┐"
169
+ echo " │ Building Next.js frontend (first boot — takes ~5 min) │"
170
+ echo " │ Dashboard is live at ${PUBLIC_URL}/ │"
171
+ echo " │ Postiz will start automatically when the build finishes. │"
172
+ echo " └─────────────────────────────────────────────────────────────────┘"
173
+ echo ""
174
+ cd "${POSTIZ_DIR}"
175
+ SENTRY_DSN="" \
176
+ SENTRY_AUTH_TOKEN="" \
177
+ SENTRY_ORG="" \
178
+ SENTRY_PROJECT="" \
179
+ NEXT_PUBLIC_SENTRY_DSN="" \
180
+ NEXT_TELEMETRY_DISABLED=1 \
181
+ NEXT_PRIVATE_SKIP_SIZE_MINIMIZATION=true \
182
+ NODE_OPTIONS="--max-old-space-size=8192" \
183
+ pnpm run build:frontend 2>&1 | sed 's/^/[frontend-build] /'
184
+ echo "Frontend build complete."
185
+ cd /
186
+ fi
187
+
188
  # ── Cloudflare proxy bootstrap ───────────────────────────────────────────────
189
  if [ -n "${CLOUDFLARE_WORKERS_TOKEN:-}" ]; then
190
  echo "Setting up Cloudflare proxy..."
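The gate added to start.sh in this hunk reduces to a single file test on `.next/BUILD_ID`. As a minimal sketch, the same decision is restated in Python below so it can be exercised without a container. The function names are hypothetical and the temporary directory stands in for the real frontend path.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def needs_frontend_build(next_dir: Path) -> bool:
    """Mirror of the shell test `[ ! -f "$FRONTEND_NEXT/BUILD_ID" ]`."""
    return not (next_dir / "BUILD_ID").is_file()


def boot_plan(next_dir: Path) -> str:
    """Return "build" on a cold start and "skip" once a compiled .next is present."""
    return "build" if needs_frontend_build(next_dir) else "skip"


with tempfile.TemporaryDirectory() as tmp:
    next_dir = Path(tmp) / ".next"
    first_boot = boot_plan(next_dir)   # nothing built or restored yet
    next_dir.mkdir()
    (next_dir / "BUILD_ID").write_text("demo-build")
    later_boot = boot_plan(next_dir)   # marker file present, as after a restore
    print(first_boot, later_boot)  # build skip
```

Because the marker is only written by a completed `next build`, a build that dies partway leaves `BUILD_ID` absent and the next boot retries. Note that in a shell pipeline such as `pnpm ... | sed ...` the exit status is that of the last command unless `pipefail` is set; whether start.sh sets it is not visible in this hunk.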