AUXteam committed 11ec5a6 (verified) · Parent: 593d58e

Upload folder using huggingface_hub

cloudflare/WASM-PROXY.md ADDED
@@ -0,0 +1,104 @@
+ # WASM Proxy Setup Guide
+
+ BentoPDF uses a Cloudflare Worker to proxy WASM library requests, bypassing CORS restrictions when loading AGPL-licensed components (PyMuPDF, Ghostscript, CoherentPDF) from external sources.
+
+ ## Quick Start
+
+ ### 1. Deploy the Worker
+
+ ```bash
+ cd cloudflare
+ npx wrangler login
+ npx wrangler deploy -c wasm-wrangler.toml
+ ```
+
+ ### 2. Configure Source URLs
+
+ Store the base URL of each module's WASM files as a Worker secret:
+
+ ```bash
+ # Option A: Wrangler CLI (prompts for each value)
+ npx wrangler secret put PYMUPDF_SOURCE -c wasm-wrangler.toml
+ npx wrangler secret put GS_SOURCE -c wasm-wrangler.toml
+ npx wrangler secret put CPDF_SOURCE -c wasm-wrangler.toml
+
+ # Option B: Cloudflare Dashboard
+ # Go to Workers & Pages > bentopdf-wasm-proxy > Settings > Variables
+ ```
+
+ **Recommended Source URLs:**
+
+ - PYMUPDF_SOURCE: `https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm@0.11.14/`
+ - GS_SOURCE: `https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets/`
+ - CPDF_SOURCE: `https://cdn.jsdelivr.net/npm/coherentpdf/dist/`
+
+ > **Note:** You can host the WASM files yourself instead of using the recommended URLs. Make sure the directory structure and file names match what BentoPDF expects for each module.
+
+ ### 3. Configure BentoPDF
+
+ **Option A: Environment variables (recommended; zero-config for users)**
+
+ Set these in `.env.production` or pass them as Docker build args:
+
+ ```bash
+ VITE_WASM_PYMUPDF_URL=https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/pymupdf/
+ VITE_WASM_GS_URL=https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/gs/
+ VITE_WASM_CPDF_URL=https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/cpdf/
+ ```
+
+ **Option B: Manual per-user configuration**
+
+ In BentoPDF's Advanced Settings (`wasm-settings.html`), enter:
+
+ | Module      | URL                                                                 |
+ | ----------- | ------------------------------------------------------------------- |
+ | PyMuPDF     | `https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/pymupdf/` |
+ | Ghostscript | `https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/gs/`      |
+ | CoherentPDF | `https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/cpdf/`    |
+
+ ## Custom Domain (Optional)
+
+ To use a custom domain like `wasm.bentopdf.com`:
+
+ 1. Add a route in `wasm-wrangler.toml` as a top-level key, above the `[[kv_namespaces]]` table (in TOML, keys placed after a table header belong to that table):
+
+    ```toml
+    routes = [
+      { pattern = "wasm.bentopdf.com/*", zone_name = "bentopdf.com" }
+    ]
+    ```
+
+ 2. Add a DNS record in Cloudflare:
+    - Type: AAAA
+    - Name: wasm
+    - Content: `100::` (a placeholder address; requests are handled by the Worker route, not an origin server)
+    - Proxied: Yes
+
+ 3. Redeploy:
+
+    ```bash
+    npx wrangler deploy -c wasm-wrangler.toml
+    ```
+
+ ## Security Features
+
+ - **Origin validation**: Rejects requests whose `Origin` header is not in `ALLOWED_ORIGINS`; requests without an `Origin` header are allowed
+ - **Rate limiting**: 100 requests/minute per IP (requires a KV namespace bound as `RATE_LIMIT_KV`)
+ - **File type restrictions**: Only WASM-related files (.js, .wasm, .data, etc.)
+ - **Size limits**: Max 100MB per file, based on the upstream `Content-Length` header
+ - **Caching**: Responses are cached at the edge for 7 days, reducing requests to the source and improving load times
+
+ ## Self-Hosting Notes
+
+ 1. Update `ALLOWED_ORIGINS` in `wasm-proxy-worker.js` to include your domain
+ 2. Host your WASM files on any origin (R2, S3, or any CDN)
+ 3. Set the source URLs as secrets in your worker
+ 4. Replace the `RATE_LIMIT_KV` namespace `id` in `wasm-wrangler.toml` with your own, or remove that block to disable rate limiting
+
+ ## Endpoints
+
+ | Endpoint       | Description                            |
+ | -------------- | -------------------------------------- |
+ | `/`, `/health` | Health check, shows configured modules |
+ | `/pymupdf/*`   | PyMuPDF WASM files                     |
+ | `/gs/*`        | Ghostscript WASM files                 |
+ | `/cpdf/*`      | CoherentPDF files                      |
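After the Quick Start, the health-check endpoint (`/` or `/health`) reports a `configured` flag for each module, which is a quick way to confirm the secrets were picked up. The sketch below is hypothetical client code, not part of BentoPDF. `unconfiguredModules`, `moduleUrls`, and `checkProxy` are illustrative names, and it assumes a runtime with a global `fetch` (Node 18+ or a browser). A browser page on an origin missing from `ALLOWED_ORIGINS` gets a 403. Requests without an `Origin` header, such as those sent from Node, are allowed.

```javascript
// Returns the module names whose source URL is not set, based on the
// `configured` object in the worker's health-check JSON.
function unconfiguredModules(health) {
  return Object.entries(health.configured ?? {})
    .filter(([, isSet]) => !isSet)
    .map(([name]) => name);
}

// Builds the per-module base URLs in the form used for the VITE_WASM_*_URL
// variables and the Advanced Settings table (trailing slash included).
function moduleUrls(proxyOrigin) {
  const base = proxyOrigin.replace(/\/+$/, '');
  return {
    pymupdf: `${base}/pymupdf/`,
    gs: `${base}/gs/`,
    cpdf: `${base}/cpdf/`,
  };
}

// Usage against a deployed worker (requires network access).
async function checkProxy(proxyOrigin) {
  const res = await fetch(`${proxyOrigin.replace(/\/+$/, '')}/health`);
  if (!res.ok) throw new Error(`Health check failed: HTTP ${res.status}`);
  const missing = unconfiguredModules(await res.json());
  if (missing.length > 0) {
    console.warn(`Source URL not set for: ${missing.join(', ')}`);
  }
  return moduleUrls(proxyOrigin);
}
```

For example, `checkProxy('https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev')` resolves to the three URLs to paste into BentoPDF.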
cloudflare/cors-proxy-worker.js ADDED
@@ -0,0 +1,351 @@
+ /**
+  * BentoPDF CORS Proxy Worker
+  *
+  * This Cloudflare Worker proxies certificate requests for the digital signing tool.
+  * It fetches certificates from external CAs that don't have CORS headers enabled
+  * and returns them with proper CORS headers.
+  *
+  * Deploy: npx wrangler deploy
+  *
+  * Optional configuration (set in wrangler.toml or the Cloudflare dashboard):
+  * - PROXY_SECRET: Shared secret for HMAC signature verification.
+  *   If unset, signature checks are skipped.
+  * - RATE_LIMIT_KV: KV namespace binding for per-IP rate limiting.
+  *   If unbound, rate limiting is skipped.
+  */
+
+ const ALLOWED_PATTERNS = [
+     /\.crt$/i,
+     /\.cer$/i,
+     /\.pem$/i,
+     /\/certs\//i,
+     /\/ocsp/i,
+     /\/crl/i,
+     /caIssuers/i,
+ ];
+
+ const ALLOWED_ORIGINS = [
+     'https://www.bentopdf.com',
+     'https://bentopdf.com',
+ ];
+
+ const BLOCKED_DOMAINS = [
+     'localhost',
+     '127.0.0.1',
+     '0.0.0.0',
+ ];
+
+ const MAX_TIMESTAMP_AGE_MS = 5 * 60 * 1000;
+
+ const RATE_LIMIT_MAX_REQUESTS = 60;
+ const RATE_LIMIT_WINDOW_MS = 60 * 1000;
+
+ const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;
+
+ async function verifySignature(message, signature, secret) {
+     try {
+         // Reject anything that is not an even-length hex string before decoding it.
+         if (!/^(?:[0-9a-f]{2})+$/i.test(signature)) {
+             return false;
+         }
+
+         const encoder = new TextEncoder();
+         const key = await crypto.subtle.importKey(
+             'raw',
+             encoder.encode(secret),
+             { name: 'HMAC', hash: 'SHA-256' },
+             false,
+             ['verify']
+         );
+
+         const signatureBytes = new Uint8Array(
+             signature.match(/.{1,2}/g).map(byte => parseInt(byte, 16))
+         );
+
+         return await crypto.subtle.verify(
+             'HMAC',
+             key,
+             signatureBytes,
+             encoder.encode(message)
+         );
+     } catch (e) {
+         console.error('Signature verification error:', e);
+         return false;
+     }
+ }
+
+ // Not called by the fetch handler below; produces a lowercase hex HMAC-SHA256
+ // signature in the format that verifySignature() expects.
+ async function generateSignature(message, secret) {
+     const encoder = new TextEncoder();
+     const key = await crypto.subtle.importKey(
+         'raw',
+         encoder.encode(secret),
+         { name: 'HMAC', hash: 'SHA-256' },
+         false,
+         ['sign']
+     );
+
+     const signature = await crypto.subtle.sign(
+         'HMAC',
+         key,
+         encoder.encode(message)
+     );
+
+     return Array.from(new Uint8Array(signature))
+         .map(b => b.toString(16).padStart(2, '0'))
+         .join('');
+ }
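+
+ function isAllowedOrigin(origin) {
+     if (!origin) return false;
+     // Origin headers are exact "scheme://host[:port]" values, so compare for equality.
+     // A prefix check would also accept e.g. "https://bentopdf.com.attacker.example".
+     return ALLOWED_ORIGINS.some(allowed => origin === allowed.replace(/\/$/, ''));
+ }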
+
+ function isValidCertificateUrl(urlString) {
+     try {
+         const url = new URL(urlString);
+
+         if (!['http:', 'https:'].includes(url.protocol)) {
+             return false;
+         }
+
+         if (BLOCKED_DOMAINS.some(domain => url.hostname.includes(domain))) {
+             return false;
+         }
+
+         const hostname = url.hostname;
+         if (/^10\./.test(hostname) ||
+             /^172\.(1[6-9]|2[0-9]|3[0-1])\./.test(hostname) ||
+             /^192\.168\./.test(hostname)) {
+             return false;
+         }
+
+         return ALLOWED_PATTERNS.some(pattern => pattern.test(urlString));
+     } catch {
+         return false;
+     }
+ }
+
+ function corsHeaders(origin) {
+     return {
+         'Access-Control-Allow-Origin': origin || '*',
+         'Access-Control-Allow-Methods': 'GET, OPTIONS',
+         'Access-Control-Allow-Headers': 'Content-Type',
+         'Access-Control-Max-Age': '86400',
+         // The request's origin is echoed back on cacheable responses,
+         // so browser and intermediate caches must key on it.
+         'Vary': 'Origin',
+     };
+ }
+
+ function handleOptions(request) {
+     const origin = request.headers.get('Origin');
+     return new Response(null, {
+         status: 204,
+         headers: corsHeaders(origin),
+     });
+ }
+
+ export default {
+     async fetch(request, env, ctx) {
+         const url = new URL(request.url);
+         const origin = request.headers.get('Origin');
+
+         if (request.method === 'OPTIONS') {
+             return handleOptions(request);
+         }
+
+         // NOTE: If you are self-hosting this proxy, either remove this check or
+         // add your own domain to ALLOWED_ORIGINS.
+         if (!isAllowedOrigin(origin)) {
+             return new Response(JSON.stringify({
+                 error: 'Forbidden',
+                 message: 'This proxy only accepts requests from bentopdf.com',
+             }), {
+                 status: 403,
+                 headers: {
+                     'Content-Type': 'application/json',
+                 },
+             });
+         }
+
+         if (request.method !== 'GET') {
+             return new Response('Method not allowed', {
+                 status: 405,
+                 headers: corsHeaders(origin),
+             });
+         }
+
+         const targetUrl = url.searchParams.get('url');
+         const timestamp = url.searchParams.get('t');
+         const signature = url.searchParams.get('sig');
+
+         if (env.PROXY_SECRET) {
+             if (!timestamp || !signature) {
+                 return new Response(JSON.stringify({
+                     error: 'Missing authentication parameters',
+                     message: 'Request must include timestamp (t) and signature (sig) parameters',
+                 }), {
+                     status: 401,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                     },
+                 });
+             }
+
+             // `t` is a Unix timestamp in milliseconds (it is compared with Date.now()).
+             const requestTime = parseInt(timestamp, 10);
+             const now = Date.now();
+             if (isNaN(requestTime) || Math.abs(now - requestTime) > MAX_TIMESTAMP_AGE_MS) {
+                 return new Response(JSON.stringify({
+                     error: 'Request expired or invalid timestamp',
+                     message: 'Timestamp must be within 5 minutes of current time',
+                 }), {
+                     status: 401,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                     },
+                 });
+             }
+
+             // The signed message is the decoded `url` parameter followed by `t`.
+             const message = `${targetUrl}${timestamp}`;
+             const isValid = await verifySignature(message, signature, env.PROXY_SECRET);
+
+             if (!isValid) {
+                 return new Response(JSON.stringify({
+                     error: 'Invalid signature',
+                     message: 'Request signature verification failed',
+                 }), {
+                     status: 401,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                     },
+                 });
+             }
+         }
+
+         if (!targetUrl) {
+             return new Response(JSON.stringify({
+                 error: 'Missing url parameter',
+                 usage: 'GET /?url=<certificate_url>',
+             }), {
+                 status: 400,
+                 headers: {
+                     ...corsHeaders(origin),
+                     'Content-Type': 'application/json',
+                 },
+             });
+         }
+
+         if (!isValidCertificateUrl(targetUrl)) {
+             return new Response(JSON.stringify({
+                 error: 'Invalid or disallowed URL',
+                 message: 'Only certificate-related URLs are allowed (*.crt, *.cer, *.pem, /certs/, /ocsp, /crl)',
+             }), {
+                 status: 403,
+                 headers: {
+                     ...corsHeaders(origin),
+                     'Content-Type': 'application/json',
+                 },
+             });
+         }
+
+         // Sliding-window rate limit per client IP, stored in KV. This is best-effort:
+         // KV is eventually consistent, so bursts of concurrent requests can exceed it.
+         const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
+         const rateLimitKey = `ratelimit:${clientIP}`;
+         const now = Date.now();
+
+         if (env.RATE_LIMIT_KV) {
+             const rateLimitData = await env.RATE_LIMIT_KV.get(rateLimitKey, { type: 'json' });
+             const requests = rateLimitData?.requests || [];
+
+             const recentRequests = requests.filter(t => now - t < RATE_LIMIT_WINDOW_MS);
+
+             if (recentRequests.length >= RATE_LIMIT_MAX_REQUESTS) {
+                 // The oldest request in the window determines when a slot frees up.
+                 const retryAfter = Math.ceil((recentRequests[0] + RATE_LIMIT_WINDOW_MS - now) / 1000);
+                 return new Response(JSON.stringify({
+                     error: 'Rate limit exceeded',
+                     message: `Maximum ${RATE_LIMIT_MAX_REQUESTS} requests per minute. Please try again later.`,
+                     retryAfter,
+                 }), {
+                     status: 429,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                         'Retry-After': retryAfter.toString(),
+                     },
+                 });
+             }
+
+             recentRequests.push(now);
+             await env.RATE_LIMIT_KV.put(rateLimitKey, JSON.stringify({ requests: recentRequests }), {
+                 expirationTtl: 120,
+             });
+         }
+
+         try {
+             const response = await fetch(targetUrl, {
+                 headers: {
+                     'User-Agent': 'BentoPDF-CertProxy/1.0',
+                 },
+             });
+
+             if (!response.ok) {
+                 return new Response(JSON.stringify({
+                     error: 'Failed to fetch certificate',
+                     status: response.status,
+                     statusText: response.statusText,
+                 }), {
+                     status: response.status,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                     },
+                 });
+             }
+
+             const contentLength = parseInt(response.headers.get('Content-Length') || '0', 10);
+             if (contentLength > MAX_FILE_SIZE_BYTES) {
+                 return new Response(JSON.stringify({
+                     error: 'File too large',
+                     message: `Certificate file exceeds maximum size of ${MAX_FILE_SIZE_BYTES / 1024 / 1024}MB`,
+                     size: contentLength,
+                     maxSize: MAX_FILE_SIZE_BYTES,
+                 }), {
+                     status: 413,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                     },
+                 });
+             }
+
+             // Content-Length may be missing or wrong, so check the actual body size too.
+             const certData = await response.arrayBuffer();
+
+             if (certData.byteLength > MAX_FILE_SIZE_BYTES) {
+                 return new Response(JSON.stringify({
+                     error: 'File too large',
+                     message: `Certificate file exceeds maximum size of ${MAX_FILE_SIZE_BYTES / 1024 / 1024}MB`,
+                     size: certData.byteLength,
+                     maxSize: MAX_FILE_SIZE_BYTES,
+                 }), {
+                     status: 413,
+                     headers: {
+                         ...corsHeaders(origin),
+                         'Content-Type': 'application/json',
+                     },
+                 });
+             }
+
+             return new Response(certData, {
+                 status: 200,
+                 headers: {
+                     ...corsHeaders(origin),
+                     'Content-Type': response.headers.get('Content-Type') || 'application/x-x509-ca-cert',
+                     'Content-Length': certData.byteLength.toString(),
+                     'Cache-Control': 'public, max-age=86400',
+                 },
+             });
+         } catch (error) {
+             return new Response(JSON.stringify({
+                 error: 'Proxy error',
+                 message: error.message,
+             }), {
+                 status: 500,
+                 headers: {
+                     ...corsHeaders(origin),
+                     'Content-Type': 'application/json',
+                 },
+             });
+         }
+     },
+ };
cloudflare/wasm-proxy-worker.js ADDED
@@ -0,0 +1,356 @@
+ /**
+  * BentoPDF WASM Proxy Worker
+  *
+  * This Cloudflare Worker proxies WASM module requests to bypass CORS restrictions.
+  * It fetches WASM libraries (PyMuPDF, Ghostscript, CoherentPDF) from configured sources
+  * and serves them with proper CORS headers.
+  *
+  * Endpoints:
+  * - /pymupdf/*     - Proxies to PyMuPDF WASM source
+  * - /gs/*          - Proxies to Ghostscript WASM source
+  * - /cpdf/*        - Proxies to CoherentPDF WASM source
+  * - / and /health  - Health check listing which sources are configured
+  *
+  * Deploy: cd cloudflare && npx wrangler deploy -c wasm-wrangler.toml
+  *
+  * Required Environment Variables (set as secrets in the Cloudflare dashboard
+  * or with `npx wrangler secret put`):
+  * - PYMUPDF_SOURCE: Base URL for PyMuPDF WASM files (e.g., https://cdn.example.com/pymupdf)
+  * - GS_SOURCE: Base URL for Ghostscript WASM files (e.g., https://cdn.example.com/gs)
+  * - CPDF_SOURCE: Base URL for CoherentPDF files (e.g., https://cdn.example.com/cpdf)
+  *
+  * Optional binding:
+  * - RATE_LIMIT_KV: KV namespace that enables per-IP rate limiting
+  */
+
+ const ALLOWED_ORIGINS = ['https://www.bentopdf.com', 'https://bentopdf.com'];
+
+ const MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024;
+
+ const RATE_LIMIT_MAX_REQUESTS = 100;
+ const RATE_LIMIT_WINDOW_MS = 60 * 1000;
+
+ const CACHE_TTL_SECONDS = 604800;
+
+ const ALLOWED_EXTENSIONS = [
+   '.js',
+   '.mjs',
+   '.wasm',
+   '.data',
+   '.py',
+   '.so',
+   '.zip',
+   '.json',
+   '.mem',
+   '.asm.js',
+   '.worker.js',
+   '.html',
+ ];
+
+ function isAllowedOrigin(origin) {
+   if (!origin) return true; // Allow no-origin requests (e.g., direct browser navigation)
+   // Compare exactly: a prefix check would also accept e.g.
+   // "https://bentopdf.com.attacker.example".
+   return ALLOWED_ORIGINS.some(
+     (allowed) => origin === allowed.replace(/\/$/, '')
+   );
+ }
+
+ function isAllowedFile(pathname) {
+   const ext = pathname.substring(pathname.lastIndexOf('.')).toLowerCase();
+   if (ALLOWED_EXTENSIONS.includes(ext)) return true;
+
+   if (!pathname.includes('.') || pathname.endsWith('/')) return true;
+
+   return false;
+ }
+
+ function corsHeaders(origin) {
+   return {
+     'Access-Control-Allow-Origin': origin || '*',
+     'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
+     'Access-Control-Allow-Headers': 'Content-Type, Range, Cache-Control',
+     'Access-Control-Expose-Headers':
+       'Content-Length, Content-Range, Content-Type',
+     'Access-Control-Max-Age': '86400',
+     // The request's origin is echoed back on long-lived cacheable responses,
+     // so browser and intermediate caches must key on it.
+     Vary: 'Origin',
+   };
+ }
+
+ function handleOptions(request) {
+   const origin = request.headers.get('Origin');
+   return new Response(null, {
+     status: 204,
+     headers: corsHeaders(origin),
+   });
+ }
+
+ function getContentType(pathname) {
+   const ext = pathname.substring(pathname.lastIndexOf('.')).toLowerCase();
+   const contentTypes = {
+     '.js': 'application/javascript',
+     '.mjs': 'application/javascript',
+     '.wasm': 'application/wasm',
+     '.json': 'application/json',
+     '.data': 'application/octet-stream',
+     '.py': 'text/x-python',
+     '.so': 'application/octet-stream',
+     '.zip': 'application/zip',
+     '.mem': 'application/octet-stream',
+     '.html': 'text/html',
+   };
+   return contentTypes[ext] || 'application/octet-stream';
+ }
+
+ async function proxyRequest(request, env, sourceBaseUrl, subpath, origin) {
+   if (!sourceBaseUrl) {
+     return new Response(
+       JSON.stringify({
+         error: 'Source not configured',
+         message: 'This WASM module source URL has not been configured.',
+       }),
+       {
+         status: 503,
+         headers: {
+           ...corsHeaders(origin),
+           'Content-Type': 'application/json',
+         },
+       }
+     );
+   }
+
+   const normalizedBase = sourceBaseUrl.endsWith('/')
+     ? sourceBaseUrl.slice(0, -1)
+     : sourceBaseUrl;
+   const normalizedPath = subpath.startsWith('/') ? subpath : `/${subpath}`;
+   const targetUrl = `${normalizedBase}${normalizedPath}`;
+
+   if (!isAllowedFile(normalizedPath)) {
+     return new Response(
+       JSON.stringify({
+         error: 'Forbidden file type',
+         message: 'Only WASM-related file types are allowed.',
+       }),
+       {
+         status: 403,
+         headers: {
+           ...corsHeaders(origin),
+           'Content-Type': 'application/json',
+         },
+       }
+     );
+   }
+
+   try {
+     // Build a GET cache key from the upstream URL only. Reusing the incoming
+     // request would copy its method, and the Cache API only stores GET
+     // requests, so HEAD requests could not be cached.
+     const cacheKey = new Request(targetUrl);
+     const cache = caches.default;
+     let response = await cache.match(cacheKey);
+
+     if (!response) {
+       response = await fetch(targetUrl, {
+         headers: {
+           'User-Agent': 'BentoPDF-WASM-Proxy/1.0',
+           Accept: '*/*',
+         },
+       });
+
+       if (!response.ok) {
+         return new Response(
+           JSON.stringify({
+             error: 'Failed to fetch resource',
+             status: response.status,
+             statusText: response.statusText,
+             targetUrl: targetUrl,
+           }),
+           {
+             status: response.status,
+             headers: {
+               ...corsHeaders(origin),
+               'Content-Type': 'application/json',
+             },
+           }
+         );
+       }
+
+       const contentLength = parseInt(
+         response.headers.get('Content-Length') || '0',
+         10
+       );
+       if (contentLength > MAX_FILE_SIZE_BYTES) {
+         return new Response(
+           JSON.stringify({
+             error: 'File too large',
+             message: `File exceeds maximum size of ${MAX_FILE_SIZE_BYTES / 1024 / 1024}MB`,
+           }),
+           {
+             status: 413,
+             headers: {
+               ...corsHeaders(origin),
+               'Content-Type': 'application/json',
+             },
+           }
+         );
+       }
+
+       response = new Response(response.body, response);
+       response.headers.set(
+         'Cache-Control',
+         `public, max-age=${CACHE_TTL_SECONDS}`
+       );
+
+       if (response.status === 200) {
+         await cache.put(cacheKey, response.clone());
+       }
+     }
+
+     const bodyData = await response.arrayBuffer();
+
+     return new Response(bodyData, {
+       status: 200,
+       headers: {
+         ...corsHeaders(origin),
+         'Content-Type': getContentType(normalizedPath),
+         'Content-Length': bodyData.byteLength.toString(),
+         'Cache-Control': `public, max-age=${CACHE_TTL_SECONDS}`,
+         'X-Proxied-From': new URL(targetUrl).hostname,
+       },
+     });
+   } catch (error) {
+     return new Response(
+       JSON.stringify({
+         error: 'Proxy error',
+         message: error.message,
+       }),
+       {
+         status: 500,
+         headers: {
+           ...corsHeaders(origin),
+           'Content-Type': 'application/json',
+         },
+       }
+     );
+   }
+ }
+
+ export default {
+   async fetch(request, env, ctx) {
+     const url = new URL(request.url);
+     const pathname = url.pathname;
+     const origin = request.headers.get('Origin');
+
+     if (request.method === 'OPTIONS') {
+       return handleOptions(request);
+     }
+
+     if (!isAllowedOrigin(origin)) {
+       return new Response(
+         JSON.stringify({
+           error: 'Forbidden',
+           message:
+             'Origin not allowed. Add your domain to ALLOWED_ORIGINS if self-hosting.',
+         }),
+         {
+           status: 403,
+           headers: {
+             'Content-Type': 'application/json',
+             ...corsHeaders(origin),
+           },
+         }
+       );
+     }
+
+     if (request.method !== 'GET' && request.method !== 'HEAD') {
+       return new Response('Method not allowed', {
+         status: 405,
+         headers: corsHeaders(origin),
+       });
+     }
+
+     if (env.RATE_LIMIT_KV) {
+       const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
+       const rateLimitKey = `wasm-ratelimit:${clientIP}`;
+       const now = Date.now();
+
+       const rateLimitData = await env.RATE_LIMIT_KV.get(rateLimitKey, {
+         type: 'json',
+       });
+       const requests = rateLimitData?.requests || [];
+       const recentRequests = requests.filter(
+         (t) => now - t < RATE_LIMIT_WINDOW_MS
+       );
+
+       if (recentRequests.length >= RATE_LIMIT_MAX_REQUESTS) {
+         return new Response(
+           JSON.stringify({
+             error: 'Rate limit exceeded',
+             message: `Maximum ${RATE_LIMIT_MAX_REQUESTS} requests per minute.`,
+           }),
+           {
+             status: 429,
+             headers: {
+               ...corsHeaders(origin),
+               'Content-Type': 'application/json',
+               'Retry-After': '60',
+             },
+           }
+         );
+       }
+
+       recentRequests.push(now);
+       await env.RATE_LIMIT_KV.put(
+         rateLimitKey,
+         JSON.stringify({ requests: recentRequests }),
+         {
+           expirationTtl: 120,
+         }
+       );
+     }
+
+     if (pathname.startsWith('/pymupdf/')) {
+       const subpath = pathname.replace('/pymupdf', '');
+       return proxyRequest(request, env, env.PYMUPDF_SOURCE, subpath, origin);
+     }
+
+     if (pathname.startsWith('/gs/')) {
+       const subpath = pathname.replace('/gs', '');
+       return proxyRequest(request, env, env.GS_SOURCE, subpath, origin);
+     }
+
+     if (pathname.startsWith('/cpdf/')) {
+       const subpath = pathname.replace('/cpdf', '');
+       return proxyRequest(request, env, env.CPDF_SOURCE, subpath, origin);
+     }
+
+     if (pathname === '/' || pathname === '/health') {
+       return new Response(
+         JSON.stringify({
+           service: 'BentoPDF WASM Proxy',
+           version: '1.0.0',
+           endpoints: {
+             pymupdf: '/pymupdf/*',
+             gs: '/gs/*',
+             cpdf: '/cpdf/*',
+           },
+           configured: {
+             pymupdf: !!env.PYMUPDF_SOURCE,
+             gs: !!env.GS_SOURCE,
+             cpdf: !!env.CPDF_SOURCE,
+           },
+         }),
+         {
+           status: 200,
+           headers: {
+             ...corsHeaders(origin),
+             'Content-Type': 'application/json',
+           },
+         }
+       );
+     }
+
+     return new Response(
+       JSON.stringify({
+         error: 'Not Found',
+         message: 'Use /pymupdf/*, /gs/*, or /cpdf/* endpoints',
+       }),
+       {
+         status: 404,
+         headers: {
+           ...corsHeaders(origin),
+           'Content-Type': 'application/json',
+         },
+       }
+     );
+   },
+ };
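The routing in `fetch` and the URL normalization in `proxyRequest` can be restated as a pure function. That makes it easy to unit-test how a request path maps to an upstream URL outside the Workers runtime. This is a sketch under the assumption that `env` is a plain object with the same `*_SOURCE` keys. `ROUTES` and `resolveUpstream` are hypothetical names, not exports of the worker.

```javascript
// Path prefix -> name of the env var holding that module's source base URL.
const ROUTES = {
  '/pymupdf/': 'PYMUPDF_SOURCE',
  '/gs/': 'GS_SOURCE',
  '/cpdf/': 'CPDF_SOURCE',
};

// Returns the upstream URL for a proxied path, or null when no route matches
// or the matching module's source is not configured (the worker answers 404
// and 503 respectively in those cases).
function resolveUpstream(pathname, env) {
  for (const [prefix, envKey] of Object.entries(ROUTES)) {
    if (!pathname.startsWith(prefix)) continue;
    const base = env[envKey];
    if (!base) return null;
    // Same normalization as proxyRequest(): drop one trailing slash from the
    // base and keep the subpath's leading slash.
    const normalizedBase = base.endsWith('/') ? base.slice(0, -1) : base;
    const subpath = pathname.slice(prefix.length - 1);
    return `${normalizedBase}${subpath}`;
  }
  return null;
}
```

For instance, with `GS_SOURCE` set to `https://cdn.example/gs`, a request for `/gs/gs.wasm` is fetched from `https://cdn.example/gs/gs.wasm`.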
cloudflare/wasm-wrangler.toml ADDED
@@ -0,0 +1,69 @@
+ name = "bentopdf-wasm-proxy"
+ main = "wasm-proxy-worker.js"
+ compatibility_date = "2024-01-01"
+
+ # =============================================================================
+ # DEPLOYMENT
+ # =============================================================================
+ # Deploy this worker:
+ #   cd cloudflare
+ #   npx wrangler deploy -c wasm-wrangler.toml
+ #
+ # Set environment secrets (one of the following methods):
+ #   Option A: Cloudflare Dashboard
+ #     Go to Workers & Pages > bentopdf-wasm-proxy > Settings > Variables
+ #     Add: PYMUPDF_SOURCE, GS_SOURCE, CPDF_SOURCE
+ #
+ #   Option B: Wrangler CLI
+ #     npx wrangler secret put PYMUPDF_SOURCE -c wasm-wrangler.toml
+ #     npx wrangler secret put GS_SOURCE -c wasm-wrangler.toml
+ #     npx wrangler secret put CPDF_SOURCE -c wasm-wrangler.toml
+
+ # =============================================================================
+ # WASM SOURCE URLS
+ # =============================================================================
+ # Set these as secrets in the Cloudflare dashboard or via wrangler:
+ #
+ # PYMUPDF_SOURCE: Base URL to PyMuPDF WASM files
+ #   Example: https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm/assets
+ #            https://your-bucket.r2.cloudflarestorage.com/pymupdf
+ #
+ # GS_SOURCE: Base URL to Ghostscript WASM files
+ #   Example: https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets
+ #            https://your-bucket.r2.cloudflarestorage.com/gs
+ #
+ # CPDF_SOURCE: Base URL to CoherentPDF files
+ #   Example: https://cdn.jsdelivr.net/npm/coherentpdf/cpdf
+ #            https://your-bucket.r2.cloudflarestorage.com/cpdf
+
+ # =============================================================================
+ # USAGE FROM BENTOPDF
+ # =============================================================================
+ # In BentoPDF's WASM Settings page, configure URLs like:
+ #   PyMuPDF:     https://wasm.bentopdf.com/pymupdf/
+ #   Ghostscript: https://wasm.bentopdf.com/gs/
+ #   CoherentPDF: https://wasm.bentopdf.com/cpdf/
+
+ # =============================================================================
+ # RATE LIMITING (Optional but recommended)
+ # =============================================================================
+ # Create a KV namespace:
+ #   npx wrangler kv namespace create "RATE_LIMIT_KV"
+ #
+ # Then bind it using the returned ID:
+ # [[kv_namespaces]]
+ # binding = "RATE_LIMIT_KV"
+ # id = "<YOUR_KV_NAMESPACE_ID>"
+
+ # The binding below reuses the CORS proxy's KV namespace. Only the storage is
+ # shared: this worker prefixes its keys with "wasm-ratelimit:", so its request
+ # counts stay separate from the CORS proxy's "ratelimit:" keys.
+ # Self-hosters: replace the id with your own, or delete this block to disable rate limiting.
+ [[kv_namespaces]]
+ binding = "RATE_LIMIT_KV"
+ id = "b88e030b308941118cd484e3fcb3ae49"
+
+ # =============================================================================
+ # CUSTOM DOMAIN (Optional)
+ # =============================================================================
+ # If you want a custom domain like wasm.bentopdf.com, add this as a top-level
+ # key above the [[kv_namespaces]] table (in TOML, keys placed after a table
+ # header belong to that table):
+ # routes = [
+ #   { pattern = "wasm.bentopdf.com/*", zone_name = "bentopdf.com" }
+ # ]
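With the `RATE_LIMIT_KV` binding in place, both workers enforce a sliding window. The KV value stores the timestamps of a client's recent requests. Entries older than the window are dropped, and the request is rejected once the window is full. Pulled out as a pure function (the name `applyRateLimit` is hypothetical), the logic looks roughly like the sketch below. It mirrors the CORS worker's `Retry-After` calculation, whereas the WASM worker always sends `60`.

```javascript
// `timestamps` is the stored `requests` array (ms since epoch, oldest first,
// because new entries are appended). Returns whether the request is allowed,
// the array to write back to KV, and a Retry-After value in seconds.
function applyRateLimit(timestamps, now, maxRequests, windowMs) {
  const recent = timestamps.filter((t) => now - t < windowMs);
  if (recent.length >= maxRequests) {
    // The oldest request still in the window decides when a slot frees up.
    const retryAfterSeconds = Math.ceil((recent[0] + windowMs - now) / 1000);
    return { allowed: false, recent, retryAfterSeconds };
  }
  return { allowed: true, recent: [...recent, now], retryAfterSeconds: 0 };
}
```

Since the read and the write are separate KV operations, two concurrent requests can both see room in the window. The limit is therefore approximate rather than strict.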
cloudflare/wrangler.toml ADDED
@@ -0,0 +1,45 @@
+ name = "bentopdf-cors-proxy"
+ main = "cors-proxy-worker.js"
+ compatibility_date = "2024-01-01"
+
+ # Deploy to Cloudflare's global network.
+ # If you are self-hosting, change `name` to your own worker name.
+ # Run: npx wrangler deploy
+
+ # =============================================================================
+ # SECURITY FEATURES
+ # =============================================================================
+ #
+ # 1. SIGNATURE VERIFICATION (Optional - for anti-spoofing)
+ #    - Generate secret: openssl rand -hex 32
+ #    - Set secret: npx wrangler secret put PROXY_SECRET
+ #    - Note: the secret is visible in the frontend JS, so it offers limited protection
+ #
+ # 2. RATE LIMITING (Recommended - requires KV)
+ #    - Create KV namespace: npx wrangler kv namespace create "RATE_LIMIT_KV"
+ #    - Replace the id in the kv_namespaces section below with the returned ID
+ #    - Limits: 60 requests per IP per minute
+ #
+ # 3. FILE SIZE LIMIT
+ #    - Automatic: Rejects files larger than 10MB (MAX_FILE_SIZE_BYTES in cors-proxy-worker.js)
+ #    - Certificates are typically <10KB, so this prevents abuse
+ #
+ # 4. URL RESTRICTIONS
+ #    - Only certificate URLs allowed (*.crt, *.cer, *.pem, /certs/, etc.)
+ #    - Blocks private IPs (localhost, 10.x, 192.168.x, 172.16-31.x)
+
+ # =============================================================================
+ # KV NAMESPACE FOR RATE LIMITING
+ # =============================================================================
+ [[kv_namespaces]]
+ binding = "RATE_LIMIT_KV"
+ id = "b88e030b308941118cd484e3fcb3ae49"
+
+ # Optional: Custom domain routing
+ # Move this above [[kv_namespaces]] when enabling it: in TOML, keys placed
+ # after a table header belong to that table.
+ # routes = [
+ #   { pattern = "cors-proxy.bentopdf.com/*", zone_name = "bentopdf.com" }
+ # ]
+
+ # Optional: Environment variables (for non-secret config)
+ # Note: cors-proxy-worker.js currently hard-codes ALLOWED_ORIGINS and does not
+ # read this variable.
+ # [vars]
+ # ALLOWED_ORIGINS = "https://www.bentopdf.com,https://bentopdf.com"
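The private-IP restriction in item 4 relies on checking the parsed `url.hostname`. That works because `new URL()` canonicalizes IPv4 literals: `http://0x7f.1/` and `http://2130706433/` both yield the hostname `127.0.0.1`. The sketch below is a hypothetical numeric alternative to the worker's regex prefixes. It covers the listed ranges and also treats all of `127.0.0.0/8` as blocked, an assumption that goes slightly beyond the worker, which blocks only `127.0.0.1`.

```javascript
// Classifies a canonical dotted-quad hostname (as produced by `new URL()`).
// Hostnames that are not IPv4 literals, such as DNS names, return false.
function isBlockedIPv4(hostname) {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname);
  if (!match) return false;
  const [a, b, c, d] = match.slice(1).map(Number);
  if ([a, b, c, d].some((n) => n > 255)) return false;
  const isUnspecified = a === 0 && b === 0 && c === 0 && d === 0; // 0.0.0.0
  return (
    isUnspecified ||
    a === 10 || // 10.0.0.0/8
    a === 127 || // 127.0.0.0/8 loopback (broader than the worker's 127.0.0.1)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}
```

A caller would pass `new URL(targetUrl).hostname`, never the raw query string, so that alternate numeric spellings are caught.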