# Simple Production Anti-Bot Strategy

This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.

## Principle

**Do not make the core plugin engine a fragile browser clone.**

Keep the BEX engine:

- portable
- buildable everywhere
- easy to embed in C++ apps
- deterministic where possible
- independent from experimental TLS impersonation crates

Then add simple challenge handling around it.

## Recommended Flow

```text
Plugin request
   ↓
BEX normal HTTP backend
   ↓
Success? ────────────────→ return data
   ↓ no
Challenge detected?
   ↓ yes
Return CHALLENGE_REQUIRED with URL/domain/reason
   ↓
C++ app decides fallback:
   - use cached cookies
   - ask user to import cookies
   - open system browser/WebView only when needed
   - use app-specific HTTP fetcher
   - use optional proxy service
```
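The flow above can be sketched as a small fallback chain. This is only an illustration under assumptions: the decision is made by the C++ host app in the real design, so the Rust below is a stand-in, and `FetchError`, `FetchResult`, and `fetch_with_fallbacks` are hypothetical names that do not come from BEX.

```rust
/// Hypothetical error type; only the CHALLENGE_REQUIRED case is taken from the flow above.
#[derive(Debug, PartialEq)]
pub enum FetchError {
    /// The normal backend hit an anti-bot challenge.
    ChallengeRequired { url: String, domain: String, reason: String },
    /// Any other failure (network error, bad response, ...).
    Other(String),
}

pub type FetchResult = Result<Vec<u8>, FetchError>;

/// Runs the normal backend first. Only when it reports a challenge are the
/// fallbacks (cached cookies, WebView, proxy, ...) tried, in order, until one succeeds.
pub fn fetch_with_fallbacks(
    url: &str,
    primary: &dyn Fn(&str) -> FetchResult,
    fallbacks: &[&dyn Fn(&str) -> FetchResult],
) -> FetchResult {
    let first = primary(url);
    // Success and non-challenge errors are returned unchanged.
    if !matches!(first, Err(FetchError::ChallengeRequired { .. })) {
        return first;
    }
    let mut last = first;
    for fallback in fallbacks {
        last = fallback(url);
        if last.is_ok() {
            break;
        }
    }
    // Either the first successful fallback or the last error seen.
    last
}
```

Ordering the fallbacks from cheapest (cached cookies) to most intrusive (opening a WebView) keeps the common path fast and the user-visible path rare.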

## Why This Is Better

Perfect Chrome impersonation is not simple:

- TLS JA3/JA4 fingerprints change between Chrome versions.
- HTTP/2 fingerprints change.
- Libraries using BoringSSL are harder to cross-compile.
- Mobile builds (iOS and Android) each need separate verification.
- One wrong cipher order or H2 setting can still get blocked.
- CAPTCHA/Turnstile still cannot be solved silently.

For an engine that must be used inside **many C++ apps**, the stable approach is:

- use portable Rust HTTP by default
- detect challenge pages reliably
- delegate rare hard anti-bot cases to the host app

## Challenge Detection

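A response should be treated as a likely anti-bot challenge when it shows the signals below. No single signal is conclusive: `server: cloudflare` and `cf-ray`, for example, appear on every Cloudflare-proxied response, including successful ones, so a status code should be combined with a provider header or body marker before the response is classified.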

### Status codes

- `403`
- `429`
- `503`

### Headers

- `server: cloudflare`
- `cf-ray`
- `cf-chl-*`
- `x-datadome`
- `x-perimeterx`
- `akamai-*`

### Body markers

- `Just a moment...`
- `Checking your browser`
- `cf-browser-verification`
- `cf-chl-`
- `turnstile`
- `captcha`
- `datadome`
- `px-captcha`
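A minimal detection sketch based on the lists above. It assumes a conservative rule that is my own choice rather than something the document prescribes: the status code must be one of the three listed, and a provider-specific header or body marker must also be present. `server: cloudflare` and `cf-ray` are deliberately not used on their own because they also appear on ordinary Cloudflare responses. `Provider` and `detect_challenge` are hypothetical names.

```rust
/// Anti-bot provider guessed from the response (hypothetical enum).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Provider {
    Cloudflare,
    DataDome,
    PerimeterX,
    Akamai,
    Unknown,
}

/// Returns `Some(provider)` when the response looks like a challenge page.
/// `headers` holds (name, value) pairs; names are matched case-insensitively by prefix.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<Provider> {
    // Only the status codes from the list above are considered at all.
    if !matches!(status, 403 | 429 | 503) {
        return None;
    }
    let has_header = |prefix: &str| {
        headers
            .iter()
            .any(|(name, _)| name.to_ascii_lowercase().starts_with(prefix))
    };
    let body_lc = body.to_ascii_lowercase();
    let has_body = |marker: &str| body_lc.contains(marker);

    const CLOUDFLARE_BODY: [&str; 5] = [
        "just a moment...",
        "checking your browser",
        "cf-browser-verification",
        "cf-chl-",
        "turnstile",
    ];
    if has_header("cf-chl-") || CLOUDFLARE_BODY.iter().any(|m| has_body(*m)) {
        return Some(Provider::Cloudflare);
    }
    if has_header("x-datadome") || has_body("datadome") {
        return Some(Provider::DataDome);
    }
    if has_header("x-perimeterx") || has_body("px-captcha") {
        return Some(Provider::PerimeterX);
    }
    if has_header("akamai-") {
        return Some(Provider::Akamai);
    }
    // A generic CAPTCHA marker without a recognizable provider.
    if has_body("captcha") {
        return Some(Provider::Unknown);
    }
    None
}
```

Scanning only a bounded prefix of the body would keep this cheap on large responses; that limit is left out here for brevity.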

## Engine-Level Behavior

The BEX engine should not try to solve every challenge itself.

Instead:

1. Detect a likely challenge.
2. Return a structured error:

```json
{
  "code": "CHALLENGE_REQUIRED",
  "url": "https://example.com/path",
  "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
  "status": 403,
  "provider": "cloudflare",
  "domain": "example.com",
  "hint": "Host app should provide cookies or browser-backed fetch."
}
```

3. The host app can then retry with cookies or a browser-backed fetcher.
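One way the structured error could be produced on the Rust side, as a std-only sketch. The field names mirror the JSON payload above; the type `ChallengeRequired` and the helpers `json_escape` and `domain_of` are hypothetical, and a real implementation would more likely use a JSON library and a URL parser than the hand-written helpers shown here.

```rust
/// Hypothetical Rust mirror of the CHALLENGE_REQUIRED payload.
pub struct ChallengeRequired {
    pub url: String,
    pub final_url: String,
    pub status: u16,
    pub provider: String,
    pub hint: String,
}

/// Escapes a string for use inside a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Extracts the host from an absolute URL. Simplified: no IPv6 literal support.
pub fn domain_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop optional userinfo, then the optional port.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    host_port.split(':').next().unwrap_or("")
}

impl ChallengeRequired {
    /// Serializes to the JSON shape shown above; `domain` is derived from `url`.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"code":"CHALLENGE_REQUIRED","url":"{}","final_url":"{}","status":{},"provider":"{}","domain":"{}","hint":"{}"}}"#,
            json_escape(&self.url),
            json_escape(&self.final_url),
            self.status,
            json_escape(&self.provider),
            json_escape(domain_of(&self.url)),
            json_escape(&self.hint),
        )
    }
}
```

Passing the payload across the C ABI as a single JSON string keeps the ABI surface small, at the cost of the host having to parse it.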

## Simple Fallback Options

### Option A: User-provided cookies

The app allows the user to paste/export cookies for a domain.

Then plugins can send:

```http
Cookie: cf_clearance=...; session=...
```

This is simple, cross-platform, and avoids hidden browser automation.
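As a rough sketch of this option, pasted cookie text can be parsed into pairs and turned back into a `Cookie` header. The function names `parse_cookie_string` and `cookie_header` are hypothetical, and the parser assumes the simple `name=value; name=value` form shown above rather than a full cookie-file format.

```rust
/// Parses text such as "cf_clearance=abc; session=xyz" into (name, value) pairs.
/// Entries without '=' or with an empty name are skipped.
pub fn parse_cookie_string(pasted: &str) -> Vec<(String, String)> {
    pasted
        .split(';')
        .filter_map(|pair| {
            // Split on the first '=' only, so values may contain '=' (e.g. base64 padding).
            let (name, value) = pair.split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                return None;
            }
            Some((name.to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Builds the value of a `Cookie` request header, or `None` when there are no cookies.
pub fn cookie_header(cookies: &[(String, String)]) -> Option<String> {
    if cookies.is_empty() {
        return None;
    }
    let pairs: Vec<String> = cookies
        .iter()
        .map(|(name, value)| format!("{}={}", name, value))
        .collect();
    Some(pairs.join("; "))
}
```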

### Option B: App-level browser session

The app opens a system browser/WebView **only when needed**.

After the challenge is solved, the app stores the resulting cookies in the BEX secret/KV store.

Future requests reuse those cookies and avoid opening the WebView again.

### Option C: External fetcher callback

Expose an optional C ABI hook:

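```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result type filled in by the host; its layout is not specified here. */
typedef struct BexFetchResult BexFetchResult;

typedef bool (*BexExternalFetch)(
    void* user_data,
    const char* method,
    const char* url,
    const uint8_t* body,
    size_t body_len,
    BexFetchResult* out
);
```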

Then the host app can provide:

- libcurl-impersonate
- platform-native HTTP stack
- browser-backed fetch
- company proxy
- Android/iOS native networking

The core engine stays simple.
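On the engine side, calling such a hook from Rust might look like the sketch below. Everything beyond the callback signature is an assumption: the document does not define the layout of `BexFetchResult`, so a single hypothetical `status` field stands in for it (body ownership is left out on purpose), and `ExternalFetcher` and `example_host_fetch` are illustrative names only.

```rust
use std::ffi::{c_char, c_void, CString};

/// Hypothetical layout; the real `BexFetchResult` is not specified in this document.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
}

/// Rust view of the C callback type shown above.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Holds a host-registered callback together with its opaque context pointer.
pub struct ExternalFetcher {
    callback: BexExternalFetch,
    user_data: *mut c_void,
}

impl ExternalFetcher {
    pub fn new(callback: BexExternalFetch, user_data: *mut c_void) -> Self {
        Self { callback, user_data }
    }

    /// Returns the status reported by the host, or `None` if the host declined
    /// the request or an argument contained an interior NUL byte.
    pub fn fetch(&self, method: &str, url: &str, body: &[u8]) -> Option<u16> {
        let method = CString::new(method).ok()?;
        let url = CString::new(url).ok()?;
        let mut out = BexFetchResult { status: 0 };
        // SAFETY: all pointers are valid for the duration of the call; the host
        // must not retain them after returning.
        let ok = unsafe {
            (self.callback)(
                self.user_data,
                method.as_ptr(),
                url.as_ptr(),
                body.as_ptr(),
                body.len(),
                &mut out,
            )
        };
        if ok { Some(out.status) } else { None }
    }
}

/// Example host callback for testing: counts calls through `user_data`
/// (treated as `*mut u32`) and always reports status 200.
pub unsafe extern "C" fn example_host_fetch(
    user_data: *mut c_void,
    _method: *const c_char,
    url: *const c_char,
    _body: *const u8,
    _body_len: usize,
    out: *mut BexFetchResult,
) -> bool {
    if url.is_null() || out.is_null() {
        return false;
    }
    unsafe {
        if !user_data.is_null() {
            *(user_data as *mut u32) += 1;
        }
        (*out).status = 200;
    }
    true
}
```

Copying the strings into `CString` values on every call is the simplest way to guarantee NUL termination; a production binding would also need an agreed rule for who allocates and frees the response body.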

### Option D: Optional proxy service

For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting.

The engine stays portable and does not embed fragile anti-bot logic.

## Plugin Guidance

Plugins should:

- set `Referer` correctly
- preserve cookies when provided
- avoid excessive retries
- return `PluginError::Forbidden` or `PluginError::RateLimited` for challenge pages
- prefer local JS ciphers over third-party helper APIs when possible

Plugins should not:

- hardcode fake TLS assumptions
- rely on one external decoder service forever
- endlessly retry CF challenge pages
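The retry and error-mapping guidance above could look roughly like this inside a plugin. `PluginError::Forbidden` and `PluginError::RateLimited` are the variants named above; the `Http` variant, `classify_status`, and `fetch_bounded` are assumptions added for the sketch, and the `attempt` closure stands in for whatever request call the plugin actually makes.

```rust
#[derive(Debug, PartialEq)]
pub enum PluginError {
    Forbidden,
    RateLimited,
    /// Hypothetical catch-all for other non-success statuses.
    Http(u16),
}

/// Maps an HTTP status to the plugin-level result.
pub fn classify_status(status: u16) -> Result<(), PluginError> {
    match status {
        200..=299 => Ok(()),
        403 => Err(PluginError::Forbidden),
        429 => Err(PluginError::RateLimited),
        other => Err(PluginError::Http(other)),
    }
}

/// Calls `attempt` at most `max_attempts` times. Success, `Forbidden`, and
/// `RateLimited` stop immediately, so challenge pages are never retried in a
/// loop. With `max_attempts == 0` nothing is sent and `Http(0)` is returned.
pub fn fetch_bounded<F>(max_attempts: u32, mut attempt: F) -> Result<(), PluginError>
where
    F: FnMut() -> u16,
{
    let mut last = Err(PluginError::Http(0));
    for _ in 0..max_attempts {
        last = classify_status(attempt());
        match &last {
            // Only other HTTP failures are worth another attempt.
            Err(PluginError::Http(_)) => continue,
            _ => break,
        }
    }
    last
}
```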

## Recommended Near-Term Fixes

1. Add challenge detection in `HttpHostService`.
2. Map challenges to a structured error payload for C ABI.
3. Add cookie helper APIs:
   - set domain cookies
   - clear domain cookies
   - list stored challenge domains
4. Add optional external fetch callback in C ABI.
5. Keep advanced TLS impersonation as an optional backend only.
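The cookie helper APIs from item 3 might take a shape like the following. This is an in-memory stand-in, not the BEX secret/KV store, and `DomainCookieStore` with its method names is hypothetical.

```rust
use std::collections::BTreeMap;

/// In-memory cookie store keyed by lowercase domain (illustrative only).
#[derive(Default)]
pub struct DomainCookieStore {
    by_domain: BTreeMap<String, BTreeMap<String, String>>,
}

impl DomainCookieStore {
    /// Adds or overwrites cookies for a domain.
    pub fn set_domain_cookies(&mut self, domain: &str, cookies: &[(&str, &str)]) {
        let jar = self
            .by_domain
            .entry(domain.to_ascii_lowercase())
            .or_default();
        for (name, value) in cookies {
            jar.insert(name.to_string(), value.to_string());
        }
    }

    /// Removes all cookies for a domain; returns whether anything was stored.
    pub fn clear_domain_cookies(&mut self, domain: &str) -> bool {
        self.by_domain
            .remove(&domain.to_ascii_lowercase())
            .is_some()
    }

    /// Lists the domains that currently have stored cookies, in sorted order.
    pub fn list_domains(&self) -> Vec<&str> {
        self.by_domain.keys().map(String::as_str).collect()
    }

    /// Looks up a single cookie value.
    pub fn get_cookie(&self, domain: &str, name: &str) -> Option<&str> {
        self.by_domain
            .get(&domain.to_ascii_lowercase())?
            .get(name)
            .map(String::as_str)
    }
}
```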

## Final Recommendation

For production:

- Default: `reqwest + rustls` portable backend.
- Add: challenge detection and external fallback hook.
- Optional later: verified impersonation backend behind feature flag.

This gives the best balance of:

- reliability
- cross-platform support
- maintainability
- app integration flexibility
- real-world anti-bot handling