# Simple Production Anti-Bot Strategy
This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.
## Principle
**Do not make the core plugin engine a fragile browser clone.**
Keep the BEX engine:
- portable
- buildable everywhere
- easy to embed in C++ apps
- deterministic where possible
- independent from experimental TLS impersonation crates
Then add simple challenge handling around it.
## Recommended Flow
```text
Plugin request
      ↓
BEX normal HTTP backend
      ↓
Success? ────────────────→ return data
      ↓ no
Challenge detected?
      ↓ yes
Return CHALLENGE_REQUIRED with URL/domain/reason
      ↓
C++ app decides fallback:
  - use cached cookies
  - ask user to import cookies
  - open system browser/WebView only when needed
  - use app-specific HTTP fetcher
  - use optional proxy service
```
## Why This Is Better
Perfect Chrome impersonation is not simple:
- TLS JA3/JA4 fingerprints change across Chrome versions.
- HTTP/2 fingerprints change as well.
- Libraries using BoringSSL are harder to cross-compile.
- Mobile (iOS/Android) builds need to be verified separately.
- One wrong cipher order or H2 setting can still get blocked.
- CAPTCHA/Turnstile still cannot be solved silently.
For an engine that must be used inside **many C++ apps**, the stable approach is:
- use portable Rust HTTP by default
- detect challenge pages reliably
- delegate rare hard anti-bot cases to the host app
## Challenge Detection
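Treat a response as a likely anti-bot challenge if any of the signals below is present. Generic signals such as a bare `403` or a `cf-ray` header also occur on ordinary responses, so they are most reliable in combination with a body marker.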
### Status codes
- `403`
- `429`
- `503`
### Headers
- `server: cloudflare`
- `cf-ray`
- `cf-chl-*`
- `x-datadome`
- `x-perimeterx`
- `akamai-*`
### Body markers
- `Just a moment...`
- `Checking your browser`
- `cf-browser-verification`
- `cf-chl-`
- `turnstile`
- `captcha`
- `datadome`
- `px-captcha`
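The signals above could be combined roughly as follows. This is a minimal sketch, not the engine's actual detector: the names `Detection`, `detect_challenge`, and `provider_from_header` are hypothetical, and the rule it applies (one of the listed status codes plus at least one header or body marker) is an assumption that is stricter than "any single signal". It can still misclassify a genuine `403` served through Cloudflare.

```rust
/// Result of inspecting one HTTP response (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Detection {
    NotChallenge,
    /// Likely challenge, with a best-guess provider label.
    Challenge { provider: &'static str },
}

/// Body markers from the list above, lowercased, paired with a provider label.
/// More specific markers come first so that `px-captcha` wins over `captcha`.
const BODY_MARKERS: &[(&str, &str)] = &[
    ("just a moment...", "cloudflare"),
    ("checking your browser", "cloudflare"),
    ("cf-browser-verification", "cloudflare"),
    ("cf-chl-", "cloudflare"),
    ("turnstile", "cloudflare"),
    ("px-captcha", "perimeterx"),
    ("datadome", "datadome"),
    ("captcha", "unknown"),
];

/// Maps one response header to a provider label, if it is a known signal.
fn provider_from_header(name: &str, value: &str) -> Option<&'static str> {
    let name = name.to_ascii_lowercase();
    if name == "cf-ray" || name.starts_with("cf-chl-") {
        return Some("cloudflare");
    }
    if name == "server" && value.to_ascii_lowercase().contains("cloudflare") {
        return Some("cloudflare");
    }
    match name.as_str() {
        "x-datadome" => Some("datadome"),
        "x-perimeterx" => Some("perimeterx"),
        n if n.starts_with("akamai-") => Some("akamai"),
        _ => None,
    }
}

/// Assumed rule: a listed status code AND at least one header or body marker.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Detection {
    if !matches!(status, 403 | 429 | 503) {
        return Detection::NotChallenge;
    }
    if let Some(provider) = headers.iter().find_map(|(n, v)| provider_from_header(n, v)) {
        return Detection::Challenge { provider };
    }
    let body = body.to_ascii_lowercase();
    for &(marker, provider) in BODY_MARKERS {
        if body.contains(marker) {
            return Detection::Challenge { provider };
        }
    }
    Detection::NotChallenge
}
```

Gating on the status code first keeps ordinary `200` responses from Cloudflare-fronted sites out of the challenge path, at the cost of missing challenges served with other status codes.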
## Engine-Level Behavior
The BEX engine should not try to solve every challenge itself.
Instead:
1. Detect a likely challenge.
2. Return a structured error:
```json
{
  "code": "CHALLENGE_REQUIRED",
  "url": "https://example.com/path",
  "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
  "status": 403,
  "provider": "cloudflare",
  "domain": "example.com",
  "hint": "Host app should provide cookies or browser-backed fetch."
}
```
3. The host app can then retry with cookies or a browser-backed fetcher.
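On the Rust side, the payload might be modelled as a plain struct that serializes itself. The sketch below is illustrative only: the type name `ChallengeRequired`, the `to_json` method, and the hand-written `escape_json` helper are assumptions made to keep the example free of dependencies; a real implementation would more likely use a JSON library.

```rust
/// Hypothetical Rust mirror of the CHALLENGE_REQUIRED payload shown above.
pub struct ChallengeRequired {
    pub url: String,
    pub final_url: String,
    pub status: u16,
    pub provider: String,
    pub domain: String,
    pub hint: String,
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ChallengeRequired {
    /// Serializes the error with the same field names as the JSON example.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"CHALLENGE_REQUIRED\",\"url\":\"{}\",\"final_url\":\"{}\",\
             \"status\":{},\"provider\":\"{}\",\"domain\":\"{}\",\"hint\":\"{}\"}}",
            escape_json(&self.url),
            escape_json(&self.final_url),
            self.status,
            escape_json(&self.provider),
            escape_json(&self.domain),
            escape_json(&self.hint),
        )
    }
}
```

Keeping `code` as a fixed string lets the host app branch on it without parsing the human-readable `hint`.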
## Simple Fallback Options
### Option A: User-provided cookies
The app lets the user paste or import cookies for a domain.
Plugins can then send:
```http
Cookie: cf_clearance=...; session=...
```
This is simple, cross-platform, and avoids hidden browser automation.
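As an illustration, pasted cookie text could be turned into a header value like this. The function names `parse_pasted_cookies` and `cookie_header_value` are hypothetical, and the sketch assumes the user pastes either a bare `name=value; name=value` string or a full `Cookie:` header line.

```rust
/// Parses user-pasted cookie text into (name, value) pairs.
/// Entries without `=`, with an empty name, or containing control
/// characters (for example CR/LF) are dropped.
pub fn parse_pasted_cookies(input: &str) -> Vec<(String, String)> {
    let input = input.trim();
    // Accept a full header line as well as the bare value.
    let input = input.strip_prefix("Cookie:").unwrap_or(input);
    input
        .split(';')
        .filter_map(|pair| {
            let (name, value) = pair.split_once('=')?;
            let (name, value) = (name.trim(), value.trim());
            let has_control = name.chars().chain(value.chars()).any(|c| c.is_control());
            if name.is_empty() || has_control {
                return None;
            }
            Some((name.to_string(), value.to_string()))
        })
        .collect()
}

/// Builds the value of a `Cookie` request header, or `None` if there is nothing to send.
pub fn cookie_header_value(cookies: &[(String, String)]) -> Option<String> {
    if cookies.is_empty() {
        return None;
    }
    let parts: Vec<String> = cookies
        .iter()
        .map(|(name, value)| format!("{}={}", name, value))
        .collect();
    Some(parts.join("; "))
}
```

Rejecting control characters prevents a pasted value from smuggling extra header lines into the request.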
### Option B: App-level browser session
The app opens a system browser/WebView **only when needed**.
After the challenge is solved, the app stores the cookies in the BEX secret/KV store.
Future requests use those cookies and avoid WebView.
### Option C: External fetcher callback
Expose an optional C ABI hook:
```c
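#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Declared opaque here; the result layout is defined by the engine. */
typedef struct BexFetchResult BexFetchResult;

/* Assumed convention: return true when *out has been filled in. */
typedef bool (*BexExternalFetch)(
    void* user_data,
    const char* method,
    const char* url,
    const uint8_t* body,
    size_t body_len,
    BexFetchResult* out
);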
```
Then the host app can provide:
- libcurl-impersonate
- platform-native HTTP stack
- browser-backed fetch
- company proxy
- Android/iOS native networking
The core engine stays simple.
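Inside the engine, the hook could be declared and invoked along these lines. Everything here is a sketch under assumptions: the field layout of `BexFetchResult` is invented for illustration (the document does not define it), `fetch_via_host` and `example_host_hook` are hypothetical names, and ownership of the `body` buffer is deliberately left out.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_void};

/// Assumed layout; the real `BexFetchResult` may differ.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
    pub body: *const u8,
    pub body_len: usize,
}

/// Rust view of the `BexExternalFetch` C typedef.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Calls the host hook if one is registered. Returns `None` when no hook is
/// set, when an argument contains an interior NUL, or when the hook reports failure.
pub fn fetch_via_host(
    hook: Option<BexExternalFetch>,
    user_data: *mut c_void,
    method: &str,
    url: &str,
    body: &[u8],
) -> Option<BexFetchResult> {
    let hook = hook?;
    let method = CString::new(method).ok()?;
    let url = CString::new(url).ok()?;
    let mut out = BexFetchResult { status: 0, body: std::ptr::null(), body_len: 0 };
    // SAFETY: all pointers are valid for the duration of the call; the host
    // must not retain them afterwards (an assumption of this sketch).
    let ok = unsafe {
        hook(user_data, method.as_ptr(), url.as_ptr(), body.as_ptr(), body.len(), &mut out)
    };
    if ok { Some(out) } else { None }
}

/// Stand-in for a host-provided hook; it only reports a fixed status.
pub unsafe extern "C" fn example_host_hook(
    _user_data: *mut c_void,
    _method: *const c_char,
    _url: *const c_char,
    _body: *const u8,
    _body_len: usize,
    out: *mut BexFetchResult,
) -> bool {
    if out.is_null() {
        return false;
    }
    unsafe { (*out).status = 200 };
    true
}
```

Making the hook an `Option` keeps it truly optional: hosts that never register one get `None` back and the engine can surface the challenge error unchanged.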
### Option D: Optional proxy service
For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting.
The engine stays portable and does not embed fragile anti-bot logic.
## Plugin Guidance
Plugins should:
- set `Referer` correctly
- preserve cookies when provided
- avoid excessive retries
- return `PluginError::Forbidden` or `PluginError::RateLimited` for challenge pages
- prefer local JS ciphers over third-party helper APIs when possible
Plugins should not:
- hardcode fake TLS assumptions
- rely on one external decoder service forever
- endlessly retry CF challenge pages
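The retry guidance can be sketched as a bounded loop that gives up immediately on challenge errors. `PluginError::Forbidden` and `PluginError::RateLimited` come from the text above; the `Transient` variant, the helper names, and the choice to map `429` to `RateLimited` and other challenge statuses to `Forbidden` are assumptions made for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum PluginError {
    Forbidden,
    RateLimited,
    /// Hypothetical variant for errors worth retrying (timeouts, resets).
    Transient,
}

/// Assumed mapping from a challenge response's status code to a plugin error.
pub fn error_for_challenge_status(status: u16) -> PluginError {
    match status {
        429 => PluginError::RateLimited,
        _ => PluginError::Forbidden,
    }
}

/// Runs `attempt` at most `max_attempts` times. Only `Transient` errors are
/// retried; `Forbidden` and `RateLimited` are returned at once so that
/// challenge pages are never hammered.
pub fn fetch_with_limit<T, F>(max_attempts: u32, mut attempt: F) -> Result<T, PluginError>
where
    F: FnMut() -> Result<T, PluginError>,
{
    for _ in 0..max_attempts {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(PluginError::Transient) => continue,
            Err(other) => return Err(other),
        }
    }
    Err(PluginError::Transient)
}
```

Returning the challenge error on the first attempt hands the decision back to the host app, which is the only layer that can supply cookies or a browser-backed fetch.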
## Recommended Near-Term Fixes
1. Add challenge detection in `HttpHostService`.
2. Map challenges to a structured error payload for C ABI.
3. Add cookie helper APIs:
   - set domain cookies
   - clear domain cookies
   - list stored challenge domains
4. Add optional external fetch callback in C ABI.
5. Keep advanced TLS impersonation as an optional backend only.
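For item 3, the cookie helpers might look like the following in-memory sketch. `CookieStore` and its method names are hypothetical, and a `BTreeMap` stands in for the BEX secret/KV store, whose real API is not described here.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for per-domain cookie storage (hypothetical type).
#[derive(Default)]
pub struct CookieStore {
    by_domain: BTreeMap<String, BTreeMap<String, String>>,
}

impl CookieStore {
    /// Adds or overwrites cookies for a domain. Domains are lowercased.
    pub fn set_domain_cookies(&mut self, domain: &str, cookies: &[(&str, &str)]) {
        let jar = self.by_domain.entry(domain.to_ascii_lowercase()).or_default();
        for &(name, value) in cookies {
            jar.insert(name.to_string(), value.to_string());
        }
    }

    /// Removes all cookies for a domain. Returns true if anything was stored.
    pub fn clear_domain_cookies(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&domain.to_ascii_lowercase()).is_some()
    }

    /// Lists the domains that currently have stored cookies.
    pub fn list_domains(&self) -> Vec<String> {
        self.by_domain.keys().cloned().collect()
    }

    /// Builds the `Cookie` header value for a domain, sorted by cookie name.
    pub fn cookie_header(&self, domain: &str) -> Option<String> {
        let jar = self.by_domain.get(&domain.to_ascii_lowercase())?;
        if jar.is_empty() {
            return None;
        }
        let parts: Vec<String> = jar
            .iter()
            .map(|(name, value)| format!("{}={}", name, value))
            .collect();
        Some(parts.join("; "))
    }
}
```

A production version would also need expiry handling and secure storage, which this sketch omits.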
## Final Recommendation
For production:
- Default: `reqwest + rustls` portable backend.
- Add: challenge detection and external fallback hook.
- Optional later: verified impersonation backend behind a feature flag.
This gives the best balance of:
- reliability
- cross-platform support
- maintainability
- app integration flexibility
- real-world anti-bot handling