# Plan: Integrate Browser Automation using Playwright
**Problem:** Direct API requests to Notion using `httpx` are failing, likely due to server-side checks (e.g., TLS fingerprinting).

**Solution:** Replace the direct `httpx` calls with browser automation using Playwright to mimic a real browser environment.

**Steps:**
1. **Add Dependency:**
   * Add `playwright` to the [`requirements.txt`](requirements.txt) file.
   * *Note:* After the requirements are updated, the Playwright browser binaries must also be installed (e.g., `playwright install chromium` in the terminal, or plain `playwright install` for all browsers).
2. **Modify the `stream_notion_response` Function ([`main.py:184`](main.py:184)):**
   * **Remove `httpx`:** Delete the `async with httpx.AsyncClient(...)` block ([`main.py:216-263`](main.py:216)). Keep the surrounding error handling for now.
   * **Initialize Playwright:** Add code to start Playwright, launch a Chromium browser instance, and create a new browser context.
   * **Set Cookie:** Add the `NOTION_COOKIE` ([`main.py:26`](main.py:26)) to the browser context (via `context.add_cookies()`).
   * **Create Page:** Open a new page within the context.
   * **Execute Request via JavaScript:** Use `page.evaluate()` to run JavaScript inside the browser page. This JavaScript code will:
     * Use the `fetch` API to make the POST request to [`NOTION_API_URL`](main.py:24).
     * Include the necessary headers (copied from the original [`headers`](main.py:186) dictionary).
     * Send the `notion_request_body` (serialized as JSON, as in [`main.py:218`](main.py:218)).
     * Handle the streaming response from `fetch` (`response.body.getReader()`).
     * Read chunks from the stream (`reader.read()`) and pass them back to Python (e.g., by calling a Python function registered with `page.expose_function`).
   * **Process Streamed Chunks in Python:** The exposed Python callback receives the raw chunks from the browser. Because a callback cannot itself `yield` from the `stream_notion_response` generator, it has to hand each chunk over to the generator (e.g., through an `asyncio.Queue`). The generator then decodes the chunks (likely UTF-8), processes the `ndjson` lines as the original code does ([`main.py:228-249`](main.py:228)), and yields the formatted SSE chunks.
   * **Handle End of Stream:** Ensure the `[DONE]` message is sent after the browser stream finishes.
   * **Cleanup:** Close the page, context, and browser instance properly (initially on a per-request basis).
   * **Update Error Handling:** Adapt the `try...except` blocks to also catch Playwright-specific errors (e.g., `playwright.async_api.Error` and `playwright.async_api.TimeoutError`).
**Diagram:**
```mermaid
graph TD
A[FastAPI Request /v1/chat/completions] --> B{Stream?};
B -- Yes --> C[Call stream_notion_response];
B -- No --> D[Call stream_notion_response internally];
subgraph stream_notion_response ["stream_notion_response (modified w/ Playwright)"]
E[Build NotionRequestBody] --> F;
F[Initialize Playwright & Launch Browser] --> G;
G[Create Context & Add Cookie] --> H;
H[Create Page & Expose Python Callback] --> I;
I["page.evaluate(): JS Fetch POST to Notion"] --> J;
J[JS: Read Stream Chunks] --> K;
K[JS: Send Chunk to Python Callback] --> L;
L[Python Callback: Decode & Process Chunk] --> M;
M[Yield Formatted SSE Chunk] --> N{End of Stream?};
N -- No --> J;
N -- Yes --> O["Yield [DONE] Chunk"];
O --> P["Cleanup Playwright (Page, Context, Browser)"];
end
C --> Q[Return StreamingResponse];
D --> R[Collect Chunks from stream_notion_response];
R --> S[Format Non-Streaming Response];
S --> T[Return JSON Response];
Q --> U[Client Receives SSE Stream];
T --> U;
style F fill:#f9f,stroke:#333,stroke-width:2px
style G fill:#f9f,stroke:#333,stroke-width:2px
style H fill:#f9f,stroke:#333,stroke-width:2px
style I fill:#f9f,stroke:#333,stroke-width:2px
style J fill:#f9f,stroke:#333,stroke-width:2px
style K fill:#f9f,stroke:#333,stroke-width:2px
style L fill:#ccf,stroke:#333,stroke-width:1px
style M fill:#ccf,stroke:#333,stroke-width:1px
style O fill:#ccf,stroke:#333,stroke-width:1px
style P fill:#f9f,stroke:#333,stroke-width:2px
```
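The Python side of the chunk loop (nodes L, M and N above) must cope with chunk boundaries that do not line up with ndjson lines or, when raw bytes are passed, with UTF-8 characters. A minimal sketch under two assumptions: the SSE payloads follow the OpenAI `chat.completion.chunk` shape implied by the `/v1/chat/completions` route, and `extract_text` is a hypothetical stand-in for the field extraction in `main.py:228-249` (Notion's ndjson schema is not reproduced here). `NdjsonToSse` is likewise an illustrative name.

```python
from __future__ import annotations

import codecs
import json
from typing import Callable, Optional


class NdjsonToSse:
    """Incrementally turn streamed ndjson chunks into SSE `data:` events.

    A chunk may end in the middle of a JSON line (or, for bytes, in the middle
    of a UTF-8 character), so unfinished text is buffered until its newline.
    """

    def __init__(self, extract_text: Callable[[dict], Optional[str]]) -> None:
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._buffer = ""
        self._extract_text = extract_text  # hypothetical stand-in for main.py:228-249

    def feed(self, chunk: str | bytes) -> list[str]:
        """Accept one chunk; return the SSE events completed by it."""
        if isinstance(chunk, bytes):
            chunk = self._decoder.decode(chunk)  # holds back partial characters
        self._buffer += chunk
        lines = self._buffer.split("\n")
        self._buffer = lines.pop()  # the last piece has no newline yet
        return [event for line in lines if (event := self._to_sse(line))]

    def close(self) -> list[str]:
        """Flush a final unterminated line, then emit the [DONE] marker."""
        tail = self._buffer + self._decoder.decode(b"", final=True)
        self._buffer = ""
        events = [event] if (event := self._to_sse(tail)) else []
        events.append("data: [DONE]\n\n")
        return events

    def _to_sse(self, line: str) -> Optional[str]:
        line = line.strip()
        if not line:
            return None
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            return None  # skip malformed or non-JSON lines
        if not isinstance(obj, dict):
            return None
        text = self._extract_text(obj)
        if not text:
            return None
        payload = {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": text}, "finish_reason": None}],
        }
        return f"data: {json.dumps(payload)}\n\n"
```

Each chunk taken from the browser bridge would go through `feed()`, and `close()` produces the trailing `[DONE]` event (node O) once the stream ends.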
**Agreed Choices:**
* Dependency: `playwright`
* Browser: Chromium
* Browser Lifecycle: Launch/Close per request (initial approach)