notion2api / PLAN.md
clash-linux's picture
Upload 9 files
cc93261 verified

Plan: Integrate Browser Automation using Playwright

Problem: Direct API requests to Notion using httpx are failing, likely due to server-side checks (e.g., TLS fingerprinting).

Solution: Replace the direct httpx calls with browser automation using Playwright to mimic a real browser environment.

Steps:

  1. Add Dependency:

    • Add playwright to the requirements.txt file.
    • Note: After updating requirements, the browser binaries for Playwright will need to be installed (typically via playwright install in the terminal).
  2. Modify stream_notion_response Function (main.py:184):

    • Remove httpx: Delete the async with httpx.AsyncClient(...) block (main.py:216-263). Keep the surrounding error handling for now.
    • Initialize Playwright: Add code to start Playwright, launch a Chromium browser instance, and create a new browser context.
    • Set Cookie: Add the NOTION_COOKIE (main.py:26) to the browser context.
    • Create Page: Open a new page within the context.
    • Execute Request via JavaScript: Use page.evaluate() to run JavaScript code within the browser page. This JavaScript code will:
      • Use the fetch API to make the POST request to NOTION_API_URL.
      • Include the necessary headers (copied from the original headers dictionary).
      • Send the notion_request_body (serialized as JSON, similar to main.py:218).
      • Handle the streaming response (response.body.getReader()) from fetch.
      • Read chunks from the stream (reader.read()) and send them back to the Python environment (e.g., using page.expose_function to call a Python callback).
    • Process Streamed Chunks in Python: The Python callback function (exposed to JS) will receive the raw chunks from the browser. This callback will need to decode the chunks (likely UTF-8) and process the ndjson lines similarly to the original code (main.py:228-249), yielding the formatted SSE chunks.
    • Handle End of Stream: Ensure the [DONE] message is sent correctly after the browser stream finishes.
    • Cleanup: Close the page, context, and browser instance properly (initially on a per-request basis).
    • Update Error Handling: Adapt the try...except blocks to catch potential Playwright-specific errors.

Diagram:

graph TD
    A[FastAPI Request /v1/chat/completions] --> B{Stream?};
    B -- Yes --> C[Call stream_notion_response];
    B -- No --> D[Call stream_notion_response internally];

    subgraph stream_notion_response (Modified w/ Playwright)
        E[Build NotionRequestBody] --> F;
        F[Initialize Playwright & Launch Browser] --> G;
        G[Create Context & Add Cookie] --> H;
        H[Create Page & Expose Python Callback] --> I;
        I[page.evaluate(): JS Fetch POST to Notion] --> J;
        J[JS: Read Stream Chunks] --> K;
        K[JS: Send Chunk to Python Callback] --> L;
        L[Python Callback: Decode & Process Chunk] --> M;
        M[Yield Formatted SSE Chunk] --> N{End of Stream?};
        N -- No --> J;
        N -- Yes --> O[Yield [DONE] Chunk];
        O --> P[Cleanup Playwright (Page, Context, Browser)];
    end

    C --> Q[Return StreamingResponse];
    D --> R[Collect Chunks from stream_notion_response];
    R --> S[Format Non-Streaming Response];
    S --> T[Return JSON Response];
    Q --> U[Client Receives SSE Stream];
    T --> U;

    style F fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#f9f,stroke:#333,stroke-width:2px
    style I fill:#f9f,stroke:#333,stroke-width:2px
    style J fill:#f9f,stroke:#333,stroke-width:2px
    style K fill:#f9f,stroke:#333,stroke-width:2px
    style L fill:#ccf,stroke:#333,stroke-width:1px
    style M fill:#ccf,stroke:#333,stroke-width:1px
    style O fill:#ccf,stroke:#333,stroke-width:1px
    style P fill:#f9f,stroke:#333,stroke-width:2px

Agreed Choices:

  • Dependency: playwright
  • Browser: Chromium
  • Browser Lifecycle: Launch/Close per request (initial approach)