Luis Milke committed on
Commit c4bac70 · 1 Parent(s): 3e56aaa

Add Docker README SDK config for Hugging Face

Files changed (1)
  1. README.md +8 -150
README.md CHANGED
@@ -1,154 +1,12 @@
- # GeminiWebAPI.js
-
- **GeminiWebAPI.js** is a powerful, self-contained Node.js class for interacting with the Gemini Web Interface (`gemini.google.com`) through browser automation using [Playwright](https://playwright.dev/).
-
- It provides a programmatic interface to Google's most powerful consumer AI models, bypassing the need for paid API keys while granting access to advanced features like **Gemini Advanced (Pro)**, **Flash Thinking Mode**, and **Web Search / Deep Research integrations**.
-
- ---
-
- ## 🌟 Key Features
-
- * **Real-time Streaming**: Stream responses word-by-word just like the native UI.
- * **Model Switching**: Dynamically switch between `Fast` (Flash), `Pro`, and `Thinking` (Reasoning) models.
- * **Temporary Chat Mode**: Built-in support for Temporary Chats to prevent polluting your Gemini history.
- * **File & Image Uploads**: Natively upload documents, PDFs, and images (Base64 or local paths).
- * **Anti-Crash & Auto-Recovery**: Handles Playwright session crashes, Chrome updates, and network timeouts with an automated backoff/recovery system.
- * **Headless or Visible Mode**: Run invisibly in the background, or visibly for debugging/manual login.
- * **Session Persistence**: Saves cookies automatically—log in once, and the session is preserved across restarts.
-
- ---
-
- ## 📦 Installation
-
- To use `GeminiWebAPI.js` in your project, drop the file into your source folder and install its dependencies via `npm`.
-
- ```bash
- npm install playwright turndown
- ```
-
- *(First-time Playwright users must install the browser binaries: `npx playwright install chromium`)*
-
- ---
-
- ## 🚀 Quick Start Example
-
- ```javascript
- const GeminiWebAPI = require('./GeminiWebAPI.js');
-
- async function runTest() {
-     // 1. Initialize
-     const api = new GeminiWebAPI({
-         headless: false, // Set to true after you log in manually once
-         alwaysUseTemporaryChat: true // Keeps your history clean
-     });
-
-     try {
-         // 2. Authenticate
-         console.log("Starting Chrome and checking session...");
-         await api.auth();
-         console.log("Ready!");
-
-         // 3. Simple Question
-         console.log("Asking a quick question...");
-         const response = await api.ask("Explain quantum computing in one sentence.");
-         console.log("Response:\\n", response);
-
-         // 4. Streaming & Model Switching
-         console.log("\\nAsking a complex question with stream and Thinking model...");
-         const result = await api.askStream(
-             "Write a Python script to scrape a website.",
-             (chunk, thinking) => {
-                 // 'thinking' contains the internal thought process of the model
-                 if (thinking) console.log("[Thinking...]");
-                 process.stdout.write(chunk);
-             },
-             {
-                 model: "Thinking", // Can be "Fast", "Pro", or "Thinking"
-                 newChat: true // Starts a fresh context
-             }
-         );
-
-         console.log("\\n\\nFinal Output length:", result.response.length);
-
-     } catch (e) {
-         console.error("Error:", e);
-     } finally {
-         // 5. Cleanup
-         await api.close();
-     }
- }
-
- runTest();
- ```
-
- ---
-
- ## ⚙️ Configuration Options
-
- When creating a new instance (`new GeminiWebAPI(options)`), you can pass a configuration object.
-
- | Option | Default | Description |
- |--------|---------|-------------|
- | `headless` | `false` | If `true`, runs Chromium in the background invisibly. |
- | `alwaysUseTemporaryChat` | `true` | Automatically clicks "Temporary Chat" on startup. |
- | `autoRecover` | `true` | If the browser crashes, automatically re-launches it. |
- | `timeout` | `120000` | Maximum time (ms) to wait for a response before throwing. |
- | `maxRetries` | `5` | Maximum number of internal retry attempts on interaction failure. |
- | `sessionPath` | `./user_session.json` | Path where the login cookies are stored. |
- | `userDataDir` | `./user_data_server` | Path for the persistent Chrome profile. |
- | `model` | `null` | Choose initial model on startup (e.g., `"Pro"`). |
-
  ---
-
- ## 📚 API Reference
-
- ### `await api.auth()`
- Initializes the browser and loads cookies. If you are not logged in, the script will pause and prompt you to log in via the open browser window. Once logged in, it saves the session and continues.
-
- ### `await api.ask(text, [filePath])`
- Sends a message and waits for the complete response.
- * **`text`**: The prompt to send.
- * **`filePath`** *(optional)*: Absolute path to a file (e.g., `C:/image.png`) to upload with the prompt.
- * **Returns**: A Promise resolving to a Markdown string of the response.
-
- ### `await api.askStream(text, onChunk, [options])`
- Sends a message and streams the response back in real-time.
- * **`text`**: The prompt to send.
- * **`onChunk(text, thinking)`**: Callback fired whenever new text arrives. `text` is the current full response. `thinking` contains the internal reasoning (if using the Thinking model).
- * **`options`** *(optional object)*:
-     * `model`: Switch to this model before typing (e.g., `"Thinking"`).
-     * `newChat`: If `true`, starts a new chat (and temporary mode) before typing.
-     * `filePath`: Absolute path to a file to upload.
-     * `images`: Array of base64 image objects: `[{data: "base64...", mimeType: "image/png"}]`.
- * **Returns**: `{ response: "Final Text", thinking: "Thought process" }`.
-
- ### `await api.startNewChat([temporary=true])`
- Clicks the "New Chat" button and optionally enables "Temporary Chat".
- *Note: `askStream` does this automatically if you pass `newChat: true`.*
-
- ### `await api.setModel(modelName)`
- Switches the Gemini model via the top-left dropdown.
- * Supported aliases for Model Names:
-     * `"Fast"` / `"Flash"` / `"2.0 flash"`
-     * `"Pro"` / `"1.5 Pro"` / `"2.5 Pro"`
-     * `"Thinking"` / `"Reasoning"`
-
- ### `await api.activateTool(toolName)`
- Clicks the "Tools" menu in the input bar to activate a specific Gemini Extension.
- * Examples: `"Deep Research"`, `"Google Workspace"`, `"Web Search"`.
-
- ### `await api.close()`
- Safely closes the active browser context and saves the latest session data.
-
  ---
 
- ## ⚠️ Important Limitations and Tips
-
- 1. **Initial Login**: Since this uses local browser automation, **the very first run must be with `headless: false`**. You need to manually log in to your Google Account. Once `user_session.json` is generated, you can set `headless: true`.
- 2. **Speed & Stability**: Playwright is significantly faster than Selenium or Puppeteer, but UI changes by Google can break CSS selectors. The `GeminiWebAPI` has multiple fallback selectors and robust Turndown (Markdown parser) logic to minimize breakage.
- 3. **Sidebar Auto-Locking**: To prevent accidental clicks, the class disables clicks on previous conversations ("Gems" and Chat History) by injecting CSS rules during runtime.
-
- ## 🤝 Troubleshooting
 
- * **Model is not switching**: Make sure to give the UI time to load. Temp mode forces a reload of the UI which resets the model to "Fast". This wrapper automatically handles the delay, but on slow internet, it might need more time.
- * **Script crashes on startup**: Delete the `user_data_server` folder and `user_session.json` files and run with `headless: false` to force a clean login session.
  ---
+ title: Gravity Claw Backend
+ emoji: 🐾
+ colorFrom: indigo
+ colorTo: purple
+ sdk: docker
+ pinned: false
  ---
 
+ # Gravity Claw Backend
 
+ Dockerized AI Agent backend for the Gravity Claw project.