docs: document header usage
README.md
CHANGED
@@ -14,6 +14,7 @@ Reader converts any URL to an **LLM-friendly** input with a simple prefix `https

 ## Updates

 - **2024-04-15**: Reader now supports image reading! It captions all images at the specified URL and adds `Image [idx]: [caption]` as an alt tag (if they initially lack one). This enables downstream LLMs to interact with the images in reasoning, summarizing etc. [See example here](https://x.com/JinaAI_/status/1780094402071023926).

 ## Usage

@@ -57,13 +58,29 @@ Your LLM: LLM(streamContent1)

 Note that in terms of completeness: `... > streamContent3 > streamContent2 > streamContent1`, each subsequent chunk contains more complete information.

-### JSON mode

 This is still very early and the result is not really a "useful" JSON. It contains three fields `url`, `title` and `content` only. Nonetheless, you can use accept-header to control the output format:
 ```bash
 curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
 ```

 ## Install

 You will need the following tools to run the project:


 ## Updates

+- **2024-04-24**: You now have more fine-grained control over the Reader API using headers, e.g. forwarding cookies or using an HTTP proxy.
 - **2024-04-15**: Reader now supports image reading! It captions all images at the specified URL and adds `Image [idx]: [caption]` as an alt tag (if they initially lack one). This enables downstream LLMs to interact with the images in reasoning, summarizing etc. [See example here](https://x.com/JinaAI_/status/1780094402071023926).

 ## Usage

 Note that in terms of completeness: `... > streamContent3 > streamContent2 > streamContent1`, each subsequent chunk contains more complete information.

+### JSON mode (super early beta)

 This is still very early and the result is not really a "useful" JSON. It contains three fields `url`, `title` and `content` only. Nonetheless, you can use accept-header to control the output format:
 ```bash
 curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
 ```

+### Using request headers
+
+As you have already seen above, one can control the behavior of the Reader API using request headers. Here is a complete list of supported headers.
+
+- You can ask the Reader API to forward cookie settings via the `x-set-cookie` header.
+  - Note that requests with cookies will not be cached.
+- You can bypass `readability` filtering via the `x-respond-with` header, specifically:
+  - `x-respond-with: html` returns `documentElement.outerHTML`
+  - `x-respond-with: text` returns `document.body.innerText`
+  - `x-respond-with: screenshot` returns or redirects to the URL of the webpage's screenshot
+  - The default behavior is equivalent to `x-respond-with: markdown`
+- You can specify a proxy server via the `x-proxy-url` header.
+- You can bypass the cached page (lifetime 300s) via the `x-no-cache` header.
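
As a rough illustration of how the headers listed above might be combined on the command line, here is a minimal sketch. The `reader_fetch` helper is hypothetical (it is not part of Reader), and the header values in the commented calls (`true` for `x-no-cache`, the cookie string, the proxy address) are assumptions, since the list names the headers but does not spell out their value formats.

```bash
# reader_fetch is a hypothetical helper, not part of Reader.
# Usage: reader_fetch TARGET_URL ["Header: value" ...]
reader_fetch() {
  target="$1"
  shift
  # Rewrite each remaining "Header: value" argument as a "-H" pair for curl.
  for header in "$@"; do
    set -- "$@" -H "$header"
    shift
  done
  # Prefix the target with the Reader endpoint, as in the JSON mode example.
  curl -sS "$@" "https://r.jina.ai/${target}"
}

# Example calls, commented out because they need network access.
# The header values below are assumptions, not documented formats:
# reader_fetch "https://en.m.wikipedia.org/wiki/Main_Page" "x-respond-with: text"
# reader_fetch "https://example.com" "x-no-cache: true"
# reader_fetch "https://example.com" "x-set-cookie: session=abc123" "x-proxy-url: http://proxy.example.com:8080"
```

Keeping each header as one quoted `"Header: value"` argument means several headers can be stacked in a single call without extra quoting rules.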
+
+
+
 ## Install

 You will need the following tools to run the project: