hanxiao committed on
Commit ae788c3 · unverified · 1 Parent(s): 94a7205

docs: document header usage

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -14,6 +14,7 @@ Reader converts any URL to an **LLM-friendly** input with a simple prefix `https

  ## Updates

  - **2024-04-15**: Reader now supports image reading! It captions all images at the specified URL and adds `Image [idx]: [caption]` as an alt tag (if they initially lack one). This enables downstream LLMs to interact with the images in reasoning, summarizing etc. [See example here](https://x.com/JinaAI_/status/1780094402071023926).

  ## Usage
@@ -57,13 +58,29 @@ Your LLM: LLM(streamContent1) | |

  Note that in terms of completeness: `... > streamContent3 > streamContent2 > streamContent1`, each subsequent chunk contains more complete information.

- ### JSON mode

  This is still very early and the result is not really a "useful" JSON. It contains three fields `url`, `title` and `content` only. Nonetheless, you can use accept-header to control the output format:
  ```bash
  curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
  ```

  ## Install

  You will need the following tools to run the project:


  ## Updates

+ - **2024-04-24**: You now have more fine-grained control over Reader API using headers, e.g. forwarding cookies, using HTTP proxy.
  - **2024-04-15**: Reader now supports image reading! It captions all images at the specified URL and adds `Image [idx]: [caption]` as an alt tag (if they initially lack one). This enables downstream LLMs to interact with the images in reasoning, summarizing etc. [See example here](https://x.com/JinaAI_/status/1780094402071023926).

  ## Usage
 

  Note that in terms of completeness: `... > streamContent3 > streamContent2 > streamContent1`, each subsequent chunk contains more complete information.

+ ### JSON mode (super early beta)

  This is still very early and the result is not really a "useful" JSON. It contains three fields `url`, `title` and `content` only. Nonetheless, you can use accept-header to control the output format:
  ```bash
  curl -H "Accept: application/json" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
  ```

+ ### Using request headers
+
+ As you have already seen above, one can control the behavior of the Reader API using request headers. Here is a complete list of supported headers.
+
+ - You can ask the Reader API to forward cookies settings via the `x-set-cookie` header.
+   - Note that requests with cookies will not be cached.
+ - You can bypass `readability` filtering via the `x-respond-with` header, specifically:
+   - `x-respond-with: html` returns `documentElement.outerHTML`
+   - `x-respond-with: text` returns `document.body.innerText`
+   - `x-respond-with: screenshot` returns or redirects to the URL of the webpage's screenshot
+   - The default behavior is equivalent to `x-respond-with: markdown`
+ - You can specify a proxy server via the `x-proxy-url` header.
+ - You can bypass the cached page (lifetime 300s) via the `x-no-cache` header.
+
+
+
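Editor's illustration (not part of the commit): the header list added in this change can be exercised with a small helper like the one below. It is only a sketch under stated assumptions. The function name `reader_curl_args` is hypothetical, and the header values shown (`true` for `x-no-cache`, the target `https://example.com`) are illustrative guesses, since the README text names the headers but does not specify the values they accept.

```bash
# Sketch: assemble curl arguments for a Reader API request with optional headers.
# The header names come from the README list; every value here is an assumption.

READER_PREFIX="https://r.jina.ai/"

# Print one curl argument per line: a "-H" / "Header: value" pair for each
# extra argument, followed by the prefixed target URL.
reader_curl_args() {
  target="$1"
  shift
  for header in "$@"; do
    printf '%s\n' "-H" "$header"
  done
  printf '%s\n' "${READER_PREFIX}${target}"
}

# Example use in bash (makes a real network request, so it is commented out):
# mapfile -t args < <(reader_curl_args "https://example.com" \
#   "x-respond-with: text" "x-no-cache: true")
# curl "${args[@]}"
```

Printing the arguments instead of calling `curl` directly keeps the helper easy to inspect before any request is sent. Note that, per the list above, a request carrying `x-set-cookie` would not be cached.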
  ## Install

  You will need the following tools to run the project: