docs: deep wiki come to the rescue
README.md
CHANGED
@@ -18,13 +18,10 @@ Or just visit these URLs (**Read**) https://r.jina.ai/https://github.com/jina-ai
 
 ## Updates
 
-- **2024-10-08**: Introduced an `adaptive crawler`. It can recursively crawl the website and extract the most relevant pages for a given webpage.
 - **2024-07-15**: To restrict the results of `s.jina.ai` to certain domain/website, you can set e.g. `site=jina.ai` in the query parameters, which enables in-site search. For more options, [try our updated live-demo](https://jina.ai/reader/#apiform).
-- **2024-07-01**: We have resolved a DDoS attack and other traffic abusing since June 27th. We also found a bug introduced on June 28th which may cause higher latency for some websites. The attack and the bug have been solved; if you have experienced high latency of r.jina.ai between June 27th-30th, it should back to normal now.
 - **2024-05-30**: Reader can now read abitrary PDF from any URL! Check out [this PDF result from NASA.gov](https://r.jina.ai/https://www.nasa.gov/wp-content/uploads/2023/01/55583main_vision_space_exploration2.pdf) vs [the original](https://www.nasa.gov/wp-content/uploads/2023/01/55583main_vision_space_exploration2.pdf).
 - **2024-05-15**: We introduced a new endpoint `s.jina.ai` that searches on the web and return top-5 results, each in a LLM-friendly format. [Read more about this new feature here](https://jina.ai/news/jina-reader-for-search-grounding-to-improve-factuality-of-llms).
 - **2024-05-08**: Image caption is off by default for better latency. To turn it on, set `x-with-generated-alt: true` in the request header.
-- **2024-05-03**: We finally resolved a DDoS attack since April 29th. Now our API is much more reliable and scalable than ever!
 - **2024-04-24**: You now have more fine-grained control over Reader API [using headers](#using-request-headers), e.g. forwarding cookies, using HTTP proxy.
 - **2024-04-15**: Reader now supports image reading! It captions all images at the specified URL and adds `Image [idx]: [caption]` as an alt tag (if they initially lack one). This enables downstream LLMs to interact with the images in reasoning, summarizing etc. [See example here](https://x.com/JinaAI_/status/1780094402071023926).
 
@@ -154,15 +151,8 @@ All images in that page that lack `alt` tag can be auto-captioned by a VLM (visi
 curl -H "X-With-Generated-Alt: true" https://r.jina.ai/https://en.m.wikipedia.org/wiki/Main_Page
 ```
 
-##
-
-You will need the following tools to run the project:
-- Node v18 (The build fails for Node version >18)
-
-```bash
-git clone git@github.com:jina-ai/reader.git
-npm install
-```
+## How it works
+[](https://deepwiki.com/jina-ai/reader)
 
 ## What is `thinapps-shared` submodule?
 