Content Processing Configure how documents and URLs are processed

Document Processing Engine: Help me choose

• Docling is a little slower but more accurate, especially if the documents contain tables and images.

• Simple will extract any content from the document without formatting it. It's fine for simple documents, but will lose quality on complex ones.

• Auto (recommended) will try to process the document with Docling and fall back to Simple if that is not possible.
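The Auto behavior described above can be pictured as a try-then-fall-back pattern. The following is only a minimal sketch of that idea, not Open Notebook's actual code: `process_auto`, `rich_extractor`, and `simple_extractor` are hypothetical names, and the extractors are passed in as plain callables so the sketch stays independent of any real library API.

```python
from typing import Callable, Tuple


def process_auto(
    path: str,
    rich_extractor: Callable[[str], str],    # hypothetical stand-in for a Docling-based extractor
    simple_extractor: Callable[[str], str],  # hypothetical stand-in for plain text extraction
) -> Tuple[str, str]:
    """Return (engine_used, extracted_text), preferring the rich extractor."""
    try:
        return "docling", rich_extractor(path)
    except Exception:
        # Any failure in the rich path falls back to the simple extractor.
        return "simple", simple_extractor(path)
```

Returning the engine name alongside the text makes it easy to see, per document, whether the fallback was used.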

URL Processing Engine: Help me choose

• Firecrawl is a paid service (with a free tier) and is very powerful.

• Jina is a good option as well and also has a free tier.

• Simple will use basic HTTP extraction and will miss content on JavaScript-based websites.

• Auto (recommended) will try to use Firecrawl (if an API key is present). Then, it will use Jina until it reaches the limit (or will keep using Jina if you have set up an API key). It will fall back to Simple when none of the previous options is possible.
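One way to read the Auto order above is as a priority list built from whichever credentials are available. The sketch below illustrates that reading only; `url_engine_order` and its parameters are hypothetical names, and how the application actually tracks Jina's limit is not specified here.

```python
from typing import List, Optional


def url_engine_order(
    firecrawl_key: Optional[str],
    jina_key: Optional[str],
    jina_limit_reached: bool,  # assumption: the caller knows whether the keyless Jina limit was hit
) -> List[str]:
    """Return the engines to try, in order, under the Auto setting."""
    order: List[str] = []
    if firecrawl_key:
        order.append("firecrawl")
    # Jina stays available with an API key, or without one until the limit is reached.
    if jina_key or not jina_limit_reached:
        order.append("jina")
    # Basic HTTP extraction is always the last resort.
    order.append("simple")
    return order
```

For example, with no keys configured and the Jina limit already reached, the only engine left is `simple`.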

Embedding and Search Configure search and embedding options

Default Embedding Option: Help me choose

Embedding the content will make it easier to find, both for you and for your AI agents. If you are running a local embedding model (Ollama, for example), you don't need to worry about cost and can just embed everything. For online providers, you only need to be careful if you process a lot of content (hundreds of documents a day).

• Choose always if you are running a local embedding model or if your content volume is small.

• Choose ask if you want to decide every time.

• Choose never if you don't care about vector search or do not have an embedding provider.

As a reference, OpenAI's text-embedding-3-small costs about $0.02 per 1 million tokens, which is about 30 times the Wikipedia page for Earth. With the Gemini API, Text Embedding 004 is free with a rate limit of 1500 requests per minute.
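To get a feel for what that price means for your own volume, a rough estimate can be computed as below. This is a back-of-the-envelope sketch: the price constant comes from the figure quoted above, while the function name and the tokens-per-document value in the usage lines are illustrative assumptions, not measurements.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # text-embedding-3-small, as quoted above


def embedding_cost_usd(
    total_tokens: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """Estimate the embedding cost in USD for a given number of tokens."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 300 documents a day, assumed to average 5,000 tokens each.
daily_tokens = 300 * 5_000
print(f"Estimated daily cost: ${embedding_cost_usd(daily_tokens):.2f}")
```

Under those assumed numbers the daily cost comes to a few cents, which is why the always option is reasonable for moderate volumes.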

File Management Configure file handling and storage options

Auto Delete Files: Help me choose

Once your files are uploaded and processed, they are no longer needed. Most users should allow Open Notebook to delete uploaded files from the upload folder automatically. Choose no ONLY if you are using Open Notebook as the primary storage location for those files (which you shouldn't be). This option will soon be deprecated in favor of always downloading the files.

• Choose yes (recommended) to automatically delete uploaded files after processing.

• Choose no only if you need to keep the original files in the upload folder.