---
title: 🧩 Embedding models
---
## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GoogleAI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>
## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

</CodeGroup>
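The same configuration can also be built as a plain Python dict. Recent Embedchain versions accept a dict via `App.from_config(config=...)` as an alternative to a YAML path, but verify this against the version you have installed; the sketch below only constructs the dict:

```python
# Equivalent configuration expressed as a Python dict. Recent Embedchain
# versions also accept `App.from_config(config=...)` with a dict instead of
# a YAML file path; check your installed version before relying on this.
config = {
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    }
}

# app = App.from_config(config=config)
print(config["embedder"]["config"]["model"])  # → text-embedding-3-small
```

This is convenient when the model name is decided at runtime rather than checked into a config file.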
OpenAI announced two new embedding models: `text-embedding-3-small` and `text-embedding-3-large`. Embedchain supports both of these models. Below you can find the YAML config for each:
<CodeGroup>

```yaml text-embedding-3-small.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

```yaml text-embedding-3-large.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-large'
```

</CodeGroup>
## Google AI

To use the Google AI embedding model, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from the [Google Maker Suite](https://makersuite.google.com/app/apikey).
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>
<br/>
<Note>
For more details regarding the Google AI embedding model, please refer to the [Google AI documentation](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
</Note>
## Azure OpenAI

To use the Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>
You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
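A missing environment variable usually surfaces only later as an authentication error, so it can help to fail fast before constructing the app. `missing_azure_vars` below is a hypothetical helper, not part of Embedchain:

```python
import os

# Hypothetical helper (not provided by Embedchain): report which of the
# required Azure OpenAI environment variables are still unset, so a missing
# one fails fast instead of surfacing later as an authentication error.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
]

def missing_azure_vars(env=os.environ):
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with only one variable set: the other three are reported missing.
print(missing_azure_vars({"OPENAI_API_TYPE": "azure"}))
```

Call `missing_azure_vars()` (with no argument) right before `App.from_config` and raise if the returned list is non-empty.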
## GPT4All

GPT4All supports generating high-quality embeddings for text documents of arbitrary length using a CPU-optimized, contrastively trained Sentence Transformer.
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
## Hugging Face

Hugging Face supports generating embeddings for text documents of arbitrary length using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to pass the `model` name in the config YAML and it works out of the box.
<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>