---
license: gpl-3.0
tags:
- typesense
- semantic search
- vector search
---

# Typesense Built-in Embedding Models

This repository holds all the built-in ML models that [Typesense](https://typesense.org) currently supports for semantic search.

If you have a model that you would like to add to the supported list, you can convert it to the ONNX format and open a Pull Request (PR) to include it (see the Contributing section below for instructions).

## Usage

Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:

```bash
curl -X POST \
  'http://localhost:8108/collections' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
    "name": "products",
    "fields": [
      {
        "name": "product_name",
        "type": "string"
      },
      {
        "name": "embedding",
        "type": "float[]",
        "embed": {
          "from": [
            "product_name"
          ],
          "model_config": {
            "model_name": "ts/all-MiniLM-L12-v2"
          }
        }
      }
    ]
  }'
```

Replace `all-MiniLM-L12-v2` with the name of any model in this repository, keeping the `ts/` prefix shown in the example.

For a detailed, step-by-step walkthrough, see the semantic search guide: https://typesense.org/docs/guide/semantic-search.html

## Contributing

To contribute a model, convert it to the ONNX format, add a model config as described below, and open a Pull Request (PR) against this repository.

### Convert a model to ONNX format

#### Converting a Hugging Face Transformers Model

To convert a model from Hugging Face to ONNX format, follow the [Hugging Face ONNX export instructions](https://huggingface.co/docs/transformers/serialization#export-to-onnx), which use `optimum-cli`.

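As a rough sketch of what that export can look like on the command line, assuming `optimum` is installed with its ONNX export support: the helper name `export_hf_to_onnx`, the model ID, and the output folder below are illustrative and not part of this repository.

```bash
# Export a Hugging Face model to ONNX with optimum-cli.
# Arguments: <model_id_or_local_path> <output_dir>
# (export_hf_to_onnx is a hypothetical helper name used for illustration.)
export_hf_to_onnx() {
  optimum-cli export onnx --model "$1" "$2"
}

# Example invocation (illustrative model ID and output folder):
# export_hf_to_onnx sentence-transformers/all-MiniLM-L12-v2 all-MiniLM-L12-v2-onnx/
```

The output folder should then contain the exported ONNX file alongside the tokenizer files; check the linked instructions for the options that apply to your model's task.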
#### Converting a PyTorch Model

If you have a PyTorch model, you can use the `torch.onnx` APIs to convert it to ONNX format. See the [PyTorch ONNX documentation](https://pytorch.org/docs/stable/onnx.html) for details on the conversion process.

#### Converting a TensorFlow Model

For TensorFlow models, you can use the `tf2onnx` tool to convert them to ONNX format. See the [ONNX Runtime TensorFlow tutorial](https://onnxruntime.ai/docs/tutorials/tf-get-started.html#getting-started-converting-tensorflow-to-onnx) for detailed guidance.

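A minimal sketch of a `tf2onnx` invocation, assuming the model has been saved in TensorFlow's SavedModel format and `tf2onnx` is installed in the active Python environment. The helper name `convert_tf_to_onnx` and the paths in the example are placeholders.

```bash
# Convert a TensorFlow SavedModel directory to an ONNX file with tf2onnx.
# Arguments: <saved_model_dir> <output_onnx_file>
# (convert_tf_to_onnx is a hypothetical helper name used for illustration.)
convert_tf_to_onnx() {
  python -m tf2onnx.convert --saved-model "$1" --output "$2"
}

# Example invocation (placeholder paths):
# convert_tf_to_onnx ./my_saved_model ./my-model/model.onnx
```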
#### Creating model config

Before submitting your ONNX model through a PR, organize the necessary files in a folder named after the model. The folder should contain the following files:

- **Model File**: The ONNX model file.
- **Vocab File**: The vocabulary file required for the model.
- **Model Config File**: Named `config.json`, this file should contain the following keys:

| Key | Description | Optional |
|-----|-------------|----------|
| `model_md5` | MD5 checksum of the model file, as a string | No |
| `vocab_md5` | MD5 checksum of the vocab file, as a string | No |
| `model_type` | Model type (currently only `bert` and `xlm_roberta` are supported) | No |
| `vocab_file_name` | File name of the vocab file | No |
| `indexing_prefix` | Prefix prepended to document text before it is embedded | Yes |
| `query_prefix` | Prefix prepended to query text before it is embedded | Yes |


Please make sure that the information in the configuration file is accurate and complete before submitting your PR.
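
One way to produce the required keys and re-check them before opening a PR is sketched below. It assumes GNU `md5sum` is available; the function names are hypothetical helpers, not tooling shipped with this repository.

```bash
# Print the MD5 checksum of a file as a hex string.
# Assumes GNU coreutils `md5sum` (on macOS, `md5 -q` is the usual equivalent).
file_md5() {
  md5sum "$1" | cut -d ' ' -f 1
}

# Write <model_dir>/config.json with the four required keys.
# Arguments: <model_dir> <model_file> <vocab_file> <model_type>
write_model_config() {
  cat > "$1/config.json" <<EOF
{
  "model_md5": "$(file_md5 "$1/$2")",
  "vocab_md5": "$(file_md5 "$1/$3")",
  "model_type": "$4",
  "vocab_file_name": "$3"
}
EOF
}

# Read the string value of a key from a flat JSON file.
# Deliberately simple: assumes one "key": "value" pair per line, as written above.
json_string_value() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2"
}

# Succeed only if both checksums recorded in config.json match the files on disk.
# Arguments: <model_dir> <model_file>
verify_model_config() {
  local config="$1/config.json"
  local vocab_file
  vocab_file="$(json_string_value vocab_file_name "$config")"
  [ "$(file_md5 "$1/$2")" = "$(json_string_value model_md5 "$config")" ] &&
    [ "$(file_md5 "$1/$vocab_file")" = "$(json_string_value vocab_md5 "$config")" ]
}
```

For example, `write_model_config my-model model.onnx vocab.txt bert` followed by `verify_model_config my-model model.onnx`, where `my-model`, `model.onnx`, and `vocab.txt` are placeholder names. The optional `indexing_prefix` and `query_prefix` keys are left out of this sketch and would need to be added by hand if the model expects them.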

We appreciate your contributions to expand our collection of supported embedding models!