mkrimmel-pplx committed on
Commit
eec040e
·
1 Parent(s): 206ab7e

Usage section for TEI (#4)

- docs: added usage section for TEI (48b327cf88cf893c73dcebd372043472c3ff1014)
- docs: update snippet to not use normalization (9238a3b6917121615a94512ec8a84a3d51cf0107)
- fix: use /embed endpoint (6f75acd873b3c46c9e34ed45b061030b24e4e199)
- fix: remove mention of OpenAI API (df3b2f9a723081dc4606908bed889141f2cb5be3)

Files changed (1):
  1. README.md +50 -0

README.md CHANGED
@@ -86,6 +86,56 @@ embeddings = model.encode(texts, quantization="binary") # Shape: (5, 2560), quan
  </details>
 
+ <details>
+ <summary>Using Text Embeddings Inference (TEI)</summary>
+
+ > [!NOTE]
+ > Text Embeddings Inference v1.9.0 will be released as stable soon; in the meantime,
+ > feel free to use the latest containers, or pin a specific build via its SHA ``.
+
+ > [!IMPORTANT]
+ > Currently, only int8-quantized embeddings are available via TEI. Remember to use cosine similarity with unnormalized int8 embeddings.
+
+ - CPU w/ Candle:
+
+ ```bash
+ docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id perplexity-ai/pplx-embed-1-4B --auto-truncate
+ ```
+
+ - CPU w/ ORT (ONNX Runtime):
+
+ ```bash
+ docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id onnx-community/pplx-embed-1-4B --auto-truncate
+ ```
+
+ - GPU w/ CUDA:
+
+ ```bash
+ docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id perplexity-ai/pplx-embed-1-4B --auto-truncate
+ ```
+
+ > Alternatively, when running on CUDA you can use an architecture- / compute-capability-specific
+ > container instead of `cuda-latest`: `cuda-latest` bundles the binaries for Turing, Ampere, and
+ > Hopper, so a dedicated container such as `ampere-latest` is lighter.
+
+ You can then send requests to the `/embed` endpoint via cURL:
+
+ ```bash
+ curl http://0.0.0.0:8080/embed \
+     -H "Content-Type: application/json" \
+     -d '{
+         "inputs": [
+             "Scientists explore the universe driven by curiosity.",
+             "Children learn through curious exploration.",
+             "Historical discoveries began with curious questions.",
+             "Animals use curiosity to adapt and survive.",
+             "Philosophy examines the nature of curiosity."
+         ],
+         "normalize": false
+     }'
+ ```
+
+ </details>
 
  ## Technical Details
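The IMPORTANT callout in the diff above says to use cosine similarity with the unnormalized int8 embeddings TEI returns. A minimal sketch of that in Python, assuming the `/embed` response is a JSON list of embedding vectors; the `embed` helper, its URL, and the sample vectors below are illustrative, not taken from this commit:

```python
import json
import urllib.request


def cosine_similarity(a, b):
    # Cosine similarity divides by both vector norms, so it works
    # directly on unnormalized int8 embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def embed(texts, url="http://0.0.0.0:8080/embed"):
    # Hypothetical helper mirroring the cURL request in the diff:
    # POSTs to a locally running TEI container with "normalize": false.
    payload = json.dumps({"inputs": texts, "normalize": False}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # list of embedding vectors


# Rank two candidates against a query using stand-in integer vectors
# (embed() itself requires a running TEI server, so it is not called here).
query_vec = [12, -3, 7]
candidates = [[11, -2, 8], [-5, 9, 1]]
scores = [cosine_similarity(query_vec, c) for c in candidates]
```

Since cosine similarity is scale-invariant, skipping server-side normalization (`"normalize": false`) does not change the ranking.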