Add Infinity usage for reranking (Infinity implements a Cohere-compatible API)
#10 opened by michaelfeil
README.md CHANGED
@@ -126,6 +126,14 @@ with torch.no_grad():
 # tensor([1.2315, 0.5923, 0.3041])
 ```
 
+Usage with infinity:
+
+[Infinity](https://github.com/michaelfeil/infinity), a MIT Licensed Inference RestAPI Server.
+```
+docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
+michaelf34/infinity:0.0.68 \
+v2 --model-id Alibaba-NLP/gte-multilingual-reranker-base --revision "main" --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
+```
 
 ## Evaluation
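The added README text starts the server but does not show a request against it. The sketch below illustrates what a client call might look like once the `docker run` command above is running. It rests on assumptions that this PR does not state: that the server listens on `localhost:7997`, that it serves a Cohere-style `POST /rerank` route, and that the response carries a `results` list whose entries have `index` and `relevance_score` fields. The helper names are hypothetical; check the route and field names against the Infinity documentation before relying on them.

```python
import json
import urllib.request

# Assumptions (not stated in this PR): the server is reachable on
# localhost:7997 and serves a Cohere-style POST /rerank route.
BASE_URL = "http://localhost:7997"
MODEL_ID = "Alibaba-NLP/gte-multilingual-reranker-base"


def build_rerank_request(query, documents, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a JSON POST request for the assumed /rerank route."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        url=f"{base_url}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def order_by_score(response_body, documents):
    """Pair each document with its score, highest score first.

    Assumes a Cohere-style body:
    {"results": [{"index": int, "relevance_score": float}, ...]}
    """
    results = sorted(
        response_body["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


def rerank(query, documents, timeout=30):
    """Send the request to a running server and return documents ordered by score."""
    request = build_rerank_request(query, documents)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return order_by_score(body, documents)
```

With the container running, a call such as `rerank("what is the capital of China?", ["Beijing", "Berlin"])` would return `(document, score)` pairs, provided the assumptions above hold. Request building is kept separate from sending so the payload can be inspected without a live server.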