---
license: mit
---

<img src="https://raw.githubusercontent.com/CompendiumLabs/compendiumlabs.ai/main/images/logo_text_crop.png" alt="Compendium Labs" style="width: 500px;">

# bge-base-zh-v1.5-gguf
Source model: https://huggingface.co/BAAI/bge-base-zh-v1.5

Quantized and unquantized embedding models in GGUF format for use with `llama.cpp`. A large speedup over `transformers` is almost guaranteed, while the benefit over ONNX varies by application; in practice this seems to give a large speedup on CPU and a modest speedup on GPU for larger models. Because these models are relatively small, quantization will not provide huge benefits, but it can still yield up to a 30% speedup on CPU with minimal loss in accuracy.
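
As a rough check on the "minimal loss in accuracy" claim, you can compare embeddings from a quantized file against the F32 file. A minimal sketch, assuming the two files from the table below have been downloaded locally and that `numpy` is installed:

```python
import numpy as np
from llama_cpp import Llama

# load the unquantized and the most heavily quantized variants
full = Llama("bge-base-zh-v1.5-f32.gguf", embedding=True)
quant = Llama("bge-base-zh-v1.5-q4_k_m.gguf", embedding=True)

# embed the same sentence with both and compare by cosine similarity
text = "今天天气很好"
a = np.array(full.embed([text])[0])
b = np.array(quant.embed([text])[0])
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # values near 1.0 indicate little quality loss
```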

<br/>

# Files Available

<div style="width: 500px; margin: 0;">

| Filename | Quantization | Size |
|:-------- | ------------ | ---- |
| [bge-base-zh-v1.5-f32.gguf](https://huggingface.co/CompendiumLabs/bge-base-zh-v1.5-gguf/blob/main/bge-base-zh-v1.5-f32.gguf) | F32 | 389 MB |
| [bge-base-zh-v1.5-f16.gguf](https://huggingface.co/CompendiumLabs/bge-base-zh-v1.5-gguf/blob/main/bge-base-zh-v1.5-f16.gguf) | F16 | 195 MB |
| [bge-base-zh-v1.5-q8_0.gguf](https://huggingface.co/CompendiumLabs/bge-base-zh-v1.5-gguf/blob/main/bge-base-zh-v1.5-q8_0.gguf) | Q8_0 | 105 MB |
| [bge-base-zh-v1.5-q4_k_m.gguf](https://huggingface.co/CompendiumLabs/bge-base-zh-v1.5-gguf/blob/main/bge-base-zh-v1.5-q4_k_m.gguf) | Q4_K_M | 62 MB |

</div>
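
Rather than downloading files by hand, they can also be fetched programmatically. A minimal sketch using `huggingface_hub` (assuming the package is installed, e.g. via `pip install huggingface-hub`):

```python
from huggingface_hub import hf_hub_download

# download one file from this repo and return its path in the local cache
gguf_path = hf_hub_download(
    repo_id="CompendiumLabs/bge-base-zh-v1.5-gguf",
    filename="bge-base-zh-v1.5-q8_0.gguf",
)
```
The resulting `gguf_path` can be passed directly to the loaders shown below.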

<br/>

# Usage

These model files can be used with pure `llama.cpp` or through the `llama-cpp-python` Python bindings:
```python
from llama_cpp import Llama

# gguf_path is the local path to one of the GGUF files listed above
model = Llama(gguf_path, embedding=True)
embed = model.embed(texts)
```
Here `texts` can be either a string or a list of strings, and the return value is a list of embedding vectors. The inputs are grouped into batches automatically for efficient execution. There is also LangChain integration through `langchain_community.embeddings.LlamaCppEmbeddings`, as sketched below.
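
For the LangChain route, a minimal sketch, assuming `langchain-community` and `llama-cpp-python` are installed and `gguf_path` points at a local GGUF file as above:

```python
from langchain_community.embeddings import LlamaCppEmbeddings

# wrap the GGUF model in LangChain's embedding interface
embedder = LlamaCppEmbeddings(model_path=gguf_path)

doc_vectors = embedder.embed_documents(["第一段文本", "第二段文本"])  # one vector per document
query_vector = embedder.embed_query("查询文本")  # a single vector
```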