---
language:
- zh
- en
tags:
- feature-extraction
- llama-cpp
- gguf
pipeline_tag: sentence-similarity
license: apache-2.0
base_model:
- BAAI/bge-code-v1
---
# BGE Code v1 GGUF
BGE-Code-v1 is an LLM-based code embedding model that supports code retrieval, text retrieval, and multilingual retrieval.
Refer to the [original model card](https://huggingface.co/BAAI/bge-code-v1) for more details on the model.
## Prerequisites
* [llama.cpp](https://github.com/ggml-org/llama.cpp) installed
---
## Available Quantizations
- bge-code-v1-F32.gguf - 32-bit float (original precision, largest file, best quality)
- bge-code-v1-F16.gguf - 16-bit float (half precision, excellent quality)
- bge-code-v1-Q8_0.gguf - 8-bit quantization (recommended, great quality-size balance)
- bge-code-v1-Q6_K.gguf - 6-bit quantization (balanced)
- bge-code-v1-Q4_0.gguf - 4-bit quantization (smaller, faster)
---
## Running the Server
You can specify the **host** and **port**:
```bash
llama-server \
  --hf-repo goldpulpy/bge-code-v1-GGUF \
  --hf-file bge-code-v1-Q8_0.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --embeddings
```
Note: keep the line continuations free of trailing comments — in bash, a `# comment` after a `\` breaks the continuation and truncates the command.
* Default host: `127.0.0.1`
* Default port: `8080`
After starting, the server is accessible at `http://127.0.0.1:8080`.
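Before sending requests, you can verify the server is up. A minimal stdlib-only sketch, assuming llama-server's `/health` endpoint and the default host/port above (adjust `base_url` if you changed `--host`/`--port`):

```python
import urllib.request
import urllib.error


def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the embedding server answers its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


if __name__ == "__main__":
    print("server up:", server_ready("http://127.0.0.1:8080"))
```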
---
## Python Example (OpenAI-compatible)
```python
from openai import OpenAI

# API key can be empty; llama-server accepts any value unless --api-key is set
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="")

response = client.embeddings.create(
    model="bge-code-v1",
    input="def add(a, b): return a + b",
)

embedding_vector = response.data[0].embedding
print("Embedding length:", len(embedding_vector))
print("First 10 values:", embedding_vector[:10])
```