---
language:
- zh
- en
tags:
- feature-extraction
- llama-cpp
- gguf
pipeline_tag: sentence-similarity
license: apache-2.0
base_model:
- BAAI/bge-code-v1
---
# BGE Code v1 GGUF
BGE-Code-v1 is an LLM-based code embedding model that supports code retrieval, text retrieval, and multilingual retrieval.
Refer to the [original model card](https://huggingface.co/BAAI/bge-code-v1) for more details on the model.
## Prerequisites
* [llama.cpp](https://github.com/ggml-org/llama.cpp) installed
---
## Available Quantizations
- bge-code-v1-F32.gguf - 32-bit float (original precision, largest file, best quality)
- bge-code-v1-F16.gguf - 16-bit float (half precision, excellent quality)
- bge-code-v1-Q8_0.gguf - 8-bit quantization (recommended, great quality-size balance)
- bge-code-v1-Q6_K.gguf - 6-bit quantization (balanced)
- bge-code-v1-Q4_0.gguf - 4-bit quantization (smaller, faster)
---
## Running the Server
You can specify the **host** and **port**:
```bash
llama-server \
  --hf-repo goldpulpy/bge-code-v1-GGUF \
  --hf-file bge-code-v1-Q8_0.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --embeddings
```
Note: keep the line continuations free of trailing comments — in bash, a `# comment` after a `\` breaks the continuation and truncates the command.
* Default host: `127.0.0.1`
* Default port: `8080`
After starting, the server is accessible at `http://127.0.0.1:8080`.
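Before sending requests, you can verify the server is up. A minimal stdlib-only sketch, assuming llama-server's `/health` endpoint and the default host/port above (adjust `base_url` if you changed `--host`/`--port`):

```python
import urllib.request
import urllib.error


def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the embedding server answers its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


if __name__ == "__main__":
    print("server up:", server_ready("http://127.0.0.1:8080"))
```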
---
## Python Example (OpenAI-compatible)
```python
from openai import OpenAI

# API key can be empty; llama-server accepts any value unless --api-key is set
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="")

response = client.embeddings.create(
    model="bge-code-v1",
    input="def add(a, b): return a + b",
)

embedding_vector = response.data[0].embedding
print("Embedding length:", len(embedding_vector))
print("First 10 values:", embedding_vector[:10])
```