---
license: cc-by-nc-4.0
base_model: jinaai/jina-code-embeddings-1.5b
tags:
- embeddings
- code
- gguf
- llama.cpp
- ollama
- vector-search
- retrieval
---
# 🧠 jina-code-embeddings-1.5b – GGUF
This repository provides **GGUF-format builds** of
**Jina AI's `jina-code-embeddings-1.5b`** for efficient local inference using:
- llama.cpp
- LM Studio
- Ollama
- KoboldCpp
- any GGUF-compatible runtime
These files allow you to run a **state-of-the-art code embedding model locally** on CPU or GPU without PyTorch.
## 🔹 Model files
| File | Description |
|------|------------|
| `jina-code-embeddings-1.5b.gguf` | Full precision conversion |
---
## 🔗 Original model
This is a **format conversion only** of the original Jina AI model:
**Upstream model:**
https://huggingface.co/jinaai/jina-code-embeddings-1.5b
**Paper:**
*Efficient Code Embeddings from Code Generation Models* (Kryvosheieva et al., 2025)
All model weights, training, and research belong to **Jina AI**.
This repository only provides **GGUF format conversions** by **herMaster**.
---
## 🧩 What this model does
This is a **code embedding model**, not a chat LLM.
It generates **vector embeddings** for:
- Text → Code search
- Code → Code similarity
- Code → Text explanation
- Code completion retrieval
- Technical Q&A
It supports **15+ programming languages** and produces **1536-dimensional embeddings** (which can be truncated for smaller vectors).
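Truncating Matryoshka-style embeddings typically means slicing off the leading components and re-normalizing before computing cosine similarity. A minimal sketch under that assumption (the 4-dimensional vector is a stand-in for a real 1536-dimensional embedding):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]          # stand-in for a 1536-d embedding
small = truncate_embedding(full, 2)  # 2-d unit vector
```

Re-normalization matters because cosine similarity assumes comparable vector norms; a truncated-but-unnormalized vector would skew scores.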
---
## ⚠️ Important: GGUF usage notes
Unlike the original Transformers version, GGUF engines **do not apply instruction prefixes or pooling automatically**.
To get correct embeddings you must:
1. Add the correct **instruction prefix**
2. Run inference
3. Use the **last token embedding** as the vector
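Step 1 is easy to get wrong in client code. As a minimal sketch, prefixing for the NL → Code task can be a pair of helpers (the prefix strings below are taken from the usage example that follows; other tasks use different prefixes, documented on the upstream Jina AI model card):

```python
# Instruction prefixes for the NL -> Code retrieval task. These strings come
# from the usage example in this README; other tasks (code -> code,
# code -> text, ...) use different prefixes per the upstream model card.
QUERY_PREFIX = "Find the most relevant code snippet given the following query:\n"
PASSAGE_PREFIX = "Candidate code snippet:\n"

def format_query(text: str) -> str:
    """Prepend the query-side instruction prefix before embedding."""
    return QUERY_PREFIX + text

def format_passage(code: str) -> str:
    """Prepend the passage-side instruction prefix before embedding."""
    return PASSAGE_PREFIX + code
```

The prefixed strings, not the raw text, are what you pass to your GGUF runtime.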
### Example (NL → Code)
Query input (instruction prefix + query):
```text
Find the most relevant code snippet given the following query:
print hello world in python
```
Candidate input (instruction prefix + code):
```text
Candidate code snippet:
print("Hello world")
```
If you do **not** include the instruction text, embedding quality will be significantly worse.
---
## 🛠 llama.cpp example (https://github.com/ggml-org/llama.cpp)
```bash
./llama-embedding \
  -m jina-code-embeddings-1.5b.gguf \
  --pooling last \
  -p "Find the most relevant code snippet given the following query:
print hello world in python"
```
Note the `--pooling last` flag, which selects the last-token embedding as described above.
This returns a 1536-dimensional vector you can store in FAISS, Qdrant, Milvus, etc.
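Once vectors are stored, retrieval is cosine-similarity ranking over the candidates. A minimal dependency-free sketch (the 3-dimensional vectors are hypothetical stand-ins for real 1536-dimensional llama-embedding output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical stand-ins for real embeddings of prefixed inputs.
query = [0.1, 0.9, 0.2]
candidates = {
    "snippet_a": [0.1, 0.8, 0.3],
    "snippet_b": [0.9, 0.1, 0.0],
}
best = max(candidates, key=lambda name: cosine(query, candidates[name]))
```

A vector database such as FAISS or Qdrant performs the same ranking at scale with approximate nearest-neighbor indexes.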
## 📜 License
This model is licensed under:
> Creative Commons Attribution-NonCommercial 4.0 (CC-BY-NC-4.0)
You may:
- Use it for research
- Use it for personal projects
- Share it freely
You may not:
- Use it in commercial products
- Run it in paid APIs or SaaS
- Sell access to it
This license is inherited from the original Jina AI release.
## πŸ™ Credits
- Model & training: Jina AI
- GGUF conversion: herMaster
All model weights, architecture, and training data belong to Jina AI.
This repository only provides format-converted GGUF files for easier local inference.
If you use this model in academic or technical work, please cite the original Jina AI paper:
> Efficient Code Embeddings from Code Generation Models
> Daria Kryvosheieva, Saba Sturua, Michael Günther, Scott Martens, Han Xiao (2025)
This ensures proper credit is given to the original authors and helps support continued research in high-quality code embeddings.