	---
	tags:
	- feature-extraction
	- sentence-similarity
	- sentence-transformers
	- transformers
	license: apache-2.0
	---

	<div align="center">
	<h1> GeeVec-Embeddings-1.0-Lite </h1>
	</div>

	GeeVec-Embeddings-1.0-Lite is a lightweight, domain-adaptive text embedding model with only 0.35B activated parameters. It is built on a Qwen3-style base model with a PseudoMoE architecture, is optimized for retrieval tasks, and supports routing to one of three domains:

	- `general`: the default route, suitable for general-purpose multilingual retrieval.
	- `coding`: specialized for code-related retrieval, including programming concepts, APIs, and technical documentation.
	- `reasoning`: specialized for tasks that require deeper semantic understanding, multi-step inference, and complex query matching.

	Despite its compact size, GeeVec-Embeddings-1.0-Lite performs strongly across a wide range of benchmarks. On the MMTEB(Multilingual, v2) retrieval task, it achieves state-of-the-art results among small (<1B) models, with an nDCG@10 score of 74.66 (as of 2026/04/02). It is also competitive on MMTEB(eng, v2), BEIR, CoIR, and BRIGHT.

	We also provide an API service for a larger 8B-scale model, GeeVec-Embeddings-1.0. Like GeeVec-Embeddings-1.0-Lite, it is optimized for retrieval tasks and supports the same three domains: `general`, `coding`, and `reasoning`. GeeVec-Embeddings-1.0 achieves state-of-the-art results on the MMTEB(Multilingual, v2) retrieval task, with an nDCG@10 score of 81.18 (as of 2026/04/02). API usage documentation: https://www.geevec.com/documentation.



	## Introduction

	This repository hosts the model `geevec-embeddings-1.0-lite`.

	Technical highlights:
	- Model Type: Text Embedding
	- Parameters: 349M activated / 366M total
	- Context Length: 32,768 tokens
	- Embedding Dimension: up to 4096; user-defined output dimensions from 256 to 4096 are supported (recommended: 256, 512, 1024, 2048, 4096)
	- Domain-Specific Support: `general` (default), `coding`, `reasoning`
	- Pooling Method: last-token pooling
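
The user-defined output dimension can be pictured with a small sketch. It assumes that smaller dimensions are obtained by keeping the leading components of the full vector and re-normalizing (Matryoshka-style truncation). This card does not state the actual mechanism, and `truncate_and_normalize` is a hypothetical helper, not part of the model's API. Random vectors stand in for real embeddings.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale to unit length.

    Assumption: leading-component truncation; the model card does not
    confirm how reduced dimensions are produced.
    """
    if not 256 <= dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be between 256 and {embeddings.shape[1]}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms


# Stand-in for two full-size (4096-d) embeddings; not real model output.
rng = np.random.default_rng(0)
full = rng.standard_normal((2, 4096))

small = truncate_and_normalize(full, 512)
print(small.shape)                     # (2, 512)
print(np.linalg.norm(small, axis=1))   # each row has unit norm
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities at the reduced dimension.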

	## Usage

	### Using FlagEmbedding

	```bash
	git clone https://github.com/FlagOpen/FlagEmbedding.git
	cd FlagEmbedding
	pip install -e .
	```

	```python
	from FlagEmbedding import FlagAutoModel

	model_path = "geevec-ai/geevec-embeddings-1.0-lite"

	model = FlagAutoModel.from_finetuned(
	    model_path,
	    model_class="decoder-only-pseudo_moe",
	    query_instruction_for_retrieval="Given a question, retrieve passages that answer the question.",
	    query_instruction_format="Instruct: {}\nQuery: {}",
	    domain_for_pseudo_moe="general",  # general / coding / reasoning
	    use_bf16=True,
	    use_fp16=False,
	    trust_remote_code=True,
	    devices="cuda:0",  # if you do not have a GPU, set this to "cpu"
	)

	queries = [
	    "how much protein should a female eat",
	    "summit define",
	]
	documents = [
	    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day.",
	    "Definition of summit for English Language Learners: the highest point of a mountain; the highest level; a meeting between leaders.",
	]

	query_embeddings = model.encode_queries(queries)
	document_embeddings = model.encode_corpus(documents)

	similarity = query_embeddings @ document_embeddings.T
	print(similarity)
	```
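
To make the `query_instruction_format` argument concrete, the snippet below shows the string that results from filling the template with the instruction and one query using plain `str.format`. It assumes the two `{}` placeholders are filled in the order instruction, then query, which matches the `Instruct: ...\nQuery: ...` layout used in the Transformers example in this card. `build_query` is a hypothetical helper for illustration, not a FlagEmbedding function.

```python
# Template and instruction copied from the FlagEmbedding example above.
QUERY_FORMAT = "Instruct: {}\nQuery: {}"
INSTRUCTION = "Given a question, retrieve passages that answer the question."


def build_query(instruction: str, query: str, template: str = QUERY_FORMAT) -> str:
    """Fill the template; assumes placeholder order is (instruction, query)."""
    return template.format(instruction, query)


formatted = build_query(INSTRUCTION, "summit define")
print(formatted)
# Instruct: Given a question, retrieve passages that answer the question.
# Query: summit define
```

Documents are encoded without an instruction prefix, so only the query side goes through this formatting.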

	### Using Sentence Transformers

	```python
	from sentence_transformers import SentenceTransformer
	import torch

	model_path = "geevec-ai/geevec-embeddings-1.0-lite"

	# Load with trust_remote_code=True because the model defines custom modules.
	model = SentenceTransformer(
	    model_path,
	    model_kwargs={"torch_dtype": torch.bfloat16},
	    trust_remote_code=True,
	)

	queries = [
	    "How can I optimize a Python function that has nested loops?",
	    "What is the difference between eigenvalue decomposition and SVD?",
	]

	documents = [
	    "Use vectorization, caching, and algorithmic improvements to reduce complexity.",
	    "Eigenvalue decomposition applies to square matrices; SVD works for any matrix.",
	]

	# Optional domain routing: general / coding / reasoning
	query_embeddings = model.encode(queries, domain="coding", normalize_embeddings=True)
	doc_embeddings = model.encode(documents, domain="coding", normalize_embeddings=True)

	similarity = query_embeddings @ doc_embeddings.T
	print(similarity)
	```
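
Once the similarity matrix is available, ranking documents for each query is a sort over each row. The sketch below uses small hand-written unit vectors in place of real model output, and `rank_documents` is a hypothetical helper. With `normalize_embeddings=True`, the dot product equals cosine similarity.

```python
import numpy as np


def rank_documents(similarity: np.ndarray, top_k: int) -> np.ndarray:
    """Return, for each query row, the document indices sorted by descending score."""
    order = np.argsort(-similarity, axis=1)
    return order[:, :top_k]


# Toy unit-length vectors standing in for query / document embeddings.
query_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_embeddings = np.array([[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]])

similarity = query_embeddings @ doc_embeddings.T
top2 = rank_documents(similarity, top_k=2)
print(top2.tolist())  # [[1, 0], [2, 0]]
```

Each row of the result lists the best-matching document first: query 0 matches document 1, and query 1 matches document 2.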

	### Using HuggingFace Transformers

	```python
	import torch
	import torch.nn.functional as F

	from torch import Tensor
	from transformers import AutoTokenizer, AutoModel


	def last_token_pool(last_hidden_states: Tensor,
	                    attention_mask: Tensor) -> Tensor:
	    # With left padding, the final position holds a real token for every sequence.
	    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
	    if left_padding:
	        return last_hidden_states[:, -1]
	    else:
	        # With right padding, index the last non-padding token of each sequence.
	        sequence_lengths = attention_mask.sum(dim=1) - 1
	        batch_size = last_hidden_states.shape[0]
	        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


	def get_detailed_instruct(task_description: str, query: str) -> str:
	    return f'Instruct: {task_description}\nQuery: {query}'


	task = 'Given a web search query, retrieve relevant passages that answer the query.'
	queries = [
	    get_detailed_instruct(task, "How can I optimize a Python function that has nested loops?"),
	    get_detailed_instruct(task, "What is the difference between eigenvalue decomposition and SVD?"),
	]
	# No need to add instructions for documents
	documents = [
	    "Use vectorization, caching, and algorithmic improvements to reduce complexity.",
	    "Eigenvalue decomposition applies to square matrices; SVD works for any matrix.",
	]
	input_texts = queries + documents

	tokenizer = AutoTokenizer.from_pretrained("geevec-ai/geevec-embeddings-1.0-lite")
	model = AutoModel.from_pretrained("geevec-ai/geevec-embeddings-1.0-lite", trust_remote_code=True)
	model.eval()

	max_length = 4096
	# Tokenize the input texts
	batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt', pad_to_multiple_of=8)

	with torch.no_grad():
	    outputs = model(**batch_dict)
	    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

	# normalize embeddings
	embeddings = F.normalize(embeddings, p=2, dim=1)
	scores = (embeddings[:2] @ embeddings[2:].T) * 100
	print(scores.tolist())
	```
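
The pooling rule in `last_token_pool` can be checked on a toy batch without loading the model. The sketch below restates the same logic in NumPy, a choice made only to keep the example lightweight (the card's own code uses PyTorch). It shows that the vector of the last non-padding token is selected under both right and left padding. `last_token_pool_np` and the toy values are illustrative only.

```python
import numpy as np


def last_token_pool_np(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """NumPy restatement of last-token pooling.

    hidden: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    # If every sequence has a real token in the final position, the batch is left-padded.
    if attention_mask[:, -1].sum() == attention_mask.shape[0]:
        return hidden[:, -1]
    # Otherwise pick the last real token of each right-padded sequence.
    last_index = attention_mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), last_index]


# Toy hidden states: the value at [b, t, 0] is 10 * b + t, so positions are easy to read.
hidden = np.array([[[0.0], [1.0], [2.0]],
                   [[10.0], [11.0], [12.0]]])

right_padded = np.array([[1, 1, 1], [1, 1, 0]])  # second sequence has 2 real tokens
left_padded = np.array([[1, 1, 1], [0, 1, 1]])   # same lengths, padding on the left

right = last_token_pool_np(hidden, right_padded).ravel().tolist()
left = last_token_pool_np(hidden, left_padded).ravel().tolist()
print(right)  # [2.0, 11.0]
print(left)   # [2.0, 12.0]
```

In both cases the padding position is skipped, which is why the function can be used regardless of the tokenizer's padding side.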

	## Notes

	- This model relies on custom code files: `modeling_qwen3_pseudo_moe.py`, `configuration_qwen3_pseudo_moe.py`, and `pseudo_moe_st_module.py`.
	- When loading from a local path or the Hub, set `trust_remote_code=True` so that these files can be loaded.
	- If you do not specify a domain, the model uses `general` by default.
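
On the application side, it can help to validate the domain name before passing it to the model. The helper below is a hypothetical sketch: `resolve_domain` is not part of this repository, and this card does not document how the model itself reacts to an unknown domain name. It only mirrors the documented default of `general` and the three supported routes.

```python
from typing import Optional

# The three routes listed in this card.
SUPPORTED_DOMAINS = ("general", "coding", "reasoning")


def resolve_domain(domain: Optional[str] = None) -> str:
    """Return a valid domain name, falling back to "general" when none is given."""
    if domain is None:
        return "general"
    if domain not in SUPPORTED_DOMAINS:
        raise ValueError(f"Unknown domain {domain!r}; expected one of {SUPPORTED_DOMAINS}")
    return domain


print(resolve_domain())          # general
print(resolve_domain("coding"))  # coding
```

The returned string can then be passed as the `domain` argument shown in the Sentence Transformers example, or as `domain_for_pseudo_moe` in the FlagEmbedding example.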

	## Evaluation

	The following benchmark results summarize the performance of GeeVec-Embeddings-1.0 and GeeVec-Embeddings-1.0-Lite on the main retrieval and embedding evaluation suites.
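
For readers unfamiliar with the nDCG@10 metric quoted in this card, the following is a minimal sketch of how it is computed for a single query. It uses the common formulation with linear gain and a log2 position discount, and it assumes that every relevant document appears somewhere in the ranked list, so the ideal ordering can be derived from the same labels. The relevance labels are toy data, and the function names are illustrative. The sketch returns values in the range 0 to 1, whereas the scores in this card, such as 74.66, appear to be on a 0 to 100 scale.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over the top-k ranks."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k for one query: DCG of the ranking divided by DCG of the ideal ordering.

    Simplification: the ideal ordering is built from the same labels, i.e. all
    relevant documents are assumed to be present in the ranked list.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance labels of the retrieved documents, in ranked order (toy data).
perfect = [1, 1, 0, 0]   # both relevant documents ranked first
shifted = [0, 1, 1, 0]   # relevant documents pushed down by one position

print(ndcg_at_k(perfect))             # 1.0
print(round(ndcg_at_k(shifted), 3))   # about 0.693
```

A benchmark score is then the mean of this per-query value over all queries in the task.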

	### MMTEB(Multilingual, v2) - `general`

	![MTEB Multilingual](imgs/MMTEB_MULTILINGUAL_v2.png)


	### MMTEB(eng, v2) - `general`

	![MTEB English](imgs/MMTEB_ENG_V2.png)


	### BEIR - `general`

	![BEIR](imgs/BEIR.png)

	### CoIR - `coding`

	![COIR](imgs/COIR.png)

	### BRIGHT - `reasoning`

	![BRIGHT](imgs/BRIGHT.png)