---
tags:
- feature-extraction
- sentence-similarity
- sentence-transformers
- transformers
license: apache-2.0
---
<div align="center">
<h1> GeeVec-Embeddings-1.0-Lite </h1>
</div>
**GeeVec-Embeddings-1.0-Lite** is a lightweight domain-adaptive text embedding model, with only **0.35B activated parameters**, built on top of a Qwen3-style base model using a PseudoMoE architecture. It is **optimized for retrieval tasks** and supports **domain routing** for improved specialization:
- `general`: the default route, suitable for general-purpose multilingual retrieval.
- `coding`: specialized for code-related retrieval, including programming concepts, APIs, and technical documentation.
- `reasoning`: specialized for tasks that require deeper semantic understanding, multi-step inference, and complex query matching.
Despite its compact size, GeeVec-Embeddings-1.0-Lite delivers strong performance across a wide range of benchmarks. It achieves SOTA performance among small (<1B-parameter) models on the MMTEB(Multilingual, v2) retrieval task, with an nDCG@10 score of **74.66** (as of 2026/04/02), and performs competitively on MMTEB(eng, v2), BEIR, CoIR, and BRIGHT.
We also provide an API service for a larger 8B-scale model, **GeeVec-Embeddings-1.0**. Like GeeVec-Embeddings-1.0-Lite, it is optimized for retrieval tasks and supports the same three domains: `general`, `coding`, and `reasoning`. GeeVec-Embeddings-1.0 achieves SOTA performance on the MMTEB(Multilingual, v2) retrieval task, with an nDCG@10 score of **81.18** (as of 2026/04/02). API usage documentation: https://www.geevec.com/documentation.
## Introduction
This repository hosts the model `geevec-embeddings-1.0-lite`.
Technical highlights:
- Model Type: Text Embedding
- Parameters: 349M activated / 366M total
- Context Length: 32,768
- Embedding dimension: Up to 4096, supports user-defined output dimensions ranging from 256 to 4096 (recommended dimensions: 256, 512, 1024, 2048, 4096)
- Domain-specific support: `general` (default), `coding`, `reasoning`
- Pooling Method: last-token pooling
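The list above says output dimensions from 256 to 4096 are supported but does not say how shorter vectors are produced. One common approach for models with variable output sizes is to keep the leading components of the full vector and re-normalize. The sketch below assumes that behavior applies here, which this card does not confirm; `truncate_embedding` is a hypothetical helper, not part of the model's API.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Assumption: leading-component truncation is valid for this model.
    The model card only states that dimensions 256..4096 are supported.
    """
    if not 256 <= dim <= 4096:
        raise ValueError("dim must be between 256 and 4096")
    if dim > len(vector):
        raise ValueError("dim exceeds the length of the input vector")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

# Toy 4096-dimensional vector standing in for a real embedding.
full = [math.sin(i + 1) for i in range(4096)]
small = truncate_embedding(full, 512)
print(len(small))  # 512
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities, which matches how the usage examples compare embeddings.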
## Usage
### Using FlagEmbedding
```bash
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install -e .
```
```python
from FlagEmbedding import FlagAutoModel

model_path = "geevec-ai/geevec-embeddings-1.0-lite"
model = FlagAutoModel.from_finetuned(
    model_path,
    model_class="decoder-only-pseudo_moe",
    query_instruction_for_retrieval="Given a question, retrieve passages that answer the question.",
    query_instruction_format="Instruct: {}\nQuery: {}",
    domain_for_pseudo_moe="general",  # general / coding / reasoning
    use_bf16=True,
    use_fp16=False,
    trust_remote_code=True,
    devices="cuda:0",  # if you do not have a GPU, set this to "cpu"
)

queries = [
    "how much protein should a female eat",
    "summit define",
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day.",
    "Definition of summit for English Language Learners: the highest point of a mountain; the highest level; a meeting between leaders.",
]

# Queries get the retrieval instruction; documents are encoded as-is.
query_embeddings = model.encode_queries(queries)
document_embeddings = model.encode_corpus(documents)

# Rows are queries, columns are documents.
similarity = query_embeddings @ document_embeddings.T
print(similarity)
```
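The similarity matrix printed above has one row per query and one column per document. As a minimal sketch of turning such a matrix into per-query rankings, the hypothetical helper `rank_documents` below sorts each row by score. The scores are made up for illustration and are not model output; if the library returns an array rather than nested lists, converting it with `.tolist()` first is one option.

```python
def rank_documents(similarity, top_k=1):
    """Return, for each query row, the indices of the top_k highest-scoring documents."""
    rankings = []
    for row in similarity:
        # Sort document indices by descending score for this query.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        rankings.append(order[:top_k])
    return rankings

# Made-up scores for 2 queries x 2 documents (not real model output).
toy_similarity = [
    [0.71, 0.12],
    [0.08, 0.64],
]
print(rank_documents(toy_similarity))  # [[0], [1]]
```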
### Using Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
import torch

model_path = "geevec-ai/geevec-embeddings-1.0-lite"

# Load with trust_remote_code=True because the model defines custom modules.
model = SentenceTransformer(
    model_path,
    model_kwargs={"torch_dtype": torch.bfloat16},
    trust_remote_code=True,
)

queries = [
    "How can I optimize a Python function that has nested loops?",
    "What is the difference between eigenvalue decomposition and SVD?",
]
documents = [
    "Use vectorization, caching, and algorithmic improvements to reduce complexity.",
    "Eigenvalue decomposition applies to square matrices; SVD works for any matrix.",
]

# Optional domain routing: general / coding / reasoning
query_embeddings = model.encode(queries, domain="coding", normalize_embeddings=True)
doc_embeddings = model.encode(documents, domain="coding", normalize_embeddings=True)

similarity = query_embeddings @ doc_embeddings.T
print(similarity)
```
### Using HuggingFace Transformers
```python
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # If every sequence has a real token in the final position, the batch is left-padded.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # Right-padded: the last real token sits at index (number of real tokens - 1).
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'


task = 'Given a web search query, retrieve relevant passages that answer the query.'
queries = [
    get_detailed_instruct(task, "How can I optimize a Python function that has nested loops?"),
    get_detailed_instruct(task, 'summit define')
]
# No need to add instructions for documents
documents = [
    "Use vectorization, caching, and algorithmic improvements to reduce complexity.",
    "What is the difference between eigenvalue decomposition and SVD?",
]
input_texts = queries + documents

model_path = "geevec-ai/geevec-embeddings-1.0-lite"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
model.eval()

max_length = 4096

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt', pad_to_multiple_of=8)

with torch.no_grad():
    outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)

# First two rows are queries, the remaining rows are documents.
scores = (embeddings[:2] @ embeddings[2:].T) * 100
print(scores.tolist())
```
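To make the last-token pooling step concrete, here is a dependency-free sketch on toy data. It is an illustration only: it works on plain lists instead of tensors, and unlike `last_token_pool` above it scans each mask for its final `1`, so it does not need to detect the padding side first. The names `last_token_index` and `last_token_pool_lists` are hypothetical.

```python
def last_token_index(mask_row):
    """Index of the last position whose attention-mask value is 1."""
    for i in range(len(mask_row) - 1, -1, -1):
        if mask_row[i] == 1:
            return i
    raise ValueError("attention mask has no real tokens")

def last_token_pool_lists(hidden_states, attention_mask):
    """Pick one hidden vector per sequence: the one at its last real token."""
    return [seq[last_token_index(mask)]
            for seq, mask in zip(hidden_states, attention_mask)]

# Two sequences of length 4 with 2-dimensional toy hidden states.
hidden = [
    [[1, 1], [2, 2], [3, 3], [0, 0]],  # right-padded: last real token at index 2
    [[0, 0], [5, 5], [6, 6], [7, 7]],  # left-padded: last real token at index 3
]
mask = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
print(last_token_pool_lists(hidden, mask))  # [[3, 3], [7, 7]]
```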
## Notes
- This model uses custom files `modeling_qwen3_pseudo_moe.py`, `configuration_qwen3_pseudo_moe.py`, and `pseudo_moe_st_module.py`.
- Set `trust_remote_code=True` whether you load the model from a local path or from the Hub.
- If you do not specify a domain, the model uses `general` by default.
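Since the domain is passed as a plain string, a caller may want to validate it before encoding. The sketch below is a hypothetical caller-side helper (`resolve_domain` is not part of the model's code) that mirrors the documented default of `general`. How the model itself reacts to an unknown domain name is not stated in this card.

```python
SUPPORTED_DOMAINS = ("general", "coding", "reasoning")

def resolve_domain(domain=None):
    """Return a validated domain name, falling back to "general" when none is given."""
    if domain is None:
        return "general"
    if domain not in SUPPORTED_DOMAINS:
        raise ValueError(f"unknown domain {domain!r}; expected one of {SUPPORTED_DOMAINS}")
    return domain

print(resolve_domain())          # general
print(resolve_domain("coding"))  # coding
```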
## Evaluation
The following benchmark results summarize the performance of GeeVec-Embeddings-1.0 and GeeVec-Embeddings-1.0-Lite on the main retrieval and embedding evaluation suites.
### MMTEB(Multilingual, v2) - `general`

### MMTEB(eng, v2) - `general`

### BEIR - `general`

### CoIR - `coding`

### BRIGHT - `reasoning`
