---
tags:
- feature-extraction
- sentence-similarity
- sentence-transformers
- transformers
license: apache-2.0
---

<div align="center">
<h1> GeeVec-Embeddings-1.0-Lite </h1>
</div>

**GeeVec-Embeddings-1.0-Lite** is a lightweight domain-adaptive text embedding model, with only **0.35B activated parameters**, built on top of a Qwen3-style base model using a PseudoMoE architecture. It is **optimized for retrieval tasks** and supports **domain routing** for improved specialization:

- `general`: the default route, suitable for general-purpose multilingual retrieval.
- `coding`: specialized for code-related retrieval, including programming concepts, APIs, and technical documentation.
- `reasoning`: specialized for tasks that require deeper semantic understanding, multi-step inference, and complex query matching.

Despite its compact size, GeeVec-Embeddings-1.0-Lite performs strongly across a wide range of benchmarks. Among small (<1B-parameter) models, it achieves SOTA performance on the MMTEB(Multilingual, v2) retrieval task, with an nDCG@10 score of **74.66** (as of 2026/04/02). It is also competitive on MMTEB(eng, v2), BEIR, CoIR, and BRIGHT.

We also provide an API service for a larger 8B-scale model, **GeeVec-Embeddings-1.0**. Like GeeVec-Embeddings-1.0-Lite, it is optimized for retrieval tasks and supports the same three domains: `general`, `coding`, and `reasoning`. GeeVec-Embeddings-1.0 achieves SOTA performance on the MMTEB(Multilingual, v2) retrieval task, with an nDCG@10 score of **81.18** (as of 2026/04/02). API usage documentation: https://www.geevec.com/documentation.



## Introduction

This repository hosts the model `geevec-embeddings-1.0-lite`.

Technical highlights:
- Model Type: Text Embedding
- Parameters: 349M activated / 366M total
- Context Length: 32,768
- Embedding dimension: up to 4096; user-defined output dimensions from 256 to 4096 are supported (recommended: 256, 512, 1024, 2048, 4096)
- Domain-specific support: `general` (default), `coding`, `reasoning`
- Pooling Method: last-token pooling
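
The user-defined output dimensions listed above can be illustrated with a small sketch. It assumes, and this card does not confirm it, that a shorter embedding is obtained by keeping the leading components of the full vector and re-normalizing to unit length. `truncate_embedding` is a hypothetical helper written for this illustration, not part of any library, and the input vector is a toy value rather than model output.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Assumption: reduced dimensions come from prefix truncation. Verify this
    against the model's own documentation before relying on it.
    """
    if not 1 <= dim <= len(vector):
        raise ValueError(f"dim must be in [1, {len(vector)}], got {dim}")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a full embedding.
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_embedding(full, 2)
print(short)
```

If the loading library already exposes an output-dimension option, prefer that over manual truncation.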

## Usage

### Using FlagEmbedding

```bash
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install -e .
```

```python
from FlagEmbedding import FlagAutoModel

model_path = "geevec-ai/geevec-embeddings-1.0-lite"

model = FlagAutoModel.from_finetuned(
    model_path,
    model_class="decoder-only-pseudo_moe",
    query_instruction_for_retrieval="Given a question, retrieve passages that answer the question.",
    query_instruction_format="Instruct: {}\nQuery: {}",
    domain_for_pseudo_moe="general",  # general / coding / reasoning
    use_bf16=True,
    use_fp16=False,
    trust_remote_code=True,
    devices="cuda:0",  # if you do not have a GPU, set this to "cpu"
)

queries = [
    "how much protein should a female eat",
    "summit define",
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day.",
    "Definition of summit for English Language Learners: the highest point of a mountain; the highest level; a meeting between leaders.",
]

query_embeddings = model.encode_queries(queries)
document_embeddings = model.encode_corpus(documents)

similarity = query_embeddings @ document_embeddings.T
print(similarity)
```

### Using Sentence Transformers

```python
from sentence_transformers import SentenceTransformer
import torch

model_path = "geevec-ai/geevec-embeddings-1.0-lite"

# Load with trust_remote_code=True because the model defines custom modules.
model = SentenceTransformer(
    model_path,
    model_kwargs={"torch_dtype": torch.bfloat16},
    trust_remote_code=True,
)

queries = [
    "How can I optimize a Python function that has nested loops?",
    "What is the difference between eigenvalue decomposition and SVD?",
]

documents = [
    "Use vectorization, caching, and algorithmic improvements to reduce complexity.",
    "Eigenvalue decomposition applies to square matrices; SVD works for any matrix.",
]

# Optional domain routing: general / coding / reasoning
query_embeddings = model.encode(queries, domain="coding", normalize_embeddings=True)
doc_embeddings = model.encode(documents, domain="coding", normalize_embeddings=True)

similarity = query_embeddings @ doc_embeddings.T
print(similarity)
```
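
The similarity matrix printed above has one row per query and one column per document. A minimal sketch of turning such a matrix into a ranked list per query follows. `rank_documents` is a hypothetical helper, and the scores are made-up values for illustration, not model output.

```python
def rank_documents(similarity, top_k=None):
    """For each query row, return (doc_index, score) pairs sorted by descending score."""
    rankings = []
    for row in similarity:
        ordered = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        rankings.append(ordered[:top_k] if top_k is not None else ordered)
    return rankings


# Illustrative scores only (2 queries x 3 documents), not produced by the model.
similarity = [
    [0.12, 0.81, 0.40],
    [0.67, 0.05, 0.33],
]

top = rank_documents(similarity, top_k=2)
for query_index, hits in enumerate(top):
    print(query_index, hits)
```

With real model output, a NumPy array or tensor can be converted with `.tolist()` before being passed in.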

### Using HuggingFace Transformers

```python
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # If every sequence has a real token in the final position, the batch is left-padded.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # Right-padded: index each sequence at its last non-padding token.
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'


task = 'Given a web search query, retrieve relevant passages that answer the query.'
queries = [
    get_detailed_instruct(task, "How can I optimize a Python function that has nested loops?"),
    get_detailed_instruct(task, "What is the difference between eigenvalue decomposition and SVD?"),
]
# No need to add instructions for documents
documents = [
    "Use vectorization, caching, and algorithmic improvements to reduce complexity.",
    "Eigenvalue decomposition applies to square matrices; SVD works for any matrix.",
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained("geevec-ai/geevec-embeddings-1.0-lite")
model = AutoModel.from_pretrained("geevec-ai/geevec-embeddings-1.0-lite", trust_remote_code=True)
model.eval()

max_length = 4096
# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt', pad_to_multiple_of=8)

with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
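
# L2-normalize so that the dot product below equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
# Rows are queries, columns are documents; scores are scaled by 100
scores = (embeddings[:2] @ embeddings[2:].T) * 100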
print(scores.tolist())
```

## Notes

- This model uses custom files `modeling_qwen3_pseudo_moe.py`, `configuration_qwen3_pseudo_moe.py`, and `pseudo_moe_st_module.py`.
- When loading from a local path or the Hub, set `trust_remote_code=True`.
- If you do not specify a domain, the model uses `general` by default.
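
The default-route behaviour described above can be mirrored on the client side before calling the model. This is a minimal sketch, assuming the three route names listed in this card. `resolve_domain` is a hypothetical helper, and how the model itself treats an unknown domain name is not documented here.

```python
VALID_DOMAINS = ("general", "coding", "reasoning")


def resolve_domain(domain=None):
    """Return a valid route name, falling back to "general" when none is given."""
    if domain is None:
        return "general"
    normalized = domain.strip().lower()
    if normalized not in VALID_DOMAINS:
        raise ValueError(
            f"unknown domain {domain!r}; expected one of {VALID_DOMAINS}"
        )
    return normalized


print(resolve_domain())            # no domain given, falls back to the default route
print(resolve_domain(" Coding "))  # whitespace and case are normalized
```

Failing early on a misspelled route name avoids silently embedding queries and documents with different routes.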

## Evaluation

The following benchmark results summarize the performance of GeeVec-Embeddings-1.0 and GeeVec-Embeddings-1.0-Lite on the main retrieval and embedding evaluation suites.

### MMTEB(Multilingual, v2) - `general`

![MMTEB Multilingual](imgs/MMTEB_MULTILINGUAL_v2.png)


### MMTEB(eng, v2) - `general`

![MMTEB English](imgs/MMTEB_ENG_V2.png)


### BEIR - `general`

![BEIR](imgs/BEIR.png)

### CoIR - `coding`

![COIR](imgs/COIR.png)

### BRIGHT - `reasoning`

![BRIGHT](imgs/BRIGHT.png)