| tags: | |
| - feature-extraction | |
| - sentence-similarity | |
| - sentence-transformers | |
| - transformers | |
| license: apache-2.0 | |
| <div align="center"> | |
| <h1> GeeVec-Embeddings-1.0-Lite </h1> | |
| </div> | |
| **GeeVec-Embeddings-1.0-Lite** is a lightweight domain-adaptive text embedding model, with only **0.35B activated parameters**, built on top of a Qwen3-style base model using a PseudoMoE architecture. It is **optimized for retrieval tasks** and supports **domain routing** for improved specialization: | |
| - `general`: the default route, suitable for general-purpose multilingual retrieval. | |
| - `coding`: specialized for code-related retrieval, including programming concepts, APIs, and technical documentation. | |
| - `reasoning`: specialized for tasks that require deeper semantic understanding, multi-step inference, and complex query matching. | |
| Despite its compact size, GeeVec-Embeddings-1.0-Lite performs well across a wide range of benchmarks. On the MMTEB(Multilingual, v2) retrieval task it achieves SOTA performance among small-size (<1B) models, with an nDCG@10 score of **74.66** (as of 2026/04/02). It is also competitive on MMTEB(eng, v2), BEIR, CoIR, and BRIGHT. | |
| We also provide an API service for a larger 8B-scale model, **GeeVec-Embeddings-1.0**. Like GeeVec-Embeddings-1.0-Lite, it is optimized for retrieval tasks and supports the same three domains: `general`, `coding`, and `reasoning`. GeeVec-Embeddings-1.0 achieves SOTA performance on the MMTEB(Multilingual, v2) retrieval task, with an nDCG@10 score of **81.18** (as of 2026/04/02). For API usage, see the documentation: https://www.geevec.com/documentation. | |
| ## Introduction | |
| This repository hosts the model `geevec-embeddings-1.0-lite`. | |
| Technical highlights: | |
| - Model Type: Text Embedding | |
| - Parameters: 349M activated / 366M total | |
| - Context Length: 32,768 tokens | |
| - Embedding Dimension: up to 4096; user-defined output dimensions from 256 to 4096 are supported (recommended dimensions: 256, 512, 1024, 2048, 4096) | |
| - Domain-specific support: `general` (default), `coding`, `reasoning` | |
| - Pooling Method: last-token pooling | |
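The highlights above list configurable output dimensions but do not say how a smaller dimension is produced. A common approach for models with flexible dimensions is to keep the leading components of the full vector and re-normalize. The sketch below illustrates that approach only under the assumption that it applies to this model; `truncate_embeddings` is a hypothetical helper, and the random array is a placeholder for real model output.

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-normalize the result.

    Assumption: prefix truncation is a valid way to shrink these embeddings.
    The model card does not confirm this.
    """
    if not 256 <= dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be between 256 and {embeddings.shape[1]}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
full = rng.normal(size=(2, 4096)).astype(np.float32)  # placeholder, not model output
small = truncate_embeddings(full, 512)
print(small.shape)
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities at the reduced dimension.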
| ## Usage | |
| ### Using FlagEmbedding | |
| ```bash | |
| git clone https://github.com/FlagOpen/FlagEmbedding.git | |
| cd FlagEmbedding | |
| pip install -e . | |
| ``` | |
| ```python | |
| from FlagEmbedding import FlagAutoModel | |
| model_path = "geevec-ai/geevec-embeddings-1.0-lite" | |
| model = FlagAutoModel.from_finetuned( | |
| model_path, | |
| model_class="decoder-only-pseudo_moe", | |
| query_instruction_for_retrieval="Given a question, retrieve passages that answer the question.", | |
| query_instruction_format="Instruct: {}\nQuery: {}", | |
| domain_for_pseudo_moe="general", # general / coding / reasoning | |
| use_bf16=True, | |
| use_fp16=False, | |
| trust_remote_code=True, | |
| devices="cuda:0", # if you do not have a GPU, set this to "cpu" | |
| ) | |
| queries = [ | |
| "how much protein should a female eat", | |
| "summit define", | |
| ] | |
| documents = [ | |
| "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day.", | |
| "Definition of summit for English Language Learners: the highest point of a mountain; the highest level; a meeting between leaders.", | |
| ] | |
| query_embeddings = model.encode_queries(queries) | |
| document_embeddings = model.encode_corpus(documents) | |
| similarity = query_embeddings @ document_embeddings.T | |
| print(similarity) | |
| ``` | |
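The `similarity` matrix printed above has one row per query and one column per document. As a minimal sketch of turning such a matrix into a ranking, the hypothetical helper `rank_documents` below sorts each row by descending score. The scores are placeholder values, not real model output.

```python
import numpy as np

def rank_documents(similarity: np.ndarray, top_k: int = 1) -> list:
    """Return, for each query row, the indices of the top_k highest-scoring documents."""
    order = np.argsort(-similarity, axis=1)  # negate so the sort is descending by score
    return order[:, :top_k].tolist()

# Placeholder scores for 2 queries x 2 documents (illustrative only).
similarity = np.array([[0.71, 0.12],
                       [0.08, 0.65]])
ranking = rank_documents(similarity, top_k=2)
print(ranking)  # [[0, 1], [1, 0]]
```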
| ### Using Sentence Transformers | |
| ```python | |
| from sentence_transformers import SentenceTransformer | |
| import torch | |
| model_path = "geevec-ai/geevec-embeddings-1.0-lite" | |
| # Load with trust_remote_code=True because the model defines custom modules. | |
| model = SentenceTransformer( | |
| model_path, | |
| model_kwargs={"torch_dtype": torch.bfloat16}, | |
| trust_remote_code=True, | |
| ) | |
| queries = [ | |
| "How can I optimize a Python function that has nested loops?", | |
| "What is the difference between eigenvalue decomposition and SVD?", | |
| ] | |
| documents = [ | |
| "Use vectorization, caching, and algorithmic improvements to reduce complexity.", | |
| "Eigenvalue decomposition applies to square matrices; SVD works for any matrix.", | |
| ] | |
| # Optional domain routing: general / coding / reasoning | |
| query_embeddings = model.encode(queries, domain="coding", normalize_embeddings=True) | |
| doc_embeddings = model.encode(documents, domain="coding", normalize_embeddings=True) | |
| similarity = query_embeddings @ doc_embeddings.T | |
| print(similarity) | |
| ``` | |
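Because both `encode` calls above pass `normalize_embeddings=True`, the matrix product is a cosine similarity. The short NumPy sketch below checks that equivalence on toy vectors; the helper names are illustrative and not part of any library.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity computed directly from unnormalized vectors."""
    dots = a @ b.T
    norms = np.linalg.norm(a, axis=1)[:, None] * np.linalg.norm(b, axis=1)[None, :]
    return dots / norms

# Toy vectors standing in for query and document embeddings.
queries = np.array([[3.0, 4.0]])
documents = np.array([[4.0, 3.0], [-3.0, -4.0]])

via_normalized_dot = l2_normalize(queries) @ l2_normalize(documents).T
via_cosine = cosine_similarity(queries, documents)
print(np.allclose(via_normalized_dot, via_cosine))  # True
```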
| ### Using HuggingFace Transformers | |
| ```python | |
| import torch | |
| import torch.nn.functional as F | |
| from torch import Tensor | |
| from transformers import AutoTokenizer, AutoModel | |
| def last_token_pool(last_hidden_states: Tensor, | |
|                     attention_mask: Tensor) -> Tensor: | |
|     # With left padding, the final position holds the last real token of every sequence. | |
|     left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0]) | |
|     if left_padding: | |
|         return last_hidden_states[:, -1] | |
|     else: | |
|         # With right padding, index each sequence at its last non-padded position. | |
|         sequence_lengths = attention_mask.sum(dim=1) - 1 | |
|         batch_size = last_hidden_states.shape[0] | |
|         return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths] | |
| def get_detailed_instruct(task_description: str, query: str) -> str: | |
|     return f'Instruct: {task_description}\nQuery: {query}' | |
| task = 'Given a web search query, retrieve relevant passages that answer the query.' | |
| queries = [ | |
| get_detailed_instruct(task, "How can I optimize a Python function that has nested loops?"), | |
| get_detailed_instruct(task, 'summit define') | |
| ] | |
| # No need to add instructions for documents | |
| documents = [ | |
| "Use vectorization, caching, and algorithmic improvements to reduce complexity.", | |
| "Definition of summit for English Language Learners: the highest point of a mountain; the highest level; a meeting between leaders.", | |
| ] | |
| input_texts = queries + documents | |
| tokenizer = AutoTokenizer.from_pretrained("geevec-ai/geevec-embeddings-1.0-lite") | |
| model = AutoModel.from_pretrained("geevec-ai/geevec-embeddings-1.0-lite", trust_remote_code=True) | |
| model.eval() | |
| max_length = 4096 | |
| # Tokenize the input texts | |
| batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt', pad_to_multiple_of=8) | |
| with torch.no_grad(): | |
|     outputs = model(**batch_dict) | |
| embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask']) | |
| # normalize embeddings | |
| embeddings = F.normalize(embeddings, p=2, dim=1) | |
| scores = (embeddings[:2] @ embeddings[2:].T) * 100 | |
| print(scores.tolist()) | |
| ``` | |
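To make the two branches of `last_token_pool` easier to follow, here is a NumPy mirror of the same logic run on a toy batch. `last_token_pool_np` is an illustrative name, and the hidden states are hand-written values rather than model output.

```python
import numpy as np

def last_token_pool_np(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick the hidden state of the last real (non-padded) token in each sequence."""
    # If every sequence has a real token in the final position, the batch is left-padded.
    if attention_mask[:, -1].sum() == attention_mask.shape[0]:
        return hidden[:, -1]
    # Otherwise treat the batch as right-padded and index by sequence length.
    lengths = attention_mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), lengths]

# Toy batch: 2 sequences, 3 positions, hidden size 1.
hidden = np.array([[[0.0], [1.0], [2.0]],
                   [[10.0], [11.0], [12.0]]])
right_padded = np.array([[1, 1, 1],
                         [1, 1, 0]])  # second sequence has 2 real tokens, then padding
left_padded = np.array([[1, 1, 1],
                        [0, 1, 1]])   # second sequence has padding first

pooled_right = last_token_pool_np(hidden, right_padded)
pooled_left = last_token_pool_np(hidden, left_padded)
print(pooled_right.ravel().tolist())  # [2.0, 11.0]
print(pooled_left.ravel().tolist())   # [2.0, 12.0]
```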
| ## Notes | |
| - This model uses custom files `modeling_qwen3_pseudo_moe.py`, `configuration_qwen3_pseudo_moe.py`, and `pseudo_moe_st_module.py`. | |
| - When loading from a local path or from the Hub, set `trust_remote_code=True` so that these custom files can be loaded. | |
| - If you do not specify a domain, the model uses `general` by default. | |
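If application code accepts a domain from user input, it may help to validate it before calling the model. The sketch below mirrors the default described in the notes above. `resolve_domain` and `SUPPORTED_DOMAINS` are hypothetical names for illustration and are not part of the model's own code.

```python
from typing import Optional

# Domain names listed in this model card.
SUPPORTED_DOMAINS = ("general", "coding", "reasoning")

def resolve_domain(domain: Optional[str] = None) -> str:
    """Return a validated domain name, falling back to "general" when none is given."""
    if domain is None:
        return "general"
    normalized = domain.strip().lower()
    if normalized not in SUPPORTED_DOMAINS:
        raise ValueError(
            f"Unknown domain {domain!r}; expected one of {SUPPORTED_DOMAINS}"
        )
    return normalized

print(resolve_domain())          # general
print(resolve_domain("Coding"))  # coding
```

The returned value could then be passed as the `domain` argument shown in the Sentence Transformers example.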
| ## Evaluation | |
| The following benchmark results summarize the performance of GeeVec-Embeddings-1.0 and GeeVec-Embeddings-1.0-Lite on the main retrieval and embedding evaluation suites. | |
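The retrieval scores quoted in this card are nDCG@10 values. As a rough illustration of the metric, the sketch below computes a linear-gain nDCG@k for a single query. Benchmark toolkits may differ in details such as gain function and tie handling, so treat this as an approximation of the idea rather than the exact evaluation code behind the reported numbers.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# One query with a single relevant document, which the system ranked second.
ranked = [0, 1, 0]       # relevance labels in the order the system returned them
judged = [1, 0, 0]       # all relevance labels known for this query
score = ndcg_at_k(ranked, judged, k=10)
print(round(score, 4))   # 0.6309
```

Benchmark-level figures are typically averages of such per-query scores, reported on a 0 to 100 scale.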
| ### MMTEB(Multilingual, v2) - `general` | |
|  | |
| ### MMTEB(eng, v2) - `general` | |
|  | |
| ### BEIR - `general` | |
|  | |
| ### CoIR - `coding` | |
|  | |
| ### BRIGHT - `reasoning` | |
|  | |