Model Overview

Model Type: Text Embedding
Number of Parameters: 4B
Context Length: 32k
Adapted from Qwen/Qwen3-4B
Pooling: Last token
For more details, including benchmark evaluations, hardware requirements, inference performance, and training data, please refer to our GitHub.

Usage

Sentence Transformers Usage

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("ICT-TIME-and-Querit/BOOM_4B_v1")

# We recommend enabling flash_attention_2 for faster inference and lower memory usage,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "ICT-TIME-and-Querit/BOOM_4B_v1",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.4402, 0.0335],
#         [0.0943, 0.3470]])
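
The similarity matrix can be used directly for ranking. As a small illustrative follow-up (not part of the original snippet), the lines below pick the highest-scoring document for each query from the `similarity` tensor computed above:

# Illustrative only: rank the documents for each query by cosine similarity.
# Reuses `queries`, `documents`, and `similarity` from the snippet above.
best_doc_indices = similarity.argmax(dim=1).tolist()
for query, doc_idx in zip(queries, best_doc_indices):
    print(f"{query} -> {documents[doc_idx]}")
# What is the capital of China? -> The capital of China is Beijing.
# Explain gravity -> Gravity is a force that attracts two bodies towards each other. ...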

Transformers Usage

# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained("ICT-TIME-and-Querit/BOOM_4B_v1", padding_side='left')
model = AutoModel.from_pretrained("ICT-TIME-and-Querit/BOOM_4B_v1")

# We recommend enabling flash_attention_2 for faster inference and lower memory usage.
# model = AutoModel.from_pretrained("ICT-TIME-and-Querit/BOOM_4B_v1", attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.4401702880859375, 0.03349032998085022], [0.09427911043167114, 0.34699785709381104]]
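
If you need a ranked list rather than the raw score matrix, the scores can be sorted per query. A minimal illustrative sketch (not part of the original snippet) that reuses `scores`, `queries`, and `documents` from above:

# Illustrative only: turn the score matrix into a ranked list of documents per query.
top_k = torch.topk(scores, k=len(documents), dim=1)
for q_idx, query in enumerate(queries):
    print(query)
    for score, d_idx in zip(top_k.values[q_idx].tolist(), top_k.indices[q_idx].tolist()):
        print(f"  {score:.4f}  {documents[d_idx]}")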

Evaluation

MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| gemini-embedding-exp-03-07 | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | 29.16 | 83.63 | 65.58 | 67.71 | 79.40 |
| Qwen3-Embedding-4B | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | 11.56 | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| BOOM_4B_v1 (2.8M training data) | 4B | 63.52 | 54.81 | 69.25 | 66.94 | 52.80 | -0.55 | 25.41 | 80.92 | 61.88 | 62.21 | 74.40 |

Citation

If you find our work helpful, please consider citing it.

@article{zhang2026bagging,
  title={Bagging-Based Model Merging for Robust General Text Embeddings},
  author={Zhang, Hengran and Bi, Keping and Guo, Jiafeng and Zhang, Jiaming and Yang, Wenbo and Shi, Daiting and Cheng, Xueqi},
  journal={arXiv preprint arXiv:2602.05787},
  year={2026}
}