F2LLM-v2-1.7B-Preview

F2LLM-v2-1.7B-Preview is a multilingual embedding model trained from Qwen3-1.7B on a corpus of 27 million samples covering more than 100 natural and programming languages. It is a "preview" version: it was trained without instructions and is intended as a foundation for downstream embedding tasks and further fine-tuning.

F2LLM-v2 is fully open. We release base models in 5 sizes, instruct models in 8 sizes, the training data, the training code, and intermediate checkpoints. The three smallest instruct models (80M, 160M, and 330M) are pruned and trained from the 0.6B base model, so they have no base counterparts in the table below.

Model	Base	Instruct
80M		🤗F2LLM-v2-80M
160M		🤗F2LLM-v2-160M
330M		🤗F2LLM-v2-330M
0.6B	🤗F2LLM-v2-0.6B-Preview	🤗F2LLM-v2-0.6B
1.7B	🤗F2LLM-v2-1.7B-Preview	🤗F2LLM-v2-1.7B
4B	🤗F2LLM-v2-4B-Preview	🤗F2LLM-v2-4B
8B	🤗F2LLM-v2-8B-Preview	🤗F2LLM-v2-8B
14B	🤗F2LLM-v2-14B-Preview	🤗F2LLM-v2-14B

Usage

With Sentence Transformers

To encode text with the Sentence Transformers library:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codefuse-ai/F2LLM-v2-1.7B-Preview", device="cuda:0", model_kwargs={"torch_dtype": "bfloat16"})

# A sample query and candidate documents
query = "What is F2LLM used for?"
documents = [
    'We present F2LLM, a family of fully open embedding LLMs that achieve a strong balance between model size, training data, and embedding performance.',
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.',
    'F2LLM 是 CodeFuse 开源的系列嵌入模型。',
    'F2LLM — это модель вычисления встраивания текста, которую можно использовать для различных задач НЛП, таких как поиск информации, семантический поиск и классификация текста.'
]

# Encode the query and documents
query_embedding = model.encode(query)
document_embeddings = model.encode(documents)
print(query_embedding.shape, document_embeddings.shape)
# (2048,) (4, 2048)

# Compute cosine similarity between the query and documents
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
# tensor([[0.6016, 0.7691, 0.6831, 0.8017]])
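
A common next step is to rank the documents by their similarity to the query. The following is a minimal sketch in plain Python, not part of the F2LLM release: `rank_documents` is a hypothetical helper name, the scores are copied from the printed output above, and the short labels stand in for the full document strings. With a live model, something like `similarity[0].tolist()` could supply the scores instead.

```python
def rank_documents(scores, documents, top_k=None):
    """Return (document, score) pairs sorted by descending similarity.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(scores) != len(documents):
        raise ValueError("scores and documents must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Scores copied from the similarity tensor printed above (assumption: the
# order matches the `documents` list in the example).
scores = [0.6016, 0.7691, 0.6831, 0.8017]
labels = ["doc 0", "doc 1", "doc 2", "doc 3"]

for label, score in rank_documents(scores, labels, top_k=2):
    print(f"{label}: {score:.4f}")
```

Sorting pairs rather than indices keeps each score attached to its document, which makes the result easy to pass on to a reranker or to display directly.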

With Transformers

Alternatively, use the Transformers library directly:

from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F


model_path = "codefuse-ai/F2LLM-v2-1.7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map={'': 0})

query = "What is F2LLM used for?"

documents = [
    'We present F2LLM, a family of fully open embedding LLMs that achieve a strong balance between model size, training data, and embedding performance.',
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.',
    'F2LLM 是 CodeFuse 开源的系列嵌入模型。',
    'F2LLM — это модель вычисления встраивания текста, которую можно использовать для различных задач НЛП, таких как поиск информации, семантический поиск и классификация текста.'
]

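def encode(sentences):
    batch_size = len(sentences)
    # The tokenizer automatically appends the EOS token to each sequence
    tokenized_inputs = tokenizer(sentences, padding=True, return_tensors='pt').to(model.device)
    last_hidden_state = model(**tokenized_inputs).last_hidden_state
    # Index of the last non-padding token (the EOS token) in each sequence;
    # this arithmetic assumes the tokenizer pads on the right
    eos_positions = tokenized_inputs.attention_mask.sum(dim=1) - 1
    # Use the hidden state at the EOS position as the sentence embedding
    embeddings = last_hidden_state[torch.arange(batch_size, device=model.device), eos_positions]
    # L2-normalize so that dot products equal cosine similarities
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings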

# Encode the query and documents
query_embedding = encode([query])
document_embeddings = encode(documents)
print(query_embedding.shape, document_embeddings.shape)
# torch.Size([1, 2048]) torch.Size([4, 2048])

# Compute cosine similarity between the query and documents
similarity = query_embedding @ document_embeddings.T
print(similarity)
# tensor([[0.6016, 0.7695, 0.6836, 0.8008]], device='cuda:0',
#        dtype=torch.bfloat16, grad_fn=<MmBackward0>)
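
The `encode` function above finds the EOS token with `attention_mask.sum(dim=1) - 1`, which points at the last real token only when padding is on the right. The dependency-free sketch below illustrates that index arithmetic on toy 0/1 masks; `last_token_indices` is a hypothetical helper written for this illustration, not part of Transformers or the F2LLM code.

```python
def last_token_indices(attention_mask, padding_side="right"):
    """Index of the last real (non-padding) token in each row of a 0/1 mask.

    Hypothetical helper for illustration only.
    """
    indices = []
    for row in attention_mask:
        if padding_side == "right":
            # Real tokens come first, so the last one sits at (count - 1).
            indices.append(sum(row) - 1)
        elif padding_side == "left":
            # Padding comes first, so the last real token is the final position.
            indices.append(len(row) - 1)
        else:
            raise ValueError(f"unknown padding_side: {padding_side!r}")
    return indices


# Toy batch: two sequences padded to length 5 (1 = real token, 0 = padding).
right_padded = [[1, 1, 1, 0, 0],
                [1, 1, 1, 1, 1]]
left_padded = [[0, 0, 1, 1, 1],
               [1, 1, 1, 1, 1]]

print(last_token_indices(right_padded, "right"))  # [2, 4]
print(last_token_indices(left_padded, "left"))    # [4, 4]
# Applying the right-padding formula to a left-padded mask picks position 2
# in the first row, which is the first real token rather than the last:
print(last_token_indices(left_padded, "right"))   # [2, 4]
```

If the embeddings look wrong for batched inputs, checking `tokenizer.padding_side` is a reasonable first step, since the pooling index depends on it.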

Intermediate Checkpoints

To facilitate future research, we release intermediate checkpoints in the intermediate_checkpoints branch.

Citation

@misc{f2llm-v2,
      title={F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World}, 
      author={Ziyin Zhang and Zihan Liao and Hang Yu and Peng Di and Rui Wang},
      year={2026},
      eprint={2603.19223},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.19223}, 
}