This Model2Vec model was created with Tokenlearn, using nomic-embed-text-v2-moe as the base model.

The output dimension is 384.

The evaluation results in the model card were obtained with this distilled model, not the original base model.

This model was trained in streaming mode over large precomputed feature shards with incremental PCA (384d), vocabulary quantization capped at 32k effective tokens, and fine-tuning optimizations for large-scale data.
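
The streaming dimensionality reduction described above can be illustrated with a small sketch. This is not the actual training code: it assumes scikit-learn's `IncrementalPCA`, uses random toy data in place of the real feature shards, and reduces to 8 dimensions instead of 384 so that it runs quickly. The `iter_shards` helper is hypothetical and stands in for reading precomputed shards from disk.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def iter_shards(n_shards=4, rows=64, dim=32, seed=0):
    """Hypothetical stand-in for loading precomputed feature shards one at a time."""
    rng = np.random.default_rng(seed)
    for _ in range(n_shards):
        yield rng.normal(size=(rows, dim)).astype(np.float32)


# Toy target dimension; the model described here uses 384.
ipca = IncrementalPCA(n_components=8)

# First pass: update the PCA estimate shard by shard, never holding all data in memory.
for shard in iter_shards():
    ipca.partial_fit(shard)

# Second pass: project each shard into the reduced space.
reduced = np.concatenate([ipca.transform(shard) for shard in iter_shards()])
print(reduced.shape)
```

Each shard passed to `partial_fit` should contain at least as many rows as `n_components`, which is why the toy shards here have 64 rows.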

This model performs better than its predecessor, cnmoro/static-nomic-384-pten.

Usage

Load this model using the model2vec library:

from model2vec import StaticModel

model = StaticModel.from_pretrained("cnmoro/static-nomic-384-pten-v2")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])

Or using the sentence-transformers library:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('cnmoro/static-nomic-384-pten-v2')

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
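
Once embeddings are computed, a common next step is to compare them by cosine similarity. The sketch below is illustrative only: the `cosine_similarity` helper is not part of either library, and the short toy vectors stand in for rows of the array returned by `model.encode(...)`, so the example runs without downloading the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists or array rows)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as having no similarity
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 384-dimensional embeddings.
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

scores = [cosine_similarity(query, d) for d in docs]
best = max(range(len(docs)), key=scores.__getitem__)  # index of the closest document
print(best, scores)
```

With real embeddings, `query` and `docs` would be replaced by rows of the arrays returned by `model.encode`.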