Overview
Nomnom is the long-context local embedding model option used in Chonky. It is derived from Nomic's
nomic-embed-text-v1.5, a text embedding model designed for strong retrieval quality
with long context windows, and for RAG and document-indexing scenarios
where instruction-prefixed (task-style) embedding inputs are desirable.
The upstream Nomic model family is built around embedding tasks that benefit from
explicit input prefixes such as search_query: and other task-specific input conventions.
That makes it a particularly good candidate for retrieval pipelines where queries and the
stored corpus should be encoded with deliberate role distinctions.
A chonky chonk
Use Nomnom when you want:
- strong local semantic retrieval
- long-context document embedding support
- a model designed around explicit task prefixes
- a local embedder well suited for RAG-style search workflows
Nomnom is a strong fit for:
- semantic indexing of long document chunks
- retrieval-augmented generation workflows
- embedding queries separately from stored passage content
- local experimentation with prefix-aware embedding strategies
Base Model Lineage
Nomnom is derived from:
nomic-ai/nomic-embed-text-v1.5
Key characteristics of the model include:
- long-context support in the original transformer implementation
- strong retrieval-oriented design
- support for task instruction prefixes such as search_query: and other prefixed task modes
- support for reduced embedding sizes in the upstream family through Matryoshka-style representation behavior
Important Prefix Behavior
The upstream Nomic model family expects task instruction prefixes at the beginning of text strings for best results. In practice, that means inputs may need prefixes such as:
- search_query: for user queries
- task-specific prefixes for other workflows, depending on how you use the model
If Chonky later adds model-aware prefix handling, Nomnom is one of the clearest models that can benefit from it.
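Such model-aware prefix handling could look like the sketch below. The helper name, and the idea that Chonky would route through it, are illustrative assumptions; only the prefix strings themselves come from the upstream Nomic conventions.

```python
# Hypothetical sketch of model-aware prefix handling for Nomnom.
# `apply_task_prefix` is NOT an existing Chonky API; the prefix strings
# follow the upstream nomic-embed task conventions.
TASK_PREFIXES = {
    "search_document": "search_document: ",
    "search_query": "search_query: ",
    "clustering": "clustering: ",
    "classification": "classification: ",
}

def apply_task_prefix(text: str, task: str) -> str:
    """Prepend the task instruction prefix, unless it is already present."""
    prefix = TASK_PREFIXES[task]
    return text if text.startswith(prefix) else prefix + text

doc = apply_task_prefix("TSNE is a dimensionality reduction algorithm", "search_document")
query = apply_task_prefix("What is TSNE?", "search_query")
```

Callers can then pass the prefixed strings to whatever encoder is in use, keeping the query/document role distinction in one place.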
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("chonks/nomnom-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
embeddings = model.encode(sentences)
print(embeddings)
```
GGUF-Specific Context Note
The GGUF release is suitable for local llama.cpp-style inference, but the long-context behavior available in the original transformer implementation may require additional context extension settings in llama.cpp-based runtimes to fully match upstream context length expectations.
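With llama-cpp-python, for example, the relevant knobs are the context size and RoPE frequency scaling. The settings below are assumptions for illustration only: the right values depend on how the GGUF metadata was exported, and they have not been verified against this release.

```python
# Illustrative llama-cpp-python settings for long-context embedding with the
# GGUF build. All values here are assumptions, not verified defaults.
gguf_settings = dict(
    model_path="chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf",
    embedding=True,       # run the model in embedding mode
    n_ctx=8192,           # extend the context window past the 2048 default
    rope_freq_scale=0.5,  # RoPE scaling for context extension (assumed value)
)

# from llama_cpp import Llama
# llm = Llama(**gguf_settings)
# vec = llm.embed("search_document: some long document text")
```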
Local File Layout Expected by Chonky
Chonky expects Nomnom at:
chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf
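A quick way to verify this layout before wiring Nomnom into a pipeline is a plain filesystem check; no Chonky-specific API is assumed here.

```python
from pathlib import Path

# Expected location of the Nomnom GGUF file, relative to the working directory.
NOMNOM_PATH = Path("chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf")

def check_nomnom_layout(root: Path = Path(".")) -> bool:
    """Return True if the GGUF file is where Chonky expects it."""
    candidate = root / NOMNOM_PATH
    return candidate.is_file() and candidate.suffix == ".gguf"
```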
Features
- local GGUF-based embedding model
- no cloud API dependency
- retrieval-oriented embedding behavior
- especially attractive for query/document search pipelines
- supports local embedding generation for chunked corpora
- aligns well with Chonky's semantic search and vector storage workflows
Recommended Chonky Usage
Nomnom is recommended when:
- you want a local retrieval model with strong search-oriented lineage
- you plan to distinguish query strings from indexed corpus text
- you want a local embedding path aligned to modern RAG conventions
- you value long-context model lineage for document-heavy tasks
Usage
Important: the text prompt must include a task instruction prefix, instructing the model which task is being performed.
For example, if you are implementing a RAG application, you embed your documents as search_document: <text here> and embed your user queries as search_query: <text here>.
Task instruction prefixes
search_document
Purpose: embed texts as documents from a dataset
This prefix is used for embedding texts as documents, for example as documents for a RAG index.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
embeddings = model.encode(sentences)
print(embeddings)
```
search_query
Purpose: embed texts as questions to answer
This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: Who is Laurens van Der Maaten?']
embeddings = model.encode(sentences)
print(embeddings)
```
clustering
Purpose: embed texts to group them into clusters
This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['clustering: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
```
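As a concrete illustration of the deduplication use case, cosine similarity over clustering-prefixed embeddings can flag near-duplicates. The vectors and threshold below are synthetic stand-ins for real model output.

```python
import numpy as np

# Synthetic stand-ins for `clustering:`-prefixed embeddings; real vectors
# would come from model.encode(...).
embs = np.array([
    [1.0, 0.0, 0.0],    # doc A
    [0.99, 0.01, 0.0],  # near-duplicate of doc A
    [0.0, 1.0, 0.0],    # unrelated doc B
])

def near_duplicates(vectors: np.ndarray, threshold: float = 0.95):
    """Return index pairs whose cosine similarity exceeds the threshold."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

print(near_duplicates(embs))  # only the first two rows are near-duplicates
```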
classification
Purpose: embed texts to classify them
This prefix is used for embedding texts into vectors that will be used as features for a classification model.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['classification: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
```
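A minimal sketch of using such embeddings as classification features, with a nearest-centroid rule and synthetic vectors standing in for real model output:

```python
import numpy as np

# Synthetic stand-ins for `classification:`-prefixed embeddings.
train_X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_y = np.array([0, 0, 1, 1])

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label of the class centroid closest to x."""
    labels = np.unique(train_y)
    centroids = np.array([train_X[train_y == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(centroids - x, axis=1)
    return int(labels[np.argmin(dists)])
```

Any downstream classifier works here; nearest-centroid is used only because it keeps the example dependency-free.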
Sentence Transformers
```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

matryoshka_dim = 512

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences, convert_to_tensor=True)
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
```
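The layer-norm, truncate, re-normalize recipe above can be checked on plain arrays. This NumPy sketch mirrors those three steps (without the learned layer-norm scale/bias) so the effect is visible without loading the model.

```python
import numpy as np

def matryoshka_truncate(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Layer-norm each row, keep the first `dim` components, then L2-normalize."""
    mu = embeddings.mean(axis=1, keepdims=True)
    sigma = embeddings.std(axis=1, keepdims=True)
    normed = (embeddings - mu) / (sigma + 1e-5)  # layer norm, no learned params
    truncated = normed[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

vecs = np.random.default_rng(0).normal(size=(2, 768))
small = matryoshka_truncate(vecs, 512)
print(small.shape)  # each row is unit length after re-normalization
```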
Transformers
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True, safe_serialization=True)
model.eval()

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

matryoshka_dim = 512

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
```
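The mean_pooling helper above averages token vectors while ignoring padding positions. This NumPy equivalent, run on synthetic inputs, shows the same masked average in isolation.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over positions where the attention mask is 1."""
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # zero out padding tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # avoid divide-by-zero
    return summed / counts

tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])  # one padded sequence
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # averages only the two unmasked tokens
```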
The model natively supports scaling of the sequence length past 2048 tokens. To do so:
```diff
- tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

- model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True)
+ model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True, rotary_scaling_factor=2)
```
Transformers.js
```javascript
import { pipeline, layer_norm } from '@huggingface/transformers';

// Create a feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'nomic-ai/nomic-embed-text-v1.5');

// Define sentences
const texts = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?'];

// Compute sentence embeddings
let embeddings = await extractor(texts, { pooling: 'mean' });
console.log(embeddings); // Tensor of shape [2, 768]

const matryoshka_dim = 512;
embeddings = layer_norm(embeddings, [embeddings.dims[1]])
  .slice(null, [0, matryoshka_dim])
  .normalize(2, -1);
console.log(embeddings.tolist());
```
Nomic API
The easiest way to use Nomic Embed is through the Nomic Embedding API.
Generating embeddings with the nomic Python client is as easy as:
```python
from nomic import embed

output = embed.text(
    texts=['Nomic Embedding API', '#keepAIOpen'],
    model='nomic-embed-text-v1.5',
    task_type='search_document',
    dimensionality=256,
)
print(output)
```