Overview

Nomnom is the long-context local embedding model option used in Chonky. It is derived from Nomic's nomic-embed-text-v1.5, a text embedding model designed for strong retrieval quality, with support for long context windows, task-style instruction prefixes, and RAG and document-indexing scenarios where instruction-prefixed inputs are desirable.

The upstream Nomic model family is built around embedding tasks that benefit from explicit input prefixes such as search_query:. That makes it a particularly good candidate for retrieval pipelines in which queries and the stored corpus are encoded with deliberate role distinctions.

A chonky chonk

Use Nomnom when you want:

  • strong local semantic retrieval
  • long-context document embedding support
  • a model designed around explicit task prefixes
  • a local embedder well suited for RAG-style search workflows

Nomnom is a strong fit for:

  • semantic indexing of long document chunks
  • retrieval-augmented generation workflows
  • embedding queries separately from stored passage content
  • local experimentation with prefix-aware embedding strategies

Base Model Lineage

Nomnom is derived from:

  • nomic-ai/nomic-embed-text-v1.5

Key characteristics of the model include:

  • long-context support in the original transformer implementation
  • strong retrieval-oriented design
  • support for task instruction prefixes such as search_query: and other prefixed task modes
  • support for reduced embedding sizes in the upstream family through Matryoshka-style representation behavior
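
The Matryoshka-style behavior mentioned above can be sketched in plain NumPy. This is an illustrative sketch, not Nomic's implementation: truncate_embedding is a hypothetical helper name, and the upstream recipe additionally applies a layer norm before truncating (as the Sentence Transformers example later in this document shows).

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka-style truncation: keep the first `dim` dimensions,
    then re-normalize so the result is unit length again."""
    out = vec[:dim].astype(float)
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

full = np.random.default_rng(0).normal(size=768)
small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```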

Important Prefix Behavior

The upstream Nomic model family expects task instruction prefixes at the beginning of text strings for best results. In practice, that means inputs may need prefixes such as:

  • search_query: for user queries
  • task-specific prefixes for other workflows depending on how you use the model

If Chonky later adds model-aware prefix handling, Nomnom is one of the models best positioned to benefit from it. Loading the model with sentence-transformers looks like this:

  from sentence_transformers import SentenceTransformer
  
  model = SentenceTransformer("chonks/nomnom-embed-text-v1.5", trust_remote_code=True)
  sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
  embeddings = model.encode(sentences)
  print(embeddings)

GGUF-Specific Context Note

The GGUF release is suitable for local llama.cpp-style inference. However, the long-context behavior available in the original transformer implementation may require additional context-extension settings (such as rotary scaling) in llama.cpp-based runtimes to fully match upstream context-length expectations.
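
As a sketch of what local GGUF inference could look like, the snippet below uses the llama-cpp-python bindings. The path, the n_ctx value, and the embed_document helper are illustrative assumptions, not part of Chonky:

```python
from pathlib import Path

# Illustrative path; adjust to wherever the GGUF file actually lives.
MODEL_PATH = Path("chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf")

def embed_document(text: str, n_ctx: int = 8192) -> list[float]:
    """Embed one prefixed document string via llama.cpp.

    llama-cpp-python is imported lazily because it is an optional
    local dependency. Long-context use may additionally need RoPE
    scaling settings (e.g. rope_freq_scale), depending on the runtime.
    """
    from llama_cpp import Llama
    llm = Llama(model_path=str(MODEL_PATH), embedding=True, n_ctx=n_ctx)
    return llm.embed(f"search_document: {text}")
```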

Local File Layout Expected by Chonky

Chonky expects Nomnom at:

chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf
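
A minimal startup check for this layout can fail fast with a clear message when the file is missing; the resolve_nomnom name and the chonky_root argument below are hypothetical, not Chonky API:

```python
from pathlib import Path

# Layout Chonky expects, relative to its root directory.
EXPECTED = Path("chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf")

def resolve_nomnom(chonky_root: str = ".") -> Path:
    """Return the full path to the Nomnom weights, or raise early."""
    path = Path(chonky_root) / EXPECTED
    if not path.is_file():
        raise FileNotFoundError(f"Nomnom weights not found at {path}")
    return path
```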

Features

  • local GGUF-based embedding model
  • no cloud API dependency
  • retrieval-oriented embedding behavior
  • especially attractive for query/document search pipelines
  • supports local embedding generation for chunked corpora
  • aligns well with Chonky's semantic search and vector storage workflows

Recommended Chonky Usage

Nomnom is recommended when:

  • you want a local retrieval model with strong search-oriented lineage
  • you plan to distinguish query strings from indexed corpus text
  • you want a local embedding path aligned to modern RAG conventions
  • you value long-context model lineage for document-heavy tasks

Usage

Important: every input text must begin with a task instruction prefix that tells the model which task is being performed.

For example, if you are implementing a RAG application, you embed your documents as search_document: <text here> and embed your user queries as search_query: <text here>.
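
The query/document split can be captured in a small helper. The with_prefix name is hypothetical, not a Chonky or Nomic API; the four prefixes are the ones documented in the sections below:

```python
# Task prefixes documented for the upstream Nomic model family.
PREFIXES = ("search_document", "search_query", "clustering", "classification")

def with_prefix(task: str, text: str) -> str:
    """Prepend a Nomic-style task instruction prefix to an input string."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task prefix: {task}")
    return f"{task}: {text}"

print(with_prefix("search_query", "Who is Laurens van Der Maaten?"))
# search_query: Who is Laurens van Der Maaten?
```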

Task instruction prefixes

search_document

Purpose: embed texts as documents from a dataset

This prefix is used for embedding texts as documents, for example as documents for a RAG index.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
embeddings = model.encode(sentences)
print(embeddings)

search_query

Purpose: embed texts as questions to answer

This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: Who is Laurens van Der Maaten?']
embeddings = model.encode(sentences)
print(embeddings)

clustering

Purpose: embed texts to group them into clusters

This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['clustering: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)

classification

Purpose: embed texts to classify them

This prefix is used for embedding texts into vectors that will be used as features for a classification model.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['classification: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)

Sentence Transformers

import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
matryoshka_dim = 512
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences, convert_to_tensor=True)
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)

Transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True, safe_serialization=True)
model.eval()
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
matryoshka_dim = 512
with torch.no_grad():
    model_output = model(**encoded_input)
embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)

The model natively supports scaling of the sequence length past 2048 tokens. To do so, apply the following changes when loading the tokenizer and model:

- tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
- model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True)
+ model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True, rotary_scaling_factor=2)

Transformers.js

import { pipeline, layer_norm } from '@huggingface/transformers';
// Create a feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'nomic-ai/nomic-embed-text-v1.5');
// Define sentences
const texts = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?'];
// Compute sentence embeddings
let embeddings = await extractor(texts, { pooling: 'mean' });
console.log(embeddings); // Tensor of shape [2, 768]
const matryoshka_dim = 512;
embeddings = layer_norm(embeddings, [embeddings.dims[1]])
    .slice(null, [0, matryoshka_dim])
    .normalize(2, -1);
console.log(embeddings.tolist());

Nomic API

The easiest way to use Nomic Embed is through the Nomic Embedding API.

Generating embeddings with the nomic Python client is as easy as

from nomic import embed
output = embed.text(
    texts=['Nomic Embedding API', '#keepAIOpen'],
    model='nomic-embed-text-v1.5',
    task_type='search_document',
    dimensionality=256,
)
print(output)
Model Details

  • format: GGUF (4-bit Q4_K_M quantization)
  • model size: 0.1B params
  • architecture: nomic-bert
