Overview
Nomi is a long-context local embedding model used in Chonky. It is derived from Nomic's
nomic-embed-text-v1.5, a text embedding model designed for strong retrieval quality,
with support for long context windows and task instruction prefixes. It suits RAG and
document indexing scenarios where instruction-prefixed embedding inputs are desirable.
The upstream Nomic model family is built around embedding tasks that benefit from
explicit input prefixes such as search_query: and task-specific input conventions.
That makes it a particularly good candidate for retrieval pipelines where the query and
stored corpus should be encoded with deliberate role distinctions.
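The role distinction described above can be sketched as a small helper that attaches the task prefix before text is sent to the embedder. This is a minimal sketch, not part of Chonky or the upstream library: the function names are hypothetical, and only the prefix strings come from this document.

```python
# Hypothetical helper names; only the prefix strings come from this document.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def with_prefix(task: str, text: str) -> str:
    """Return `text` with the task instruction prefix placed in front of it."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    return f"{task}: {text.strip()}"


def prepare_corpus(passages):
    """Prefix every stored passage as a document."""
    return [with_prefix("search_document", p) for p in passages]


def prepare_query(question: str) -> str:
    """Prefix a user question as a query."""
    return with_prefix("search_query", question)
```

Keeping the prefixing in one place makes it harder to index a corpus with one role and query it with another by accident.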
Use Nomnom when you want:
- strong local semantic retrieval
- long-context document embedding support
- a model designed around explicit task prefixes
- a local embedder well suited for RAG-style search workflows
Nomnom is a strong fit for:
- semantic indexing of long document chunks
- retrieval-augmented generation workflows
- embedding queries separately from stored passage content
- local experimentation with prefix-aware embedding strategies
Base Model Lineage
Nomnom is derived from:
nomic-ai/nomic-embed-text-v1.5
Key characteristics of the model include:
- long-context support in the original transformer implementation
- strong retrieval-oriented design
- support for task instruction prefixes such as search_query: and other prefixed task modes
- support for reduced embedding sizes in the upstream family through Matryoshka-style representation behavior
Important Prefix Behavior
The upstream Nomic model family expects task instruction prefixes at the beginning of text strings for best results. In practice, that means inputs may need prefixes such as:
- search_query: for user queries
- task-specific prefixes for other workflows, depending on how you use the model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("chonks/nomnom-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
embeddings = model.encode(sentences)
print(embeddings)
GGUF-Specific Context Note
The GGUF release is suitable for local llama.cpp-style inference, but the long-context behavior available in the original transformer implementation may require additional context extension settings in llama.cpp-based runtimes to fully match upstream context length expectations.
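As one possible illustration of such settings, the sketch below builds constructor arguments for llama-cpp-python with embedding mode and RoPE scaling enabled. The specific values (YaRN scaling, a frequency scale of 0.75, an 8192-token context) are assumptions modeled on settings commonly cited for the upstream GGUF release, not values confirmed by this document, so verify them against your runtime. The helper names are hypothetical.

```python
from typing import Any, Dict


def embedding_kwargs(model_path: str, n_ctx: int = 8192) -> Dict[str, Any]:
    """Constructor arguments for an embedding-only llama-cpp-python model."""
    return {
        "model_path": model_path,
        "embedding": True,        # return embeddings instead of generating text
        "n_ctx": n_ctx,           # assumed target context length
        "n_batch": n_ctx,         # let a whole chunk fit in a single batch
        "rope_freq_scale": 0.75,  # assumed value; verify for your build
        "verbose": False,
    }


def load_embedder(model_path: str, n_ctx: int = 8192):
    """Load the GGUF file with YaRN context extension enabled (assumed setting)."""
    import llama_cpp  # imported lazily so embedding_kwargs works without it

    kwargs = embedding_kwargs(model_path, n_ctx)
    kwargs["rope_scaling_type"] = llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN
    return llama_cpp.Llama(**kwargs)


# Example (requires llama-cpp-python and the GGUF file on disk):
# llm = load_embedder("chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf")
# vector = llm.embed("search_document: TSNE is a dimensionality reduction algorithm")
```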
Local File Layout Expected by Chonky
Chonky expects Nomnom at:
chonks/nomi/nomnom-embed-text-v1.5.Q4_K_M.gguf
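A quick way to confirm the file is where Chonky looks for it is a small path check. This is a sketch under one assumption: that the path above is relative to the directory Chonky runs from, which this document does not state. The function names are hypothetical.

```python
from pathlib import Path

# Relative location taken from this document.
MODEL_RELATIVE_PATH = Path("chonks") / "nomi" / "nomnom-embed-text-v1.5.Q4_K_M.gguf"


def expected_model_path(root=".") -> Path:
    """Path where the GGUF file is expected, relative to an assumed root directory."""
    return Path(root) / MODEL_RELATIVE_PATH


def model_is_installed(root=".") -> bool:
    """True if the GGUF file exists at the expected location."""
    return expected_model_path(root).is_file()
```

Calling `model_is_installed()` before starting an indexing job gives a clearer failure than a loader error deep inside the embedding step.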
Features
- local GGUF-based embedding model
- no cloud API dependency
- retrieval-oriented embedding behavior
- especially attractive for query/document search pipelines
- supports local embedding generation for chunked corpora
- aligns well with Chonky's semantic search and vector storage workflows
Recommended Chonky Usage
Nomnom is recommended when:
- you want a local retrieval model with strong search-oriented lineage
- you plan to distinguish query strings from indexed corpus text
- you want a local embedding path aligned to modern RAG conventions
- you value long-context model lineage for document-heavy tasks
Usage
Important: the text prompt must include a task instruction prefix, instructing the model which task is being performed.
For example, if you are implementing a RAG application, you embed your documents as search_document: <text here> and embed your user queries as search_query: <text here>.
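The RAG pattern above can be sketched end to end as a ranking function. This is a minimal illustration rather than Chonky's implementation: `embed` is a hypothetical callable that maps a list of strings to a list of vectors (for instance, a thin wrapper around whichever embedder you load), and cosine similarity is an assumed scoring choice.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_passages(
    embed: Callable[[List[str]], List[Vector]],
    query: str,
    passages: List[str],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Embed passages as documents and the question as a query, then rank by similarity."""
    doc_vectors = embed([f"search_document: {p}" for p in passages])
    query_vector = embed([f"search_query: {query}"])[0]
    scored = [(cosine(query_vector, v), p) for v, p in zip(doc_vectors, passages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In a real index the document vectors would be computed once and stored, with only the query embedded at search time.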
Task instruction prefixes
search_document
Purpose: embed texts as documents from a dataset
This prefix is used for embedding texts as documents, for example as documents for a RAG index.
from sentence_transformers import SentenceTransformer
# Upstream checkpoint; sentence-transformers cannot load the GGUF file directly
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
embeddings = model.encode(sentences)
print(embeddings)
search_query
Purpose: embed texts as questions to answer
This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application.
from sentence_transformers import SentenceTransformer
# Upstream checkpoint; sentence-transformers cannot load the GGUF file directly
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: Who is Laurens van Der Maaten?']
embeddings = model.encode(sentences)
print(embeddings)
clustering
Purpose: embed texts to group them into clusters
This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates.
from sentence_transformers import SentenceTransformer
# Upstream checkpoint; sentence-transformers cannot load the GGUF file directly
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['clustering: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
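Removing semantic duplicates, one of the uses named for this prefix, can be sketched as a greedy filter over already computed embeddings. The function name and the 0.95 threshold are illustrative assumptions, not values from the upstream documentation, and the vectors are expected to come from texts embedded with the clustering prefix.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_near_duplicates(
    texts: List[str],
    vectors: List[Sequence[float]],
    threshold: float = 0.95,  # assumed cutoff; tune for your corpus
) -> List[str]:
    """Keep a text only if it is below `threshold` similarity to every text kept so far."""
    kept_texts: List[str] = []
    kept_vectors: List[Sequence[float]] = []
    for text, vector in zip(texts, vectors):
        if all(cosine(vector, kept) < threshold for kept in kept_vectors):
            kept_texts.append(text)
            kept_vectors.append(vector)
    return kept_texts
```

The filter is order dependent and quadratic in the number of kept items, which is acceptable for small batches but not for a large corpus.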
classification
Purpose: embed texts to classify them
This prefix is used for embedding texts into vectors that will be used as features for a classification model.
from sentence_transformers import SentenceTransformer
# Upstream checkpoint; sentence-transformers cannot load the GGUF file directly
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['classification: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
Sentence Transformers
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
matryoshka_dim = 512
# Upstream checkpoint; sentence-transformers cannot load the GGUF file directly
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences, convert_to_tensor=True)
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
Transformers
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
# Upstream checkpoint; the GGUF file path is not a loadable transformers model id
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True, safe_serialization=True)
model.eval()
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
matryoshka_dim = 512
with torch.no_grad():
    model_output = model(**encoded_input)
embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
# Layer-normalize, truncate to the Matryoshka dimension, then L2-normalize
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
The model natively supports scaling of the sequence length past 2048 tokens. To do so, change the tokenizer and model loading as follows:
- tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
- model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True)
+ model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', trust_remote_code=True, rotary_scaling_factor=2)
Transformers.js
import { pipeline, layer_norm } from '@huggingface/transformers';
// Create a feature extraction pipeline
// Upstream checkpoint; Transformers.js loads ONNX weights, not GGUF files
const extractor = await pipeline('feature-extraction', 'nomic-ai/nomic-embed-text-v1.5');
// Define sentences
const texts = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?'];
// Compute sentence embeddings
let embeddings = await extractor(texts, { pooling: 'mean' });
console.log(embeddings); // Tensor of shape [2, 768]
const matryoshka_dim = 512;
embeddings = layer_norm(embeddings, [embeddings.dims[1]])
.slice(null, [0, matryoshka_dim])
.normalize(2, -1);
console.log(embeddings.tolist());
Nomic API
The easiest way to use Nomic Embed is through the Nomic Embedding API.
Generating embeddings with the nomic Python client is as easy as
from nomic import embed
output = embed.text(
    texts=['Nomi Noms Noms Embedding API', '#keepAIOpen'],
    model='nomic-embed-text-v1.5',
    task_type='search_document',
    dimensionality=256,
)
print(output)