Instructions to use leeroy-jankins/bobo with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use leeroy-jankins/bobo with sentence-transformers:
from sentence_transformers import SentenceTransformer model = SentenceTransformer("leeroy-jankins/bobo") sentences = [ "The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium." ] embeddings = model.encode(sentences) similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] - llama-cpp-python
How to use leeroy-jankins/bobo with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="leeroy-jankins/bobo", filename="gguf/bobo-embed-large-v1-f16.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use leeroy-jankins/bobo with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf leeroy-jankins/bobo:F16 # Run inference directly in the terminal: llama-cli -hf leeroy-jankins/bobo:F16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf leeroy-jankins/bobo:F16 # Run inference directly in the terminal: llama-cli -hf leeroy-jankins/bobo:F16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf leeroy-jankins/bobo:F16 # Run inference directly in the terminal: ./llama-cli -hf leeroy-jankins/bobo:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf leeroy-jankins/bobo:F16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf leeroy-jankins/bobo:F16
Use Docker
docker model run hf.co/leeroy-jankins/bobo:F16
- LM Studio
- Jan
- Ollama
How to use leeroy-jankins/bobo with Ollama:
ollama run hf.co/leeroy-jankins/bobo:F16
- Unsloth Studio new
How to use leeroy-jankins/bobo with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for leeroy-jankins/bobo to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for leeroy-jankins/bobo to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for leeroy-jankins/bobo to start chatting
- Docker Model Runner
How to use leeroy-jankins/bobo with Docker Model Runner:
docker model run hf.co/leeroy-jankins/bobo:F16
- Lemonade
How to use leeroy-jankins/bobo with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull leeroy-jankins/bobo:F16
Run and chat with the model
lemonade run user.bobo-F16
List all available models
lemonade list
- Bobo — Embedding Model (derived from mixedbread-ai/mxbai-embed-large-v1)
Bobo is a text-embedding, LLM derived from mixedbread-ai/mxbai-embed-large-v1, packaged for drop-in use in semantic search, RAG (Retrieval-Augmented Generation), clustering, deduplication, and zero-shot classification. It produces dense vector representations suitable for ANN indexes (FAISS, ScaNN, Milvus, pgvector) and common retrieval stacks.
Key Features
- Derived from the proven mixedbread-ai/mxbai-embed-large-v1 embedding family.
- Strong performance on retrieval-style tasks (semantic search, RAG, clustering).
- Sentence-Transformers–compatible API for fast adoption.
- Mean pooling with optional L2 normalization for cosine similarity workflows.
- Production-friendly guidance on chunking, batching, and indexing.
Technical Specifications
| Property | Value / Guidance |
|---|---|
| Base model | mixedbread-ai/mxbai-embed-large-v1 (encoder-style transformer) |
| Architecture | Transformer encoder (Sentence-Transformers compatible) |
| Embedding dimension | Query programmatically at runtime; common builds use 1024 |
| Tokenization | Provided by upstream model tokenizer |
| Max input length | Depends on upstream config; chunk long docs (e.g., 256–512 tokens) |
| Pooling | Mean pooling (recommended), then optional L2 normalization |
| Output | Dense float vectors (often normalized if using cosine similarity) |
| Intended backends | FAISS, Milvus, pgvector, Qdrant, Chroma, Weaviate |
Tip: Always detect the dimension in code (see Quickstart) and configure your index accordingly.
🎯 Quickstart
- Here, we provide several ways to produce sentence embeddings. Please note that you have to provide the prompt
Represent this sentence for searching relevant passages:for query if you want to use it for retrieval. Besides that you don't need any prompt.
⚙️ Vectorized Datasets
Vectorization is the process of converting textual data into numerical vectors and is a process that is usually applied once the text is cleaned. It can help improve the execution speed and reduce the training time of your code. BudgetPy provides the following vector stores on the OpenAI platform to support environmental data analysis with machine-learning
- Appropriations - Enacted appropriations from 1996-2024 available for fine-tuning learning models
- Regulations - Collection of federal regulations on the use of appropriated funds
- SF-133 - The Report on Budget Execution and Budgetary Resources
- Balances - U.S. federal agency Account Balances (File A) submitted as part of the DATA Act 2014.
- Outlays - The actual disbursements of funds by the U.S. federal government from 1962 to 2025
- SF-133 The Report on Budget Execution and Budgetary Resources
- Balances - U.S. federal agency Account Balances (File A) submitted as part of the DATA Act 2014.
- Circular A11 - Guidance from OMB on the preparation, submission, and execution of the federal budget
- Fastbook - Treasury guidance on federal ledger accounts
- Title 31 CFR - Money & Finance
- Redbook - The Principles of Appropriations Law (Volumes I & II).
- US Standard General Ledger - Account Definitions
- Treasury Appropriation Fund Symbols (TAFSs) Dataset - Collection of TAFSs used by federal agencies
🏗️ Sentence Transformers
python -m pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
from sentence_transformers.quantization import quantize_embeddings
# 1. Specify preffered dimensions
dimensions = 512
# 2. load model
model = SentenceTransformer("leeroy-jankins/bobo-embed-large-v1", truncate_dim=dimensions)
# The prompt used for query retrieval tasks:
# query_prompt = 'Represent this sentence for searching relevant passages: '
query = "A man is eating a piece of bread"
docs = [
"A man is eating food.",
"A man is eating pasta.",
"The girl is carrying a baby.",
"A man is riding a horse.",
]
# 2. Encode
query_embedding = model.encode(query, prompt_name="query")
# Equivalent Alternatives:
# query_embedding = model.encode(query_prompt + query)
# query_embedding = model.encode(query, prompt=query_prompt)
docs_embeddings = model.encode(docs)
# Optional: Quantize the embeddings
binary_query_embedding = quantize_embeddings(query_embedding, precision="ubinary")
binary_docs_embeddings = quantize_embeddings(docs_embeddings, precision="ubinary")
similarities = cos_sim(query_embedding, docs_embeddings)
print('similarities:', similarities)
🧠 Transformers
from typing import Dict
import torch
import numpy as np
from transformers import AutoModel, AutoTokenizer
from sentence_transformers.util import cos_sim
# For retrieval you need to pass this prompt. Please find our more in our blog post.
def transform_query(query: str) -> str:
""" For retrieval, add the prompt for query (not for documents).
"""
return f'Represent this sentence for searching relevant passages: {query}'
# The model works really well with cls pooling (default) but also with mean pooling.
def pooling(outputs: torch.Tensor, inputs: Dict, strategy: str = 'cls') -> np.ndarray:
if strategy == 'cls':
outputs = outputs[:, 0]
elif strategy == 'mean':
outputs = torch.sum(
outputs * inputs["attention_mask"][:, :, None], dim=1) / torch.sum(inputs["attention_mask"], dim=1, keepdim=True)
else:
raise NotImplementedError
return outputs.detach().cpu().numpy()
# 1. load model
model_id = 'mixedbread-ai/bobo-embed-large-v1'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).cuda()
docs = [
transform_query('A man is eating a piece of bread'),
"A man is eating food.",
"A man is eating pasta.",
"The girl is carrying a baby.",
"A man is riding a horse.",
]
# 2. encode
inputs = tokenizer(docs, padding=True, return_tensors='pt')
for k, v in inputs.items():
inputs[k] = v.cuda()
outputs = model(**inputs).last_hidden_state
embeddings = pooling(outputs, inputs, 'cls')
similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)
🏁 Evaluation
As of March 2024, our model archives SOTA performance for Bert-large sized models on the MTEB. It ourperforms commercial models like OpenAIs text-embedding-3-large and matches the performance of model 20x it's size like the echo-mistral-7b. Our model was trained with no overlap of the MTEB data, which indicates that our model generalizes well across several domains, tasks and text length. We know there are some limitations with this model, which will be fixed in v2.
| Model | Avg (56 datasets) | Classification (12 datasets) | Clustering (11 datasets) | PairClassification (3 datasets) | Reranking (4 datasets) | Retrieval (15 datasets) | STS (10 datasets) | Summarization (1 dataset) |
|---|---|---|---|---|---|---|---|---|
| bobo-embed-large-v1 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85.00 | 32.71 |
| bge-large-en-v1.5 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 |
| bobo-embed-2d-large-v1 | 63.25 | 74.14 | 46.07 | 85.89 | 58.94 | 51.42 | 84.9 | 31.55 |
| nomic-embed-text-v1 | 62.39 | 74.12 | 43.91 | 85.15 | 55.69 | 52.81 | 82.06 | 30.08 |
| jina-embeddings-v2-base-en | 60.38 | 73.45 | 41.73 | 85.38 | 56.98 | 47.87 | 80.7 | 31.6 |
| Proprietary Models | ||||||||
| OpenAI text-embedding-3-large | 64.58 | 75.45 | 49.01 | 85.72 | 59.16 | 55.44 | 81.73 | 29.92 |
| Cohere embed-english-v3.0 | 64.47 | 76.49 | 47.43 | 85.84 | 58.01 | 55.00 | 82.62 | 30.18 |
| OpenAI text-embedding-ada-002 | 60.99 | 70.93 | 45.90 | 84.89 | 56.32 | 49.25 | 80.97 | 30.80 |
💻 Matryoshka and Binary Quantization
Embeddings in their commonly used form (float arrays) have a high memory footprint when used at scale. Two approaches to solve this problem are Matryoshka Representation Learning (MRL) and (Binary) Quantization. While MRL reduces the number of dimensions of an embedding, binary quantization transforms the value of each dimension from a float32 into a lower precision (int8 or even binary). The model supports both approaches!
You can also take it one step further, and combine both MRL and quantization. This combination of binary quantization and MRL allows you to reduce the memory usage of your embeddings significantly. This leads to much lower costs when using a vector database in particular.
📝License
- Bobo is published under the MIT General Public License v3
- Downloads last month
- 7
16-bit