---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:500000
- loss:CachedMultipleNegativesRankingLoss
base_model: ibm-granite/granite-embedding-small-english-r2
widget:
- source_sentence: >
I'm trying to write a PHP script which reads SIP (session initiation
protocol) signals from a hardware switch to gets specific details and then
return some data back to the switch.
Being a complete newbie to this SIP thing I don't know how to interact with
the switch sending SIP signal. Do we need to send some message to the switch
to get response?
I googled SIP but got only general info regarding what SIP is all about but
nothing programmatic.
Can any one provide any pointers to any tutorials which show how interact
with a SIP signal programmatically?
Are there any free online services that simulate SIP signals for testing
purpose?
sentences:
- >-
Lake Okahumpka is a freshwater lake in Wildwood, Florida, United States.
Lake Okahumpka Park is along part of its shoreline. In 1980, the United
States Geological Survey reported on the hydrology of Lake Okahumpka and
Lake Deaton area.
The lake is east of Wildwood on the south side of State Road 44. The lake
has been treated for hydrilla. Ring neck ducks have been hunted from its
shores.
See also
Okahumpka, Florida
References
Bodies of water of Sumter County, Florida
Okahumpka
- >-
Because of different regional settings on different machines. To have date
time output in the same format you have to specify the format string explicitly:
date.ToString("yyyy-MM-dd HH:mm:ss");
Also, as John recommended in the comments below, if you want date time
output in the same format on different machines despite local regional
settings, you can use the InvariantCulture format provider:
date.ToString(CultureInfo.InvariantCulture);
MSDN:
The invariant culture is culture-insensitive; it is associated with
the English language but not with any country/region
MSDN:
Standard Date and Time Format Strings
Custom Date and Time Format Strings
- >-
The President of India plays a ceremonial role in foreign affairs,
appointing ambassadors and ratifying treaties, but the day‑to‑day conduct of
diplomacy is handled by the Ministry of External Affairs and the Prime
Minister's Office.
- source_sentence: can drinking too much water make acid reflux worse?
sentences:
- >
I think I understand your question. A possible solution would be to use a
ViewModel to pass to the view as oppose to using the Company entity
directly. This would allow you to add or remove data annotations without
changing the entity model. Then map the data from the new CompanyViewModel
over to the Company entity model to be saved to the database.
For example, the Company entity might look something like this:
public class Company
{
public int Id { get; set; }
[StringLength(25)]
public string Name { get; set; }
public int EmployeeAmount { get; set; }
[StringLength(3, MinimumLength = 3)]
public string CountryId {get; set; }
}
Now in the MVC project a ViewModel can be constructed similar to the Company
entity:
public class CompanyViewModel
{
public int Id { get; set; }
[StringLength(25, ErrorMessage="Company name needs to be 25 characters or less!")]
public string Name { get; set; }
public int EmployeeAmount { get; set; }
public string CountryId { get; set; }
}
Using a ViewModel means more view presentation orientated annotations can be
added without overloading entities with unnecessary mark-up.
I hope this helps!
- >-
Staying well-hydrated is essential for overall health. Water helps maintain
blood volume, supports kidney function, and aids in temperature regulation.
Regular consumption of water throughout the day can improve skin elasticity
and promote better digestion.
- >-
Drinking large amounts of water can indeed aggravate acid reflux. Excess
fluid can increase stomach volume, leading to higher pressure on the lower
esophageal sphincter, which may cause it to open and allow acid to flow back
into the esophagus. Additionally, overhydration can dilute stomach acids,
prompting the body to produce more acid to aid digestion, potentially
worsening reflux symptoms.
- source_sentence: >
I have created an alert in Twitter Bootstrap this way
HTML:
<div id='alert' class='hide'></div>
JS:
function showAlert(message) {
$('#alert').html("<div class='alert alert-error'>"+message+"</div>");
$('#alert').show();
}
showAlert('Please have a look at yourself.');
$('#alert').removeClass('alert-error');
$('#alert').addClass('alert-info');
But the last two lines of javascript don't seem to have any effects, can
anyone have a look for me?
Created jsfiddle here.
Update
I made some changes in my own code to make it easier to use, I prefer this
way
HTML:
<div id='alert' class='hide'></div>
JS:
function showAlert(message, alertType) {
$('#alert').html("<div class='alert alert-"+alertType+"'>"+message+"</div>");
$('#alert').show();
}
showAlert('Please have a look at yourself.', 'success');
New jsfiddle here
sentences:
- >-
The San Justo was a 70-gun – from 1790, 74-gun – ship of the line built at
the royal shipyard in Cartagena, Spain and launched in 1779.
She fought at the Battle of Cape Spartel in 1782 and the Battle of Trafalgar
in 1805. In the latter battle, under the command of Capitán de Navío Miguel
María Gastón de Iriarte, she was placed in the Centre Division, but managed
to avoid being heavily engaged throughout the battle and had few casualties
none killed and just seven injured.
References
Bibliography
Ships of the line of the Spanish Navy
1779 ships
Ships built in Cartagena, Spain
Maritime incidents in 1805
- >
You can enforce to use specific version of a transitive dependency using
dependency management.
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-kubernetes-ribbon</artifactId>
<version>1.1.1.RELEASE</version>
</dependency>
</dependencies>
</dependencyManagement>
Now only the specified version will be used. Not the versions declared in
transitive dependencies.
- |
$('#alert div').removeClass('alert-error');
$('#alert div').addClass('alert-info');
http://jsfiddle.net/Cf4gs/2/
- source_sentence: 1994–95 Crystal Palace F.C. season
sentences:
- >
There is an error in the documentation, the correct syntax is:
qry = Article.query().get(projection=[Article.author, Article.tags])
…replace get with method of your choosing as long as it takes **q_options
arguments.
- >-
During the 1994–95 English football season, Crystal Palace competed in the
FA Premier League.
Season summary
Crystal Palace returned to the Premiership a year after leaving it, and,
over the next few months, they would experience one of the most unusual
seasons in their history. They were the division's lowest scoring team with
just 34 goals, but reached the semi-finals of both cup competitions. They
also finished fourth from bottom in the Premiership, which due to the
streamlining of the division to 20 clubs cost them their top flight
status. Manager Alan Smith was sacked just days afterwards, with Steve
Coppell returning to the manager's seat two years after handing the reins
over to his former assistant Smith.
The aftermath of Palace's relegation saw the sale of numerous players
including Richard Shaw, John Salako, Chris Armstrong and Gareth Southgate. A
barely recognisable Palace squad would kick off the Endsleigh League
Division One campaign with one of the youngest-ever squads to be faced with
a challenge for promotion to the Premiership.
Final league table
Results summary
Results by round
Results
Crystal Palace's score comes first
Legend
FA Premier League
FA Cup
League Cup
Players
First-team squad
Squad at end of season
Left club during season
Reserve squad
Transfers
In
Out
Transfers in: £1,830,000
Transfers out: £740,000
Total spending: £1,090,000
Notes
References
Crystal Palace F.C. seasons
Crystal Palace
- >-
In Tennessee, independent contractors generally cannot claim regular
unemployment benefits, but they may qualify for Pandemic Unemployment
Assistance (PUA) if they meet the program’s eligibility criteria.
- source_sentence: Ian MacPherson
sentences:
- >-
A peach-flavored Xanax will produce the same pharmacological effects as
regular Xanax: it acts as a central nervous system depressant, boosting GABA
activity in the brain, which leads to sedation, reduced anxiety, and a
calming, tranquilizing sensation.
- >-
Once Upon a Time in Hollywood is set in 1969 Los Angeles and features real
figures such as Sharon Tate and Charles Manson, but the plot and the main
characters are fictional creations by Tarantino.
- >-
Ian MacPherson, Macpherson or McPherson may refer to:
Ian Macpherson, 1st Baron Strathcarron (1880–1937), British lawyer and
politician
Ian Macpherson (novelist) (1905–1944), Scottish novelist
Ian McPherson (footballer) (1920–1983), Scottish footballer
Ian MacPherson (historian) (1939–2013), Canadian historian and co-operative
activist
Ian McPherson (cricketer) (born 1942), Scottish cricketer
Ian Macpherson, 3rd Baron Strathcarron (born 1949), British peer, grandson
of the 1st Baron
Ian Macpherson (comedian) (born 1951), Irish comic novelist, playwright and
performer
Ian McPherson (police officer) (born 1961), British police officer
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: other
language:
- en
---
# Bolt Embedding Models
Bolt Embedding is a family of **high-performance embedding models optimized for
enterprise Retrieval-Augmented Generation (RAG)**.\
These models are **fine-tuned from IBM Granite embedding models** and
are designed to produce strong semantic embeddings for knowledge
retrieval, search, and document understanding.
Bolt models map text (queries, sentences, or documents) into a **dense
vector space** suitable for similarity search, clustering, and retrieval
pipelines.
------------------------------------------------------------------------
# Model Overview
**Bolt embeddings are purpose-built for enterprise RAG workloads**,
where retrieval quality and robustness across heterogeneous documents
are critical.
Key design goals:
- Strong **query → document retrieval quality**
- Robust performance on **long enterprise documents**
- Optimized for **large-scale vector search**
- Trained using **large-batch contrastive learning** to replicate real
RAG retrieval conditions
These models are **fine-tuned from IBM Granite embedding models** using
contrastive training on RAG-style data.
------------------------------------------------------------------------
# Model Details
### Model Type
Sentence Transformer embedding model
### Base Model
Fine-tuned from:
- `ibm-granite/granite-embedding-small-english-r2` (small)
- `ibm-granite/granite-embedding-english-r2` (large)
(depending on the Bolt variant)
### Output
- **Embedding dimension:** 384 (small), 768 (large)
- **Similarity metric:** Cosine similarity
- **Max sequence length:** 4096 tokens
### Architecture
```
SentenceTransformer(
  (0): Transformer(ModernBertModel)
  (1): Pooling(CLS)
)
```
Bolt uses **CLS pooling** to produce a single embedding vector per
input.
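CLS pooling can be illustrated with a minimal numpy sketch. The array below is a random placeholder standing in for the transformer's token-level output (the shapes are illustrative, not the model's real dimensions):

``` python
import numpy as np

# Toy token embeddings for a 5-token input with hidden size 4;
# in the real model, the ModernBERT transformer produces these.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(5, 4))

# CLS pooling keeps only the first token's vector as the sentence
# embedding, instead of averaging over all tokens (mean pooling).
cls_embedding = token_embeddings[0]
mean_embedding = token_embeddings.mean(axis=0)

print(cls_embedding.shape)  # (4,)
```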
------------------------------------------------------------------------
# Training Objective
Bolt embeddings are trained specifically for **retrieval scenarios**
using **contrastive learning**.
### Loss Function
`CachedMultipleNegativesRankingLoss`
This loss is widely used for training embedding models for retrieval
tasks.
Key properties:
- Efficient training with **very large effective batch sizes**
- Uses **in-batch negatives**
- Encourages queries to be close to their relevant passages while far
from irrelevant ones
### Large Batch Training
Bolt models were trained using **batch sizes of 1024**.
Large batches simulate realistic retrieval scenarios. Each query in a batch
is contrasted against:
- its positive document
- ~2000 unrelated documents, including hard negatives
This closely approximates **production RAG retrieval environments**,
where each query must rank the correct document among many candidates.
The result is improved:
- retrieval accuracy
- semantic separation
- ranking robustness
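The objective above can be sketched in numpy: every query is scored against every document in the batch, and the loss is the cross-entropy of picking the matching document (the diagonal). The embeddings, batch size, and temperature below are illustrative placeholders, not the actual training configuration:

``` python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 8, 16  # the real training used batch sizes of 1024

# Random stand-ins for L2-normalized query and document embeddings
# (one positive document per query, paired by index).
q = rng.normal(size=(batch, dim))
d = rng.normal(size=(batch, dim))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)

# Cosine similarity of every query against every document in the
# batch; off-diagonal entries act as in-batch negatives.
# 0.05 is an illustrative temperature, not the training value.
scores = q @ d.T / 0.05

# Multiple-negatives ranking loss = cross-entropy with the matching
# document (the diagonal) as the correct class for each query.
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(float(loss))
```

The cached variant used for Bolt computes the same loss, but in gradient-cached mini-batches so very large effective batch sizes fit in GPU memory.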
------------------------------------------------------------------------
# Training Data
Training was performed using a custom dataset we collected. It includes hand-curated examples as well as examples from datasets with commercially acceptable licenses. To curate hard negatives for some examples, LLMs with commercially permissible licenses were used to generate negatives.
Dataset format:
| Column | Description |
|--------|-------------|
| anchor | Query or input text |
| positive | Relevant document/passage |
| negative | Unrelated document/passage; some negatives were generated by LLMs to provide hard negatives, others were sampled at random from existing negatives |
Training size:
- **500,000 training samples**
- **20,000 evaluation samples**
The dataset contains a mixture of:
- question → answer pairs
- query → document matches
- semantic similarity examples
These samples are designed to mimic **real RAG retrieval workloads**.
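A training sample in the anchor/positive/negative format described above might look like the following (a hypothetical illustration reusing the sentences from the Usage section below, not an actual row from the dataset):

``` python
# Hypothetical triple in the anchor/positive/negative format;
# the anchor is a query, the positive a relevant passage, and
# the negative an unrelated passage.
example = {
    "anchor": "What are the tax implications of employee stock options?",
    "positive": "Employee stock options may have tax consequences depending on exercise timing.",
    "negative": "The Eiffel Tower is located in Paris.",
}

print(sorted(example.keys()))
```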
------------------------------------------------------------------------
# Intended Use
Bolt embeddings are designed for:
- Retrieval-Augmented Generation (RAG)
- Enterprise document search
- Semantic search
- Knowledge base retrieval
- Question answering
- Duplicate detection
- Similarity scoring
Typical pipeline:

User query → Bolt embedding → Vector search → Top-k documents → LLM generation
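The vector-search step of that pipeline can be sketched with numpy. The random arrays below are placeholders for `model.encode(...)` output; a real pipeline would embed the query and corpus with Bolt and typically use a vector database instead of brute-force search:

``` python
import numpy as np

rng = np.random.default_rng(1)
dim = 384  # embedding size of the small Bolt variant

# Placeholder embeddings standing in for model.encode(...) output.
corpus = rng.normal(size=(100, dim))
query = rng.normal(size=(dim,))

# Normalize so that the dot product equals cosine similarity.
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Vector search: rank all documents by similarity, keep top-k.
k = 5
scores = corpus @ query
top_k = np.argsort(-scores)[:k]
print(top_k)  # indices of the k most similar documents, passed to the LLM
```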
------------------------------------------------------------------------
# Usage
Install Sentence Transformers:
``` bash
pip install -U sentence-transformers
```
### Load the Model
``` python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("aisquared/bolt-embedding-small")
```
or
``` python
model = SentenceTransformer("aisquared/bolt-embedding-large")
```
### Generate Embeddings
``` python
sentences = [
"What are the tax implications of employee stock options?",
"Employee stock options may have tax consequences depending on exercise timing.",
"The Eiffel Tower is located in Paris."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
```
### Compute Similarity
``` python
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
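With cosine as the configured similarity metric, `model.similarity` is equivalent to normalizing the embeddings and taking pairwise dot products. A numpy sketch, with random placeholders standing in for `model.encode(...)` output:

``` python
import numpy as np

# Stand-in for model.encode(...) output: 3 embeddings of dimension 384.
rng = np.random.default_rng(2)
embeddings = rng.normal(size=(3, 384))

# Pairwise cosine similarity: normalize each row, then dot product.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarities = normed @ normed.T

print(similarities.shape)  # (3, 3)
```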
------------------------------------------------------------------------
# Why Bolt?
Many embedding models are trained on **general semantic similarity
tasks**.
Bolt is optimized for **enterprise retrieval**, where queries must
locate the correct information among thousands of unrelated documents.
Key differentiators:
- **Large-batch contrastive training**
- **RAG-specific dataset**
- **Long context support (4096 tokens trained)**
- **Optimized for vector database retrieval**
------------------------------------------------------------------------
# Framework Versions
Training was performed using:
- Python 3.12
- Sentence Transformers
- Transformers
- PyTorch
- HuggingFace Datasets
- HuggingFace Jobs (training ran on a single A100 GPU)
------------------------------------------------------------------------
# Citation
If you use Bolt embeddings in research or production systems, please
cite the underlying Sentence-BERT work.
### Sentence-BERT
``` bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP)",
    year = "2019",
}
```
### Cached Multiple Negatives Ranking Loss
``` bibtex
@misc{gao2021scaling,
    title = {Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author = {Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year = {2021},
}
```
------------------------------------------------------------------------
# License
The Bolt embedding models are released under the [AI Squared Community License](https://docs.squared.ai/terms-of-use).