| --- |
| tags: |
| - sentence-transformers |
| - sentence-similarity |
| - feature-extraction |
| - dense |
| - generated_from_trainer |
| - dataset_size:500000 |
| - loss:CachedMultipleNegativesRankingLoss |
| base_model: ibm-granite/granite-embedding-small-english-r2 |
| widget: |
| - source_sentence: > |
| I'm trying to write a PHP script which reads SIP (session initiation |
| protocol) signals from a hardware switch to gets specific details and then |
| return some data back to the switch. |
| |
| Being a complete newbie to this SIP thing I don't know how to interact with |
| the switch sending SIP signal. Do we need to send some message to the switch |
| to get response? |
|
|
| I googled SIP but got only general info regarding what SIP is all about but |
| nothing programmatic. |
|
|
| Can any one provide any pointers to any tutorials which show how interact |
| with a SIP signal programmatically? |
|
|
| Are there any free online services that simulate SIP signals for testing |
| purpose? |
| sentences: |
| - >- |
| Lake Okahumpka is a freshwater lake in Wildwood, Florida, United States. |
| Lake Okahumpka Park is along part of its shoreline. In 1980, the United |
| States Geological Survey reported on the hydrology of Lake Okahumpka and |
| Lake Deaton area. |
| |
|
|
| The lake is east of Wildwood on the south side of State Road 44. The lake |
| has been treated for hydrilla. Ring neck ducks have been hunted from its |
| shores. |
|
|
|
|
| See also |
|
|
| Okahumpka, Florida |
|
|
|
|
| References |
|
|
|
|
| Bodies of water of Sumter County, Florida |
|
|
| Okahumpka |
| - >+ |
| Because of different regional setting on different machines. To have date |
| time output in the same format you ahve to specify format string explciitly: |
| |
| date.ToString("yyyy-MM-dd HH:mm:ss"); |
|
|
|
|
| Also as John recommeded in comments below if you want having date time |
| output in the same format on different machines despite local regional |
| settings you can use InvariantCulture format provider: |
|
|
| date.ToString(CultureInfo.InvariantCulture); |
|
|
|
|
| MSDN: |
|
|
|
|
| The invariant culture is culture-insensitive; it is associated with |
| the English language but not with any country/region |
|
|
| MSDN: |
|
|
|
|
| Standard Date and Time Format Strings |
|
|
| Custom Date and Time Format Strings |
|
|
| - >- |
| The President of India plays a ceremonial role in foreign affairs, |
| appointing ambassadors and ratifying treaties, but the day‑to‑day conduct of |
| diplomacy is handled by the Ministry of External Affairs and the Prime |
| Minister's Office. |
| - source_sentence: can drinking too much water make acid reflux worse? |
| sentences: |
| - > |
| I think I understand your question. A possible solution would be to use a |
| ViewModel to pass to the view as oppose to using the Company entity |
| directly. This would allow you to add or remove data annotations without |
| changing the entity model. Then map the data from the new CompanyViewModel |
| over to the Company entity model to be saved to the database. |
| |
| For example, the Company entity might look something like this: |
|
|
| public class Company |
|
|
| { |
| public int Id { get; set; } |
| [StringLength(25)] |
| public string Name { get; set; } |
| public int EmployeeAmount { get; set; } |
| [StringLength(3, MinimumLength = 3)] |
| public string CountryId {get; set; } |
| } |
|
|
|
|
| Now in the MVC project a ViewModel can be constructed similar to the Company |
| entity: |
|
|
| public class CompanyViewModel |
|
|
| { |
| public int Id { get; set; } |
| [StringLength(25, ErrorMessage="Company name needs to be 25 characters or less!")] |
| public string Name { get; set; } |
| public int EmployeeAmount { get; set; } |
| public string CountryId { get; set; } |
| } |
|
|
|
|
| Using a ViewModel means more view presentation orientated annotations can be |
| added without overloading entities with unnecessary mark-up. |
|
|
| I hope this helps! |
| - >- |
| Staying well-hydrated is essential for overall health. Water helps maintain |
| blood volume, supports kidney function, and aids in temperature regulation. |
| Regular consumption of water throughout the day can improve skin elasticity |
| and promote better digestion. |
| - >- |
| Drinking large amounts of water can indeed aggravate acid reflux. Excess |
| fluid can increase stomach volume, leading to higher pressure on the lower |
| esophageal sphincter, which may cause it to open and allow acid to flow back |
| into the esophagus. Additionally, overhydration can dilute stomach acids, |
| prompting the body to produce more acid to aid digestion, potentially |
| worsening reflux symptoms. |
| - source_sentence: > |
| I have created an alert in Twitter Bootstrap this way |
| |
| HTML: |
|
|
| <div id='alert' class='hide'></div> |
|
|
|
|
| JS: |
|
|
| function showAlert(message) { |
| $('#alert').html("<div class='alert alert-error'>"+message+"</div>"); |
| $('#alert').show(); |
| } |
|
|
| showAlert('Please have a look at yourself.'); |
|
|
| $('#alert').removeClass('alert-error'); |
|
|
| $('#alert').addClass('alert-info'); |
|
|
|
|
| But the last two lines of javascript don't seem to have any effects, can |
| anyone have a look for me? |
|
|
| Created jsfiddle here. |
|
|
| Update |
|
|
| I made some changes in my own code to make it easier to use, I prefer this |
| way |
|
|
| HTML: |
|
|
| <div id='alert' class='hide'></div> |
|
|
|
|
| JS: |
|
|
| function showAlert(message, alertType) { |
| $('#alert').html("<div class='alert alert-"+alertType+"'>"+message+"</div>"); |
| $('#alert').show(); |
| } |
|
|
|
|
| showAlert('Please have a look at yourself.', 'success'); |
|
|
|
|
| New jsfiddle here |
| sentences: |
| - >- |
| The San Justo was a 70-gun – from 1790, 74-gun – ship of the line built at |
| the royal shipyard in Cartagena, Spain and launched in 1779. |
| |
|
|
| She fought at the Battle of Cape Spartel in 1782 and the Battle of Trafalgar |
| in 1805. In the latter battle, under the command of Capitán de Navío Miguel |
| María Gastón de Iriarte, she was placed in the Centre Division, but managed |
| to avoid being heavily engaged throughout the battle and had few casualties |
| – none killed and just seven injured. |
|
|
|
|
| References |
|
|
|
|
| Bibliography |
|
|
|
|
| Ships of the line of the Spanish Navy |
|
|
| 1779 ships |
|
|
| Ships built in Cartagena, Spain |
|
|
| Maritime incidents in 1805 |
| - > |
| You can enforce to use specific version of a transitive dependency using |
| dependency management. |
| |
| <dependencyManagement> |
| <dependencies> |
| <dependency> |
| <groupId>org.springframework.cloud</groupId> |
| <artifactId>spring-cloud-starter-kubernetes-ribbon</artifactId> |
| <version>1.1.1.RELEASE</version> |
| </dependency> |
| </dependencies> |
| </dependencyManagement> |
|
|
|
|
| Now only the specified version will be used. Not the versions declared in |
| transitive dependencies. |
| - | |
| $('#alert div').removeClass('alert-error'); |
| $('#alert div').addClass('alert-info'); |
| |
| http://jsfiddle.net/Cf4gs/2/ |
| - source_sentence: 1994–95 Crystal Palace F.C. season |
| sentences: |
| - > |
| There is an error in the documentation, the correct syntax is: |
| |
| qry = Article.query().get(projection=[Article.author, Article.tags]) |
|
|
|
|
| …replace get with method of your choosing as long as it takes **q_options |
| arguments. |
| - >- |
| During the 1994–95 English football season, Crystal Palace competed in the |
| FA Premier League. |
| |
|
|
| Season summary |
|
|
| Crystal Palace returned to the Premiership a year after leaving it, and, |
| over the next few months, they would experience one of the most unusual |
| seasons in their history. They were the division's lowest scoring team with |
| just 34 goals, but reached the semi-finals of both cup competitions. They |
| also finished fourth from bottom in the Premiership, which – due to the |
| streamlining of the division to 20 clubs – cost them their top flight |
| status. Manager Alan Smith was sacked just days afterwards, with Steve |
| Coppell returning to the manager's seat two years after handing the reins |
| over to his former assistant Smith. |
|
|
|
|
| The aftermath of Palace's relegation saw the sale of numerous players |
| including Richard Shaw, John Salako, Chris Armstrong and Gareth Southgate. A |
| barely recognisable Palace squad would kick off the Endsleigh League |
| Division One campaign with one of the youngest-ever squads to be faced with |
| a challenge for promotion to the Premiership. |
|
|
|
|
| Final league table |
|
|
|
|
| Results summary |
|
|
|
|
| Results by round |
|
|
|
|
| Results |
|
|
| Crystal Palace's score comes first |
|
|
|
|
| Legend |
|
|
|
|
| FA Premier League |
|
|
|
|
| FA Cup |
|
|
|
|
| League Cup |
|
|
|
|
| Players |
|
|
|
|
| First-team squad |
|
|
| Squad at end of season |
|
|
|
|
| Left club during season |
|
|
|
|
| Reserve squad |
|
|
|
|
| Transfers |
|
|
|
|
| In |
|
|
|
|
| Out |
|
|
|
|
| Transfers in: £1,830,000 |
|
|
| Transfers out: £740,000 |
|
|
| Total spending: £1,090,000 |
|
|
|
|
| Notes |
|
|
|
|
| References |
|
|
|
|
| Crystal Palace F.C. seasons |
|
|
| Crystal Palace |
| - >- |
| In Tennessee, independent contractors generally cannot claim regular |
| unemployment benefits, but they may qualify for Pandemic Unemployment |
| Assistance (PUA) if they meet the program’s eligibility criteria. |
| - source_sentence: Ian MacPherson |
| sentences: |
| - >- |
| A peach-flavored Xanax will produce the same pharmacological effects as |
| regular Xanax: it acts as a central nervous system depressant, boosting GABA |
| activity in the brain, which leads to sedation, reduced anxiety, and a |
| calming, tranquilizing sensation. |
| - >- |
| Once Upon a Time in Hollywood is set in 1969 Los Angeles and features real |
| figures such as Sharon Tate and Charles Manson, but the plot and the main |
| characters are fictional creations by Tarantino. |
| - >- |
| Ian MacPherson, Macpherson or McPherson may refer to: |
| |
|
|
| Ian Macpherson, 1st Baron Strathcarron (1880–1937), British lawyer and |
| politician |
|
|
| Ian Macpherson (novelist) (1905–1944), Scottish novelist |
|
|
| Ian McPherson (footballer) (1920–1983), Scottish footballer |
|
|
| Ian MacPherson (historian) (1939–2013), Canadian historian and co-operative |
| activist |
|
|
| Ian McPherson (cricketer) (born 1942), Scottish cricketer |
|
|
| Ian Macpherson, 3rd Baron Strathcarron (born 1949), British peer, grandson |
| of the 1st Baron |
|
|
| Ian Macpherson (comedian) (born 1951), Irish comic novelist, playwright and |
| performer |
|
|
| Ian McPherson (police officer) (born 1961), British police officer |
| pipeline_tag: sentence-similarity |
| library_name: sentence-transformers |
| license: other |
| language: |
| - en |
| --- |
| |
# Bolt Embedding Models

Bolt Embedding is a family of **high-performance embedding models optimized for
enterprise Retrieval-Augmented Generation (RAG)**.\
These models are **fine-tuned from IBM Granite embedding models** and
are designed to produce strong semantic embeddings for knowledge
retrieval, search, and document understanding.

Bolt models map text (queries, sentences, or documents) into a **dense
vector space** suitable for similarity search, clustering, and retrieval
pipelines.
------------------------------------------------------------------------

# Model Overview

**Bolt embeddings are purpose-built for enterprise RAG workloads**,
where retrieval quality and robustness across heterogeneous documents
are critical.

Key design goals:

- Strong **query → document retrieval quality**
- Robust performance on **long enterprise documents**
- Optimized for **large-scale vector search**
- Trained using **large-batch contrastive learning** to replicate real
  RAG retrieval conditions

These models are **fine-tuned from IBM Granite embedding models** using
contrastive training on RAG-style data.

------------------------------------------------------------------------
# Model Details

### Model Type

Sentence Transformer embedding model

### Base Model

Fine-tuned from:

- `ibm-granite/granite-embedding-small-english-r2` (small)
- `ibm-granite/granite-embedding-english-r2` (large)

(depending on the Bolt variant)

### Output

- **Embedding dimension:** 384 (small), 768 (large)
- **Similarity metric:** Cosine similarity
- **Max sequence length:** 4096 tokens
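To make the metric concrete, cosine similarity is the dot product of two vectors divided by the product of their norms. A minimal NumPy sketch with toy vectors (not real Bolt embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the metric Bolt embeddings are compared with."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors standing in for 384/768-dim embeddings.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a
c = np.array([-3.0, 0.0, 1.0])  # orthogonal to a

print(cosine_similarity(a, b))  # approx. 1.0 (parallel vectors)
print(cosine_similarity(a, c))  # 0.0 (orthogonal vectors)
```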

### Architecture

    SentenceTransformer(
      (0): Transformer(ModernBertModel)
      (1): Pooling(CLS)
    )

Bolt uses **CLS pooling** to produce a single embedding vector per
input.
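CLS pooling takes the hidden state of the first ([CLS]) token as the sentence embedding instead of averaging over all tokens. A minimal NumPy sketch with toy tensor shapes (hidden size 8 here; the real models use 384 or 768):

```python
import numpy as np

# Toy transformer output: batch of 2 sequences, 5 tokens each,
# hidden size 8 (real Bolt hidden sizes are 384 or 768).
token_embeddings = np.random.default_rng(0).normal(size=(2, 5, 8))

# CLS pooling: take the first ([CLS]) token's hidden state per sequence.
sentence_embeddings = token_embeddings[:, 0, :]

print(sentence_embeddings.shape)  # (2, 8): one vector per input
```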

------------------------------------------------------------------------

# Training Objective

Bolt embeddings are trained specifically for **retrieval scenarios**
using **contrastive learning**.

### Loss Function

`CachedMultipleNegativesRankingLoss`

This loss is widely used for training embedding models for retrieval
tasks.

Key properties:

- Efficient training with **very large effective batch sizes**
- Uses **in-batch negatives**
- Encourages queries to be close to their relevant passages while far
  from irrelevant ones
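The core objective can be sketched in a few lines: treat the batch's scaled similarity matrix as classification logits, with each query's own positive on the diagonal as the target class. The caching in `CachedMultipleNegativesRankingLoss` changes how gradients are computed, not this objective. A NumPy sketch (the `scale` default of 20 mirrors the Sentence Transformers implementation):

```python
import numpy as np

def mnr_loss(queries: np.ndarray, docs: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: row i of `docs` is the positive
    for row i of `queries`; every other row acts as a negative."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = scale * (q @ d.T)  # (batch, batch) cosine similarity matrix
    # Cross-entropy with the diagonal as the target class (stable log-softmax).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))

# Correctly aligned pairs score a lower loss than deliberately shuffled ones.
print(mnr_loss(q, q) < mnr_loss(q, np.roll(q, 1, axis=0)))  # True
```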

### Large Batch Training

Bolt models were trained using **batch sizes of 1024**.

Large batches simulate realistic retrieval scenarios: with (anchor,
positive, negative) triplets and a batch size of 1024, each query is
scored against all 2 × 1024 in-batch documents.

    Query
    Positive document
    ~2000 unrelated documents, including hard negatives

This closely approximates **production RAG retrieval environments**,
where each query must rank the correct document among many candidates.

The result is improved:

- retrieval accuracy
- semantic separation
- ranking robustness
------------------------------------------------------------------------

# Training Data

Training was performed on custom datasets we collected. These datasets
include hand-curated examples as well as examples from datasets with
commercially acceptable licenses. For some examples, LLMs with
commercially permissible licenses were used to generate hard negatives.

Dataset format:

| Column   | Description |
|----------|-------------|
| anchor   | Query or input text |
| positive | Relevant document/passage |
| negative | Unrelated document/passage; some are hard negatives generated with LLMs, others are sampled at random from existing negatives |
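Concretely, a single training record follows this column layout; the illustrative text below is taken from this card's widget examples:

```python
# One (anchor, positive, negative) training record; the text values are
# quoted (and truncated) from this card's widget examples.
record = {
    "anchor": "can drinking too much water make acid reflux worse?",
    "positive": (
        "Drinking large amounts of water can indeed aggravate acid reflux. "
        "Excess fluid can increase stomach volume, leading to higher pressure "
        "on the lower esophageal sphincter..."
    ),
    "negative": (
        "Staying well-hydrated is essential for overall health. Water helps "
        "maintain blood volume, supports kidney function..."
    ),
}

print(sorted(record))  # ['anchor', 'negative', 'positive']
```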

Training size:

- **500,000 training samples**
- **20,000 evaluation samples**

The dataset contains a mixture of:

- question → answer pairs
- query → document matches
- semantic similarity examples

These samples are designed to mimic **real RAG retrieval workloads**.

------------------------------------------------------------------------

# Intended Use

Bolt embeddings are designed for:

- Retrieval-Augmented Generation (RAG)
- Enterprise document search
- Semantic search
- Knowledge base retrieval
- Question answering
- Duplicate detection
- Similarity scoring

Typical pipeline:

    User query
        ↓
    Bolt embedding
        ↓
    Vector search
        ↓
    Top-k documents
        ↓
    LLM generation
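The pipeline can be sketched end to end. The toy bag-of-words embedder below stands in for a real Bolt model so the sketch runs without downloading weights; in a real pipeline, `toy_embed` would be replaced by `model.encode`:

```python
import numpy as np

def toy_embed(texts, vocab):
    """Toy bag-of-words embedder standing in for a real Bolt model;
    a real pipeline would call model.encode(texts) instead."""
    return np.array(
        [[t.lower().split().count(w) for w in vocab] for t in texts],
        dtype=float,
    )

docs = [
    "employee stock options may have tax consequences",
    "the eiffel tower is located in paris",
    "lake okahumpka is a freshwater lake in florida",
]
query = "tax implications of employee stock options"

# 1) Embed the documents and the query.
vocab = sorted({w for t in docs + [query] for w in t.lower().split()})
D = toy_embed(docs, vocab)
Q = toy_embed([query], vocab)

# 2) Vector search: rank documents by cosine similarity to the query.
sims = (Q @ D.T) / (np.linalg.norm(Q) * np.linalg.norm(D, axis=1))
top_k = np.argsort(-sims[0])[:2]

# 3) The top-k documents would then be passed to the LLM as context.
print([docs[i] for i in top_k])
```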

------------------------------------------------------------------------

# Usage

Install Sentence Transformers:

```bash
pip install -U sentence-transformers
```

### Load the Model

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("aisquared/bolt-embedding-small")
```

or

```python
model = SentenceTransformer("aisquared/bolt-embedding-large")
```

### Generate Embeddings

```python
sentences = [
    "What are the tax implications of employee stock options?",
    "Employee stock options may have tax consequences depending on exercise timing.",
    "The Eiffel Tower is located in Paris.",
]

embeddings = model.encode(sentences)

print(embeddings.shape)
```

### Compute Similarity

```python
similarities = model.similarity(embeddings, embeddings)

print(similarities)
```
------------------------------------------------------------------------

# Why Bolt?

Many embedding models are trained on **general semantic similarity
tasks**. Bolt is optimized for **enterprise retrieval**, where queries
must locate the correct information among thousands of unrelated
documents.

Key differentiators:

- **Large-batch contrastive training**
- **RAG-specific dataset**
- **Long-context support (trained at 4096 tokens)**
- **Optimized for vector database retrieval**

------------------------------------------------------------------------

# Framework Versions

Training was performed using:

- Python 3.12
- Sentence Transformers
- Transformers
- PyTorch
- Hugging Face Datasets
- Hugging Face Jobs (1x A100 GPU)

------------------------------------------------------------------------
# Citation

If you use Bolt embeddings in research or production systems, please
cite the underlying Sentence-BERT work.

### Sentence-BERT

    @inproceedings{reimers-2019-sentence-bert,
      title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
      author = "Reimers, Nils and Gurevych, Iryna",
      booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
      year = "2019",
      url = "https://arxiv.org/abs/1908.10084",
    }

### Cached Multiple Negatives Ranking Loss

    @misc{gao2021scaling,
      title = {Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
      author = {Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
      year = {2021},
      eprint = {2101.06983},
      archivePrefix = {arXiv},
    }

------------------------------------------------------------------------

# License

Bolt embeddings are released under the [AI Squared Community License](https://docs.squared.ai/terms-of-use).