bolt-instruct-7b / README.md
iansotnek's picture
Update README.md
7ff8d5e verified
---
tags:
- text-generation
- causal-lm
- instruction-tuning
- chat
- rag
- code-generation
- summarization
- extraction
- synthetic-data
- generated_from_trainer
license: other
pipeline_tag: text-generation
library_name: transformers
language:
- en
base_model:
- allenai/OLMo-2-0425-1B-Instruct
- allenai/OLMo-3-7B-Instruct
- allenai/OLMo-3.1-32B-Instruct
---
# Bolt Instruct Models
Bolt Instruct is a family of **instruction-tuned language models designed for high-quality generation, reasoning, and enterprise workflows**.
These models are **fine-tuned from Allen Institute for AI OLMo instruct models** and optimized for:
- General conversational AI
- Structured and controllable generation
- Retrieval-Augmented Generation (RAG)
- Enterprise document understanding
- Code generation and transformation
---
# Model Overview
Bolt Instruct models provide **strong instruction-following capabilities** across diverse tasks with robust long-context support.
Key design goals:
- Strong instruction adherence
- High-quality structured outputs (JSON, extraction)
- RAG-grounded responses
- Long-context support (65k tokens for 7B and 32B)
- Balanced chat, reasoning, and coding performance
---
# Model Variants
| Model | Base Model | Positioning |
|------|------------|------------|
| bolt-instruct-1b | allenai/OLMo-2-0425-1B-Instruct | Lightweight / low-latency |
| bolt-instruct-7b | allenai/OLMo-3-7B-Instruct | Balanced |
| bolt-instruct-32b | allenai/OLMo-3.1-32B-Instruct | Highest quality |
---
# Model Details
- **Type:** Causal LM (instruction-tuned)
- **Max context:** 65,536 tokens (7B and 32B), 4,096 tokens (1B)
- **Training context:** 32k (7B), 16k (32B), 4k (1B)
### Capabilities
- Chat / multi-turn dialogue
- Instruction following
- Structured output (JSON)
- Summarization & transformation
- Extraction
- RAG generation
- Code generation
---
# Training
- **Method:** Supervised Fine-Tuning (SFT)
- **Dataset size:** ~125k conversations
- **Eval set:** ~10k examples
- **Data mix:** public + synthetic + internal tasks
### Training Approach
- 1B → full fine-tune
- 7B / 32B → QLoRA (4-bit)
### Hardware
- 1× A100 80GB GPU
---
# Intended Use
- Chat assistants
- Enterprise copilots
- RAG pipelines
- Document processing
- Structured extraction
- Code assistance
---
# Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "aisquared/bolt-instruct-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
---
# Evaluation
To evaluate these models, we ran a subset of tasks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). Below are the metrics for each model.
## Language Model Evaluation Harness
### Evaluation results for aisquared/bolt-instruct-1b:
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.3490|± |0.0139|
| | |none | 0|acc_norm |↑ |0.3823|± |0.0142|
|arc_easy | 1|none | 0|acc |↑ |0.6098|± |0.0100|
| | |none | 0|acc_norm |↑ |0.5560|± |0.0102|
|bbh | 3|get-answer | |exact_match|↑ |0.3081|± |0.0052|
| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5840|± |0.0312|
| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5508|± |0.0365|
| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.2600|± |0.0278|
| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.3640|± |0.0305|
| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0040|± |0.0040|
| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317|
| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.0920|± |0.0183|
| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316|
| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.1720|± |0.0239|
| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.1080|± |0.0197|
| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.3520|± |0.0303|
| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317|
| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0600|± |0.0151|
| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315|
| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.4360|± |0.0314|
| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2123|± |0.0340|
| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272|
| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272|
| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.3989|± |0.0368|
| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301|
| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.2840|± |0.0286|
| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316|
| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.5072|± |0.0138|
| | |strict-match | 5|exact_match|↑ |0.4943|± |0.0138|
|hellaswag | 1|none | 0|acc |↑ |0.4729|± |0.0050|
| | |none | 0|acc_norm |↑ |0.6181|± |0.0048|
|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.1435|± |0.0032|
| - biology | 3|custom-extract | 5|exact_match|↑ |0.2050|± |0.0151|
| - business | 3|custom-extract | 5|exact_match|↑ |0.1369|± |0.0122|
| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.0848|± |0.0083|
| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.1415|± |0.0172|
| - economics | 3|custom-extract | 5|exact_match|↑ |0.1943|± |0.0136|
| - engineering | 3|custom-extract | 5|exact_match|↑ |0.0929|± |0.0093|
| - health | 3|custom-extract | 5|exact_match|↑ |0.1528|± |0.0126|
| - history | 3|custom-extract | 5|exact_match|↑ |0.1549|± |0.0186|
| - law | 3|custom-extract | 5|exact_match|↑ |0.1081|± |0.0094|
| - math | 3|custom-extract | 5|exact_match|↑ |0.1414|± |0.0095|
| - other | 3|custom-extract | 5|exact_match|↑ |0.1916|± |0.0130|
| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.1383|± |0.0155|
| - physics | 3|custom-extract | 5|exact_match|↑ |0.1186|± |0.0090|
| - psychology | 3|custom-extract | 5|exact_match|↑ |0.2130|± |0.0145|
|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.4734|± |0.0153|
|winogrande | 1|none | 0|acc |↑ |0.6156|± |0.0137|
### Evaluation results for aisquared/bolt-instruct-7b:
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.4778|± |0.0146|
| | |none | 0|acc_norm |↑ |0.4957|± |0.0146|
|arc_easy | 1|none | 0|acc |↑ |0.7534|± |0.0088|
| | |none | 0|acc_norm |↑ |0.7311|± |0.0091|
|bbh | 3|get-answer | |exact_match|↑ |0.3038|± |0.0047|
| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5668|± |0.0363|
| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.4480|± |0.0315|
| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.2240|± |0.0264|
| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2960|± |0.0289|
| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5200|± |0.0317|
| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0200|± |0.0089|
| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.6720|± |0.0298|
| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.1200|± |0.0206|
| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315|
| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.1520|± |0.0228|
| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.4110|± |0.0409|
| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.1880|± |0.0248|
| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.4800|± |0.0317|
| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.4760|± |0.0316|
| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.2921|± |0.0342|
| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6760|± |0.0297|
| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.5880|± |0.0312|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.8280|± |0.0239|
| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301|
| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.1400|± |0.0220|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.7998|± |0.0110|
| | |strict-match | 5|exact_match|↑ |0.7392|± |0.0121|
|hellaswag | 1|none | 0|acc |↑ |0.4882|± |0.0050|
| | |none | 0|acc_norm |↑ |0.6165|± |0.0049|
|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.4978|± |0.0044|
| - biology | 3|custom-extract | 5|exact_match|↑ |0.6848|± |0.0174|
| - business | 3|custom-extract | 5|exact_match|↑ |0.5729|± |0.0176|
| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.5380|± |0.0148|
| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.5878|± |0.0243|
| - economics | 3|custom-extract | 5|exact_match|↑ |0.5592|± |0.0171|
| - engineering | 3|custom-extract | 5|exact_match|↑ |0.2405|± |0.0137|
| - health | 3|custom-extract | 5|exact_match|↑ |0.4670|± |0.0175|
| - history | 3|custom-extract | 5|exact_match|↑ |0.3727|± |0.0248|
| - law | 3|custom-extract | 5|exact_match|↑ |0.2525|± |0.0131|
| - math | 3|custom-extract | 5|exact_match|↑ |0.7158|± |0.0123|
| - other | 3|custom-extract | 5|exact_match|↑ |0.4351|± |0.0163|
| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.4128|± |0.0221|
| - physics | 3|custom-extract | 5|exact_match|↑ |0.5142|± |0.0139|
| - psychology | 3|custom-extract | 5|exact_match|↑ |0.5602|± |0.0176|
|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.5666|± |0.0162|
|winogrande | 1|none | 0|acc |↑ |0.6385|± |0.0135|
### Evaluation results for aisquared/bolt-instruct-32b:
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.5776|± |0.0144|
| | |none | 0|acc_norm |↑ |0.6007|± |0.0143|
|arc_easy | 1|none | 0|acc |↑ |0.8333|± |0.0076|
| | |none | 0|acc_norm |↑ |0.8228|± |0.0078|
|bbh | 3|get-answer | |exact_match|↑ |0.3087|± |0.0048|
| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5760|± |0.0313|
| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5882|± |0.0361|
| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.6640|± |0.0299|
| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.0480|± |0.0135|
| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283|
| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.3200|± |0.0296|
| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.5400|± |0.0316|
| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.6000|± |0.0310|
| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.0160|± |0.0080|
| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317|
| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2945|± |0.0379|
| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2280|± |0.0266|
| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317|
| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.5440|± |0.0316|
| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.7079|± |0.0342|
| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.4880|± |0.0317|
| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.3120|± |0.0294|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.6280|± |0.0306|
| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.4400|± |0.0315|
| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0280|± |0.0105|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.8795|± |0.0090|
| | |strict-match | 5|exact_match|↑ |0.7801|± |0.0114|
|hellaswag | 1|none | 0|acc |↑ |0.5407|± |0.0050|
| | |none | 0|acc_norm |↑ |0.6763|± |0.0047|
|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.6340|± |0.0042|
| - biology | 3|custom-extract | 5|exact_match|↑ |0.8117|± |0.0146|
| - business | 3|custom-extract | 5|exact_match|↑ |0.6907|± |0.0165|
| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.6431|± |0.0142|
| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.6951|± |0.0228|
| - economics | 3|custom-extract | 5|exact_match|↑ |0.7405|± |0.0151|
| - engineering | 3|custom-extract | 5|exact_match|↑ |0.3447|± |0.0153|
| - health | 3|custom-extract | 5|exact_match|↑ |0.6540|± |0.0166|
| - history | 3|custom-extract | 5|exact_match|↑ |0.5512|± |0.0255|
| - law | 3|custom-extract | 5|exact_match|↑ |0.3860|± |0.0147|
| - math | 3|custom-extract | 5|exact_match|↑ |0.7979|± |0.0109|
| - other | 3|custom-extract | 5|exact_match|↑ |0.6028|± |0.0161|
| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.5912|± |0.0220|
| - physics | 3|custom-extract | 5|exact_match|↑ |0.6551|± |0.0132|
| - psychology | 3|custom-extract | 5|exact_match|↑ |0.7243|± |0.0158|
|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.6906|± |0.0153|
|winogrande | 1|none | 0|acc |↑ |0.6630|± |0.0133|
---
# Limitations
- May hallucinate without grounding
- Performance varies by model size
- Not suitable for high-risk domains without oversight
---
# License
Bolt Instruct is released under the [AI Squared Community License](https://docs.squared.ai/terms-of-use).