bolt-instruct-7b / README.md
iansotnek's picture
Update README.md
7ff8d5e verified
metadata
tags:
  - text-generation
  - causal-lm
  - instruction-tuning
  - chat
  - rag
  - code-generation
  - summarization
  - extraction
  - synthetic-data
  - generated_from_trainer
license: other
pipeline_tag: text-generation
library_name: transformers
language:
  - en
base_model:
  - allenai/OLMo-2-0425-1B-Instruct
  - allenai/OLMo-3-7B-Instruct
  - allenai/OLMo-3.1-32B-Instruct

Bolt Instruct Models

Bolt Instruct is a family of instruction-tuned language models designed for high-quality generation, reasoning, and enterprise workflows.

These models are fine-tuned from Allen Institute for AI OLMo instruct models and optimized for:

  • General conversational AI
  • Structured and controllable generation
  • Retrieval-Augmented Generation (RAG)
  • Enterprise document understanding
  • Code generation and transformation

Model Overview

Bolt Instruct models provide strong instruction-following capabilities across diverse tasks with robust long-context support.

Key design goals:

  • Strong instruction adherence
  • High-quality structured outputs (JSON, extraction)
  • RAG-grounded responses
  • Long-context support (65k tokens for 7B and 32B)
  • Balanced chat, reasoning, and coding performance

Model Variants

Model Base Model Positioning
bolt-instruct-1b allenai/OLMo-2-0425-1B-Instruct Lightweight / low-latency
bolt-instruct-7b allenai/OLMo-3-7B-Instruct Balanced
bolt-instruct-32b allenai/OLMo-3.1-32B-Instruct Highest quality

Model Details

  • Type: Causal LM (instruction-tuned)
  • Max context: 65,536 tokens (7B and 32B), 4,096 tokens (1B)
  • Training context: 32k (7B), 16k (32B), 4k (1B)

Capabilities

  • Chat / multi-turn dialogue
  • Instruction following
  • Structured output (JSON)
  • Summarization & transformation
  • Extraction
  • RAG generation
  • Code generation

Training

  • Method: Supervised Fine-Tuning (SFT)
  • Dataset size: ~125k conversations
  • Eval set: ~10k examples
  • Data mix: public + synthetic + internal tasks

Training Approach

  • 1B → full fine-tune
  • 7B / 32B → QLoRA (4-bit)

Hardware

  • 1× A100 80GB GPU

Intended Use

  • Chat assistants
  • Enterprise copilots
  • RAG pipelines
  • Document processing
  • Structured extraction
  • Code assistance

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aisquared/bolt-instruct-7b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

Evaluation

To evaluate these models, we ran a subset of tasks using the Eleuther AI Language Model Evaluation Harness. Below are the metrics for each model.

Language Model Evaluation Harness

Evaluation results for aisquared/bolt-instruct-1b:

Tasks Version Filter n-shot Metric Value Stderr
arc_challenge 1 none 0 acc 0.3490 ± 0.0139
none 0 acc_norm 0.3823 ± 0.0142
arc_easy 1 none 0 acc 0.6098 ± 0.0100
none 0 acc_norm 0.5560 ± 0.0102
bbh 3 get-answer exact_match 0.3081 ± 0.0052
- bbh_cot_fewshot_boolean_expressions 4 get-answer 3 exact_match 0.5840 ± 0.0312
- bbh_cot_fewshot_causal_judgement 4 get-answer 3 exact_match 0.5508 ± 0.0365
- bbh_cot_fewshot_date_understanding 4 get-answer 3 exact_match 0.2600 ± 0.0278
- bbh_cot_fewshot_disambiguation_qa 4 get-answer 3 exact_match 0.3640 ± 0.0305
- bbh_cot_fewshot_dyck_languages 4 get-answer 3 exact_match 0.0040 ± 0.0040
- bbh_cot_fewshot_formal_fallacies 4 get-answer 3 exact_match 0.5040 ± 0.0317
- bbh_cot_fewshot_geometric_shapes 4 get-answer 3 exact_match 0.0920 ± 0.0183
- bbh_cot_fewshot_hyperbaton 4 get-answer 3 exact_match 0.5240 ± 0.0316
- bbh_cot_fewshot_logical_deduction_five_objects 4 get-answer 3 exact_match 0.1720 ± 0.0239
- bbh_cot_fewshot_logical_deduction_seven_objects 4 get-answer 3 exact_match 0.1080 ± 0.0197
- bbh_cot_fewshot_logical_deduction_three_objects 4 get-answer 3 exact_match 0.3520 ± 0.0303
- bbh_cot_fewshot_movie_recommendation 4 get-answer 3 exact_match 0.5040 ± 0.0317
- bbh_cot_fewshot_multistep_arithmetic_two 4 get-answer 3 exact_match 0.0600 ± 0.0151
- bbh_cot_fewshot_navigate 4 get-answer 3 exact_match 0.5560 ± 0.0315
- bbh_cot_fewshot_object_counting 4 get-answer 3 exact_match 0.4360 ± 0.0314
- bbh_cot_fewshot_penguins_in_a_table 4 get-answer 3 exact_match 0.2123 ± 0.0340
- bbh_cot_fewshot_reasoning_about_colored_objects 4 get-answer 3 exact_match 0.2440 ± 0.0272
- bbh_cot_fewshot_ruin_names 4 get-answer 3 exact_match 0.2440 ± 0.0272
- bbh_cot_fewshot_salient_translation_error_detection 4 get-answer 3 exact_match 0.1920 ± 0.0250
- bbh_cot_fewshot_snarks 4 get-answer 3 exact_match 0.3989 ± 0.0368
- bbh_cot_fewshot_sports_understanding 4 get-answer 3 exact_match 0.6560 ± 0.0301
- bbh_cot_fewshot_temporal_sequences 4 get-answer 3 exact_match 0.2760 ± 0.0283
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects 4 get-answer 3 exact_match 0.1920 ± 0.0250
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects 4 get-answer 3 exact_match 0.0360 ± 0.0118
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects 4 get-answer 3 exact_match 0.2840 ± 0.0286
- bbh_cot_fewshot_web_of_lies 4 get-answer 3 exact_match 0.5240 ± 0.0316
- bbh_cot_fewshot_word_sorting 4 get-answer 3 exact_match 0.0360 ± 0.0118
gsm8k 3 flexible-extract 5 exact_match 0.5072 ± 0.0138
strict-match 5 exact_match 0.4943 ± 0.0138
hellaswag 1 none 0 acc 0.4729 ± 0.0050
none 0 acc_norm 0.6181 ± 0.0048
mmlu_pro 2 custom-extract exact_match 0.1435 ± 0.0032
- biology 3 custom-extract 5 exact_match 0.2050 ± 0.0151
- business 3 custom-extract 5 exact_match 0.1369 ± 0.0122
- chemistry 3 custom-extract 5 exact_match 0.0848 ± 0.0083
- computer_science 3 custom-extract 5 exact_match 0.1415 ± 0.0172
- economics 3 custom-extract 5 exact_match 0.1943 ± 0.0136
- engineering 3 custom-extract 5 exact_match 0.0929 ± 0.0093
- health 3 custom-extract 5 exact_match 0.1528 ± 0.0126
- history 3 custom-extract 5 exact_match 0.1549 ± 0.0186
- law 3 custom-extract 5 exact_match 0.1081 ± 0.0094
- math 3 custom-extract 5 exact_match 0.1414 ± 0.0095
- other 3 custom-extract 5 exact_match 0.1916 ± 0.0130
- philosophy 3 custom-extract 5 exact_match 0.1383 ± 0.0155
- physics 3 custom-extract 5 exact_match 0.1186 ± 0.0090
- psychology 3 custom-extract 5 exact_match 0.2130 ± 0.0145
truthfulqa_mc2 3 none 0 acc 0.4734 ± 0.0153
winogrande 1 none 0 acc 0.6156 ± 0.0137

Evaluation results for aisquared/bolt-instruct-7b:

Tasks Version Filter n-shot Metric Value Stderr
arc_challenge 1 none 0 acc 0.4778 ± 0.0146
none 0 acc_norm 0.4957 ± 0.0146
arc_easy 1 none 0 acc 0.7534 ± 0.0088
none 0 acc_norm 0.7311 ± 0.0091
bbh 3 get-answer exact_match 0.3038 ± 0.0047
- bbh_cot_fewshot_boolean_expressions 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_causal_judgement 4 get-answer 3 exact_match 0.5668 ± 0.0363
- bbh_cot_fewshot_date_understanding 4 get-answer 3 exact_match 0.4480 ± 0.0315
- bbh_cot_fewshot_disambiguation_qa 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_dyck_languages 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_formal_fallacies 4 get-answer 3 exact_match 0.2240 ± 0.0264
- bbh_cot_fewshot_geometric_shapes 4 get-answer 3 exact_match 0.2960 ± 0.0289
- bbh_cot_fewshot_hyperbaton 4 get-answer 3 exact_match 0.5200 ± 0.0317
- bbh_cot_fewshot_logical_deduction_five_objects 4 get-answer 3 exact_match 0.0200 ± 0.0089
- bbh_cot_fewshot_logical_deduction_seven_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_logical_deduction_three_objects 4 get-answer 3 exact_match 0.6720 ± 0.0298
- bbh_cot_fewshot_movie_recommendation 4 get-answer 3 exact_match 0.1200 ± 0.0206
- bbh_cot_fewshot_multistep_arithmetic_two 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_navigate 4 get-answer 3 exact_match 0.5560 ± 0.0315
- bbh_cot_fewshot_object_counting 4 get-answer 3 exact_match 0.1520 ± 0.0228
- bbh_cot_fewshot_penguins_in_a_table 4 get-answer 3 exact_match 0.4110 ± 0.0409
- bbh_cot_fewshot_reasoning_about_colored_objects 4 get-answer 3 exact_match 0.1880 ± 0.0248
- bbh_cot_fewshot_ruin_names 4 get-answer 3 exact_match 0.4800 ± 0.0317
- bbh_cot_fewshot_salient_translation_error_detection 4 get-answer 3 exact_match 0.4760 ± 0.0316
- bbh_cot_fewshot_snarks 4 get-answer 3 exact_match 0.2921 ± 0.0342
- bbh_cot_fewshot_sports_understanding 4 get-answer 3 exact_match 0.6760 ± 0.0297
- bbh_cot_fewshot_temporal_sequences 4 get-answer 3 exact_match 0.5880 ± 0.0312
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects 4 get-answer 3 exact_match 0.8280 ± 0.0239
- bbh_cot_fewshot_web_of_lies 4 get-answer 3 exact_match 0.6560 ± 0.0301
- bbh_cot_fewshot_word_sorting 4 get-answer 3 exact_match 0.1400 ± 0.0220
gsm8k 3 flexible-extract 5 exact_match 0.7998 ± 0.0110
strict-match 5 exact_match 0.7392 ± 0.0121
hellaswag 1 none 0 acc 0.4882 ± 0.0050
none 0 acc_norm 0.6165 ± 0.0049
mmlu_pro 2 custom-extract exact_match 0.4978 ± 0.0044
- biology 3 custom-extract 5 exact_match 0.6848 ± 0.0174
- business 3 custom-extract 5 exact_match 0.5729 ± 0.0176
- chemistry 3 custom-extract 5 exact_match 0.5380 ± 0.0148
- computer_science 3 custom-extract 5 exact_match 0.5878 ± 0.0243
- economics 3 custom-extract 5 exact_match 0.5592 ± 0.0171
- engineering 3 custom-extract 5 exact_match 0.2405 ± 0.0137
- health 3 custom-extract 5 exact_match 0.4670 ± 0.0175
- history 3 custom-extract 5 exact_match 0.3727 ± 0.0248
- law 3 custom-extract 5 exact_match 0.2525 ± 0.0131
- math 3 custom-extract 5 exact_match 0.7158 ± 0.0123
- other 3 custom-extract 5 exact_match 0.4351 ± 0.0163
- philosophy 3 custom-extract 5 exact_match 0.4128 ± 0.0221
- physics 3 custom-extract 5 exact_match 0.5142 ± 0.0139
- psychology 3 custom-extract 5 exact_match 0.5602 ± 0.0176
truthfulqa_mc2 3 none 0 acc 0.5666 ± 0.0162
winogrande 1 none 0 acc 0.6385 ± 0.0135

Evaluation results for aisquared/bolt-instruct-32b:

Tasks Version Filter n-shot Metric Value Stderr
arc_challenge 1 none 0 acc 0.5776 ± 0.0144
none 0 acc_norm 0.6007 ± 0.0143
arc_easy 1 none 0 acc 0.8333 ± 0.0076
none 0 acc_norm 0.8228 ± 0.0078
bbh 3 get-answer exact_match 0.3087 ± 0.0048
- bbh_cot_fewshot_boolean_expressions 4 get-answer 3 exact_match 0.5760 ± 0.0313
- bbh_cot_fewshot_causal_judgement 4 get-answer 3 exact_match 0.5882 ± 0.0361
- bbh_cot_fewshot_date_understanding 4 get-answer 3 exact_match 0.6640 ± 0.0299
- bbh_cot_fewshot_disambiguation_qa 4 get-answer 3 exact_match 0.1920 ± 0.0250
- bbh_cot_fewshot_dyck_languages 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_formal_fallacies 4 get-answer 3 exact_match 0.0480 ± 0.0135
- bbh_cot_fewshot_geometric_shapes 4 get-answer 3 exact_match 0.2760 ± 0.0283
- bbh_cot_fewshot_hyperbaton 4 get-answer 3 exact_match 0.3200 ± 0.0296
- bbh_cot_fewshot_logical_deduction_five_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_logical_deduction_seven_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_logical_deduction_three_objects 4 get-answer 3 exact_match 0.5400 ± 0.0316
- bbh_cot_fewshot_movie_recommendation 4 get-answer 3 exact_match 0.6000 ± 0.0310
- bbh_cot_fewshot_multistep_arithmetic_two 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_navigate 4 get-answer 3 exact_match 0.0160 ± 0.0080
- bbh_cot_fewshot_object_counting 4 get-answer 3 exact_match 0.5120 ± 0.0317
- bbh_cot_fewshot_penguins_in_a_table 4 get-answer 3 exact_match 0.2945 ± 0.0379
- bbh_cot_fewshot_reasoning_about_colored_objects 4 get-answer 3 exact_match 0.2280 ± 0.0266
- bbh_cot_fewshot_ruin_names 4 get-answer 3 exact_match 0.5120 ± 0.0317
- bbh_cot_fewshot_salient_translation_error_detection 4 get-answer 3 exact_match 0.5440 ± 0.0316
- bbh_cot_fewshot_snarks 4 get-answer 3 exact_match 0.7079 ± 0.0342
- bbh_cot_fewshot_sports_understanding 4 get-answer 3 exact_match 0.4880 ± 0.0317
- bbh_cot_fewshot_temporal_sequences 4 get-answer 3 exact_match 0.3120 ± 0.0294
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects 4 get-answer 3 exact_match 0.0000 ± 0.0000
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects 4 get-answer 3 exact_match 0.6280 ± 0.0306
- bbh_cot_fewshot_web_of_lies 4 get-answer 3 exact_match 0.4400 ± 0.0315
- bbh_cot_fewshot_word_sorting 4 get-answer 3 exact_match 0.0280 ± 0.0105
gsm8k 3 flexible-extract 5 exact_match 0.8795 ± 0.0090
strict-match 5 exact_match 0.7801 ± 0.0114
hellaswag 1 none 0 acc 0.5407 ± 0.0050
none 0 acc_norm 0.6763 ± 0.0047
mmlu_pro 2 custom-extract exact_match 0.6340 ± 0.0042
- biology 3 custom-extract 5 exact_match 0.8117 ± 0.0146
- business 3 custom-extract 5 exact_match 0.6907 ± 0.0165
- chemistry 3 custom-extract 5 exact_match 0.6431 ± 0.0142
- computer_science 3 custom-extract 5 exact_match 0.6951 ± 0.0228
- economics 3 custom-extract 5 exact_match 0.7405 ± 0.0151
- engineering 3 custom-extract 5 exact_match 0.3447 ± 0.0153
- health 3 custom-extract 5 exact_match 0.6540 ± 0.0166
- history 3 custom-extract 5 exact_match 0.5512 ± 0.0255
- law 3 custom-extract 5 exact_match 0.3860 ± 0.0147
- math 3 custom-extract 5 exact_match 0.7979 ± 0.0109
- other 3 custom-extract 5 exact_match 0.6028 ± 0.0161
- philosophy 3 custom-extract 5 exact_match 0.5912 ± 0.0220
- physics 3 custom-extract 5 exact_match 0.6551 ± 0.0132
- psychology 3 custom-extract 5 exact_match 0.7243 ± 0.0158
truthfulqa_mc2 3 none 0 acc 0.6906 ± 0.0153
winogrande 1 none 0 acc 0.6630 ± 0.0133

Limitations

  • May hallucinate without grounding
  • Performance varies by model size
  • Not suitable for high-risk domains without oversight

License

Bolt Instruct is released under the AI Squared Community License.