Instructions to use aisquared/bolt-instruct-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use aisquared/bolt-instruct-7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="aisquared/bolt-instruct-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aisquared/bolt-instruct-7b")
model = AutoModelForCausalLM.from_pretrained("aisquared/bolt-instruct-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use aisquared/bolt-instruct-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "aisquared/bolt-instruct-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "aisquared/bolt-instruct-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
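
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are hypothetical, the default URL assumes the server from the curl example is listening on `localhost:8000`, and the response parsing assumes the usual OpenAI chat-completions response shape.

```python
import json
import urllib.request

MODEL = "aisquared/bolt-instruct-7b"


def build_chat_payload(prompt: str, model: str = MODEL) -> dict:
    """Build the same JSON body as the curl example (hypothetical helper)."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST to the OpenAI-compatible endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the server returns the OpenAI chat-completions response shape.
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(chat("What is the capital of France?"))
```

For a server listening on a different port, pass its address as `base_url`, for example `chat(prompt, base_url="http://localhost:30000")`.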

Use Docker

docker model run hf.co/aisquared/bolt-instruct-7b

SGLang

How to use aisquared/bolt-instruct-7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "aisquared/bolt-instruct-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "aisquared/bolt-instruct-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "aisquared/bolt-instruct-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "aisquared/bolt-instruct-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use aisquared/bolt-instruct-7b with Docker Model Runner:
```
docker model run hf.co/aisquared/bolt-instruct-7b
```

bolt-instruct-7b / README.md

iansotnek

Update README.md

7ff8d5e verified 19 days ago


	---
	tags:
	- text-generation
	- causal-lm
	- instruction-tuning
	- chat
	- rag
	- code-generation
	- summarization
	- extraction
	- synthetic-data
	- generated_from_trainer
	license: other
	pipeline_tag: text-generation
	library_name: transformers
	language:
	- en
	base_model:
	- allenai/OLMo-2-0425-1B-Instruct
	- allenai/OLMo-3-7B-Instruct
	- allenai/OLMo-3.1-32B-Instruct
	---

	# Bolt Instruct Models

	Bolt Instruct is a family of instruction-tuned language models designed for high-quality generation, reasoning, and enterprise workflows.

	These models are fine-tuned from the Allen Institute for AI's OLMo Instruct models and optimized for:

	- General conversational AI
	- Structured and controllable generation
	- Retrieval-Augmented Generation (RAG)
	- Enterprise document understanding
	- Code generation and transformation

	---

	# Model Overview

	Bolt Instruct models are tuned for instruction following across diverse tasks, with long-context support of up to 65,536 tokens on the 7B and 32B variants.

	Key design goals:

	- Strong instruction adherence
	- High-quality structured outputs (JSON, extraction)
	- RAG-grounded responses
	- Long-context support (65k tokens for 7B and 32B)
	- Balanced chat, reasoning, and coding performance

	---

	# Model Variants

	| Model | Base Model | Positioning |
	|------|------------|------------|
	| bolt-instruct-1b | allenai/OLMo-2-0425-1B-Instruct | Lightweight / low-latency |
	| bolt-instruct-7b | allenai/OLMo-3-7B-Instruct | Balanced |
	| bolt-instruct-32b | allenai/OLMo-3.1-32B-Instruct | Highest quality |

	---

	# Model Details

	- Type: Causal LM (instruction-tuned)
	- Max context: 65,536 tokens (7B and 32B), 4,096 tokens (1B)
	- Training context: 32k (7B), 16k (32B), 4k (1B)
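
As a rough illustration of how the context limits above constrain a request, the sketch below checks whether a prompt plus the requested generation length fits within a variant's window. The names `MAX_CONTEXT_TOKENS`, `fits_in_context`, and `max_new_tokens_allowed` are illustrative and not part of any library; the limits are copied from the list above.

```python
# Maximum context lengths, copied from the Model Details list above.
MAX_CONTEXT_TOKENS = {
    "bolt-instruct-1b": 4_096,
    "bolt-instruct-7b": 65_536,
    "bolt-instruct-32b": 65_536,
}


def fits_in_context(variant: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt plus generated tokens stay within the window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS[variant]


def max_new_tokens_allowed(variant: str, prompt_tokens: int) -> int:
    """Largest generation budget that still fits (0 if the prompt is too long)."""
    return max(0, MAX_CONTEXT_TOKENS[variant] - prompt_tokens)
```

In practice the prompt token count would come from the tokenizer after the chat template has been applied. Note that the listed training contexts are shorter than the maximum context for the 7B and 32B variants.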

	### Capabilities

	- Chat / multi-turn dialogue
	- Instruction following
	- Structured output (JSON)
	- Summarization & transformation
	- Extraction
	- RAG generation
	- Code generation
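
For the structured-output use case, callers typically need to parse the model's reply as JSON. The sketch below is one defensive way to do that, assuming the reply may wrap the JSON object in extra prose; `extract_json_object` is a hypothetical helper, not part of Transformers.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Parse the first JSON object found in a model reply.

    Scans for each '{' and lets the JSON decoder consume one complete
    value from there, ignoring any surrounding prose.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # Not valid JSON from this brace; try the next one.
        if isinstance(value, dict):
            return value
    raise ValueError("no JSON object found in reply")
```

Raising `ValueError` when nothing parses lets the caller decide whether to retry the generation or fall back to another strategy.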

	---

	# Training

	- Method: Supervised Fine-Tuning (SFT)
	- Dataset size: ~125k conversations
	- Eval set: ~10k examples
	- Data mix: public + synthetic + internal tasks

	### Training Approach

	- 1B → full fine-tune
	- 7B / 32B → QLoRA (4-bit)

	### Hardware

	- 1× A100 80GB GPU

	---

	# Intended Use

	- Chat assistants
	- Enterprise copilots
	- RAG pipelines
	- Document processing
	- Structured extraction
	- Code assistance

	---

	# Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "aisquared/bolt-instruct-7b"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)
	```
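
Once the model and tokenizer are loaded, a RAG-style request can be assembled as a chat message list. The sketch below shows one possible prompt layout; the instruction wording, the `[n]` passage numbering, and the helper name `build_rag_messages` are assumptions for illustration, not a format this model card prescribes.

```python
def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Build a single-turn chat message that grounds the question in passages."""
    # Number the passages so the model can refer to them.
    context = "\n\n".join(
        f"[{index}] {passage}" for index, passage in enumerate(passages, start=1)
    )
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return [{"role": "user", "content": prompt}]


messages = build_rag_messages(
    "What is the maximum context length of the 7B model?",
    [
        "Bolt Instruct 7B and 32B support up to 65,536 tokens.",
        "Bolt Instruct 1B supports up to 4,096 tokens.",
    ],
)
```

The resulting `messages` list can then be passed through `tokenizer.apply_chat_template(...)` with `add_generation_prompt=True` and on to `model.generate(...)`.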

	---

	# Evaluation

	To evaluate these models, we ran a subset of tasks using the [EleutherAI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). Below are the metrics for each model.

	## Language Model Evaluation Harness

	### Evaluation results for aisquared/bolt-instruct-1b:

	| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
	|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
	|arc_challenge | 1|none | 0|acc |↑ |0.3490|± |0.0139|
	| | |none | 0|acc_norm |↑ |0.3823|± |0.0142|
	|arc_easy | 1|none | 0|acc |↑ |0.6098|± |0.0100|
	| | |none | 0|acc_norm |↑ |0.5560|± |0.0102|
	|bbh | 3|get-answer | |exact_match|↑ |0.3081|± |0.0052|
	| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5840|± |0.0312|
	| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5508|± |0.0365|
	| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.2600|± |0.0278|
	| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.3640|± |0.0305|
	| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0040|± |0.0040|
	| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317|
	| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.0920|± |0.0183|
	| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316|
	| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.1720|± |0.0239|
	| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.1080|± |0.0197|
	| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.3520|± |0.0303|
	| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317|
	| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0600|± |0.0151|
	| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315|
	| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.4360|± |0.0314|
	| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2123|± |0.0340|
	| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272|
	| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272|
	| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
	| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.3989|± |0.0368|
	| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301|
	| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283|
	| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
	| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118|
	| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.2840|± |0.0286|
	| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316|
	| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118|
	|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.5072|± |0.0138|
	| | |strict-match | 5|exact_match|↑ |0.4943|± |0.0138|
	|hellaswag | 1|none | 0|acc |↑ |0.4729|± |0.0050|
	| | |none | 0|acc_norm |↑ |0.6181|± |0.0048|
	|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.1435|± |0.0032|
	| - biology | 3|custom-extract | 5|exact_match|↑ |0.2050|± |0.0151|
	| - business | 3|custom-extract | 5|exact_match|↑ |0.1369|± |0.0122|
	| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.0848|± |0.0083|
	| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.1415|± |0.0172|
	| - economics | 3|custom-extract | 5|exact_match|↑ |0.1943|± |0.0136|
	| - engineering | 3|custom-extract | 5|exact_match|↑ |0.0929|± |0.0093|
	| - health | 3|custom-extract | 5|exact_match|↑ |0.1528|± |0.0126|
	| - history | 3|custom-extract | 5|exact_match|↑ |0.1549|± |0.0186|
	| - law | 3|custom-extract | 5|exact_match|↑ |0.1081|± |0.0094|
	| - math | 3|custom-extract | 5|exact_match|↑ |0.1414|± |0.0095|
	| - other | 3|custom-extract | 5|exact_match|↑ |0.1916|± |0.0130|
	| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.1383|± |0.0155|
	| - physics | 3|custom-extract | 5|exact_match|↑ |0.1186|± |0.0090|
	| - psychology | 3|custom-extract | 5|exact_match|↑ |0.2130|± |0.0145|
	|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.4734|± |0.0153|
	|winogrande | 1|none | 0|acc |↑ |0.6156|± |0.0137|


	### Evaluation results for aisquared/bolt-instruct-7b:

	| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
	|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
	|arc_challenge | 1|none | 0|acc |↑ |0.4778|± |0.0146|
	| | |none | 0|acc_norm |↑ |0.4957|± |0.0146|
	|arc_easy | 1|none | 0|acc |↑ |0.7534|± |0.0088|
	| | |none | 0|acc_norm |↑ |0.7311|± |0.0091|
	|bbh | 3|get-answer | |exact_match|↑ |0.3038|± |0.0047|
	| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5668|± |0.0363|
	| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.4480|± |0.0315|
	| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.2240|± |0.0264|
	| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2960|± |0.0289|
	| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5200|± |0.0317|
	| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0200|± |0.0089|
	| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.6720|± |0.0298|
	| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.1200|± |0.0206|
	| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315|
	| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.1520|± |0.0228|
	| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.4110|± |0.0409|
	| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.1880|± |0.0248|
	| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.4800|± |0.0317|
	| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.4760|± |0.0316|
	| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.2921|± |0.0342|
	| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6760|± |0.0297|
	| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.5880|± |0.0312|
	| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.8280|± |0.0239|
	| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301|
	| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.1400|± |0.0220|
	|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.7998|± |0.0110|
	| | |strict-match | 5|exact_match|↑ |0.7392|± |0.0121|
	|hellaswag | 1|none | 0|acc |↑ |0.4882|± |0.0050|
	| | |none | 0|acc_norm |↑ |0.6165|± |0.0049|
	|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.4978|± |0.0044|
	| - biology | 3|custom-extract | 5|exact_match|↑ |0.6848|± |0.0174|
	| - business | 3|custom-extract | 5|exact_match|↑ |0.5729|± |0.0176|
	| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.5380|± |0.0148|
	| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.5878|± |0.0243|
	| - economics | 3|custom-extract | 5|exact_match|↑ |0.5592|± |0.0171|
	| - engineering | 3|custom-extract | 5|exact_match|↑ |0.2405|± |0.0137|
	| - health | 3|custom-extract | 5|exact_match|↑ |0.4670|± |0.0175|
	| - history | 3|custom-extract | 5|exact_match|↑ |0.3727|± |0.0248|
	| - law | 3|custom-extract | 5|exact_match|↑ |0.2525|± |0.0131|
	| - math | 3|custom-extract | 5|exact_match|↑ |0.7158|± |0.0123|
	| - other | 3|custom-extract | 5|exact_match|↑ |0.4351|± |0.0163|
	| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.4128|± |0.0221|
	| - physics | 3|custom-extract | 5|exact_match|↑ |0.5142|± |0.0139|
	| - psychology | 3|custom-extract | 5|exact_match|↑ |0.5602|± |0.0176|
	|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.5666|± |0.0162|
	|winogrande | 1|none | 0|acc |↑ |0.6385|± |0.0135|


	### Evaluation results for aisquared/bolt-instruct-32b:

	| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
	|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
	|arc_challenge | 1|none | 0|acc |↑ |0.5776|± |0.0144|
	| | |none | 0|acc_norm |↑ |0.6007|± |0.0143|
	|arc_easy | 1|none | 0|acc |↑ |0.8333|± |0.0076|
	| | |none | 0|acc_norm |↑ |0.8228|± |0.0078|
	|bbh | 3|get-answer | |exact_match|↑ |0.3087|± |0.0048|
	| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5760|± |0.0313|
	| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5882|± |0.0361|
	| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.6640|± |0.0299|
	| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
	| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.0480|± |0.0135|
	| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283|
	| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.3200|± |0.0296|
	| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.5400|± |0.0316|
	| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.6000|± |0.0310|
	| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.0160|± |0.0080|
	| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317|
	| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2945|± |0.0379|
	| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2280|± |0.0266|
	| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317|
	| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.5440|± |0.0316|
	| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.7079|± |0.0342|
	| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.4880|± |0.0317|
	| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.3120|± |0.0294|
	| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
	| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.6280|± |0.0306|
	| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.4400|± |0.0315|
	| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0280|± |0.0105|
	|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.8795|± |0.0090|
	| | |strict-match | 5|exact_match|↑ |0.7801|± |0.0114|
	|hellaswag | 1|none | 0|acc |↑ |0.5407|± |0.0050|
	| | |none | 0|acc_norm |↑ |0.6763|± |0.0047|
	|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.6340|± |0.0042|
	| - biology | 3|custom-extract | 5|exact_match|↑ |0.8117|± |0.0146|
	| - business | 3|custom-extract | 5|exact_match|↑ |0.6907|± |0.0165|
	| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.6431|± |0.0142|
	| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.6951|± |0.0228|
	| - economics | 3|custom-extract | 5|exact_match|↑ |0.7405|± |0.0151|
	| - engineering | 3|custom-extract | 5|exact_match|↑ |0.3447|± |0.0153|
	| - health | 3|custom-extract | 5|exact_match|↑ |0.6540|± |0.0166|
	| - history | 3|custom-extract | 5|exact_match|↑ |0.5512|± |0.0255|
	| - law | 3|custom-extract | 5|exact_match|↑ |0.3860|± |0.0147|
	| - math | 3|custom-extract | 5|exact_match|↑ |0.7979|± |0.0109|
	| - other | 3|custom-extract | 5|exact_match|↑ |0.6028|± |0.0161|
	| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.5912|± |0.0220|
	| - physics | 3|custom-extract | 5|exact_match|↑ |0.6551|± |0.0132|
	| - psychology | 3|custom-extract | 5|exact_match|↑ |0.7243|± |0.0158|
	|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.6906|± |0.0153|
	|winogrande | 1|none | 0|acc |↑ |0.6630|± |0.0133|
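
When reading the tables above, the reported standard errors help judge whether a gap between two models is meaningful. The following sketch computes an approximate z-score for the difference between two scores, assuming the two estimates are independent and approximately normal (a simplification, since all models are scored on the same benchmark items). The helper name `difference_z_score` is illustrative; the numbers are taken from the 1B and 7B tables.

```python
import math


def difference_z_score(value_a, stderr_a, value_b, stderr_b):
    """Approximate z-score for value_b - value_a, treating errors as independent."""
    return (value_b - value_a) / math.sqrt(stderr_a**2 + stderr_b**2)


# Aggregate bbh exact_match: 1B = 0.3081 ± 0.0052, 7B = 0.3038 ± 0.0047.
bbh_z = difference_z_score(0.3081, 0.0052, 0.3038, 0.0047)

# gsm8k flexible-extract exact_match: 1B = 0.5072 ± 0.0138, 7B = 0.7998 ± 0.0110.
gsm8k_z = difference_z_score(0.5072, 0.0138, 0.7998, 0.0110)

print(f"bbh z = {bbh_z:.2f}, gsm8k z = {gsm8k_z:.2f}")
```

Under this approximation, the aggregate bbh gap between the 1B and 7B models falls well inside the conventional |z| < 1.96 band, while the gsm8k gap lies far outside it.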

	---

	# Limitations

	- May hallucinate without grounding
	- Performance varies by model size
	- Not suitable for high-risk domains without oversight

	---

	# License

	Bolt Instruct is released under the [AI Squared Community License](https://docs.squared.ai/terms-of-use).