Instructions to use aisquared/bolt-instruct-7b with libraries, notebooks, and local apps.

- Libraries
  - Transformers

How to use aisquared/bolt-instruct-7b with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="aisquared/bolt-instruct-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aisquared/bolt-instruct-7b")
model = AutoModelForCausalLM.from_pretrained("aisquared/bolt-instruct-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM

How to use aisquared/bolt-instruct-7b with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "aisquared/bolt-instruct-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "aisquared/bolt-instruct-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```bash
docker model run hf.co/aisquared/bolt-instruct-7b
```

  - SGLang

How to use aisquared/bolt-instruct-7b with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "aisquared/bolt-instruct-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "aisquared/bolt-instruct-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "aisquared/bolt-instruct-7b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "aisquared/bolt-instruct-7b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

  - Docker Model Runner

How to use aisquared/bolt-instruct-7b with Docker Model Runner:

```bash
docker model run hf.co/aisquared/bolt-instruct-7b
```
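The curl calls to the OpenAI-compatible endpoint can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the default `base_url` assumes the vLLM server from the example above is running locally on port 8000 (use `http://localhost:30000` for the SGLang example).

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "aisquared/bolt-instruct-7b",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request equivalent to the curl command (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text.

    Requires a running server; the response is read in the OpenAI
    chat-completions shape (choices[0].message.content).
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` would return the model's reply as a string.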
---
tags:
- text-generation
- causal-lm
- instruction-tuning
- chat
- rag
- code-generation
- summarization
- extraction
- synthetic-data
- generated_from_trainer
license: other
pipeline_tag: text-generation
library_name: transformers
language:
- en
base_model:
- allenai/OLMo-2-0425-1B-Instruct
- allenai/OLMo-3-7B-Instruct
- allenai/OLMo-3.1-32B-Instruct
---
# Bolt Instruct Models

Bolt Instruct is a family of **instruction-tuned language models designed for high-quality generation, reasoning, and enterprise workflows**.

These models are **fine-tuned from the Allen Institute for AI OLMo instruct models** and optimized for:

- General conversational AI
- Structured and controllable generation
- Retrieval-Augmented Generation (RAG)
- Enterprise document understanding
- Code generation and transformation

---
# Model Overview

Bolt Instruct models provide **strong instruction-following capabilities** across diverse tasks, with long-context support on the 7B and 32B variants.

Key design goals:

- Strong instruction adherence
- High-quality structured outputs (JSON, extraction)
- RAG-grounded responses
- Long-context support (65k tokens for 7B and 32B)
- Balanced chat, reasoning, and coding performance
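
The structured-output and RAG goals above can be illustrated with a small prompt-and-parse pattern. This is a sketch only: the helper names `build_rag_messages` and `parse_json_reply`, the prompt wording, and the `{"answer", "sources"}` schema are all assumptions made for illustration, not a format the models were documented to be trained on.

```python
import json
from typing import Dict, List


def build_rag_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Build a chat message that grounds the answer in numbered passages."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    prompt = (
        "Answer the question using only the numbered passages below. "
        'Reply with a JSON object of the form {"answer": string, "sources": [int]}.\n\n'
        f"Passages:\n{context}\n\nQuestion: {question}"
    )
    return [{"role": "user", "content": prompt}]


def parse_json_reply(reply: str) -> dict:
    """Extract and parse the outermost JSON object from a model reply."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])
```

The messages returned by `build_rag_messages` can be passed to `tokenizer.apply_chat_template` or to a chat-completions endpoint, and `parse_json_reply` applied to the generated text. Parsing defensively is a deliberate choice here, since a model may wrap the JSON in extra prose.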
---

# Model Variants

| Model | Base Model | Positioning |
|------|------------|------------|
| bolt-instruct-1b | allenai/OLMo-2-0425-1B-Instruct | Lightweight / low-latency |
| bolt-instruct-7b | allenai/OLMo-3-7B-Instruct | Balanced |
| bolt-instruct-32b | allenai/OLMo-3.1-32B-Instruct | Highest quality |

---
# Model Details

- **Type:** Causal LM (instruction-tuned)
- **Max context:** 65,536 tokens (7B and 32B), 4,096 tokens (1B)
- **Training context:** 32k (7B), 16k (32B), 4k (1B)
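
Because a causal LM shares one context window between the prompt and the generated tokens, the maximum-context figures above bound `prompt_tokens + max_new_tokens`. The sketch below encodes those limits; the dictionary keys and the helper names `fits_in_context` and `remaining_budget` are hypothetical, and token counts are assumed to come from the model's own tokenizer.

```python
# Maximum context lengths as listed in Model Details (tokens).
MAX_CONTEXT = {
    "bolt-instruct-1b": 4_096,
    "bolt-instruct-7b": 65_536,
    "bolt-instruct-32b": 65_536,
}


def fits_in_context(variant: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the requested generation fits the window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT[variant]


def remaining_budget(variant: str, prompt_tokens: int) -> int:
    """Return how many tokens can still be generated after the prompt."""
    return max(MAX_CONTEXT[variant] - prompt_tokens, 0)
```

For example, a 4,000-token prompt leaves only 96 tokens of generation on the 1B variant, while the same prompt leaves ample room on the 7B and 32B variants.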
### Capabilities

- Chat / multi-turn dialogue
- Instruction following
- Structured output (JSON)
- Summarization & transformation
- Extraction
- RAG generation
- Code generation

---
# Training

- **Method:** Supervised Fine-Tuning (SFT)
- **Dataset size:** ~125k conversations
- **Eval set:** ~10k examples
- **Data mix:** public + synthetic + internal tasks

### Training Approach

- 1B → full fine-tune
- 7B / 32B → QLoRA (4-bit)

### Hardware

- 1× A100 80GB GPU
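
A rough back-of-the-envelope calculation shows how the base weights relate to the 80 GB GPU at different precisions. This is a sketch under stated assumptions: the parameter counts are the nominal sizes implied by the model names (1B, 7B, 32B) rather than exact counts, and the estimate covers weights only, ignoring quantization constants, adapter weights, gradients, optimizer state, and activations.

```python
# Nominal parameter counts taken from the model names (an assumption;
# the exact counts of the underlying checkpoints differ somewhat).
NOMINAL_PARAMS = {
    "bolt-instruct-1b": 1e9,
    "bolt-instruct-7b": 7e9,
    "bolt-instruct-32b": 32e9,
}


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, params in NOMINAL_PARAMS.items():
    gb_16 = weight_memory_gb(params, 16)  # 16-bit weights (bf16/fp16)
    gb_4 = weight_memory_gb(params, 4)    # 4-bit quantized weights
    print(f"{name}: ~{gb_16:.1f} GB at 16-bit, ~{gb_4:.1f} GB at 4-bit")
```

Under these assumptions the 32B weights alone take about 64 GB at 16-bit but about 16 GB at 4-bit, which is consistent with using QLoRA for the larger variants on a single 80 GB GPU.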
---

# Intended Use

- Chat assistants
- Enterprise copilots
- RAG pipelines
- Document processing
- Structured extraction
- Code assistance

---
# Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID of the 7B variant; the 1B and 32B variants use the same naming scheme.
model_name = "aisquared/bolt-instruct-7b"

# Load the tokenizer (including its chat template) and the model weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
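
Once the model is loaded, multi-turn chat amounts to re-sending the growing message history through the tokenizer's chat template. The following is a minimal sketch of that loop; `chat_turn` and `make_generator` are hypothetical helper names, and `max_new_tokens=256` is an arbitrary illustrative default.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              generate: Callable[[List[Message]], str]) -> List[Message]:
    """Return a new history extended with the user turn and the model's reply."""
    messages = history + [{"role": "user", "content": user_text}]
    reply = generate(messages)
    return messages + [{"role": "assistant", "content": reply}]


def make_generator(model_name: str = "aisquared/bolt-instruct-7b",
                   max_new_tokens: int = 256) -> Callable[[List[Message]], str]:
    """Load the model once and return a function mapping messages to a reply."""
    # Imported lazily so chat_turn can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def generate(messages: List[Message]) -> str:
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=True,
            return_dict=True, return_tensors="pt",
        ).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Decode only the newly generated tokens, not the prompt.
        new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate
```

A session would then look like `generate = make_generator()` followed by `history = chat_turn([], "Who are you?", generate)`, passing the returned `history` back in for each later turn.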
---

# Evaluation

To evaluate these models, we ran a subset of tasks from the [EleutherAI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). The metrics for each model are listed below.

## Language Model Evaluation Harness
### Evaluation results for aisquared/bolt-instruct-1b:

| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.3490|± |0.0139|
| | |none | 0|acc_norm |↑ |0.3823|± |0.0142|
|arc_easy | 1|none | 0|acc |↑ |0.6098|± |0.0100|
| | |none | 0|acc_norm |↑ |0.5560|± |0.0102|
|bbh | 3|get-answer | |exact_match|↑ |0.3081|± |0.0052|
| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5840|± |0.0312|
| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5508|± |0.0365|
| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.2600|± |0.0278|
| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.3640|± |0.0305|
| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0040|± |0.0040|
| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317|
| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.0920|± |0.0183|
| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316|
| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.1720|± |0.0239|
| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.1080|± |0.0197|
| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.3520|± |0.0303|
| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317|
| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0600|± |0.0151|
| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315|
| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.4360|± |0.0314|
| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2123|± |0.0340|
| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272|
| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272|
| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.3989|± |0.0368|
| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301|
| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.2840|± |0.0286|
| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316|
| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.5072|± |0.0138|
| | |strict-match | 5|exact_match|↑ |0.4943|± |0.0138|
|hellaswag | 1|none | 0|acc |↑ |0.4729|± |0.0050|
| | |none | 0|acc_norm |↑ |0.6181|± |0.0048|
|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.1435|± |0.0032|
| - biology | 3|custom-extract | 5|exact_match|↑ |0.2050|± |0.0151|
| - business | 3|custom-extract | 5|exact_match|↑ |0.1369|± |0.0122|
| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.0848|± |0.0083|
| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.1415|± |0.0172|
| - economics | 3|custom-extract | 5|exact_match|↑ |0.1943|± |0.0136|
| - engineering | 3|custom-extract | 5|exact_match|↑ |0.0929|± |0.0093|
| - health | 3|custom-extract | 5|exact_match|↑ |0.1528|± |0.0126|
| - history | 3|custom-extract | 5|exact_match|↑ |0.1549|± |0.0186|
| - law | 3|custom-extract | 5|exact_match|↑ |0.1081|± |0.0094|
| - math | 3|custom-extract | 5|exact_match|↑ |0.1414|± |0.0095|
| - other | 3|custom-extract | 5|exact_match|↑ |0.1916|± |0.0130|
| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.1383|± |0.0155|
| - physics | 3|custom-extract | 5|exact_match|↑ |0.1186|± |0.0090|
| - psychology | 3|custom-extract | 5|exact_match|↑ |0.2130|± |0.0145|
|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.4734|± |0.0153|
|winogrande | 1|none | 0|acc |↑ |0.6156|± |0.0137|

### Evaluation results for aisquared/bolt-instruct-7b:

| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.4778|± |0.0146|
| | |none | 0|acc_norm |↑ |0.4957|± |0.0146|
|arc_easy | 1|none | 0|acc |↑ |0.7534|± |0.0088|
| | |none | 0|acc_norm |↑ |0.7311|± |0.0091|
|bbh | 3|get-answer | |exact_match|↑ |0.3038|± |0.0047|
| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5668|± |0.0363|
| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.4480|± |0.0315|
| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.2240|± |0.0264|
| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2960|± |0.0289|
| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5200|± |0.0317|
| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0200|± |0.0089|
| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.6720|± |0.0298|
| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.1200|± |0.0206|
| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315|
| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.1520|± |0.0228|
| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.4110|± |0.0409|
| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.1880|± |0.0248|
| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.4800|± |0.0317|
| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.4760|± |0.0316|
| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.2921|± |0.0342|
| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6760|± |0.0297|
| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.5880|± |0.0312|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.8280|± |0.0239|
| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301|
| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.1400|± |0.0220|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.7998|± |0.0110|
| | |strict-match | 5|exact_match|↑ |0.7392|± |0.0121|
|hellaswag | 1|none | 0|acc |↑ |0.4882|± |0.0050|
| | |none | 0|acc_norm |↑ |0.6165|± |0.0049|
|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.4978|± |0.0044|
| - biology | 3|custom-extract | 5|exact_match|↑ |0.6848|± |0.0174|
| - business | 3|custom-extract | 5|exact_match|↑ |0.5729|± |0.0176|
| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.5380|± |0.0148|
| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.5878|± |0.0243|
| - economics | 3|custom-extract | 5|exact_match|↑ |0.5592|± |0.0171|
| - engineering | 3|custom-extract | 5|exact_match|↑ |0.2405|± |0.0137|
| - health | 3|custom-extract | 5|exact_match|↑ |0.4670|± |0.0175|
| - history | 3|custom-extract | 5|exact_match|↑ |0.3727|± |0.0248|
| - law | 3|custom-extract | 5|exact_match|↑ |0.2525|± |0.0131|
| - math | 3|custom-extract | 5|exact_match|↑ |0.7158|± |0.0123|
| - other | 3|custom-extract | 5|exact_match|↑ |0.4351|± |0.0163|
| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.4128|± |0.0221|
| - physics | 3|custom-extract | 5|exact_match|↑ |0.5142|± |0.0139|
| - psychology | 3|custom-extract | 5|exact_match|↑ |0.5602|± |0.0176|
|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.5666|± |0.0162|
|winogrande | 1|none | 0|acc |↑ |0.6385|± |0.0135|

### Evaluation results for aisquared/bolt-instruct-32b:

| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.5776|± |0.0144|
| | |none | 0|acc_norm |↑ |0.6007|± |0.0143|
|arc_easy | 1|none | 0|acc |↑ |0.8333|± |0.0076|
| | |none | 0|acc_norm |↑ |0.8228|± |0.0078|
|bbh | 3|get-answer | |exact_match|↑ |0.3087|± |0.0048|
| - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5760|± |0.0313|
| - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5882|± |0.0361|
| - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.6640|± |0.0299|
| - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250|
| - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.0480|± |0.0135|
| - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283|
| - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.3200|± |0.0296|
| - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.5400|± |0.0316|
| - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.6000|± |0.0310|
| - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.0160|± |0.0080|
| - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317|
| - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2945|± |0.0379|
| - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2280|± |0.0266|
| - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317|
| - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.5440|± |0.0316|
| - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.7079|± |0.0342|
| - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.4880|± |0.0317|
| - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.3120|± |0.0294|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.6280|± |0.0306|
| - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.4400|± |0.0315|
| - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0280|± |0.0105|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.8795|± |0.0090|
| | |strict-match | 5|exact_match|↑ |0.7801|± |0.0114|
|hellaswag | 1|none | 0|acc |↑ |0.5407|± |0.0050|
| | |none | 0|acc_norm |↑ |0.6763|± |0.0047|
|mmlu_pro | 2|custom-extract | |exact_match|↑ |0.6340|± |0.0042|
| - biology | 3|custom-extract | 5|exact_match|↑ |0.8117|± |0.0146|
| - business | 3|custom-extract | 5|exact_match|↑ |0.6907|± |0.0165|
| - chemistry | 3|custom-extract | 5|exact_match|↑ |0.6431|± |0.0142|
| - computer_science | 3|custom-extract | 5|exact_match|↑ |0.6951|± |0.0228|
| - economics | 3|custom-extract | 5|exact_match|↑ |0.7405|± |0.0151|
| - engineering | 3|custom-extract | 5|exact_match|↑ |0.3447|± |0.0153|
| - health | 3|custom-extract | 5|exact_match|↑ |0.6540|± |0.0166|
| - history | 3|custom-extract | 5|exact_match|↑ |0.5512|± |0.0255|
| - law | 3|custom-extract | 5|exact_match|↑ |0.3860|± |0.0147|
| - math | 3|custom-extract | 5|exact_match|↑ |0.7979|± |0.0109|
| - other | 3|custom-extract | 5|exact_match|↑ |0.6028|± |0.0161|
| - philosophy | 3|custom-extract | 5|exact_match|↑ |0.5912|± |0.0220|
| - physics | 3|custom-extract | 5|exact_match|↑ |0.6551|± |0.0132|
| - psychology | 3|custom-extract | 5|exact_match|↑ |0.7243|± |0.0158|
|truthfulqa_mc2 | 3|none | 0|acc |↑ |0.6906|± |0.0153|
|winogrande | 1|none | 0|acc |↑ |0.6630|± |0.0133|
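
The Stderr column can be used to judge whether a gap between two models is larger than sampling noise. The sketch below computes an approximate 95% confidence interval and a two-sample z-score from values in the tables above; the helper names are hypothetical, and the z-score treats the two runs as independent, which is only an approximation because the models are scored on the same test items.

```python
import math
from typing import Tuple


def confidence_interval(value: float, stderr: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval under a normal approximation."""
    return (value - z * stderr, value + z * stderr)


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """z-score of the difference a - b, assuming independent estimates."""
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# gsm8k (flexible-extract): 7B vs 1B, values copied from the tables above.
z_gsm8k = z_score(0.7998, 0.0110, 0.5072, 0.0138)

# bbh aggregate: 7B vs 1B, values copied from the tables above.
z_bbh = z_score(0.3038, 0.0047, 0.3081, 0.0052)

print(f"gsm8k 7B vs 1B: z = {z_gsm8k:.2f}")
print(f"bbh   7B vs 1B: z = {z_bbh:.2f}")
```

Under this approximation the gsm8k gap lies far outside the usual |z| > 1.96 threshold, while the aggregate bbh scores of the 1B and 7B models are within noise of each other.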

---

# Limitations

- May hallucinate without grounding
- Performance varies by model size
- Not suitable for high-risk domains without oversight

---

# License

Bolt Instruct is released under the [AI Squared Community License](https://docs.squared.ai/terms-of-use).