clemsail committed 20 days ago

Commit 75c22a0 (verified) · 1 parent: c781217

chore: upload lm-eval-harness results

Files changed (1)

evals/results_2026-04-15T18-44-41.265898.json +610 -0

evals/results_2026-04-15T18-44-41.265898.json ADDED

@@ -0,0 +1,610 @@

+{
+  "results": {
+    "leaderboard_math_hard": {
+      "exact_match,none": 0.3406344410876133,
+      "exact_match_stderr,none": 0.012022643333214926,
+      "alias": "leaderboard_math_hard"
+    },
+    "leaderboard_math_algebra_hard": {
+      "alias": " - leaderboard_math_algebra_hard",
+      "exact_match,none": 0.5700325732899023,
+      "exact_match_stderr,none": 0.02830133364131638,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    },
+    "leaderboard_math_counting_and_prob_hard": {
+      "alias": " - leaderboard_math_counting_and_prob_hard",
+      "exact_match,none": 0.25203252032520324,
+      "exact_match_stderr,none": 0.03930879526823995,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    },
+    "leaderboard_math_geometry_hard": {
+      "alias": " - leaderboard_math_geometry_hard",
+      "exact_match,none": 0.18181818181818182,
+      "exact_match_stderr,none": 0.03369829435719357,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    },
+    "leaderboard_math_intermediate_algebra_hard": {
+      "alias": " - leaderboard_math_intermediate_algebra_hard",
+      "exact_match,none": 0.1392857142857143,
+      "exact_match_stderr,none": 0.02072911170255923,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    },
+    "leaderboard_math_num_theory_hard": {
+      "alias": " - leaderboard_math_num_theory_hard",
+      "exact_match,none": 0.4155844155844156,
+      "exact_match_stderr,none": 0.03984233708298028,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    },
+    "leaderboard_math_prealgebra_hard": {
+      "alias": " - leaderboard_math_prealgebra_hard",
+      "exact_match,none": 0.5233160621761658,
+      "exact_match_stderr,none": 0.03604513672442202,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    },
+    "leaderboard_math_precalculus_hard": {
+      "alias": " - leaderboard_math_precalculus_hard",
+      "exact_match,none": 0.1259259259259259,
+      "exact_match_stderr,none": 0.02866020527595505,
+      "exact_match_original,none": 0.0,
+      "exact_match_original_stderr,none": 0.0
+    }
+  },
+  "groups": {
+    "leaderboard_math_hard": {
+      "exact_match,none": 0.3406344410876133,
+      "exact_match_stderr,none": 0.012022643333214926,
+      "alias": "leaderboard_math_hard"
+    }
+  },
+  "group_subtasks": {
+    "leaderboard_math_hard": [
+      "leaderboard_math_algebra_hard",
+      "leaderboard_math_counting_and_prob_hard",
+      "leaderboard_math_geometry_hard",
+      "leaderboard_math_intermediate_algebra_hard",
+      "leaderboard_math_num_theory_hard",
+      "leaderboard_math_prealgebra_hard",
+      "leaderboard_math_precalculus_hard"
+    ]
+  },
+  "configs": {
+    "leaderboard_math_algebra_hard": {
+      "task": "leaderboard_math_algebra_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "algebra",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x77380f7fefc0>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x77380eab8f40>",
+        "doc_to_text": "<function doc_to_text at 0x77380f79e660>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    },
+    "leaderboard_math_counting_and_prob_hard": {
+      "task": "leaderboard_math_counting_and_prob_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "counting_and_probability",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x77380f7fef20>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x77380f7fd1c0>",
+        "doc_to_text": "<function doc_to_text at 0x77380f7ff380>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    },
+    "leaderboard_math_geometry_hard": {
+      "task": "leaderboard_math_geometry_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "geometry",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x77380f7dfba0>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x77380f7dcea0>",
+        "doc_to_text": "<function doc_to_text at 0x77380f7df240>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    },
+    "leaderboard_math_intermediate_algebra_hard": {
+      "task": "leaderboard_math_intermediate_algebra_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "intermediate_algebra",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x7738109345e0>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x77380f7dcb80>",
+        "doc_to_text": "<function doc_to_text at 0x77380f75fd80>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    },
+    "leaderboard_math_num_theory_hard": {
+      "task": "leaderboard_math_num_theory_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "number_theory",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x77380f79e520>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x77380f79c0e0>",
+        "doc_to_text": "<function doc_to_text at 0x77380f79de40>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    },
+    "leaderboard_math_prealgebra_hard": {
+      "task": "leaderboard_math_prealgebra_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "prealgebra",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x77381091b4c0>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x77380f75e480>",
+        "doc_to_text": "<function doc_to_text at 0x77380f75db20>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    },
+    "leaderboard_math_precalculus_hard": {
+      "task": "leaderboard_math_precalculus_hard",
+      "dataset_path": "DigitalLearningGmbH/MATH-lighteval",
+      "dataset_name": "precalculus",
+      "training_split": "train",
+      "test_split": "test",
+      "process_docs": "def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:\n    def _process_doc(doc: dict) -> dict:\n        out_doc = {\n            \"problem\": doc[\"problem\"],\n            \"solution\": doc[\"solution\"],\n            \"answer\": remove_boxed(last_boxed_only_string(doc[\"solution\"])),\n        }\n        if getattr(doc, \"few_shot\", None) is not None:\n            out_doc[\"few_shot\"] = True\n        return out_doc\n\n    return dataset.filter(lambda x: x[\"level\"] == \"Level 5\").map(_process_doc)\n",
+      "doc_to_text": "def doc_to_text(doc: dict) -> str:\n    return \"Problem:\" + \"\\n\" + doc[\"problem\"] + \"\\n\\n\" + \"Solution:\"\n",
+      "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+      "unsafe_code": false,
+      "process_results": "def process_results(doc: dict, results: List[str]) -> Dict[str, int]:\n    candidates = results[0]\n    parsed_candidate = parse(candidates)\n    parsed_answer = parse(doc[\"solution\"], extraction_config=[LatexExtractionConfig()])\n    if verify(parsed_answer, parsed_candidate):\n        retval = 1\n    else:\n        retval = 0\n\n    try:\n        original = process_result_v1(doc, candidates)\n    except:  # noqa: E722\n        original = 0\n\n    output = {\n        \"exact_match\": retval,\n        \"exact_match_original\": original,\n    }\n    return output\n",
+      "description": "",
+      "target_delimiter": " ",
+      "fewshot_delimiter": "\n\n",
+      "fewshot_config": {
+        "sampler": "first_n",
+        "split": null,
+        "process_docs": "<function process_docs at 0x773810eca480>",
+        "fewshot_indices": null,
+        "samples": "<function list_fewshot_samples at 0x7738109180e0>",
+        "doc_to_text": "<function doc_to_text at 0x773810906b60>",
+        "doc_to_choice": null,
+        "doc_to_target": "{{answer if few_shot is undefined else solution}}",
+        "gen_prefix": null,
+        "fewshot_delimiter": "\n\n",
+        "target_delimiter": " "
+      },
+      "num_fewshot": 0,
+      "metric_list": [
+        {
+          "metric": "exact_match",
+          "aggregation": "mean",
+          "higher_is_better": true
+        },
+        {
+          "metric": "exact_match_original",
+          "aggregation": "mean",
+          "higher_is_better": true
+        }
+      ],
+      "output_type": "generate_until",
+      "generation_kwargs": {
+        "until": [
+          "Problem:"
+        ],
+        "do_sample": false,
+        "temperature": 0.0,
+        "max_gen_toks": 1024
+      },
+      "repeats": 1,
+      "should_decontaminate": false,
+      "metadata": {
+        "version": 3.0,
+        "base_url": "http://localhost:8000/v1/chat/completions",
+        "model": "devstral",
+        "num_concurrent": 1
+      }
+    }
+  },
+  "versions": {
+    "leaderboard_math_algebra_hard": 3.0,
+    "leaderboard_math_counting_and_prob_hard": 3.0,
+    "leaderboard_math_geometry_hard": 3.0,
+    "leaderboard_math_hard": null,
+    "leaderboard_math_intermediate_algebra_hard": 3.0,
+    "leaderboard_math_num_theory_hard": 3.0,
+    "leaderboard_math_prealgebra_hard": 3.0,
+    "leaderboard_math_precalculus_hard": 3.0
+  },
+  "n-shot": {
+    "leaderboard_math_algebra_hard": 0,
+    "leaderboard_math_counting_and_prob_hard": 0,
+    "leaderboard_math_geometry_hard": 0,
+    "leaderboard_math_intermediate_algebra_hard": 0,
+    "leaderboard_math_num_theory_hard": 0,
+    "leaderboard_math_prealgebra_hard": 0,
+    "leaderboard_math_precalculus_hard": 0
+  },
+  "higher_is_better": {
+    "leaderboard_math_algebra_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_counting_and_prob_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_geometry_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_intermediate_algebra_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_num_theory_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_prealgebra_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    },
+    "leaderboard_math_precalculus_hard": {
+      "exact_match": true,
+      "exact_match_original": true
+    }
+  },
+  "n-samples": {
+    "leaderboard_math_algebra_hard": {
+      "original": 307,
+      "effective": 307
+    },
+    "leaderboard_math_counting_and_prob_hard": {
+      "original": 123,
+      "effective": 123
+    },
+    "leaderboard_math_geometry_hard": {
+      "original": 132,
+      "effective": 132
+    },
+    "leaderboard_math_intermediate_algebra_hard": {
+      "original": 280,
+      "effective": 280
+    },
+    "leaderboard_math_num_theory_hard": {
+      "original": 154,
+      "effective": 154
+    },
+    "leaderboard_math_prealgebra_hard": {
+      "original": 193,
+      "effective": 193
+    },
+    "leaderboard_math_precalculus_hard": {
+      "original": 135,
+      "effective": 135
+    }
+  },
+  "config": {
+    "model": "local-chat-completions",
+    "model_args": {
+      "base_url": "http://localhost:8000/v1/chat/completions",
+      "model": "devstral",
+      "num_concurrent": 1
+    },
+    "batch_size": "1",
+    "batch_sizes": [],
+    "device": "cuda:0",
+    "use_cache": null,
+    "limit": 500.0,
+    "bootstrap_iters": 100000,
+    "gen_kwargs": {},
+    "random_seed": 0,
+    "numpy_seed": 1234,
+    "torch_seed": 1234,
+    "fewshot_seed": 1234
+  },
+  "git_hash": null,
+  "date": 1776259937.910414,
+  "pretty_env_info": "N/A (torch not installed)",
+  "transformers_version": "N/A",
+  "lm_eval_version": "0.4.11",
+  "upper_git_hash": null,
+  "task_hashes": {
+    "leaderboard_math_algebra_hard": "f502f4e54c73dc4380b660c8ac770bbffa9da9c5d9be8d207504818e02ea72d2",
+    "leaderboard_math_counting_and_prob_hard": "ba9cca223890ed35e5a58cb92291243d8e958074d9b28e058e06e5d6edb471f4",
+    "leaderboard_math_geometry_hard": "20f9c4fbd59977383a679ab91ed2a4b52784bfc63bb01336b14022f148f0d3c7",
+    "leaderboard_math_intermediate_algebra_hard": "cc53697424ad5d3d04f874c5db5a0681320796c06db79c3ac036b6894b6d8a95",
+    "leaderboard_math_num_theory_hard": "c3e042ba212294c4672c0972cbd2f93afc4f59a9e3fbc38b076c7c76257c33dc",
+    "leaderboard_math_prealgebra_hard": "de7a926c6ca2898d3df2dda36b485b3287c5c3af36f3f1b2b300bb11da7d3bee",
+    "leaderboard_math_precalculus_hard": "1501f3af918c1fc4c099cdf133f7e77b7b492691362f3f8a0a6323ac5757339f"
+  },
+  "model_source": "local-chat-completions",
+  "model_name": "devstral",
+  "model_name_sanitized": "devstral",
+  "system_instruction": null,
+  "system_instruction_sha": null,
+  "fewshot_as_multiturn": true,
+  "chat_template": "",
+  "chat_template_sha": null,
+  "total_evaluation_time_seconds": "11544.726048459765"
+}
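
The group score in the file can be sanity-checked against the per-subtask numbers. The sketch below is not part of the commit. It assumes the `leaderboard_math_hard` aggregate is the mean of the subtask `exact_match` values weighted by their `n-samples` counts, and that each subtask's standard error is the usual sample standard error of a 0/1 outcome (variance with an n - 1 denominator). Both assumptions reproduce the reported figures, but the harness may compute them differently internally. The shortened subtask keys and the helper names `weighted_mean` and `binary_stderr` are illustrative only.

```python
from math import sqrt

# (exact_match, n_samples) per subtask, copied from the "results" and
# "n-samples" sections of the results file above. Keys are shortened.
SUBTASKS = {
    "algebra": (0.5700325732899023, 307),
    "counting_and_prob": (0.25203252032520324, 123),
    "geometry": (0.18181818181818182, 132),
    "intermediate_algebra": (0.1392857142857143, 280),
    "num_theory": (0.4155844155844156, 154),
    "prealgebra": (0.5233160621761658, 193),
    "precalculus": (0.1259259259259259, 135),
}


def weighted_mean(subtasks):
    """Mean of subtask scores, weighted by the number of samples in each."""
    total = sum(n for _, n in subtasks.values())
    correct = sum(score * n for score, n in subtasks.values())
    return correct / total


def binary_stderr(p, n):
    """Standard error of the mean of n 0/1 outcomes (n - 1 denominator)."""
    return sqrt(p * (1.0 - p) / (n - 1))


aggregate = weighted_mean(SUBTASKS)
algebra_stderr = binary_stderr(*SUBTASKS["algebra"])

print(f"aggregate exact_match: {aggregate:.6f}")
print(f"algebra stderr:        {algebra_stderr:.6f}")
```

Under these assumptions the aggregate works out to 451 correct answers out of 1324 problems, which agrees with the reported `exact_match,none` of 0.3406344410876133. The recomputed algebra standard error agrees with the reported 0.02830133364131638. The group-level standard error is not checked here, because it comes from a pooled formula that this sketch does not reproduce.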