# Benchmarking models
To use `bench-TriLMs.sh`, you need to:

- Place it in a `llama.cpp` checkout
- Have `cmake`, `gcc`, and the other build dependencies of `llama.cpp`
- For GPU benchmarks: the script checks whether `nvidia-smi` is present, and you'll also need the corresponding compile-time dependencies
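The prerequisites above could be verified with a small pre-flight check before running the script. This is a hypothetical sketch, not part of `bench-TriLMs.sh` itself:

```shell
# Hypothetical pre-flight check for the dependencies listed above.
missing=""
for tool in cmake gcc; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "all build dependencies found"
else
  echo "missing:$missing"
fi
# GPU benchmarking is only attempted when nvidia-smi is present.
command -v nvidia-smi >/dev/null 2>&1 && echo "nvidia-smi present: GPU runs enabled"
```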
The script automatically downloads the models and quantizes them into different variants.

It then produces two result files, one called `results-$(date +%s).json` and the other called `results-$(date +%s)-cpuinfo.txt`. Both use the exact same timestamp.
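Since both files share one timestamp, the script presumably captures `date +%s` once and reuses it for both names. A minimal sketch of that pattern (the variable names here are illustrative, not taken from the script):

```shell
# Capture the timestamp once so both output files get identical names.
timestamp=$(date +%s)
json_file="results-${timestamp}.json"
cpuinfo_file="results-${timestamp}-cpuinfo.txt"
echo "$json_file $cpuinfo_file"
```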
The intention is to eventually parse the produced `.json` file from a Python script:
```python
from __future__ import annotations

from typing import Any
import json

with open("results-1234567890.json") as f:
    # The file contains comma-separated JSON arrays (one per benchmark run),
    # so wrapping the contents in brackets yields a single valid JSON list.
    data: list[list[dict[str, Any]]] = json.loads("[" + f.read() + "]")
# Then use that data
...
```
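Once loaded, the nested runs can be flattened into one list of records for further processing. A minimal sketch, using an inline stand-in string instead of the real file (the actual record keys depend on the benchmark output and are not shown here):

```python
import json

# Stand-in for the file contents: comma-separated JSON arrays of records.
# The keys "a", "b", "c" are placeholders, not real benchmark fields.
raw = '[{"a": 1}, {"b": 2}], [{"c": 3}]'

data = json.loads("[" + raw + "]")
# Flatten the list of runs into a single list of benchmark records.
records = [record for run in data for record in run]
print(len(records))  # 3
```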