### Environment Setup

Download this directory to a local machine and set up [`uv`](https://docs.astral.sh/uv/).

1. **Install `uv`** (if you haven't already):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2. **Sync the environment:**
```bash
uv sync
```
*(This creates a virtual environment at `.venv` and installs the exact dependency versions locked in `uv.lock`.)*

3. **Activate the environment:**
```bash
source .venv/bin/activate
```
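
To confirm that the activation in step 3 took effect, a quick check like the one below can help. It is only a sketch: `check_venv` is a hypothetical helper name, not something this repository provides, and it relies on the usual behavior of virtual-environment `activate` scripts, which export `VIRTUAL_ENV`.

```bash
# Hypothetical helper (not part of this repository): reports whether the
# current shell is running inside the project's .venv.
check_venv() {
    # Standard activate scripts export VIRTUAL_ENV as the environment's path.
    case "${VIRTUAL_ENV:-}" in
        */.venv)
            echo "active: ${VIRTUAL_ENV}"
            ;;
        *)
            echo "not active: run 'source .venv/bin/activate' first" >&2
            return 1
            ;;
    esac
}
```

Running `check_venv` after step 3 should print the path of the `.venv` directory; a non-zero exit status means the environment is not active in the current shell.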

### Evaluation Script

Run:

```bash
accelerate launch eval.py \
    --model cloverlm \
    --model_args "pretrained=daslab-testing/CloverLM,dtype=bfloat16,quartet_2_impl=quartet2,attn_backend=pytorch" \
    --tasks "arc_easy_mi,arc_challenge_mi,hellaswag,piqa" \
    --num_fewshot 0 \
    --include_path ./ \
    --trust_remote_code \
    --confirm_run_unsafe_code \
    --batch_size auto
```

### Expected Evaluation Results

```
|     Tasks      |Version|Filter|n-shot|    Metric     |   |Value |   |Stderr|
|----------------|------:|------|-----:|---------------|---|-----:|---|-----:|
|arc_challenge_mi|      1|none  |     0|acc            |↑  |0.4625|±  |0.0146|
|                |       |none  |     0|acc_mutual_info|↑  |0.5094|±  |0.0146|
|                |       |none  |     0|acc_norm       |↑  |0.4923|±  |0.0146|
|arc_easy_mi     |      1|none  |     0|acc            |↑  |0.7997|±  |0.0082|
|                |       |none  |     0|acc_mutual_info|↑  |0.7239|±  |0.0092|
|                |       |none  |     0|acc_norm       |↑  |0.7731|±  |0.0086|
|hellaswag       |      1|none  |     0|acc            |↑  |0.5392|±  |0.0050|
|                |       |none  |     0|acc_norm       |↑  |0.7167|±  |0.0045|
|piqa            |      1|none  |     0|acc            |↑  |0.7922|±  |0.0095|
|                |       |none  |     0|acc_norm       |↑  |0.8058|±  |0.0092|
```

### Alternative Backends

On non-Blackwell GPUs, replace `quartet_2_impl=quartet2` with `quartet_2_impl=pseudoquant`.
For `attn_backend`, you can use `pytorch`, `flash2`, `flash3`, or `flash4`, provided the corresponding library is installed.
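
Combining the two substitutions, an evaluation run on a non-Blackwell GPU might look like the following sketch. The shell variables and the `run_eval` function are illustrative names introduced here, not part of the repository; every flag is copied from the evaluation command above, and only the two backend values change.

```bash
# Illustrative variables (names are assumptions, not defined by the repo).
QUARTET_IMPL="pseudoquant"   # use "quartet2" on Blackwell GPUs
ATTN_BACKEND="pytorch"       # or flash2 / flash3 / flash4 if the library is installed

MODEL_ARGS="pretrained=daslab-testing/CloverLM,dtype=bfloat16,quartet_2_impl=${QUARTET_IMPL},attn_backend=${ATTN_BACKEND}"

# Wraps the same evaluation command, with the backend values substituted.
run_eval() {
    accelerate launch eval.py \
        --model cloverlm \
        --model_args "${MODEL_ARGS}" \
        --tasks "arc_easy_mi,arc_challenge_mi,hellaswag,piqa" \
        --num_fewshot 0 \
        --include_path ./ \
        --trust_remote_code \
        --confirm_run_unsafe_code \
        --batch_size auto
}
```

Calling `run_eval` from the repository root, with the environment active, starts the evaluation with the selected backends.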