Instructions to use Zarinaaa/spectral-collapse-bf16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Zarinaaa/spectral-collapse-bf16 with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b")
model = PeftModel.from_pretrained(base_model, "Zarinaaa/spectral-collapse-bf16")
- Transformers
How to use Zarinaaa/spectral-collapse-bf16 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Zarinaaa/spectral-collapse-bf16")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Zarinaaa/spectral-collapse-bf16", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Zarinaaa/spectral-collapse-bf16 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Zarinaaa/spectral-collapse-bf16"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Zarinaaa/spectral-collapse-bf16",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/Zarinaaa/spectral-collapse-bf16
- SGLang
How to use Zarinaaa/spectral-collapse-bf16 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Zarinaaa/spectral-collapse-bf16" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Zarinaaa/spectral-collapse-bf16",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Zarinaaa/spectral-collapse-bf16" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Zarinaaa/spectral-collapse-bf16",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use Zarinaaa/spectral-collapse-bf16 with Docker Model Runner:
docker model run hf.co/Zarinaaa/spectral-collapse-bf16
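The curl calls shown for the vLLM and SGLang servers above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and reading the result from `choices[0]["text"]` assumes the server returns the usual OpenAI-style completions response.

```python
import json
import urllib.request

MODEL_ID = "Zarinaaa/spectral-collapse-bf16"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request as the curl examples (no network I/O).

    Hypothetical helper: the defaults mirror the curl payload; pass
    base_url="http://localhost:30000" for the SGLang server.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumption: OpenAI-style responses list generations under "choices".
    return body["choices"][0]["text"]
```

With a server already running, `complete("Once upon a time,")` would target the vLLM port, and `complete("Once upon a time,", base_url="http://localhost:30000")` the SGLang one.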
| ======================================================================== | |
| Post-Training Evaluation - Turkic LoRA | |
| ======================================================================== | |
| [INFO] Loading base model: google/gemma-2-9b | |
| [INFO] Loading adapter from: ./output_ky_bf16_r16_lr2e4_3ep/final_adapter | |
| [transformers] `torch_dtype` is deprecated! Use `dtype` instead! | |
| /venv/main/lib/python3.12/site-packages/bitsandbytes/backends/cuda/ops.py:213: FutureWarning: _check_is_size will be removed in a future PyTorch release along with guard_size_oblivious. Use _check(i >= 0) instead. | |
| torch._check_is_size(blocksize) | |
| Loading weights: 100% 464/464 [00:06<00:00, 69.47it/s] | |
| [INFO] Parameters: 5,133,925,888 total, 0 trainable (0.00%) | |
| ------------------------------------------------------------ | |
| PERPLEXITY EVALUATION | |
| ------------------------------------------------------------ | |
| Generating train split: 61879 examples [00:01, 40853.87 examples/s] | |
| Generating train split: 21242 examples [00:01, 12582.24 examples/s] | |
| Tokenizing (num_proc=30): 100% 100481/100481 [02:52<00:00, 582.35 examples/s] | |
| Val set size: 10,049 | |
| Filter: 100% 10049/10049 [00:02<00:00, 4182.27 examples/s] | |
| [ky] loss=1.6071 ppl=4.99 (500 samples, 128,000 tokens) | |
| Filter: 100% 10049/10049 [00:02<00:00, 4202.59 examples/s] | |
| [kz] loss=3.7835 ppl=43.97 (500 samples, 113,336 tokens) | |
| Filter: 100% 10049/10049 [00:02<00:00, 4206.19 examples/s] | |
| [uz] loss=4.0779 ppl=59.02 (500 samples, 127,308 tokens) | |
| ------------------------------------------------------------ | |
| NER EVALUATION (WikiANN, few-shot prompting) | |
| ------------------------------------------------------------ | |
| [ky] Loading WikiANN (ky)... | |
| Generating validation split: 100% 100/100 [00:00<00:00, 5555.00 examples/s] | |
| Generating test split: 100% 100/100 [00:00<00:00, 20996.72 examples/s] | |
| Generating train split: 100% 100/100 [00:00<00:00, 21993.10 examples/s] | |
| [ky] Evaluating on 100 test examples... | |
| /venv/main/lib/python3.12/site-packages/bitsandbytes/backends/cuda/ops.py:468: FutureWarning: _check_is_size will be removed in a future PyTorch release along with guard_size_oblivious. Use _check(i >= 0) instead. | |
| torch._check_is_size(blocksize) | |
| [ky] P=0.109 R=0.333 F1=0.164 (parse_fail=0/100) | |
| [kz] Loading WikiANN (kk)... | |
| Generating validation split: 100% 1000/1000 [00:00<00:00, 128442.93 examples/s] | |
| Generating test split: 100% 1000/1000 [00:00<00:00, 156346.37 examples/s] | |
| Generating train split: 100% 1000/1000 [00:00<00:00, 148987.78 examples/s] | |
| [kz] Evaluating on 100 test examples... | |
| [kz] P=0.132 R=0.375 F1=0.195 (parse_fail=0/100) | |
| [uz] Loading WikiANN (uz)... | |
| Generating validation split: 100% 1000/1000 [00:00<00:00, 146648.86 examples/s] | |
| Generating test split: 100% 1000/1000 [00:00<00:00, 175626.16 examples/s] | |
| Generating train split: 100% 1000/1000 [00:00<00:00, 172832.70 examples/s] | |
| [uz] Evaluating on 100 test examples... | |
| [uz] P=0.220 R=0.563 F1=0.316 (parse_fail=0/100) | |
| ------------------------------------------------------------ | |
| NER EVALUATION (log-likelihood span typing) | |
| ------------------------------------------------------------ | |
| [ky] Loading WikiANN (ky)... | |
| [ky] Classifying spans from 100 examples... | |
| [ky] type_acc=0.586 macro_F1=0.485 (n_spans=111) | |
| [kz] Loading WikiANN (kk)... | |
| [kz] Classifying spans from 100 examples... | |
| [kz] type_acc=0.705 macro_F1=0.543 (n_spans=112) | |
| [uz] Loading WikiANN (uz)... | |
| [uz] Classifying spans from 100 examples... | |
| [uz] type_acc=0.641 macro_F1=0.506 (n_spans=103) | |
| ------------------------------------------------------------ | |
| TUMLU QA EVALUATION (log-likelihood) | |
| ------------------------------------------------------------ | |
| [kz] Loading TUMLU-mini (kazakh)... | |
| Generating dev split: 100% 40/40 [00:00<00:00, 6489.97 examples/s] | |
| Generating test split: 100% 799/799 [00:00<00:00, 141265.81 examples/s] | |
| [kz] Evaluating 794 questions (5-shot)... | |