| nohup: ignoring input |
| The following values were not passed to `accelerate launch` and had defaults used instead: |
| More than one GPU was found, enabling multi-GPU training. |
| If this was unintended please pass in `--num_processes=1`. |
| `--num_machines` was set to a value of `1` |
| `--mixed_precision` was set to a value of `'no'` |
| `--dynamo_backend` was set to a value of `'no'` |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. |
| 07:42:46 INFO Loaded 34002 PRMTrainRecord rows from /workspace/fms4navigation/datasets/Faithfulness-Critic-Dataset/train_dataset.jsonl |
| 07:42:46 INFO Loaded 512 PRMTrainRecord rows from /workspace/fms4navigation/datasets/Faithfulness-Critic-Dataset/val_dataset_512.jsonl |
| 07:42:46 INFO Label balance (train, n=34002): |
| 07:42:46 INFO overall CONSISTENT=14006 INCONSISTENT=19996 (41.2% pos) |
| 07:42:46 INFO image_to_mj CONSISTENT=27995 INCONSISTENT= 6007 (82.3% pos) |
| 07:42:46 INFO mj_to_action CONSISTENT=24708 INCONSISTENT= 9294 (72.7% pos) |
| 07:42:46 INFO action_to_waypoints CONSISTENT=22682 INCONSISTENT=11320 (66.7% pos) |
| 07:42:46 INFO mj_to_waypoints CONSISTENT=21033 INCONSISTENT=12969 (61.9% pos) |
| 07:42:46 INFO Label balance (val, n=512): |
| 07:42:46 INFO overall CONSISTENT= 209 INCONSISTENT= 303 (40.8% pos) |
| 07:42:46 INFO image_to_mj CONSISTENT= 432 INCONSISTENT= 80 (84.4% pos) |
| 07:42:46 INFO mj_to_action CONSISTENT= 364 INCONSISTENT= 148 (71.1% pos) |
| 07:42:46 INFO action_to_waypoints CONSISTENT= 324 INCONSISTENT= 188 (63.3% pos) |
| 07:42:46 INFO mj_to_waypoints CONSISTENT= 321 INCONSISTENT= 191 (62.7% pos) |
| 07:42:46 INFO Loading processor: Qwen/Qwen3-VL-4B-Instruct |
|
| 07:42:51 INFO Loading base model: Qwen/Qwen3-VL-4B-Instruct (dtype=bfloat16, attn=sdpa) |
Loading checkpoint shards: 100% | 2/2 [00:00<00:00, 27.34it/s] |
| 07:42:53 INFO LoRA enabled (r=256, alpha=256, dropout=0.05, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj']). |
| trainable params: 188,743,680 || all params: 4,626,559,488 || trainable%: 4.0796 |
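The trainable-parameter count above can be reproduced from the LoRA settings. The sketch below assumes the adapters wrap only the language-model attention projections and that the decoder has 36 layers, hidden size 2560, 32 query heads and 8 key/value heads with head dimension 128. None of these dimensions appear in the log; they are assumptions chosen because they match the reported total.

```python
# Hypothetical dimensions: assumed for the 4B language model, not stated in the log.
HIDDEN = 2560        # assumed hidden size
Q_OUT = 32 * 128     # assumed 32 query heads x head_dim 128
KV_OUT = 8 * 128     # assumed 8 key/value heads x head_dim 128
NUM_LAYERS = 36      # assumed number of decoder layers
RANK = 256           # r=256, from the log


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each wrapped linear layer."""
    return rank * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, Q_OUT, RANK)     # q_proj
    + lora_params(HIDDEN, KV_OUT, RANK)  # k_proj
    + lora_params(HIDDEN, KV_OUT, RANK)  # v_proj
    + lora_params(Q_OUT, HIDDEN, RANK)   # o_proj
)
trainable = per_layer * NUM_LAYERS
total = 4_626_559_488  # "all params", from the log
trainable_pct = f"{100 * trainable / total:.4f}"
print(trainable, trainable_pct)
```

If the assumed dimensions are right, the count matches the `trainable params` line, which would suggest the vision tower carries no adapters in this run.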
| 07:42:53 INFO CameraStore dataset_dir: /workspace/fms4navigation/datasets/Faithfulness-Critic-Dataset |
| 07:42:53 INFO CameraStore: dataset_dir=/workspace/fms4navigation/datasets/Faithfulness-Critic-Dataset |
| The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 151645, 'bos_token_id': None, 'pad_token_id': 151643}. |
| 07:42:56 INFO Train config → /workspace/fms4navigation/results/PRM-v2-r256/train_config.json |
| wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /root/.netrc. |
| The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details. |
| wandb: Currently logged in as: mjf-su (mjf-su-stanford-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin |
| wandb: Tracking run with wandb version 0.27.0 |
| wandb: Run data is saved locally in /workspace/fms4navigation/wandb/run-20260515_074304-ciao5cei |
| wandb: Run `wandb offline` to turn off syncing. |
| wandb: Syncing run ruby-bird-20 |
| wandb: View project at https://wandb.ai/mjf-su-stanford-university/huggingface |
| wandb: View run at https://wandb.ai/mjf-su-stanford-university/huggingface/runs/ciao5cei |
| 07:47:54 INFO eval generation: 50/64 (0.17 rec/s, 288.4s elapsed, ~81s left) |
| 07:49:16 INFO eval generation: 64/64 (0.17 rec/s, 369.9s elapsed, ~0s left) |
| 07:49:29 INFO Eval @ start - n=512 |
| 07:49:29 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 07:49:29 INFO overall  0.605  0.081  0.967  0.604   0.967   0.744 |
| 07:49:29 INFO E1       0.293  0.164  0.988  0.180   0.988   0.304 |
| 07:49:29 INFO E2       0.760  0.679  0.959  0.548   0.959   0.698 |
| 07:49:29 INFO E3       0.400  0.071  0.968  0.377   0.968   0.542 |
| 07:49:29 INFO E4       0.387  0.053  0.948  0.373   0.948   0.536 |
| 07:49:29 INFO exact_match=0.064 edge_macro_acc=0.460 malformed=0.000 |
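The columns of the eval table are related through a single confusion matrix with INCONSISTENT as the positive class. The sketch below shows one way to derive them. The counts are inferred from the val label balance (209 CONSISTENT, 303 INCONSISTENT) and the rounded `overall` row at start; they are not printed in the log, and `inc_metrics` is a hypothetical helper, not the project's own function.

```python
def inc_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Per-slot metrics, treating INCONSISTENT as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "acc": (tp + tn) / (tp + fn + fp + tn),
        "acc|C": tn / (tn + fp),   # accuracy on CONSISTENT records
        "acc|I": recall,           # accuracy on INCONSISTENT records
        "P(INC)": precision,
        "R(INC)": recall,
        "F1(INC)": 2 * precision * recall / (precision + recall),
    }


# Inferred counts (an assumption): 293 of 303 INCONSISTENT records caught,
# 17 of 209 CONSISTENT records accepted.
overall_start = inc_metrics(tp=293, fn=10, fp=192, tn=17)
rounded = {name: round(value, 3) for name, value in overall_start.items()}
print(rounded)
```

Read this way, the untrained model labels almost everything INCONSISTENT: recall on that class is high while accuracy on CONSISTENT records is near zero.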
|
2% | 20/1063 [00:49<39:00, 2.24s/it]
{'loss': 0.0936, 'grad_norm': 0.13588111102581024, 'learning_rate': 5.9375e-05, 'epoch': 0.02} |
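The step total and the `epoch` field are consistent with a single pass over the 34002 training rows at an effective batch size of 32. That batch size is an inference, not something the log states; it would be the product of per-device batch size, number of processes, and gradient accumulation steps. A minimal sketch of the arithmetic:

```python
import math

TRAIN_ROWS = 34002      # from the log
EFFECTIVE_BATCH = 32    # assumed: per-device batch x processes x grad accumulation

# Number of optimizer steps needed to see every row once.
steps_per_epoch = math.ceil(TRAIN_ROWS / EFFECTIVE_BATCH)

# Fraction of an epoch completed after 20 optimizer steps.
epoch_at_step_20 = round(20 / steps_per_epoch, 2)
print(steps_per_epoch, epoch_at_step_20)
```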
|
4% | 40/1063 [01:35<38:49, 2.28s/it]
{'loss': 0.0348, 'grad_norm': 0.15987196564674377, 'learning_rate': 9.932104752667314e-05, 'epoch': 0.04} |
|
6% | 60/1063 [02:21<37:14, 2.23s/it]
{'loss': 0.0273, 'grad_norm': 0.14962074160575867, 'learning_rate': 9.73811833171678e-05, 'epoch': 0.06} |
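The first three logged learning rates fit a linear warmup followed by a linear decay to zero. The sketch below assumes a peak rate of 1e-4 and 32 warmup steps. Both values are inferred from the logged numbers rather than read from the training config, and `linear_schedule_lr` is a hypothetical helper.

```python
PEAK_LR = 1e-4       # assumed peak learning rate
WARMUP_STEPS = 32    # assumed warmup length
TOTAL_STEPS = 1063   # from the progress bar


def linear_schedule_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)


# The value logged at optimizer step N appears to match scheduler step N - 1.
for logged_step in (20, 40, 60):
    print(logged_step, linear_schedule_lr(logged_step - 1))
```

Under these assumptions the rate falls by about 9.7e-8 per step after warmup, which matches the spacing between consecutive logged values.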
|
8% | 80/1063 [03:06<36:36, 2.23s/it]
{'loss': 0.0248, 'grad_norm': 0.1912640929222107, 'learning_rate': 9.544131910766246e-05, 'epoch': 0.08} |
|
9% | 99/1063 [03:49<35:43, 2.22s/it]
| 07:58:07 INFO eval generation: 50/64 (0.17 rec/s, 286.9s elapsed, ~80s left) |
| 07:59:28 INFO eval generation: 64/64 (0.17 rec/s, 367.4s elapsed, ~0s left) |
| 07:59:43 INFO Eval @ step_100 - n=512 |
| 07:59:43 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 07:59:43 INFO overall  0.807  0.833  0.789  0.872   0.789   0.828 |
| 07:59:43 INFO E1       0.816  0.882  0.463  0.420   0.463   0.440 |
| 07:59:43 INFO E2       0.932  0.975  0.824  0.931   0.824   0.875 |
| 07:59:43 INFO E3       0.773  0.818  0.697  0.689   0.697   0.693 |
| 07:59:43 INFO E4       0.805  0.863  0.707  0.754   0.707   0.730 |
| 07:59:43 INFO exact_match=0.576 edge_macro_acc=0.832 malformed=0.000 |
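The `edge_macro_acc` value appears to be the unweighted mean of the four per-edge accuracies. This is an inference from the numbers, not something the log states. A quick check against the step_100 table, using the rounded values as printed:

```python
from statistics import mean

# Per-edge accuracies copied from the step_100 table (already rounded to 3 places).
edge_acc = {"E1": 0.816, "E2": 0.932, "E3": 0.773, "E4": 0.805}

# Unweighted (macro) average over the four edges.
edge_macro_acc = mean(list(edge_acc.values()))
print(f"{edge_macro_acc:.4f}")
```

Because the inputs are rounded, the result can differ from the logged 0.832 in the last digit.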
|
9% | 100/1063 [10:14<31:19:35, 117.11s/it]
{'loss': 0.0248, 'grad_norm': 0.10156096518039703, 'learning_rate': 9.350145489815713e-05, 'epoch': 0.09} |
|
11% | 120/1063 [10:59<38:04, 2.42s/it]
{'loss': 0.0241, 'grad_norm': 0.15984299778938293, 'learning_rate': 9.15615906886518e-05, 'epoch': 0.11} |
|
13% | 140/1063 [11:44<34:43, 2.26s/it]
{'loss': 0.0222, 'grad_norm': 0.14268864691257477, 'learning_rate': 8.962172647914647e-05, 'epoch': 0.13} |
|
15% | 160/1063 [12:30<34:31, 2.29s/it]
{'loss': 0.022, 'grad_norm': 0.09480800479650497, 'learning_rate': 8.768186226964112e-05, 'epoch': 0.15} |
|
17% | 180/1063 [13:15<33:04, 2.25s/it]
{'loss': 0.0205, 'grad_norm': 0.09908359497785568, 'learning_rate': 8.57419980601358e-05, 'epoch': 0.17} |
|
19% | 199/1063 [13:58<32:44, 2.27s/it]
| 08:08:09 INFO eval generation: 50/64 (0.18 rec/s, 279.7s elapsed, ~78s left) |
| 08:09:27 INFO eval generation: 64/64 (0.18 rec/s, 357.4s elapsed, ~0s left) |
| 08:09:48 INFO Eval @ step_200 - n=512 |
| 08:09:48 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 08:09:48 INFO overall  0.824  0.756  0.871  0.838   0.871   0.854 |
| 08:09:48 INFO E1       0.844  0.924  0.412  0.500   0.412   0.452 |
| 08:09:48 INFO E2       0.943  0.945  0.939  0.874   0.939   0.906 |
| 08:09:48 INFO E3       0.852  0.870  0.819  0.786   0.819   0.802 |
| 08:09:48 INFO E4       0.822  0.841  0.791  0.748   0.791   0.768 |
| 08:09:48 INFO exact_match=0.637 edge_macro_acc=0.865 malformed=0.000 |
|
19% | 200/1063 [20:18<27:44:27, 115.72s/it]
{'loss': 0.0211, 'grad_norm': 0.1302443891763687, 'learning_rate': 8.380213385063046e-05, 'epoch': 0.19} |
|
21% | 220/1063 [21:03<33:20, 2.37s/it]
{'loss': 0.0187, 'grad_norm': 0.14304408431053162, 'learning_rate': 8.186226964112513e-05, 'epoch': 0.21} |
|
23% | 240/1063 [21:49<31:11, 2.27s/it]
{'loss': 0.0208, 'grad_norm': 0.20394913852214813, 'learning_rate': 7.99224054316198e-05, 'epoch': 0.23} |
|
24% | 260/1063 [22:34<30:18, 2.26s/it]
{'loss': 0.019, 'grad_norm': 0.11161988973617554, 'learning_rate': 7.798254122211446e-05, 'epoch': 0.24} |
|
26% | 280/1063 [23:20<29:40, 2.27s/it]
{'loss': 0.0204, 'grad_norm': 0.11904545873403549, 'learning_rate': 7.604267701260912e-05, 'epoch': 0.26} |
|
28% | 299/1063 [24:03<28:30, 2.24s/it]
| 08:18:12 INFO eval generation: 50/64 (0.18 rec/s, 277.8s elapsed, ~78s left) |
| 08:19:31 INFO eval generation: 64/64 (0.18 rec/s, 356.5s elapsed, ~0s left) |
| 08:19:50 INFO Eval @ step_300 - n=512 |
| 08:19:50 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 08:19:50 INFO overall  0.861  0.885  0.845  0.914   0.845   0.878 |
| 08:19:50 INFO E1       0.867  0.965  0.338  0.643   0.338   0.443 |
| 08:19:50 INFO E2       0.957  0.967  0.932  0.920   0.932   0.926 |
| 08:19:50 INFO E3       0.893  0.938  0.814  0.884   0.814   0.848 |
| 08:19:50 INFO E4       0.859  0.894  0.801  0.818   0.801   0.810 |
| 08:19:50 INFO exact_match=0.697 edge_macro_acc=0.894 malformed=0.000 |
|
28% | 300/1063 [30:21<24:23:06, 115.05s/it]
{'loss': 0.0212, 'grad_norm': 0.1075235903263092, 'learning_rate': 7.410281280310378e-05, 'epoch': 0.28} |
|
30% | 320/1063 [31:08<28:58, 2.34s/it]
{'loss': 0.0177, 'grad_norm': 0.08106118440628052, 'learning_rate': 7.216294859359845e-05, 'epoch': 0.3} |
|
32% | 340/1063 [31:53<27:38, 2.29s/it]
{'loss': 0.0182, 'grad_norm': 0.07731106132268906, 'learning_rate': 7.022308438409312e-05, 'epoch': 0.32} |
|
32%|βββββββββββββββββββββββββββββββββββββ | 341/1063 [31:56<27:37, 2.30s/it]
32%|βββββββββββββββββββββββββββββββββββββ | 342/1063 [31:58<27:37, 2.30s/it]
32%|βββββββββββββββββββββββββββββββββββββ | 343/1063 [32:00<27:28, 2.29s/it]
32%|βββββββββββββββββββββββββββββββββββββ | 344/1063 [32:03<27:32, 2.30s/it]
32%|βββββββββββββββββββββββββββββββββββββ | 345/1063 [32:05<27:28, 2.30s/it]
33%|βββββββββββββββββββββββββββββββββββββ | 346/1063 [32:07<27:11, 2.28s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 347/1063 [32:09<26:55, 2.26s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 348/1063 [32:12<26:50, 2.25s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 349/1063 [32:14<26:49, 2.25s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 350/1063 [32:16<26:46, 2.25s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 351/1063 [32:18<26:33, 2.24s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 352/1063 [32:21<26:28, 2.23s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 353/1063 [32:23<26:27, 2.24s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 354/1063 [32:25<26:36, 2.25s/it]
33%|ββββββββββββββββββββββββββββββββββββββ | 355/1063 [32:27<26:35, 2.25s/it]
33%|βββββββββββββββββββββββββββββββββββββββ | 356/1063 [32:30<26:40, 2.26s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 357/1063 [32:32<26:27, 2.25s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 358/1063 [32:34<26:13, 2.23s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 359/1063 [32:36<26:04, 2.22s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 360/1063 [32:38<26:08, 2.23s/it]
{'loss': 0.0178, 'grad_norm': 0.1482355296611786, 'learning_rate': 6.828322017458779e-05, 'epoch': 0.34} |
|
34%|βββββββββββββββββββββββββββββββββββββββ | 360/1063 [32:39<26:08, 2.23s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 361/1063 [32:41<26:17, 2.25s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 362/1063 [32:43<26:04, 2.23s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 363/1063 [32:45<26:12, 2.25s/it]
34%|βββββββββββββββββββββββββββββββββββββββ | 364/1063 [32:48<26:11, 2.25s/it]
34%|ββββββββββββββββββββββββββββββββββββββββ | 365/1063 [32:50<26:13, 2.25s/it]
34%|ββββββββββββββββββββββββββββββββββββββββ | 366/1063 [32:52<26:06, 2.25s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 367/1063 [32:54<26:04, 2.25s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 368/1063 [32:57<26:04, 2.25s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 369/1063 [32:59<26:05, 2.26s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 370/1063 [33:01<25:55, 2.24s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 371/1063 [33:03<25:41, 2.23s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 372/1063 [33:05<25:46, 2.24s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 373/1063 [33:08<25:43, 2.24s/it]
35%|ββββββββββββββββββββββββββββββββββββββββ | 374/1063 [33:10<25:40, 2.24s/it]
35%|βββββββββββββββββββββββββββββββββββββββββ | 375/1063 [33:12<25:44, 2.25s/it]
35%|βββββββββββββββββββββββββββββββββββββββββ | 376/1063 [33:14<25:46, 2.25s/it]
35%|βββββββββββββββββββββββββββββββββββββββββ | 377/1063 [33:17<25:43, 2.25s/it]
36%|βββββββββββββββββββββββββββββββββββββββββ | 378/1063 [33:19<25:39, 2.25s/it]
36%|βββββββββββββββββββββββββββββββββββββββββ | 379/1063 [33:21<25:32, 2.24s/it]
36%|βββββββββββββββββββββββββββββββββββββββββ | 380/1063 [33:23<25:22, 2.23s/it]
{'loss': 0.018, 'grad_norm': 0.07897721230983734, 'learning_rate': 6.634335596508244e-05, 'epoch': 0.36} |
|
36%|βββββββββββββββββββββββββββββββββββββββββ | 381/1063 [33:26<25:25, 2.24s/it]
36%|βββββββββββββββββββββββββββββββββββββββββ | 382/1063 [33:28<25:27, 2.24s/it]
36%|βββββββββββββββββββββββββββββββββββββββββ | 383/1063 [33:30<25:18, 2.23s/it]
36%|ββββββββββββββββββββββββββββββββββββββββββ | 384/1063 [33:32<25:20, 2.24s/it]
36%|ββββββββββββββββββββββββββββββββββββββββββ | 385/1063 [33:35<25:24, 2.25s/it]
36%|ββββββββββββββββββββββββββββββββββββββββββ | 386/1063 [33:37<25:23, 2.25s/it]
36%|ββββββββββββββββββββββββββββββββββββββββββ | 387/1063 [33:39<25:21, 2.25s/it]
37%|ββββββββββββββββββββββββββββββββββββββββββ | 388/1063 [33:41<25:23, 2.26s/it]
37%|ββββββββββββββββββββββββββββββββββββββββββ | 389/1063 [33:44<25:23, 2.26s/it]
37%|ββββββββββββββββββββββββββββββββββββββββββ | 390/1063 [33:46<25:19, 2.26s/it]
37%|ββββββββββββββββββββββββββββββββββββββββββ | 391/1063 [33:48<25:18, 2.26s/it]
37%|ββββββββββββββββββββββββββββββββββββββββββ | 392/1063 [33:50<25:21, 2.27s/it]
37%|βββββββββββββββββββββββββββββββββββββββββββ | 393/1063 [33:53<25:21, 2.27s/it]
37%|βββββββββββββββββββββββββββββββββββββββββββ | 394/1063 [33:55<25:11, 2.26s/it]
37%|βββββββββββββββββββββββββββββββββββββββββββ | 395/1063 [33:57<25:15, 2.27s/it]
37%|βββββββββββββββββββββββββββββββββββββββββββ | 396/1063 [34:00<25:14, 2.27s/it]
37%|βββββββββββββββββββββββββββββββββββββββββββ | 397/1063 [34:02<24:55, 2.25s/it]
37%|βββββββββββββββββββββββββββββββββββββββββββ | 398/1063 [34:04<24:56, 2.25s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 399/1063 [34:06<25:00, 2.26s/it]
08:28:10 INFO eval generation: 50/64 (0.18 rec/s, 271.7s elapsed, ~76s left) |
| 08:29:25 INFO eval generation: 64/64 (0.18 rec/s, 347.0s elapsed, ~0s left) |
| 08:29:55 INFO Eval @ step_400 - n=512 |
| 08:29:55 INFO   slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 08:29:55 INFO   overall  0.881  0.928  0.848  0.945   0.848   0.894 |
| 08:29:55 INFO   E1       0.891  0.972  0.450  0.750   0.450   0.563 |
| 08:29:55 INFO   E2       0.959  0.970  0.932  0.926   0.932   0.929 |
| 08:29:55 INFO   E3       0.906  0.960  0.814  0.922   0.814   0.864 |
| 08:29:55 INFO   E4       0.871  0.910  0.806  0.842   0.806   0.824 |
| 08:29:55 INFO exact_match=0.729 edge_macro_acc=0.907 malformed=0.000 |
|
38%|ββββββββββββββββββββββββββββββββββββββββββ | 400/1063 [40:26<21:15:25, 115.42s/it]
{'loss': 0.0168, 'grad_norm': 0.1120094284415245, 'learning_rate': 6.440349175557712e-05, 'epoch': 0.38} |
|
38%|ββββββββββββββββββββββββββββββββββββββββββ | 401/1063 [40:28<14:58:53, 81.47s/it]
38%|ββββββββββββββββββββββββββββββββββββββββββ | 402/1063 [40:30<10:35:41, 57.70s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 403/1063 [40:33<7:31:47, 41.07s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 404/1063 [40:35<5:23:04, 29.41s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 405/1063 [40:37<3:52:59, 21.24s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 406/1063 [40:39<2:50:11, 15.54s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 407/1063 [40:41<2:06:20, 11.56s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 408/1063 [40:44<1:35:40, 8.76s/it]
38%|βββββββββββββββββββββββββββββββββββββββββββ | 409/1063 [40:46<1:14:13, 6.81s/it]
39%|ββββββββββββββββββββββββββββββββββββββββββββ | 410/1063 [40:48<59:07, 5.43s/it]
39%|ββββββββββββββββββββββββββββββββββββββββββββ | 411/1063 [40:50<48:40, 4.48s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 412/1063 [40:53<41:19, 3.81s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 413/1063 [40:55<36:09, 3.34s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 414/1063 [40:57<32:25, 3.00s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 415/1063 [40:59<30:07, 2.79s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 416/1063 [41:02<28:26, 2.64s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 417/1063 [41:04<27:10, 2.52s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 418/1063 [41:06<26:22, 2.45s/it]
39%|βββββββββββββββββββββββββββββββββββββββββββββ | 419/1063 [41:08<25:50, 2.41s/it]
40%|βββββββββββββββββββββββββββββββββββββββββββββ | 420/1063 [41:11<25:20, 2.36s/it]
{'loss': 0.016, 'grad_norm': 0.16071178019046783, 'learning_rate': 6.246362754607178e-05, 'epoch': 0.4} |
|
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 421/1063 [41:13<26:18, 2.46s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 422/1063 [41:16<25:47, 2.41s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 423/1063 [41:18<25:12, 2.36s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 424/1063 [41:20<24:57, 2.34s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 425/1063 [41:23<24:50, 2.34s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 426/1063 [41:25<24:27, 2.30s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 427/1063 [41:27<24:24, 2.30s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 428/1063 [41:29<24:25, 2.31s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 429/1063 [41:32<24:03, 2.28s/it]
40%|ββββββββββββββββββββββββββββββββββββββββββββββ | 430/1063 [41:34<23:55, 2.27s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 431/1063 [41:36<23:59, 2.28s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 432/1063 [41:38<23:45, 2.26s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 433/1063 [41:41<23:40, 2.26s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 434/1063 [41:43<23:49, 2.27s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 435/1063 [41:45<23:49, 2.28s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 436/1063 [41:48<23:55, 2.29s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 437/1063 [41:50<23:46, 2.28s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 438/1063 [41:52<23:46, 2.28s/it]
41%|βββββββββββββββββββββββββββββββββββββββββββββββ | 439/1063 [41:54<23:47, 2.29s/it]
41%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 440/1063 [41:57<23:49, 2.29s/it]
{'loss': 0.0159, 'grad_norm': 0.10165177285671234, 'learning_rate': 6.0523763336566445e-05, 'epoch': 0.41} |
|
41%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 441/1063 [41:59<23:45, 2.29s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 442/1063 [42:01<23:46, 2.30s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 443/1063 [42:04<23:43, 2.30s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 444/1063 [42:06<23:42, 2.30s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 445/1063 [42:08<23:39, 2.30s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 446/1063 [42:11<23:35, 2.29s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 447/1063 [42:13<23:34, 2.30s/it]
42%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 448/1063 [42:15<23:17, 2.27s/it]
42%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 449/1063 [42:17<23:21, 2.28s/it]
42%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 450/1063 [42:20<23:13, 2.27s/it]
42%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 451/1063 [42:22<23:16, 2.28s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 452/1063 [42:24<23:16, 2.29s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 453/1063 [42:26<23:13, 2.28s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 454/1063 [42:29<23:13, 2.29s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 455/1063 [42:31<23:20, 2.30s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 456/1063 [42:33<23:17, 2.30s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 457/1063 [42:36<23:15, 2.30s/it]
43%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 458/1063 [42:38<23:03, 2.29s/it]
43%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 459/1063 [42:40<23:00, 2.29s/it]
43%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 460/1063 [42:42<22:52, 2.28s/it]
{'loss': 0.0167, 'grad_norm': 0.08946932107210159, 'learning_rate': 5.8583899127061106e-05, 'epoch': 0.43} |
|
43%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 461/1063 [42:45<22:49, 2.28s/it]
43%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 462/1063 [42:47<22:31, 2.25s/it]
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 463/1063 [42:49<22:28, 2.25s/it]
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 464/1063 [42:51<22:36, 2.26s/it]
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 465/1063 [42:54<22:41, 2.28s/it]
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 466/1063 [42:56<22:36, 2.27s/it]
44%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 467/1063 [42:58<22:30, 2.27s/it]
44%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 468/1063 [43:01<22:15, 2.25s/it]
44%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 469/1063 [43:03<22:05, 2.23s/it]
44%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 470/1063 [43:05<21:59, 2.22s/it]
44%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 471/1063 [43:07<22:01, 2.23s/it]
44%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 472/1063 [43:09<21:57, 2.23s/it]
44%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 473/1063 [43:12<22:05, 2.25s/it]
45%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 474/1063 [43:14<22:08, 2.26s/it]
45%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 475/1063 [43:16<22:12, 2.27s/it]
45%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 476/1063 [43:19<22:13, 2.27s/it]
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 477/1063 [43:21<22:11, 2.27s/it]
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 478/1063 [43:23<22:09, 2.27s/it]
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 479/1063 [43:25<22:06, 2.27s/it]
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 480/1063 [43:28<22:05, 2.27s/it]
{'loss': 0.016, 'grad_norm': 0.07830841839313507, 'learning_rate': 5.664403491755578e-05, 'epoch': 0.45} |
|
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 481/1063 [43:30<22:10, 2.29s/it]
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 482/1063 [43:32<22:01, 2.28s/it]
45%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 483/1063 [43:34<21:47, 2.25s/it]
46%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 484/1063 [43:37<21:38, 2.24s/it]
46%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 485/1063 [43:39<21:50, 2.27s/it]
46%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 486/1063 [43:41<21:45, 2.26s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 487/1063 [43:44<21:55, 2.28s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 488/1063 [43:46<21:48, 2.28s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 489/1063 [43:48<21:51, 2.29s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 490/1063 [43:50<21:43, 2.27s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 491/1063 [43:53<21:42, 2.28s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 492/1063 [43:55<21:38, 2.27s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 493/1063 [43:57<21:38, 2.28s/it]
46%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 494/1063 [43:59<21:41, 2.29s/it]
47%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 495/1063 [44:02<21:41, 2.29s/it]
47%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 496/1063 [44:04<21:43, 2.30s/it]
47%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 497/1063 [44:06<21:30, 2.28s/it]
47%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 498/1063 [44:09<21:22, 2.27s/it]
47%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 499/1063 [44:11<21:24, 2.28s/it]
08:38:17 INFO eval generation: 50/64 (0.18 rec/s, 274.6s elapsed, ~77s left) |
| 08:39:34 INFO eval generation: 64/64 (0.18 rec/s, 351.7s elapsed, ~0s left) |
| 08:40:00 INFO Eval @ step_500 - n=512 |
| 08:40:00 INFO   slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 08:40:00 INFO   overall  0.871  0.833  0.898  0.886   0.898   0.892 |
| 08:40:00 INFO   E1       0.887  0.944  0.575  0.657   0.575   0.613 |
| 08:40:00 INFO   E2       0.938  0.929  0.959  0.845   0.959   0.899 |
| 08:40:00 INFO   E3       0.891  0.892  0.888  0.827   0.888   0.856 |
| 08:40:00 INFO   E4       0.877  0.885  0.864  0.817   0.864   0.840 |
| 08:40:00 INFO exact_match=0.715 edge_macro_acc=0.898 malformed=0.000 |
|
47%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 500/1063 [50:31<18:05:03, 115.64s/it]
{'loss': 0.0168, 'grad_norm': 0.23986610770225525, 'learning_rate': 5.470417070805044e-05, 'epoch': 0.47} |
|
47%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 501/1063 [50:33<12:44:30, 81.62s/it]
47%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 502/1063 [50:35<9:00:26, 57.80s/it]
47%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 503/1063 [50:38<6:24:01, 41.15s/it]
47%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 504/1063 [50:40<4:34:39, 29.48s/it]
48%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 505/1063 [50:42<3:18:17, 21.32s/it]
48%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 506/1063 [50:45<2:24:51, 15.60s/it]
48%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 507/1063 [50:47<1:47:34, 11.61s/it]
48%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 508/1063 [50:49<1:21:29, 8.81s/it]
48%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 509/1063 [50:51<1:03:19, 6.86s/it]
48%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 510/1063 [50:54<50:27, 5.47s/it]
48%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 511/1063 [50:56<41:31, 4.51s/it]
48%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 512/1063 [50:58<35:10, 3.83s/it]
48%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 513/1063 [51:00<30:50, 3.36s/it]
48%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 514/1063 [51:03<27:34, 3.01s/it]
48%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 515/1063 [51:05<25:22, 2.78s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 516/1063 [51:07<23:46, 2.61s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 517/1063 [51:09<22:39, 2.49s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 518/1063 [51:12<22:03, 2.43s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 519/1063 [51:14<21:31, 2.37s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 520/1063 [51:16<21:05, 2.33s/it]
{'loss': 0.0169, 'grad_norm': 0.07184334099292755, 'learning_rate': 5.27643064985451e-05, 'epoch': 0.49} |
|
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 521/1063 [51:18<20:47, 2.30s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 522/1063 [51:21<20:30, 2.28s/it]
49%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 523/1063 [51:23<20:17, 2.26s/it]
49%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 524/1063 [51:25<20:12, 2.25s/it]
49%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 525/1063 [51:27<20:19, 2.27s/it]
49%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 526/1063 [51:30<20:21, 2.27s/it]
50%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 527/1063 [51:32<20:21, 2.28s/it]
50%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 528/1063 [51:34<20:03, 2.25s/it]
50%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 529/1063 [51:36<19:55, 2.24s/it]
50%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 530/1063 [51:38<19:48, 2.23s/it]
50%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 531/1063 [51:41<19:48, 2.23s/it]
50%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 532/1063 [51:43<19:40, 2.22s/it]
50%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 533/1063 [51:45<19:41, 2.23s/it]
50%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 534/1063 [51:47<19:32, 2.22s/it]
50%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 535/1063 [51:50<19:37, 2.23s/it]
50%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 536/1063 [51:52<19:44, 2.25s/it]
51%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 537/1063 [51:54<19:49, 2.26s/it]
51%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 538/1063 [51:56<19:43, 2.25s/it]
51%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 539/1063 [51:59<19:40, 2.25s/it]
51%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 540/1063 [52:01<19:37, 2.25s/it]
{'loss': 0.0148, 'grad_norm': 0.0486169196665287, 'learning_rate': 5.0824442289039763e-05, 'epoch': 0.51} |
|
51%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 541/1063 [52:03<19:30, 2.24s/it]
51%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 542/1063 [52:05<19:20, 2.23s/it]
51%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 543/1063 [52:08<19:30, 2.25s/it]
51%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 544/1063 [52:10<19:27, 2.25s/it]
51%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 545/1063 [52:12<19:30, 2.26s/it]
51%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 546/1063 [52:14<19:22, 2.25s/it]
51%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 547/1063 [52:17<19:11, 2.23s/it]
52%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 548/1063 [52:19<19:15, 2.24s/it]
52%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 549/1063 [52:21<19:09, 2.24s/it]
52%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 550/1063 [52:23<19:12, 2.25s/it]
52%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 551/1063 [52:26<19:03, 2.23s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 552/1063 [52:28<19:02, 2.24s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 553/1063 [52:30<18:57, 2.23s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 554/1063 [52:32<18:55, 2.23s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 555/1063 [52:34<18:44, 2.21s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 556/1063 [52:37<18:48, 2.23s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 557/1063 [52:39<18:47, 2.23s/it]
52%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 558/1063 [52:41<18:42, 2.22s/it]
53%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 559/1063 [52:43<18:45, 2.23s/it]
53%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 560/1063 [52:46<18:44, 2.24s/it]
{'loss': 0.0173, 'grad_norm': 0.11095824092626572, 'learning_rate': 4.888457807953444e-05, 'epoch': 0.53} |
|
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 561/1063 [52:48<18:46, 2.24s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 562/1063 [52:50<18:45, 2.25s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 563/1063 [52:52<18:45, 2.25s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 564/1063 [52:55<18:41, 2.25s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 565/1063 [52:57<18:36, 2.24s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 566/1063 [52:59<18:35, 2.24s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 567/1063 [53:01<18:35, 2.25s/it]
53%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 568/1063 [53:04<18:33, 2.25s/it]
54%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 569/1063 [53:06<18:34, 2.26s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 570/1063 [53:08<18:27, 2.25s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 571/1063 [53:10<18:21, 2.24s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 572/1063 [53:13<18:20, 2.24s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 573/1063 [53:15<18:15, 2.24s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 574/1063 [53:17<18:19, 2.25s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 575/1063 [53:19<18:09, 2.23s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 576/1063 [53:22<18:11, 2.24s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 577/1063 [53:24<18:04, 2.23s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 578/1063 [53:26<17:52, 2.21s/it]
54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 579/1063 [53:28<17:58, 2.23s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 580/1063 [53:30<18:00, 2.24s/it]
{'loss': 0.0149, 'grad_norm': 0.11410392075777054, 'learning_rate': 4.69447138700291e-05, 'epoch': 0.55} |
|
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 581/1063 [53:33<17:49, 2.22s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 582/1063 [53:35<17:53, 2.23s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 583/1063 [53:37<17:46, 2.22s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 584/1063 [53:39<17:55, 2.24s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 585/1063 [53:42<17:57, 2.25s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 586/1063 [53:44<17:56, 2.26s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 587/1063 [53:46<17:53, 2.26s/it]
55%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 588/1063 [53:48<17:53, 2.26s/it]
55%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 589/1063 [53:51<17:53, 2.26s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 590/1063 [53:53<17:57, 2.28s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 591/1063 [53:55<17:50, 2.27s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 592/1063 [53:58<17:51, 2.27s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 593/1063 [54:00<17:41, 2.26s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 594/1063 [54:02<17:40, 2.26s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 595/1063 [54:04<17:42, 2.27s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 596/1063 [54:07<18:15, 2.35s/it]
56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 597/1063 [54:09<18:08, 2.34s/it]
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 598/1063 [54:11<17:55, 2.31s/it]
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 599/1063 [54:14<17:48, 2.30s/it]
08:48:21 INFO eval generation: 50/64 (0.18 rec/s, 276.0s elapsed, ~77s left) |
| 08:49:40 INFO eval generation: 64/64 (0.18 rec/s, 355.0s elapsed, ~0s left) |
| 08:50:08 INFO Eval @ step_600 - n=512 |
| 08:50:08 INFO   slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 08:50:08 INFO   overall  0.867  0.947  0.812  0.957   0.812   0.879 |
| 08:50:08 INFO   E1       0.889  0.972  0.438  0.745   0.438   0.551 |
| 08:50:08 INFO   E2       0.967  0.978  0.939  0.946   0.939   0.942 |
| 08:50:08 INFO   E3       0.889  0.969  0.750  0.934   0.750   0.832 |
| 08:50:08 INFO   E4       0.891  0.956  0.780  0.914   0.780   0.842 |
| 08:50:08 INFO exact_match=0.729 edge_macro_acc=0.909 malformed=0.000 |
|
56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 600/1063 [1:00:39<15:03:35, 117.10s/it]
{'loss': 0.0159, 'grad_norm': 0.13476960361003876, 'learning_rate': 4.500484966052377e-05, 'epoch': 0.56} |
|
57%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 601/1063 [1:00:43<10:41:12, 83.27s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 602/1063 [1:00:45<7:33:02, 58.96s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 603/1063 [1:00:48<5:21:42, 41.96s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 604/1063 [1:00:50<3:49:48, 30.04s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 605/1063 [1:00:52<2:45:37, 21.70s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 606/1063 [1:00:54<2:00:53, 15.87s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 607/1063 [1:00:57<1:29:34, 11.79s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 608/1063 [1:00:59<1:07:40, 8.92s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 609/1063 [1:01:01<52:17, 6.91s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 610/1063 [1:01:03<41:29, 5.50s/it]
57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 611/1063 [1:01:06<34:13, 4.54s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 612/1063 [1:01:08<29:05, 3.87s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 613/1063 [1:01:10<25:24, 3.39s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 614/1063 [1:01:12<22:45, 3.04s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 615/1063 [1:01:15<20:50, 2.79s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 616/1063 [1:01:17<19:30, 2.62s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 617/1063 [1:01:19<18:35, 2.50s/it]
58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 618/1063 [1:01:21<17:57, 2.42s/it]
58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 619/1063 [1:01:23<17:26, 2.36s/it]
58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 620/1063 [1:01:26<17:09, 2.32s/it]
{'loss': 0.0158, 'grad_norm': 0.22448623180389404, 'learning_rate': 4.306498545101843e-05, 'epoch': 0.58} |
|
58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 621/1063 [1:01:28<17:02, 2.31s/it]
59%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 622/1063 [1:01:30<17:00, 2.31s/it]
59%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 623/1063 [1:01:33<16:51, 2.30s/it]
59%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 624/1063 [1:01:35<16:44, 2.29s/it]
59%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 625/1063 [1:01:37<16:42, 2.29s/it]
59%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 626/1063 [1:01:39<16:30, 2.27s/it]
59%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 627/1063 [1:01:41<16:21, 2.25s/it]
59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 628/1063 [1:01:44<16:21, 2.26s/it]
59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 629/1063 [1:01:46<16:15, 2.25s/it]
59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 630/1063 [1:01:48<16:15, 2.25s/it]
59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 631/1063 [1:01:50<16:12, 2.25s/it]
59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 632/1063 [1:01:53<16:12, 2.26s/it]
60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 633/1063 [1:01:55<16:16, 2.27s/it]
60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 634/1063 [1:01:57<16:19, 2.28s/it]
60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 635/1063 [1:02:00<16:16, 2.28s/it]
60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 636/1063 [1:02:02<16:13, 2.28s/it]
60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 637/1063 [1:02:04<16:11, 2.28s/it]
60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 638/1063 [1:02:07<16:09, 2.28s/it]
60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 639/1063 [1:02:09<16:06, 2.28s/it]
60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 640/1063 [1:02:11<16:04, 2.28s/it]
{'loss': 0.0163, 'grad_norm': 0.3091525435447693, 'learning_rate': 4.1125121241513096e-05, 'epoch': 0.6} |
|
60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 641/1063 [1:02:13<15:55, 2.26s/it]
60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 642/1063 [1:02:16<15:58, 2.28s/it]
60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 643/1063 [1:02:18<15:57, 2.28s/it]
61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 644/1063 [1:02:20<15:49, 2.27s/it]
61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 645/1063 [1:02:22<15:44, 2.26s/it]
61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 646/1063 [1:02:25<15:43, 2.26s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 647/1063 [1:02:27<15:37, 2.25s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 648/1063 [1:02:29<15:28, 2.24s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 649/1063 [1:02:31<15:31, 2.25s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 650/1063 [1:02:34<15:26, 2.24s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 651/1063 [1:02:36<15:31, 2.26s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 652/1063 [1:02:38<15:33, 2.27s/it]
61%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 653/1063 [1:02:40<15:32, 2.27s/it]
62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 654/1063 [1:02:43<15:28, 2.27s/it]
62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 655/1063 [1:02:45<15:24, 2.26s/it]
62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 656/1063 [1:02:47<15:24, 2.27s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 657/1063 [1:02:49<15:16, 2.26s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 658/1063 [1:02:52<15:21, 2.27s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 659/1063 [1:02:54<15:15, 2.27s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 660/1063 [1:02:56<15:02, 2.24s/it]
{'loss': 0.0153, 'grad_norm': 0.06753229349851608, 'learning_rate': 3.9185257032007764e-05, 'epoch': 0.62} |
|
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 661/1063 [1:02:58<15:01, 2.24s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 662/1063 [1:03:01<14:59, 2.24s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 663/1063 [1:03:03<14:55, 2.24s/it]
62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 664/1063 [1:03:05<14:58, 2.25s/it]
63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 665/1063 [1:03:07<14:59, 2.26s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 666/1063 [1:03:10<15:00, 2.27s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 667/1063 [1:03:12<15:02, 2.28s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 668/1063 [1:03:14<15:04, 2.29s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 669/1063 [1:03:17<15:03, 2.29s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 670/1063 [1:03:19<15:00, 2.29s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 671/1063 [1:03:21<14:54, 2.28s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 672/1063 [1:03:24<14:50, 2.28s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 673/1063 [1:03:26<14:51, 2.28s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 674/1063 [1:03:28<14:45, 2.28s/it]
63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 675/1063 [1:03:30<14:38, 2.27s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 676/1063 [1:03:33<14:34, 2.26s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 677/1063 [1:03:35<14:32, 2.26s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 678/1063 [1:03:37<14:30, 2.26s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 679/1063 [1:03:39<14:28, 2.26s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 680/1063 [1:03:42<14:20, 2.25s/it]
{'loss': 0.0137, 'grad_norm': 0.16454404592514038, 'learning_rate': 3.724539282250243e-05, 'epoch': 0.64} |
|
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 681/1063 [1:03:44<14:16, 2.24s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 682/1063 [1:03:46<14:16, 2.25s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 683/1063 [1:03:48<14:17, 2.26s/it]
64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 684/1063 [1:03:51<14:15, 2.26s/it]
64%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 685/1063 [1:03:53<14:13, 2.26s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 686/1063 [1:03:55<14:13, 2.26s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 687/1063 [1:03:57<14:13, 2.27s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 688/1063 [1:04:00<14:03, 2.25s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 689/1063 [1:04:02<14:06, 2.26s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 690/1063 [1:04:04<14:09, 2.28s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 691/1063 [1:04:06<14:02, 2.26s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 692/1063 [1:04:09<14:01, 2.27s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 693/1063 [1:04:11<14:00, 2.27s/it]
65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 694/1063 [1:04:13<14:04, 2.29s/it]
65%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 695/1063 [1:04:16<14:05, 2.30s/it]
65%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 696/1063 [1:04:18<14:04, 2.30s/it]
66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 697/1063 [1:04:20<14:03, 2.30s/it]
66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 698/1063 [1:04:23<13:57, 2.29s/it]
66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 699/1063 [1:04:25<13:54, 2.29s/it]
08:58:26 INFO eval generation: 50/64 (0.19 rec/s, 269.3s elapsed, ~75s left) |
| 08:59:41 INFO eval generation: 64/64 (0.19 rec/s, 344.9s elapsed, ~0s left) |
| 09:00:10 INFO Eval @ step_700 - n=512 |
| 09:00:10 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 09:00:10 INFO overall  0.891  0.885  0.894  0.919   0.894   0.906 |
| 09:00:10 INFO E1       0.865  0.896  0.700  0.554   0.700   0.619 |
| 09:00:10 INFO E2       0.971  0.984  0.939  0.959   0.939   0.949 |
| 09:00:10 INFO E3       0.912  0.944  0.856  0.899   0.856   0.877 |
| 09:00:10 INFO E4       0.891  0.903  0.869  0.843   0.869   0.856 |
| 09:00:10 INFO exact_match=0.736 edge_macro_acc=0.910 malformed=0.000 |
|
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 700/1063 [1:10:41<11:31:41, 114.33s/it]
{'loss': 0.0159, 'grad_norm': 0.10377257317304611, 'learning_rate': 3.530552861299709e-05, 'epoch': 0.66} |
|
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 701/1063 [1:10:43<8:06:50, 80.69s/it]
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 702/1063 [1:10:45<5:43:49, 57.15s/it]
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 703/1063 [1:10:47<4:04:08, 40.69s/it]
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 704/1063 [1:10:49<2:54:22, 29.14s/it]
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 705/1063 [1:10:52<2:05:41, 21.07s/it]
66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 706/1063 [1:10:54<1:31:42, 15.41s/it]
67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 707/1063 [1:10:56<1:08:01, 11.46s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 708/1063 [1:10:58<51:32, 8.71s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 709/1063 [1:11:01<39:54, 6.76s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 710/1063 [1:11:03<31:43, 5.39s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 711/1063 [1:11:05<26:03, 4.44s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 712/1063 [1:11:07<22:07, 3.78s/it]
67%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 713/1063 [1:11:10<19:21, 3.32s/it]
67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 714/1063 [1:11:12<17:20, 2.98s/it]
67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 715/1063 [1:11:14<16:04, 2.77s/it]
67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 716/1063 [1:11:16<15:07, 2.61s/it]
67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 717/1063 [1:11:18<14:21, 2.49s/it]
68%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 718/1063 [1:11:21<13:51, 2.41s/it]
68%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 719/1063 [1:11:23<13:30, 2.35s/it]
68%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 720/1063 [1:11:25<13:10, 2.31s/it]
{'loss': 0.0154, 'grad_norm': 0.08269070833921432, 'learning_rate': 3.336566440349176e-05, 'epoch': 0.68} |
|
68%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 721/1063 [1:11:27<13:03, 2.29s/it]
68%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 722/1063 [1:11:30<13:02, 2.29s/it]
68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 723/1063 [1:11:32<12:57, 2.29s/it]
68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 724/1063 [1:11:34<12:53, 2.28s/it]
68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 725/1063 [1:11:36<12:39, 2.25s/it]
68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 726/1063 [1:11:39<12:36, 2.25s/it]
68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 727/1063 [1:11:41<12:38, 2.26s/it]
68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 728/1063 [1:11:43<12:37, 2.26s/it]
69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 729/1063 [1:11:45<12:37, 2.27s/it]
69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 730/1063 [1:11:48<12:32, 2.26s/it]
69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 731/1063 [1:11:50<12:32, 2.27s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 732/1063 [1:11:52<12:29, 2.26s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 733/1063 [1:11:55<12:29, 2.27s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 734/1063 [1:11:57<12:23, 2.26s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 735/1063 [1:11:59<12:23, 2.27s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 736/1063 [1:12:01<12:19, 2.26s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 737/1063 [1:12:04<12:18, 2.27s/it]
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 738/1063 [1:12:06<12:18, 2.27s/it]
70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 739/1063 [1:12:08<12:16, 2.27s/it]
70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 740/1063 [1:12:10<12:10, 2.26s/it]
{'loss': 0.0164, 'grad_norm': 0.13774652779102325, 'learning_rate': 3.142580019398642e-05, 'epoch': 0.7} |
|
70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 741/1063 [1:12:13<12:09, 2.27s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 742/1063 [1:12:15<12:02, 2.25s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 743/1063 [1:12:17<11:56, 2.24s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 744/1063 [1:12:19<11:57, 2.25s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 745/1063 [1:12:22<11:54, 2.25s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 746/1063 [1:12:24<11:56, 2.26s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 747/1063 [1:12:26<11:56, 2.27s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 748/1063 [1:12:28<11:49, 2.25s/it]
70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 749/1063 [1:12:31<11:49, 2.26s/it]
71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 750/1063 [1:12:33<11:48, 2.26s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 751/1063 [1:12:35<11:51, 2.28s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 752/1063 [1:12:38<11:48, 2.28s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 753/1063 [1:12:40<11:43, 2.27s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 754/1063 [1:12:42<11:37, 2.26s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 755/1063 [1:12:44<11:35, 2.26s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 756/1063 [1:12:47<11:32, 2.26s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 757/1063 [1:12:49<11:31, 2.26s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 758/1063 [1:12:51<11:24, 2.24s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 759/1063 [1:12:53<11:25, 2.25s/it]
71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 760/1063 [1:12:55<11:19, 2.24s/it]
{'loss': 0.0289, 'grad_norm': 0.0529085136950016, 'learning_rate': 2.948593598448109e-05, 'epoch': 0.71} |
|
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 761/1063 [1:12:58<11:19, 2.25s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 762/1063 [1:13:00<11:18, 2.25s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 763/1063 [1:13:02<11:18, 2.26s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 764/1063 [1:13:05<11:20, 2.27s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 765/1063 [1:13:07<11:16, 2.27s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 766/1063 [1:13:09<11:14, 2.27s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 767/1063 [1:13:11<11:13, 2.28s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 768/1063 [1:13:14<11:08, 2.27s/it]
72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 769/1063 [1:13:16<11:06, 2.27s/it]
72%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 770/1063 [1:13:18<11:05, 2.27s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 771/1063 [1:13:21<11:06, 2.28s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 772/1063 [1:13:23<11:02, 2.28s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 773/1063 [1:13:25<11:00, 2.28s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 774/1063 [1:13:27<10:59, 2.28s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 775/1063 [1:13:30<10:53, 2.27s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 776/1063 [1:13:32<10:52, 2.27s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 777/1063 [1:13:34<10:52, 2.28s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 778/1063 [1:13:37<10:52, 2.29s/it]
73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 779/1063 [1:13:39<10:42, 2.26s/it]
73%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 780/1063 [1:13:41<10:44, 2.28s/it]
{'loss': 0.015, 'grad_norm': 0.07344073057174683, 'learning_rate': 2.754607177497575e-05, 'epoch': 0.73} |
|
73%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 781/1063 [1:13:43<10:44, 2.29s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 782/1063 [1:13:46<10:45, 2.30s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 783/1063 [1:13:48<10:37, 2.28s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 784/1063 [1:13:50<10:36, 2.28s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 785/1063 [1:13:52<10:35, 2.29s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 786/1063 [1:13:55<10:32, 2.28s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 787/1063 [1:13:57<10:26, 2.27s/it]
74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 788/1063 [1:13:59<10:25, 2.28s/it]
74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 789/1063 [1:14:02<10:24, 2.28s/it]
74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 790/1063 [1:14:04<10:24, 2.29s/it]
74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 791/1063 [1:14:06<10:23, 2.29s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 792/1063 [1:14:08<10:19, 2.29s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 793/1063 [1:14:11<10:16, 2.28s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 794/1063 [1:14:13<10:13, 2.28s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 795/1063 [1:14:15<10:07, 2.27s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 796/1063 [1:14:17<10:01, 2.25s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 797/1063 [1:14:20<09:57, 2.25s/it]
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 798/1063 [1:14:22<09:57, 2.25s/it]
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 799/1063 [1:14:24<09:58, 2.27s/it]
09:08:28 INFO eval generation: 50/64 (0.18 rec/s, 272.0s elapsed, ~76s left) |
| 09:09:44 INFO eval generation: 64/64 (0.18 rec/s, 348.4s elapsed, ~0s left) |
| 09:10:05 INFO Eval @ step_800 - n=512 |
| 09:10:05 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 09:10:05 INFO overall  0.885  0.928  0.855  0.945   0.855   0.898 |
| 09:10:05 INFO E1       0.891  0.972  0.450  0.750   0.450   0.563 |
| 09:10:05 INFO E2       0.955  0.959  0.946  0.903   0.946   0.924 |
| 09:10:05 INFO E3       0.916  0.960  0.840  0.924   0.840   0.880 |
| 09:10:05 INFO E4       0.898  0.935  0.838  0.884   0.838   0.860 |
| 09:10:05 INFO exact_match=0.752 edge_macro_acc=0.915 malformed=0.000 |
|
75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 800/1063 [1:20:35<8:15:06, 112.95s/it]
{'loss': 0.0146, 'grad_norm': 0.07503994554281235, 'learning_rate': 2.560620756547042e-05, 'epoch': 0.75} |
|
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 801/1063 [1:20:38<5:48:09, 79.73s/it]
75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 802/1063 [1:20:40<4:05:45, 56.50s/it]
76%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 803/1063 [1:20:42<2:54:21, 40.24s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 804/1063 [1:20:45<2:04:34, 28.86s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 805/1063 [1:20:47<1:29:45, 20.87s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 806/1063 [1:20:49<1:05:29, 15.29s/it]
76%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 807/1063 [1:20:51<48:36, 11.39s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 808/1063 [1:20:54<36:44, 8.65s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 809/1063 [1:20:56<28:32, 6.74s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 810/1063 [1:20:58<22:48, 5.41s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 811/1063 [1:21:01<18:48, 4.48s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 812/1063 [1:21:03<16:00, 3.83s/it]
76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 813/1063 [1:21:05<14:02, 3.37s/it]
77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 814/1063 [1:21:07<12:40, 3.05s/it]
77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 815/1063 [1:21:10<11:41, 2.83s/it]
77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 816/1063 [1:21:12<10:58, 2.66s/it]
77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 817/1063 [1:21:14<10:28, 2.55s/it]
77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 818/1063 [1:21:17<10:03, 2.46s/it]
77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 819/1063 [1:21:19<09:44, 2.39s/it]
77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 820/1063 [1:21:21<09:33, 2.36s/it]
{'loss': 0.0142, 'grad_norm': 0.05603089556097984, 'learning_rate': 2.3666343355965083e-05, 'epoch': 0.77} |
|
77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 821/1063 [1:21:23<09:26, 2.34s/it]
77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 822/1063 [1:21:26<09:14, 2.30s/it]
77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 823/1063 [1:21:28<09:08, 2.28s/it]
78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 824/1063 [1:21:30<09:02, 2.27s/it]
78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 825/1063 [1:21:32<08:56, 2.25s/it]
78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 826/1063 [1:21:35<08:55, 2.26s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 827/1063 [1:21:37<08:49, 2.25s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 828/1063 [1:21:39<08:51, 2.26s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 829/1063 [1:21:41<08:48, 2.26s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 830/1063 [1:21:44<08:46, 2.26s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 831/1063 [1:21:46<08:43, 2.26s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 832/1063 [1:21:48<08:41, 2.26s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 833/1063 [1:21:50<08:35, 2.24s/it]
78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 834/1063 [1:21:53<08:35, 2.25s/it]
79%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 835/1063 [1:21:55<08:33, 2.25s/it]
79%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 836/1063 [1:21:57<08:29, 2.24s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 837/1063 [1:21:59<08:29, 2.25s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 838/1063 [1:22:02<08:29, 2.27s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 839/1063 [1:22:04<08:27, 2.27s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 840/1063 [1:22:06<08:26, 2.27s/it]
{'loss': 0.0151, 'grad_norm': 0.09725795686244965, 'learning_rate': 2.172647914645975e-05, 'epoch': 0.79} |
|
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 841/1063 [1:22:08<08:21, 2.26s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 842/1063 [1:22:11<08:16, 2.25s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 843/1063 [1:22:13<08:14, 2.25s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 844/1063 [1:22:15<08:12, 2.25s/it]
79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 845/1063 [1:22:17<08:06, 2.23s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 846/1063 [1:22:20<08:05, 2.24s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 847/1063 [1:22:22<08:04, 2.24s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 848/1063 [1:22:24<08:00, 2.23s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 849/1063 [1:22:26<07:59, 2.24s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 850/1063 [1:22:29<08:02, 2.26s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 851/1063 [1:22:31<08:00, 2.27s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 852/1063 [1:22:33<07:59, 2.27s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 853/1063 [1:22:35<07:52, 2.25s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 854/1063 [1:22:38<07:52, 2.26s/it]
80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 855/1063 [1:22:40<07:49, 2.26s/it]
81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 856/1063 [1:22:42<07:43, 2.24s/it]
81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 857/1063 [1:22:44<07:43, 2.25s/it]
81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 858/1063 [1:22:47<07:43, 2.26s/it]
81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 859/1063 [1:22:49<07:42, 2.26s/it]
81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 860/1063 [1:22:51<07:35, 2.24s/it]
{'loss': 0.0136, 'grad_norm': 0.13120990991592407, 'learning_rate': 1.9786614936954415e-05, 'epoch': 0.81} |
|
 83%|████████████████████████████████████████████████████████████████████████████████████████████                   | 880/1063 [1:23:36<06:53, 2.26s/it]
{'loss': 0.0121, 'grad_norm': 0.09627928584814072, 'learning_rate': 1.784675072744908e-05, 'epoch': 0.83} |
|
 85%|██████████████████████████████████████████████████████████████████████████████████████████████                 | 899/1063 [1:24:19<06:10, 2.26s/it]
09:18:15 INFO eval generation: 50/64 (0.19 rec/s, 264.2s elapsed, ~74s left) |
| 09:19:30 INFO eval generation: 64/64 (0.19 rec/s, 339.0s elapsed, ~0s left) |
| 09:19:56 INFO Eval @ step_900 - n=512 |
| 09:19:56 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 09:19:56 INFO overall  0.889  0.890  0.888  0.921   0.888   0.904 |
| 09:19:56 INFO E1       0.889  0.947  0.575  0.667   0.575   0.617 |
| 09:19:56 INFO E2       0.965  0.975  0.939  0.939   0.939   0.939 |
| 09:19:56 INFO E3       0.900  0.941  0.830  0.891   0.830   0.860 |
| 09:19:56 INFO E4       0.895  0.913  0.864  0.855   0.864   0.859 |
| 09:19:56 INFO exact_match=0.748 edge_macro_acc=0.912 malformed=0.000 |
|
 85%|███████████████████████████████████████████████████████████████████████████████████████████                | 900/1063 [1:30:27<5:03:48, 111.83s/it]
{'loss': 0.0151, 'grad_norm': 0.07019995152950287, 'learning_rate': 1.5906886517943744e-05, 'epoch': 0.85} |
|
 87%|████████████████████████████████████████████████████████████████████████████████████████████████               | 920/1063 [1:31:14<05:36, 2.35s/it]
{'loss': 0.0146, 'grad_norm': 0.09808464348316193, 'learning_rate': 1.396702230843841e-05, 'epoch': 0.87} |
|
 88%|██████████████████████████████████████████████████████████████████████████████████████████████████             | 940/1063 [1:31:59<04:36, 2.25s/it]
{'loss': 0.0143, 'grad_norm': 0.07016833871603012, 'learning_rate': 1.2027158098933075e-05, 'epoch': 0.88} |
|
 90%|████████████████████████████████████████████████████████████████████████████████████████████████████           | 960/1063 [1:32:44<03:55, 2.29s/it]
{'loss': 0.0147, 'grad_norm': 0.060904502868652344, 'learning_rate': 1.008729388942774e-05, 'epoch': 0.9} |
|
 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████         | 980/1063 [1:33:29<03:07, 2.26s/it]
{'loss': 0.0152, 'grad_norm': 0.08607947826385498, 'learning_rate': 8.147429679922405e-06, 'epoch': 0.92} |
|
 94%|████████████████████████████████████████████████████████████████████████████████████████████████████████       | 999/1063 [1:34:12<02:24, 2.26s/it]
09:28:14 INFO eval generation: 50/64 (0.18 rec/s, 270.4s elapsed, ~76s left) |
| 09:29:27 INFO eval generation: 64/64 (0.19 rec/s, 343.2s elapsed, ~0s left) |
| 09:29:50 INFO Eval @ step_1000 - n=512 |
| 09:29:50 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 09:29:50 INFO overall  0.891  0.919  0.871  0.940   0.871   0.904 |
| 09:29:50 INFO E1       0.896  0.979  0.450  0.800   0.450   0.576 |
| 09:29:50 INFO E2       0.969  0.984  0.932  0.958   0.932   0.945 |
| 09:29:50 INFO E3       0.908  0.960  0.819  0.922   0.819   0.868 |
| 09:29:50 INFO E4       0.902  0.925  0.864  0.873   0.864   0.868 |
| 09:29:50 INFO exact_match=0.752 edge_macro_acc=0.919 malformed=0.000 |
|
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████      | 1000/1063 [1:40:20<1:57:43, 112.13s/it]
{'loss': 0.0151, 'grad_norm': 0.05431496351957321, 'learning_rate': 6.207565470417071e-06, 'epoch': 0.94} |
|
 96%|███████████████████████████████████████████████████████████████████████████████████████████████████████████    | 1020/1063 [1:41:05<01:40, 2.33s/it]
{'loss': 0.0137, 'grad_norm': 0.11394680291414261, 'learning_rate': 4.267701260911737e-06, 'epoch': 0.96} |
|
 98%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████  | 1040/1063 [1:41:50<00:51, 2.24s/it]
{'loss': 0.0135, 'grad_norm': 0.06685776263475418, 'learning_rate': 2.327837051406402e-06, 'epoch': 0.98} |
|
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1060/1063 [1:42:35<00:06, 2.27s/it]
{'loss': 0.0124, 'grad_norm': 0.07160358875989914, 'learning_rate': 3.8797284190106696e-07, 'epoch': 1.0} |
|
100%|████████████████████████████████████████| 1060/1063 [1:42:35<00:06, 2.27s/it]
100%|████████████████████████████████████████| 1061/1063 [1:42:38<00:04, 2.27s/it]
100%|████████████████████████████████████████| 1062/1063 [1:42:40<00:02, 2.28s/it]
100%|████████████████████████████████████████| 1063/1063 [1:42:42<00:00, 2.32s/it]
| 09:36:44 INFO eval generation: 50/64 (0.19 rec/s, 269.6s elapsed, ~75s left) |
| 09:37:58 INFO eval generation: 64/64 (0.19 rec/s, 343.8s elapsed, ~0s left) |
| 09:38:18 INFO Eval @ end_of_epoch_1 - n=512 |
| 09:38:18 INFO slot     acc    acc|C  acc|I  P(INC)  R(INC)  F1(INC) |
| 09:38:18 INFO overall  0.893  0.914  0.878  0.937   0.878   0.906 |
| 09:38:18 INFO E1       0.891  0.956  0.537  0.694   0.537   0.606 |
| 09:38:18 INFO E2       0.969  0.978  0.946  0.946   0.946   0.946 |
| 09:38:18 INFO E3       0.904  0.951  0.824  0.906   0.824   0.864 |
| 09:38:18 INFO E4       0.896  0.925  0.848  0.871   0.848   0.859 |
| 09:38:18 INFO exact_match=0.750 edge_macro_acc=0.915 malformed=0.000 |
|
{'train_runtime': 6920.8766, 'train_samples_per_second': 4.913, 'train_steps_per_second': 0.154, 'train_loss': 0.0191556752912581, 'epoch': 1.0} |
|
100%|████████████████████████████████████████| 1063/1063 [1:48:49<00:00, 2.32s/it]
100%|████████████████████████████████████████| 1063/1063 [1:48:49<00:00, 6.14s/it] |
| 09:38:20 INFO Saved final adapter + processor -> /workspace/fms4navigation/results/PRM-v2-r256/final |
| wandb: |
| wandb: 🚀 View run ruby-bird-20 at: https://wandb.ai/mjf-su-stanford-university/huggingface/runs/ciao5cei |
| wandb: Find logs at: wandb/run-20260515_074304-ciao5cei/logs |
| [rank0]:[W515 09:38:22.830496392 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) |
|
|