2026-02-24 17:30:09 - Load pretrained SentenceTransformer: bert-base-arabertv02 2026-02-24 17:30:19 - '[Errno -2] Name or service not known' thrown while requesting HEAD https://huggingface.co/bert-base-arabertv02/resolve/main/./modules.json 2026-02-24 17:30:19 - Retrying in 1s [Retry 1/5]. 2026-02-24 17:30:20 - No sentence-transformers model found with name bert-base-arabertv02. Creating a new one with mean pooling. Model is running on: cuda 2026-02-24 17:30:25 - Reading the training and eval dataset 2026-02-24 17:31:43 - DatasetDict({ train: Dataset({ features: ['anchor', 'positive', 'negative'], num_rows: 3954179 }) }) 2026-02-24 17:31:43 - DatasetDict({ train: Dataset({ features: ['anchor', 'positive', 'negative'], num_rows: 1129759 }) }) 2026-02-24 17:31:43 - DatasetDict({ train: Dataset({ features: ['anchor', 'positive', 'negative'], num_rows: 564877 }) }) 2026-02-24 17:31:43 - TripletEvaluator: Evaluating the model on the dev-768 dataset (truncated to 768): 2026-02-24 17:55:01 - Accuracy Cosine Similarity: 79.01% 2026-02-24 17:55:01 - TripletEvaluator: Evaluating the model on the dev-512 dataset (truncated to 512): 2026-02-24 18:17:34 - Accuracy Cosine Similarity: 77.91% 2026-02-24 18:17:34 - TripletEvaluator: Evaluating the model on the dev-256 dataset (truncated to 256): 2026-02-24 18:39:41 - Accuracy Cosine Similarity: 79.57% 2026-02-24 18:39:41 - TripletEvaluator: Evaluating the model on the dev-128 dataset (truncated to 128): 2026-02-24 19:02:02 - Accuracy Cosine Similarity: 78.63% 2026-02-24 19:02:02 - TripletEvaluator: Evaluating the model on the dev-64 dataset (truncated to 64): 2026-02-24 19:24:20 - Accuracy Cosine Similarity: 76.00% {'loss': '13.01', 'grad_norm': '27.4', 'learning_rate': '6.441e-07', 'epoch': '0.006474'} {'loss': '6.428', 'grad_norm': '19.47', 'learning_rate': '1.291e-06', 'epoch': '0.01295'} {'loss': '4.365', 'grad_norm': '17.72', 'learning_rate': '1.939e-06', 'epoch': '0.01942'} {'loss': '3.585', 'grad_norm': '14.64', 
'learning_rate': '2.586e-06', 'epoch': '0.0259'} {'loss': '3.183', 'grad_norm': '13.4', 'learning_rate': '3.234e-06', 'epoch': '0.03237'} {'loss': '2.873', 'grad_norm': '10.47', 'learning_rate': '3.881e-06', 'epoch': '0.03884'} {'loss': '2.634', 'grad_norm': '12.2', 'learning_rate': '4.528e-06', 'epoch': '0.04532'} {'loss': '2.605', 'grad_norm': '12.74', 'learning_rate': '5.176e-06', 'epoch': '0.05179'} {'loss': '2.31', 'grad_norm': '12.29', 'learning_rate': '5.823e-06', 'epoch': '0.05827'} {'loss': '2.236', 'grad_norm': '10.22', 'learning_rate': '6.47e-06', 'epoch': '0.06474'} {'loss': '2.155', 'grad_norm': '9.492', 'learning_rate': '7.118e-06', 'epoch': '0.07121'} {'loss': '2.019', 'grad_norm': '9.747', 'learning_rate': '7.765e-06', 'epoch': '0.07769'} {'loss': '1.926', 'grad_norm': '9.491', 'learning_rate': '8.412e-06', 'epoch': '0.08416'} {'loss': '1.927', 'grad_norm': '8.972', 'learning_rate': '9.06e-06', 'epoch': '0.09064'} {'loss': '1.866', 'grad_norm': '10.02', 'learning_rate': '9.707e-06', 'epoch': '0.09711'} {'loss': '1.796', 'grad_norm': '9.32', 'learning_rate': '1.035e-05', 'epoch': '0.1036'} {'loss': '1.731', 'grad_norm': '8.47', 'learning_rate': '1.1e-05', 'epoch': '0.1101'} {'loss': '1.725', 'grad_norm': '7.867', 'learning_rate': '1.165e-05', 'epoch': '0.1165'} {'loss': '1.619', 'grad_norm': '9.521', 'learning_rate': '1.23e-05', 'epoch': '0.123'} {'loss': '1.634', 'grad_norm': '9.193', 'learning_rate': '1.294e-05', 'epoch': '0.1295'} {'loss': '1.604', 'grad_norm': '9.547', 'learning_rate': '1.359e-05', 'epoch': '0.136'} {'loss': '1.548', 'grad_norm': '8.025', 'learning_rate': '1.424e-05', 'epoch': '0.1424'} {'loss': '1.542', 'grad_norm': '8.197', 'learning_rate': '1.489e-05', 'epoch': '0.1489'} {'loss': '1.507', 'grad_norm': '7.923', 'learning_rate': '1.553e-05', 'epoch': '0.1554'} {'loss': '1.48', 'grad_norm': '7.746', 'learning_rate': '1.618e-05', 'epoch': '0.1619'} {'loss': '1.462', 'grad_norm': '10.12', 'learning_rate': '1.683e-05', 'epoch': 
'0.1683'} {'loss': '1.446', 'grad_norm': '7.806', 'learning_rate': '1.748e-05', 'epoch': '0.1748'} {'loss': '1.424', 'grad_norm': '6.649', 'learning_rate': '1.812e-05', 'epoch': '0.1813'} {'loss': '1.393', 'grad_norm': '7.031', 'learning_rate': '1.877e-05', 'epoch': '0.1877'} {'loss': '1.352', 'grad_norm': '6.146', 'learning_rate': '1.942e-05', 'epoch': '0.1942'} 2026-02-24 20:40:33 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 0.19422189851905802 after 6000 steps (truncated to 768): 2026-02-24 21:08:51 - Accuracy Cosine Similarity: 95.64% 2026-02-24 21:08:51 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 0.19422189851905802 after 6000 steps (truncated to 512): 2026-02-24 21:36:55 - Accuracy Cosine Similarity: 95.66% 2026-02-24 21:36:55 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 0.19422189851905802 after 6000 steps (truncated to 256): 2026-02-24 22:05:02 - Accuracy Cosine Similarity: 95.60% 2026-02-24 22:05:02 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 0.19422189851905802 after 6000 steps (truncated to 128): 2026-02-24 22:33:12 - Accuracy Cosine Similarity: 95.46% 2026-02-24 22:33:12 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 0.19422189851905802 after 6000 steps (truncated to 64): 2026-02-24 23:01:40 - Accuracy Cosine Similarity: 95.13% {'eval_train_loss': '1.216', 'eval_dev-768_cosine_accuracy': '0.9564', 'eval_dev-512_cosine_accuracy': '0.9566', 'eval_dev-256_cosine_accuracy': '0.956', 'eval_dev-128_cosine_accuracy': '0.9546', 'eval_dev-64_cosine_accuracy': '0.9513', 'eval_sequential_score': '0.9564', 'eval_train_runtime': '9876', 'eval_train_samples_per_second': '114.4', 'eval_train_steps_per_second': '1.787', 'epoch': '0.1942'} 2026-02-24 23:01:40 - Saving model checkpoint to output/arabert_20260224_1730/checkpoint-6000 2026-02-24 23:01:40 - Save model to output/arabert_20260224_1730/checkpoint-6000 {'loss': '1.372', 
'grad_norm': '8.833', 'learning_rate': '1.999e-05', 'epoch': '0.2007'} {'loss': '1.37', 'grad_norm': '6.729', 'learning_rate': '1.992e-05', 'epoch': '0.2072'} {'loss': '1.352', 'grad_norm': '6.586', 'learning_rate': '1.985e-05', 'epoch': '0.2136'} {'loss': '1.288', 'grad_norm': '6.568', 'learning_rate': '1.978e-05', 'epoch': '0.2201'} {'loss': '1.271', 'grad_norm': '6.606', 'learning_rate': '1.971e-05', 'epoch': '0.2266'} {'loss': '1.243', 'grad_norm': '7.913', 'learning_rate': '1.963e-05', 'epoch': '0.2331'} {'loss': '1.23', 'grad_norm': '6.931', 'learning_rate': '1.956e-05', 'epoch': '0.2395'} {'loss': '1.212', 'grad_norm': '6.979', 'learning_rate': '1.949e-05', 'epoch': '0.246'} {'loss': '1.37', 'grad_norm': '6.732', 'learning_rate': '1.992e-05', 'epoch': '0.2072'} {'loss': '1.352', 'grad_norm': '6.583', 'learning_rate': '1.985e-05' {'loss': '1.288', 'grad_norm': '6.564', 'learning_rate': '1.978e-05', 'epoch': '0.2201'} {'loss': '1.271', 'grad_norm': '6.606', 'learning_rate': '1.971e-05', 'epoch': '0.2266'} {'loss': '1.243', 'grad_norm': '7.906', 'learning_rate': '1.963e-05', 'epoch': '0.2331'} {'loss': '1.23', 'grad_norm': '6.921', 'learning_rate': '1.956e-05', 'epoch': '0.2395'} {'loss': '1.212', 'grad_norm': '6.985', 'learning_rate': '1.949e-05', 'epoch': '0.246'} {'loss': '1.222', 'grad_norm': '7.511', 'learning_rate': '1.942e-05', 'epoch': '0.2525'} {'loss': '1.212', 'grad_norm': '7.834', 'learning_rate': '1.935e-05', 'epoch': '0.259'} {'loss': '1.199', 'grad_norm': '5.273', 'learning_rate': '1.927e-05', 'epoch': '0.2654'} {'loss': '1.207', 'grad_norm': '7.454', 'learning_rate': '1.92e-05', 'epoch': '0.2719'} {'loss': '1.197', 'grad_norm': '7.585', 'learning_rate': '1.913e-05', 'epoch': '0.2784'} {'loss': '1.128', 'grad_norm': '7.16', 'learning_rate': '1.906e-05', 'epoch': '0.2849'} {'loss': '1.139', 'grad_norm': '6.354', 'learning_rate': '1.899e-05', 'epoch': '0.2913'} {'loss': '1.197', 'grad_norm': '8.433', 'learning_rate': '1.891e-05', 'epoch': 
'0.2978'} {'loss': '1.099', 'grad_norm': '5.537', 'learning_rate': '1.884e-05', 'epoch': '0.3043'} {'loss': '1.133', 'grad_norm': '7.014', 'learning_rate': '1.877e-05', 'epoch': '0.3108'} {'loss': '1.104', 'grad_norm': '7.172', 'learning_rate': '1.87e-05', 'epoch': '0.3172'} {'loss': '1.116', 'grad_norm': '7.246', 'learning_rate': '1.863e-05', 'epoch': '0.3237'} {'loss': '1.083', 'grad_norm': '7.252', 'learning_rate': '1.855e-05', 'epoch': '0.3302'} {'loss': '1.043', 'grad_norm': '9.376', 'learning_rate': '1.848e-05', 'epoch': '0.3367'} {'loss': '1.104', 'grad_norm': '6.334', 'learning_rate': '1.841e-05', 'epoch': '0.3431'} {'loss': '1.074', 'grad_norm': '8.013', 'learning_rate': '1.834e-05', 'epoch': '0.3496'} {'loss': '1.075', 'grad_norm': '6.887', 'learning_rate': '1.827e-05', 'epoch': '0.3561'} {'loss': '1.059', 'grad_norm': '6.77', 'learning_rate': '1.819e-05', 'epoch': '0.3625'} {'loss': '1.063', 'grad_norm': '5.884', 'learning_rate': '1.812e-05', 'epoch': '0.369'} {'loss': '1.025', 'grad_norm': '4.584', 'learning_rate': '1.805e-05', 'epoch': '0.3755'} {'loss': '1.049', 'grad_norm': '5.729', 'learning_rate': '1.798e-05', 'epoch': '0.382'} {'loss': '1.045', 'grad_norm': '4.304', 'learning_rate': '1.791e-05', 'epoch': '0.3884'} 2026-02-25 16:31:58 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 0.38844379703811605 after 12000 steps (truncated to 768): 2026-02-25 16:53:08 - Accuracy Cosine Similarity: 96.68% 2026-02-25 16:53:08 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 0.38844379703811605 after 12000 steps (truncated to 512): 2026-02-25 17:13:37 - Accuracy Cosine Similarity: 96.67% 2026-02-25 17:13:37 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 0.38844379703811605 after 12000 steps (truncated to 256): 2026-02-25 17:34:09 - Accuracy Cosine Similarity: 96.64% 2026-02-25 17:34:09 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 0.38844379703811605 after 
12000 steps (truncated to 128): 2026-02-25 17:54:44 - Accuracy Cosine Similarity: 96.56% 2026-02-25 17:54:44 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 0.38844379703811605 after 12000 steps (truncated to 64): 2026-02-25 18:15:17 - Accuracy Cosine Similarity: 96.28% {'eval_train_loss': '0.5325', 'eval_dev-768_cosine_accuracy': '0.9668', 'eval_dev-512_cosine_accuracy': '0.9667', 'eval_dev-256_cosine_accuracy': '0.9664', 'eval_dev-128_cosine_accuracy': '0.9656', 'eval_dev-64_cosine_accuracy': '0.9628', 'eval_sequential_score': '0.9668', 'eval_train_runtime': '9281', 'eval_train_samples_per_second': '121.7', 'eval_train_steps_per_second': '15.22', 'epoch': '0.3884'} 2026-02-25 18:15:17 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-12000 2026-02-25 18:15:17 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-12000 {'loss': '0.9883', 'grad_norm': '5.449', 'learning_rate': '1.783e-05', 'epoch': '0.3949'} {'loss': '0.9907', 'grad_norm': '5.808', 'learning_rate': '1.776e-05', 'epoch': '0.4014'} {'loss': '1.022', 'grad_norm': '6.168', 'learning_rate': '1.769e-05', 'epoch': '0.4079'} {'loss': '0.987', 'grad_norm': '5.675', 'learning_rate': '1.762e-05', 'epoch': '0.4143'} {'loss': '1.033', 'grad_norm': '5.961', 'learning_rate': '1.755e-05', 'epoch': '0.4208'} {'loss': '0.9989', 'grad_norm': '7.682', 'learning_rate': '1.748e-05', 'epoch': '0.4273'} {'loss': '0.9805', 'grad_norm': '6.046', 'learning_rate': '1.74e-05', 'epoch': '0.4338'} {'loss': '0.9484', 'grad_norm': '6.129', 'learning_rate': '1.733e-05', 'epoch': '0.4402'} {'loss': '0.9937', 'grad_norm': '4.2', 'learning_rate': '1.726e-05', 'epoch': '0.4467'} {'loss': '1.016', 'grad_norm': '7.164', 'learning_rate': '1.719e-05', 'epoch': '0.4532'} {'loss': '0.9726', 'grad_norm': '5.905', 'learning_rate': '1.712e-05', 'epoch': '0.4597'} {'loss': '0.96', 'grad_norm': '5.824', 
'learning_rate': '1.704e-05', 'epoch': '0.4661'} {'loss': '0.9528', 'grad_norm': '6.058', 'learning_rate': '1.697e-05', 'epoch': '0.4726'} {'loss': '0.9292', 'grad_norm': '4.911', 'learning_rate': '1.69e-05', 'epoch': '0.4791'} {'loss': '0.9157', 'grad_norm': '6.308', 'learning_rate': '1.683e-05', 'epoch': '0.4856'} {'loss': '0.9244', 'grad_norm': '5.036', 'learning_rate': '1.676e-05', 'epoch': '0.492'} {'loss': '0.9192', 'grad_norm': '3.666', 'learning_rate': '1.668e-05', 'epoch': '0.4985'} {'loss': '0.9424', 'grad_norm': '4.06', 'learning_rate': '1.661e-05', 'epoch': '0.505'} {'loss': '0.9067', 'grad_norm': '5.872', 'learning_rate': '1.654e-05', 'epoch': '0.5115'} {'loss': '0.9334', 'grad_norm': '5.868', 'learning_rate': '1.647e-05', 'epoch': '0.5179'} {'loss': '0.8922', 'grad_norm': '6.681', 'learning_rate': '1.64e-05', 'epoch': '0.5244'} {'loss': '0.9122', 'grad_norm': '7.907', 'learning_rate': '1.632e-05', 'epoch': '0.5309'} {'loss': '0.8825', 'grad_norm': '6.144', 'learning_rate': '1.625e-05', 'epoch': '0.5373'} {'loss': '0.9069', 'grad_norm': '6.138', 'learning_rate': '1.618e-05', 'epoch': '0.5438'} {'loss': '0.894', 'grad_norm': '6.968', 'learning_rate': '1.611e-05', 'epoch': '0.5503'} {'loss': '0.8898', 'grad_norm': '6.487', 'learning_rate': '1.604e-05', 'epoch': '0.5568'} {'loss': '0.8735', 'grad_norm': '4.058', 'learning_rate': '1.596e-05', 'epoch': '0.5632'} {'loss': '0.8694', 'grad_norm': '5.403', 'learning_rate': '1.589e-05', 'epoch': '0.5697'} {'loss': '0.8776', 'grad_norm': '6.723', 'learning_rate': '1.582e-05', 'epoch': '0.5762'} {'loss': '0.8664', 'grad_norm': '4.427', 'learning_rate': '1.575e-05', 'epoch': '0.5827'} 2026-02-25 20:05:35 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 0.5826656955571741 after 18000 steps (truncated to 768): 2026-02-25 20:26:21 - Accuracy Cosine Similarity: 97.12% 2026-02-25 20:26:21 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 0.5826656955571741 after 18000 steps 
(truncated to 512): 2026-02-25 20:46:59 - Accuracy Cosine Similarity: 97.11% 2026-02-25 20:46:59 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 0.5826656955571741 after 18000 steps (truncated to 256): 2026-02-25 21:07:40 - Accuracy Cosine Similarity: 97.09% 2026-02-25 21:07:40 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 0.5826656955571741 after 18000 steps (truncated to 128): 2026-02-25 21:28:20 - Accuracy Cosine Similarity: 97.03% 2026-02-25 21:28:20 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 0.5826656955571741 after 18000 steps (truncated to 64): 2026-02-25 21:49:02 - Accuracy Cosine Similarity: 96.82% {'eval_train_loss': '0.4541', 'eval_dev-768_cosine_accuracy': '0.9712', 'eval_dev-512_cosine_accuracy': '0.9711', 'eval_dev-256_cosine_accuracy': '0.9709', 'eval_dev-128_cosine_accuracy': '0.9703', 'eval_dev-64_cosine_accuracy': '0.9682', 'eval_sequential_score': '0.9712', 'eval_train_runtime': '9294', 'eval_train_samples_per_second': '121.6', 'eval_train_steps_per_second': '15.19', 'epoch': '0.5827'} 2026-02-25 21:49:02 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-18000 2026-02-25 21:49:02 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-18000 2026-02-26 14:22:01 - Load pretrained SentenceTransformer: bert-base-arabertv02 2026-02-26 14:22:14 - '[Errno -2] Name or service not known' thrown while requesting HEAD https://huggingface.co/bert-base-arabertv02/resolve/main/./modules.json 2026-02-26 14:22:14 - Retrying in 1s [Retry 1/5]. 2026-02-26 14:22:15 - No sentence-transformers model found with name bert-base-arabertv02. Creating a new one with mean pooling. 
2026-02-26 14:23:53 - Use pytorch device_name: cuda:0 2026-02-26 14:23:53 - Load pretrained SentenceTransformer: /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-18000 {'loss': '0.8727', 'grad_norm': '5.438', 'learning_rate': '1.568e-05', 'epoch': '0.5891'} {'loss': '0.8524', 'grad_norm': '5.458', 'learning_rate': '1.56e-05', 'epoch': '0.5956'} {'loss': '0.8995', 'grad_norm': '6.666', 'learning_rate': '1.553e-05', 'epoch': '0.6021'} {'loss': '0.836', 'grad_norm': '5.681', 'learning_rate': '1.546e-05', 'epoch': '0.6086'} {'loss': '0.8628', 'grad_norm': '6.571', 'learning_rate': '1.539e-05', 'epoch': '0.615'} {'loss': '0.8244', 'grad_norm': '6.389', 'learning_rate': '1.532e-05', 'epoch': '0.6215'} {'loss': '0.8647', 'grad_norm': '4.987', 'learning_rate': '1.525e-05', 'epoch': '0.628'} {'loss': '0.8479', 'grad_norm': '4.451', 'learning_rate': '1.517e-05', 'epoch': '0.6345'} {'loss': '0.8204', 'grad_norm': '5.356', 'learning_rate': '1.51e-05', 'epoch': '0.6409'} {'loss': '0.8359', 'grad_norm': '5.146', 'learning_rate': '1.503e-05', 'epoch': '0.6474'} {'loss': '0.7952', 'grad_norm': '4.308', 'learning_rate': '1.496e-05', 'epoch': '0.6539'} {'loss': '0.8375', 'grad_norm': '5.216', 'learning_rate': '1.489e-05', 'epoch': '0.6604'} {'loss': '0.8364', 'grad_norm': '5.812', 'learning_rate': '1.481e-05', 'epoch': '0.6668'} {'loss': '0.8131', 'grad_norm': '5.52', 'learning_rate': '1.474e-05', 'epoch': '0.6733'} {'loss': '0.831', 'grad_norm': '6.452', 'learning_rate': '1.467e-05', 'epoch': '0.6798'} {'loss': '0.8295', 'grad_norm': '4.274', 'learning_rate': '1.46e-05', 'epoch': '0.6863'} {'loss': '0.7865', 'grad_norm': '4.77', 'learning_rate': '1.453e-05', 'epoch': '0.6927'} {'loss': '0.796', 'grad_norm': '5.027', 'learning_rate': '1.445e-05', 'epoch': '0.6992'} {'loss': '0.8287', 'grad_norm': '4.826', 'learning_rate': '1.438e-05', 'epoch': '0.7057'} {'loss': '0.8214', 'grad_norm': '4.381', 'learning_rate': '1.431e-05', 'epoch': '0.7121'} 
{'loss': '0.7879', 'grad_norm': '6.475', 'learning_rate': '1.424e-05', 'epoch': '0.7186'} {'loss': '0.8139', 'grad_norm': '5.295', 'learning_rate': '1.417e-05', 'epoch': '0.7251'} {'loss': '0.7849', 'grad_norm': '5.051', 'learning_rate': '1.409e-05', 'epoch': '0.7316'} {'loss': '0.788', 'grad_norm': '5.113', 'learning_rate': '1.402e-05', 'epoch': '0.738'} {'loss': '0.7725', 'grad_norm': '4.049', 'learning_rate': '1.395e-05', 'epoch': '0.7445'} {'loss': '0.8086', 'grad_norm': '4.646', 'learning_rate': '1.388e-05', 'epoch': '0.751'} {'loss': '0.7687', 'grad_norm': '5.049', 'learning_rate': '1.381e-05', 'epoch': '0.7575'} {'loss': '0.7828', 'grad_norm': '6.568', 'learning_rate': '1.373e-05', 'epoch': '0.7639'} {'loss': '0.7518', 'grad_norm': '5.9', 'learning_rate': '1.366e-05', 'epoch': '0.7704'} {'loss': '0.7599', 'grad_norm': '6.338', 'learning_rate': '1.359e-05', 'epoch': '0.7769'} 2026-02-26 16:19:09 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 0.7768875940762321 after 24000 steps (truncated to 768): 2026-02-26 16:40:53 - Accuracy Cosine Similarity: 97.37% 2026-02-26 16:40:53 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 0.7768875940762321 after 24000 steps (truncated to 512): 2026-02-26 17:02:09 - Accuracy Cosine Similarity: 97.38% 2026-02-26 17:02:09 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 0.7768875940762321 after 24000 steps (truncated to 256): 2026-02-26 17:23:27 - Accuracy Cosine Similarity: 97.38% 2026-02-26 17:23:27 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 0.7768875940762321 after 24000 steps (truncated to 128): 2026-02-26 17:44:42 - Accuracy Cosine Similarity: 97.34% 2026-02-26 17:44:42 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 0.7768875940762321 after 24000 steps (truncated to 64): 2026-02-26 18:06:06 - Accuracy Cosine Similarity: 97.18% {'eval_train_loss': '0.4041', 'eval_dev-768_cosine_accuracy': '0.9737', 
'eval_dev-512_cosine_accuracy': '0.9738', 'eval_dev-256_cosine_accuracy': '0.9738', 'eval_dev-128_cosine_accuracy': '0.9734', 'eval_dev-64_cosine_accuracy': '0.9718', 'eval_sequential_score': '0.9737', 'eval_train_runtime': '9673', 'eval_train_samples_per_second': '116.8', 'eval_train_steps_per_second': '14.6', 'epoch': '0.7769'} 2026-02-26 18:06:06 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-24000 2026-02-26 18:06:06 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-24000 {'loss': '0.7332', 'grad_norm': '4.95', 'learning_rate': '1.352e-05', 'epoch': '0.7834'} {'loss': '0.7476', 'grad_norm': '4.513', 'learning_rate': '1.345e-05', 'epoch': '0.7898'} {'loss': '0.7806', 'grad_norm': '5.095', 'learning_rate': '1.337e-05', 'epoch': '0.7963'} {'loss': '0.7511', 'grad_norm': '5.826', 'learning_rate': '1.33e-05', 'epoch': '0.8028'} {'loss': '0.7652', 'grad_norm': '6.09', 'learning_rate': '1.323e-05', 'epoch': '0.8093'} {'loss': '0.7883', 'grad_norm': '4.332', 'learning_rate': '1.316e-05', 'epoch': '0.8157'} {'loss': '0.7305', 'grad_norm': '5.749', 'learning_rate': '1.309e-05', 'epoch': '0.8222'} {'loss': '0.7308', 'grad_norm': '4.871', 'learning_rate': '1.302e-05', 'epoch': '0.8287'} {'loss': '0.7368', 'grad_norm': '4.618', 'learning_rate': '1.294e-05', 'epoch': '0.8352'} {'loss': '0.7432', 'grad_norm': '4.836', 'learning_rate': '1.287e-05', 'epoch': '0.8416'} {'loss': '0.7046', 'grad_norm': '4.988', 'learning_rate': '1.28e-05', 'epoch': '0.8481'} {'loss': '0.7476', 'grad_norm': '4.596', 'learning_rate': '1.273e-05', 'epoch': '0.8546'} {'loss': '0.7212', 'grad_norm': '5.712', 'learning_rate': '1.266e-05', 'epoch': '0.8611'} {'loss': '0.7335', 'grad_norm': '3.99', 'learning_rate': '1.258e-05', 'epoch': '0.8675'} {'loss': '0.7415', 'grad_norm': '5.446', 'learning_rate': '1.251e-05', 'epoch': '0.874'} {'loss': '0.6937', 'grad_norm': '5.257', 
'learning_rate': '1.244e-05', 'epoch': '0.8805'} {'loss': '0.7294', 'grad_norm': '5.302', 'learning_rate': '1.237e-05', 'epoch': '0.8869'} {'loss': '0.7436', 'grad_norm': '3.847', 'learning_rate': '1.23e-05', 'epoch': '0.8934'} {'loss': '0.7093', 'grad_norm': '6.182', 'learning_rate': '1.222e-05', 'epoch': '0.8999'} {'loss': '0.748', 'grad_norm': '5.445', 'learning_rate': '1.215e-05', 'epoch': '0.9064'} {'loss': '0.7039', 'grad_norm': '5.002', 'learning_rate': '1.208e-05', 'epoch': '0.9128'} {'loss': '0.7091', 'grad_norm': '5.085', 'learning_rate': '1.201e-05', 'epoch': '0.9193'} {'loss': '0.7019', 'grad_norm': '5.379', 'learning_rate': '1.194e-05', 'epoch': '0.9258'} {'loss': '0.7081', 'grad_norm': '5.63', 'learning_rate': '1.186e-05', 'epoch': '0.9323'} {'loss': '0.6833', 'grad_norm': '2.541', 'learning_rate': '1.179e-05', 'epoch': '0.9387'} {'loss': '0.6982', 'grad_norm': '5.714', 'learning_rate': '1.172e-05', 'epoch': '0.9452'} {'loss': '0.7249', 'grad_norm': '5.051', 'learning_rate': '1.165e-05', 'epoch': '0.9517'} {'loss': '0.7282', 'grad_norm': '6.322', 'learning_rate': '1.158e-05', 'epoch': '0.9582'} {'loss': '0.7147', 'grad_norm': '4.961', 'learning_rate': '1.15e-05', 'epoch': '0.9646'} {'loss': '0.6742', 'grad_norm': '4.871', 'learning_rate': '1.143e-05', 'epoch': '0.9711'} 2026-02-26 19:59:22 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 0.9711094925952901 after 30000 steps (truncated to 768): 2026-02-26 20:20:41 - Accuracy Cosine Similarity: 97.58% 2026-02-26 20:20:41 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 0.9711094925952901 after 30000 steps (truncated to 512): 2026-02-26 20:42:02 - Accuracy Cosine Similarity: 97.59% 2026-02-26 20:42:02 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 0.9711094925952901 after 30000 steps (truncated to 256): 2026-02-26 21:03:17 - Accuracy Cosine Similarity: 97.61% 2026-02-26 21:03:17 - TripletEvaluator: Evaluating the model on the 
dev-128 dataset in epoch 0.9711094925952901 after 30000 steps (truncated to 128): 2026-02-26 21:24:42 - Accuracy Cosine Similarity: 97.57% 2026-02-26 21:24:42 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 0.9711094925952901 after 30000 steps (truncated to 64): 2026-02-26 21:46:03 - Accuracy Cosine Similarity: 97.42% {'eval_train_loss': '0.364', 'eval_dev-768_cosine_accuracy': '0.9758', 'eval_dev-512_cosine_accuracy': '0.9759', 'eval_dev-256_cosine_accuracy': '0.9761', 'eval_dev-128_cosine_accuracy': '0.9757', 'eval_dev-64_cosine_accuracy': '0.9742', 'eval_sequential_score': '0.9758', 'eval_train_runtime': '9649', 'eval_train_samples_per_second': '117.1', 'eval_train_steps_per_second': '14.64', 'epoch': '0.9711'} 2026-02-26 21:46:03 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-30000 2026-02-26 21:46:03 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-30000 {'loss': '0.6901', 'grad_norm': '3.348', 'learning_rate': '1.136e-05', 'epoch': '0.9776'} {'loss': '0.7067', 'grad_norm': '3.76', 'learning_rate': '1.129e-05', 'epoch': '0.9841'} {'loss': '0.7166', 'grad_norm': '4.729', 'learning_rate': '1.122e-05', 'epoch': '0.9905'} {'loss': '0.68', 'grad_norm': '4.648', 'learning_rate': '1.114e-05', 'epoch': '0.997'} {'loss': '0.6846', 'grad_norm': '4.427', 'learning_rate': '1.107e-05', 'epoch': '1.003'} {'loss': '0.6723', 'grad_norm': '4.459', 'learning_rate': '1.1e-05', 'epoch': '1.01'} {'loss': '0.6573', 'grad_norm': '6.387', 'learning_rate': '1.093e-05', 'epoch': '1.016'} {'loss': '0.6895', 'grad_norm': '4.1', 'learning_rate': '1.086e-05', 'epoch': '1.023'} {'loss': '0.6588', 'grad_norm': '5.927', 'learning_rate': '1.079e-05', 'epoch': '1.029'} {'loss': '0.6517', 'grad_norm': '5.9', 'learning_rate': '1.071e-05', 'epoch': '1.036'} {'loss': '0.6498', 'grad_norm': '4.736', 'learning_rate': '1.064e-05', 'epoch': '1.042'} 
{'loss': '0.6836', 'grad_norm': '5.029', 'learning_rate': '1.057e-05', 'epoch': '1.049'} {'loss': '0.6819', 'grad_norm': '2.595', 'learning_rate': '1.05e-05', 'epoch': '1.055'} {'loss': '0.6463', 'grad_norm': '4.963', 'learning_rate': '1.043e-05', 'epoch': '1.062'} {'loss': '0.6645', 'grad_norm': '5.046', 'learning_rate': '1.035e-05', 'epoch': '1.068'} {'loss': '0.6518', 'grad_norm': '3.307', 'learning_rate': '1.028e-05', 'epoch': '1.075'} {'loss': '0.6235', 'grad_norm': '3.848', 'learning_rate': '1.021e-05', 'epoch': '1.081'} {'loss': '0.6302', 'grad_norm': '4.664', 'learning_rate': '1.014e-05', 'epoch': '1.088'} {'loss': '0.6452', 'grad_norm': '5.47', 'learning_rate': '1.007e-05', 'epoch': '1.094'} {'loss': '0.6477', 'grad_norm': '5.26', 'learning_rate': '9.994e-06', 'epoch': '1.101'} {'loss': '0.6084', 'grad_norm': '4.313', 'learning_rate': '9.922e-06', 'epoch': '1.107'} {'loss': '0.6259', 'grad_norm': '6.499', 'learning_rate': '9.85e-06', 'epoch': '1.114'} {'loss': '0.607', 'grad_norm': '3.922', 'learning_rate': '9.778e-06', 'epoch': '1.12'} {'loss': '0.5977', 'grad_norm': '5.37', 'learning_rate': '9.706e-06', 'epoch': '1.126'} {'loss': '0.6044', 'grad_norm': '5.068', 'learning_rate': '9.634e-06', 'epoch': '1.133'} {'loss': '0.6007', 'grad_norm': '4.109', 'learning_rate': '9.562e-06', 'epoch': '1.139'} {'loss': '0.5628', 'grad_norm': '4.954', 'learning_rate': '9.491e-06', 'epoch': '1.146'} {'loss': '0.5732', 'grad_norm': '4.068', 'learning_rate': '9.419e-06', 'epoch': '1.152'} {'loss': '0.5773', 'grad_norm': '4.939', 'learning_rate': '9.347e-06', 'epoch': '1.159'} {'loss': '0.5719', 'grad_norm': '4.418', 'learning_rate': '9.275e-06', 'epoch': '1.165'} 2026-02-26 23:38:18 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 1.1653152059561382 after 36000 steps (truncated to 768): 2026-02-27 00:01:17 - Accuracy Cosine Similarity: 97.75% 2026-02-27 00:01:17 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 
1.1653152059561382 after 36000 steps (truncated to 512): 2026-02-27 00:23:59 - Accuracy Cosine Similarity: 97.77% 2026-02-27 00:23:59 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 1.1653152059561382 after 36000 steps (truncated to 256): 2026-02-27 00:46:38 - Accuracy Cosine Similarity: 97.77% 2026-02-27 00:46:38 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 1.1653152059561382 after 36000 steps (truncated to 128): 2026-02-27 01:09:18 - Accuracy Cosine Similarity: 97.74% 2026-02-27 01:09:18 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 1.1653152059561382 after 36000 steps (truncated to 64): 2026-02-27 01:32:14 - Accuracy Cosine Similarity: 97.60% {'eval_train_loss': '0.3356', 'eval_dev-768_cosine_accuracy': '0.9775', 'eval_dev-512_cosine_accuracy': '0.9777', 'eval_dev-256_cosine_accuracy': '0.9777', 'eval_dev-128_cosine_accuracy': '0.9774', 'eval_dev-64_cosine_accuracy': '0.976', 'eval_sequential_score': '0.9775', 'eval_train_runtime': '1.01e+04', 'eval_train_samples_per_second': '111.8', 'eval_train_steps_per_second': '13.98', 'epoch': '1.165'} 2026-02-27 01:32:14 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-36000 2026-02-27 01:32:14 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-36000 {'loss': '0.5471', 'grad_norm': '3.58', 'learning_rate': '9.203e-06', 'epoch': '1.172'} {'loss': '0.5635', 'grad_norm': '5.198', 'learning_rate': '9.131e-06', 'epoch': '1.178'} {'loss': '0.539', 'grad_norm': '4.468', 'learning_rate': '9.059e-06', 'epoch': '1.185'} {'loss': '0.5428', 'grad_norm': '4.349', 'learning_rate': '8.987e-06', 'epoch': '1.191'} {'loss': '0.5205', 'grad_norm': '2.936', 'learning_rate': '8.915e-06', 'epoch': '1.198'} {'loss': '0.5362', 'grad_norm': '3.337', 'learning_rate': '8.843e-06', 'epoch': '1.204'} {'loss': '0.5386', 'grad_norm': '5.76', 
'learning_rate': '8.771e-06', 'epoch': '1.211'} {'loss': '0.5203', 'grad_norm': '3.261', 'learning_rate': '8.699e-06', 'epoch': '1.217'} {'loss': '0.5301', 'grad_norm': '3.732', 'learning_rate': '8.627e-06', 'epoch': '1.224'} {'loss': '0.5232', 'grad_norm': '4.54', 'learning_rate': '8.555e-06', 'epoch': '1.23'} {'loss': '0.4922', 'grad_norm': '4.291', 'learning_rate': '8.483e-06', 'epoch': '1.237'} {'loss': '0.5029', 'grad_norm': '3.979', 'learning_rate': '8.412e-06', 'epoch': '1.243'} {'loss': '0.4989', 'grad_norm': '7.829', 'learning_rate': '8.34e-06', 'epoch': '1.249'} {'loss': '0.5053', 'grad_norm': '2.903', 'learning_rate': '8.268e-06', 'epoch': '1.256'} {'loss': '0.5081', 'grad_norm': '5.471', 'learning_rate': '8.196e-06', 'epoch': '1.262'} {'loss': '0.496', 'grad_norm': '5.204', 'learning_rate': '8.124e-06', 'epoch': '1.269'} {'loss': '0.5052', 'grad_norm': '4.377', 'learning_rate': '8.052e-06', 'epoch': '1.275'} {'loss': '0.4984', 'grad_norm': '4.184', 'learning_rate': '7.98e-06', 'epoch': '1.282'} {'loss': '0.4909', 'grad_norm': '4.991', 'learning_rate': '7.908e-06', 'epoch': '1.288'} {'loss': '0.512', 'grad_norm': '3.76', 'learning_rate': '7.836e-06', 'epoch': '1.295'} {'loss': '0.4873', 'grad_norm': '3.844', 'learning_rate': '7.764e-06', 'epoch': '1.301'} {'loss': '0.4896', 'grad_norm': '6.987', 'learning_rate': '7.692e-06', 'epoch': '1.308'} {'loss': '0.49', 'grad_norm': '6.267', 'learning_rate': '7.62e-06', 'epoch': '1.314'} {'loss': '0.5036', 'grad_norm': '3.776', 'learning_rate': '7.548e-06', 'epoch': '1.321'} {'loss': '0.4876', 'grad_norm': '3.42', 'learning_rate': '7.476e-06', 'epoch': '1.327'} {'loss': '0.4705', 'grad_norm': '5.478', 'learning_rate': '7.404e-06', 'epoch': '1.334'} {'loss': '0.4786', 'grad_norm': '3.313', 'learning_rate': '7.333e-06', 'epoch': '1.34'} {'loss': '0.4998', 'grad_norm': '3.13', 'learning_rate': '7.261e-06', 'epoch': '1.347'} {'loss': '0.4692', 'grad_norm': '3.971', 'learning_rate': '7.189e-06', 'epoch': '1.353'} 
{'loss': '0.5064', 'grad_norm': '6.238', 'learning_rate': '7.117e-06', 'epoch': '1.36'} 2026-02-27 03:24:31 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 1.3595371044751963 after 42000 steps (truncated to 768): 2026-02-27 03:47:24 - Accuracy Cosine Similarity: 97.88% 2026-02-27 03:47:24 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 1.3595371044751963 after 42000 steps (truncated to 512): 2026-02-27 04:10:18 - Accuracy Cosine Similarity: 97.90% 2026-02-27 04:10:18 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 1.3595371044751963 after 42000 steps (truncated to 256): 2026-02-27 04:33:12 - Accuracy Cosine Similarity: 97.90% 2026-02-27 04:33:12 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 1.3595371044751963 after 42000 steps (truncated to 128): 2026-02-27 04:56:07 - Accuracy Cosine Similarity: 97.85% 2026-02-27 04:56:07 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 1.3595371044751963 after 42000 steps (truncated to 64): 2026-02-27 05:19:11 - Accuracy Cosine Similarity: 97.74% {'eval_train_loss': '0.316', 'eval_dev-768_cosine_accuracy': '0.9788', 'eval_dev-512_cosine_accuracy': '0.979', 'eval_dev-256_cosine_accuracy': '0.979', 'eval_dev-128_cosine_accuracy': '0.9785', 'eval_dev-64_cosine_accuracy': '0.9774', 'eval_sequential_score': '0.9788', 'eval_train_runtime': '1.014e+04', 'eval_train_samples_per_second': '111.5', 'eval_train_steps_per_second': '13.93', 'epoch': '1.36'} 2026-02-27 05:19:11 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-42000 2026-02-27 05:19:11 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-42000 {'loss': '0.4925', 'grad_norm': '5.158', 'learning_rate': '7.045e-06', 'epoch': '1.366'} {'loss': '0.4601', 'grad_norm': '4.139', 'learning_rate': '6.973e-06', 'epoch': '1.372'} {'loss': '0.4762', 
'grad_norm': '3.411', 'learning_rate': '6.901e-06', 'epoch': '1.379'} {'loss': '0.4986', 'grad_norm': '4.23', 'learning_rate': '6.829e-06', 'epoch': '1.385'} {'loss': '0.4656', 'grad_norm': '5.326', 'learning_rate': '6.757e-06', 'epoch': '1.392'} {'loss': '0.4507', 'grad_norm': '3.826', 'learning_rate': '6.685e-06', 'epoch': '1.398'} {'loss': '0.4862', 'grad_norm': '3.509', 'learning_rate': '6.613e-06', 'epoch': '1.405'} {'loss': '0.4596', 'grad_norm': '4.734', 'learning_rate': '6.541e-06', 'epoch': '1.411'} {'loss': '0.4696', 'grad_norm': '4.799', 'learning_rate': '6.469e-06', 'epoch': '1.418'} {'loss': '0.4925', 'grad_norm': '4.942', 'learning_rate': '6.397e-06', 'epoch': '1.424'} {'loss': '0.4796', 'grad_norm': '4.147', 'learning_rate': '6.325e-06', 'epoch': '1.431'} {'loss': '0.4525', 'grad_norm': '5.146', 'learning_rate': '6.254e-06', 'epoch': '1.437'} {'loss': '0.4717', 'grad_norm': '3.52', 'learning_rate': '6.182e-06', 'epoch': '1.444'} {'loss': '0.4803', 'grad_norm': '3.25', 'learning_rate': '6.11e-06', 'epoch': '1.45'} {'loss': '0.4675', 'grad_norm': '7.35', 'learning_rate': '6.038e-06', 'epoch': '1.457'} {'loss': '0.4631', 'grad_norm': '3.847', 'learning_rate': '5.966e-06', 'epoch': '1.463'} {'loss': '0.4622', 'grad_norm': '4.57', 'learning_rate': '5.894e-06', 'epoch': '1.47'} {'loss': '0.4496', 'grad_norm': '1.997', 'learning_rate': '5.822e-06', 'epoch': '1.476'} {'loss': '0.4678', 'grad_norm': '4.266', 'learning_rate': '5.75e-06', 'epoch': '1.483'} {'loss': '0.4495', 'grad_norm': '5.948', 'learning_rate': '5.678e-06', 'epoch': '1.489'} {'loss': '0.4474', 'grad_norm': '3.7', 'learning_rate': '5.606e-06', 'epoch': '1.495'} {'loss': '0.4587', 'grad_norm': '2.877', 'learning_rate': '5.534e-06', 'epoch': '1.502'} {'loss': '0.4591', 'grad_norm': '4.245', 'learning_rate': '5.462e-06', 'epoch': '1.508'} {'loss': '0.4573', 'grad_norm': '5.431', 'learning_rate': '5.39e-06', 'epoch': '1.515'} {'loss': '0.4442', 'grad_norm': '3.338', 'learning_rate': '5.318e-06', 
'epoch': '1.521'} {'loss': '0.455', 'grad_norm': '4.723', 'learning_rate': '5.246e-06', 'epoch': '1.528'} {'loss': '0.4493', 'grad_norm': '4.226', 'learning_rate': '5.175e-06', 'epoch': '1.534'} {'loss': '0.4485', 'grad_norm': '4.451', 'learning_rate': '5.103e-06', 'epoch': '1.541'} {'loss': '0.4569', 'grad_norm': '4.297', 'learning_rate': '5.031e-06', 'epoch': '1.547'} {'loss': '0.4346', 'grad_norm': '4.199', 'learning_rate': '4.959e-06', 'epoch': '1.554'} 2026-02-27 07:11:49 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 1.5537590029942543 after 48000 steps (truncated to 768): 2026-02-27 07:34:37 - Accuracy Cosine Similarity: 97.99% 2026-02-27 07:34:37 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 1.5537590029942543 after 48000 steps (truncated to 512): 2026-02-27 07:57:13 - Accuracy Cosine Similarity: 98.02% 2026-02-27 07:57:13 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 1.5537590029942543 after 48000 steps (truncated to 256): 2026-02-27 08:20:07 - Accuracy Cosine Similarity: 98.02% 2026-02-27 08:20:07 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 1.5537590029942543 after 48000 steps (truncated to 128): 2026-02-27 08:42:52 - Accuracy Cosine Similarity: 97.98% 2026-02-27 08:42:52 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 1.5537590029942543 after 48000 steps (truncated to 64): 2026-02-27 09:05:32 - Accuracy Cosine Similarity: 97.88% {'eval_train_loss': '0.3001', 'eval_dev-768_cosine_accuracy': '0.9799', 'eval_dev-512_cosine_accuracy': '0.9802', 'eval_dev-256_cosine_accuracy': '0.9802', 'eval_dev-128_cosine_accuracy': '0.9798', 'eval_dev-64_cosine_accuracy': '0.9788', 'eval_sequential_score': '0.9799', 'eval_train_runtime': '1.008e+04', 'eval_train_samples_per_second': '112.1', 'eval_train_steps_per_second': '14.02', 'epoch': '1.554'} 2026-02-27 09:05:32 - Saving model checkpoint to 
/home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-48000 2026-02-27 09:05:32 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-48000 {'loss': '0.4469', 'grad_norm': '3.364', 'learning_rate': '4.887e-06', 'epoch': '1.56'} {'loss': '0.4602', 'grad_norm': '5.309', 'learning_rate': '4.815e-06', 'epoch': '1.567'} {'loss': '0.443', 'grad_norm': '3.875', 'learning_rate': '4.743e-06', 'epoch': '1.573'} {'loss': '0.4524', 'grad_norm': '4.824', 'learning_rate': '4.671e-06', 'epoch': '1.58'} {'loss': '0.4528', 'grad_norm': '4.996', 'learning_rate': '4.599e-06', 'epoch': '1.586'} {'loss': '0.4348', 'grad_norm': '4.96', 'learning_rate': '4.527e-06', 'epoch': '1.593'} {'loss': '0.4533', 'grad_norm': '5.219', 'learning_rate': '4.455e-06', 'epoch': '1.599'} {'loss': '0.4523', 'grad_norm': '3.444', 'learning_rate': '4.383e-06', 'epoch': '1.606'} {'loss': '0.4509', 'grad_norm': '5.647', 'learning_rate': '4.311e-06', 'epoch': '1.612'} {'loss': '0.4365', 'grad_norm': '5.052', 'learning_rate': '4.239e-06', 'epoch': '1.618'} {'loss': '0.4504', 'grad_norm': '5.786', 'learning_rate': '4.167e-06', 'epoch': '1.625'} {'loss': '0.4292', 'grad_norm': '4.353', 'learning_rate': '4.096e-06', 'epoch': '1.631'} {'loss': '0.4406', 'grad_norm': '2.976', 'learning_rate': '4.024e-06', 'epoch': '1.638'} {'loss': '0.4333', 'grad_norm': '3.685', 'learning_rate': '3.952e-06', 'epoch': '1.644'} {'loss': '0.4361', 'grad_norm': '4.107', 'learning_rate': '3.88e-06', 'epoch': '1.651'} {'loss': '0.4065', 'grad_norm': '3.636', 'learning_rate': '3.808e-06', 'epoch': '1.657'} {'loss': '0.4671', 'grad_norm': '3.464', 'learning_rate': '3.736e-06', 'epoch': '1.664'} {'loss': '0.4328', 'grad_norm': '3.129', 'learning_rate': '3.664e-06', 'epoch': '1.67'} {'loss': '0.431', 'grad_norm': '2.453', 'learning_rate': '3.592e-06', 'epoch': '1.677'} {'loss': '0.4523', 'grad_norm': '3.727', 'learning_rate': '3.52e-06', 'epoch': '1.683'} 
{'loss': '0.4232', 'grad_norm': '4.398', 'learning_rate': '3.448e-06', 'epoch': '1.69'} {'loss': '0.4257', 'grad_norm': '2.861', 'learning_rate': '3.376e-06', 'epoch': '1.696'} {'loss': '0.4448', 'grad_norm': '3.523', 'learning_rate': '3.304e-06', 'epoch': '1.703'} {'loss': '0.4491', 'grad_norm': '3.893', 'learning_rate': '3.232e-06', 'epoch': '1.709'} {'loss': '0.4224', 'grad_norm': '3.399', 'learning_rate': '3.16e-06', 'epoch': '1.716'} {'loss': '0.4297', 'grad_norm': '4.703', 'learning_rate': '3.088e-06', 'epoch': '1.722'} {'loss': '0.4522', 'grad_norm': '4.29', 'learning_rate': '3.017e-06', 'epoch': '1.729'} {'loss': '0.4195', 'grad_norm': '4.29', 'learning_rate': '2.945e-06', 'epoch': '1.735'} {'loss': '0.4227', 'grad_norm': '3.841', 'learning_rate': '2.873e-06', 'epoch': '1.742'} {'loss': '0.4381', 'grad_norm': '4.086', 'learning_rate': '2.801e-06', 'epoch': '1.748'} 2026-02-27 10:59:10 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 1.7479809015133123 after 54000 steps (truncated to 768): 2026-02-27 11:22:08 - Accuracy Cosine Similarity: 98.07% 2026-02-27 11:22:08 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 1.7479809015133123 after 54000 steps (truncated to 512): 2026-02-27 11:44:57 - Accuracy Cosine Similarity: 98.08% 2026-02-27 11:44:57 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 1.7479809015133123 after 54000 steps (truncated to 256): 2026-02-27 12:07:55 - Accuracy Cosine Similarity: 98.08% 2026-02-27 12:07:55 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 1.7479809015133123 after 54000 steps (truncated to 128): 2026-02-27 12:30:35 - Accuracy Cosine Similarity: 98.05% 2026-02-27 12:30:35 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 1.7479809015133123 after 54000 steps (truncated to 64): 2026-02-27 12:53:24 - Accuracy Cosine Similarity: 97.94% {'eval_train_loss': '0.2875', 'eval_dev-768_cosine_accuracy': '0.9807', 
'eval_dev-512_cosine_accuracy': '0.9808', 'eval_dev-256_cosine_accuracy': '0.9808', 'eval_dev-128_cosine_accuracy': '0.9805', 'eval_dev-64_cosine_accuracy': '0.9794', 'eval_sequential_score': '0.9807', 'eval_train_runtime': '1.012e+04', 'eval_train_samples_per_second': '111.6', 'eval_train_steps_per_second': '13.95', 'epoch': '1.748'} 2026-02-27 12:53:24 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-54000 2026-02-27 12:53:24 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-54000 {'loss': '0.446', 'grad_norm': '4.176', 'learning_rate': '2.729e-06', 'epoch': '1.754'} {'loss': '0.426', 'grad_norm': '4.261', 'learning_rate': '2.657e-06', 'epoch': '1.761'} {'loss': '0.4299', 'grad_norm': '4.676', 'learning_rate': '2.585e-06', 'epoch': '1.767'} {'loss': '0.4247', 'grad_norm': '3.933', 'learning_rate': '2.513e-06', 'epoch': '1.774'} {'loss': '0.4244', 'grad_norm': '4.853', 'learning_rate': '2.441e-06', 'epoch': '1.78'} {'loss': '0.4185', 'grad_norm': '2.985', 'learning_rate': '2.369e-06', 'epoch': '1.787'} {'loss': '0.4292', 'grad_norm': '3.804', 'learning_rate': '2.297e-06', 'epoch': '1.793'} {'loss': '0.4468', 'grad_norm': '3.187', 'learning_rate': '2.225e-06', 'epoch': '1.8'} {'loss': '0.4118', 'grad_norm': '4.004', 'learning_rate': '2.153e-06', 'epoch': '1.806'} {'loss': '0.4306', 'grad_norm': '4.007', 'learning_rate': '2.081e-06', 'epoch': '1.813'} {'loss': '0.4447', 'grad_norm': '4.323', 'learning_rate': '2.009e-06', 'epoch': '1.819'} {'loss': '0.4147', 'grad_norm': '3.863', 'learning_rate': '1.938e-06', 'epoch': '1.826'} {'loss': '0.4189', 'grad_norm': '4.788', 'learning_rate': '1.866e-06', 'epoch': '1.832'} {'loss': '0.4167', 'grad_norm': '4.276', 'learning_rate': '1.794e-06', 'epoch': '1.839'} {'loss': '0.4022', 'grad_norm': '3.887', 'learning_rate': '1.722e-06', 'epoch': '1.845'} {'loss': '0.4158', 'grad_norm': '3.075', 'learning_rate': 
'1.65e-06', 'epoch': '1.852'} {'loss': '0.4228', 'grad_norm': '3.993', 'learning_rate': '1.578e-06', 'epoch': '1.858'} {'loss': '0.4256', 'grad_norm': '4.497', 'learning_rate': '1.506e-06', 'epoch': '1.865'} {'loss': '0.4251', 'grad_norm': '4.539', 'learning_rate': '1.434e-06', 'epoch': '1.871'} {'loss': '0.4232', 'grad_norm': '2.337', 'learning_rate': '1.362e-06', 'epoch': '1.877'} {'loss': '0.4143', 'grad_norm': '3.389', 'learning_rate': '1.29e-06', 'epoch': '1.884'} {'loss': '0.4331', 'grad_norm': '3.545', 'learning_rate': '1.218e-06', 'epoch': '1.89'} {'loss': '0.4253', 'grad_norm': '5.606', 'learning_rate': '1.146e-06', 'epoch': '1.897'} {'loss': '0.441', 'grad_norm': '4.453', 'learning_rate': '1.074e-06', 'epoch': '1.903'} {'loss': '0.4337', 'grad_norm': '5.374', 'learning_rate': '1.002e-06', 'epoch': '1.91'} {'loss': '0.4016', 'grad_norm': '2.246', 'learning_rate': '9.305e-07', 'epoch': '1.916'} {'loss': '0.4249', 'grad_norm': '5.255', 'learning_rate': '8.585e-07', 'epoch': '1.923'} {'loss': '0.4108', 'grad_norm': '3.59', 'learning_rate': '7.866e-07', 'epoch': '1.929'} {'loss': '0.4272', 'grad_norm': '4.258', 'learning_rate': '7.147e-07', 'epoch': '1.936'} {'loss': '0.3916', 'grad_norm': '3.476', 'learning_rate': '6.427e-07', 'epoch': '1.942'} 2026-02-27 14:47:29 - TripletEvaluator: Evaluating the model on the dev-768 dataset in epoch 1.9422028000323703 after 60000 steps (truncated to 768): 2026-02-27 15:10:59 - Accuracy Cosine Similarity: 98.10% 2026-02-27 15:10:59 - TripletEvaluator: Evaluating the model on the dev-512 dataset in epoch 1.9422028000323703 after 60000 steps (truncated to 512): 2026-02-27 15:34:30 - Accuracy Cosine Similarity: 98.11% 2026-02-27 15:34:30 - TripletEvaluator: Evaluating the model on the dev-256 dataset in epoch 1.9422028000323703 after 60000 steps (truncated to 256): 2026-02-27 15:58:10 - Accuracy Cosine Similarity: 98.13% 2026-02-27 15:58:10 - TripletEvaluator: Evaluating the model on the dev-128 dataset in epoch 
1.9422028000323703 after 60000 steps (truncated to 128): 2026-02-27 16:21:18 - Accuracy Cosine Similarity: 98.11% 2026-02-27 16:21:18 - TripletEvaluator: Evaluating the model on the dev-64 dataset in epoch 1.9422028000323703 after 60000 steps (truncated to 64): 2026-02-27 16:44:14 - Accuracy Cosine Similarity: 97.97% {'eval_train_loss': '0.2812', 'eval_dev-768_cosine_accuracy': '0.981', 'eval_dev-512_cosine_accuracy': '0.9811', 'eval_dev-256_cosine_accuracy': '0.9813', 'eval_dev-128_cosine_accuracy': '0.9811', 'eval_dev-64_cosine_accuracy': '0.9797', 'eval_sequential_score': '0.981', 'eval_train_runtime': '1.03e+04', 'eval_train_samples_per_second': '109.7', 'eval_train_steps_per_second': '13.71', 'epoch': '1.942'} 2026-02-27 16:44:14 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-60000 2026-02-27 16:44:14 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-60000 {'loss': '0.4334', 'grad_norm': '4.623', 'learning_rate': '5.708e-07', 'epoch': '1.949'} {'loss': '0.4462', 'grad_norm': '5.31', 'learning_rate': '4.989e-07', 'epoch': '1.955'} {'loss': '0.4436', 'grad_norm': '3.379', 'learning_rate': '4.269e-07', 'epoch': '1.962'} {'loss': '0.4278', 'grad_norm': '5.471', 'learning_rate': '3.55e-07', 'epoch': '1.968'} {'loss': '0.417', 'grad_norm': '3.435', 'learning_rate': '2.831e-07', 'epoch': '1.975'} {'loss': '0.4376', 'grad_norm': '2.617', 'learning_rate': '2.111e-07', 'epoch': '1.981'} {'loss': '0.4433', 'grad_norm': '3.465', 'learning_rate': '1.392e-07', 'epoch': '1.988'} {'loss': '0.4292', 'grad_norm': '2.354', 'learning_rate': '6.726e-08', 'epoch': '1.994'} 2026-02-27 17:01:56 - Saving model checkpoint to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-61786 2026-02-27 17:01:56 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/arabert_20260224_1730/checkpoint-61786 {'train_runtime': 
'9.588e+04', 'train_samples_per_second': '82.48', 'train_steps_per_second': '0.644', 'train_loss': '0.403', 'epoch': '2'}
2026-02-27 17:01:58 - Save model to /home/skiredj.abderrahman/khalil/sbert_training/output/final
model saved successfully
2026-02-27 17:01:59 - TripletEvaluator: Evaluating the model on the test-768 dataset (truncated to 768):
2026-02-27 17:21:39 - Accuracy Cosine Similarity: 98.10%
2026-02-27 17:21:39 - TripletEvaluator: Evaluating the model on the test-512 dataset (truncated to 512):
2026-02-27 17:41:10 - Accuracy Cosine Similarity: 98.13%
2026-02-27 17:41:10 - TripletEvaluator: Evaluating the model on the test-256 dataset (truncated to 256):
2026-02-27 18:00:40 - Accuracy Cosine Similarity: 98.13%
2026-02-27 18:00:40 - TripletEvaluator: Evaluating the model on the test-128 dataset (truncated to 128):
2026-02-27 18:20:06 - Accuracy Cosine Similarity: 98.11%
2026-02-27 18:20:06 - TripletEvaluator: Evaluating the model on the test-64 dataset (truncated to 64):
2026-02-27 18:39:32 - Accuracy Cosine Similarity: 97.97%