| 13:4: not a valid test operator: ( |
| 13:4: not a valid test operator: 535.86.10 |
| 2026-04-28 04:33:01.866542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. |
| WARNING:tensorflow:From /workspace/finetune/main_chars_lstm.py:36: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead. |
|
|
| [train] params: {"batch_size": 128, "buffer": 2000, "char_lstm_size": 25, "chars": "/workspace/finetune/data_kvkk_set2_v3/vocab.chars.txt", "dim": 50, "dim_chars": 100, "dropout": 0.5, "early_stop_max_steps": 600, "epochs": 20, "learning_rate": 0.001, "log_step_count_steps": 200, "lstm_size": 100, "min_steps": 600000, "num_oov_buckets": 1, "save_checkpoints_secs": 500, "save_summary_steps": 1000, "tags": "/workspace/finetune/data_kvkk_set2_v3/vocab.tags.txt", "trainable_embeddings": true, "vectors": "/workspace/finetune/data_kvkk_set2_v3/vectors.npz", "words": "/workspace/finetune/data_kvkk_set2_v3/vocab.words.txt"} |
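The `[train] params:` record is a JSON object after a fixed prefix, so the hyperparameters can be recovered from a saved copy of this log. The sketch below assumes the line is available as a string, with or without the `|` border characters used here. `parse_params` is a hypothetical helper name.

```python
import json

PREFIX = "[train] params: "


def parse_params(line: str) -> dict:
    """Return the hyperparameter dict from a '[train] params: {...}' log line."""
    # Tolerate surrounding whitespace and the '|' border characters used in this log.
    text = line.strip().strip("|").strip()
    if not text.startswith(PREFIX):
        raise ValueError("not a '[train] params:' line")
    return json.loads(text[len(PREFIX):])


# Shortened copy of the record above, for illustration.
line = '| [train] params: {"batch_size": 128, "dropout": 0.5, "epochs": 20, "lstm_size": 100} |'
params = parse_params(line)
print(params["batch_size"], params["epochs"])  # 128 20
```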
| Using config: {'_model_dir': '/workspace/finetune/results/model', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 500, '_session_config': allow_soft_placement: true |
| graph_options { |
| rewrite_options { |
| meta_optimizer_iterations: ONE |
| } |
| } |
| , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 200, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe07b876190>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} |
| Not using Distribute Coordinator. |
| Running training and evaluation locally (non-distributed). |
| Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 500. |
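Since `save_checkpoints_steps` is `None` and `save_checkpoints_secs` is 500, checkpoints (and the evaluations that follow them) are spaced by wall-clock time, not by step count. The number of steps between checkpoints is therefore roughly the timer interval multiplied by the training throughput. A back-of-the-envelope sketch; the helper name is illustrative and it ignores the few seconds spent in each evaluation:

```python
def steps_per_checkpoint(save_checkpoints_secs: float, steps_per_sec: float) -> float:
    """Approximate global steps between time-based checkpoints."""
    return save_checkpoints_secs * steps_per_sec


# The first checkpoint in this log lands at step 8284, about 500 s after step 0,
# which implies an average of roughly 16.6 steps/sec.
implied_rate = 8284 / 500
print(round(implied_rate, 2))  # 16.57

# At the ~17.3 steps/sec seen later in the run, the interval grows to about 8650 steps.
print(round(steps_per_checkpoint(500, 17.3)))  # 8650
```

For comparison, the second checkpoint in the log arrives 16921 - 8284 = 8637 steps after the first.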
| Calling model_fn. |
|
|
| The TensorFlow contrib module will not be included in TensorFlow 2.0. |
| For more information, please see: |
| * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md |
| * https://github.com/tensorflow/addons |
| * https://github.com/tensorflow/io (for I/O related ops) |
| If you depend on functionality not listed there, please file an issue. |
|
|
| TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1. |
| TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1. |
| TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1. |
| From /workspace/finetune/main_chars_lstm.py:104: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:169: The name tf.metrics.accuracy is deprecated. Please use tf.compat.v1.metrics.accuracy instead. |
|
|
| From /py_packages/tf_metrics/__init__.py:152: The name tf.diag_part is deprecated. Please use tf.linalg.tensor_diag_part instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:175: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:180: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:181: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead. |
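Each deprecation message above names an exact replacement. As a rough sketch, those renames can be applied to source text mechanically. Note that the `tf.diag_part` warning comes from the installed `tf_metrics` package, not from `main_chars_lstm.py`, and that these renames only silence TF 1.x warnings; they are not a full TF2 migration (TensorFlow's `tf_upgrade_v2` script is the more complete tool). `apply_renames` is a hypothetical helper.

```python
import re

# Renames exactly as suggested by the deprecation messages in this log.
RENAMES = {
    "tf.logging.set_verbosity": "tf.compat.v1.logging.set_verbosity",
    "tf.get_variable": "tf.compat.v1.get_variable",
    "tf.metrics.accuracy": "tf.compat.v1.metrics.accuracy",
    "tf.diag_part": "tf.linalg.tensor_diag_part",
    "tf.summary.scalar": "tf.compat.v1.summary.scalar",
    "tf.train.AdamOptimizer": "tf.compat.v1.train.AdamOptimizer",
    "tf.train.get_or_create_global_step": "tf.compat.v1.train.get_or_create_global_step",
}

# Match each old name as a whole dotted identifier, not as part of a longer one.
# Longest names first so no alternative can shadow a longer one.
PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")\b"
)


def apply_renames(source: str) -> str:
    """Rewrite the deprecated TF1 names listed above in Python source text."""
    return PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


print(apply_renames("step = tf.train.get_or_create_global_step()"))
# step = tf.compat.v1.train.get_or_create_global_step()
```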
|
|
| Done calling model_fn. |
| Create CheckpointSaverHook. |
| Graph was finalized. |
| 2026-04-28 04:33:05.296490: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2000000000 Hz |
| 2026-04-28 04:33:05.328827: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x64d0070 initialized for platform Host (this does not guarantee that XLA will be used). Devices: |
| 2026-04-28 04:33:05.328873: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version |
| 2026-04-28 04:33:05.333954: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1 |
| 2026-04-28 04:33:05.523685: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x660a6c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: |
| 2026-04-28 04:33:05.523735: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA H100, Compute Capability 9.0 |
| 2026-04-28 04:33:05.524596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:33:05.524637: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:33:05.859509: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:33:05.891835: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:33:05.899990: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:33:05.908493: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:33:05.920724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:33:05.922071: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:33:05.922505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:33:05.923841: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:33:05.929938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:33:05.929961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:33:05.929969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:33:05.930412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Running local_init_op. |
| Done running local_init_op. |
| Saving checkpoints for 0 into /workspace/finetune/results/model/model.ckpt. |
| 2026-04-28 04:33:09.576112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| loss = 117.38817, step = 0 |
| global_step/sec: 15.9262 |
| loss = 4.773464, step = 200 (12.558 sec) |
| global_step/sec: 16.4469 |
| loss = 2.8829772, step = 400 (12.160 sec) |
| global_step/sec: 16.4739 |
| loss = 1.6619699, step = 600 (12.140 sec) |
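With `log_step_count_steps` set to 200, each `loss = ..., step = ...` record carries the wall time of the previous 200 steps in parentheses, and in this log the `global_step/sec` line printed just before it agrees with 200 divided by that time (200 / 12.558 s is about 15.926, against the logged 15.9262). A small sketch that recovers the rate from the loss line alone; the regex is written against the line format shown here, and the helper name is hypothetical:

```python
import re

LOSS_RE = re.compile(r"loss = ([0-9.eE+-]+), step = (\d+) \(([0-9.]+) sec\)")


def rate_from_loss_line(line: str, interval: int = 200) -> float:
    """Steps/sec implied by the elapsed time in a 'loss = ..., step = ... (T sec)' line."""
    m = LOSS_RE.search(line)
    if m is None:
        # e.g. the step-0 line, which has no elapsed time yet
        raise ValueError("no timing information in line")
    return interval / float(m.group(3))


print(round(rate_from_loss_line("loss = 4.773464, step = 200 (12.558 sec)"), 3))  # 15.926
```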
| global_step/sec: 17.5305 |
| loss = 1.4286156, step = 800 (11.409 sec) |
| global_step/sec: 16.591 |
| loss = 1.1506453, step = 1000 (12.055 sec) |
| global_step/sec: 16.3338 |
| loss = 1.6229303, step = 1200 (12.245 sec) |
| global_step/sec: 16.319 |
| loss = 1.1030784, step = 1400 (12.256 sec) |
| global_step/sec: 16.0311 |
| loss = 0.48492754, step = 1600 (12.476 sec) |
| global_step/sec: 15.9238 |
| loss = 0.7973819, step = 1800 (12.560 sec) |
| global_step/sec: 15.9882 |
| loss = 0.5444262, step = 2000 (12.510 sec) |
| global_step/sec: 16.0443 |
| loss = 0.34822357, step = 2200 (12.465 sec) |
| global_step/sec: 16.0029 |
| loss = 0.46935606, step = 2400 (12.498 sec) |
| global_step/sec: 15.9425 |
| loss = 0.5678671, step = 2600 (12.548 sec) |
| global_step/sec: 16.3265 |
| loss = 0.5309495, step = 2800 (12.247 sec) |
| global_step/sec: 16.3088 |
| loss = 0.7498473, step = 3000 (12.264 sec) |
| global_step/sec: 15.5706 |
| loss = 0.6110069, step = 3200 (12.844 sec) |
| global_step/sec: 16.7051 |
| loss = 0.24717766, step = 3400 (11.973 sec) |
| global_step/sec: 16.8543 |
| loss = 1.3274518, step = 3600 (11.866 sec) |
| global_step/sec: 16.4027 |
| loss = 0.33537441, step = 3800 (12.193 sec) |
| global_step/sec: 16.2972 |
| loss = 0.29465103, step = 4000 (12.272 sec) |
| global_step/sec: 16.6392 |
| loss = 0.2784903, step = 4200 (12.020 sec) |
| global_step/sec: 16.6742 |
| loss = 0.2184174, step = 4400 (11.994 sec) |
| global_step/sec: 16.755 |
| loss = 0.44008517, step = 4600 (11.937 sec) |
| global_step/sec: 17.096 |
| loss = 0.25767207, step = 4800 (11.698 sec) |
| global_step/sec: 16.899 |
| loss = 0.17028493, step = 5000 (11.835 sec) |
| global_step/sec: 16.7529 |
| loss = 0.17947096, step = 5200 (11.938 sec) |
| global_step/sec: 17.0098 |
| loss = 0.28755808, step = 5400 (11.758 sec) |
| global_step/sec: 17.141 |
| loss = 0.25743997, step = 5600 (11.668 sec) |
| global_step/sec: 17.1619 |
| loss = 0.44591618, step = 5800 (11.654 sec) |
| global_step/sec: 17.2099 |
| loss = 0.44130975, step = 6000 (11.621 sec) |
| global_step/sec: 16.7787 |
| loss = 0.9595022, step = 6200 (11.920 sec) |
| global_step/sec: 16.5132 |
| loss = 0.17380977, step = 6400 (12.112 sec) |
| global_step/sec: 17.1766 |
| loss = 0.62790394, step = 6600 (11.644 sec) |
| global_step/sec: 16.6513 |
| loss = 0.115854025, step = 6800 (12.011 sec) |
| global_step/sec: 16.8948 |
| loss = 0.36853623, step = 7000 (11.839 sec) |
| global_step/sec: 16.8772 |
| loss = 0.15148377, step = 7200 (11.850 sec) |
| global_step/sec: 17.1629 |
| loss = 0.23943508, step = 7400 (11.653 sec) |
| global_step/sec: 16.7432 |
| loss = 0.13623583, step = 7600 (11.945 sec) |
| global_step/sec: 17.1363 |
| loss = 0.13282633, step = 7800 (11.671 sec) |
| global_step/sec: 17.5276 |
| loss = 0.0774439, step = 8000 (11.413 sec) |
| global_step/sec: 17.2208 |
| loss = 0.08795112, step = 8200 (11.612 sec) |
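The training `loss` printed every 200 steps appears to be the value for a single batch, so it is noisy: it falls to 0.247 at step 3400, jumps to 1.327 at step 3600, and drops back to 0.335 at step 3800. An exponential moving average, similar in spirit to TensorBoard's smoothing slider, makes the downward trend easier to read. A minimal sketch that parses the lines shown here; the function names and `alpha` are illustrative choices:

```python
import re

LOSS_RE = re.compile(r"loss = ([0-9.eE+-]+), step = (\d+)")


def loss_curve(lines):
    """Extract (step, loss) pairs from Estimator training log lines."""
    points = []
    for line in lines:
        m = LOSS_RE.search(line)
        if m:
            points.append((int(m.group(2)), float(m.group(1))))
    return points


def ema(values, alpha=0.9):
    """Exponential moving average; a higher alpha gives a smoother curve."""
    smoothed, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * prev + (1 - alpha) * v
        smoothed.append(prev)
    return smoothed


log = [
    "loss = 0.24717766, step = 3400 (11.973 sec)",
    "loss = 1.3274518, step = 3600 (11.866 sec)",
    "loss = 0.33537441, step = 3800 (12.193 sec)",
]
points = loss_curve(log)
print([s for s, _ in points])  # [3400, 3600, 3800]
print([round(x, 3) for x in ema([l for _, l in points], alpha=0.5)])  # [0.247, 0.787, 0.561]
```

The pattern requires `, step = ` after the number, so the evaluation summaries (`... loss = 0.17885803, precision = ...`) and the final `Loss for final step:` line are not picked up.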
| Saving checkpoints for 8284 into /workspace/finetune/results/model/model.ckpt. |
| Calling model_fn. |
| Done calling model_fn. |
| Starting evaluation at 2026-04-28T04:41:29Z |
| Graph was finalized. |
| 2026-04-28 04:41:29.885621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:41:29.885671: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:41:29.885708: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:41:29.885715: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:41:29.885723: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:41:29.885730: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:41:29.885736: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:41:29.885743: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:41:29.886053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:41:29.886087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:41:29.886092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:41:29.886097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:41:29.886396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-8284 |
| Running local_init_op. |
| Done running local_init_op. |
| Evaluation [10/100] |
| Evaluation [20/100] |
| Evaluation [30/100] |
| Evaluation [40/100] |
| Evaluation [50/100] |
| Evaluation [60/100] |
| Evaluation [70/100] |
| Evaluation [80/100] |
| Evaluation [90/100] |
| Evaluation [100/100] |
| Finished evaluation at 2026-04-28-04:41:34 |
| Saving dict for global step 8284: acc = 0.99759144, f1 = 0.9933167, global_step = 8284, loss = 0.17885803, precision = 0.99258435, recall = 0.99405015 |
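The `f1` in each evaluation dict is consistent with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). For step 8284, 2 × 0.99258435 × 0.99405015 / (0.99258435 + 0.99405015) is about 0.9933167. The metrics come from the `tf_metrics` package (see the `tf.diag_part` warning above), whose class-averaging scheme is not visible in this log, so this check only confirms the final combination step:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the evaluation at global step 8284.
print(round(f1_score(0.99258435, 0.99405015), 7))  # 0.9933167
```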
| Saving 'checkpoint_path' summary for global step 8284: /workspace/finetune/results/model/model.ckpt-8284 |
| global_step/sec: 11.4752 |
| loss = 0.24231434, step = 8400 (17.429 sec) |
| global_step/sec: 17.3567 |
| loss = 0.11804509, step = 8600 (11.523 sec) |
| global_step/sec: 17.0883 |
| loss = 0.23675942, step = 8800 (11.704 sec) |
| global_step/sec: 17.4953 |
| loss = 0.14836943, step = 9000 (11.432 sec) |
| global_step/sec: 17.495 |
| loss = 0.1359005, step = 9200 (11.432 sec) |
| global_step/sec: 17.082 |
| loss = 0.21677959, step = 9400 (11.709 sec) |
| global_step/sec: 17.2371 |
| loss = 0.14764589, step = 9600 (11.603 sec) |
| global_step/sec: 17.2015 |
| loss = 0.09986603, step = 9800 (11.627 sec) |
| global_step/sec: 17.0483 |
| loss = 0.23195946, step = 10000 (11.732 sec) |
| global_step/sec: 17.4384 |
| loss = 0.11996138, step = 10200 (11.469 sec) |
| global_step/sec: 17.4722 |
| loss = 0.077587605, step = 10400 (11.447 sec) |
| global_step/sec: 17.2922 |
| loss = 0.13140821, step = 10600 (11.566 sec) |
| global_step/sec: 17.3569 |
| loss = 0.1550746, step = 10800 (11.523 sec) |
| global_step/sec: 17.3439 |
| loss = 0.16950673, step = 11000 (11.531 sec) |
| global_step/sec: 17.4338 |
| loss = 0.1085785, step = 11200 (11.472 sec) |
| global_step/sec: 17.5506 |
| loss = 0.114634454, step = 11400 (11.396 sec) |
| global_step/sec: 17.7854 |
| loss = 0.09087747, step = 11600 (11.245 sec) |
| global_step/sec: 17.3965 |
| loss = 0.0742746, step = 11800 (11.497 sec) |
| global_step/sec: 17.3043 |
| loss = 0.11870754, step = 12000 (11.558 sec) |
| global_step/sec: 17.6434 |
| loss = 0.15823495, step = 12200 (11.335 sec) |
| global_step/sec: 17.4929 |
| loss = 0.058683336, step = 12400 (11.433 sec) |
| global_step/sec: 17.6003 |
| loss = 0.16609168, step = 12600 (11.364 sec) |
| global_step/sec: 17.6682 |
| loss = 0.2326163, step = 12800 (11.320 sec) |
| global_step/sec: 17.6474 |
| loss = 0.2841699, step = 13000 (11.333 sec) |
| global_step/sec: 17.306 |
| loss = 0.17810643, step = 13200 (11.557 sec) |
| global_step/sec: 17.3409 |
| loss = 0.049505234, step = 13400 (11.533 sec) |
| global_step/sec: 17.453 |
| loss = 0.04317367, step = 13600 (11.459 sec) |
| global_step/sec: 17.4567 |
| loss = 0.107658744, step = 13800 (11.457 sec) |
| global_step/sec: 17.7515 |
| loss = 0.040947437, step = 14000 (11.267 sec) |
| global_step/sec: 17.5825 |
| loss = 0.07045764, step = 14200 (11.375 sec) |
| global_step/sec: 17.4036 |
| loss = 0.07488018, step = 14400 (11.492 sec) |
| global_step/sec: 17.6369 |
| loss = 0.25159824, step = 14600 (11.340 sec) |
| global_step/sec: 17.7241 |
| loss = 0.08559203, step = 14800 (11.284 sec) |
| global_step/sec: 17.4513 |
| loss = 0.03275448, step = 15000 (11.461 sec) |
| global_step/sec: 17.5323 |
| loss = 0.13171089, step = 15200 (11.407 sec) |
| global_step/sec: 17.6592 |
| loss = 0.047302723, step = 15400 (11.326 sec) |
| global_step/sec: 17.6405 |
| loss = 0.13909113, step = 15600 (11.338 sec) |
| global_step/sec: 17.6345 |
| loss = 0.08797115, step = 15800 (11.342 sec) |
| global_step/sec: 17.8771 |
| loss = 0.08592886, step = 16000 (11.187 sec) |
| global_step/sec: 17.5025 |
| loss = 0.042692125, step = 16200 (11.427 sec) |
| global_step/sec: 17.5749 |
| loss = 0.050394714, step = 16400 (11.380 sec) |
| global_step/sec: 17.6533 |
| loss = 0.050355732, step = 16600 (11.329 sec) |
| global_step/sec: 17.6026 |
| loss = 0.14212108, step = 16800 (11.362 sec) |
| Saving checkpoints for 16921 into /workspace/finetune/results/model/model.ckpt. |
| Calling model_fn. |
| Done calling model_fn. |
| Starting evaluation at 2026-04-28T04:49:49Z |
| Graph was finalized. |
| 2026-04-28 04:49:49.668155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:49:49.668199: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:49:49.668238: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:49:49.668246: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:49:49.668254: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:49:49.668262: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:49:49.668268: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:49:49.668274: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:49:49.668591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:49:49.668623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:49:49.668628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:49:49.668633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:49:49.668917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-16921 |
| Running local_init_op. |
| Done running local_init_op. |
| Evaluation [10/100] |
| Evaluation [20/100] |
| Evaluation [30/100] |
| Evaluation [40/100] |
| Evaluation [50/100] |
| Evaluation [60/100] |
| Evaluation [70/100] |
| Evaluation [80/100] |
| Evaluation [90/100] |
| Evaluation [100/100] |
| Finished evaluation at 2026-04-28-04:49:54 |
| Saving dict for global step 16921: acc = 0.99804306, f1 = 0.9946128, global_step = 16921, loss = 0.13094354, precision = 0.99445856, recall = 0.9947671 |
| Saving 'checkpoint_path' summary for global step 16921: /workspace/finetune/results/model/model.ckpt-16921 |
| global_step/sec: 12.0425 |
| loss = 0.069523394, step = 17000 (16.608 sec) |
| global_step/sec: 16.8534 |
| loss = 0.03588748, step = 17200 (11.867 sec) |
| global_step/sec: 17.389 |
| loss = 0.060875297, step = 17400 (11.502 sec) |
| global_step/sec: 17.5332 |
| loss = 0.093826056, step = 17600 (11.407 sec) |
| global_step/sec: 17.5162 |
| loss = 0.04980874, step = 17800 (11.418 sec) |
| global_step/sec: 17.5958 |
| loss = 0.08436108, step = 18000 (11.367 sec) |
| global_step/sec: 17.6499 |
| loss = 0.07031548, step = 18200 (11.332 sec) |
| global_step/sec: 17.4897 |
| loss = 0.08018917, step = 18400 (11.435 sec) |
| global_step/sec: 17.6163 |
| loss = 0.028933227, step = 18600 (11.353 sec) |
| global_step/sec: 17.3619 |
| loss = 0.06399143, step = 18800 (11.519 sec) |
| global_step/sec: 17.5151 |
| loss = 0.104281485, step = 19000 (11.419 sec) |
| global_step/sec: 17.5888 |
| loss = 0.10930133, step = 19200 (11.371 sec) |
| global_step/sec: 17.6307 |
| loss = 0.056066155, step = 19400 (11.344 sec) |
| global_step/sec: 17.7029 |
| loss = 0.19223732, step = 19600 (11.298 sec) |
| global_step/sec: 17.8645 |
| loss = 0.13925654, step = 19800 (11.195 sec) |
| global_step/sec: 17.6568 |
| loss = 0.05859512, step = 20000 (11.327 sec) |
| global_step/sec: 17.7123 |
| loss = 0.06567615, step = 20200 (11.291 sec) |
| global_step/sec: 17.8937 |
| loss = 0.106875956, step = 20400 (11.177 sec) |
| global_step/sec: 17.6091 |
| loss = 0.03964001, step = 20600 (11.358 sec) |
| global_step/sec: 17.4414 |
| loss = 0.09287375, step = 20800 (11.467 sec) |
| global_step/sec: 17.6155 |
| loss = 0.03273791, step = 21000 (11.354 sec) |
| global_step/sec: 17.8087 |
| loss = 0.0277282, step = 21200 (11.230 sec) |
| global_step/sec: 17.8569 |
| loss = 0.07149267, step = 21400 (11.200 sec) |
| global_step/sec: 17.9393 |
| loss = 0.05456364, step = 21600 (11.149 sec) |
| global_step/sec: 17.7899 |
| loss = 0.011094809, step = 21800 (11.242 sec) |
| global_step/sec: 17.5774 |
| loss = 0.04235983, step = 22000 (11.378 sec) |
| global_step/sec: 17.8274 |
| loss = 0.056289375, step = 22200 (11.218 sec) |
| global_step/sec: 17.8135 |
| loss = 0.06535977, step = 22400 (11.228 sec) |
| global_step/sec: 17.5839 |
| loss = 0.11343026, step = 22600 (11.374 sec) |
| global_step/sec: 17.8426 |
| loss = 0.035084844, step = 22800 (11.209 sec) |
| global_step/sec: 17.9744 |
| loss = 0.02807188, step = 23000 (11.127 sec) |
| global_step/sec: 17.6131 |
| loss = 0.12693703, step = 23200 (11.355 sec) |
| global_step/sec: 17.8415 |
| loss = 0.064061224, step = 23400 (11.210 sec) |
| global_step/sec: 17.7546 |
| loss = 0.090565205, step = 23600 (11.265 sec) |
| global_step/sec: 17.7205 |
| loss = 0.0683707, step = 23800 (11.286 sec) |
| global_step/sec: 17.8176 |
| loss = 0.031541705, step = 24000 (11.225 sec) |
| global_step/sec: 17.906 |
| loss = 0.04308015, step = 24200 (11.169 sec) |
| global_step/sec: 17.643 |
| loss = 0.12729162, step = 24400 (11.336 sec) |
| global_step/sec: 17.5792 |
| loss = 0.05107689, step = 24600 (11.377 sec) |
| global_step/sec: 17.6869 |
| loss = 0.037356377, step = 24800 (11.308 sec) |
| Saving checkpoints for 25000 into /workspace/finetune/results/model/model.ckpt. |
| Calling model_fn. |
| Done calling model_fn. |
| Starting evaluation at 2026-04-28T04:57:32Z |
| Graph was finalized. |
| 2026-04-28 04:57:32.870073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:57:32.870119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:57:32.870154: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:57:32.870162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:57:32.870170: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:57:32.870177: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:57:32.870184: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:57:32.870191: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:57:32.870457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:57:32.870492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:57:32.870497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:57:32.870502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:57:32.870790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| Evaluation [10/100] |
| Evaluation [20/100] |
| Evaluation [30/100] |
| Evaluation [40/100] |
| Evaluation [50/100] |
| Evaluation [60/100] |
| Evaluation [70/100] |
| Evaluation [80/100] |
| Evaluation [90/100] |
| Evaluation [100/100] |
| Finished evaluation at 2026-04-28-04:57:37 |
| Saving dict for global step 25000: acc = 0.99808574, f1 = 0.9946638, global_step = 25000, loss = 0.12328317, precision = 0.9939835, recall = 0.995345 |
| Saving 'checkpoint_path' summary for global step 25000: /workspace/finetune/results/model/model.ckpt-25000 |
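Across the three evaluations, F1 rises by about 0.0013 between steps 8284 and 16921 and by only about 0.00005 after that, while the evaluation loss falls from 0.179 to 0.131 to 0.123. To compare checkpoints without reading the log by eye, the `Saving dict for global step` lines can be parsed and ranked. A sketch written against the line format above; `parse_eval` is a hypothetical helper, and choosing by `f1` is one possible criterion:

```python
import re

EVAL_RE = re.compile(r"Saving dict for global step (\d+): (.*)")


def parse_eval(line: str):
    """Return (step, metrics) from an Estimator 'Saving dict for global step' line, else None."""
    text = line.strip().strip("|").strip()
    m = EVAL_RE.search(text)
    if m is None:
        return None
    metrics = {}
    for item in m.group(2).split(", "):
        key, value = item.split(" = ")
        metrics[key] = float(value)
    return int(m.group(1)), metrics


# The three evaluation summaries from this log.
log = [
    "Saving dict for global step 8284: acc = 0.99759144, f1 = 0.9933167, global_step = 8284, loss = 0.17885803, precision = 0.99258435, recall = 0.99405015",
    "Saving dict for global step 16921: acc = 0.99804306, f1 = 0.9946128, global_step = 16921, loss = 0.13094354, precision = 0.99445856, recall = 0.9947671",
    "Saving dict for global step 25000: acc = 0.99808574, f1 = 0.9946638, global_step = 25000, loss = 0.12328317, precision = 0.9939835, recall = 0.995345",
]
evals = [e for e in (parse_eval(l) for l in log) if e is not None]
best_step, best = max(evals, key=lambda e: e[1]["f1"])
print(best_step, best["f1"])  # 25000 0.9946638
```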
| Loss for final step: 0.035551548. |
| Calling model_fn. |
| Done calling model_fn. |
| Graph was finalized. |
| 2026-04-28 04:57:37.865806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:57:37.865856: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:57:37.865881: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:57:37.865889: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:57:37.865897: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:57:37.865906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:57:37.865912: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:57:37.865920: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:57:37.866178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:57:37.866207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:57:37.866212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:57:37.866217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:57:37.866511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| [predict] wrote /workspace/finetune/results/score/train.preds.txt |
| Calling model_fn. |
| Done calling model_fn. |
| Graph was finalized. |
| 2026-04-28 04:58:27.768805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:58:27.768847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:58:27.768867: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:58:27.768874: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:58:27.768879: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:58:27.768884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:58:27.768889: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:58:27.768895: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:58:27.769123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:58:27.769147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:58:27.769151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:58:27.769155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:58:27.769420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| [predict] wrote /workspace/finetune/results/score/testa.preds.txt |
| Calling model_fn. |
| Done calling model_fn. |
| Graph was finalized. |
| 2026-04-28 04:58:34.490921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 04:58:34.490967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 04:58:34.491002: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 04:58:34.491008: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 04:58:34.491014: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 04:58:34.491020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 04:58:34.491025: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 04:58:34.491030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 04:58:34.491264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 04:58:34.491287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 04:58:34.491292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 04:58:34.491296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 04:58:34.491562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| [predict] wrote /workspace/finetune/results/score/testb.preds.txt |
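The log confirms only that predictions were written for three splits (`train`, `testa`, `testb`); the contents of the `*.preds.txt` files are not shown. Assuming one whitespace-separated `word gold_tag predicted_tag` triple per token with blank lines between sentences (an assumption, as are the example tokens and tags below), token-level accuracy and the most frequent confusions can be computed like this:

```python
from collections import Counter


def score_preds(lines):
    """Token accuracy and (gold, pred) error counts from 'word gold pred' lines.

    The three-column format is an assumption about the *.preds.txt files.
    """
    total = correct = 0
    errors = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:  # skip blank sentence separators and unexpected lines
            continue
        _, gold, pred = parts
        total += 1
        if gold == pred:
            correct += 1
        else:
            errors[(gold, pred)] += 1
    accuracy = correct / total if total else 0.0
    return accuracy, errors


# Illustrative tokens and tags, not taken from the real files.
sample = ["Ali B-PER B-PER", "Yılmaz I-PER O", "geldi O O", "", "Tamam O O"]
acc, errs = score_preds(sample)
print(round(acc, 2), errs.most_common(1))  # 0.75 [(('I-PER', 'O'), 1)]

# On the real output (path from the log):
# with open("/workspace/finetune/results/score/testb.preds.txt", encoding="utf-8") as f:
#     acc, errs = score_preds(f)
```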
|
|