13:4: not a valid test operator: ( 13:4: not a valid test operator: 535.86.10 2026-04-28 05:14:39.684201: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. Traceback (most recent call last): File "/workspace/finetune/main_chars_lstm.py", line 26, in import tensorflow as tf File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 101, in from tensorflow_core import * File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/__init__.py", line 28, in from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/__init__.py", line 63, in from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/framework_lib.py", line 52, in from tensorflow.python.framework.importer import import_graph_def File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/importer.py", line 28, in from tensorflow.python.framework import function File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/function.py", line 38, in from tensorflow.python.ops import variable_scope as vs File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 40, in from tensorflow.python.ops import init_ops File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/init_ops.py", line 45, in from tensorflow.python.ops import linalg_ops_impl File "", line 991, in _find_and_load File "", line 975, in _find_and_load_unlocked File "", line 671, in _load_unlocked File "", line 844, in exec_module File "", line 939, in get_code File "", line 1038, in get_data KeyboardInterrupt 13:4: not a valid test operator: ( 13:4: not a valid test operator: 535.86.10 2026-04-28 05:14:49.627917: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. WARNING:tensorflow:From /workspace/finetune/main_chars_lstm.py:36: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead. [train] params: {"batch_size": 128, "buffer": 2000, "char_lstm_size": 25, "chars": "/workspace/finetune/data_kvkk_set3_v3/vocab.chars.txt", "dim": 50, "dim_chars": 100, "dropout": 0.5, "early_stop_max_steps": 600, "epochs": 20, "learning_rate": 0.001, "log_step_count_steps": 200, "lstm_size": 100, "min_steps": 600000, "num_oov_buckets": 1, "save_checkpoints_secs": 500, "save_summary_steps": 1000, "tags": "/workspace/finetune/data_kvkk_set3_v3/vocab.tags.txt", "trainable_embeddings": true, "vectors": "/workspace/finetune/data_kvkk_set3_v3/vectors.npz", "words": "/workspace/finetune/data_kvkk_set3_v3/vocab.words.txt"} Using config: {'_model_dir': '/workspace/finetune/results/model', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 500, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 200, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} Not using Distribute Coordinator. Running training and evaluation locally (non-distributed). Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 500. Calling model_fn. The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md * https://github.com/tensorflow/addons * https://github.com/tensorflow/io (for I/O related ops) If you depend on functionality not listed there, please file an issue. TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1. TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1. TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1. From /workspace/finetune/main_chars_lstm.py:104: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead. From /workspace/finetune/main_chars_lstm.py:169: The name tf.metrics.accuracy is deprecated. Please use tf.compat.v1.metrics.accuracy instead. From /py_packages/tf_metrics/__init__.py:152: The name tf.diag_part is deprecated. Please use tf.linalg.tensor_diag_part instead. From /workspace/finetune/main_chars_lstm.py:175: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. From /workspace/finetune/main_chars_lstm.py:180: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. From /workspace/finetune/main_chars_lstm.py:181: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead. Done calling model_fn. Create CheckpointSaverHook. Graph was finalized. 2026-04-28 05:14:53.664462: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2000000000 Hz 2026-04-28 05:14:53.695769: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x695d390 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 2026-04-28 05:14:53.695810: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version 2026-04-28 05:14:53.700326: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1 2026-04-28 05:14:53.861749: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6640640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: 2026-04-28 05:14:53.861787: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA H100, Compute Capability 9.0 2026-04-28 05:14:53.862462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:14:53.862490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:14:54.169364: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:14:54.198790: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:14:54.206171: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:14:54.213877: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:14:54.225162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:14:54.226485: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:14:54.226851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:14:54.228166: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:14:54.233928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:14:54.233952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:14:54.233961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:14:54.234404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Running local_init_op. Done running local_init_op. Saving checkpoints for 0 into /workspace/finetune/results/model/model.ckpt. 2026-04-28 05:14:57.835187: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 loss = 111.06471, step = 0 global_step/sec: 15.9321 loss = 5.9110317, step = 200 (12.553 sec) global_step/sec: 17.0678 loss = 2.240281, step = 400 (11.721 sec) 13:4: not a valid test operator: ( 13:4: not a valid test operator: 535.86.10 2026-04-28 05:15:33.664526: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. WARNING:tensorflow:From /workspace/finetune/main_chars_lstm.py:36: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead. [train] params: {"batch_size": 128, "buffer": 2000, "char_lstm_size": 25, "chars": "/workspace/finetune/data_kvkk_set3_v3/vocab.chars.txt", "dim": 50, "dim_chars": 100, "dropout": 0.5, "early_stop_max_steps": 600, "epochs": 20, "learning_rate": 0.001, "log_step_count_steps": 200, "lstm_size": 100, "min_steps": 600000, "num_oov_buckets": 1, "save_checkpoints_secs": 500, "save_summary_steps": 1000, "tags": "/workspace/finetune/data_kvkk_set3_v3/vocab.tags.txt", "trainable_embeddings": true, "vectors": "/workspace/finetune/data_kvkk_set3_v3/vectors.npz", "words": "/workspace/finetune/data_kvkk_set3_v3/vocab.words.txt"} Using config: {'_model_dir': '/workspace/finetune/results/model', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 500, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 200, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} Not using Distribute Coordinator. Running training and evaluation locally (non-distributed). Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 500. Calling model_fn. The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md * https://github.com/tensorflow/addons * https://github.com/tensorflow/io (for I/O related ops) If you depend on functionality not listed there, please file an issue. TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1. TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1. TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1. From /workspace/finetune/main_chars_lstm.py:104: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead. From /workspace/finetune/main_chars_lstm.py:169: The name tf.metrics.accuracy is deprecated. Please use tf.compat.v1.metrics.accuracy instead. From /py_packages/tf_metrics/__init__.py:152: The name tf.diag_part is deprecated. Please use tf.linalg.tensor_diag_part instead. From /workspace/finetune/main_chars_lstm.py:175: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. From /workspace/finetune/main_chars_lstm.py:180: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. From /workspace/finetune/main_chars_lstm.py:181: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead. Done calling model_fn. Create CheckpointSaverHook. Graph was finalized. 2026-04-28 05:15:37.160482: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2000000000 Hz 2026-04-28 05:15:37.198169: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6a89f10 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 2026-04-28 05:15:37.198203: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version 2026-04-28 05:15:37.203160: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1 2026-04-28 05:15:37.389778: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x695b9c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: 2026-04-28 05:15:37.389819: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA H100, Compute Capability 9.0 2026-04-28 05:15:37.390462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:15:37.390490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:15:37.703030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:15:37.735401: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:15:37.743084: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:15:37.750452: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:15:37.762361: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:15:37.764124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:15:37.766984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:15:37.768618: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:15:37.774465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:15:37.774485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:15:37.774492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:15:37.774878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-0 Running local_init_op. Done running local_init_op. Saving checkpoints for 0 into /workspace/finetune/results/model/model.ckpt. 2026-04-28 05:15:40.981003: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 loss = 102.289, step = 0 global_step/sec: 16.2585 loss = 5.986624, step = 200 (12.302 sec) global_step/sec: 16.5611 loss = 2.1498528, step = 400 (12.076 sec) global_step/sec: 16.1212 loss = 1.312039, step = 600 (12.406 sec) global_step/sec: 16.2189 loss = 1.3136858, step = 800 (12.331 sec) global_step/sec: 16.2345 loss = 1.3853596, step = 1000 (12.320 sec) global_step/sec: 15.9785 loss = 1.2304411, step = 1200 (12.517 sec) global_step/sec: 16.0121 loss = 0.7067219, step = 1400 (12.491 sec) global_step/sec: 16.1801 loss = 0.93841916, step = 1600 (12.361 sec) global_step/sec: 16.1054 loss = 0.65951395, step = 1800 (12.418 sec) global_step/sec: 16.1093 loss = 0.8376456, step = 2000 (12.415 sec) global_step/sec: 16.4475 loss = 0.45452726, step = 2200 (12.160 sec) global_step/sec: 16.3659 loss = 0.6535889, step = 2400 (12.220 sec) global_step/sec: 16.8136 loss = 0.54435384, step = 2600 (11.895 sec) global_step/sec: 16.8206 loss = 0.5056425, step = 2800 (11.891 sec) global_step/sec: 16.4608 loss = 0.54167736, step = 3000 (12.150 sec) global_step/sec: 16.5302 loss = 0.9743586, step = 3200 (12.099 sec) global_step/sec: 16.6723 loss = 0.29906678, step = 3400 (11.996 sec) global_step/sec: 16.9803 loss = 0.5273884, step = 3600 (11.780 sec) global_step/sec: 16.9109 loss = 0.51336044, step = 3800 (11.825 sec) global_step/sec: 17.2923 loss = 0.6441746, step = 4000 (11.566 sec) global_step/sec: 17.4957 loss = 0.34444237, step = 4200 (11.432 sec) global_step/sec: 17.3879 loss = 0.3049839, step = 4400 (11.501 sec) global_step/sec: 17.517 loss = 0.33523333, step = 4600 (11.418 sec) global_step/sec: 17.1766 loss = 0.5199193, step = 4800 (11.643 sec) global_step/sec: 17.223 loss = 0.40655118, step = 5000 (11.613 sec) global_step/sec: 17.67 loss = 0.42372644, step = 5200 (11.319 sec) global_step/sec: 17.67 loss = 0.08840948, step = 5400 (11.318 sec) global_step/sec: 18.0636 loss = 0.21405059, step = 5600 (11.072 sec) global_step/sec: 17.8529 loss = 0.5103779, step = 5800 (11.203 sec) global_step/sec: 17.9147 loss = 0.16661471, step = 6000 (11.164 sec) global_step/sec: 16.9307 loss = 0.24831116, step = 6200 (11.813 sec) global_step/sec: 17.287 loss = 0.07917696, step = 6400 (11.569 sec) global_step/sec: 17.287 loss = 0.19675535, step = 6600 (11.569 sec) global_step/sec: 17.2728 loss = 0.45481026, step = 6800 (11.579 sec) global_step/sec: 17.3187 loss = 0.1687935, step = 7000 (11.548 sec) global_step/sec: 17.3065 loss = 0.10605991, step = 7200 (11.556 sec) global_step/sec: 17.2856 loss = 0.21068978, step = 7400 (11.570 sec) global_step/sec: 17.6764 loss = 0.10583621, step = 7600 (11.314 sec) global_step/sec: 17.1817 loss = 0.25232738, step = 7800 (11.640 sec) global_step/sec: 17.3914 loss = 0.5265127, step = 8000 (11.500 sec) global_step/sec: 17.2989 loss = 0.15738869, step = 8200 (11.561 sec) global_step/sec: 17.4263 loss = 0.13792074, step = 8400 (11.477 sec) Saving checkpoints for 8428 into /workspace/finetune/results/model/model.ckpt. Calling model_fn. Done calling model_fn. Starting evaluation at 2026-04-28T05:24:01Z Graph was finalized. 2026-04-28 05:24:01.172161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:24:01.172213: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:24:01.172243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:24:01.172249: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:24:01.172255: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:24:01.172260: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:24:01.172265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:24:01.172272: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:24:01.172522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:24:01.172549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:24:01.172553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:24:01.172557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:24:01.172822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-8428 Running local_init_op. Done running local_init_op. Evaluation [10/100] Evaluation [20/100] Evaluation [30/100] Evaluation [40/100] Evaluation [50/100] Evaluation [60/100] Evaluation [70/100] Evaluation [80/100] Evaluation [90/100] Evaluation [100/100] Finished evaluation at 2026-04-28-05:24:06 Saving dict for global step 8428: acc = 0.997914, f1 = 0.9942703, global_step = 8428, loss = 0.14689502, precision = 0.9936579, recall = 0.99488366 Saving 'checkpoint_path' summary for global step 8428: /workspace/finetune/results/model/model.ckpt-8428 global_step/sec: 11.5153 loss = 0.13722146, step = 8600 (17.368 sec) global_step/sec: 17.6419 loss = 0.31847942, step = 8800 (11.336 sec) global_step/sec: 17.8273 loss = 0.28958094, step = 9000 (11.219 sec) global_step/sec: 17.5481 loss = 0.30293107, step = 9200 (11.397 sec) global_step/sec: 17.5695 loss = 0.11271095, step = 9400 (11.383 sec) global_step/sec: 17.8699 loss = 0.2657137, step = 9600 (11.192 sec) global_step/sec: 17.8482 loss = 0.06779647, step = 9800 (11.205 sec) global_step/sec: 17.8093 loss = 0.116514206, step = 10000 (11.230 sec) global_step/sec: 17.6214 loss = 0.17775589, step = 10200 (11.350 sec) global_step/sec: 18.1588 loss = 0.11069703, step = 10400 (11.014 sec) global_step/sec: 18.0034 loss = 0.024612904, step = 10600 (11.109 sec) global_step/sec: 18.1334 loss = 0.17613989, step = 10800 (11.029 sec) global_step/sec: 18.0531 loss = 0.13324213, step = 11000 (11.079 sec) global_step/sec: 18.0712 loss = 0.12831438, step = 11200 (11.067 sec) global_step/sec: 17.947 loss = 0.2804634, step = 11400 (11.144 sec) global_step/sec: 18.1069 loss = 0.2336666, step = 11600 (11.046 sec) global_step/sec: 17.9596 loss = 0.16818011, step = 11800 (11.136 sec) global_step/sec: 17.9847 loss = 0.062304914, step = 12000 (11.121 sec) global_step/sec: 17.8009 loss = 0.15490824, step = 12200 (11.235 sec) global_step/sec: 18.0888 loss = 0.102054656, step = 12400 (11.056 sec) global_step/sec: 17.9338 loss = 0.04917127, step = 12600 (11.152 sec) global_step/sec: 17.9571 loss = 0.07043046, step = 12800 (11.138 sec) global_step/sec: 16.5145 loss = 0.22558558, step = 13000 (12.111 sec) global_step/sec: 17.9296 loss = 0.0617373, step = 13200 (11.154 sec) global_step/sec: 18.0051 loss = 0.080938876, step = 13400 (11.108 sec) global_step/sec: 17.8975 loss = 0.049816668, step = 13600 (11.175 sec) global_step/sec: 17.6146 loss = 0.091686785, step = 13800 (11.354 sec) global_step/sec: 17.6869 loss = 0.09760392, step = 14000 (11.308 sec) global_step/sec: 17.7814 loss = 0.08662301, step = 14200 (11.247 sec) global_step/sec: 17.8851 loss = 0.12617636, step = 14400 (11.183 sec) global_step/sec: 17.1209 loss = 0.13539296, step = 14600 (11.682 sec) global_step/sec: 18.0841 loss = 0.21070999, step = 14800 (11.059 sec) global_step/sec: 17.9193 loss = 0.09002066, step = 15000 (11.161 sec) global_step/sec: 18.0027 loss = 0.07087046, step = 15200 (11.109 sec) global_step/sec: 18.0166 loss = 0.1169194, step = 15400 (11.101 sec) global_step/sec: 17.7961 loss = 0.094311416, step = 15600 (11.238 sec) global_step/sec: 18.069 loss = 0.08750421, step = 15800 (11.069 sec) global_step/sec: 17.8826 loss = 0.013639748, step = 16000 (11.184 sec) global_step/sec: 17.9857 loss = 0.051932037, step = 16200 (11.120 sec) global_step/sec: 17.8354 loss = 0.11339444, step = 16400 (11.214 sec) global_step/sec: 17.8948 loss = 0.07497531, step = 16600 (11.176 sec) global_step/sec: 17.9611 loss = 0.09614849, step = 16800 (11.135 sec) global_step/sec: 17.984 loss = 0.044727564, step = 17000 (11.121 sec) global_step/sec: 17.9836 loss = 0.06893462, step = 17200 (11.121 sec) Saving checkpoints for 17244 into /workspace/finetune/results/model/model.ckpt. Calling model_fn. Done calling model_fn. Starting evaluation at 2026-04-28T05:32:21Z Graph was finalized. 2026-04-28 05:32:21.131189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:32:21.131230: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:32:21.131259: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:32:21.131265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:32:21.131271: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:32:21.131276: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:32:21.131281: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:32:21.131288: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:32:21.131566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:32:21.131590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:32:21.131594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:32:21.131598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:32:21.131906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-17244 Running local_init_op. Done running local_init_op. Evaluation [10/100] Evaluation [20/100] Evaluation [30/100] Evaluation [40/100] Evaluation [50/100] Evaluation [60/100] Evaluation [70/100] Evaluation [80/100] Evaluation [90/100] Evaluation [100/100] Finished evaluation at 2026-04-28-05:32:25 Saving dict for global step 17244: acc = 0.9983455, f1 = 0.9955927, global_step = 17244, loss = 0.10923356, precision = 0.99532104, recall = 0.99586445 Saving 'checkpoint_path' summary for global step 17244: /workspace/finetune/results/model/model.ckpt-17244 global_step/sec: 12.2121 loss = 0.07070118, step = 17400 (16.377 sec) global_step/sec: 17.9306 loss = 0.07339883, step = 17600 (11.154 sec) global_step/sec: 17.8084 loss = 0.12021345, step = 17800 (11.231 sec) global_step/sec: 17.8753 loss = 0.0967865, step = 18000 (11.189 sec) global_step/sec: 17.6591 loss = 0.0594576, step = 18200 (11.325 sec) global_step/sec: 17.9587 loss = 0.06613392, step = 18400 (11.137 sec) global_step/sec: 17.6629 loss = 0.11538398, step = 18600 (11.323 sec) global_step/sec: 17.6883 loss = 0.008395612, step = 18800 (11.307 sec) global_step/sec: 18.0527 loss = 0.08756232, step = 19000 (11.079 sec) global_step/sec: 17.9936 loss = 0.023801446, step = 19200 (11.115 sec) global_step/sec: 17.7398 loss = 0.11921042, step = 19400 (11.274 sec) global_step/sec: 17.6782 loss = 0.048651278, step = 19600 (11.314 sec) global_step/sec: 17.7342 loss = 0.17171317, step = 19800 (11.278 sec) global_step/sec: 17.756 loss = 0.072808385, step = 20000 (11.264 sec) global_step/sec: 17.7842 loss = 0.08458197, step = 20200 (11.246 sec) global_step/sec: 18.1083 loss = 0.09870511, step = 20400 (11.045 sec) global_step/sec: 17.8018 loss = 0.014859796, step = 20600 (11.235 sec) global_step/sec: 17.9517 loss = 0.02439493, step = 20800 (11.141 sec) global_step/sec: 17.8295 loss = 0.063162625, step = 21000 (11.218 sec) global_step/sec: 17.8725 loss = 0.026917815, step = 21200 (11.190 sec) global_step/sec: 17.8992 loss = 0.13421822, step = 21400 (11.174 sec) global_step/sec: 18.0597 loss = 0.11149919, step = 21600 (11.074 sec) global_step/sec: 17.3746 loss = 0.019822836, step = 21800 (11.511 sec) global_step/sec: 17.9356 loss = 0.064364076, step = 22000 (11.151 sec) global_step/sec: 17.8118 loss = 0.02763486, step = 22200 (11.228 sec) global_step/sec: 17.6316 loss = 0.09873259, step = 22400 (11.343 sec) global_step/sec: 17.9541 loss = 0.07205546, step = 22600 (11.140 sec) global_step/sec: 17.9894 loss = 0.052541614, step = 22800 (11.118 sec) global_step/sec: 17.7515 loss = 0.07275927, step = 23000 (11.267 sec) global_step/sec: 17.8268 loss = 0.13450235, step = 23200 (11.219 sec) global_step/sec: 18.1835 loss = 0.016338944, step = 23400 (10.999 sec) global_step/sec: 18.0212 loss = 0.07616836, step = 23600 (11.098 sec) global_step/sec: 17.9681 loss = 0.0625878, step = 23800 (11.131 sec) global_step/sec: 17.786 loss = 0.04009646, step = 24000 (11.245 sec) global_step/sec: 17.8586 loss = 0.047451556, step = 24200 (11.199 sec) global_step/sec: 17.7901 loss = 0.04998851, step = 24400 (11.242 sec) global_step/sec: 17.985 loss = 0.043762565, step = 24600 (11.120 sec) global_step/sec: 17.9799 loss = 0.06248969, step = 24800 (11.124 sec) Saving checkpoints for 25000 into /workspace/finetune/results/model/model.ckpt. Calling model_fn. Done calling model_fn. Starting evaluation at 2026-04-28T05:39:40Z Graph was finalized. 2026-04-28 05:39:40.866675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:39:40.866717: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:39:40.866747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:39:40.866754: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:39:40.866760: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:39:40.866765: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:39:40.866770: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:39:40.866777: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:39:40.867014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:39:40.867040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:39:40.867044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:39:40.867048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:39:40.867311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 Running local_init_op. Done running local_init_op. Evaluation [10/100] Evaluation [20/100] Evaluation [30/100] Evaluation [40/100] Evaluation [50/100] Evaluation [60/100] Evaluation [70/100] Evaluation [80/100] Evaluation [90/100] Evaluation [100/100] Finished evaluation at 2026-04-28-05:39:45 Saving dict for global step 25000: acc = 0.9985107, f1 = 0.99607116, global_step = 25000, loss = 0.11429862, precision = 0.99621725, recall = 0.9959251 Saving 'checkpoint_path' summary for global step 25000: /workspace/finetune/results/model/model.ckpt-25000 Loss for final step: 0.057276487. Calling model_fn. Done calling model_fn. Graph was finalized. 2026-04-28 05:39:45.501816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:39:45.501861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:39:45.501878: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:39:45.501884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:39:45.501890: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:39:45.501896: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:39:45.501901: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:39:45.501908: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:39:45.502144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:39:45.502167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:39:45.502171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:39:45.502176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:39:45.502823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 Running local_init_op. Done running local_init_op. [predict] wrote /workspace/finetune/results/score/train.preds.txt Calling model_fn. Done calling model_fn. Graph was finalized. 2026-04-28 05:40:35.402702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:40:35.402752: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:40:35.402780: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:40:35.402786: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:40:35.402792: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:40:35.402797: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:40:35.402802: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:40:35.402809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:40:35.403043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:40:35.403069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:40:35.403073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:40:35.403077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:40:35.403350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 Running local_init_op. Done running local_init_op. [predict] wrote /workspace/finetune/results/score/testa.preds.txt Calling model_fn. Done calling model_fn. Graph was finalized. 2026-04-28 05:40:42.254590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 pciBusID: 0000:ad:00.0 2026-04-28 05:40:42.254635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2026-04-28 05:40:42.254654: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2026-04-28 05:40:42.254662: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2026-04-28 05:40:42.254668: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2026-04-28 05:40:42.254674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 2026-04-28 05:40:42.254679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2026-04-28 05:40:42.254685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2026-04-28 05:40:42.254917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 2026-04-28 05:40:42.254938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: 2026-04-28 05:40:42.254942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 2026-04-28 05:40:42.254947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N 2026-04-28 05:40:42.255201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 Running local_init_op. Done running local_init_op. [predict] wrote /workspace/finetune/results/score/testb.preds.txt