2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Current SDK version is 0.24.1
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Configure stats pid to 601
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Loading settings from environment variables
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/hanrui/SpecForge-ext/wandb/run-20260202_071323-2yze80jn/logs/debug.log
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/hanrui/SpecForge-ext/wandb/run-20260202_071323-2yze80jn/logs/debug-internal.log
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():844] calling init triggers
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
config: {'target_model_path': '/workspace/Qwen3-8B', 'trust_remote_code': False, 'draft_model_config': 'configs/qwen3-8b-qwen3eagle-5layer.json', 'embedding_key': 'model.embed_tokens.weight', 'lm_head_key': 'lm_head.weight', 'is_vlm': False, 'target_model_backend': 'sglang', 'train_data_path': '/workspace/hanrui/qwen3-8b_dflash_regen/sharegpt_train_regenerated.jsonl', 'train_hidden_states_path': None, 'eval_hidden_states_path': None, 'eval_data_path': None, 'chat_template': 'qwen', 'is_preformatted': False, 'train_only_last_turn': False, 'build_dataset_num_proc': 8, 'dataloader_num_workers': 4, 'num_epochs': 10, 'max_num_steps': None, 'batch_size': 2, 'learning_rate': 0.0001, 'max_length': 2048, 'warmup_ratio': 0.015, 'total_steps': 49260, 'max_grad_norm': 0.5, 'ttt_length': 7, 'resume': False, 'ckpt_dir': None, 'eval_interval': 5000, 'save_interval': 5000, 'log_interval': 100, 'seed': 0, 'draft_accumulation_steps': 1, 'tp_size': 1, 'sp_ulysses_size': 1, 'sp_ring_size': 1, 'attention_backend': 'flex_attention', 'cache_key': None, 'cache_dir': 'cache', 'output_dir': 'outputs/qwen3-8b-qwen3eagle-5layer', 'verbose': False, 'dist_timeout': 20, 'model_download_dir': None, 'min_pixels': 50176, 'max_pixels': 802816, 'profile': False, 'profile_start_step': 30, 'profile_num_steps': 4, 'profile_record_shapes': False, 'sglang_attention_backend': 'flashinfer', 'sglang_mem_fraction_static': 0.4, 'sglang_context_length': None, 'sglang_enable_nccl_nvls': False, 'sglang_enable_symm_mem': False, 'sglang_enable_torch_compile': False, 'sglang_enable_dp_attention': False, 'sglang_enable_dp_lm_head': False, 'sglang_enable_piecewise_cuda_graph': False, 'sglang_piecewise_cuda_graph_max_tokens': 4096, 'sglang_piecewise_cuda_graph_tokens': None, 'sglang_ep_size': 1, 'report_to': 'wandb', 'wandb_project': 'qwen3-8b-qwen3eagle', 'wandb_name': '5layer-ttt7', 'wandb_key': '<redacted>', 'swanlab_project': None, 'swanlab_name': None, 'swanlab_key': None, 'mlflow_tracking_uri': None, 'mlflow_experiment_name': None, 'mlflow_run_name': None, 'dp_size': 8, 'target_batch_size': 2, '_wandb': {}}
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():892] starting backend
2026-02-02 07:13:24,247 INFO MainThread:601 [wandb_init.py:init():895] sending inform_init request
2026-02-02 07:13:24,263 INFO MainThread:601 [wandb_init.py:init():903] backend started and connected
2026-02-02 07:13:24,270 INFO MainThread:601 [wandb_init.py:init():973] updated telemetry
2026-02-02 07:13:24,285 INFO MainThread:601 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
2026-02-02 07:13:55,052 INFO Thread-7 (wrapped_target):601 [retry.py:__call__():164] [no run ID] Retry attempt failed:
Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 204, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 759, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (, 'Connection to api.wandb.ai timed out. (connect timeout=20)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 644, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/retry.py", line 535, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/retry.py", line 157, in __call__
    result = self._call_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/internal/internal_api.py", line 397, in execute
    return self.client.execute(*args, **kwargs) # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/gql_request.py", line 70, in execute
    request = self.session.post(self.url, **post_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 665, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))
2026-02-02 07:14:12,432 INFO Thread-6 (wrapped_target):601 [retry.py:__call__():164] [no run ID] Retry attempt failed:
Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 204, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 759, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (, 'Connection to api.wandb.ai timed out. (connect timeout=20)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 644, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/retry.py", line 535, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/retry.py", line 157, in __call__
    result = self._call_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/internal/internal_api.py", line 397, in execute
    return self.client.execute(*args, **kwargs) # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/gql_request.py", line 70, in execute
    request = self.session.post(self.url, **post_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 665, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))