Start time: Sat Nov  1 14:02:07 CST 2025
Node list: SH-IDCA1404-10-140-54-69
Total processes: 8
Current job ID: 6181860
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Start time: Sat Nov  1 14:07:35 CST 2025
Node list: SH-IDCA1404-10-140-54-69
Total processes: 8
Current job ID: 6181860
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
[INFO|2025-11-01 06:07:53] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:17821
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] 
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] *****************************************
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W1101 06:07:57.063000 40488 site-packages/torch/distributed/run.py:792] *****************************************
slurmstepd: error: *** JOB 6181860 ON SH-IDCA1404-10-140-54-69 CANCELLED AT 2025-11-01T14:08:08 DUE TO PREEMPTION ***
W1101 06:08:08.849000 40488 site-packages/torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
W1101 06:08:08.851000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40555 closing signal SIGTERM
W1101 06:08:08.852000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40556 closing signal SIGTERM
W1101 06:08:08.853000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40557 closing signal SIGTERM
W1101 06:08:08.854000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40558 closing signal SIGTERM
W1101 06:08:08.855000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40559 closing signal SIGTERM
W1101 06:08:08.856000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40560 closing signal SIGTERM
W1101 06:08:08.857000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40561 closing signal SIGTERM
W1101 06:08:08.858000 40488 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 40562 closing signal SIGTERM
Start time: Sat Nov  1 14:14:51 CST 2025
Node list: SH-IDCA1404-10-140-54-51
Total processes: 8
Current job ID: 6181860
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Start time: Sat Nov  1 14:36:14 CST 2025
Node list: SH-IDCA1404-10-140-54-69
Total processes: 8
Current job ID: 6181860
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    No user namespaces available, using only the fakeroot command
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
[INFO|2025-11-01 06:36:31] llamafactory.launcher:143 >> Initializing 8 distributed tasks at: 127.0.0.1:17821
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] 
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] *****************************************
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W1101 06:36:35.028000 101064 site-packages/torch/distributed/run.py:792] *****************************************
[2025-11-01 06:36:53,992] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,992] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,992] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,993] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,993] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,993] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,994] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-11-01 06:36:53,994] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/opt/conda/lib/python3.11/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources