Fahad-S committed
Commit eff606f · verified · 1 Parent(s): 32d547b

Upload Fast-SANA_1024_100k/debug_sana_connector_learnable.txt with huggingface_hub

Fast-SANA_1024_100k/debug_sana_connector_learnable.txt ADDED
@@ -0,0 +1,93 @@
+ W0818 01:32:11.361000 22608120907584 torch/distributed/run.py:757]
+ W0818 01:32:11.361000 22608120907584 torch/distributed/run.py:757] *****************************************
+ W0818 01:32:11.361000 22608120907584 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+ W0818 01:32:11.361000 22608120907584 torch/distributed/run.py:757] *****************************************
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python: can't open file '/proj/cvl/users/x_fahkh2/BLIP3o_SANA/fastvlm-o/Fast-SANA/blip3o/train/train_mem.py': [Errno 2] No such file or directory
+ E0818 01:32:16.405000 22608120907584 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 2) local_rank: 0 (pid: 862766) of binary: /proj/cvl/users/x_fahkh2/envs/blip3o/bin/python
+ Traceback (most recent call last):
+ File "/proj/cvl/users/x_fahkh2/envs/blip3o/bin/torchrun", line 8, in <module>
+ sys.exit(main())
+ ^^^^^^
+ File "/proj/cvl/users/x_fahkh2/envs/blip3o/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
+ return f(*args, **kwargs)
+ ^^^^^^^^^^^^^^^^^^
+ File "/proj/cvl/users/x_fahkh2/envs/blip3o/lib/python3.11/site-packages/torch/distributed/run.py", line 879, in main
+ run(args)
+ File "/proj/cvl/users/x_fahkh2/envs/blip3o/lib/python3.11/site-packages/torch/distributed/run.py", line 870, in run
+ elastic_launch(
+ File "/proj/cvl/users/x_fahkh2/envs/blip3o/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
+ return launch_agent(self._config, self._entrypoint, list(args))
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/proj/cvl/users/x_fahkh2/envs/blip3o/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
+ raise ChildFailedError(
+ torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
+ ============================================================
+ blip3o/train/train_mem.py FAILED
+ ------------------------------------------------------------
+ Failures:
+ [1]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 1 (local_rank: 1)
+ exitcode : 2 (pid: 862767)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ [2]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 2 (local_rank: 2)
+ exitcode : 2 (pid: 862768)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ [3]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 3 (local_rank: 3)
+ exitcode : 2 (pid: 862769)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ [4]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 4 (local_rank: 4)
+ exitcode : 2 (pid: 862770)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ [5]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 5 (local_rank: 5)
+ exitcode : 2 (pid: 862771)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ [6]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 6 (local_rank: 6)
+ exitcode : 2 (pid: 862772)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ [7]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 7 (local_rank: 7)
+ exitcode : 2 (pid: 862773)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ ------------------------------------------------------------
+ Root Cause (first observed failure):
+ [0]:
+ time : 2025-08-18_01:32:16
+ host : node045
+ rank : 0 (local_rank: 0)
+ exitcode : 2 (pid: 862766)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ ============================================================
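
Every rank in the log above exits with code 2 because the interpreter cannot open blip3o/train/train_mem.py relative to the launch directory. The following is a minimal pre-flight sketch, assuming the job is started from a Python wrapper; `resolve_entrypoint` is a hypothetical helper introduced here for illustration and is not part of torchrun or the BLIP3o code. It only checks that the script exists before any worker processes are spawned.

```python
import os


def resolve_entrypoint(script, cwd=None):
    """Return the normalized absolute path of `script` as seen from `cwd`.

    Hypothetical helper (an assumption, not a torchrun API): raises
    FileNotFoundError early instead of letting every rank fail with exit code 2.
    """
    base = os.path.abspath(cwd or os.getcwd())
    # A relative script path is resolved against the launch directory.
    path = script if os.path.isabs(script) else os.path.join(base, script)
    path = os.path.normpath(path)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"training script not found: {path} (launch directory: {base})"
        )
    return path
```

Calling `resolve_entrypoint("blip3o/train/train_mem.py")` from the intended working directory before invoking the launcher would surface a wrong directory or a missing file as a single error message rather than one failure per rank.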