baxtos committed
Commit dbc520e · verified · 1 Parent(s): c97078a

v5 cold-start fix: eager CUDA warmup + concurrency=1 + drop dead timeout_seconds


The ORT CUDA EP binds lazily on the first sess.run, so the validator's first /predict absorbed the cold-bind cost (30-300s in a TEE VM) and the scheduler reaped the instance before activation. Miner.__init__ now runs a no-op inference so that on_startup blocks until the GPU is hot. Also lower concurrency from 4 to 1 (the default; our miner.py is not thread-safe) and remove timeout_seconds: 900 (not a Chute() kwarg, so it was silently dropped).

Files changed (2)
  1. chute_config.yml +3 -8
  2. miner.py +9 -5
chute_config.yml CHANGED
@@ -9,17 +9,12 @@ NodeSelector:
   gpu_count: 1
   min_vram_gb_per_gpu: 16
   max_hourly_price_per_gpu: 2
-  exclude:
-    - "5090"
-    - b200
-    - h200
-    - h20
-    - mi300x
+  include:
+    - pro_6000
 
 Chute:
   tee: true
-  timeout_seconds: 900
   shutdown_after_seconds: 86400
-  concurrency: 4
+  concurrency: 1
   max_instances: 5
   scaling_threshold: 0.5
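
The removed timeout_seconds key shows how an unrecognized option can disappear without any error. Below is a minimal sketch of a guard against that, assuming the config section is forwarded as keyword arguments to a constructor. make_chute and its default values are hypothetical stand-ins and do not reflect the real Chute() signature; unknown_keys is likewise an illustrative helper.

```python
import inspect


def make_chute(tee=False, shutdown_after_seconds=300, concurrency=1,
               max_instances=1, scaling_threshold=0.75):
    """Hypothetical stand-in constructor; NOT the real Chute() signature."""
    return {
        "tee": tee,
        "shutdown_after_seconds": shutdown_after_seconds,
        "concurrency": concurrency,
        "max_instances": max_instances,
        "scaling_threshold": scaling_threshold,
    }


def unknown_keys(func, options):
    """Return the option names that `func` does not declare as parameters."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any name, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in options if k not in params)


# The Chute section before and after this commit.
old_section = {
    "tee": True,
    "timeout_seconds": 900,
    "shutdown_after_seconds": 86400,
    "concurrency": 4,
    "max_instances": 5,
    "scaling_threshold": 0.5,
}
new_section = {
    "tee": True,
    "shutdown_after_seconds": 86400,
    "concurrency": 1,
    "max_instances": 5,
    "scaling_threshold": 0.5,
}

print(unknown_keys(make_chute, old_section))  # ['timeout_seconds']
print(unknown_keys(make_chute, new_section))  # []
```

Failing the deploy when this list is non-empty would surface a stray key at config time instead of leaving it silently ignored.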
miner.py CHANGED
@@ -62,13 +62,17 @@ class Miner:
         active = self.sess.get_providers()[0]
         print(f"✅ ONNX beverage model loaded (provider={active})")
 
-        # Warm CUDA kernels / ORT graph so the very first /predict isn't slow.
-        warm = np.zeros((64, 64, 3), dtype=np.uint8)
+        # Eager CUDA EP allocation: ORT lazily binds CUDA on first sess.run,
+        # so without this the validator's first /predict eats the cold-bind
+        # cost (30-300s in TEE-VM) and the scheduler reaps the instance
+        # before activation. Run a no-op inference here so on_startup only
+        # returns once GPU kernels/buffers are hot.
         try:
-            self._infer(warm)
-            print("✅ ONNX warmup pass done")
+            _dummy = np.zeros((self.input_size, self.input_size, 3), dtype=np.uint8)
+            _ = self._infer(_dummy)
+            print(f"✅ ONNX warmup pass completed (provider={active})")
         except Exception as e:
-            print(f"⚠️ ONNX warmup pass failed: {e}")
+            print(f"⚠️ ONNX warmup pass failed (not fatal): {e}")
 
     def __repr__(self) -> str:
         return f"BeverageONNX(in={self.input_size}, cls={self.num_classes})"
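
The warmup pattern in this hunk can be tried without a GPU. The sketch below is an illustration only: LazySession is a hypothetical stand-in that simulates a one-time cold-bind cost with time.sleep, and warm_up is an assumed helper name, not code from miner.py. It shows the first call paying the setup cost and later calls returning quickly, with a warmup failure reported rather than raised.

```python
import time


class LazySession:
    """Hypothetical stand-in for a session that pays a one-time cost on first run."""

    def __init__(self, cold_cost_s=0.05):
        self.cold_cost_s = cold_cost_s
        self.bound = False

    def run(self, x):
        if not self.bound:
            time.sleep(self.cold_cost_s)  # simulated cold-bind cost
            self.bound = True
        return [0.0 for _ in x]  # placeholder output


def warm_up(session, dummy_input):
    """Run one throwaway inference and return (ok, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        session.run(dummy_input)
        ok = True
    except Exception as exc:
        # Mirror the commit's choice: log the failure, do not abort startup.
        print(f"warmup failed (not fatal): {exc}")
        ok = False
    return ok, time.perf_counter() - start


session = LazySession()
ok, cold_s = warm_up(session, [0] * 8)  # pays the simulated cold cost
_, hot_s = warm_up(session, [0] * 8)    # session is already bound
print(f"ok={ok} cold={cold_s:.3f}s hot={hot_s:.3f}s")
```

Calling warm_up from the constructor, as the commit does with self._infer, means startup only completes after the cold cost has been paid, so the first real request sees the hot path.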