gary-boon and Claude committed
Commit 3c774b5 · Parent: 4b03268

Fix: Use scaling approach instead of skipping layers


- Changed strategy: scale down the layer output by 99.9% instead of skipping it
- This maintains exact output-format compatibility
- Avoids tuple/tensor mismatch issues entirely
- The layer still runs, but its output is scaled to 0.1%

This simpler approach should work reliably with any version of the transformers library,
since we're not trying to bypass the normal data flow.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
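For illustration (not part of this commit), here is a minimal standalone sketch of the scaling-hook idea on a toy module whose forward returns a tuple, mirroring a transformer block's output format. All names below (`ToyBlock`, `scaling_hook`) are hypothetical; the sketch only demonstrates that a PyTorch forward hook returning a value replaces the module's output while the tuple structure is preserved:

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block that returns a tuple."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, hidden_states):
        # Mimic a block returning (hidden_states, presents)
        return (self.linear(hidden_states), None)

def scaling_hook(module, input, output):
    scale_factor = 0.001  # keep 0.1% of the layer's contribution
    if isinstance(output, tuple):
        # Scale only the hidden states; keep the tuple structure intact
        return (output[0] * scale_factor,) + output[1:]
    return output * scale_factor

block = ToyBlock()
handle = block.register_forward_hook(scaling_hook)
out = block(torch.randn(1, 4, 8))
print(type(out), out[0].abs().max().item())  # still a tuple; values near zero
handle.remove()
```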

Files changed (1)
  1. backend/model_service.py +15 -18
backend/model_service.py CHANGED
@@ -298,27 +298,24 @@ class ModelManager:
 
         def create_layer_hook():
             def hook(module, input, output):
-                # Skip layer by making it an identity operation
-                # The key insight: we must match the EXACT output structure
-                # but replace hidden states with input hidden states
+                # Alternative approach: drastically reduce the layer's contribution
+                # instead of trying to skip it entirely
+                # This avoids format mismatch issues
 
-                # For CodeGen blocks, the input/output structure is:
-                # input: (hidden_states,) or just hidden_states
-                # output: (hidden_states,) or (hidden_states, presents) etc.
+                # Scale down the output by 99.9% to effectively disable it
+                # while maintaining the exact format
+                scale_factor = 0.001  # Keep 0.1% of the layer's contribution
 
-                # Get input hidden states
-                input_hidden_states = input[0] if isinstance(input, tuple) else input
-
-                # Match output structure exactly
-                if not isinstance(output, tuple):
-                    # If output is a plain tensor, return input as plain tensor
-                    return input_hidden_states
-                elif len(output) == 1:
-                    # Single element tuple - preserve as single element tuple
-                    return (input_hidden_states,)
+                if isinstance(output, tuple):
+                    # Scale the hidden states (first element) but keep structure
+                    scaled_hidden = output[0] * scale_factor
+                    if len(output) > 1:
+                        return (scaled_hidden,) + output[1:]
+                    else:
+                        return (scaled_hidden,)
                 else:
-                    # Multiple elements - keep all but replace hidden states
-                    return (input_hidden_states,) + output[1:]
+                    # Single tensor output
+                    return output * scale_factor
             return hook
 
         # Apply hooks and log what's being disabled
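For context, a sketch of how the backend might register these hooks. The assumed names are not shown in this diff: `model` is taken to be a `transformers` CodeGen-style model whose blocks live in `model.transformer.h`, and `layers_to_disable` is a hypothetical list of layer indices:

```python
# Hypothetical registration code; the diff above only shows the hook factory.
handles = []
for idx in layers_to_disable:
    block = model.transformer.h[idx]  # CodeGen-style blocks live in transformer.h
    handles.append(block.register_forward_hook(create_layer_hook()))
    print(f"Scaling layer {idx} output to 0.1%")

# ... run generation with the scaled layers ...

# Remove the hooks afterwards to restore normal behavior
for handle in handles:
    handle.remove()
```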