gary-boon and Claude committed
Commit 3c774b5 · Parent: 4b03268

Fix: Use scaling approach instead of skipping layers


- Changed strategy: scale down the layer output by 99.9% instead of skipping it
- This maintains exact output-format compatibility
- Avoids tuple/tensor mismatch issues entirely
- The layer still runs, but its output is scaled to 0.1%

This simpler approach should work reliably with any version of the transformers library,
since we're not trying to bypass the normal data flow.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
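For illustration (not part of this commit), here is a minimal standalone sketch of the scaling-hook idea on a toy module whose forward returns a tuple, mirroring a transformer block's output format. All names below (`ToyBlock`, `scaling_hook`) are hypothetical; the sketch only demonstrates that a PyTorch forward hook returning a value replaces the module's output while the tuple structure is preserved:

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block that returns a tuple."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, hidden_states):
        # Mimic a block returning (hidden_states, presents)
        return (self.linear(hidden_states), None)

def scaling_hook(module, input, output):
    scale_factor = 0.001  # keep 0.1% of the layer's contribution
    if isinstance(output, tuple):
        # Scale only the hidden states; keep the tuple structure intact
        return (output[0] * scale_factor,) + output[1:]
    return output * scale_factor

block = ToyBlock()
handle = block.register_forward_hook(scaling_hook)
out = block(torch.randn(1, 4, 8))
print(type(out), out[0].abs().max().item())  # still a tuple; values near zero
handle.remove()
```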

Files changed (1)
  1. backend/model_service.py +15 -18
backend/model_service.py CHANGED
@@ -298,27 +298,24 @@ class ModelManager:
 
         def create_layer_hook():
             def hook(module, input, output):
-                # Skip layer by making it an identity operation
-                # The key insight: we must match the EXACT output structure
-                # but replace hidden states with input hidden states
+                # Alternative approach: drastically reduce the layer's contribution
+                # instead of trying to skip it entirely
+                # This avoids format mismatch issues
 
-                # For CodeGen blocks, the input/output structure is:
-                # input: (hidden_states,) or just hidden_states
-                # output: (hidden_states,) or (hidden_states, presents) etc.
+                # Scale down the output by 99.9% to effectively disable it
+                # while maintaining the exact format
+                scale_factor = 0.001  # Keep 0.1% of the layer's contribution
 
-                # Get input hidden states
-                input_hidden_states = input[0] if isinstance(input, tuple) else input
-
-                # Match output structure exactly
-                if not isinstance(output, tuple):
-                    # If output is a plain tensor, return input as plain tensor
-                    return input_hidden_states
-                elif len(output) == 1:
-                    # Single element tuple - preserve as single element tuple
-                    return (input_hidden_states,)
+                if isinstance(output, tuple):
+                    # Scale the hidden states (first element) but keep structure
+                    scaled_hidden = output[0] * scale_factor
+                    if len(output) > 1:
+                        return (scaled_hidden,) + output[1:]
+                    else:
+                        return (scaled_hidden,)
                 else:
-                    # Multiple elements - keep all but replace hidden states
-                    return (input_hidden_states,) + output[1:]
+                    # Single tensor output
+                    return output * scale_factor
             return hook
 
         # Apply hooks and log what's being disabled
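For context, a sketch of how the backend might register these hooks. The assumed names are not shown in this diff: `model` is taken to be a `transformers` CodeGen-style model whose blocks live in `model.transformer.h`, and `layers_to_disable` is a hypothetical list of layer indices:

```python
# Hypothetical registration code; the diff above only shows the hook factory.
handles = []
for idx in layers_to_disable:
    block = model.transformer.h[idx]  # CodeGen-style blocks live in transformer.h
    handles.append(block.register_forward_hook(create_layer_hook()))
    print(f"Scaling layer {idx} output to 0.1%")

# ... run generation with the scaled layers ...

# Remove the hooks afterwards to restore normal behavior
for handle in handles:
    handle.remove()
```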