ACE-Step-Custom / FIX_META_TENSOR_ERROR.md
ACE-Step Custom
Fix: Add device_map to prevent meta tensor errors on ZeroGPU
6b39c2d

A newer version of the Gradio SDK is available: 6.5.1

Upgrade

Fix for Meta Tensor Error on Hugging Face Spaces (ZeroGPU)

Problem Summary

When deploying to Hugging Face Spaces with ZeroGPU, the application crashed during model initialization with the error:

RuntimeError: Tensor.item() cannot be called on meta tensors

This occurred in the ResidualFSQ initialization within the custom model code during the model's __init__ method.

Root Cause

On Hugging Face Spaces with ZeroGPU architecture, the Transformers library initializes models on the "meta" device (placeholder tensors) before loading actual weights. The custom ACE-Step model code attempts to perform operations on tensors during initialization (specifically checking assert (levels_tensor > 1).all() in the ResidualFSQ quantizer), which fails because meta tensors cannot be used for actual computations.

Solution

Added explicit device_map parameter to all from_pretrained() calls to force direct loading onto the target device, bypassing the meta device initialization phase.

Changes Made

1. acestep/handler.py

DiT Model Loading (line ~491)

self.model = AutoModel.from_pretrained(
    acestep_v15_checkpoint_path,
    trust_remote_code=True,
    attn_implementation=candidate,
    torch_dtype=self.dtype,
    low_cpu_mem_usage=False,
    _fast_init=False,
    device_map={"": device},  # NEW: Explicitly map to target device
)

VAE Loading (line ~569)

vae_device = device if not self.offload_to_cpu else "cpu"
self.vae = AutoencoderOobleck.from_pretrained(
    vae_checkpoint_path,
    device_map={"": vae_device}  # NEW: Explicitly map to target device
)

Text Encoder Loading (line ~597)

text_encoder_device = device if not self.offload_to_cpu else "cpu"
self.text_encoder = AutoModel.from_pretrained(
    text_encoder_path,
    device_map={"": text_encoder_device}  # NEW: Explicitly map to target device
)

2. acestep/llm_inference.py

Main LLM Loading (line ~275)

def _load_pytorch_model(self, model_path: str, device: str) -> Tuple[bool, str]:
    target_device = device if not self.offload_to_cpu else "cpu"
    self.llm = AutoModelForCausalLM.from_pretrained(
        model_path,
        trust_remote_code=True,
        device_map={"": target_device}  # NEW: Explicitly map to target device
    )

Scoring Models (lines ~3016, 3045)

Added device_map parameter to both vLLM and MLX scoring model loading to ensure consistent device handling.

Technical Details

What is device_map?

The device_map parameter in Transformers' from_pretrained() tells the loader exactly which device each model component should be loaded to. Using {"": device} means "load all components to this single device", which forces immediate materialization on the target device rather than going through meta device first.

Why This Fixes the Issue

  1. Direct Loading: Models are loaded directly to CUDA/CPU without meta device intermediate step
  2. Tensor Materialization: All tensors are real tensors from the start, not placeholders
  3. Initialization Safety: Custom model code can safely perform operations during __init__

Compatibility

  • ✅ Works with ZeroGPU on Hugging Face Spaces
  • ✅ Compatible with local CUDA environments
  • ✅ Supports CPU fallback mode
  • ✅ Maintains offload_to_cpu functionality

Testing Recommendations

After deploying these changes to HF Space:

  1. Test standard generation with various prompts
  2. Verify model loads without meta tensor errors
  3. Check that ZeroGPU scheduling works correctly
  4. Monitor memory usage and generation quality

Deployment Instructions

  1. Commit changes to your repository:

    git add acestep/handler.py acestep/llm_inference.py
    git commit -m "Fix: Add device_map to prevent meta tensor errors on ZeroGPU"
    git push
    
  2. If using HF Space with GitHub sync, the space will auto-update

  3. If manually managing the space, copy updated files to the space repository

  4. Monitor the space logs to confirm successful initialization

Expected Log Output (After Fix)

2026-02-09 XX:XX:XX - acestep.handler - INFO - [initialize_service] Attempting to load model with attention implementation: sdpa
2026-02-09 XX:XX:XX - acestep.handler - INFO - ✅ Model initialized successfully on cuda

No more "Tensor.item() cannot be called on meta tensors" errors should appear.

Additional Notes

  • The fix maintains backward compatibility with existing local setups
  • No changes to model architecture or inference logic
  • Performance characteristics remain unchanged
  • Memory usage patterns are preserved