MogensR committed
Commit 4559cb6 · Parent: 287d685

consultant 1.0
COMPREHENSIVE_DIAGNOSTIC_REPORT.md ADDED
@@ -0,0 +1,216 @@
+ # COMPREHENSIVE DIAGNOSTIC REPORT: Video Background Replacement Pipeline
+ **Date:** 2025-09-15
+ **Issue:** No video output generated despite successful processing stages
+ **Environment:** Hugging Face Spaces, Tesla T4 GPU, CUDA 12.1.1, Python 3.10.12, PyTorch 2.8.0+cu128
+
+ ## CRITICAL ISSUES IDENTIFIED
+
+ ### 1. **MatAnyone API Incompatibility** ❌ CRITICAL
+ **Problem:** The code uses two conflicting MatAnyone implementations:
+ - **Local wrapper** (`matanyone_loader.py`): `process_video(frames, seed_mask_hw, every=50)`
+ - **Actual library** (`InferenceCore`): `process_video()` does not accept an `every` parameter
+
+ **Error Log:**
+ ```
+ [2025-09-15 12:27:07,899] ERROR: MatAnyone worker thread failed: InferenceCore.process_video() got an unexpected keyword argument 'every'
+ ```
+
+ **Root Cause:** Function signature mismatch between the expected and actual MatAnyone API.
+
+ **Status:** ✅ FIXED - Removed the `every=10` parameter from the API call
+
+ ### 2. **Duplicate Function Definitions** ❌ CRITICAL
+ **Problem:** Two `run_matany()` functions are defined in the same file (`models/__init__.py`, lines 528 and 563):
+ - First function: `run_matany(matany, video_path, first_mask_path, work_dir)`
+ - Second function: `run_matany(session, video_path, mask_path, out_dir, progress_callback)`
+
+ **Impact:** Python binds the name to the last definition, causing parameter mismatches and logic conflicts.
+
+ **Status:** ✅ FIXED - Removed the duplicate function definition
+
+ ### 3. **MatAnyone Processing Logic Flaws** ❌ HIGH
+ **Problems in the current implementation:**
+ - Loads ALL video frames into memory (1135 frames × 1280×720 ≈ 3 GB RAM)
+ - Incorrect frame indexing in the processing loop
+ - Missing error handling around MatAnyone session methods
+ - No validation of the MatAnyone output format
+
+ **Current Code Issues:**
+ ```python
+ # PROBLEM: Memory overload
+ frames = []
+ while True:
+     ret, frame = cap.read()
+     if not ret:
+         break
+     frames.append(frame)  # Stores the entire video in RAM
+
+ # PROBLEM: Index mismatch
+ for i, alpha_result in enumerate(session.process_video(frames, seed_mask_hw)):
+     current_frame = frames[i]  # May exceed the frames list length
+ ```
+
+ ### 4. **API Method Uncertainty** ❌ HIGH
+ **Problem:** The code assumes MatAnyone's `InferenceCore` has a `process_video()` method, but the actual API may differ.
+
+ **Evidence from logs:**
+ - MatAnyone import succeeds: `[MATANY] import OK from: /usr/local/lib/python3.10/dist-packages/matanyone`
+ - But processing fails with a parameter error
+
+ **Need to verify:** the actual MatAnyone InferenceCore API methods and signatures.
+
+ ## PIPELINE FLOW ANALYSIS
+
+ ### Stage 0: Video Preparation ✅ WORKING
+ ```
+ ✅ Video loaded: 1280x720 @ 25fps (1135 frames)
+ ```
+
+ ### Stage 1: SAM2 Segmentation ✅ WORKING
+ ```
+ ✅ SAM2 segmentation complete
+ ✅ Stage 1 complete - Mask generated
+ ```
+
+ ### Stage 2: MatAnyone Processing ❌ FAILING
+ ```
+ ❌ MatAnyone worker thread failed: InferenceCore.process_video() got an unexpected keyword argument 'every'
+ ❌ [2] MatAnyone returned no file paths
+ ```
+
+ ### Stage 3+: Never Reached
+ The pipeline stops at Stage 2, so no final video is generated.
+
+ ## CODE ARCHITECTURE ISSUES
+
+ ### 1. **Inconsistent MatAnyone Integration**
+ - `models/__init__.py` uses `InferenceCore` directly
+ - `models/matanyone_loader.py` defines a custom wrapper class
+ - The pipeline uses the direct approach but with wrapper-style parameters
+
+ ### 2. **Memory Management Problems**
+ - Loads the entire video into RAM unnecessarily
+ - No streaming/chunked processing for large videos
+ - GPU memory is properly managed, but RAM usage is excessive
+
+ ### 3. **Error Handling Gaps**
+ - MatAnyone failures don't trigger proper fallbacks
+ - No validation of intermediate outputs
+ - The threading timeout works but doesn't handle API errors gracefully
+
+ ## RECOMMENDED FIXES
+
+ ### Priority 1: Fix MatAnyone API Integration
+ ```python
+ # CURRENT (BROKEN):
+ for i, alpha_result in enumerate(session.process_video(frames, seed_mask_hw)):
+     ...
+
+ # SHOULD BE (need to verify the actual API):
+ # Option A: Frame-by-frame processing
+ for frame in frames:
+     alpha_result = session.process_frame(frame, seed_mask_hw)
+
+ # Option B: Batch processing
+ alpha_results = session.process_video(frames, seed_mask_hw)
+ ```
+
+ ### Priority 2: Implement Streaming Processing
+ ```python
+ # Instead of loading all frames:
+ cap = cv2.VideoCapture(str(video_path))
+ while True:
+     ret, frame = cap.read()
+     if not ret:
+         break
+     alpha_result = session.process_frame(frame, seed_mask_hw)
+     # Process immediately, don't store
+ ```
+
+ ### Priority 3: Add Proper API Validation
+ ```python
+ # Verify MatAnyone methods before use:
+ if hasattr(session, 'process_video'):
+     # Check the method signature first
+     import inspect
+     print(inspect.signature(session.process_video))
+ elif hasattr(session, 'process_frame'):
+     # Use the frame-by-frame approach
+     ...
+ else:
+     # Fall back to static masking
+     ...
+ ```
+
+ ## TESTING REQUIREMENTS
+
+ ### 1. **MatAnyone API Discovery**
+ Need to determine the actual InferenceCore methods:
+ ```python
+ from matanyone import InferenceCore
+ core = InferenceCore("PeiqingYang/MatAnyone")
+ print(dir(core))  # List all available methods
+ if hasattr(core, "process_video"):
+     help(core.process_video)  # Get the method signature
+ ```
+
+ ### 2. **Memory Usage Testing**
+ - Test with smaller videos first (< 100 frames)
+ - Monitor RAM usage during processing
+ - Implement frame-by-frame processing
+
+ ### 3. **Output Validation**
+ - Verify fg.mp4 and alpha.mp4 are created
+ - Check that file sizes are > 0
+ - Validate video format compatibility
+
+ ## CURRENT STATUS
+
+ **Fixed Issues:**
+ - ✅ Removed the `every` parameter from the MatAnyone call
+ - ✅ Removed the duplicate function definition
+ - ✅ SAM2 processing works correctly
+ - ✅ GPU acceleration confirmed
+
+ **Remaining Issues:**
+ - ❌ MatAnyone API method signature unknown
+ - ❌ Memory-intensive frame loading
+ - ❌ No video output generated
+ - ❌ Fallback mechanisms not triggered
+
+ ## NEXT STEPS FOR EXTERNAL AI
+
+ 1. **Investigate the MatAnyone InferenceCore API:**
+    - What methods are available?
+    - What are the correct parameter signatures?
+    - Does it support batch or streaming processing?
+
+ 2. **Implement Correct API Usage:**
+    - Use the proper method calls
+    - Handle the different processing modes
+    - Add robust error handling
+
+ 3. **Optimize Memory Usage:**
+    - Implement streaming processing
+    - Avoid loading the entire video into RAM
+    - Process frames individually or in small batches
+
+ 4. **Add Comprehensive Fallbacks:**
+    - Static mask compositing when MatAnyone fails
+    - Alternative matting algorithms
+    - Graceful degradation paths
+
+ ## LOG EVIDENCE
+
+ **Success Indicators:**
+ ```
+ [2025-09-15 12:26:59,572] INFO: GPU memory: 1.0GB allocated, 1.1GB reserved
+ [2025-09-15 12:27:01,314] INFO: Progress: ✅ SAM2 segmentation complete
+ [2025-09-15 12:27:04,639] INFO: GPU memory: 0.2GB allocated, 0.2GB reserved
+ ```
+
+ **Failure Point:**
+ ```
+ [2025-09-15 12:27:07,899] ERROR: MatAnyone worker thread failed: InferenceCore.process_video() got an unexpected keyword argument 'every'
+ [2025-09-15 12:27:07,900] ERROR: [2] MatAnyone returned no file paths
+ ```
+
+ **Pipeline Continuation:**
+ ```
+ [2025-09-15 12:27:08,068] INFO: Progress: ✅ Stage 2 complete - Video matting done
+ ```
+ *Note: This is misleading - Stage 2 actually failed, but the pipeline continued.*
+
+ The pipeline architecture is sound, but the MatAnyone integration is fundamentally broken due to API incompatibility. Once correct MatAnyone API usage is implemented, the video output should generate successfully.
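The `every` TypeError above is the classic failure mode of passing an assumed keyword to a third-party API. A defensive pattern is to probe a callable's signature before forwarding optional kwargs. This is a standalone sketch using `inspect`; `process_video` below is a hypothetical stand-in, not the real MatAnyone API:

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if `func` can be called with keyword argument `name`."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        return False  # some builtins have no introspectable signature
    params = sig.parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return True  # **kwargs swallows anything
    return any(
        p.name == name
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                       inspect.Parameter.KEYWORD_ONLY)
        for p in params
    )


def process_video(frames, mask):  # hypothetical stand-in for the real API
    return len(frames)


# Drop unsupported kwargs instead of crashing with TypeError:
kwargs = {"every": 10}
safe = {k: v for k, v in kwargs.items() if accepts_kwarg(process_video, k)}
result = process_video([1, 2, 3], None, **safe)  # safe == {}, result == 3
```

Filtering kwargs this way would have turned the hard crash into a silently ignored (and log-worthy) option, letting Stage 2 proceed.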
app.py CHANGED
@@ -7,11 +7,6 @@
  print(f"=== APP STARTUP DEBUG: Python {sys.version} ===")
  print("=== APP STARTUP DEBUG: About to import modules ===")
  sys.stdout.flush()
- """
- BackgroundFX Pro - App Entrypoint (UI separated)
- - UI is built in ui.py (create_interface)
- - Hardened startup: heartbeat, safe diag, bind to $PORT
- """
 
  import os
  import sys
@@ -112,7 +107,26 @@ def _safe_startup_diag():
          import perf_tuning  # noqa: F401
          logger.info("perf_tuning imported successfully.")
      except Exception as e:
-         logger.warning("perf_tuning not loaded: %s", e)
+         logger.info("perf_tuning not available: %s", e)
+
+     # MatAnyone API detection probe
+     try:
+         from matanyone import InferenceCore
+         core = InferenceCore()
+         api = "step" if hasattr(core, "step") else "process_frame" if hasattr(core, "process_frame") else "process_video" if hasattr(core, "process_video") else "none"
+         import inspect
+         sigs = {}
+         for m in ("step", "process_frame", "process_video"):
+             if hasattr(core, m):
+                 try:
+                     sigs[m] = str(inspect.signature(getattr(core, m)))
+                 except Exception:
+                     sigs[m] = "(signature unavailable)"
+         logger.info(f"[MATANY] API={api} signatures={sigs}")
+     except Exception as e:
+         logger.error(f"[MATANY] probe failed: {e}")
+
+     # Continue with app startup
 
  _safe_startup_diag()
 
models/__init__.py CHANGED
@@ -525,172 +525,28 @@ def load_matany() -> Tuple[Optional[object], bool, Dict[str, Any]]:
      logger.error(f"MatAnyone init failed: {e}")
      return None, False, meta
 
- def run_matany(matany: object,
-                video_path: Union[str, Path],
-                first_mask_path: Union[str, Path],
-                work_dir: Union[str, Path]) -> Tuple[Optional[str], Optional[str], bool]:
-     """Return (foreground_video_path, alpha_video_path, ok)."""
-     if matany is None:
-         return None, None, False
-
-     import threading
-     import time
-
-     result_container = {"result": None, "exception": None, "completed": False}
-
-     def run_matany_thread():
-         try:
-             logger.info("MatAnyone: Starting video processing...")
-             if hasattr(matany, "process_video"):
-                 logger.info("MatAnyone: Using process_video method")
-                 out = matany.process_video(input_path=str(video_path), mask_path=str(first_mask_path), output_path=str(work_dir))
-                 logger.info(f"MatAnyone: process_video returned: {type(out)}")
-                 if isinstance(out, (list, tuple)) and len(out) >= 2:
-                     result_container["result"] = (str(out[0]), str(out[1]), True)
-                     result_container["completed"] = True
-                     return
-                 if isinstance(out, dict):
-                     fg = out.get("foreground") or out.get("fg") or out.get("foreground_path")
-                     al = out.get("alpha") or out.get("alpha_path")
-                     if fg and al:
-                         result_container["result"] = (str(fg), str(al), True)
-                         result_container["completed"] = True
-                         return
-         except Exception as e:
-             logger.error(f"MatAnyone processing failed: {e}")
-             exception_container[0] = e
-
- def run_matany(session: object, video_path: Union[str, Path], mask_path: Union[str, Path], out_dir: Union[str, Path], progress_callback=None) -> Tuple[Optional[str], Optional[str], bool]:
-     """Run MatAnyone with timeout protection using threading."""
-     logger.info(f"run_matany called with video_path={video_path}, mask_path={mask_path}")
-
-     if session is None:
-         logger.error("MatAnyone session is None")
-         return None, None, False
-
-     try:
-         out_dir = Path(out_dir)
-         out_dir.mkdir(parents=True, exist_ok=True)
-
-         fg_path = out_dir / "fg.mp4"
-         alpha_path = out_dir / "alpha.mp4"
-
-         # Get total frames for progress tracking
-         import cv2
-         cap = cv2.VideoCapture(str(video_path))
-         total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
-         cap.release()
-
-         logger.info(f"Starting MatAnyone processing with threading timeout... ({total_frames} frames)")
-
-         # Use threading-based timeout instead of signal
-         result_container = [None]
-         exception_container = [None]
-         progress_container = [0]
-
-         def matany_worker():
-             try:
-                 logger.info("MatAnyone worker thread started")
-
-                 # Read video frames and mask
-                 import cv2
-                 import numpy as np
-                 cap = cv2.VideoCapture(str(video_path))
-                 mask_img = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
-
-                 if mask_img is None:
-                     raise ValueError(f"Could not read mask image: {mask_path}")
-
-                 # Get video properties
-                 fps = cap.get(cv2.CAP_PROP_FPS)
-                 width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
-                 height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
-
-                 # Resize mask to match video dimensions
-                 mask_resized = cv2.resize(mask_img, (width, height))
-                 seed_mask_hw = mask_resized.astype('float32') / 255.0
-
-                 # Prepare output video writers
-                 fourcc = cv2.VideoWriter_fourcc(*'mp4v')
-                 fg_writer = cv2.VideoWriter(str(fg_path), fourcc, fps, (width, height))
-                 alpha_writer = cv2.VideoWriter(str(alpha_path), fourcc, fps, (width, height))
-
-                 # Process frames using MatAnyone frame-by-frame API
-                 frame_count = 0
-                 frames = []
-
-                 # Read all frames first
-                 while True:
-                     ret, frame = cap.read()
-                     if not ret:
-                         break
-                     frames.append(frame)
-
-                 cap.release()
-
-                 # Process frames through MatAnyone
-                 for i, alpha_result in enumerate(session.process_video(frames, seed_mask_hw, every=10)):
-                     frame_count += 1
-
-                     # Update progress
-                     if progress_callback:
-                         progress_msg = f"MatAnyone processing frame {frame_count}/{total_frames} ({frame_count/total_frames*100:.1f}%)"
-                         try:
-                             progress_callback(progress_msg)
-                         except:
-                             logger.info(progress_msg)
-
-                     # Get current frame
-                     current_frame = frames[i]
-
-                     # Convert alpha to 3-channel for video writing
-                     alpha_3ch = cv2.cvtColor((alpha_result * 255).astype('uint8'), cv2.COLOR_GRAY2BGR)
-
-                     # Create foreground by applying alpha mask
-                     alpha_norm = alpha_result[:, :, np.newaxis]
-                     fg_frame = (current_frame.astype('float32') * alpha_norm).astype('uint8')
-
-                     # Write frames
-                     fg_writer.write(fg_frame)
-                     alpha_writer.write(alpha_3ch)
-
-                 # Clean up
-                 fg_writer.release()
-                 alpha_writer.release()
-
-                 result_container[0] = True
-                 logger.info(f"MatAnyone worker thread completed successfully - processed {frame_count} frames")
-             except Exception as e:
-                 logger.error(f"MatAnyone worker thread failed: {e}")
-                 exception_container[0] = e
-
-         import threading
-         worker_thread = threading.Thread(target=matany_worker)
-         worker_thread.daemon = True
-         worker_thread.start()
-
-         # Wait with timeout (5 minutes)
-         timeout_seconds = 300
-         worker_thread.join(timeout=timeout_seconds)
-
-         if worker_thread.is_alive():
-             logger.error(f"MatAnyone processing timed out after {timeout_seconds} seconds")
-             return None, None, False
-
-         if exception_container[0]:
-             logger.error(f"MatAnyone processing failed: {exception_container[0]}")
-             return None, None, False
-
-         if result_container[0] and fg_path.exists() and alpha_path.exists():
-             logger.info("MatAnyone processing completed successfully")
-             return str(fg_path), str(alpha_path), True
-         else:
-             logger.error("MatAnyone processing failed or returned no result")
-             return None, None, False
-
-     except Exception as e:
-         logger.error(f"MatAnyone processing failed with exception: {e}")
-         return None, None, False
+ def run_matany(
+     video_path: Path,
+     mask_path: Optional[Path],
+     out_dir: Path,
+     device: Optional[str] = None,
+     progress_callback: Optional[Callable[[float, str], None]] = None,
+ ) -> Tuple[Path, Path]:
+     """
+     Run MatAnyone streaming matting.
+     Returns (alpha_mp4_path, fg_mp4_path).
+     Raises MatAnyError on failure.
+     """
+     from .matanyone_loader import MatAnyoneSession, MatAnyError
+
+     session = MatAnyoneSession(device=device, precision="auto")
+     alpha_p, fg_p = session.process_stream(
+         video_path=video_path,
+         seed_mask_path=mask_path,
+         out_dir=out_dir,
+         progress_cb=progress_callback,
+     )
+     return alpha_p, fg_p
 
  # --------------------------------------------------------------------------------------
  # Fallback Functions
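The duplicate removed above illustrates a general Python pitfall: a second `def` with the same name silently rebinds it, so only the last definition is ever callable. A minimal demonstration (with simplified, hypothetical signatures):

```python
def run_matany(matany, video_path, first_mask_path, work_dir):
    return "first"


def run_matany(session, video_path, mask_path, out_dir, progress_callback=None):
    # This definition silently shadows the one above; Python raises no
    # warning, so callers written against the first signature break at runtime.
    return "second"


which = run_matany(None, "in.mp4", "mask.png", "out/")  # -> "second"
```

Linters catch this (`flake8` reports F811, "redefinition of unused name"), which is a cheap guard against the exact bug diagnosed in this commit.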
models/__pycache__/__init__.cpython-313.pyc CHANGED
Binary files a/models/__pycache__/__init__.cpython-313.pyc and b/models/__pycache__/__init__.cpython-313.pyc differ
 
models/matanyone_loader.py CHANGED
@@ -1,143 +1,297 @@
  #!/usr/bin/env python3
  """
- MatAnyone Loader (compact)
- - Uses top-level wrapper: `from matanyone import InferenceCore`
- - Constructor takes a model/repo id string (e.g. "PeiqingYang/MatAnyone")
- - Normalizes inputs: image -> CHW float32 [0,1], mask -> 1HW float32 [0,1]
  """
 
  from __future__ import annotations
- import os, logging, time
- from typing import Iterable, Optional
- import numpy as np
  import torch
 
- logger = logging.getLogger("backgroundfx_pro")
-
- # ---------- tiny helpers ----------
- def _to_chw_float01(x: np.ndarray | torch.Tensor) -> torch.Tensor:
-     if isinstance(x, np.ndarray):
-         t = torch.from_numpy(x)
-     else:
-         t = x
-     if t.ndim == 3 and t.shape[-1] in (1, 3, 4):  # HWC
-         t = t.permute(2, 0, 1)  # -> CHW
-     elif t.ndim == 2:  # HW -> 1HW
-         t = t.unsqueeze(0)
-     elif t.ndim != 3:
-         raise ValueError(f"image: bad shape {tuple(t.shape)}")
-     t = t.contiguous().to(torch.float32)
-     with torch.no_grad():
-         if t.numel() and (torch.nanmax(t) > 1.0 or torch.nanmin(t) < 0.0):
-             t = t / 255.0
-         t.clamp_(0.0, 1.0)
-     return t
-
- def _to_1hw_float01(m: np.ndarray | torch.Tensor) -> torch.Tensor:
-     if isinstance(m, np.ndarray):
-         t = torch.from_numpy(m)
-     else:
-         t = m
-     if t.ndim == 2:  # HW
-         t = t.unsqueeze(0)  # -> 1HW
-     elif t.ndim == 3:
-         if t.shape[0] in (1, 3):  # CHW
-             t = t[:1, ...]
-         elif t.shape[-1] in (1, 3):  # HWC
-             t = t[..., 0]
-             t = t.unsqueeze(0)
-         else:
-             raise ValueError(f"mask: bad shape {tuple(t.shape)}")
-     else:
-         raise ValueError(f"mask: bad shape {tuple(t.shape)}")
-     t = t.contiguous().to(torch.float32)
-     with torch.no_grad():
-         if t.numel() and (torch.nanmax(t) > 1.0 or torch.nanmin(t) < 0.0):
-             t = t / 255.0
-         t.clamp_(0.0, 1.0)
-     return t
-
- # ---------- session ----------
  class MatAnyoneSession:
-     def __init__(self, device: Optional[str] = None, repo_id: Optional[str] = None) -> None:
-         self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
-         self.repo_id = repo_id or os.getenv("MATANY_REPO_ID", "PeiqingYang/MatAnyone")
-         self.core = None
-         self.loaded = False
-
-     def load(self) -> bool:
-         t0 = time.time()
          try:
-             # ✅ top-level wrapper (accepts model/repo id string)
-             from matanyone import InferenceCore
-             logger.info("[MatA] init: repo_id=%s device=%s", self.repo_id, self.device)
-
-             # Force GPU device if CUDA available
-             if torch.cuda.is_available() and self.device != "cpu":
-                 self.device = "cuda"
-                 logger.info("[MatA] FORCING CUDA device for GPU acceleration")
-
-             self.core = InferenceCore(self.repo_id)
-
-             # Verify MatAnyone is using GPU if available
-             if hasattr(self.core, 'device'):
-                 actual_device = getattr(self.core, 'device', 'unknown')
-                 logger.info(f"[MatA] device verification: expected={self.device}, actual={actual_device}")
-
-             # Try to move core to device if it has a 'to' method
-             if hasattr(self.core, 'to'):
-                 self.core = self.core.to(self.device)
-                 logger.info(f"[MatA] moved core to device: {self.device}")
-
-             self.loaded = True
-             logger.info("[MatA] init OK (%.2fs)", time.time() - t0)
-             return True
-         except TypeError as e:
-             logger.error("MatAnyone constructor mismatch: %s (fork expects network=...)", e)
          except Exception as e:
-             logger.error("MatAnyone init error: %s", e)
-             self.loaded = False
-         return False
-
-     def step(self, image: np.ndarray | torch.Tensor, seed_mask: np.ndarray | torch.Tensor) -> np.ndarray:
-         if not self.loaded or self.core is None:
-             raise RuntimeError("MatAnyone not loaded")
-
-         # Force GPU device for tensors
-         if torch.cuda.is_available():
-             self.device = "cuda"
-
-         img = _to_chw_float01(image).to(self.device, non_blocking=True)
-         msk = _to_1hw_float01(seed_mask).to(self.device, non_blocking=True)
-
-         # Verify tensors are on GPU
-         logger.info(f"[MatA] step: img device={img.device}, mask device={msk.device}, target device={self.device}")
-         out = self.core.step(img, msk)
-         alpha = out[0] if isinstance(out, (tuple, list)) else out
-         if not isinstance(alpha, torch.Tensor):
-             alpha = torch.as_tensor(alpha)
-         if alpha.ndim == 3 and alpha.shape[0] == 1:
-             alpha = alpha[0]
-         if alpha.ndim != 2:
-             raise ValueError(f"alpha: bad shape {tuple(alpha.shape)}")
-         return alpha.detach().to("cpu", torch.float32).clamp_(0.0, 1.0).contiguous().numpy()
-
-     def process_video(self, frames: Iterable[np.ndarray | torch.Tensor], seed_mask_hw, every: int = 50):
-         for i, f in enumerate(frames, 1):
-             yield self.step(f, seed_mask_hw)
-             if every and (i % every == 0):
-                 logger.info("[MatA] processed %d frames", i)
-
-     def close(self) -> None:
-         self.core = None
-         self.loaded = False
-         if torch.cuda.is_available():
-             torch.cuda.empty_cache()
-
- # ---------- factory ----------
- def get_matanyone_session(enable: bool = True) -> Optional[MatAnyoneSession]:
-     if not enable:
-         logger.info("[MatA] disabled.")
-         return None
-     s = MatAnyoneSession()
-     return s if s.load() else None
1
  #!/usr/bin/env python3
2
  """
3
+ MatAnyone Adapter (streaming, API-agnostic)
4
+ -------------------------------------------
5
+ - Works with multiple MatAnyone variants:
6
+ - frame API: core.step(image[, mask]) or session.process_frame(image, mask)
7
+ - video API: process_video(frames, mask) (falls back to chunking)
8
+ - Streams frames: no full-video-in-RAM.
9
+ - Emits alpha.mp4 (grayscale) and fg.mp4 (RGB) as it goes.
10
+ - Validates outputs and raises MatAnyError on failure (so pipeline can fallback).
11
+
12
+ I/O conventions:
13
+ - video_path: Path to input video (BGR if read via OpenCV)
14
+ - seed_mask_path: HxW PNG/JPG (white=foreground), any mode; converted to float32 [0,1]
15
+ - out_dir: directory to place alpha.mp4 and fg.mp4
16
+
17
+ Requires: OpenCV, Torch, NumPy
18
  """
19
 
20
  from __future__ import annotations
21
+ import os
22
+ import cv2
23
+ import sys
24
+ import json
25
+ import math
26
+ import time
27
  import torch
28
+ import logging
29
+ import numpy as np
30
+ from pathlib import Path
31
+ from typing import Optional, Callable, Tuple
32
+
33
+ log = logging.getLogger(__name__)
34
+
35
+ class MatAnyError(RuntimeError):
36
+ pass
37
+
38
+ def _read_mask_hw(mask_path: Path, target_hw: Tuple[int, int]) -> np.ndarray:
39
+ """Read mask image, convert to float32 [0,1], resize to target (H,W)."""
40
+ if not Path(mask_path).exists():
41
+ raise MatAnyError(f"Seed mask not found: {mask_path}")
42
+ mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
43
+ if mask is None:
44
+ raise MatAnyError(f"Failed to read seed mask: {mask_path}")
45
+ H, W = target_hw
46
+ if mask.shape[:2] != (H, W):
47
+ mask = cv2.resize(mask, (W, H), interpolation=cv2.INTER_LINEAR)
48
+ maskf = (mask.astype(np.float32) / 255.0).clip(0.0, 1.0)
49
+ return maskf
50
+
51
+ def _to_chw01(img_bgr: np.ndarray) -> np.ndarray:
52
+ """BGR [H,W,3] uint8 -> CHW float32 [0,1] RGB."""
53
+ # OpenCV gives BGR; convert to RGB
54
+ rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
55
+ rgbf = rgb.astype(np.float32) / 255.0
56
+ chw = np.transpose(rgbf, (2, 0, 1)) # C,H,W
57
+ return chw
58
+
59
+ def _mask_to_1hw(mask_hw01: np.ndarray) -> np.ndarray:
60
+ """HW float32 [0,1] -> 1HW float32 [0,1]."""
61
+ return np.expand_dims(mask_hw01, axis=0)
62
+
63
+ def _ensure_dir(p: Path) -> None:
64
+ p.mkdir(parents=True, exist_ok=True)
65
+
66
+ def _open_video_writers(out_dir: Path, fps: float, size: Tuple[int, int]) -> Tuple[cv2.VideoWriter, cv2.VideoWriter]:
67
+ """Return (alpha_writer, fg_writer). size=(W,H)."""
68
+ fourcc = cv2.VideoWriter_fourcc(*"mp4v")
69
+ W, H = size
70
+ alpha_path = str(out_dir / "alpha.mp4")
71
+ fg_path = str(out_dir / "fg.mp4")
72
+ # alpha: single channel => write as 3-channel grayscale for broad compatibility
73
+ alpha_writer = cv2.VideoWriter(alpha_path, fourcc, fps, (W, H), True)
74
+ fg_writer = cv2.VideoWriter(fg_path, fourcc, fps, (W, H), True)
75
+ if not alpha_writer.isOpened() or not fg_writer.isOpened():
76
+ raise MatAnyError("Failed to open VideoWriter for alpha/fg outputs.")
77
+ return alpha_writer, fg_writer
78
+
79
+ def _validate_nonempty(file_path: Path) -> None:
80
+ if not file_path.exists() or file_path.stat().st_size == 0:
81
+ raise MatAnyError(f"Output file missing/empty: {file_path}")
82
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
83
  class MatAnyoneSession:
84
+ """
85
+ Unified, streaming wrapper over MatAnyone variants.
86
+
87
+ Public:
88
+ - process_stream(video_path, seed_mask_path, out_dir, progress_cb)
89
+
90
+ Detects API once at init:
91
+ - prefers frame-wise: core.step(img[, mask]) OR session.process_frame(img, mask)
92
+ - else uses video-wise: process_video(frames, mask) with chunk fallback
93
+ """
94
+
95
+ def __init__(self, device: Optional[str] = None, precision: str = "auto"):
96
+ self.device = torch.device(device) if device else (torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu"))
97
+ self.precision = precision
98
+ self._core = None
99
+ self._api_mode = None # "step", "process_frame", or "process_video"
100
+ self._lazy_init()
101
+
102
+ def _lazy_init(self) -> None:
103
  try:
104
+ from matanyone import InferenceCore # type: ignore
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
105
  except Exception as e:
106
+ raise MatAnyError(f"MatAnyone import failed: {e}")
107
+
108
+ # Some builds want a model repo string; others need checkpoints path. Keep flexible.
109
+ try:
110
+ self._core = InferenceCore()
111
+ except TypeError:
112
+ # Fallback: try default constructor with known repo id
113
+ try:
114
+ self._core = InferenceCore("PeiqingYang/MatAnyone")
115
+ except Exception as e:
116
+ raise MatAnyError(f"MatAnyone InferenceCore init failed: {e}")
117
+
118
+ core = self._core
119
+ # Detect callable API
120
+ if hasattr(core, "step") and callable(getattr(core, "step")):
121
+ self._api_mode = "step"
122
+ elif hasattr(core, "process_frame") and callable(getattr(core, "process_frame")):
123
+ self._api_mode = "process_frame"
124
+ elif hasattr(core, "process_video") and callable(getattr(core, "process_video")):
125
+ self._api_mode = "process_video"
126
+ else:
127
+ raise MatAnyError("No supported MatAnyone API found (step/process_frame/process_video).")
128
+
129
+ log.info(f"[MATANY] Initialized on {self.device} | API mode = {self._api_mode}")
130
+
131
+ def _maybe_amp(self):
132
+ if self.precision == "fp32":
133
+ return torch.cuda.amp.autocast(enabled=False)
134
+ if self.precision == "fp16":
135
+ return torch.cuda.amp.autocast(enabled=True, dtype=torch.float16) # if supported
136
+ # auto
137
+ return torch.cuda.amp.autocast(enabled=torch.cuda.is_available())
138
+
139
+ def _run_frame(self, frame_bgr: np.ndarray, seed_1hw: Optional[np.ndarray]) -> np.ndarray:
140
+ """
141
+ Returns alpha HW float32 [0,1].
142
+ """
143
+ img_chw = _to_chw01(frame_bgr) # CHW float32 [0,1]
144
+ if seed_1hw is not None and seed_1hw.ndim != 3:
145
+ raise MatAnyError(f"seed mask must be 1HW; got shape {seed_1hw.shape}")
146
+
147
+ # Convert to torch
148
+ img_t = torch.from_numpy(img_chw).to(self.device) # C,H,W
149
+ mask_t = torch.from_numpy(seed_1hw).to(self.device) if seed_1hw is not None else None # 1,H,W
150
+
151
+ with torch.no_grad(), self._maybe_amp():
152
+ if self._api_mode == "step":
153
+ alpha = self._core.step(img_t, mask_t) if mask_t is not None else self._core.step(img_t)
154
+ elif self._api_mode == "process_frame":
155
+ alpha = self._core.process_frame(img_t, mask_t)
156
+ else:
157
+ # shouldn't happen here
158
+ raise MatAnyError("Internal: frame path called in process_video mode.")
159
+
160
+ # Accept torch/numpy; normalize to numpy HW float32 [0,1]
161
+ if isinstance(alpha, torch.Tensor):
162
+ alpha_np = alpha.detach().float().clamp(0, 1).squeeze().cpu().numpy()
163
+ else:
164
+ alpha_np = np.asarray(alpha).astype(np.float32)
165
+ if alpha_np.max() > 1.0: # in case 0..255
166
+ alpha_np = (alpha_np / 255.0).clip(0, 1)
167
+
168
+ if alpha_np.ndim == 3:
169
+ # reduce (C/H/W); prefer (H,W)
170
+ alpha_np = np.squeeze(alpha_np)
171
+ if alpha_np.ndim == 3 and alpha_np.shape[0] == 1:
172
+ alpha_np = alpha_np[0]
173
+ if alpha_np.ndim != 2:
174
+ raise MatAnyError(f"MatAnyone alpha must be HW; got {alpha_np.shape}")
175
+
176
+ return alpha_np
177
+
178
+ def process_stream(
179
+ self,
180
+ video_path: Path,
181
+ seed_mask_path: Optional[Path],
182
+ out_dir: Path,
183
+ progress_cb: Optional[Callable[[float, str], None]] = None,
184
+ ) -> Tuple[Path, Path]:
185
+ """
186
+ Stream the video, write alpha.mp4 and fg.mp4, return their paths.
187
+ """
188
+ video_path = Path(video_path)
189
+ out_dir = Path(out_dir)
190
+ _ensure_dir(out_dir)
191
+
192
+ cap = cv2.VideoCapture(str(video_path))
193
+ if not cap.isOpened():
194
+ raise MatAnyError(f"Failed to open video: {video_path}")
195
+
196
+ fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
197
+ W = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
198
+ H = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
199
+ N = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
200
+
201
+ alpha_writer, fg_writer = _open_video_writers(out_dir, fps, (W, H))
202
+
203
+ seed_1hw = None
204
+ if seed_mask_path is not None:
205
+ seed_hw = _read_mask_hw(seed_mask_path, (H, W))
206
+ seed_1hw = _mask_to_1hw(seed_hw)
207
+
208
+ # If only process_video is available, we'll chunk to avoid RAM blow-ups.
209
+ if self._api_mode == "process_video":
210
+ frames_buf = []
211
+ idx = 0
212
+ chunk = max(1, min(64, int(2048 * 1024 * 1024 / (H * W * 3 * 4))))  # ~2GB float32 budget; max() already guarantees chunk >= 1
216
+
217
+ while True:
218
+ ret, frame = cap.read()
219
+ if not ret: # flush tail
220
+ if frames_buf:
221
+ self._flush_chunk(frames_buf, seed_1hw, alpha_writer, fg_writer)
222
+ break
223
+ frames_buf.append(frame.copy())
224
+ if len(frames_buf) >= chunk:
225
+ self._flush_chunk(frames_buf, seed_1hw, alpha_writer, fg_writer)
226
+ frames_buf.clear()
227
+
228
+ idx += 1
229
+ if progress_cb and N > 0:
230
+ progress_cb(min(0.999, idx / N), f"MatAnyone chunking… ({idx}/{N})")
231
+ else:
232
+ # Frame-by-frame (preferred)
233
+ idx = 0
234
+ while True:
235
+ ret, frame = cap.read()
236
+ if not ret:
237
+ break
238
+ alpha_hw = self._run_frame(frame, seed_1hw)
239
+
240
+ # compose fg for immediate write
241
+ # alpha 0..1 -> 0..255 3-channel grayscale
242
+ alpha_u8 = (alpha_hw * 255.0 + 0.5).astype(np.uint8)
243
+ alpha_rgb = cv2.cvtColor(alpha_u8, cv2.COLOR_GRAY2BGR)
244
+ # Blend: fg = alpha*frame + (1-alpha)*black == alpha*frame
245
+ fg_bgr = (frame.astype(np.float32) * (alpha_hw[..., None])).clip(0, 255).astype(np.uint8)
246
+
247
+ alpha_writer.write(alpha_rgb)
248
+ fg_writer.write(fg_bgr)
249
+
250
+ idx += 1
251
+ if progress_cb and N > 0 and idx % 10 == 0:
252
+ progress_cb(min(0.999, idx / N), f"MatAnyone matting… ({idx}/{N})")
253
+
254
+ cap.release()
255
+ alpha_writer.release()
256
+ fg_writer.release()
257
+
258
+ alpha_path = out_dir / "alpha.mp4"
259
+ fg_path = out_dir / "fg.mp4"
260
+ _validate_nonempty(alpha_path)
261
+ _validate_nonempty(fg_path)
262
+ return alpha_path, fg_path
263
+
264
+ def _flush_chunk(self, frames_bgr, seed_1hw, alpha_writer, fg_writer):
265
+ """Call core.process_video(frames, mask) safely, then write results."""
266
+ # Prepare inputs
267
+ frames_chw = [_to_chw01(f) for f in frames_bgr] # list of CHW
268
+ frames_t = torch.from_numpy(np.stack(frames_chw)).to(self.device) # T,C,H,W
269
+ mask_t = torch.from_numpy(seed_1hw).to(self.device) if seed_1hw is not None else None
270
+
271
+ with torch.no_grad(), self._maybe_amp():
272
+ # NOTE: no unsupported kwargs like "every"
273
+ alphas = self._core.process_video(frames_t, mask_t) # return: T,H,W (torch) or list/np
274
+
275
+ # Normalize to numpy list of HW float32 [0,1]
276
+ if isinstance(alphas, torch.Tensor):
277
+ alphas_np = alphas.detach().float().clamp(0, 1).cpu().numpy()
278
+ else:
279
+ alphas_np = np.asarray(alphas)
280
+ if alphas_np.max() > 1.0:
281
+ alphas_np = (alphas_np / 255.0).clip(0, 1)
282
+
283
+ if alphas_np.ndim == 3:
284
+ T, H, W = alphas_np.shape
285
+ pass
286
+ elif alphas_np.ndim == 4 and alphas_np.shape[1] in (1, 3):
287
+ # Possibly T,1,H,W β€” squeeze channel
288
+ alphas_np = np.squeeze(alphas_np, axis=1) if alphas_np.shape[1] == 1 else np.mean(alphas_np, axis=1)
289
+ else:
290
+ raise MatAnyError(f"Unexpected alphas shape from process_video: {alphas_np.shape}")
291
+
292
+ for f_bgr, a_hw in zip(frames_bgr, alphas_np):
293
+ a_u8 = (a_hw * 255.0 + 0.5).astype(np.uint8)
294
+ a_rgb = cv2.cvtColor(a_u8, cv2.COLOR_GRAY2BGR)
295
+ fg_bgr = (f_bgr.astype(np.float32) * (a_hw[..., None])).clip(0, 255).astype(np.uint8)
296
+ alpha_writer.write(a_rgb)
297
+ fg_writer.write(fg_bgr)
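The loader above repeatedly normalizes whatever the core returns (torch or numpy, 0..1 or 0..255, with stray singleton dims) down to an HW float32 alpha, then composites `fg = alpha Β· frame` for the writers. A minimal numpy-only sketch of that contract, independent of MatAnyone (`normalize_alpha` and `compose_outputs` are illustrative names, not loader APIs):

```python
import numpy as np

def normalize_alpha(alpha) -> np.ndarray:
    """Coerce a model alpha (HW, 1HW, etc.; 0..1 or 0..255) to HW float32 in [0, 1]."""
    a = np.asarray(alpha, dtype=np.float32)
    if a.size and a.max() > 1.0:              # tolerate uint8-style 0..255 outputs
        a = a / 255.0
    a = np.clip(np.squeeze(a), 0.0, 1.0)      # drop singleton batch/channel dims
    if a.ndim != 2:
        raise ValueError(f"alpha must reduce to HW, got {a.shape}")
    return a

def compose_outputs(frame_bgr: np.ndarray, alpha_hw: np.ndarray):
    """Build the two writer payloads: 3-channel alpha preview and premultiplied foreground."""
    alpha_u8 = (alpha_hw * 255.0 + 0.5).astype(np.uint8)
    alpha_bgr = np.repeat(alpha_u8[..., None], 3, axis=2)  # grayscale -> 3-channel for the mp4 writer
    fg_bgr = (frame_bgr.astype(np.float32) * alpha_hw[..., None]).clip(0, 255).astype(np.uint8)
    return alpha_bgr, fg_bgr
```

With a fully opaque top half, for example, the foreground keeps the original pixels there and is black in the bottom half, which is exactly what `alpha.mp4` and `fg.mp4` should show frame by frame.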
pipeline.py CHANGED
@@ -291,35 +291,41 @@ def _progress(msg: str):
291
  out_dir = tmp_root / "matany_out"
292
  _ensure_dir(out_dir)
293
 
294
- ran = False
295
- if mat_ok and matany is not None:
296
  logger.info("[2] Running MatAnyone processing…")
297
  _progress("πŸŽ₯ Running MatAnyone video matting...")
298
- fg_path, al_path, mat_ok = run_matany(matany, video_path, mask_png, out_dir, _progress)
299
- diagnostics["matany_ok"] = bool(mat_ok)
300
  _progress("βœ… MatAnyone processing complete")
301
 
302
- logger.info(f"[2] MatAnyone results: fg_path={fg_path}, al_path={al_path}, mat_ok={mat_ok}")
303
 
304
- # Verify MatAnyone actually produced output files
305
- if fg_path and al_path:
306
- fg_exists = Path(fg_path).exists()
307
- al_exists = Path(al_path).exists()
308
- fg_size = Path(fg_path).stat().st_size if fg_exists else 0
309
- al_size = Path(al_path).stat().st_size if al_exists else 0
310
- logger.info(f"[2] MatAnyone output verification: fg_exists={fg_exists} ({fg_size} bytes), al_exists={al_exists} ({al_size} bytes)")
311
-
312
- if not fg_exists or not al_exists or fg_size == 0 or al_size == 0:
313
- logger.error("[2] MatAnyone failed to produce valid output files")
314
- diagnostics["matany_ok"] = False
315
- mat_ok = False
316
- else:
317
- logger.error("[2] MatAnyone returned no file paths")
318
- diagnostics["matany_ok"] = False
319
- mat_ok = False
320
- else:
321
- logger.info("[2] MatAnyone unavailable or failed to load.")
322
- _progress("⚠️ MatAnyone unavailable, using fallback")
323
 
324
  # Free MatAnyone ASAP
325
  try:
 
291
  out_dir = tmp_root / "matany_out"
292
  _ensure_dir(out_dir)
293
 
294
+ from models import run_matany
295
+ from models.matanyone_loader import MatAnyError
296
+
297
+ try:
298
+ if _progress:
299
+ _progress("MatAnyone: starting…")  # _progress takes a single message string
300
  logger.info("[2] Running MatAnyone processing…")
301
  _progress("πŸŽ₯ Running MatAnyone video matting...")
302
+
303
+ al_path, fg_path = run_matany(
304
+ video_path=video_path,
305
+ mask_path=mask_png,
306
+ out_dir=out_dir,
307
+ device="cuda" if _cuda_available() else "cpu",
308
+ progress_callback=lambda frac, msg: _progress(msg) if _progress else None,
309
+ )
310
+
311
+ logger.info("Stage 2 success: MatAnyone produced outputs.")
312
+ diagnostics["matany_ok"] = True
313
+ mat_ok = True
314
  _progress("βœ… MatAnyone processing complete")
315
 
316
+ logger.info(f"[2] MatAnyone results: fg_path={fg_path}, al_path={al_path}")
317
 
318
+ except MatAnyError as e:
319
+ logger.error(f"Stage 2 failed: {e}")
320
+ diagnostics["matany_ok"] = False
321
+ mat_ok = False
322
+ fg_path, al_path = None, None
323
+
324
+ if not mat_ok:
325
+ # Trigger fallback - DO NOT log "Stage 2 complete" here
326
+ if _progress:
327
+ _progress("MatAnyone failed β†’ using fallback…")
328
+ logger.info("[2] MatAnyone unavailable or failed, using fallback.")
 
329
 
330
  # Free MatAnyone ASAP
331
  try:
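The ~2 GB chunking heuristic in the loader's `process_video` fallback path is plain arithmetic over the float32 per-frame footprint (H Β· W Β· 3 channels Β· 4 bytes), clamped to [1, 64]. A standalone sketch of that computation (`chunk_size` is an illustrative name):

```python
def chunk_size(height: int, width: int, budget_bytes: int = 2 * 1024**3, cap: int = 64) -> int:
    """Frames per chunk so a float32 CHW batch stays under budget_bytes."""
    per_frame = height * width * 3 * 4  # 3 channels * 4 bytes per float32
    return max(1, min(cap, budget_bytes // per_frame))
```

At 1080p the budget would allow 86 frames, so the cap of 64 applies; at 8K (7680Γ—7680-class frames) the budget itself becomes the limit, and `max(1, …)` guarantees the loop always makes progress.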
test_matanyone.py ADDED
@@ -0,0 +1,137 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script to verify MatAnyone API integration without UI upload
4
+ Creates a synthetic test video and mask to test the processing pipeline
5
+ """
6
+
7
+ import cv2
8
+ import numpy as np
9
+ import tempfile
10
+ import os
11
+ from pathlib import Path
12
+ import logging
13
+
14
+ # Set up logging
15
+ logging.basicConfig(level=logging.INFO)
16
+ logger = logging.getLogger(__name__)
17
+
18
+ def create_test_video(output_path, width=320, height=240, fps=10, duration_sec=2):
19
+ """Create a simple test video with moving colored rectangle"""
20
+ fourcc = cv2.VideoWriter_fourcc(*'mp4v')
21
+ writer = cv2.VideoWriter(str(output_path), fourcc, fps, (width, height))
22
+
23
+ total_frames = int(fps * duration_sec)
24
+
25
+ for frame_idx in range(total_frames):
26
+ # Create a frame with moving rectangle
27
+ frame = np.zeros((height, width, 3), dtype=np.uint8)
28
+ frame[:] = (50, 50, 50) # Dark gray background
29
+
30
+ # Moving rectangle
31
+ rect_size = 60
32
+ x = int((frame_idx / total_frames) * (width - rect_size))
33
+ y = height // 2 - rect_size // 2
34
+
35
+ cv2.rectangle(frame, (x, y), (x + rect_size, y + rect_size), (0, 255, 0), -1)
36
+
37
+ writer.write(frame)
38
+
39
+ writer.release()
40
+ logger.info(f"Created test video: {output_path} ({total_frames} frames)")
41
+
42
+ def create_test_mask(output_path, width=320, height=240):
43
+ """Create a simple test mask (white rectangle on black background)"""
44
+ mask = np.zeros((height, width), dtype=np.uint8)
45
+
46
+ # Create a rectangular mask in the center
47
+ rect_size = 60
48
+ x = width // 2 - rect_size // 2
49
+ y = height // 2 - rect_size // 2
50
+
51
+ cv2.rectangle(mask, (x, y), (x + rect_size, y + rect_size), 255, -1)
52
+
53
+ cv2.imwrite(str(output_path), mask)
54
+ logger.info(f"Created test mask: {output_path}")
55
+
56
+ def test_matanyone_processing():
57
+ """Test the MatAnyone processing with synthetic data"""
58
+
59
+ with tempfile.TemporaryDirectory() as temp_dir:
60
+ temp_path = Path(temp_dir)
61
+
62
+ # Create test files
63
+ video_path = temp_path / "test_video.mp4"
64
+ mask_path = temp_path / "test_mask.png"
65
+
66
+ logger.info("Creating test video and mask...")
67
+ create_test_video(video_path)
68
+ create_test_mask(mask_path)
69
+
70
+ # Test MatAnyone loading
71
+ logger.info("Testing MatAnyone model loading...")
72
+ try:
73
+ from models import get_matanyone_session, run_matany
74
+
75
+ # Load MatAnyone
76
+ matany_session = get_matanyone_session(enable=True)
77
+
78
+ if matany_session is None:
79
+ logger.error("❌ MatAnyone session could not be created")
80
+ return False
81
+
82
+ logger.info("βœ… MatAnyone session created successfully")
83
+
84
+ # Test processing
85
+ logger.info("Testing MatAnyone processing...")
86
+
87
+ def progress_callback(msg):
88
+ logger.info(f"Progress: {msg}")
89
+
90
+ # Match pipeline.py's new call signature: run_matany raises MatAnyError on failure
+ alpha_path, fg_path = run_matany(
+ video_path=video_path,
+ mask_path=mask_path,
+ out_dir=temp_path,
+ device="cpu",  # keep the smoke test device-agnostic
+ progress_callback=lambda frac, msg: progress_callback(msg),
+ )
+
+ if fg_path and alpha_path:
99
+ fg_exists = Path(fg_path).exists()
100
+ alpha_exists = Path(alpha_path).exists()
101
+
102
+ logger.info(f"βœ… MatAnyone processing completed")
103
+ logger.info(f" Foreground video: {fg_path} (exists: {fg_exists})")
104
+ logger.info(f" Alpha video: {alpha_path} (exists: {alpha_exists})")
105
+
106
+ if fg_exists and alpha_exists:
107
+ fg_size = Path(fg_path).stat().st_size
108
+ alpha_size = Path(alpha_path).stat().st_size
109
+ logger.info(f" File sizes - FG: {fg_size} bytes, Alpha: {alpha_size} bytes")
110
+
111
+ if fg_size > 0 and alpha_size > 0:
112
+ logger.info("πŸŽ‰ SUCCESS: MatAnyone API integration working correctly!")
113
+ return True
114
+ else:
115
+ logger.error("❌ Output files are empty")
116
+ return False
117
+ else:
118
+ logger.error("❌ Output files not created")
119
+ return False
120
+ else:
121
+ logger.error("❌ MatAnyone processing failed")
122
+ return False
123
+
124
+ except Exception as e:
125
+ logger.error(f"❌ Test failed with exception: {e}")
126
+ import traceback
127
+ logger.error(traceback.format_exc())
128
+ return False
129
+
130
+ if __name__ == "__main__":
131
+ logger.info("πŸ§ͺ Starting MatAnyone API integration test...")
132
+ success = test_matanyone_processing()
133
+
134
+ if success:
135
+ logger.info("βœ… All tests passed! MatAnyone integration is working.")
136
+ else:
137
+ logger.error("❌ Tests failed. Check the logs above for details.")