liamsch commited on
Commit
887af40
·
1 Parent(s): 46a927f

Initial commit: SHeaP Gradio demo

Browse files
Files changed (47) hide show
  1. .gitattributes +3 -0
  2. FLAME2020/.gitattributes +3 -0
  3. FLAME2020/eyelids.pt +3 -0
  4. FLAME2020/flame_landmark_idxs_barys.pt +3 -0
  5. FLAME2020/generic_model.pkl +3 -0
  6. FLAME2020/generic_model.pt +3 -0
  7. LICENSE.txt +1 -0
  8. README.md +124 -14
  9. app.py +7 -6
  10. convert_flame.py +68 -0
  11. demo.py +99 -0
  12. example_images/00000200.jpg +3 -0
  13. example_images/00000201.jpg +3 -0
  14. example_images/00000202.jpg +3 -0
  15. example_images/00000203.jpg +3 -0
  16. example_images/00000204.jpg +3 -0
  17. example_images/00000205.jpg +3 -0
  18. example_images/00000206.jpg +3 -0
  19. example_images/00000207.jpg +3 -0
  20. example_images/00000208.jpg +3 -0
  21. example_images/00000209.jpg +3 -0
  22. example_videos/dafoe.mp4 +3 -0
  23. gradio_demo.py +284 -0
  24. models/model_expressive.pt +3 -0
  25. pyproject.toml +52 -0
  26. requirements.txt +14 -0
  27. requirements_hf.txt +15 -0
  28. sheap/__init__.py +21 -0
  29. sheap/__pycache__/__init__.cpython-311.pyc +0 -0
  30. sheap/__pycache__/eval_utils.cpython-311.pyc +0 -0
  31. sheap/__pycache__/fa_landmark_utils.cpython-311.pyc +0 -0
  32. sheap/__pycache__/landmark_utils.cpython-311.pyc +0 -0
  33. sheap/__pycache__/load_flame.cpython-311.pyc +0 -0
  34. sheap/__pycache__/load_flame_pkl.cpython-311.pyc +0 -0
  35. sheap/__pycache__/load_model.cpython-311.pyc +0 -0
  36. sheap/__pycache__/render.cpython-311.pyc +0 -0
  37. sheap/__pycache__/tiny_flame.cpython-311.pyc +0 -0
  38. sheap/eval_utils.py +270 -0
  39. sheap/fa_landmark_utils.py +96 -0
  40. sheap/landmark_utils.py +143 -0
  41. sheap/load_flame_pkl.py +35 -0
  42. sheap/load_model.py +85 -0
  43. sheap/py.typed +0 -0
  44. sheap/render.py +83 -0
  45. sheap/tiny_flame.py +168 -0
  46. teaser.jpg +3 -0
  47. video_demo.py +460 -0
.gitattributes CHANGED
@@ -33,6 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
36
  *.mp4 filter=lfs diff=lfs merge=lfs -text
37
  *.jpg filter=lfs diff=lfs merge=lfs -text
38
  *.jpeg filter=lfs diff=lfs merge=lfs -text
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ **/*.pt filter=lfs diff=lfs merge=lfs -text
37
+ **/*.pth filter=lfs diff=lfs merge=lfs -text
38
+ **/*.pkl filter=lfs diff=lfs merge=lfs -text
39
  *.mp4 filter=lfs diff=lfs merge=lfs -text
40
  *.jpg filter=lfs diff=lfs merge=lfs -text
41
  *.jpeg filter=lfs diff=lfs merge=lfs -text
FLAME2020/.gitattributes ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ *.pt filter=lfs diff=lfs merge=lfs -text
2
+ *.pth filter=lfs diff=lfs merge=lfs -text
3
+ *.pkl filter=lfs diff=lfs merge=lfs -text
FLAME2020/eyelids.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d5d5a2abbc71384b203451085337b0f9a581619bc839838a92b32a80d76ad9fa
3
+ size 121692
FLAME2020/flame_landmark_idxs_barys.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eeda0176e330a0d69e8f2be29baf4bed62ecbed1ea04a0f5eb1ca0460023e398
3
+ size 2948
FLAME2020/generic_model.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:efcd14cc4a69f3a3d9af8ded80146b5b6b50df3bd74cf69108213b144eba725b
3
+ size 53023716
FLAME2020/generic_model.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af8c7483c6135d26ccbee9a8a0ac64b39575d6afdf121f79b866ff4c7fbdcf19
3
+ size 26784481
LICENSE.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians © 2025 by Liam Schoneveld is licensed under Creative Commons Attribution-NonCommercial 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/
README.md CHANGED
@@ -1,14 +1,124 @@
1
- ---
2
- title: Sheap
3
- emoji: 📈
4
- colorFrom: blue
5
- colorTo: pink
6
- sdk: gradio
7
- sdk_version: 6.0.1
8
- app_file: app.py
9
- pinned: false
10
- license: cc-by-nc-4.0
11
- short_description: 'SHeaP: Self-Supervised Head Geometry Predictor'
12
- ---
13
-
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <div align="center">
2
+ <h1>🐑 SHeaP 🐑</h1>
3
+ <h2>Self-Supervised Head Geometry Predictor Learned via 2D Gaussians</h2>
4
+
5
+ <a href="https://nlml.github.io/sheap" target="_blank" rel="noopener noreferrer">
6
+ <img src="https://img.shields.io/badge/Project_Page-green" alt="Project Page">
7
+ </a>
8
+ <a href="https://arxiv.org/abs/2504.12292"><img src="https://img.shields.io/badge/arXiv-2504.12292-b31b1b" alt="arXiv"></a>
9
+ <a href="https://www.youtube.com/watch?v=vhXsZJWCBMA"><img src="https://img.shields.io/badge/YouTube-Video-red" alt="YouTube"></a>
10
+
11
+ **Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner**
12
+
13
+ <img src="teaser.jpg" alt="SHeaP Teaser" width="100%">
14
+
15
+ </div>
16
+
17
+ ## Overview
18
+
19
+ SHeaP learns to predict head geometry (FLAME parameters) from a single image, by predicting and rendering 2D Gaussians.
20
+
21
+ This repository contains code and models for the **FLAME parameter inference only**.
22
+
23
+ ## Example usage
24
+
25
+ **After setting up**, for a simple example, run `python demo.py`.
26
+
27
+ To run on a video you can use:
28
+
29
+ ```bash
30
+ python video_demo.py example_videos/dafoe.mp4
31
+ ```
32
+
33
+ The above command will produce the result in [example_videos/dafoe_rendered.mp4](https://github.com/nlml/SHeaP/blob/main/example_videos/dafoe_rendered.mp4).
34
+
35
+ Or, here is a minimal example script:
36
+
37
+ ```python
38
+ import torch, torchvision.io as io
39
+ from sheap import load_sheap_model
40
+ # Available model variants:
41
+ # sheap_model = load_sheap_model(model_type="paper")
42
+ sheap_model = load_sheap_model(model_type="expressive")
43
+ impath = "example_images/00000200.jpg"
44
+ # Input should be a head crop similar to those in example_images/
45
+ # shape (N,3,224,224) / pixel values from 0 to 1.
46
+ image_tensor = io.decode_image(impath).float() / 255
47
+ # flame_params_dict contains predicted FLAME parameters
48
+ flame_params_dict = sheap_model(image_tensor[None])
49
+ ```
50
+
51
+ **Note: `model_type`** can be one of two values:
52
+
53
+ - **`"paper"`**: used for paper results; gets best performance on NoW.
54
+ - **`"expressive"`**: perhaps better for real-world use; it was trained for longer with less regularisation and tends to be more expressive.
55
+
56
+ ## Setup
57
+
58
+ ### Step 1: Install dependencies
59
+
60
+ We just require `torch>=2.0.0` and a few other dependencies.
61
+
62
+ Just install the latest `torch` in a new venv, then `pip install .`
63
+
64
+ Or, if you use [`uv`](https://docs.astral.sh/uv/), you can just run `uv sync`.
65
+
66
+ ### Step 2: Download and convert FLAME
67
+
68
+ Only needed if you want to predict FLAME vertices or render a mesh.
69
+
70
+ Download [FLAME2020](https://flame.is.tue.mpg.de/).
71
+
72
+ Put it in the `FLAME2020/` dir. We only need `generic_model.pkl`. Your `FLAME2020/` directory should look like this:
73
+
74
+ ```bash
75
+ FLAME2020/
76
+ ├── eyelids.pt
77
+ ├── flame_landmark_idxs_barys.pt
78
+ └── generic_model.pkl
79
+ ```
80
+
81
+ Now convert FLAME to our format:
82
+
83
+ ```bash
84
+ python convert_flame.py
85
+ ```
86
+
87
+ ## Reproduce paper results on NoW dataset
88
+
89
+ To reproduce the validation results from the paper (median=0.93mm):
90
+
91
+ First, update submodules:
92
+
93
+ ```bash
94
+ git submodule update --init --recursive
95
+ ```
96
+
97
+ Then build the NoW Evaluation docker image:
98
+
99
+ ```bash
100
+ docker build -t noweval now/now_evaluation
101
+ ```
102
+
103
+ Then predict FLAME meshes for all images in NoW using SHeaP:
104
+
105
+ ```
106
+ cd now/
107
+ python now.py --now-dataset-root /path/to/NoW_Evaluation/dataset
108
+ ```
109
+
110
+ Upon finishing, the above command will print a command like the following:
111
+
112
+ ```
113
+ chmod 777 -R /home/user/sheap/now/now_eval_outputs/now_preds && docker run --ipc host --gpus all -it --rm -v /data/NoW_Evaluation/dataset:/dataset -v /home/user/sheap/now/now_eval_outputs/now_preds:/preds noweval
114
+ ```
115
+
116
+ Run that command. This will run NoW evaluation on the FLAME meshes we just predicted.
117
+
118
+ Finally, the results will be placed in `/home/user/sheap/now/now_eval_outputs/now_preds` (or equivalent). The mean and median are already calculated:
119
+
120
+ ```bash
121
+ ➜ cat /home/user/sheap/now/now_eval_outputs/now_preds/results/RECON_computed_distances.npy.meanmedian
122
+ 0.9327719333872148 # result in the paper
123
+ 1.1568168246248534
124
+ ```
app.py CHANGED
@@ -1,8 +1,9 @@
1
- import gradio as gr
 
 
 
2
 
3
- def greet(name):
4
- return "Hello " + name + "!!"
5
-
6
- demo = gr.Interface(fn=greet, inputs="text", outputs="text")
7
- demo.launch()
8
 
 
 
 
1
+ """
2
+ Hugging Face Space entry point for SHeaP demo.
3
+ This file imports and runs the gradio demo.
4
+ """
5
 
6
+ from gradio_demo import demo
 
 
 
 
7
 
8
if __name__ == "__main__":
    # Launch the Gradio UI only when this file is executed directly
    # (importers, e.g. the Spaces runtime, use the imported `demo` object).
    demo.launch()
convert_flame.py ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Converts FLAME pickle files to PyTorch .pt files.
3
+ """
4
+
5
+ import argparse
6
+ from pathlib import Path
7
+ from typing import Union
8
+
9
+ import torch
10
+
11
+ from sheap.load_flame_pkl import load_pkl_format_flame_model
12
+
13
+
14
def convert_flame(flame_base_dir: Union[str, Path], overwrite: bool) -> None:
    """Convert FLAME pickle files to PyTorch .pt files.

    Searches recursively for all .pkl files under the FLAME base directory and
    converts them to PyTorch .pt format, skipping known mask files. Existing
    outputs are detected *before* loading the pickle, so nothing expensive is
    done for files that would be skipped anyway.

    Args:
        flame_base_dir: Path to the FLAME model directory containing pickle files.
        overwrite: Whether to overwrite existing .pt files if they already exist.

    Raises:
        AssertionError: If flame_base_dir does not exist.
    """
    flame_base_dir = Path(flame_base_dir)
    assert flame_base_dir.exists(), (
        f"FLAME_BASE_DIR not found at {flame_base_dir}. "
        "Please set arg flame_base_dir to the FLAME model directory, "
        " or set the FLAME_BASE_DIR environment variable."
    )
    # "**/" already matches zero or more directory levels, so the original
    # "**/**/*.pkl" pattern was redundant; "**/*.pkl" covers the same files.
    pickle_files = list(flame_base_dir.glob("**/*.pkl"))
    skip_files = {"FLAME_masks.pkl"}
    for model_path in pickle_files:
        if model_path.name in skip_files:
            continue
        new_path = model_path.with_suffix(".pt")
        # Check for an existing output before the (expensive) pickle load.
        if new_path.exists() and not overwrite:
            print(f"Skipping {new_path} because it already exists.")
            continue
        print(f"Converting {model_path}...")
        data = load_pkl_format_flame_model(model_path)
        torch.save(data, new_path)
        print(f"Saved {new_path}")
47
+
48
+
49
def main() -> None:
    """CLI entry point: parse arguments and run the FLAME pickle conversion."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--flame_base_dir",
        type=str,
        default="FLAME2020/",
        help=(
            "Path to the FLAME model directory. "
            "Defaults to the FLAME_BASE_DIR environment variable."
        ),
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Overwrite existing files if they already exist.",
    )
    args = parser.parse_args()
    # Forward the parsed namespace explicitly rather than via **vars(...).
    convert_flame(flame_base_dir=args.flame_base_dir, overwrite=args.overwrite)
65
+
66
+
67
if __name__ == "__main__":
    # Run the conversion CLI only when executed as a script.
    main()
demo.py ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from pathlib import Path
3
+
4
+ import numpy as np
5
+ import torch
6
+ from PIL import Image
7
+
8
+ from sheap import inference_images_list, load_sheap_model, render_mesh
9
+ from sheap.tiny_flame import TinyFlame, pose_components_to_rotmats
10
+
11
+ os.environ["PYOPENGL_PLATFORM"] = "egl"
12
+
13
+
14
def create_rendering_image(
    original_image: Image.Image,
    verts: torch.Tensor,
    faces: torch.Tensor,
    c2w: torch.Tensor,
    output_size: int = 512,
) -> Image.Image:
    """
    Build a three-panel strip: original frame, rendered mesh, and blend.

    Args:
        original_image: PIL Image of the original frame
        verts: Vertices tensor for a single frame, shape (num_verts, 3)
        faces: Faces tensor, shape (num_faces, 3)
        c2w: Camera-to-world transformation matrix, shape (4, 4)
        output_size: Size of each sub-image in the combined output

    Returns:
        PIL Image with three views side-by-side (original, mesh, blended)
    """
    # Render the mesh to a color image and a depth map.
    color, depth = render_mesh(verts=verts, faces=faces, c2w=c2w)

    # Bring the source frame to the panel size.
    base = original_image.convert("RGB").resize((output_size, output_size))

    # Wherever the rendered depth is positive the mesh covers the pixel,
    # so the blend shows mesh there and the original everywhere else.
    alpha = (depth > 0).astype(np.float32)[..., None]
    overlay = (np.array(color) * alpha + np.array(base) * (1.0 - alpha)).astype(np.uint8)

    # Paste the three panels left-to-right into one strip.
    panels = [base, Image.fromarray(color), Image.fromarray(overlay)]
    strip = Image.new("RGB", (output_size * 3, output_size))
    for idx, panel in enumerate(panels):
        strip.paste(panel, (idx * output_size, 0))

    return strip
51
+
52
+
53
if __name__ == "__main__":
    # Load SHeaP model (the "expressive" variant; see README for variants).
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    sheap_model = load_sheap_model(model_type="expressive").to(device)

    # Inference on example images (sorted so output order is deterministic).
    folder_containing_images = Path("example_images/")
    image_paths = list(sorted(folder_containing_images.glob("*.jpg")))
    with torch.no_grad():
        predictions = inference_images_list(
            model=sheap_model,
            device=device,
            image_paths=image_paths,
        )

    # Load and infer FLAME with our predicted parameters.
    # generic_model.pt / eyelids.pt are produced by convert_flame.py.
    flame_dir = Path("FLAME2020/")
    flame = TinyFlame(flame_dir / "generic_model.pt", eyelids_ckpt=flame_dir / "eyelids.pt")
    verts = flame(
        shape=predictions["shape_from_facenet"],
        expression=predictions["expr"],
        pose=pose_components_to_rotmats(predictions),
        eyelids=predictions["eyelids"],
        translation=predictions["cam_trans"],
    )

    # Render the FLAME mesh for each input image.
    # Camera sits one unit along +z (identity rotation) — presumably facing
    # the head; confirm against render_mesh's camera conventions.
    c2w = torch.tensor(
        [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]], dtype=torch.float32
    )
    for i_frame in range(verts.shape[0]):
        # Output name keeps the full input name, e.g. "00000200.jpg_rendered.png".
        outpath = image_paths[i_frame].with_name(f"{image_paths[i_frame].name}_rendered.png")
        if outpath.exists():
            outpath.unlink()

        # Load original image
        original = Image.open(image_paths[i_frame])

        # Create combined rendering (original | mesh | blended)
        combined = create_rendering_image(
            original_image=original,
            verts=verts[i_frame],
            faces=flame.faces,
            c2w=c2w,
            output_size=512,
        )
        combined.save(outpath)
example_images/00000200.jpg ADDED

Git LFS Details

  • SHA256: 916d2843bb24bb71f7cfc586eda9b5834021d0e44bc6e4c4bc01f8ff91d8ac55
  • Pointer size: 130 Bytes
  • Size of remote file: 12.6 kB
example_images/00000201.jpg ADDED

Git LFS Details

  • SHA256: dde19f2c095bff0a262c35f8d807b840b1d2b11cd91ea15db84ad9f94b45eed1
  • Pointer size: 130 Bytes
  • Size of remote file: 12.5 kB
example_images/00000202.jpg ADDED

Git LFS Details

  • SHA256: 90bc8d8464675f0d83eeb67677297f86fec72590a7d6dffbda279e245ec6bbbb
  • Pointer size: 130 Bytes
  • Size of remote file: 12.6 kB
example_images/00000203.jpg ADDED

Git LFS Details

  • SHA256: a8880aad3eb9d541c8939b41812e1a680aa60b090030dbae92dec9d4b9046701
  • Pointer size: 130 Bytes
  • Size of remote file: 12.6 kB
example_images/00000204.jpg ADDED

Git LFS Details

  • SHA256: b16b78ff62da579f4216eb354f37957a6b86290f1c13e94b48100c45cd9e8747
  • Pointer size: 130 Bytes
  • Size of remote file: 12.5 kB
example_images/00000205.jpg ADDED

Git LFS Details

  • SHA256: b67f32bada48399a45769f80a677706d40f650af2e8de1f7faa0c504b58798e6
  • Pointer size: 130 Bytes
  • Size of remote file: 12.6 kB
example_images/00000206.jpg ADDED

Git LFS Details

  • SHA256: 317a2eb7578bcb77b703dcca5f650bcc0f1d7236b83000f92f8c7f9644e99290
  • Pointer size: 130 Bytes
  • Size of remote file: 12.6 kB
example_images/00000207.jpg ADDED

Git LFS Details

  • SHA256: 7abc69cb48790a643d5e1ff629bd0afc93be9b5b83217e4395dd42a0feeeb214
  • Pointer size: 130 Bytes
  • Size of remote file: 12.5 kB
example_images/00000208.jpg ADDED

Git LFS Details

  • SHA256: 332654d0d26fc6e3551bc5c4dc3b919e8673cee04a09d0d73de9d59a61a098f7
  • Pointer size: 130 Bytes
  • Size of remote file: 12.7 kB
example_images/00000209.jpg ADDED

Git LFS Details

  • SHA256: b70bdbbeeec905bef8b1a6053b2433aad8eb8a3ff203a0b62a2e6e72de097392
  • Pointer size: 130 Bytes
  • Size of remote file: 12.6 kB
example_videos/dafoe.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:faab39e11cde3a3607039dc27b202472f31b39a757cb91a2fceedf67679b9e24
3
+ size 441906
gradio_demo.py ADDED
@@ -0,0 +1,284 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Gradio demo for SHeaP (Self-Supervised Head Geometry Predictor).
3
+ Accepts video or image input and renders the SHEAP output overlayed.
4
+ """
5
+
6
+ import os
7
+ import shutil
8
+ import subprocess
9
+ import tempfile
10
+ from pathlib import Path
11
+ from queue import Queue
12
+ from typing import Optional
13
+
14
+ import gradio as gr
15
+ import numpy as np
16
+ import torch
17
+ import torchvision.transforms.functional as TF
18
+ from PIL import Image
19
+ from torch.utils.data import DataLoader
20
+
21
+ from demo import create_rendering_image
22
+ from sheap import load_sheap_model
23
+ from sheap.tiny_flame import TinyFlame, pose_components_to_rotmats
24
+
25
+ try:
26
+ import face_alignment
27
+ except ImportError:
28
+ raise ImportError(
29
+ "The 'face_alignment' package is required. Please install it via 'pip install face-alignment'."
30
+ )
31
+ from sheap.fa_landmark_utils import detect_face_and_crop
32
+
33
+ os.environ["PYOPENGL_PLATFORM"] = "egl"
34
+
35
+ # Global variables for models (load once)
36
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
37
+ sheap_model = None
38
+ flame = None
39
+ fa_model = None
40
+ c2w = torch.tensor([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]], dtype=torch.float32)
41
+
42
+
43
def initialize_models():
    """Initialize all models (called once at startup).

    Populates the module-level globals ``sheap_model``, ``flame`` and
    ``fa_model`` so the Gradio handlers can reuse them across requests.
    """
    global sheap_model, flame, fa_model

    print("Loading SHeaP model...")
    sheap_model = load_sheap_model(model_type="expressive").to(device)
    sheap_model.eval()

    print("Loading FLAME model...")
    # Expects generic_model.pt to exist (produced by convert_flame.py).
    flame_dir = Path("FLAME2020/")
    flame = TinyFlame(flame_dir / "generic_model.pt", eyelids_ckpt=flame_dir / "eyelids.pt").to(
        device
    )

    print("Loading face alignment model...")
    # 2D landmark detector used for face detection / cropping.
    fa_model = face_alignment.FaceAlignment(
        face_alignment.LandmarksType.TWO_D, device=str(device), flip_input=False
    )

    print("Models loaded successfully!")
+ print("Models loaded successfully!")
63
+
64
+
65
def process_image(image: np.ndarray) -> Image.Image:
    """
    Process a single image and return the rendered output.

    Args:
        image: Input image as numpy array (RGB); assumes (H, W, 3) uint8 as
            delivered by Gradio's `type="numpy"` image component — TODO confirm.

    Returns:
        PIL Image with three views side-by-side (original, mesh, blended)
    """
    # Convert to torch tensor for face detection (C, H, W) format with values in [0, 1]
    image_tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0

    # Detect face and get crop coordinates
    x0, y0, x1, y1 = detect_face_and_crop(image_tensor, fa_model, margin=0.9, shift_up=0.5)

    # Crop the image
    cropped_tensor = image_tensor[:, y0:y1, x0:x1]

    # Resize to 224x224 for SHEAP model
    cropped_resized = TF.resize(cropped_tensor, [224, 224], antialias=True)

    # Prepare input tensor for model (add batch dimension)
    img_tensor = cropped_resized.unsqueeze(0).to(device)

    # Also create a 512x512 version for rendering
    cropped_for_render = TF.resize(cropped_tensor, [512, 512], antialias=True)

    # Run inference
    with torch.no_grad():
        predictions = sheap_model(img_tensor)

    # Get FLAME vertices (predictions are already on device from model)
    verts = flame(
        shape=predictions["shape_from_facenet"],
        expression=predictions["expr"],
        pose=pose_components_to_rotmats(predictions),
        eyelids=predictions["eyelids"],
        translation=predictions["cam_trans"],
    )

    # Move vertices to CPU for rendering
    verts = verts.cpu()

    # Convert cropped_for_render back to PIL Image for rendering
    cropped_pil = TF.to_pil_image(cropped_for_render)

    # Create rendering (batch size is 1, hence verts[0])
    combined = create_rendering_image(
        original_image=cropped_pil,
        verts=verts[0],
        faces=flame.faces,
        c2w=c2w,
        output_size=512,
    )

    return combined
122
+
123
+
124
+ # --- Import video utilities from video_demo.py ---
125
+ from video_demo import RenderingThread, VideoFrameDataset, _tensor_to_numpy_image
126
+
127
+
128
def process_video(video_path: str, progress=gr.Progress()) -> str:
    """
    Process a video and return path to the rendered output video using background threads.

    Frames are decoded and cropped by VideoFrameDataset, FLAME parameters are
    predicted per frame, rendering is offloaded to worker thread(s) writing
    PNG frames to a temp dir, and ffmpeg assembles the final mp4.

    Returns:
        Path to the encoded mp4 inside the temp directory. NOTE(review): the
        temp dir is only removed on failure; on success it must survive so the
        returned file stays readable.
    """
    temp_dir = Path(tempfile.mkdtemp())
    render_size = 512
    try:
        # Prepare dataset and dataloader
        dataset = VideoFrameDataset(video_path, fa_model)
        dataloader = DataLoader(dataset, batch_size=1, num_workers=0)
        fps = dataset.fps
        num_frames = len(dataset)
        # Prepare rendering thread and queue (bounded to cap memory use)
        render_queue = Queue(maxsize=32)
        num_render_workers = 1
        rendering_threads = []
        for _ in range(num_render_workers):
            thread = RenderingThread(render_queue, temp_dir, flame.faces, c2w, render_size)
            thread.start()
            rendering_threads.append(thread)
        progress(0, desc="Processing video frames...")
        frame_idx = 0
        with torch.no_grad():
            for batch in dataloader:
                images = batch["image"].to(device)
                cropped_frames = batch["cropped_frame"]
                # Run inference
                predictions = sheap_model(images)
                verts = flame(
                    shape=predictions["shape_from_facenet"],
                    expression=predictions["expr"],
                    pose=pose_components_to_rotmats(predictions),
                    eyelids=predictions["eyelids"],
                    translation=predictions["cam_trans"],
                )
                verts = verts.cpu()
                for i in range(images.shape[0]):
                    cropped_frame = _tensor_to_numpy_image(cropped_frames[i])
                    render_queue.put((frame_idx, cropped_frame, verts[i]))
                    frame_idx += 1
                    progress(
                        frame_idx / num_frames, desc=f"Processing frame {frame_idx}/{num_frames}"
                    )
        # Stop rendering threads: one None sentinel per worker, then join.
        # NOTE(review): if inference above raises, the sentinels are never
        # sent and the workers are left running — confirm RenderingThread is
        # a daemon thread or add a finally-based shutdown.
        for _ in range(num_render_workers):
            render_queue.put(None)
        for thread in rendering_threads:
            thread.join()
        if frame_idx == 0:
            raise ValueError("No frames were successfully processed!")
        # Create output video using ffmpeg from the PNG frames written by
        # the render workers (presumably named frame_%06d.png — confirm
        # against RenderingThread's output naming).
        progress(0.95, desc="Encoding video...")
        output_path = temp_dir / "output.mp4"
        ffmpeg_cmd = [
            "ffmpeg",
            "-y",
            "-framerate",
            str(fps),
            "-i",
            str(temp_dir / "frame_%06d.png"),
            "-c:v",
            "libx264",
            "-pix_fmt",
            "yuv420p",
            "-crf",
            "18",
            str(output_path),
        ]
        subprocess.run(ffmpeg_cmd, check=True, capture_output=True)
        progress(1.0, desc="Done!")
        return str(output_path)
    except Exception as e:
        # Best-effort cleanup of partial output before re-raising.
        shutil.rmtree(temp_dir, ignore_errors=True)
        raise e
202
+
203
+
204
def process_input(image: Optional[np.ndarray], video: Optional[str]):
    """
    Dispatch to image or video processing depending on which input was given.

    Args:
        image: Input image (if provided)
        video: Input video path (if provided)

    Returns:
        A (image_result, video_result) pair; exactly one element is non-None.

    Raises:
        ValueError: If neither an image nor a video was provided.
    """
    # Guard clause first; image takes precedence when both are supplied.
    if image is None and video is None:
        raise ValueError("Please provide either an image or video!")
    if image is not None:
        return process_image(image), None
    return None, process_video(video)
221
+
222
+
223
+ # Initialize models on startup
224
+ initialize_models()
225
+
226
+ # Create Gradio interface
227
+ with gr.Blocks(title="SHeaP Demo") as demo:
228
+ gr.Markdown(
229
+ """
230
+ # 🐑 SHeaP: Self-Supervised Head Geometry Predictor 🐑
231
+
232
+ Upload an image or video to predict head geometry and render a 3D mesh overlay!
233
+
234
+ The output shows three views:
235
+ - **Left**: Original cropped face
236
+ - **Center**: Rendered FLAME mesh
237
+ - **Right**: Mesh overlaid on original
238
+
239
+ [Project Page](https://nlml.github.io/sheap) | [Paper](https://arxiv.org/abs/2504.12292) | [GitHub](https://github.com/nlml/sheap)
240
+ """
241
+ )
242
+
243
+ with gr.Row():
244
+ with gr.Column():
245
+ gr.Markdown("### Input")
246
+ image_input = gr.Image(label="Upload Image", type="numpy")
247
+ video_input = gr.Video(label="Upload Video")
248
+ process_btn = gr.Button("Process", variant="primary")
249
+
250
+ with gr.Column():
251
+ gr.Markdown("### Output")
252
+ image_output = gr.Image(label="Rendered Image", type="pil")
253
+ video_output = gr.Video(label="Rendered Video")
254
+
255
+ gr.Markdown(
256
+ """
257
+ ### Tips:
258
+ - For best results, use images/videos with clearly visible faces
259
+ - The model works best with frontal face views
260
+ - Video processing may take a few minutes depending on length
261
+ """
262
+ )
263
+
264
+ # Connect the button
265
+ process_btn.click(
266
+ fn=process_input,
267
+ inputs=[image_input, video_input],
268
+ outputs=[image_output, video_output],
269
+ )
270
+
271
+ # Add examples
272
+ gr.Examples(
273
+ examples=[
274
+ ["example_images/00000206.jpg", None],
275
+ [None, "example_videos/dafoe.mp4"],
276
+ ],
277
+ inputs=[image_input, video_input],
278
+ outputs=[image_output, video_output],
279
+ fn=process_input,
280
+ cache_examples=False,
281
+ )
282
+
283
+ if __name__ == "__main__":
284
+ demo.launch()
models/model_expressive.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4d769f493072aa2e98770ed1b71db784bc3ee0a2132a0fd36aab841ee591c5e2
3
+ size 348292433
pyproject.toml ADDED
@@ -0,0 +1,52 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [build-system]
2
+ requires = ["hatchling"]
3
+ build-backend = "hatchling.build"
4
+
5
+ [project]
6
+ name = "sheap"
7
+ version = "0.1.0"
8
+ description = "SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians"
9
+ readme = "README.md"
10
+ requires-python = ">=3.11"
11
+ license = { file = "LICENSE.txt" }
12
+ authors = [
13
+ { name = "Liam Schoneveld" }
14
+ ]
15
+ keywords = ["3d", "face", "flame", "head", "mesh", "reconstruction"]
16
+ classifiers = [
17
+ "Development Status :: 3 - Alpha",
18
+ "Intended Audience :: Developers",
19
+ "Intended Audience :: Science/Research",
20
+ "Programming Language :: Python :: 3",
21
+ "Programming Language :: Python :: 3.11",
22
+ "Programming Language :: Python :: 3.12",
23
+ "Topic :: Scientific/Engineering :: Artificial Intelligence",
24
+ ]
25
+ dependencies = [
26
+ "chumpy @ git+https://github.com/nlml/chumpy.git",
27
+ "numpy>=1.20.0",
28
+ "pillow>=9.0.0",
29
+ "pyrender>=0.1.45",
30
+ "roma>=1.5.4",
31
+ "scipy>=1.16.3",
32
+ "torch>=2.0.0",
33
+ "torchaudio>=2.0.0",
34
+ "torchvision>=0.15.1",
35
+ "tqdm>=4.67.1",
36
+ "trimesh>=4.9.0",
37
+ ]
38
+
39
+ [project.urls]
40
+ Homepage = "https://nlml.github.io/sheap"
41
+ Repository = "https://github.com/nlml/sheap"
42
+
43
+ [dependency-groups]
44
+ dev = [
45
+ "pre-commit>=4.3.0",
46
+ ]
47
+
48
+ [tool.hatch.metadata]
49
+ allow-direct-references = true
50
+
51
+ [tool.hatch.build.targets.wheel]
52
+ packages = ["sheap"]
requirements.txt ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ git+https://github.com/nlml/chumpy.git
2
+ numpy>=1.20.0
3
+ pillow>=9.0.0
4
+ pyrender>=0.1.45
5
+ roma>=1.5.4
6
+ scipy>=1.16.3
7
+ torch>=2.0.0
8
+ torchaudio>=2.0.0
9
+ torchvision>=0.15.1
10
+ tqdm>=4.67.1
11
+ trimesh>=4.9.0
12
+ gradio>=4.0.0
13
+ face-alignment>=1.3.5
14
+ opencv-python>=4.5.0
requirements_hf.txt ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Requirements for Hugging Face Spaces deployment
2
+ torch>=2.0.0
3
+ torchvision>=0.15.0
4
+ numpy>=1.24.0
5
+ pillow>=9.5.0
6
+ opencv-python-headless>=4.8.0
7
+ gradio>=4.0.0
8
+ face-alignment>=1.4.1
9
+ pyrender>=0.1.45
10
+ trimesh>=4.0.0
11
+ scipy>=1.11.0
12
+ scikit-image>=0.21.0
13
+ networkx>=3.1
14
+ # For rendering
15
+ pyopengl>=3.1.0
sheap/__init__.py ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians."""
2
+
3
+ from .eval_utils import ImsDataset, inference_images_list, save_result
4
+ from .landmark_utils import vertices_to_7_lmks, vertices_to_landmarks
5
+ from .load_flame_pkl import load_pkl_format_flame_model
6
+ from .load_model import load_sheap_model
7
+ from .render import render_mesh
8
+ from .tiny_flame import TinyFlame
9
+
10
+ __version__ = "0.1.0"
11
+ __all__ = [
12
+ "TinyFlame",
13
+ "load_pkl_format_flame_model",
14
+ "vertices_to_landmarks",
15
+ "vertices_to_7_lmks",
16
+ "inference_images_list",
17
+ "save_result",
18
+ "ImsDataset",
19
+ "render_mesh",
20
+ "load_sheap_model",
21
+ ]
sheap/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (858 Bytes). View file
 
sheap/__pycache__/eval_utils.cpython-311.pyc ADDED
Binary file (13.2 kB). View file
 
sheap/__pycache__/fa_landmark_utils.cpython-311.pyc ADDED
Binary file (4.1 kB). View file
 
sheap/__pycache__/landmark_utils.cpython-311.pyc ADDED
Binary file (5.82 kB). View file
 
sheap/__pycache__/load_flame.cpython-311.pyc ADDED
Binary file (1.93 kB). View file
 
sheap/__pycache__/load_flame_pkl.cpython-311.pyc ADDED
Binary file (2.29 kB). View file
 
sheap/__pycache__/load_model.cpython-311.pyc ADDED
Binary file (4.14 kB). View file
 
sheap/__pycache__/render.cpython-311.pyc ADDED
Binary file (4.34 kB). View file
 
sheap/__pycache__/tiny_flame.cpython-311.pyc ADDED
Binary file (7.83 kB). View file
 
sheap/eval_utils.py ADDED
@@ -0,0 +1,270 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from pathlib import Path
2
+ from typing import Callable, Dict, List, Optional, Tuple, Union
3
+
4
+ import numpy as np
5
+ import torch
6
+ import torch.utils.data as tud
7
+ import trimesh
8
+ from PIL import Image
9
+ from tqdm import tqdm
10
+
11
+
12
def _preproc_im_default(p: Union[str, Path]) -> Image.Image:
    """Load an image from disk with no extra preprocessing.

    Used as the fallback loader for ``ImsDataset`` when the caller does not
    supply a custom one.

    Args:
        p: Path to the image file.

    Returns:
        The opened PIL image (decoded lazily by PIL).
    """
    return Image.open(p)
22
+
23
+
24
class ImsDataset(tud.Dataset):
    """Map-style dataset that serves images as float tensors.

    Each item is loaded with a (possibly user-supplied) loader, converted to
    RGB, resized to ``img_wh``, and returned as a float32 tensor with values
    scaled to [0, 1].

    Args:
        image_paths: Paths of the images to serve.
        img_wh: Target (width, height) for resizing.
        load_and_preproc_im: Optional loader returning a PIL image; when
            ``None`` is passed, the default on-disk loader is used.
    """

    def __init__(
        self,
        image_paths: List[Union[str, Path]],
        img_wh: Tuple[int, int],
        load_and_preproc_im: Optional[
            Callable[[Union[str, Path]], Image.Image]
        ] = _preproc_im_default,
    ) -> None:
        self.image_paths = image_paths
        self.img_wh = img_wh
        # Fall back to the default loader when None is passed explicitly.
        self.load_and_preproc_im = (
            _preproc_im_default if load_and_preproc_im is None else load_and_preproc_im
        )

    def __len__(self) -> int:
        """Number of images served by this dataset."""
        return len(self.image_paths)

    def __getitem__(self, idx: int) -> torch.Tensor:
        """Load, resize, and scale image ``idx``.

        Args:
            idx: Index of the image to load.

        Returns:
            Float tensor of shape (3, H, W) with values in [0, 1].
        """
        loaded = self.load_and_preproc_im(self.image_paths[idx])
        resized = loaded.convert("RGB").resize(self.img_wh)
        scaled = np.array(resized).astype("float64") / 255.0
        return torch.from_numpy(scaled).permute(2, 0, 1).float()
66
+
67
+
68
@torch.no_grad()
def inference_images_list(
    model: torch.nn.Module,
    device: torch.device,
    image_paths: List[Union[str, Path]],
    custom_pil_im_load_fn: Optional[Callable[[Union[str, Path]], Image.Image]] = None,
    img_wh: Tuple[int, int] = (224, 224),
    batch_size: int = 4,
    num_workers: int = 4,
    verbose: bool = False,
) -> Dict[str, torch.Tensor]:
    """Run inference on a list of images using a model.

    Args:
        model: PyTorch model to use for inference.
        device: Device to run inference on.
        image_paths: List of paths to image files.
        custom_pil_im_load_fn: Optional custom function to load and preprocess images.
        img_wh: Tuple of (width, height) to resize images to. Default is (224, 224).
        batch_size: Batch size for inference. Default is 4.
        num_workers: Number of workers for data loading. Default is 4.
        verbose: Whether to print output shapes. Default is False.

    Returns:
        Dictionary mapping output keys to tensors concatenated across all
        batches along the batch dimension. Non-tensor model outputs are skipped.
    """
    model = model.to(device)
    ds = ImsDataset(image_paths, img_wh=img_wh, load_and_preproc_im=custom_pil_im_load_fn)
    # Use the module's `tud` alias (was spelled out as torch.utils.data here).
    dl = tud.DataLoader(
        ds,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        drop_last=False,
        pin_memory=True,
    )

    all_outs = {}
    for images in tqdm(dl, desc="Inferencing images through ViT model"):
        images = images.to(device)
        # NOTE: removed a dead `batch_size = images.shape[0]` assignment that
        # shadowed the function parameter and was never read.
        model_outs = model(images)
        for k in model_outs:
            # Only tensors can be concatenated; skip any other output values.
            if not isinstance(model_outs[k], torch.Tensor):
                continue
            if k not in all_outs:
                all_outs[k] = []
            # Move to CPU immediately so GPU memory is not held across batches.
            all_outs[k].append(model_outs[k].detach().cpu())

    if verbose:
        print("Concatenated output shapes:")
    for k in all_outs:
        all_outs[k] = torch.cat(all_outs[k], dim=0)
        if verbose:
            print(" --", k, all_outs[k].shape)
    return all_outs
124
+
125
+
126
def invert_4x4_cam_matrix(inp_cam: torch.Tensor) -> torch.Tensor:
    """Invert a rigid 4x4 camera transform analytically.

    For a rigid transform [R | t], the inverse is [R^T | -R^T t]; this avoids
    a general matrix inversion.

    Args:
        inp_cam: 4x4 camera transformation matrix.

    Returns:
        The inverted 4x4 camera transformation matrix.
    """
    rot_t = inp_cam[:3, :3].transpose(0, 1)
    out = torch.eye(4, device=inp_cam.device, dtype=inp_cam.dtype)
    out[:3, :3] = rot_t
    out[:3, 3] = rot_t @ -inp_cam[:3, 3]
    return out
141
+
142
+
143
def save_obj(outpath: Union[str, Path], verts: np.ndarray, faces: np.ndarray) -> None:
    """Write a triangle mesh to an OBJ file via trimesh.

    Args:
        outpath: Destination path for the OBJ file.
        verts: (N, 3) array of vertex positions.
        faces: (M, 3) array of vertex indices per triangle.
    """
    # process=False preserves the given vertex/face ordering exactly.
    trimesh.Trimesh(vertices=verts, faces=faces, process=False).export(outpath)
153
+
154
+
155
def save_result(
    flame_faces: np.ndarray,
    base_dir: Union[str, Path],
    verts_with_zero_exprn: np.ndarray,
    lmks7_3d: torch.Tensor,
    preds_outdir: Path,
    input_im_path: Union[str, Path],
    verbose: bool = False,
) -> None:
    """Persist a FLAME prediction (mesh plus 7 landmarks) to disk.

    An OBJ mesh and a sibling ``.npy`` landmark file are written under
    ``preds_outdir``, mirroring the input image's path relative to
    ``base_dir``. Vertices and landmarks are scaled by 1000 to match the
    MICA convention.

    Args:
        flame_faces: FLAME triangle indices.
        base_dir: Root directory used to compute the relative output path.
        verts_with_zero_exprn: Predicted vertices with expression zeroed out.
        lmks7_3d: Predicted 7-point 3D landmarks tensor.
        preds_outdir: Root directory for the saved predictions.
        input_im_path: Path of the source image.
        verbose: Print a confirmation line per saved file. Default is False.
    """
    # MICA scaled up by 1000, so let's try it too:
    scaled_verts = verts_with_zero_exprn * 1000.0
    scaled_lmks = lmks7_3d.numpy() * 1000.0

    rel_path = Path(input_im_path).relative_to(base_dir)
    outpath_obj = (preds_outdir / rel_path).with_suffix(".obj")
    outpath_obj.parent.mkdir(parents=True, exist_ok=True)

    save_obj(outpath_obj, verts=scaled_verts, faces=flame_faces)
    if verbose:
        print(f"Saved {outpath_obj}")

    outpath_lmk_npy = outpath_obj.with_suffix(".npy")
    np.save(outpath_lmk_npy, scaled_lmks)
    if verbose:
        print(f"Saved {outpath_lmk_npy}")

    assert outpath_obj.exists()
    assert outpath_lmk_npy.exists()
198
+
199
+
200
def add_pct_to_bbox(
    top: int,
    left: int,
    bottom: int,
    right: int,
    im_np_array: Union[np.ndarray, Image.Image],
    pct: float = 0.2,
) -> Tuple[int, int, int, int]:
    """Grow a (top, left, bottom, right) box by ``pct`` within image bounds.

    Half of the extra size extends the top/left side (clipped at 0) and the
    remainder extends the bottom/right side, clipped at the image edges.

    Args:
        top: Top edge of the box, in pixels.
        left: Left edge of the box, in pixels.
        bottom: Bottom edge of the box, in pixels.
        right: Right edge of the box, in pixels.
        im_np_array: Image (numpy array or PIL image) supplying the bounds.
        pct: Fractional growth of the box. Default is 0.2 (20%).

    Returns:
        The expanded (top, left, bottom, right) coordinates.
    """
    if isinstance(im_np_array, Image.Image):
        im_np_array = np.array(im_np_array)
    im_h, im_w, _ = im_np_array.shape

    height = bottom - top
    top = max(0, top - int(height * pct * 0.5))
    # Bottom is re-derived from the (possibly clipped) top, then clamped.
    bottom = min(im_h, top + int(height * (1 + pct)))

    width = right - left
    left = max(0, left - int(width * pct * 0.5))
    right = min(im_w, left + int(width * (1 + pct)))

    return top, left, bottom, right
236
+
237
+
238
def resize_to_max_size(
    im: Union[np.ndarray, Image.Image], max_size: int = 512, pad_smaller: bool = True
) -> Union[np.ndarray, Image.Image]:
    """Scale an image so its longest side equals ``max_size``.

    Aspect ratio is preserved; with ``pad_smaller`` the result is centred on a
    black ``max_size`` x ``max_size`` RGB canvas. The output format (numpy
    array vs PIL image) matches the input format.

    Args:
        im: Image to resize.
        max_size: Target length of the longest dimension. Default is 512.
        pad_smaller: Pad the shorter side to produce a square output.
            Default is True.

    Returns:
        The resized (and optionally padded) image.
    """
    input_was_array = isinstance(im, np.ndarray)
    pil_im = Image.fromarray(im) if input_was_array else im

    w, h = pil_im.size
    # Scale so the longer edge becomes exactly max_size.
    if h > w:
        new_h, new_w = max_size, int(w * (max_size / h))
    else:
        new_w, new_h = max_size, int(h * (max_size / w))
    pil_im = pil_im.resize((new_w, new_h))

    if pad_smaller:
        canvas = Image.new("RGB", (max_size, max_size))
        canvas.paste(pil_im, ((max_size - new_w) // 2, (max_size - new_h) // 2))
        pil_im = canvas

    return np.array(pil_im) if input_was_array else pil_im
sheap/fa_landmark_utils.py ADDED
@@ -0,0 +1,96 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Tuple
2
+
3
+ import face_alignment
4
+ import numpy as np
5
+ import torch
6
+ from numpy.typing import NDArray
7
+
8
+ from sheap.landmark_utils import landmarks_2_face_bounding_box
9
+
10
+
11
def get_fa_landmarks(
    np_array_im_255_uint8: NDArray[np.uint8],
    fa: face_alignment.FaceAlignment,
    normalize: bool = True,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Detect 68 facial landmarks in an image with the face_alignment library.

    Args:
        np_array_im_255_uint8: (H, W, 3) uint8 image with values in [0, 255].
        fa: FaceAlignment detector instance.
        normalize: Divide coordinates by the image size so they lie in [0, 1].

    Returns:
        Tuple of (landmarks, success):
        - landmarks: (68, 2) float tensor; all zeros when no face was found.
        - success: scalar bool tensor, True iff a face was detected.
    """
    detections = fa.get_landmarks(np_array_im_255_uint8)
    if detections is None:
        # No face: return a zero placeholder so callers get a fixed shape.
        coords = np.zeros((68, 2))
        found = False
    else:
        # Only the first detected face is used.
        coords = detections[0][:, :2]
        if normalize:
            h, w = np_array_im_255_uint8.shape[:2]
            coords = coords / np.array([w, h])
        found = True

    return torch.from_numpy(coords).float(), torch.tensor(found).bool()
44
+
45
+
46
def detect_face_and_crop(
    image: torch.Tensor,
    fa_model: face_alignment.FaceAlignment,
    margin: float = 0.6,
    shift_up: float = 0.2,
) -> Tuple[int, int, int, int]:
    """
    Detect face and compute bounding box coordinates.

    Args:
        image: torch.Tensor of shape (3, H, W) with values in [0, 1]
        fa_model: FaceAlignment model instance for landmark detection
        margin: Fractional margin added around the detected landmarks.
        shift_up: Fraction of the box half-size by which to shift the box up.

    Returns:
        tuple: (x0, y0, x1, y1) bounding box coordinates in pixels
    """
    _, h, w = image.shape

    # Convert image to numpy format for face_alignment (H, W, 3) with values [0, 255]
    image_np = (image.permute(1, 2, 0).numpy() * 255).astype(np.uint8)

    # Get facial landmarks
    lmks, success = get_fa_landmarks(image_np, fa_model, normalize=True)

    if not success:
        # If face detection fails, return a center square from the image
        if h > w:
            y0 = (h - w) // 2
            y1 = y0 + w
            x0 = 0
            x1 = w
        else:
            x0 = (w - h) // 2
            x1 = x0 + h
            y0 = 0
            y1 = h
        # BUGFIX: this branch previously returned (x0, x1, y0, y1), which was
        # inconsistent with the (x0, y0, x1, y1) order of the detection path below.
        return x0, y0, x1, y1

    # Add batch dimension for landmarks_2_face_bounding_box
    lmks_batched = lmks.unsqueeze(0)  # Shape: (1, 68, 2)
    valid = torch.ones(1, dtype=torch.bool)

    # Compute bounding box in normalized coordinates
    bbox = landmarks_2_face_bounding_box(
        lmks_batched, valid, margin=margin, clamp=True, shift_up=shift_up, aspect_ratio=w / h
    )

    # Scale normalized [x_min, y_min, x_max, y_max] back to pixel coordinates.
    x0, y0, x1, y1 = bbox[0].tolist()
    x0, y0, x1, y1 = int(x0 * w), int(y0 * h), int(x1 * w), int(y1 * h)

    return x0, y0, x1, y1
sheap/landmark_utils.py ADDED
@@ -0,0 +1,143 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Tuple
2
+
3
+ import torch
4
+ from torch import Tensor
5
+
6
+
7
def vertices_to_landmarks(
    vertices: Tensor,  # shape: (*batch, num_vertices, 3)
    faces: Tensor,  # shape: (num_faces, 3), indices of vertices
    face_indices_with_landmarks: Tensor,  # shape: (num_landmarks,), indices of faces
    barys: Tensor,  # shape: (num_landmarks, 3), barycentric coordinates
) -> Tensor:
    """
    Interpolate landmark positions on a mesh from barycentric coordinates.

    Args:
        vertices (Tensor): Mesh vertices of shape (*batch, num_vertices, 3).
        faces (Tensor): Mesh faces of shape (num_faces, 3), indexing into `vertices`.
        face_indices_with_landmarks (Tensor): Index of the face carrying each
            landmark, shape (num_landmarks,).
        barys (Tensor): Barycentric coordinates of each landmark within its
            face, shape (num_landmarks, 3); each row should sum to 1.0.

    Returns:
        Tensor: Landmark positions of shape (*batch, num_landmarks, 3).
    """
    # Support inputs without a batch dimension by adding one temporarily.
    squeeze_out = vertices.ndim == 2
    if squeeze_out:
        vertices = vertices[None]

    # (num_landmarks, 3) vertex indices of each landmark-carrying face.
    lmk_faces = faces[face_indices_with_landmarks]

    # Gather the three corner vertices of each such face:
    # (*batch, num_landmarks, 3 corners, 3 coords)
    corner_verts = vertices[..., lmk_faces, :]

    # Barycentric interpolation: weight each corner and sum over corners.
    lmk_positions = (corner_verts * barys[..., None]).sum(dim=-2)

    return lmk_positions[0] if squeeze_out else lmk_positions
48
+
49
+
50
def vertices_to_7_lmks(
    vertices: Tensor,
    flame_faces: Tensor,
    face_alignment_lmk_faces_idx: Tensor,
    face_alignment_lmk_bary_coords: Tensor,
) -> Tuple[Tensor, Tensor]:
    """
    Extract all facial landmarks plus a fixed 7-point subset from mesh vertices.

    Args:
        vertices (Tensor): Mesh vertices of shape (batch, num_vertices, 3).
        flame_faces (Tensor): Mesh faces of shape (num_faces, 3).
        face_alignment_lmk_faces_idx (Tensor): Face index carrying each landmark.
        face_alignment_lmk_bary_coords (Tensor): Barycentric coords of each landmark.

    Returns:
        Tuple[Tensor, Tensor]:
            - lmks7_3d: the 7-point subset, shape (batch, 7, 3).
            - lmks_3d: all landmarks, shape (batch, num_landmarks, 3).

    Note:
        The slicing below assumes a leading batch dimension is present —
        TODO confirm callers never pass unbatched vertices.
    """
    lmks_3d = vertices_to_landmarks(
        vertices,
        flame_faces,
        face_alignment_lmk_faces_idx,
        face_alignment_lmk_bary_coords,
    )

    # Drop the first 17 landmarks, leaving the 51 inner-face points, then pick
    # the fixed 7-point subset (absolute indices 36, 39, 42, 45, 33, 48, 54).
    inner_51 = lmks_3d[:, 17:]
    lmks7_3d = inner_51[:, [19, 22, 25, 28, 16, 31, 37]]

    return lmks7_3d, lmks_3d
84
+
85
+
86
def landmarks_2_face_bounding_box(
    landmarks: Tensor,
    valid: Tensor,
    margin: float = 0.1,
    clamp: bool = True,
    shift_up: float = 0.0,
    too_small_threshold: float = 0.02,
    aspect_ratio: float = 1.0,
) -> Tensor:
    """
    Build a square bounding box around face landmarks, batched.

    Parameters:
    - landmarks: torch.Tensor of shape [B1,...,BN,L,2], normalized face landmarks.
    - valid: torch.Tensor of shape [B1,...,BN], boolean validity per entry.
    - margin: float, fractional margin to grow the box around the landmarks.
    - clamp: bool, clamp the resulting box to [0, 1].
    - shift_up: float, fraction of the half-size by which to move the box up.
    - too_small_threshold: float, below this half-size an entry is treated as invalid.
    - aspect_ratio: float, width / height of the image the landmarks live on.
      The horizontal half-size is divided by this value, assuming the caller
      will later multiply these normalised coordinates by the image width.

    Returns:
    - bbox: torch.Tensor of shape [B1,...,BN,4] as [x_min, y_min, x_max, y_max];
      invalid entries get the full-image box [0, 0, 1, 1].
    """
    # Axis-aligned extent of the landmarks.
    lo = landmarks.min(dim=-2).values
    hi = landmarks.max(dim=-2).values

    center = 0.5 * (lo + hi)
    half = (hi - lo).max(dim=-1).values / 2

    # Degenerate (too small) landmark clouds are treated as invalid.
    valid = valid & (half > too_small_threshold)

    # Upward shift is proportional to the pre-margin half-size.
    up = shift_up * half
    half = half * (1 + margin)

    # Square box, with the x extent corrected for the image aspect ratio.
    x_half = half / aspect_ratio
    bbox = torch.stack(
        [
            center[..., 0] - x_half,
            center[..., 1] - half - up,
            center[..., 0] + x_half,
            center[..., 1] + half - up,
        ],
        dim=-1,
    )

    # Replace invalid entries with the full-image box.
    full_image_bbox = torch.tensor([0.0, 0.0, 1.0, 1.0], device=landmarks.device)
    bbox = torch.where(valid.unsqueeze(-1), bbox, full_image_bbox)

    return bbox.clamp(0, 1) if clamp else bbox
sheap/load_flame_pkl.py ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import pickle
3
+ from pathlib import Path
4
+ from typing import Dict, Union
5
+
6
+ import numpy as np
7
+ import torch
8
+ from torch import Tensor
9
+
10
+
11
def load_pkl_format_flame_model(path: Union[str, os.PathLike, Path]) -> Dict[str, Tensor]:
    """Read FLAME model parameters from the original pickle distribution.

    Faces and the kinematic tree are converted to int64 tensors, the sparse
    joint regressor is densified, and the remaining blendshape/template
    arrays are cast to float32.

    Args:
        path: Location of the FLAME model pickle file.

    Returns:
        Dict with keys: faces, kintree, J_regressor, shapedirs, J, weights,
        posedirs, v_template.

    Note:
        ``pickle.load`` can execute arbitrary code — only load trusted files.
    """
    with open(path, "rb") as fh:
        raw = pickle.load(fh, encoding="latin1")

    params: Dict[str, Tensor] = {}
    params["faces"] = torch.from_numpy(raw["f"].astype("int64"))

    # Entries > 100 (presumably unsigned "no parent" sentinels) map to -1.
    kintree = torch.from_numpy(raw["kintree_table"].astype("int64"))
    kintree[kintree > 100] = -1
    params["kintree"] = kintree

    params["J_regressor"] = torch.from_numpy(raw["J_regressor"].toarray().astype("float32"))

    for key in ("shapedirs", "J", "weights", "posedirs", "v_template"):
        params[key] = torch.from_numpy(np.array(raw[key]).astype("float32"))
    return params
sheap/load_model.py ADDED
@@ -0,0 +1,85 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import urllib.request
2
+ from pathlib import Path
3
+ from typing import Dict, Literal
4
+
5
+ import torch
6
+
7
# Map model types to filenames and (optional) download URLs
MODEL_INFO: Dict[str, Dict[str, str]] = {
    "paper": {
        "filename": "model_paper.pt",
        "url": "https://github.com/nlml/sheap/releases/download/v1.0.0/model_paper.pt",
    },
    "expressive": {
        "filename": "model_expressive.pt",
        "url": "https://github.com/nlml/sheap/releases/download/v1.0.0/model_expressive.pt",
    },
}


def ensure_model_downloaded(
    model_type: Literal["paper", "expressive"] = "paper", models_dir: Path = Path("./models")
) -> None:
    """Make sure the requested checkpoint exists locally, fetching it if missing.

    Args:
        model_type: Which model variant to use. Valid options are "paper" or
            "expressive". Default is "paper".
        models_dir: Directory where models are stored. Default is "./models".

    Raises:
        ValueError: If model_type is not recognized.
        FileNotFoundError: If the model file is missing and no download URL
            is configured.
    """
    if model_type not in MODEL_INFO:
        valid = ", ".join(MODEL_INFO.keys())
        raise ValueError(f"Unknown model_type '{model_type}'. Valid options: {valid}")

    info = MODEL_INFO[model_type]
    model_path = Path(models_dir) / info["filename"]

    # Checkpoint already on disk — nothing to do.
    if model_path.exists():
        return

    url = info["url"]
    if not url:
        raise FileNotFoundError(
            f"Model file '{model_path}' not found and no download URL is configured for "
            f"model_type='{model_type}'. Place the file manually or update MODEL_INFO with a valid URL."
        )

    print(f"Downloading '{model_type}' model to {model_path}...")
    model_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, model_path)
56
+
57
+
58
def load_sheap_model(
    model_type: Literal["paper", "expressive"] = "paper", models_dir: Path = Path("./models")
) -> torch.jit.ScriptModule:
    """Load a SHeaP checkpoint as a TorchScript module, downloading if needed.

    When a URL is configured for the selected variant and the file is missing
    locally, it is fetched first via ``ensure_model_downloaded``.

    Args:
        model_type: Which model variant to load. Valid options are "paper" or
            "expressive". Default is "paper" for backward compatibility.
        models_dir: Directory where models are stored. Default is "./models".

    Returns:
        The loaded SHeaP model as a PyTorch JIT ScriptModule.

    Raises:
        ValueError: If model_type is not recognized.
    """
    if model_type not in MODEL_INFO:
        valid = ", ".join(MODEL_INFO.keys())
        raise ValueError(f"Unknown model_type '{model_type}'. Valid options: {valid}")

    models_dir = Path(models_dir)
    ensure_model_downloaded(model_type=model_type, models_dir=models_dir)
    return torch.jit.load(models_dir / MODEL_INFO[model_type]["filename"])
sheap/py.typed ADDED
File without changes
sheap/render.py ADDED
@@ -0,0 +1,83 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Tuple, Union
2
+
3
+ import numpy as np
4
+ import pyrender
5
+ import torch
6
+ import trimesh
7
+
8
+
9
def render_mesh(
    verts: Union[np.ndarray, torch.Tensor],
    faces: Union[np.ndarray, torch.Tensor],
    c2w: Union[np.ndarray, torch.Tensor],
    img_width: int = 512,
    img_height: int = 512,
    fov_degrees: Union[float, int] = 14.2539,
    render_normals: bool = True,
) -> Tuple[np.ndarray, np.ndarray]:
    """Render a mesh using pyrender with a perspective camera defined by FOV.

    Args:
        verts: Mesh vertex positions of shape (N, 3).
        faces: Triangle vertex indices of shape (F, 3).
        c2w: Camera-to-world transform matrix (extrinsics) of shape (4, 4).
        img_width: Rendered image width in pixels. Default is 512.
        img_height: Rendered image height in pixels. Default is 512.
        fov_degrees: Vertical field of view in degrees. Default is 14.2539.
        render_normals: If True, render normals as RGB. If False, render with lighting. Default is True.

    Returns:
        Tuple containing:
        - color: RGB image from the render of shape (H, W, 3) as uint8.
        - depth: Depth map from the render of shape (H, W) as float32.
    """
    if isinstance(c2w, torch.Tensor):
        c2w = c2w.detach().cpu().numpy()
    if isinstance(verts, torch.Tensor):
        verts = verts.detach().cpu().numpy()
    if isinstance(faces, torch.Tensor):
        faces = faces.detach().cpu().numpy()
    if not isinstance(fov_degrees, (float, int)):
        fov_degrees = float(fov_degrees)

    # Convert degrees to radians
    yfov = np.deg2rad(fov_degrees)

    # Create trimesh mesh
    mesh = trimesh.Trimesh(vertices=verts, faces=faces)

    if render_normals:
        # Get vertex normals and map to RGB colors
        # Trimesh automatically computes normals when accessed
        normals = mesh.vertex_normals
        # Transform normals to camera space
        w2c = np.linalg.inv(c2w)
        normals_camera = normals @ w2c[:3, :3].T
        # Map from [-1, 1] to [0, 255] for RGB
        vertex_colors = ((normals_camera + 1.0) * 0.5 * 255).astype(np.uint8)
        mesh.visual.vertex_colors = vertex_colors

    # Convert to pyrender mesh
    # NOTE: renamed from `render_mesh`, which shadowed this function's own name.
    pr_mesh = pyrender.Mesh.from_trimesh(mesh)

    # Create scene
    if render_normals:
        scene = pyrender.Scene(ambient_light=[1.0, 1.0, 1.0])
    else:
        scene = pyrender.Scene(ambient_light=[0.3, 0.3, 0.3])
        # Add directional light
        light = pyrender.DirectionalLight(color=[1.0, 1.0, 1.0], intensity=3.0)
        scene.add(light, pose=c2w)
    scene.add(pr_mesh)

    # Perspective camera
    camera = pyrender.PerspectiveCamera(yfov=yfov, aspectRatio=img_width / img_height)

    # pyrender expects camera-to-world
    scene.add(camera, pose=c2w)

    # Offscreen render. BUGFIX: release the offscreen GL context afterwards —
    # it was previously leaked on every call (e.g. once per video frame).
    renderer = pyrender.OffscreenRenderer(viewport_width=img_width, viewport_height=img_height)
    try:
        color, depth = renderer.render(scene)
    finally:
        renderer.delete()

    return color, depth
sheap/tiny_flame.py ADDED
@@ -0,0 +1,168 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from pathlib import Path
2
+
3
+ import torch
4
+ import torch.nn.functional as F
5
+ from roma import rotvec_to_rotmat
6
+ from torch import nn
7
+
8
+
9
class TinyFlame(nn.Module):
    """Minimal FLAME head model: blendshapes + linear blend skinning (LBS).

    All parameters are registered as buffers from the checkpoint passed to
    ``__init__``; the forward pass maps FLAME shape/expression/pose/translation
    parameters to posed mesh vertices.
    """

    # Buffers populated from the checkpoint in __init__:
    v_template: torch.Tensor  # template mesh vertices
    J_regressor: torch.Tensor  # linear map from vertices to joint locations
    shapedirs: torch.Tensor  # combined shape + expression blendshape basis
    posedirs: torch.Tensor  # pose-corrective blendshape basis
    weights: torch.Tensor  # LBS blend weights (vertices x joints)
    faces: torch.Tensor  # triangle indices (not used in forward)
    kintree: torch.Tensor  # kinematic tree; row 0 holds parent joint indices

    def __init__(
        self,
        ckpt: Path | str,
        eyelids_ckpt: Path | str | None = None,
    ) -> None:
        """A tiny version of the FLAME model that is compatible with ONNX."""
        super().__init__()

        # Load the FLAME model weights
        ckpt = Path(ckpt).expanduser()
        data = torch.load(ckpt)

        # Every tensor in the checkpoint becomes a buffer (moves with .to()).
        for name, tensor in data.items():
            self.register_buffer(name, tensor)

        # Load the eyelids blendshapes if provided
        if eyelids_ckpt is not None:
            eyelids_ckpt = Path(eyelids_ckpt).expanduser()
            eyelids_data = torch.load(eyelids_ckpt)

            self.register_buffer("eyelids_dirs", eyelids_data)
        else:
            # Plain attribute (not a buffer) so forward() can test for None.
            self.eyelids_dirs = None

        # To work around the limitation of TorchDynamo, we need to convert kinematic tree to a list,
        # such that it is treated as a constant.
        self.parents = self.kintree[0].tolist()

    def forward(
        self,
        shape: torch.Tensor,
        expression: torch.Tensor,
        pose: torch.Tensor,
        translation: torch.Tensor,
        eyelids: torch.Tensor | None = None,
    ) -> torch.Tensor:
        """Convert FLAME parameters to coordinates of FLAME vertices.

        Args:
            - shape (torch.Tensor): Shape parameters of the FLAME model with shape (N, 300).
            - expression (torch.Tensor): Expression parameters of the FLAME model with shape (N, 100).
            - pose (torch.Tensor): Pose parameters of the FLAME model as 3x3 matrices with shape (N, 5, 3, 3).
                It is the concatenation of torso pose (global rotation), neck pose, jaw pose,
                and left/right eye poses.
            - translation (torch.Tensor): Global translation parameters of the FLAME model with shape (N, 3).
            - eyelids (torch.Tensor): Eyelids blendshape parameters with shape (N, 2).

        Returns:
            - vertices (torch.Tensor): The vertices of the FLAME model with shape (N, V, 3).
        """
        # Some common variables
        batch_size = shape.shape[0]
        num_joints = len(self.parents)

        # Step1: compute T per equations (2)-(5) in the paper
        # Compute the shape offsets from the shape and the expression parameters
        shape_expr = torch.cat([shape, expression], -1)
        shape_expr_offsets = (self.shapedirs @ shape_expr.t()).permute(2, 0, 1)

        # Get the vertex offsets due to pose blendshapes
        # (pose[:, 0] is the global/torso rotation, which has no corrective.)
        pose_features = pose[:, 1:, :, :] - torch.eye(3, device=pose.device)
        pose_features = pose_features.view(batch_size, -1)
        pose_offsets = (self.posedirs @ pose_features.t()).permute(2, 0, 1)

        # Add offsets to the template mesh to get T
        shaped_vertices = self.v_template.expand_as(shape_expr_offsets) + shape_expr_offsets
        # Eyelid blendshapes are applied only when both the parameters and the
        # basis are available.
        if eyelids is not None and self.eyelids_dirs is not None:
            shaped_vertices = shaped_vertices + (self.eyelids_dirs @ eyelids.t()).permute(2, 0, 1)
        shaped_vertices_with_pose_correction = shaped_vertices + pose_offsets

        # Step2: compute the joint locations per equation (1) in the paper
        # Get the joint locations with the joint regressor
        joint_locations = self.J_regressor @ shaped_vertices

        # Step3: compute the final mesh vertices per equation (1) in the paper using standard LBS functions.
        # Find the transformation for: unposed FLAME -> joints' local coordinate systems -> posed FLAME
        relative_joint_locations = (
            joint_locations[:, 1:, :] - joint_locations[:, self.parents[1:], :]
        )
        relative_joint_locations = torch.cat(
            [joint_locations[:, :1, :], relative_joint_locations], dim=1
        )
        relative_joint_locations_homogeneous = F.pad(relative_joint_locations, (0, 1), value=1)

        # joint -> parent joint transformations
        # (4x4 matrices: rotation block from `pose`, translation from the
        # relative joint location.)
        joint_to_parent_transformations = torch.cat(
            [
                F.pad(pose, (0, 0, 0, 1), value=0),
                relative_joint_locations_homogeneous.unsqueeze(-1),
            ],
            dim=-1,
        )

        joint_to_posed_transformations_ = [joint_to_parent_transformations[:, 0, :, :]]

        # joint -> posed FLAME transformations
        # Walk the kinematic chain, composing each joint's transform with its
        # parent's accumulated transform.
        for i in range(1, num_joints):
            parent_joint = self.parents[i]

            current_joint_to_posed_transformation = (
                joint_to_posed_transformations_[parent_joint]
                @ joint_to_parent_transformations[:, i, :, :]
            )

            joint_to_posed_transformations_.append(current_joint_to_posed_transformation)

        joint_to_posed_transformations = torch.stack(joint_to_posed_transformations_, dim=1)

        # Unposed FLAME -> joints' local coordinate systems -> posed FLAME transformations
        unposed_to_posed_transformations = joint_to_posed_transformations - F.pad(
            joint_to_posed_transformations @ F.pad(joint_locations, (0, 1), value=0).unsqueeze(-1),
            (3, 0),
            value=0,
        )

        # Scale rotations and translations by the blend weights
        final_transformations = (self.weights @ unposed_to_posed_transformations.flatten(2)).view(
            batch_size, -1, 4, 4
        )

        # Apply the transformations to the posed vertices T
        shaped_vertices_with_pose_correction_homogeneous = F.pad(
            shaped_vertices_with_pose_correction, (0, 1), value=1
        )
        posed_vertices = (
            final_transformations @ shaped_vertices_with_pose_correction_homogeneous.unsqueeze(-1)
        )[..., :3, 0] + translation.unsqueeze(1)

        return posed_vertices
147
+
148
+
149
def pose_components_to_rotmats(predictions):
    """Stack the five predicted rotation vectors and convert them to matrices.

    Args:
        predictions: Mapping containing the keys 'torso_pose', 'neck_pose',
            'jaw_pose', 'eye_l_pose', 'eye_r_pose', each an (N, 3) tensor of
            rotation vectors.

    Returns:
        Rotation matrices of shape (N, 5, 3, 3), in the key order above.
    """
    component_order = (
        "torso_pose",
        "neck_pose",
        "jaw_pose",
        "eye_l_pose",
        "eye_r_pose",
    )
    rotvecs = torch.stack([predictions[key] for key in component_order], dim=1)
    rotmats = rotvec_to_rotmat(rotvecs.view(-1, 3))
    return rotmats.view(-1, 5, 3, 3)
teaser.jpg ADDED

Git LFS Details

  • SHA256: 3538cd3c11fd0da5f6422fdae283474a74929100424e4ad9a3b1da844edd4696
  • Pointer size: 131 Bytes
  • Size of remote file: 173 kB
video_demo.py ADDED
@@ -0,0 +1,460 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import argparse
2
+ import os
3
+ import shutil
4
+ import subprocess
5
+ import threading
6
+ from pathlib import Path
7
+ from queue import Empty, Queue
8
+ from typing import Any, Dict, List, Optional, Tuple
9
+
10
+ import cv2
11
+ import numpy as np
12
+ import torch
13
+ import torchvision.transforms.functional as TF
14
+ from PIL import Image
15
+ from torch.utils.data import DataLoader, IterableDataset
16
+ from tqdm import tqdm
17
+
18
+ from demo import create_rendering_image
19
+ from sheap import load_sheap_model
20
+ from sheap.tiny_flame import TinyFlame, pose_components_to_rotmats
21
+
22
+ try:
23
+ import face_alignment
24
+ except ImportError:
25
+ raise ImportError(
26
+ "The 'face_alignment' package is required. Please install it via 'pip install face-alignment'."
27
+ )
28
+ from sheap.fa_landmark_utils import detect_face_and_crop
29
+
30
+
31
class RenderingThread(threading.Thread):
    """Background daemon thread that renders queued frames to PNG files.

    Consumes ``(frame_idx, cropped_frame, verts)`` tuples from a shared queue,
    renders each one with ``create_rendering_image``, and writes the result to
    ``temp_dir/frame_{idx:06d}.png`` (the zero-padded name matches ffmpeg's
    ``frame_%06d`` input pattern). The thread exits when it receives a ``None``
    sentinel from the queue or when ``stop()`` is called.
    """

    def __init__(
        self,
        render_queue: Queue,
        temp_dir: Path,
        faces: torch.Tensor,
        c2w: torch.Tensor,
        render_size: int,
    ):
        """
        Initialize rendering thread.

        Args:
            render_queue: Queue containing (frame_idx, cropped_frame, verts) tuples
            temp_dir: Directory to save rendered images
            faces: Face indices tensor from FLAME model
            c2w: Camera-to-world transformation matrix
            render_size: Size of each sub-image in the rendered output
        """
        super().__init__(daemon=True)
        self.render_queue = render_queue
        self.temp_dir = temp_dir
        self.faces = faces
        self.c2w = c2w
        self.render_size = render_size
        self.stop_event = threading.Event()
        # Number of frames this worker has successfully written to disk.
        self.frames_rendered = 0

    def run(self):
        """Process the rendering queue until a sentinel or stop signal arrives."""
        # Set PyOpenGL platform for this thread (headless EGL rendering).
        os.environ["PYOPENGL_PLATFORM"] = "egl"

        while not self.stop_event.is_set():
            try:
                # Poll with a timeout so stop_event is re-checked periodically.
                try:
                    item = self.render_queue.get(timeout=0.1)
                except Empty:  # Haven't finished, but nothing to render yet
                    continue
                if item is None:  # Sentinel value to stop
                    break

                # (Bug fix: this tuple was unpacked twice in a row before.)
                frame_idx, cropped_frame, verts = item

                # Render the frame next to the cropped input image.
                cropped_pil = Image.fromarray(cropped_frame)
                combined = create_rendering_image(
                    original_image=cropped_pil,
                    verts=verts,
                    faces=self.faces,
                    c2w=self.c2w,
                    output_size=self.render_size,
                )

                # Save to temp directory with zero-padded frame number
                output_path = self.temp_dir / f"frame_{frame_idx:06d}.png"
                combined.save(output_path)

                self.frames_rendered += 1
                self.render_queue.task_done()

            except Exception as e:
                # Keep the worker alive on per-frame failures; stay quiet only
                # if a shutdown is already in progress.
                if not self.stop_event.is_set():
                    print(f"Error rendering frame: {e}")
                    import traceback

                    traceback.print_exc()

    def stop(self):
        """Signal the thread to stop."""
        self.stop_event.set()
106
+
107
+
108
class VideoFrameDataset(IterableDataset):
    """Streams video frames sequentially with face detection and cropping.

    Each iteration pass reopens the video, detects the face on every frame,
    temporally smooths the crop box with an exponential moving average, and
    yields the cropped frame resized for both model input and rendering.
    """

    def __init__(
        self,
        video_path: str,
        fa_model: face_alignment.FaceAlignment,
        smoothing_alpha: float = 0.3,
    ):
        """
        Initialize video frame dataset.

        Args:
            video_path: Path to video file
            fa_model: FaceAlignment model instance for face detection
            smoothing_alpha: Smoothing factor for bounding box (0=no smoothing, 1=no change).
                Lower values = more smoothing
        """
        super().__init__()
        self.video_path = video_path
        self.fa_model = fa_model
        self.smoothing_alpha = smoothing_alpha
        self.prev_bbox: Optional[Tuple[int, int, int, int]] = None

        # Probe the file once for metadata; the capture is reopened per pass.
        probe = cv2.VideoCapture(video_path)
        if not probe.isOpened():
            raise ValueError(f"Could not open video file: {video_path}")

        self.fps = probe.get(cv2.CAP_PROP_FPS)
        self.num_frames = int(probe.get(cv2.CAP_PROP_FRAME_COUNT))
        self.width = int(probe.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height = int(probe.get(cv2.CAP_PROP_FRAME_HEIGHT))
        probe.release()

        print(
            f"Video info: {self.num_frames} frames, {self.fps:.2f} fps, {self.width}x{self.height}"
        )

    def __iter__(self):
        """
        Iterate through video frames sequentially.

        Yields:
            Dictionary containing frame_idx, processed image, and bounding box
        """
        # Fresh smoothing state for every iteration pass.
        self.prev_bbox = None

        cap = cv2.VideoCapture(self.video_path)
        if not cap.isOpened():
            raise RuntimeError(f"Could not open video file: {self.video_path}")

        frame_idx = 0
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break

            # OpenCV decodes BGR; convert once, then build a float CHW tensor.
            frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
            image = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0

            # Detect the face crop, then temporally smooth it (EMA).
            raw_bbox = detect_face_and_crop(image, self.fa_model, margin=0.9, shift_up=0.5)
            bbox = self._smooth_bbox(raw_bbox)
            x0, y0, x1, y1 = bbox

            cropped = image[:, y0:y1, x0:x1]

            # 224x224 for the SHEAP model, 512x512 for rendering output.
            cropped_resized = TF.resize(cropped, [224, 224], antialias=True)
            cropped_for_render = TF.resize(cropped, [512, 512], antialias=True)

            yield {
                "frame_idx": frame_idx,
                "image": cropped_resized,
                "bbox": bbox,
                "original_frame": frame_rgb,  # Keep original for reference (as numpy array)
                "cropped_frame": cropped_for_render,  # Cropped region resized to 512x512
            }

            frame_idx += 1

        cap.release()

    def _smooth_bbox(self, bbox: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
        """Apply exponential moving average smoothing to bounding box."""
        if self.prev_bbox is None:
            self.prev_bbox = bbox
            return bbox

        alpha = self.smoothing_alpha
        # new = alpha * detected + (1 - alpha) * previous, per coordinate.
        smoothed = tuple(
            int(alpha * new + (1 - alpha) * old)
            for new, old in zip(bbox, self.prev_bbox)
        )

        self.prev_bbox = smoothed
        return smoothed

    def __len__(self) -> int:
        return self.num_frames
222
+
223
+
224
def process_video(
    video_path: str,
    model_type: str = "expressive",
    batch_size: int = 8,
    num_workers: int = 0,
    device: str = "cuda" if torch.cuda.is_available() else "cpu",
    output_video_path: Optional[str] = None,
    render_size: int = 512,
    num_render_workers: int = 16,
    max_queue_size: int = 128,
) -> List[Dict[str, Any]]:
    """
    Process video frames through SHEAP model and optionally render output video.

    Uses an IterableDataset for efficient sequential video processing without seeking overhead.
    Rendering is done in a background thread, and ffmpeg is used to create the final video.

    Args:
        video_path: Path to video file
        model_type: SHEAP model variant ("paper", "expressive", or "lightweight")
        batch_size: Batch size for processing
        num_workers: Number of workers (0 or 1 only). Will be clamped to max 1.
        device: Device to run model on ("cpu" or "cuda")
        output_video_path: If provided, render and save output video to this path
        render_size: Size of each sub-image in the rendered output
        num_render_workers: Number of background threads for rendering
        max_queue_size: Maximum size of the rendering queue

    Returns:
        List of dictionaries containing frame index, bounding box, and FLAME parameters
    """
    # Warn about unsupported worker counts BEFORE clamping. (Previously the
    # value was clamped first, which made this warning unreachable.)
    if num_workers > 1:
        print("Warning: num_workers > 1 not supported with IterableDataset. Using num_workers=1.")
    num_workers = min(num_workers, 1)

    # Load SHEAP model
    print(f"Loading SHEAP model (type: {model_type})...")
    sheap_model = load_sheap_model(model_type=model_type)
    sheap_model.eval()
    sheap_model = sheap_model.to(device)

    # Load face alignment model.
    # Force CPU for FA when using num_workers=1 (subprocess issues with GPU).
    fa_device = "cpu" if num_workers >= 1 else device
    print(f"Loading face alignment model on {fa_device}...")
    fa_model = face_alignment.FaceAlignment(
        face_alignment.LandmarksType.THREE_D, flip_input=False, device=fa_device
    )

    # Create dataset and dataloader
    dataset = VideoFrameDataset(video_path, fa_model)
    dataloader = DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
    )

    print(f"Processing {len(dataset)} frames from {video_path}")

    # Rendering machinery is only set up when an output video is requested.
    flame = None
    rendering_threads = []
    render_queue = None
    temp_dir = None
    c2w = None

    if output_video_path:
        print("Loading FLAME model for rendering...")
        flame_dir = Path("FLAME2020/")
        flame = TinyFlame(flame_dir / "generic_model.pt", eyelids_ckpt=flame_dir / "eyelids.pt")
        flame = flame.to(device)  # Move FLAME to GPU
        # Fixed camera looking at the head from z = 1.
        c2w = torch.tensor(
            [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]], dtype=torch.float32
        )

        # Create temporary directory for rendered frames
        temp_dir = Path("./temp_sheap_render/")
        temp_dir.mkdir(parents=True, exist_ok=True)
        print(f"Using temporary directory: {temp_dir}")

        # Start multiple background rendering threads sharing one bounded queue
        # (the bound applies backpressure if rendering falls behind inference).
        render_queue = Queue(maxsize=max_queue_size)
        for _ in range(num_render_workers):
            thread = RenderingThread(render_queue, temp_dir, flame.faces, c2w, render_size)
            thread.start()
            rendering_threads.append(thread)
        print(f"Started {num_render_workers} background rendering threads")

    results = []
    frame_count = 0

    with torch.no_grad():
        progbar = tqdm(total=len(dataset), desc="Processing frames")
        for batch in dataloader:
            frame_indices = batch["frame_idx"]
            images = batch["image"].to(device)
            bboxes = batch["bbox"]

            # Process through SHEAP model
            flame_params_dict = sheap_model(images)

            # Generate vertices for this batch if rendering
            if output_video_path and flame is not None:
                verts = flame(
                    shape=flame_params_dict["shape_from_facenet"],
                    expression=flame_params_dict["expr"],
                    pose=pose_components_to_rotmats(flame_params_dict),
                    eyelids=flame_params_dict["eyelids"],
                    translation=flame_params_dict["cam_trans"],
                )

            # Store results and queue for rendering
            for i in range(len(frame_indices)):
                frame_idx = _extract_scalar(frame_indices[i])
                bbox = tuple(_extract_scalar(b[i]) for b in bboxes)

                result = {
                    "frame_idx": frame_idx,
                    "bbox": bbox,
                    "flame_params": {k: v[i].cpu() for k, v in flame_params_dict.items()},
                }
                results.append(result)

                # Queue frame for rendering (blocks when the queue is full)
                if output_video_path:
                    cropped_frame = _tensor_to_numpy_image(batch["cropped_frame"][i])
                    render_queue.put((frame_idx, cropped_frame, verts[i].cpu()))
                    frame_count += 1

            progbar.update(len(frame_indices))
        progbar.close()

    # Finalize rendering and create output video
    if output_video_path and render_queue is not None:
        _finalize_rendering(
            rendering_threads,
            render_queue,
            num_render_workers,
            temp_dir,
            dataset.fps,
            output_video_path,
        )

    return results
370
+
371
+
372
def _extract_scalar(value: Any) -> int:
    """Return a Python scalar from a 0-d tensor; pass any other value through."""
    if isinstance(value, torch.Tensor):
        return value.item()
    return value
375
+
376
+
377
def _tensor_to_numpy_image(tensor: torch.Tensor) -> np.ndarray:
    """Convert a (C, H, W) float tensor in [0, 1] to a (H, W, C) uint8 array.

    Non-tensor inputs are returned unchanged.
    """
    if not isinstance(tensor, torch.Tensor):
        return tensor
    hwc = tensor.permute(1, 2, 0).cpu().numpy()
    return (hwc * 255).astype(np.uint8)
382
+
383
+
384
def _finalize_rendering(
    rendering_threads: List[RenderingThread],
    render_queue: Queue,
    num_render_workers: int,
    temp_dir: Path,
    fps: float,
    output_video_path: str,
) -> None:
    """Drain the rendering workers, encode the frames with ffmpeg, and clean up."""
    print("\nWaiting for rendering threads to complete...")

    # One sentinel per worker tells each thread to exit its loop.
    for _ in range(num_render_workers):
        render_queue.put(None)

    for worker in rendering_threads:
        worker.join()

    total_rendered = sum(worker.frames_rendered for worker in rendering_threads)
    print(f"Rendered {total_rendered} frames")

    print("Creating video with ffmpeg...")
    output_path = Path(output_video_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    # libx264 + yuv420p for broad player compatibility; frames are picked up
    # in order via the zero-padded frame_%06d pattern.
    ffmpeg_cmd = [
        "ffmpeg",
        "-y",  # Overwrite output file if it exists
        "-framerate", str(fps),
        "-i", str(temp_dir / "frame_%06d.png"),
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-preset", "medium",
        "-crf", "23",
        str(output_path),
    ]

    subprocess.run(ffmpeg_cmd, check=True, capture_output=True)
    print(f"Video saved to: {output_video_path}")

    # Remove the intermediate PNG frames now that the video exists.
    if temp_dir.exists():
        print(f"Removing temporary directory: {temp_dir}")
        shutil.rmtree(temp_dir)
    print("Cleanup complete")
437
+
438
+
439
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process and render video with SHEAP model.")
    parser.add_argument("in_path", type=str, help="Path to input video file.")
    parser.add_argument(
        "--out_path", type=str, help="Path to save rendered output video.", default=None
    )
    args = parser.parse_args()

    # Default output path: "<input stem>_rendered.mp4" next to the input file.
    if args.out_path is None:
        in_path = Path(args.in_path)
        args.out_path = str(in_path.with_name(f"{in_path.stem}_rendered.mp4"))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    results = process_video(
        video_path=args.in_path,
        model_type="expressive",
        device=device,
        output_video_path=args.out_path,
    )