3ZadeSSG committed
Commit 99e2b6c · 1 Parent(s): 908f07a

initial commit

.gitattributes CHANGED
@@ -32,4 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.xz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,21 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # Models and Engines
+ *.onnx
+ *.onnx.data
+ *.pth
+ *.engine
+
+ # Images
+ *.png
+ *.jpeg
+ *.JPG
+
+ # Videos
+ *.mp4
+
+ # Logs
+ logs/
.huggingface.yaml ADDED
@@ -0,0 +1,3 @@
+ sdk: gradio
+ python_version: '3.12'
+ requirements_file: requirements.txt
README.md CHANGED
@@ -1,14 +1,97 @@
- ---
- title: PVSDNet Depth Only
- emoji: 🔥
- colorFrom: indigo
- colorTo: blue
- sdk: gradio
- sdk_version: 6.3.0
- app_file: app.py
- pinned: false
- license: agpl-3.0
- short_description: Monocular Depth Estimation Model for Real-Time Inference
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ <div align="center">
+ <a href="#"><img src='https://img.shields.io/badge/-Paper-00629B?style=flat&logo=ieee&logoColor=white' alt='arXiv'></a>
+ <a href='https://realistic3d-miun.github.io/PVSDNet/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
+ <a href='https://huggingface.co/spaces/3ZadeSSG/PVSDNet'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo_(Coming_Soon)-blue'></a>
+ </div>
+
+ # PVSDNet: Joint Depth Prediction and View Synthesis via Shared Latent Spaces in Real-Time
+
+ ## Supplementary Video (head to the Project Page for more visual results)
+ [![Watch the video](https://img.youtube.com/vi/49s2UPvRA6I/maxresdefault.jpg)](https://youtu.be/49s2UPvRA6I)
+
+ # 1. PVSDNet - Joint Depth and View
+ **Note:** Will be added soon.
+
+ ## 1.A. Normal Inference (Recommended for minimal setup)
+ **Note:** Will be added soon.
+
+ ## 1.B. Faster Inference (For best possible FPS)
+ **Note:** Will be added soon.
+
+
+ # 2. PVSDNet Depth-Only Model
+ This model is a variant of the original PVSDNet that predicts only depth, not the target views. The model core is the same, except that the rendering network and the positional encoding are removed.
+
+ * Download the checkpoints from the following table and place them in the `checkpoint_onnx` directory.
+
+ | Model | Size | Checkpoint |
+ |-----------------|--------|----------------|
+ | PVSDNet-Depth-Only | 1.11 GB | [Download](https://huggingface.co/3ZadeSSG/PVSDNet-Depth-Only/resolve/main/depth_only_model.pth) |
+ | PVSDNet-Depth-Only-Lite | 279 MB | [Download](https://huggingface.co/3ZadeSSG/PVSDNet-Depth-Only/resolve/main/depth_only_lite_model.pth) |
+
+ ## 2.A. Normal Inference (Recommended for minimal setup)
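+ The depth-only models run with plain PyTorch; no ONNX or TensorRT setup is needed. The snippet below is a minimal sketch, not an official script from this repo: it assumes the modules shipped here (`models.depth_only_model`, `helperFunctions`, `depth_only_parameters`) and a checkpoint placed as described above; `input.png` is a placeholder path.
+ ```python
+ import torch
+ from PIL import Image
+ import torchvision.transforms as transforms
+
+ import depth_only_parameters as params
+ import helperFunctions as helper
+ from models.depth_only_model import PVSDNet
+
+ # Build the model and load the downloaded checkpoint (CPU-safe load).
+ model = PVSDNet(total_image_input=params.params_number_input)
+ model = helper.load_Checkpoint("./checkpoint_onnx/depth_only_model.pth", model, load_cpu=True)
+ model.to(params.DEVICE).eval()
+
+ # Resize to the configured resolution (384x384 by default) and predict.
+ transform = transforms.Compose([
+     transforms.Resize((params.params_height, params.params_width)),
+     transforms.ToTensor(),
+ ])
+ image = Image.open("input.png").convert("RGB")  # placeholder input path
+ with torch.no_grad():
+     depth = model(transform(image).unsqueeze(0).to(params.DEVICE))  # 1x1xHxW relative depth
+ ```
+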
+ ## 2.B. Faster Inference (For best possible FPS)
38
+ You need to setup your own TRT Engine for this purpose.
39
+
40
+ * Make sure you modify the `depth_only_parameters` to set resolution you need. By default we have kept it at `384x384`.
41
+
42
+ * Run `export_onnx_depth.py` to conver the normal pytorch models located into into onnx
43
+ ```
44
+ python export_onnx_depth.py
45
+ ```
46
+ * Create TRT Engine directory
47
+ ```
48
+ mkdir TRT_Engine
49
+ ```
50
+ * Build the TRT engine based on created onnx files (which by default will be located in `checkpoint_onnx`)
51
+ ```
52
+ trtexec --onnx=./checkpoint_onnx/depth_only_model.onnx --saveEngine=./TRT_Engine/depth_only_model_fp16.engine --fp16
53
+ ```
54
+ ```
55
+ trtexec --onnx=./checkpoint_onnx/depth_only_lite_model.onnx --saveEngine=./TRT_Engine/depth_only_lite_model_fp16.engine --fp16
56
+ ```
57
+
58
+
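+ Optionally, you can sanity-check a built engine by loading and benchmarking it with random inputs (standard `trtexec` usage, not a script from this repo; adjust the path for the lite engine):
+ ```
+ trtexec --loadEngine=./TRT_Engine/depth_only_model_fp16.engine
+ ```
+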
+ ## 2.C. Predicting on Depth Datasets using Multi-Resolution Fusion
+
+ We run the scripts inside the `depth_dataset_predictor` directory. There are two sample images for each dataset to test the code.
+ * First, we build the TRT engine for each dataset, since we use multi-resolution fusion:
+ ```
+ python depth_dataset_predictor/build_trt_<dataset_name>.py
+ ```
+ * Then we run the prediction script:
+ ```
+ python depth_dataset_predictor/predict_<dataset_name>_TensorRT.py
+ ```
+
+ | Dataset | Step 1 | Step 2 |
+ |---|---|---|
+ | ETH3D | `python depth_dataset_predictor/build_trt_ETH3D.py` | `python depth_dataset_predictor/predict_ETH3D_TensorRT.py` |
+ | Sintel | `python depth_dataset_predictor/build_trt_Sintel.py` | `python depth_dataset_predictor/predict_Sintel_TensorRT.py` |
+ | KITTI | `python depth_dataset_predictor/build_trt_KITTI.py` | `python depth_dataset_predictor/predict_KITTI_TensorRT.py` |
+ | DIODE | `python depth_dataset_predictor/build_trt_DIODE.py` | `python depth_dataset_predictor/predict_DIODE_TensorRT.py` |
+ | NYU | `python depth_dataset_predictor/build_trt_NYU.py` | `python depth_dataset_predictor/predict_NYU_TensorRT.py` |
+
+ ## 2.D. Predicting on 1080p In-The-Wild Images/Videos using Multi-Resolution Fusion
+ As with the datasets, we can use multi-resolution fusion to predict on 1080p in-the-wild images and videos.
+
+ * First, we build the TRT engine:
+ ```
+ python depth_in_wild_predictor/build_trt_1080p.py
+ ```
+ * Then we run the prediction script for images:
+ ```
+ python depth_in_wild_predictor/predict_1080p_TensorRT.py
+ ```
+ OR run the prediction script for videos:
+ ```
+ python depth_in_wild_predictor/predict_video_1080p_TensorRT.py
+ ```
+ #### Note
+ * For any other resolution, you can modify the resolutions in the above scripts to suit your needs. We have kept the default at 1080p for this example.
+ * We recommend 3-6 resolutions for best results, but you can use 1-2 smaller resolutions when working with low-resolution images/videos, since the receptive field of the network can handle that without any issues. The fusion itself is sketched below.
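+
+ For reference, the fusion is simply an average of predictions made at several resolutions and resized back to the input size. A minimal sketch of that logic (mirroring `app.py`; `model` and `image` are assumed to be set up as in Section 2.A):
+ ```python
+ import torch.nn.functional as F
+ import torchvision.transforms as transforms
+
+ def fuse_depth(image, model, resolutions, device="cpu"):
+     """Average depth maps predicted at several (width, height) resolutions."""
+     W, H = image.size  # PIL image size is (width, height)
+     fused = []
+     for w, h in resolutions:
+         t = transforms.Compose([transforms.Resize((h, w)), transforms.ToTensor()])
+         d = model(t(image).unsqueeze(0).to(device)).detach()  # 1x1xhxw prediction
+         d = F.interpolate(d, (H, W), mode="bilinear", align_corners=False)
+         fused.append(d[0, 0].cpu())
+     return sum(fused) / len(fused)  # HxW relative depth (normalize before display)
+ ```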
__init__.py ADDED
File without changes
app.py ADDED
@@ -0,0 +1,238 @@
+ import gradio as gr
+ import torch
+ import numpy as np
+ import matplotlib.pyplot as plt
+ from PIL import Image
+ import torch.nn.functional as F
+ import torchvision.transforms as transforms
+ import depth_only_parameters as params
+
+ from models.depth_only_model import PVSDNet
+ from models.depth_only_lite_model import PVSDNet_Lite
+
+ import helperFunctions as helper
+ import socket
+ from huggingface_hub import hf_hub_download
+ import joblib
+
+ REPO_ID = "3ZadeSSG/PVSDNet-Depth-Only"
+ print("Downloading/Loading checkpoints from Hugging Face Hub...")
+ params.MODEL_Small_Location = hf_hub_download(
+     repo_id=REPO_ID,
+     filename="depth_only_lite_model.pth"
+ )
+
+ params.MODEL_Large_Location = hf_hub_download(
+     repo_id=REPO_ID,
+     filename="depth_only_model.pth"
+ )
+
+ print(f"Large Model loaded at: {params.MODEL_Large_Location}")
+ print(f"Lite Model loaded at: {params.MODEL_Small_Location}")
+
+
+ def get_valid_resolutions(width, height):
+     """Dynamically determines valid resolutions based on input size.
+     - Caps the highest resolution at 1024px to avoid unnecessary high-res computations.
+     - Uses 6 resolutions for large images to improve multi-scale fusion quality.
+     - Uses 4 resolutions for smaller images (< 512px width or height).
+     """
+     def make_divisible(n, base=16):
+         return max(base, int(round(n / base) * base))
+
+     max_resolution = 1024
+     high_w, high_h = make_divisible(min(width, max_resolution)), make_divisible(min(height, max_resolution))
+
+     # Calculate more intermediate steps for better fusion
+     r80_w, r80_h = make_divisible(int(high_w // 1.25)), make_divisible(int(high_h // 1.25))
+     r66_w, r66_h = make_divisible(int(high_w // 1.5)), make_divisible(int(high_h // 1.5))
+     r50_w, r50_h = make_divisible(int(high_w // 2)), make_divisible(int(high_h // 2))
+     r40_w, r40_h = make_divisible(int(high_w // 2.5)), make_divisible(int(high_h // 2.5))
+     r33_w, r33_h = make_divisible(max(256, int(high_w // 3))), make_divisible(max(256, int(high_h // 3)))
+
+     if width < 512 or height < 512:
+         return [(high_w, high_h), (r80_w, r80_h), (r66_w, r66_h), (r50_w, r50_h)]
+     else:
+         return [
+             (high_w, high_h),
+             (r80_w, r80_h),
+             (r66_w, r66_h),
+             (r50_w, r50_h),
+             (r40_w, r40_h),
+             (r33_w, r33_h)
+         ]
+
+
+ def get_transforms(resolutions):
+     return [transforms.Compose([transforms.Resize((h, w)), transforms.ToTensor()]) for w, h in resolutions]
+
+ def get_prediction(image, transform, model):
+     img_input = image.convert('RGB')
+     img_input = transform(img_input).unsqueeze(0).to(params.DEVICE)
+     depth_out = model(img_input).detach().squeeze(0).to("cpu")
+     return depth_out
+
+ def predict_single_image(image, model_type):
+     if image is None:
+         return None, None
+
+     # Select model class and checkpoint
+     if model_type == "Lite":
+         model_class = PVSDNet_Lite
+         checkpoint = params.MODEL_Small_Location
+     else:  # Default to "Large"
+         model_class = PVSDNet
+         checkpoint = params.MODEL_Large_Location
+
+     model = model_class(total_image_input=params.params_number_input)
+     model = helper.load_Checkpoint(checkpoint, model, load_cpu=True)
+     model.to(params.DEVICE)
+     model.eval()
+
+     original_width, original_height = image.size
+
+     resolutions = get_valid_resolutions(original_width, original_height)
+     print(f"Resolutions: {resolutions} for Model Type: {model_type}")
+     transforms_list = get_transforms(resolutions)
+
+     # Predict at every resolution, resize back to the input size, and average (multi-scale fusion)
+     depth_maps = [get_prediction(image, t, model) for t in transforms_list]
+
+     depth_maps_resized = [
+         F.interpolate(depth[None], (original_height, original_width), mode='bilinear', align_corners=False)[0, 0]
+         for depth in depth_maps
+     ]
+
+     depth_final = sum(depth_maps_resized) / len(depth_maps_resized)
+
+     # Min-max normalize for display
+     depth_image = (depth_final - depth_final.min()) / (depth_final.max() - depth_final.min())
+
+     img_out = depth_image.numpy()
+     img_out_colored = plt.get_cmap('inferno')(img_out / np.max(img_out))[:, :, :3]
+     img_out_colored = (img_out_colored * 255).astype(np.uint8)
+
+     gray_scale_img_out = (depth_image.numpy() * 255).astype(np.uint8)
+
+     return Image.fromarray(img_out_colored), Image.fromarray(gray_scale_img_out)
+
+ with gr.Blocks(title="PVSDNet-Depth-Only Model", theme="default") as demo:
+     gr.Markdown(
+         """
+         ## PVSDNet-Depth-Only ZeroShot Relative Depth Estimation Model
+         * Upload an image and get its depth estimation with multi-scale fusion.
+         * Images use 2 - 6 resolutions for multi-scale fusion.
+
+         **Note:** The Hugging Face demo is running on CPU, so inference speeds will be slow.
+         ### Head to our [Project Page](https://realistic3d-miun.github.io/PVSDNet/) for more details about the models.
+         """)
+
+     with gr.Row():
+         with gr.Column():
+             img_input = gr.Image(type="pil", label="RGB Image", height=384)
+             with gr.Accordion("Advanced Settings", open=False):
+                 model_type_dropdown = gr.Dropdown(["Large", "Lite"], label="Model Type", value="Large")
+             generate_btn = gr.Button("Estimate Depth", variant="primary")
+
+         with gr.Column():
+             output_color = gr.Image(type="pil", label="Depth Map (Color)", height=384)
+             output_gray = gr.Image(type="pil", label="Depth Map (Grayscale)", height=384)
+
+     generate_btn.click(
+         fn=predict_single_image,
+         inputs=[img_input, model_type_dropdown],
+         outputs=[output_color, output_gray]
+     )
+
+     gr.Markdown("### Example Samples")
+     with gr.Column():
+         with gr.Row():
+             with gr.Column(scale=2): gr.Markdown("**Example Image (Click to load)**")
+             with gr.Column(scale=1): gr.Markdown("**Resolution**")
+             with gr.Column(scale=2): gr.Markdown("**Fusion Resolutions**")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 diode_preview = gr.Image("./samples/DIODE/00022_00195_outdoor_010_030.png", label="DIODE", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("1024 x 768")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x768, 816x608, 688x512, 512x384, 416x304, 336x256")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 eth3d_preview = gr.Image("./samples/ETH3D/DSC_0243.JPG", label="ETH3D", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("6048 x 4032")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x1024, 816x816, 688x688, 512x512, 416x416, 336x336")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 sintel_preview = gr.Image("./samples/Sintel/frame_0028_temple.png", label="Sintel", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("1024 x 436")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x432, 816x352, 688x288, 512x224")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 kitti_preview = gr.Image("./samples/KITTI/2011_10_03_drive_0047_sync_image_0000000383_image_02.png", label="KITTI", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("1216 x 532")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x352, 816x288, 688x240, 512x176")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 wild_1_preview = gr.Image("./samples/Wild/toy.jpeg", label="Wild Image 1", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("3019 x 3018")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x1024, 816x816, 688x688, 512x512, 416x416, 336x336")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 wild_2_preview = gr.Image("./samples/Wild/hamburg.jpeg", label="Wild Image 2", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("1536 x 1920")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x1024, 816x816, 688x688, 512x512, 416x416, 336x336")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 wild_3_preview = gr.Image("./samples/Wild/north_hill.jpeg", label="Wild Image 3", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("2320 x 2321")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x1024, 816x816, 688x688, 512x512, 416x416, 336x336")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 wild_4_preview = gr.Image("./samples/Wild/EH.jpeg", label="Wild Image 4", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("1920 x 1080")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x1024, 816x816, 688x688, 512x512, 416x416, 336x336")
+
+         with gr.Row(variant="panel"):
+             with gr.Column(scale=2):
+                 wild_5_preview = gr.Image("./samples/Wild/train_station.jpeg", label="Wild Image 5", height=120, interactive=False, show_label=True)
+             with gr.Column(scale=1):
+                 gr.Markdown("1066 x 1060")
+             with gr.Column(scale=2):
+                 gr.Markdown("1024x1024, 816x816, 688x688, 512x512, 416x416, 336x336")
+
+
+     # Define click events to load images
+     eth3d_preview.select(fn=lambda: Image.open("./samples/ETH3D/DSC_0243.JPG"), outputs=img_input)
+     sintel_preview.select(fn=lambda: Image.open("./samples/Sintel/frame_0028_temple.png"), outputs=img_input)
+     kitti_preview.select(fn=lambda: Image.open("./samples/KITTI/2011_10_03_drive_0047_sync_image_0000000383_image_02.png"), outputs=img_input)
+     diode_preview.select(fn=lambda: Image.open("./samples/DIODE/00022_00195_outdoor_010_030.png"), outputs=img_input)
+
+     wild_1_preview.select(fn=lambda: Image.open("./samples/Wild/toy.jpeg"), outputs=img_input)
+     wild_2_preview.select(fn=lambda: Image.open("./samples/Wild/hamburg.jpeg"), outputs=img_input)
+     wild_3_preview.select(fn=lambda: Image.open("./samples/Wild/north_hill.jpeg"), outputs=img_input)
+     wild_4_preview.select(fn=lambda: Image.open("./samples/Wild/EH.jpeg"), outputs=img_input)
+     wild_5_preview.select(fn=lambda: Image.open("./samples/Wild/train_station.jpeg"), outputs=img_input)
+
+
+ demo.launch()
depth_only_parameters.py ADDED
@@ -0,0 +1,21 @@
+ import os
+
+ params_height = 384
+ params_width = 384
+
+ params_number_input = 1
+
+ LOG_FILE_LOCATION = "./logs/training_log_0.txt"
+ CHECKPOINT_LOCATION = "./checkpoint/"
+ DEVICE = "cpu"
+ ONNX_PATH = "./checkpoint_onnx"
+
+ MODEL_Large_Location = "./checkpoint/depth_only_model.pth"
+ MODEL_Small_Location = "./checkpoint/depth_only_lite_model.pth"
+
+ os.makedirs(ONNX_PATH, exist_ok=True)
+ os.makedirs("./logs", exist_ok=True)
+ os.makedirs("./checkpoint", exist_ok=True)
+ os.makedirs("./output", exist_ok=True)
helperFunctions.py ADDED
@@ -0,0 +1,26 @@
+ import torch
+ import os
+ import torch.nn.functional as F
+
+ def save_checkpoint(model, filelocation, save_parallel=True):
+     # Unwrap the DataParallel/DistributedDataParallel wrapper before saving
+     if save_parallel:
+         torch.save(model.module.state_dict(), filelocation)
+     else:
+         torch.save(model.state_dict(), filelocation)
+
+ def load_Checkpoint(fileLocation, model, load_cpu=False):
+     if load_cpu:
+         model.load_state_dict(torch.load(fileLocation, map_location=lambda storage, loc: storage))
+     else:
+         model.load_state_dict(torch.load(fileLocation))
+     return model
+
+ def writeLog(logList, filename):
+     with open(filename, 'w') as outfile:
+         outfile.write("\n".join(logList))
+
+
+ def kl_loss(mu, logvar):
+     # KL divergence of N(mu, exp(logvar)) from the standard normal, averaged over elements
+     return -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
helper_image_functions.py ADDED
@@ -0,0 +1,290 @@
+ '''
+ Author: Manu Gond (manu.gond@miun.se)
+ Date: Nov-15-2022
+ Objective: Accumulation of some general functions which I
+            use daily in my code related to image-related tasks.
+            The function names and parameters are self-explanatory.
+ Requirements: Installed python libraries which have been imported.
+ '''
+
+ import torch
+ from torchvision.utils import save_image
+ from torchvision.transforms import transforms
+ import torchmetrics
+ import cv2
+ import numpy as np
+ from PIL import Image
+ import utils
+
+
+ #======================= Read and Write =====================#
+ def readImage(location):
+     image = Image.open(location).convert("RGB")
+     return image
+
+
+ def writeImage(image, location):
+     image.save(location)
+
+
+ def writeTensorImage(image, filename):
+     save_image(image, filename)
+
+
+ def removeChannel(sourceLocation, targetLocation):
+     img = readImage(sourceLocation)
+     writeImage(img, targetLocation)
+
+
+ def getImageTransform(width, height):
+     transform = transforms.Compose([transforms.Resize((height, width)),
+                                     transforms.ToTensor()])
+     return transform
+
+
+ def convertTensor(image):
+     transform = getImageTransform(image.size[0], image.size[1])
+     image = transform(image)
+     return image
+
+
+ #=================== 360 Images =======================#
+
+ def rotateERP180(image):
+     '''
+     :param image: PIL Image
+     :return: BxHxW Torch Tensor Image
+     '''
+     W = image.size[0]
+     H = image.size[1]
+     transform = getImageTransform(W, H)
+     image = transform(image)
+     image1 = image[:, :, 0:(W//2)]
+     image2 = image[:, :, (W//2):W]
+     image3 = torch.zeros(image.size())
+     image3[:, :, 0:(W//2)] = image2
+     image3[:, :, (W//2):W] = image1
+     return image3
+
+
+ def convertERP2Cube(e_img, face_w=256, mode='bilinear', cube_format='dice'):
+     '''
+     e_img: ndarray in shape of [H, W, *]
+     face_w: int, the length of each face of the cubemap
+     '''
+     assert len(e_img.shape) == 3
+     h, w = e_img.shape[:2]
+     if mode == 'bilinear':
+         order = 1
+     elif mode == 'nearest':
+         order = 0
+     else:
+         raise NotImplementedError('unknown mode')
+
+     xyz = utils.xyzcube(face_w)
+     uv = utils.xyz2uv(xyz)
+     coor_xy = utils.uv2coor(uv, h, w)
+
+     cubemap = np.stack([
+         utils.sample_equirec(e_img[..., i], coor_xy, order=order)
+         for i in range(e_img.shape[2])
+     ], axis=-1)
+
+     if cube_format == 'horizon':
+         pass
+     elif cube_format == 'list':
+         cubemap = utils.cube_h2list(cubemap)
+     elif cube_format == 'dict':
+         cubemap = utils.cube_h2dict(cubemap)
+     elif cube_format == 'dice':
+         cubemap = utils.cube_h2dice(cubemap)
+     else:
+         raise NotImplementedError()
+     return cubemap
+
+
+ def convertCube2ERP(cubemap, h, w, mode='bilinear', cube_format='dice'):
+     if mode == 'bilinear':
+         order = 1
+     elif mode == 'nearest':
+         order = 0
+     else:
+         raise NotImplementedError('unknown mode')
+
+     if cube_format == 'horizon':
+         pass
+     elif cube_format == 'list':
+         cubemap = utils.cube_list2h(cubemap)
+     elif cube_format == 'dict':
+         cubemap = utils.cube_dict2h(cubemap)
+     elif cube_format == 'dice':
+         cubemap = utils.cube_dice2h(cubemap)
+     else:
+         raise NotImplementedError('unknown cube_format')
+     assert len(cubemap.shape) == 3
+     assert cubemap.shape[0] * 6 == cubemap.shape[1]
+     assert w % 8 == 0
+     face_w = cubemap.shape[0]
+
+     uv = utils.equirect_uvgrid(h, w)
+     u, v = np.split(uv, 2, axis=-1)
+     u = u[..., 0]
+     v = v[..., 0]
+     cube_faces = np.stack(np.split(cubemap, 6, 1), 0)
+
+     # Get face id to each pixel: 0F 1R 2B 3L 4U 5D
+     tp = utils.equirect_facetype(h, w)
+     coor_x = np.zeros((h, w))
+     coor_y = np.zeros((h, w))
+
+     for i in range(4):
+         mask = (tp == i)
+         coor_x[mask] = 0.5 * np.tan(u[mask] - np.pi * i / 2)
+         coor_y[mask] = -0.5 * np.tan(v[mask]) / np.cos(u[mask] - np.pi * i / 2)
+
+     mask = (tp == 4)
+     c = 0.5 * np.tan(np.pi / 2 - v[mask])
+     coor_x[mask] = c * np.sin(u[mask])
+     coor_y[mask] = c * np.cos(u[mask])
+
+     mask = (tp == 5)
+     c = 0.5 * np.tan(np.pi / 2 - np.abs(v[mask]))
+     coor_x[mask] = c * np.sin(u[mask])
+     coor_y[mask] = -c * np.cos(u[mask])
+
+     # Final renormalize
+     coor_x = (np.clip(coor_x, -0.5, 0.5) + 0.5) * face_w
+     coor_y = (np.clip(coor_y, -0.5, 0.5) + 0.5) * face_w
+
+     equirec = np.stack([
+         utils.sample_cubefaces(cube_faces[..., i], tp, coor_y, coor_x, order=order)
+         for i in range(cube_faces.shape[3])
+     ], axis=-1)
+     return equirec
+
+
+ def convertCube2Slices(image):
+     '''
+     :param image: Image numpy array
+     :return: List of Torch Tensors, CxHxW
+     '''
+     image = convertTensor(image)
+     C, H, W = image.size()
+     #print(C, H, W)
+     top = torch.zeros((C, W//4, W//4))
+     left = torch.zeros(top.size())
+     front = torch.zeros(top.size())
+     right = torch.zeros(top.size())
+     back = torch.zeros(top.size())
+     bottom = torch.zeros(top.size())
+
+     top = image[:, 0:H//3, (W//4):(W//4)*2]
+     left = image[:, (H//3):(H//3)*2, 0:W//4]
+     front = image[:, (H//3):(H//3)*2, (W//4):(W//4)*2]
+     right = image[:, (H//3):(H//3)*2, (W//4)*2:(W//4)*3]
+     back = image[:, (H//3):(H//3)*2, (W//4)*3:]
+     bottom = image[:, (H//3)*2:, (W//4):(W//4)*2]
+
+     '''
+     save_image(top, 'top.png')
+     save_image(left, 'left.png')
+     save_image(front, 'front.png')
+     save_image(right, 'right.png')
+     save_image(back, 'back.png')
+     save_image(bottom, 'bottom.png')
+     '''
+     return [top, left, front, right, back, bottom]
+
+ def convertSlicesToCube(imageList):
+     '''
+     top = convertTensor(readImage(imageList[0]))
+     left = convertTensor(readImage(imageList[1]))
+     front = convertTensor(readImage(imageList[2]))
+     right = convertTensor(readImage(imageList[3]))
+     back = convertTensor(readImage(imageList[4]))
+     bottom = convertTensor(readImage(imageList[5]))
+     '''
+     top = imageList[0]
+     left = imageList[1]
+     front = imageList[2]
+     right = imageList[3]
+     back = imageList[4]
+     bottom = imageList[5]
+
+     C, H, W = 3, top.size()[1]*3, top.size()[2]*4
+     cube = torch.zeros((C, H, W))
+
+     cube[:, 0:H//3, (W//4):(W//4)*2] = top
+     cube[:, (H//3):(H//3)*2, 0:W//4] = left
+     cube[:, (H//3):(H//3)*2, (W//4):(W//4)*2] = front
+     cube[:, (H//3):(H//3)*2, (W//4)*2:(W//4)*3] = right
+     cube[:, (H//3):(H//3)*2, (W//4)*3:] = back
+     cube[:, (H//3)*2:, (W//4):(W//4)*2] = bottom
+
+     return cube
+
+
+ #=================== Quality Measures =======================#
+ '''
+ Predicted Shape : BxCxHxW
+ Original Shape : BxCxHxW
+ Data Type: Torch Tensor
+ '''
+ def getSSIM(predicted, original):
+     SSIM = torchmetrics.StructuralSimilarityIndexMeasure()
+     return SSIM(predicted, original).item()
+
+
+ def getPSNR(predicted, original):
+     PSNR = torchmetrics.PeakSignalNoiseRatio()
+     return PSNR(predicted, original).item()
+
+
+ def getMSE(predicted, original):
+     MSE = torchmetrics.MeanSquaredError()
+     return MSE(predicted, original).item()
+
+
+ def getMAE(predicted, original):
+     MAE = torchmetrics.MeanAbsoluteError()
+     return MAE(predicted, original).item()
+
+
+ if __name__ == "__main__":
+     '''
+     img = readImage("31_image_0_0.png")
+     img = convertERP2Cube(e_img=np.asarray(img), face_w=256)
+     img = Image.fromarray(img.astype('uint8'), 'RGB')
+     convertCube2Slices(img)
+     '''
+     #image = convertSlicesToCube(["top.png", "left.png", "front.png", "right.png", "back.png", "bottom.png"])
+     #writeTensorImage(image, 'this.png')
+
+     '''
+     writeImage(img, 'cube.png')
+
+     img = readImage('cube.png')
+     img = convertCube2ERP(np.asarray(img), 512, 1024)
+     img = Image.fromarray(img.astype('uint8'), 'RGB')
+     writeImage(img, 'cubeERP.png')
+
+     img1 = readImage("31_image_0_0.png")
+     img2 = readImage("cubeERP.png")
+     img1 = convertTensor(img1)
+     img2 = convertTensor(img2)
+     print(getSSIM(img1.unsqueeze(0), img2.unsqueeze(0)))
+     '''
+
+     #img = rotateERP180(img)
+     #writeTensorImage(img, 'rotated_image.png')
+     #img = convertTensor(img)
+     #print(getMAE(img.unsqueeze(0), img.unsqueeze(0)))
models/__init__.py ADDED
File without changes
models/depth_only_lite_model.py ADDED
@@ -0,0 +1,234 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import warnings
+ warnings.filterwarnings("ignore")
+ import torchvision
+ import sys
+ import os
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+ import depth_only_parameters as params
+
+ def getConvLayer(in_channel, out_channel, stride=1, padding=1, activation=nn.ReLU()):
+     return nn.Sequential(
+         nn.Conv2d(in_channel, out_channel,
+                   kernel_size=3,
+                   stride=stride,
+                   padding=padding,
+                   padding_mode='reflect'),
+         activation
+     )
+
+ def getConvTransposeLayer(in_channel, out_channel, kernel=3, stride=1, padding=1, activation=nn.ReLU()):
+     return nn.Sequential(
+         nn.ConvTranspose2d(in_channel,
+                            out_channel,
+                            kernel_size=kernel,
+                            stride=stride,
+                            padding=padding),
+         activation
+     )
+
+ class Flatten(nn.Module):
+     def forward(self, input):
+         return input.view(input.size(0), -1)
+
+ class UnFlatten(nn.Module):
+     def forward(self, input, size=1):
+         return input.view(input.size(0), 1, params.params_height//8, params.params_width//8)
+
+ class ResidualBlock(nn.Module):
+     def __init__(self, in_channels, out_channels, stride=1):
+         super(ResidualBlock, self).__init__()
+         self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
+                                stride=stride, padding=1, bias=False)
+         self.relu = nn.ReLU()
+         self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
+                                stride=1, padding=1, bias=False)
+         self.stride = stride
+
+         self.shortcut = nn.Sequential()
+         if stride != 1 or in_channels != out_channels:
+             self.shortcut = nn.Sequential(
+                 nn.Conv2d(in_channels, out_channels, kernel_size=1,
+                           stride=stride, bias=False),
+                 nn.BatchNorm2d(out_channels)
+             )
+
+     def forward(self, x):
+         residual = x
+         out = self.conv1(x)
+         out = self.relu(out)
+         out = self.conv2(out)
+         out = out + self.shortcut(residual)
+         out = self.relu(out)
+         return out
+
+ class UpperEncoder(nn.Module):
+     def __init__(self):
+         super().__init__()
+         model = torchvision.models.resnet152(pretrained=True)
+         layers = list(model.children())
+         self.ResNetEncoder = nn.Sequential(*layers[:5].copy())
+         del model
+
+     def forward(self, x):
+         x1 = x[:, 0:3, :, :]
+         x1 = self.ResNetEncoder(x1)
+         return x1
+
+     def apply_resnet_encoder(self, x):
+         x1 = x[:, 0:3, :, :]
+         x1 = self.ResNetEncoder(x1)
+         return x1
+
+ class LowerEncoder(nn.Module):
+     def __init__(self, total_image_input=1):
+         super().__init__()
+         # Halved channels compared to the original
+         self.encoder_pre = ResidualBlock(total_image_input*3, 10)
+         self.encoder_layer1 = ResidualBlock(10, 15)
+         self.encoder_layer2 = ResidualBlock(15, 25)
+
+         self.encoder_layer3 = nn.Sequential(
+             ResidualBlock(25, 50),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+         self.encoder_layer4 = ResidualBlock(50, 100)
+         self.encoder_layer5 = nn.Sequential(
+             ResidualBlock(100, 200),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+         self.encoder_layer6 = ResidualBlock(200, 300)
+         self.encoder_layer7 = nn.Sequential(
+             ResidualBlock(300, 400),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+         self.encoder_layer8 = ResidualBlock(400, 500)
+         self.encoder_layer9 = nn.Sequential(
+             ResidualBlock(500, 600),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+         self.encoder_layer10 = ResidualBlock(600, 700)
+         self.encoder_layer11 = ResidualBlock(700, 800)
+
+     def forward(self, x):
+         x = self.encoder_pre(x)
+         x = self.encoder_layer1(x)
+         x = self.encoder_layer2(x)
+         skip1 = self.encoder_layer3(x)
+
+         x = self.encoder_layer4(skip1)
+         skip2 = self.encoder_layer5(x)
+
+         x = self.encoder_layer6(skip2)
+         skip3 = self.encoder_layer7(x)
+
+         x = self.encoder_layer8(skip3)
+         skip4 = self.encoder_layer9(x)
+
+         x = self.encoder_layer10(skip4)
+         x = self.encoder_layer11(x)
+         return x, [skip1, skip2, skip3, skip4]
+
+ class MergeDecoder(nn.Module):
+     def __init__(self):
+         super().__init__()
+         # Halved channels for decoder blocks
+         self.decoder_layer1 = ResidualBlock(800, 700)
+         self.decoder_layer2 = ResidualBlock(700, 600)
+         self.decoder_layer3 = ResidualBlock(600, 500)
+
+         self.decoder_layer4 = nn.Sequential(
+             nn.ConvTranspose2d(500, 400, kernel_size=2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer5 = ResidualBlock(400, 300)
+
+         self.decoder_layer6 = nn.Sequential(
+             nn.ConvTranspose2d(300, 200, kernel_size=2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer7 = ResidualBlock(200, 100)
+
+         self.decoder_layer8 = nn.Sequential(
+             nn.ConvTranspose2d(100, 50, kernel_size=2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer9 = ResidualBlock(50, 50)
+
+         self.decoder_layer10 = nn.Sequential(
+             nn.ConvTranspose2d(50, 50, kernel_size=2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer11 = ResidualBlock(50, 50)
+         self.decoder_layer12 = ResidualBlock(50, 25)
+         self.decoder_layer13 = ResidualBlock(25, 20)
+         self.decoder_layer14 = ResidualBlock(20, 10)
+         self.decoder_layer15 = nn.Sequential(
+             nn.Conv2d(10, 4, kernel_size=3, stride=1, padding=1),
+             nn.ReLU(True)
+         )
+         self.decoder_layer16 = nn.Sequential(
+             nn.Conv2d(4, 1, kernel_size=3, stride=1, padding=1),
+             nn.ReLU(True)
+         )
+
+     def forward(self, x, lower_skip_list, upper_skip_list):
+         x = self.decoder_layer1(x)
+         x = self.decoder_layer2(x)
+         # Expecting lower_skip_list[3] and upper_skip_list[1] to have matching dimensions
+         x = x + lower_skip_list[3] + upper_skip_list[1]
+
+         x = self.decoder_layer3(x)
+         x = self.decoder_layer4(x)
+         x = x + lower_skip_list[2] + upper_skip_list[0]
+
+         x = self.decoder_layer5(x)
+         x = self.decoder_layer6(x)
+         x = x + lower_skip_list[1]
+
+         x = self.decoder_layer7(x)
+         x = self.decoder_layer8(x)
+         x = x + lower_skip_list[0]
+
+         x = self.decoder_layer9(x)
+         x = self.decoder_layer10(x)
+         x = self.decoder_layer11(x)
+         x = self.decoder_layer12(x)
+         x = self.decoder_layer13(x)
+         x = self.decoder_layer14(x)
+         x = self.decoder_layer15(x)
+         x = self.decoder_layer16(x)
+         return x
+
+ class PVSDNet_Lite(nn.Module):
+     def __init__(self, total_image_input=1):
+         super().__init__()
+         # Upper encoder remains mostly the same
+         self.upper_encoder = UpperEncoder()
+         self.lower_encoder = LowerEncoder(total_image_input)
+         self.merge_decoder = MergeDecoder()
+         # Halved extra layers for upper branch:
+         self.upper_encoder_extra_1 = nn.Sequential(
+             ResidualBlock(256, 400),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+         self.upper_encoder_extra_2 = nn.Sequential(
+             ResidualBlock(400, 600),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+
+     def forward(self, x):
+         # First Encoder Branch (Upper)
+         upper_features_1 = self.upper_encoder.apply_resnet_encoder(x)
+         upper_features_1 = self.upper_encoder_extra_1(upper_features_1)
+         upper_features_2 = self.upper_encoder_extra_2(upper_features_1)
+
+         # Second Encoder Branch (Lower)
+         lower_feature, skip_list = self.lower_encoder(x)
+
+         # Merge and decode features
+         merged_feature = self.merge_decoder(lower_feature, skip_list, [upper_features_1, upper_features_2])
+         return merged_feature
+
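+
+ if __name__ == "__main__":
+     # Minimal smoke test (editorial sketch, not part of the original file): the four
+     # pooling stages in the encoder are undone by four stride-2 transposed convolutions
+     # in the decoder, so a 384x384 RGB input should yield a 1x1x384x384 depth map.
+     net = PVSDNet_Lite(total_image_input=1)
+     x = torch.randn(1, 3, 384, 384)
+     print(net(x).shape)  # expected: torch.Size([1, 1, 384, 384])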
models/depth_only_model.py ADDED
@@ -0,0 +1,232 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import warnings
+ warnings.filterwarnings("ignore")
+ import torchvision
+ import sys
+ import os
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+ import depth_only_parameters as params
+
+ def getConvLayer(in_channel, out_channel, stride=1, padding=1, activation=nn.ReLU()):
+     return nn.Sequential(nn.Conv2d(in_channel,
+                                    out_channel,
+                                    kernel_size=3,
+                                    stride=stride,
+                                    padding=padding,
+                                    padding_mode='reflect'),
+                          activation)
+
+ def getConvTransposeLayer(in_channel, out_channel, kernel=3, stride=1, padding=1, activation=nn.ReLU()):
+     return nn.Sequential(nn.ConvTranspose2d(in_channel,
+                                             out_channel,
+                                             kernel_size=kernel,
+                                             stride=stride,
+                                             padding=padding),
+                          activation)
+
+
+ class Flatten(nn.Module):
+     def forward(self, input):
+         return input.view(input.size(0), -1)
+
+ class UnFlatten(nn.Module):
+     def forward(self, input, size=1):
+         return input.view(input.size(0), 1, params.params_height//8, params.params_width//8)
+
+ class ResidualBlock(nn.Module):
+     def __init__(self, in_channels, out_channels, stride=1):
+         super(ResidualBlock, self).__init__()
+         self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
+         self.relu = nn.ReLU()
+         self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
+         self.stride = stride
+
+         self.shortcut = nn.Sequential()
+         if stride != 1 or in_channels != out_channels:
+             self.shortcut = nn.Sequential(
+                 nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
+                 nn.BatchNorm2d(out_channels)
+             )
+
+     def forward(self, x):
+         residual = x
+
+         out = self.conv1(x)
+         out = self.relu(out)
+
+         out = self.conv2(out)
+
+         out = out + self.shortcut(residual)
+         out = self.relu(out)
+         return out
+
+ class UpperEncoder(nn.Module):
+     def __init__(self):
+         super().__init__()
+         model = torchvision.models.resnet152(pretrained=False)
+         layers = list(model.children())
+         self.ResNetEncoder = torch.nn.Sequential(*layers[:5].copy())
+         del model
+
+     def forward(self, x):
+         x1 = x[:, 0:3, :, :]
+         x1 = self.ResNetEncoder(x1)
+         return x1
+
+     def apply_resnet_encoder(self, x):
+         x1 = x[:, 0:3, :, :]
+         x1 = self.ResNetEncoder(x1)
+         return x1
+
+
+ class LowerEncoder(nn.Module):
+     def __init__(self, total_image_input=1):
+         super().__init__()
+         self.encoder_pre = ResidualBlock((total_image_input*3), 20)
+         self.encoder_layer1 = ResidualBlock(20, 30)
+         self.encoder_layer2 = ResidualBlock(30, 50)
+
+         self.encoder_layer3 = nn.Sequential(
+             ResidualBlock(50, 100),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+
+         self.encoder_layer4 = ResidualBlock(100, 200)
+         self.encoder_layer5 = nn.Sequential(
+             ResidualBlock(200, 400),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+
+         self.encoder_layer6 = ResidualBlock(400, 600)
+         self.encoder_layer7 = nn.Sequential(
+             ResidualBlock(600, 800),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+
+         self.encoder_layer8 = ResidualBlock(800, 1000)
+         self.encoder_layer9 = nn.Sequential(
+             ResidualBlock(1000, 1200),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+
+         self.encoder_layer10 = ResidualBlock(1200, 1400)
+         self.encoder_layer11 = ResidualBlock(1400, 1600)
+
+     def forward(self, x):
+         x = self.encoder_pre(x)
+         x = self.encoder_layer1(x)
+         x = self.encoder_layer2(x)
+         skip1 = self.encoder_layer3(x)
+
+         x = self.encoder_layer4(skip1)
+         skip2 = self.encoder_layer5(x)
+
+         x = self.encoder_layer6(skip2)
+         skip3 = self.encoder_layer7(x)
+
+         x = self.encoder_layer8(skip3)
+         skip4 = self.encoder_layer9(x)
+
+         x = self.encoder_layer10(skip4)
+         x = self.encoder_layer11(x)
+
+         return x, [skip1, skip2, skip3, skip4]
+
+ class MergeDecoder(nn.Module):
+     def __init__(self):
+         super().__init__()
+
+         self.decoder_layer1 = ResidualBlock(1600, 1400)
+         self.decoder_layer2 = ResidualBlock(1400, 1200)
+         self.decoder_layer3 = ResidualBlock(1200, 1000)
+
+         self.decoder_layer4 = nn.Sequential(
+             nn.ConvTranspose2d(1000, 800, 2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer5 = ResidualBlock(800, 600)
+
+         self.decoder_layer6 = nn.Sequential(
+             nn.ConvTranspose2d(600, 400, 2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer7 = ResidualBlock(400, 200)
+
+         self.decoder_layer8 = nn.Sequential(
+             nn.ConvTranspose2d(200, 100, 2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer9 = ResidualBlock(100, 100)
+
+         self.decoder_layer10 = nn.Sequential(
+             nn.ConvTranspose2d(100, 100, 2, stride=2, padding=0),
+             nn.ReLU(True)
+         )
+         self.decoder_layer11 = ResidualBlock(100, 100)
+         self.decoder_layer12 = ResidualBlock(100, 50)
+         self.decoder_layer13 = ResidualBlock(50, 40)
+         self.decoder_layer14 = ResidualBlock(40, 20)
+         self.decoder_layer15 = nn.Sequential(
+             nn.Conv2d(20, 8, 3, stride=1, padding=1),
+             nn.ReLU(True)
+         )
+         self.decoder_layer16 = nn.Sequential(
+             nn.Conv2d(8, 1, 3, stride=1, padding=1),
+             nn.ReLU(True)
+         )
+
+     def forward(self, x, lower_skip_list, upper_skip_list):
+         x = self.decoder_layer1(x)
+         x = self.decoder_layer2(x)
+         x = x + lower_skip_list[3] + upper_skip_list[1]
+
+         x = self.decoder_layer3(x)
+         x = self.decoder_layer4(x)
+         x = x + lower_skip_list[2] + upper_skip_list[0]
+
+         x = self.decoder_layer5(x)
+         x = self.decoder_layer6(x)
+         x = x + lower_skip_list[1]
+
+         x = self.decoder_layer7(x)
+         x = self.decoder_layer8(x)
+         x = x + lower_skip_list[0]
+
+         x = self.decoder_layer9(x)
+         x = self.decoder_layer10(x)
+         x = self.decoder_layer11(x)
+         x = self.decoder_layer12(x)
+         x = self.decoder_layer13(x)
+         x = self.decoder_layer14(x)
+         x = self.decoder_layer15(x)
+         x = self.decoder_layer16(x)
+         return x
+
+ class PVSDNet(nn.Module):
+     def __init__(self, total_image_input=1):
+         super().__init__()
+         self.upper_encoder = UpperEncoder()
+         self.lower_encoder = LowerEncoder(total_image_input)
+         self.merge_decoder = MergeDecoder()
+
+         self.upper_encoder_extra_1 = nn.Sequential(
+             ResidualBlock(256, 800),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+         self.upper_encoder_extra_2 = nn.Sequential(
+             ResidualBlock(800, 1200),
+             nn.MaxPool2d(kernel_size=2, stride=2)
+         )
+
+     def forward(self, x):
+         upper_features_1 = self.upper_encoder.apply_resnet_encoder(x)
+         upper_features_1 = self.upper_encoder_extra_1(upper_features_1)
+         upper_features_2 = self.upper_encoder_extra_2(upper_features_1)
+
+         lower_feature, skip_list = self.lower_encoder(x)
+
+         merged_feature = self.merge_decoder(lower_feature, skip_list, [upper_features_1, upper_features_2])
+         return merged_feature
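+
+
+ if __name__ == "__main__":
+     # Minimal smoke test (editorial sketch, not part of the original file): the encoder
+     # downsamples four times by a factor of 2 and the decoder upsamples four times, so
+     # a 384x384 RGB input should yield a 1x1x384x384 depth map.
+     net = PVSDNet(total_image_input=1)
+     x = torch.randn(1, 3, 384, 384)
+     print(net(x).shape)  # expected: torch.Size([1, 1, 384, 384])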
requirements.txt ADDED
@@ -0,0 +1,122 @@
+ aiofiles==24.1.0
+ annotated-doc==0.0.4
+ annotated-types==0.7.0
+ anyio==4.12.1
+ av==16.0.1
+ blinker==1.9.0
+ brotli==1.2.0
+ certifi==2026.1.4
+ charset-normalizer==3.4.4
+ click==8.3.1
+ colorama==0.4.6
+ contourpy==1.3.3
+ cuda-toolkit==12.9.1
+ cycler==0.12.1
+ decorator==4.4.2
+ fastapi==0.128.0
+ ffmpy==1.0.0
+ filelock==3.20.0
+ Flask==3.1.2
+ fonttools==4.61.1
+ fsspec==2025.12.0
+ fvcore==0.1.5.post20221221
+ gradio==6.2.0
+ gradio_client==2.0.2
+ groovy==0.1.2
+ h11==0.16.0
+ hf-xet==1.2.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface_hub==1.2.4
+ idna==3.11
+ ImageIO==2.37.2
+ imageio-ffmpeg==0.6.0
+ iopath==0.1.10
+ itsdangerous==2.2.0
+ Jinja2==3.1.6
+ joblib==1.5.3
+ kiwisolver==1.4.9
+ lazy_loader==0.4
+ Mako==1.3.10
+ markdown-it-py==4.0.0
+ MarkupSafe==2.1.5
+ matplotlib==3.10.8
+ matplotlib-inline==0.1.6
+ mdurl==0.1.2
+ ml_dtypes==0.5.4
+ moviepy==1.0.3
+ mpmath==1.3.0
+ networkx==3.6.1
+ numpy==1.26.4
+ nvidia-cuda-runtime-cu12==12.9.79
+ onnx==1.20.0
+ onnx-ir==0.1.14
+ onnxscript==0.5.7
+ opencv-python==4.6.0.66
+ orjson==3.11.5
+ packaging==25.0
+ pandas==2.3.3
+ parameterized==0.9.0
+ pillow==10.4.0
+ pillow_heif==0.15.0
+ platformdirs==4.5.1
+ portalocker==3.2.0
+ proglog==0.1.12
+ protobuf==6.33.2
+ pycuda==2025.1.2
+ pydantic==2.12.5
+ pydantic_core==2.41.5
+ pydub==0.25.1
+ Pygments==2.19.2
+ pyparsing==3.3.1
+ python-dateutil==2.9.0.post0
+ python-multipart==0.0.21
+ pytools==2025.2.5
+ pytorch-msssim==1.0.0
+ pytorchvideo==0.1.5
+ pytz==2025.2
+ pywin32==311
+ PyYAML==6.0.3
+ requests==2.32.5
+ rich==14.2.0
+ safehttpx==0.1.7
+ safetensors==0.7.0
+ scikit-image==0.26.0
+ scikit-learn==1.8.0
+ scipy==1.11.2
+ semantic-version==2.10.0
+ setuptools==80.9.0
+ shellingham==1.5.4
+ siphash24==1.8
+ six==1.17.0
+ starlette==0.50.0
+ sympy==1.14.0
+ tabulate==0.9.0
+ tensorrt_cu12==10.14.1.48.post1
+ tensorrt_cu12_bindings==10.14.1.48.post1
+ tensorrt_cu12_libs==10.14.1.48.post1
+ tensorrt_dispatch_cu12==10.14.1.48.post1
+ tensorrt_dispatch_cu12_bindings==10.14.1.48.post1
+ tensorrt_dispatch_cu12_libs==10.14.1.48.post1
+ tensorrt_lean_cu12==10.14.1.48.post1
+ tensorrt_lean_cu12_bindings==10.14.1.48.post1
+ tensorrt_lean_cu12_libs==10.14.1.48.post1
+ termcolor==3.3.0
+ threadpoolctl==3.6.0
+ tifffile==2025.12.20
+ timm==1.0.24
+ tomlkit==0.13.3
+ torch==2.9.1+cu130
+ torchvision==0.24.1+cu130
+ tqdm==4.65.0
+ traitlets==5.14.3
+ typer==0.21.1
+ typer-slim==0.21.1
+ typing-inspection==0.4.2
+ typing_extensions==4.15.0
+ tzdata==2025.3
+ urllib3==2.6.3
+ uvicorn==0.40.0
+ Werkzeug==3.1.5
+ wheel==0.45.1
+ yacs==0.1.8
rff_torch.py ADDED
@@ -0,0 +1,53 @@
+ import numpy as np
+ import torch
+ from torch import Tensor
+ import torch.nn as nn
+
+ @torch.jit.script
+ def positional_encoding(
+         v: Tensor,
+         sigma: float,
+         m: int) -> Tensor:
+     r"""Computes :math:`\gamma(\mathbf{v}) = (\dots, \cos{2 \pi \sigma^{(j/m)} \mathbf{v}} , \sin{2 \pi \sigma^{(j/m)} \mathbf{v}}, \dots)`
+     where :math:`j \in \{0, \dots, m-1\}`
+
+     Args:
+         v (Tensor): input tensor of shape :math:`(N, *, \text{input_size})`
+         sigma (float): constant chosen based upon the domain of :attr:`v`
+         m (int): number of frequencies to map to
+
+     Returns:
+         Tensor: mapped tensor of shape :math:`(N, *, 2 \cdot m \cdot \text{input_size})`
+
+     See :class:`~rff.layers.PositionalEncoding` for more details.
+     """
+     j = torch.arange(m, device=v.device)
+     coeffs = 2 * np.pi * sigma ** (j / m)
+     vp = coeffs * torch.unsqueeze(v, -1)
+     vp_cat = torch.cat((torch.cos(vp), torch.sin(vp)), dim=-1)
+     return vp_cat.flatten(-2, -1)
+
+
+ class PositionalEncoding(nn.Module):
+     """Layer for mapping coordinates using the positional encoding"""
+
+     def __init__(self, sigma: float, m: int):
+         r"""
+         Args:
+             sigma (float): frequency constant
+             m (int): number of frequencies to map to
+         """
+         super().__init__()
+         self.sigma = sigma
+         self.m = m
+
+     def forward(self, v: Tensor) -> Tensor:
+         r"""Computes :math:`\gamma(\mathbf{v}) = (\dots, \cos{2 \pi \sigma^{(j/m)} \mathbf{v}} , \sin{2 \pi \sigma^{(j/m)} \mathbf{v}}, \dots)`
+
+         Args:
+             v (Tensor): input tensor of shape :math:`(N, *, \text{input_size})`
+
+         Returns:
+             Tensor: mapped tensor of shape :math:`(N, *, 2 \cdot m \cdot \text{input_size})`
+         """
+         return positional_encoding(v, self.sigma, self.m)
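+
+
+ if __name__ == "__main__":
+     # Minimal usage sketch (editorial, not part of the original file): coordinates of
+     # shape (N, *, input_size) map to (N, *, 2 * m * input_size) Fourier features,
+     # e.g. (4, 256, 2) -> (4, 256, 40) for m=10.
+     enc = PositionalEncoding(sigma=1.0, m=10)
+     coords = torch.rand(4, 256, 2)
+     print(enc(coords).shape)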
utils.py ADDED
@@ -0,0 +1,243 @@
+ import numpy as np
+ from scipy.ndimage import map_coordinates
+
+
+ def xyzcube(face_w):
+     '''
+     Return the xyz coordinates of the unit cube in [F R B L U D] format.
+     '''
+     out = np.zeros((face_w, face_w * 6, 3), np.float32)
+     rng = np.linspace(-0.5, 0.5, num=face_w, dtype=np.float32)
+     grid = np.stack(np.meshgrid(rng, -rng), -1)
+
+     # Front face (z = 0.5)
+     out[:, 0*face_w:1*face_w, [0, 1]] = grid
+     out[:, 0*face_w:1*face_w, 2] = 0.5
+
+     # Right face (x = 0.5)
+     out[:, 1*face_w:2*face_w, [2, 1]] = grid
+     out[:, 1*face_w:2*face_w, 0] = 0.5
+
+     # Back face (z = -0.5)
+     out[:, 2*face_w:3*face_w, [0, 1]] = grid
+     out[:, 2*face_w:3*face_w, 2] = -0.5
+
+     # Left face (x = -0.5)
+     out[:, 3*face_w:4*face_w, [2, 1]] = grid
+     out[:, 3*face_w:4*face_w, 0] = -0.5
+
+     # Up face (y = 0.5)
+     out[:, 4*face_w:5*face_w, [0, 2]] = grid
+     out[:, 4*face_w:5*face_w, 1] = 0.5
+
+     # Down face (y = -0.5)
+     out[:, 5*face_w:6*face_w, [0, 2]] = grid
+     out[:, 5*face_w:6*face_w, 1] = -0.5
+
+     return out
+
+
+ def equirect_uvgrid(h, w):
+     u = np.linspace(-np.pi, np.pi, num=w, dtype=np.float32)
+     v = np.linspace(np.pi, -np.pi, num=h, dtype=np.float32) / 2
+
+     return np.stack(np.meshgrid(u, v), axis=-1)
+
+
+ def equirect_facetype(h, w):
+     '''
+     0F 1R 2B 3L 4U 5D
+     '''
+     tp = np.roll(np.arange(4).repeat(w // 4)[None, :].repeat(h, 0), 3 * w // 8, 1)
+
+     # Prepare ceil mask (np.bool was removed in modern NumPy; use the builtin bool)
+     mask = np.zeros((h, w // 4), bool)
+     idx = np.linspace(-np.pi, np.pi, w // 4) / 4
+     idx = h // 2 - np.round(np.arctan(np.cos(idx)) * h / np.pi).astype(int)
+     for i, j in enumerate(idx):
+         mask[:j, i] = 1
+     mask = np.roll(np.concatenate([mask] * 4, 1), 3 * w // 8, 1)
+
+     tp[mask] = 4
+     tp[np.flip(mask, 0)] = 5
+
+     return tp.astype(np.int32)
+
+
+ def xyzpers(h_fov, v_fov, u, v, out_hw, in_rot):
+     out = np.ones((*out_hw, 3), np.float32)
+
+     x_max = np.tan(h_fov / 2)
+     y_max = np.tan(v_fov / 2)
+     x_rng = np.linspace(-x_max, x_max, num=out_hw[1], dtype=np.float32)
+     y_rng = np.linspace(-y_max, y_max, num=out_hw[0], dtype=np.float32)
+     out[..., :2] = np.stack(np.meshgrid(x_rng, -y_rng), -1)
+     Rx = rotation_matrix(v, [1, 0, 0])
+     Ry = rotation_matrix(u, [0, 1, 0])
+     Ri = rotation_matrix(in_rot, np.array([0, 0, 1.0]).dot(Rx).dot(Ry))
+
+     return out.dot(Rx).dot(Ry).dot(Ri)
+
+
+ def xyz2uv(xyz):
+     '''
+     xyz: ndarray in shape of [..., 3]
+     '''
+     x, y, z = np.split(xyz, 3, axis=-1)
+     u = np.arctan2(x, z)
+     c = np.sqrt(x**2 + z**2)
+     v = np.arctan2(y, c)
+
+     return np.concatenate([u, v], axis=-1)
+
+
+ def uv2unitxyz(uv):
+     u, v = np.split(uv, 2, axis=-1)
+     y = np.sin(v)
+     c = np.cos(v)
+     x = c * np.sin(u)
+     z = c * np.cos(u)
+
+     return np.concatenate([x, y, z], axis=-1)
+
+
+ def uv2coor(uv, h, w):
+     '''
+     uv: ndarray in shape of [..., 2]
+     h: int, height of the equirectangular image
+     w: int, width of the equirectangular image
+     '''
+     u, v = np.split(uv, 2, axis=-1)
+     coor_x = (u / (2 * np.pi) + 0.5) * w - 0.5
+     coor_y = (-v / np.pi + 0.5) * h - 0.5
+
+     return np.concatenate([coor_x, coor_y], axis=-1)
+
+
+ def coor2uv(coorxy, h, w):
+     coor_x, coor_y = np.split(coorxy, 2, axis=-1)
+     u = ((coor_x + 0.5) / w - 0.5) * 2 * np.pi
+     v = -((coor_y + 0.5) / h - 0.5) * np.pi
+
+     return np.concatenate([u, v], axis=-1)
+
+
+ def sample_equirec(e_img, coor_xy, order):
+     w = e_img.shape[1]
+     coor_x, coor_y = np.split(coor_xy, 2, axis=-1)
+     pad_u = np.roll(e_img[[0]], w // 2, 1)
+     pad_d = np.roll(e_img[[-1]], w // 2, 1)
+     e_img = np.concatenate([e_img, pad_d, pad_u], 0)
+     return map_coordinates(e_img, [coor_y, coor_x],
+                            order=order, mode='wrap')[..., 0]
+
+
+ def sample_cubefaces(cube_faces, tp, coor_y, coor_x, order):
+     cube_faces = cube_faces.copy()
+     cube_faces[1] = np.flip(cube_faces[1], 1)
+     cube_faces[2] = np.flip(cube_faces[2], 1)
+     cube_faces[4] = np.flip(cube_faces[4], 0)
+
+     # Pad up down
+     pad_ud = np.zeros((6, 2, cube_faces.shape[2]))
+     pad_ud[0, 0] = cube_faces[5, 0, :]
+     pad_ud[0, 1] = cube_faces[4, -1, :]
+     pad_ud[1, 0] = cube_faces[5, :, -1]
+     pad_ud[1, 1] = cube_faces[4, ::-1, -1]
+     pad_ud[2, 0] = cube_faces[5, -1, ::-1]
+     pad_ud[2, 1] = cube_faces[4, 0, ::-1]
+     pad_ud[3, 0] = cube_faces[5, ::-1, 0]
+     pad_ud[3, 1] = cube_faces[4, :, 0]
+     pad_ud[4, 0] = cube_faces[0, 0, :]
+     pad_ud[4, 1] = cube_faces[2, 0, ::-1]
+     pad_ud[5, 0] = cube_faces[2, -1, ::-1]
+     pad_ud[5, 1] = cube_faces[0, -1, :]
+     cube_faces = np.concatenate([cube_faces, pad_ud], 1)
+
+     # Pad left right
+     pad_lr = np.zeros((6, cube_faces.shape[1], 2))
+     pad_lr[0, :, 0] = cube_faces[1, :, 0]
+     pad_lr[0, :, 1] = cube_faces[3, :, -1]
+     pad_lr[1, :, 0] = cube_faces[2, :, 0]
+     pad_lr[1, :, 1] = cube_faces[0, :, -1]
+     pad_lr[2, :, 0] = cube_faces[3, :, 0]
+     pad_lr[2, :, 1] = cube_faces[1, :, -1]
+     pad_lr[3, :, 0] = cube_faces[0, :, 0]
+     pad_lr[3, :, 1] = cube_faces[2, :, -1]
+     pad_lr[4, 1:-1, 0] = cube_faces[1, 0, ::-1]
+     pad_lr[4, 1:-1, 1] = cube_faces[3, 0, :]
+     pad_lr[5, 1:-1, 0] = cube_faces[1, -2, :]
+     pad_lr[5, 1:-1, 1] = cube_faces[3, -2, ::-1]
+     cube_faces = np.concatenate([cube_faces, pad_lr], 2)
+
+     return map_coordinates(cube_faces, [tp, coor_y, coor_x], order=order, mode='wrap')
+
+
+ def cube_h2list(cube_h):
+     assert cube_h.shape[0] * 6 == cube_h.shape[1]
+     return np.split(cube_h, 6, axis=1)
+
+
+ def cube_list2h(cube_list):
+     assert len(cube_list) == 6
+     assert sum(face.shape == cube_list[0].shape for face in cube_list) == 6
+     return np.concatenate(cube_list, axis=1)
+
+
+ def cube_h2dict(cube_h):
+     cube_list = cube_h2list(cube_h)
+     return dict([(k, cube_list[i])
+                  for i, k in enumerate(['F', 'R', 'B', 'L', 'U', 'D'])])
+
+
+ def cube_dict2h(cube_dict, face_k=['F', 'R', 'B', 'L', 'U', 'D']):
+     assert len(face_k) == 6
+     return cube_list2h([cube_dict[k] for k in face_k])
+
+
+ def cube_h2dice(cube_h):
+     assert cube_h.shape[0] * 6 == cube_h.shape[1]
+     w = cube_h.shape[0]
+     cube_dice = np.zeros((w * 3, w * 4, cube_h.shape[2]), dtype=cube_h.dtype)
+     cube_list = cube_h2list(cube_h)
+     # Order: F R B L U D
+     sxy = [(1, 1), (2, 1), (3, 1), (0, 1), (1, 0), (1, 2)]
+     for i, (sx, sy) in enumerate(sxy):
+         face = cube_list[i]
+         if i in [1, 2]:
+             face = np.flip(face, axis=1)
+         if i == 4:
+             face = np.flip(face, axis=0)
+         cube_dice[sy*w:(sy+1)*w, sx*w:(sx+1)*w] = face
+     return cube_dice
+
+
+ def cube_dice2h(cube_dice):
+     w = cube_dice.shape[0] // 3
+     assert cube_dice.shape[0] == w * 3 and cube_dice.shape[1] == w * 4
+     cube_h = np.zeros((w, w * 6, cube_dice.shape[2]), dtype=cube_dice.dtype)
+     # Order: F R B L U D
+     sxy = [(1, 1), (2, 1), (3, 1), (0, 1), (1, 0), (1, 2)]
+     for i, (sx, sy) in enumerate(sxy):
+         face = cube_dice[sy*w:(sy+1)*w, sx*w:(sx+1)*w]
+         if i in [1, 2]:
+             face = np.flip(face, axis=1)
+         if i == 4:
+             face = np.flip(face, axis=0)
+         cube_h[:, i*w:(i+1)*w] = face
+     return cube_h
+
+
+ def rotation_matrix(rad, ax):
+     ax = np.array(ax)
+     assert len(ax.shape) == 1 and ax.shape[0] == 3
+     ax = ax / np.sqrt((ax**2).sum())
+     R = np.diag([np.cos(rad)] * 3)
+     R = R + np.outer(ax, ax) * (1.0 - np.cos(rad))
+
+     ax = ax * np.sin(rad)
+     R = R + np.array([[0, -ax[2], ax[1]],
+                       [ax[2], 0, -ax[0]],
+                       [-ax[1], ax[0], 0]])
+
+     return R
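+
+
+ if __name__ == "__main__":
+     # Minimal usage sketch (editorial, not part of the original file): build cubemap
+     # sampling coordinates for a 512x1024 equirectangular image, following the shapes
+     # documented in the functions above.
+     xyz = xyzcube(face_w=64)              # (64, 384, 3) points on the unit cube
+     uv = xyz2uv(xyz)                      # (64, 384, 2) spherical (u, v) angles
+     coor_xy = uv2coor(uv, h=512, w=1024)  # (64, 384, 2) ERP pixel coordinates
+     print(coor_xy.shape)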