Commit f075308
tasin committed
Parent(s): 3f8c938

init

This view is limited to 50 files because the commit contains too many changes.
- .gitignore +1 -0
- .ipynb_checkpoints/Untitled-checkpoint.ipynb +6 -0
- README.md +84 -12
- Untitled.ipynb +68 -0
- app.py +188 -0
- distributed.py +126 -0
- model_structure.txt +113 -0
- models/diffusion_model.py +762 -0
- models/unet_dual_encoder.py +62 -0
- project_latent_space.py +75 -0
- requirements.txt +30 -0
- sample/blue.jpg +0 -0
- sample/green.jpg +0 -0
- sample/silver.jpg +0 -0
- src/deps/__init__.py +0 -0
- src/deps/facial_recognition/__init__.py +3 -0
- src/deps/facial_recognition/helpers.py +123 -0
- src/deps/facial_recognition/model_irse.py +88 -0
- src/dnnlib/__init__.py +9 -0
- src/dnnlib/util.py +480 -0
- src/infra/__init__.py +0 -0
- src/infra/experiments.yaml +60 -0
- src/infra/launch.py +113 -0
- src/infra/slurm_batch_launch.py +96 -0
- src/infra/slurm_job.py +46 -0
- src/infra/slurm_job_proxy.sh +4 -0
- src/infra/utils.py +140 -0
- src/metrics/__init__.py +9 -0
- src/metrics/frechet_inception_distance.py +54 -0
- src/metrics/frechet_video_distance.py +59 -0
- src/metrics/inception_score.py +47 -0
- src/metrics/kernel_inception_distance.py +46 -0
- src/metrics/metric_main.py +154 -0
- src/metrics/metric_utils.py +332 -0
- src/metrics/video_inception_score.py +54 -0
- src/scripts/__init__.py +0 -0
- src/scripts/calc_metrics.py +250 -0
- src/scripts/calc_metrics_for_dataset.py +169 -0
- src/scripts/clip_edit.py +403 -0
- src/scripts/construct_static_videos_dataset.py +46 -0
- src/scripts/convert_video_to_dataset.py +87 -0
- src/scripts/convert_videos_to_frames.py +105 -0
- src/scripts/crop_video_dataset.py +69 -0
- src/scripts/frames_to_video_grid.py +78 -0
- src/scripts/generate.py +148 -0
- src/scripts/preprocess_ffs.py +204 -0
- src/scripts/profile_model.py +104 -0
- src/scripts/project.py +479 -0
- src/torch_utils/__init__.py +9 -0
- src/torch_utils/custom_ops.py +126 -0
.gitignore
ADDED
@@ -0,0 +1 @@
+.idea/
.ipynb_checkpoints/Untitled-checkpoint.ipynb
ADDED
@@ -0,0 +1,6 @@
+{
+ "cells": [],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
README.md
CHANGED
@@ -1,12 +1,84 @@
-
-
-
-
-
-
-
-
-
-
-
-
+<div id="top"></div>
+
+<h3>FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video Synthesis from Static Imagery</h3>
+
+<p>
+This repository contains the official code for 'FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video Synthesis from Static Imagery'.
+We have included the pre-trained checkpoint, dataset and results.
+</p>
+
+> **Abstract:** *Our study introduces a new image-to-video generator called FashionFlow to generate fashion videos. By utilising a diffusion model, we are able to create short videos from still fashion images. Our approach involves developing and connecting relevant components with the diffusion model, which results in the creation of high-fidelity videos that are aligned with the conditional image. The components include the use of pseudo-3D convolutional layers to generate videos efficiently. VAE and CLIP encoders capture vital characteristics from still images to condition the diffusion model at a global level. Our research demonstrates a successful synthesis of fashion videos featuring models posing from various angles, showcasing the fit and appearance of the garment. Our findings hold great promise for improving and enhancing the shopping experience for the online fashion industry.*
+
+<!-- Results -->
+## Teaser
+
+
+## Requirements
+- Python 3.9
+- PyTorch 1.11+
+- TensorBoard
+- cv2
+- transformers
+- diffusers
+
+## Model Specification
+
+The model was developed using PyTorch and loads pretrained weights for the VAE and CLIP. The latent diffusion model consists of a 1D convolutional layer stacked on a 2D convolutional layer (forming a pseudo-3D convolution) and includes attention layers. See ```model_structure.txt``` for the exact layers of our LDM.
+
+## Installation
+
+Clone this repository:
+
+```
+git clone https://github.com/1702609/FashionFlow
+cd ./FashionFlow/
+```
+
+Install PyTorch and other dependencies:
+
+```
+pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
+pip install -r requirements.txt
+```
+
+## Dataset
+
+Download the Fashion dataset by clicking on this link:
+[[Fashion dataset]](https://vision.cs.ubc.ca/datasets/fashion/)
+
+Extract the files and place them in the ```fashion_dataset``` directory. The dataset should be organised as follows:
+
+```
+fashion_dataset
+ test
+  |-- 91-3003CN5S.mp4
+  |-- 91BjuE6irxS.mp4
+  |-- 91bxAN6BjAS.mp4
+  |-- ...
+ train
+  |-- 81FyMPk-WIS.mp4
+  |-- 91+bCFG1jOS.mp4
+  |-- 91+PxmDyrgS.mp4
+  |-- ...
+```
+
+Feel free to add your own dataset while following the provided file and folder structure.
+
+## Pre-trained Checkpoint
+
+Download the checkpoint by clicking on this link:
+[[Pre-trained checkpoints]](https://www.dropbox.com/scl/fi/p9fv7o3j7ti0yu2umsgmv/FashionFlow_checkpoint.pth?rlkey=mqsto9i4ujh6xhvab0e2s6n7d&dl=0)
+Extract the files and place them in the ```checkpoint``` directory.
+
+## Inference
+To run inference with our model, execute ```python inference.py```. The results will be saved in the ```result``` directory.
+
+## Train
+
+Before training, images and videos have to be projected to latent space for efficient training. Execute ```python project_latent_space.py```; the tensors will be saved in the ```fashion_dataset_tensor``` directory.
+
+Run ```python -m torch.distributed.launch --nproc_per_node=<number of GPUs> train.py``` to train the model. Checkpoints will be saved in the ```checkpoint``` directory periodically. You can also view the training progress with the tensorboardX logs in ```video_progress```, or find the generated ```.mp4``` files in ```training_sample```.
+
+## Comparison
+
+
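Note on the pseudo-3D convolution described in the README's Model Specification: the repository's actual implementation is the `PseudoConv3d` class in `models/diffusion_model.py` further down this diff. The snippet below is only a minimal, self-contained sketch of the idea, with all sizes chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn
from einops import rearrange

class TinyPseudoConv3d(nn.Module):
    """Illustrative sketch: a 2D conv over each frame, then a 1D conv over time."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial_conv = nn.Conv2d(dim, dim, kernel_size, padding=pad)
        self.temporal_conv = nn.Conv1d(dim, dim, kernel_size, padding=pad)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        b, c, f, h, w = x.shape
        x = rearrange(x, 'b c f h w -> (b f) c h w')          # fold frames into the batch
        x = self.spatial_conv(x)                               # per-frame 2D convolution
        x = rearrange(x, '(b f) c h w -> (b h w) c f', b=b)    # fold pixels into the batch
        x = self.temporal_conv(x)                              # per-pixel 1D convolution over time
        return rearrange(x, '(b h w) c f -> b c f h w', h=h, w=w)

video = torch.randn(1, 4, 8, 16, 16)       # arbitrary toy shape
print(TinyPseudoConv3d(4)(video).shape)    # torch.Size([1, 4, 8, 16, 16])
```

This is what the README means by the "stacked" 1D/2D layers: spatial and temporal mixing are factorised, which is far cheaper than a full 3D convolution over the video volume.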
Untitled.ipynb
ADDED
@@ -0,0 +1,68 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "5b3c9fac-51c3-4ecc-8606-5a298076560e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import notebook_login\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "51fe1170-c6eb-4b3a-a055-663faf35ab5a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "6c6d20c8f5e847d7985f6b49a7206a2d",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "notebook_login()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4297ea17-a4f8-4290-b561-f582b1adc189",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "work",
+   "language": "python",
+   "name": "work"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
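The notebook above only authenticates with the Hugging Face Hub via `notebook_login()`. Outside Jupyter, the same authentication can be done with `login()` from `huggingface_hub` (or the `huggingface-cli login` command); a minimal sketch:

```python
from huggingface_hub import login

login()  # prompts for an access token on stdin instead of showing a notebook widget
```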
app.py
ADDED
@@ -0,0 +1,188 @@
+import torch
+import cv2
+import torchvision.transforms as transforms
+from models.unet_dual_encoder import Embedding_Adapter
+from models.diffusion_model import SpaceTimeUnet
+import numpy as np
+import torchvision.transforms.functional as TVF
+from diffusers import AutoencoderKL
+from PIL import Image
+from transformers import CLIPVisionModel, CLIPProcessor
+import torch.nn.functional as F
+import gradio as gr
+from huggingface_hub import hf_hub_download
+
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+frameLimit = 70
+
+def cosine_beta_schedule(timesteps, start=0.0001, end=0.02):
+    betas = []
+    for i in reversed(range(timesteps)):
+        T = timesteps - 1
+        beta = start + 0.5 * (end - start) * (1 + np.cos((i / T) * np.pi))
+        betas.append(beta)
+    return torch.Tensor(betas)
+
+
+def get_index_from_list(vals, t, x_shape):
+    batch_size = t.shape[0]
+    out = vals.gather(-1, t.cpu())
+    return out.reshape(batch_size, *((1,) * (len(x_shape) - 1))).to(t.device)
+
+def forward_diffusion_sample(x_0, t):
+    noise = torch.randn_like(x_0)
+    sqrt_alphas_cumprod_t = get_index_from_list(sqrt_alphas_cumprod, t, x_0.shape)
+    sqrt_one_minus_alphas_cumprod_t = get_index_from_list(
+        sqrt_one_minus_alphas_cumprod, t, x_0.shape
+    )
+    # mean + variance
+    return sqrt_alphas_cumprod_t.to(device) * x_0.to(device) \
+        + sqrt_one_minus_alphas_cumprod_t.to(device) * noise.to(device), noise.to(device)
+
+T = 1000
+betas = cosine_beta_schedule(timesteps=T)
+# Pre-calculate different terms for closed form
+alphas = 1. - betas
+alphas_cumprod = torch.cumprod(alphas, axis=0)
+alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value=1.0)
+sqrt_recip_alphas = torch.sqrt(1.0 / alphas)
+sqrt_alphas_cumprod = torch.sqrt(alphas_cumprod)
+sqrt_one_minus_alphas_cumprod = torch.sqrt(1. - alphas_cumprod)
+posterior_variance = betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod)
+
+def get_transform():
+    image_transforms = transforms.Compose(
+        [
+            transforms.Resize((640, 512), interpolation=transforms.InterpolationMode.BILINEAR),
+            transforms.ToTensor(),
+        ])
+    return image_transforms
+
+vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4",
+                                    subfolder="vae",
+                                    revision="ebb811dd71cdc38a204ecbdd6ac5d580f529fd8c")
+vae.to(device)
+vae.requires_grad_(False)
+
+with torch.no_grad():
+    Net = SpaceTimeUnet(
+        dim = 64,
+        channels = 4,
+        dim_mult = (1, 2, 4, 8),
+        temporal_compression = (False, False, False, True),
+        self_attns = (False, False, False, True),
+        condition_on_timestep=True
+    ).to(device)
+    adapter = Embedding_Adapter(input_nc=1280, output_nc=1280).to(device)
+
+clip_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
+clip_encoder.requires_grad_(False)
+clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+
+checkpoint = torch.load(hf_hub_download(repo_id="sunjuice/FashionFlow_model", filename="FashionFlow_model.pth"))
+
+Net.load_state_dict(checkpoint['net'])
+adapter.load_state_dict(checkpoint['adapter'])
+del checkpoint
+torch.cuda.empty_cache()
+
+def save_video_frames_as_mp4(frames, fps, save_path):
+    frame_h, frame_w = frames[0].shape[2:]
+    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
+    video = cv2.VideoWriter(save_path, fourcc, fps, (frame_w, frame_h))
+    frames = frames[0]
+    for frame in frames:
+        frame = np.array(TVF.to_pil_image(frame))
+        video.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
+    video.release()
+
+@torch.no_grad()
+def VAE_encode(image):
+    init_latent_dist = vae.encode(image).latent_dist.sample()
+    init_latent_dist *= 0.18215
+    encoded_image = (init_latent_dist).unsqueeze(1)
+    return encoded_image
+
+@torch.no_grad()
+def VAE_decode(video, vae_net):
+    decoded_video = None
+    for i in range(video.shape[1]):
+        image = video[:, i, :, :, :]
+        image = 1 / 0.18215 * image
+        image = vae_net.decode(image).sample
+        image = image.clamp(0, 1)
+        if i == 0:
+            decoded_video = image.unsqueeze(1)
+        else:
+            decoded_video = torch.cat([decoded_video, image.unsqueeze(1)], 1)
+    return decoded_video
+
+@torch.no_grad()
+def sample_timestep(x, image, t):
+    betas_t = get_index_from_list(betas, t, x.shape)
+    sqrt_one_minus_alphas_cumprod_t = get_index_from_list(
+        sqrt_one_minus_alphas_cumprod, t, x.shape
+    )
+    sqrt_recip_alphas_t = get_index_from_list(sqrt_recip_alphas, t, x.shape)
+
+    # Call model (current image - noise prediction)
+    with torch.cuda.amp.autocast():
+        sample_output = Net(x.permute(0, 2, 1, 3, 4), image, timestep=t.float())
+
+    sample_output = sample_output.permute(0, 2, 1, 3, 4)
+    model_mean = sqrt_recip_alphas_t * (
+        x - betas_t * sample_output / sqrt_one_minus_alphas_cumprod_t
+    )
+    if t.item() == 0:
+        return model_mean
+    else:
+        noise = torch.randn_like(x)
+        posterior_variance_t = get_index_from_list(posterior_variance, t, x.shape)
+        return model_mean + torch.sqrt(posterior_variance_t) * noise
+
+def tensor2image(tensor):
+    numpy_image = tensor[0].cpu().detach().numpy()
+    rescaled_image = (numpy_image * 255).astype(np.uint8)
+    pil_image = Image.fromarray(rescaled_image.transpose(1, 2, 0))
+    return pil_image
+
+@torch.no_grad()
+def get_image_embedding(input_image):
+    inputs = clip_processor(images=list(input_image), return_tensors="pt")
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    clip_hidden_states = clip_encoder(**inputs).last_hidden_state.to(device)
+    vae_hidden_states = vae.encode(input_image).latent_dist.sample() * 0.18215
+    encoder_hidden_states = adapter(clip_hidden_states, vae_hidden_states)
+    return encoder_hidden_states
+
+def predict_fn(img_path, progress=gr.Progress()):
+    image = get_transform()(Image.open(img_path).convert('RGB')).unsqueeze(0).to(device)  # apply the transform and add a batch dim
+    encoder_hidden_states = get_image_embedding(input_image=image)
+    encoded_image = VAE_encode(image)
+    noise_video = torch.randn([1, frameLimit, 4, 80, 64]).to(device)
+    noise_video[:, 0:1] = encoded_image
+    with torch.no_grad():
+        for i in progress.tqdm(range(0, T)[::-1]):
+            t = torch.full((1,), i, device=device).long()
+            noise_video = sample_timestep(noise_video, encoder_hidden_states, t)
+            noise_video[:, 0:1] = encoded_image
+    final_video = VAE_decode(noise_video, vae)
+    save_video_frames_as_mp4(final_video, 25, "result.mp4")
+    return "result.mp4"
+
+with gr.Tab("Image-to-Video"):
+    with gr.Row():
+        with gr.Column():
+            image_input = gr.Image(type="pil", label="Input Image")
+            img_generate = gr.Button("Generate Video")
+        with gr.Column():
+            img_output = gr.Video(label="Generated Video")
+    gr.Examples(
+        examples=[
+            ['sample/blue.jpg',]
+        ],
+        inputs=[image_input],
+        outputs=[img_output],
+        fn=predict_fn,
+        cache_examples='lazy',
+    )
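app.py precomputes the DDPM schedule terms so that `forward_diffusion_sample` can draw x_t in closed form as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, and `predict_fn` then runs `sample_timestep` backwards from t = T-1 to 0, re-clamping the first latent frame to the encoded conditioning image at every step. A self-contained sketch of the closed-form forward step, reusing the same schedule as above (the toy latent shape is arbitrary):

```python
import numpy as np
import torch

T = 1000
i = np.arange(T - 1, -1, -1)  # same beta schedule as app.py's cosine_beta_schedule
betas = torch.tensor(0.0001 + 0.5 * (0.02 - 0.0001) * (1 + np.cos(i / (T - 1) * np.pi)),
                     dtype=torch.float32)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of alphas

x0 = torch.randn(1, 4, 80, 64)                  # toy latent
t = torch.tensor([500])
noise = torch.randn_like(x0)
# q(x_t | x_0) in one shot -- no need to iterate through timesteps 0..t
x_t = alpha_bar[t].sqrt().view(-1, 1, 1, 1) * x0 \
    + (1 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1) * noise
print(x_t.shape)                                # torch.Size([1, 4, 80, 64])
```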
distributed.py
ADDED
@@ -0,0 +1,126 @@
+import math
+import pickle
+
+import torch
+from torch import distributed as dist
+from torch.utils.data.sampler import Sampler
+
+
+def get_rank():
+    if not dist.is_available():
+        return 0
+
+    if not dist.is_initialized():
+        return 0
+
+    return dist.get_rank()
+
+
+def synchronize():
+    if not dist.is_available():
+        return
+
+    if not dist.is_initialized():
+        return
+
+    world_size = dist.get_world_size()
+
+    if world_size == 1:
+        return
+
+    dist.barrier()
+
+
+def get_world_size():
+    if not dist.is_available():
+        return 1
+
+    if not dist.is_initialized():
+        return 1
+
+    return dist.get_world_size()
+
+
+def reduce_sum(tensor):
+    if not dist.is_available():
+        return tensor
+
+    if not dist.is_initialized():
+        return tensor
+
+    tensor = tensor.clone()
+    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
+
+    return tensor
+
+
+def gather_grad(params):
+    world_size = get_world_size()
+
+    if world_size == 1:
+        return
+
+    for param in params:
+        if param.grad is not None:
+            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
+            param.grad.data.div_(world_size)
+
+
+def all_gather(data):
+    world_size = get_world_size()
+
+    if world_size == 1:
+        return [data]
+
+    buffer = pickle.dumps(data)
+    storage = torch.ByteStorage.from_buffer(buffer)
+    tensor = torch.ByteTensor(storage).to('cuda')
+
+    local_size = torch.IntTensor([tensor.numel()]).to('cuda')
+    size_list = [torch.IntTensor([0]).to('cuda') for _ in range(world_size)]
+    dist.all_gather(size_list, local_size)
+    size_list = [int(size.item()) for size in size_list]
+    max_size = max(size_list)
+
+    tensor_list = []
+    for _ in size_list:
+        tensor_list.append(torch.ByteTensor(size=(max_size,)).to('cuda'))
+
+    if local_size != max_size:
+        padding = torch.ByteTensor(size=(max_size - local_size,)).to('cuda')
+        tensor = torch.cat((tensor, padding), 0)
+
+    dist.all_gather(tensor_list, tensor)
+
+    data_list = []
+
+    for size, tensor in zip(size_list, tensor_list):
+        buffer = tensor.cpu().numpy().tobytes()[:size]
+        data_list.append(pickle.loads(buffer))
+
+    return data_list
+
+
+def reduce_loss_dict(loss_dict):
+    world_size = get_world_size()
+
+    if world_size < 2:
+        return loss_dict
+
+    with torch.no_grad():
+        keys = []
+        losses = []
+
+        for k in sorted(loss_dict.keys()):
+            keys.append(k)
+            losses.append(loss_dict[k])
+
+        losses = torch.stack(losses, 0)
+        dist.reduce(losses, dst=0)
+
+        if dist.get_rank() == 0:
+            losses /= world_size
+
+        reduced_losses = {k: v for k, v in zip(keys, losses)}
+
+    return reduced_losses
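Every helper in distributed.py checks `dist.is_available()` / `dist.is_initialized()` first, so the same training code path runs unchanged in a single-process job. A minimal single-process sketch (assuming distributed.py is importable from the repository root):

```python
import torch
from distributed import all_gather, get_rank, get_world_size, reduce_loss_dict

# Without an init_process_group call, world_size is 1 and everything degrades gracefully.
print(get_rank(), get_world_size())   # 0 1
print(all_gather({"step": 1}))        # [{'step': 1}] -- just wraps the local value
losses = {"recon": torch.tensor(0.5), "kl": torch.tensor(0.1)}
print(reduce_loss_dict(losses))       # returned unchanged when world_size < 2
```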
model_structure.txt
ADDED
@@ -0,0 +1,113 @@
+======================================================================================================================================================
+Layer (type (var_name))                                  Input Shape               Output Shape              Param #                   Trainable
+======================================================================================================================================================
+SpaceTimeUnet (SpaceTimeUnet)                            [1, 4, 70, 80, 64]        [1, 4, 70, 80, 64]        --                        True
+├─Sequential (to_timestep_cond)                          [1]                       [1, 256]                  --                        True
+│    └─SinusoidalPosEmb (0)                              [1]                       [1, 64]                   --                        --
+│    └─Linear (1)                                        [1, 64]                   [1, 256]                  16,640                    True
+│    └─SiLU (2)                                          [1, 256]                  [1, 256]                  --                        --
+├─PseudoConv3d (conv_in)                                 [1, 4, 70, 80, 64]        [1, 64, 70, 80, 64]       --                        True
+│    └─Conv2d (spatial_conv)                             [70, 4, 80, 64]           [70, 64, 80, 64]          12,608                    True
+│    └─Conv1d (temporal_conv)                            [5120, 64, 70]            [5120, 64, 70]            12,352                    True
+├─ModuleList (downs)                                     --                        --                        --                        True
+│    └─ModuleList (0)                                    --                        --                        --                        True
+│    │    └─ResnetBlock (0)                              [1, 64, 70, 80, 64]       [1, 64, 70, 80, 64]       131,712                   True
+│    │    └─ModuleList (1)                               --                        --                        197,632                   True
+│    │    └─Downsample (3)                               [1, 64, 70, 80, 64]       [1, 64, 70, 40, 32]       16,384                    True
+│    │    └─AttentionBlock (4)                           [1, 64, 70, 40, 32]       [1, 64, 70, 40, 32]       160,704                   True
+│    └─ModuleList (1)                                    --                        --                        --                        True
+│    │    └─ResnetBlock (0)                              [1, 64, 70, 40, 32]       [1, 128, 70, 40, 32]      394,624                   True
+│    │    └─ModuleList (1)                               --                        --                        788,480                   True
+│    │    └─Downsample (3)                               [1, 128, 70, 40, 32]      [1, 128, 70, 20, 16]      65,536                    True
+│    │    └─AttentionBlock (4)                           [1, 128, 70, 20, 16]      [1, 128, 70, 20, 16]      444,288                   True
+│    └─ModuleList (2)                                    --                        --                        --                        True
+│    │    └─ResnetBlock (0)                              [1, 128, 70, 20, 16]      [1, 256, 70, 20, 16]      1,444,608                 True
+│    │    └─ModuleList (1)                               --                        --                        3,149,824                 True
+│    │    └─Downsample (3)                               [1, 256, 70, 20, 16]      [1, 256, 70, 10, 8]       262,144                   True
+│    │    └─AttentionBlock (4)                           [1, 256, 70, 10, 8]       [1, 256, 70, 10, 8]       1,380,096                 True
+│    └─ModuleList (3)                                    --                        --                        --                        True
+│    │    └─ResnetBlock (0)                              [1, 256, 70, 10, 8]       [1, 512, 70, 10, 8]       5,510,656                 True
+│    │    └─ModuleList (1)                               --                        --                        12,591,104                True
+│    │    └─SpatioTemporalAttention (2)                  [1, 512, 70, 10, 8]       [1, 512, 70, 10, 8]       4,334,181                 True
+│    │    └─Downsample (3)                               [1, 512, 70, 10, 8]       [1, 512, 35, 5, 4]        1,572,864                 True
+│    │    └─AttentionBlock (4)                           [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        4,726,272                 True
+├─ResnetBlock (mid_block1)                               [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    └─Sequential (timestep_mlp)                         [1, 256]                  [1, 1024]                 --                        True
+│    │    └─SiLU (0)                                     [1, 256]                  [1, 256]                  --                        --
+│    │    └─Linear (1)                                   [1, 256]                  [1, 1024]                 263,168                   True
+│    └─Block (block1)                                    [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    │    └─PseudoConv3d (project)                       [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        3,146,752                 True
+│    │    └─GroupNorm (norm)                             [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        1,024                     True
+│    │    └─SiLU (act)                                   [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        --
+│    └─Block (block2)                                    [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    │    └─PseudoConv3d (project)                       [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        3,146,752                 True
+│    │    └─GroupNorm (norm)                             [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        1,024                     True
+│    │    └─SiLU (act)                                   [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        --
+│    └─Identity (res_conv)                               [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        --
+├─SpatioTemporalAttention (mid_attn)                     [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    └─ContinuousPositionBias (spatial_rel_pos_bias)     --                        [8, 20, 20]               --                        True
+│    │    └─ModuleList (net)                             --                        --                        68,616                    True
+│    └─Attention (spatial_attn)                          [35, 20, 512]             [35, 20, 512]             --                        True
+│    │    └─LayerNorm (norm)                             [35, 20, 512]             [35, 20, 512]             1,024                     True
+│    │    └─Linear (to_q)                                [35, 20, 512]             [35, 20, 512]             262,144                   True
+│    │    └─Linear (to_kv)                               [35, 20, 512]             [35, 20, 1024]            524,288                   True
+│    │    └─Linear (to_out)                              [35, 20, 512]             [35, 20, 512]             262,144                   True
+│    └─ContinuousPositionBias (temporal_rel_pos_bias)    --                        [8, 35, 35]               --                        True
+│    │    └─ModuleList (net)                             --                        --                        68,360                    True
+│    └─Attention (temporal_attn)                         [20, 35, 512]             [20, 35, 512]             --                        True
+│    │    └─LayerNorm (norm)                             [20, 35, 512]             [20, 35, 512]             1,024                     True
+│    │    └─Linear (to_q)                                [20, 35, 512]             [20, 35, 512]             262,144                   True
+│    │    └─Linear (to_kv)                               [20, 35, 512]             [20, 35, 1024]            524,288                   True
+│    │    └─Linear (to_out)                              [20, 35, 512]             [20, 35, 512]             262,144                   True
+│    └─FeedForward (ff)                                  [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    │    └─Sequential (proj_in)                         [1, 512, 35, 5, 4]        [1, 1365, 35, 5, 4]       1,397,760                 True
+│    │    └─Sequential (proj_out)                        [1, 1365, 35, 5, 4]       [1, 512, 35, 5, 4]        700,245                   True
+├─ResnetBlock (mid_block2)                               [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    └─Sequential (timestep_mlp)                         [1, 256]                  [1, 1024]                 --                        True
+│    │    └─SiLU (0)                                     [1, 256]                  [1, 256]                  --                        --
+│    │    └─Linear (1)                                   [1, 256]                  [1, 1024]                 263,168                   True
+│    └─Block (block1)                                    [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    │    └─PseudoConv3d (project)                       [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        3,146,752                 True
+│    │    └─GroupNorm (norm)                             [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        1,024                     True
+│    │    └─SiLU (act)                                   [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        --
+│    └─Block (block2)                                    [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        True
+│    │    └─PseudoConv3d (project)                       [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        3,146,752                 True
+│    │    └─GroupNorm (norm)                             [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        1,024                     True
+│    │    └─SiLU (act)                                   [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        --
+│    └─Identity (res_conv)                               [1, 512, 35, 5, 4]        [1, 512, 35, 5, 4]        --                        --
+├─ModuleList (ups)                                       --                        --                        --                        True
+│    └─ModuleList (3)                                    --                        --                        --                        True
+│    │    └─Upsample (3)                                 [1, 512, 35, 5, 4]        [1, 512, 70, 10, 8]       1,575,936                 True
+│    │    └─ResnetBlock (0)                              [1, 1024, 70, 10, 8]      [1, 256, 70, 10, 8]       3,738,368                 True
+│    │    └─ModuleList (1)                               --                        --                        4,526,336                 True
+│    │    └─SpatioTemporalAttention (2)                  [1, 256, 70, 10, 8]       [1, 256, 70, 10, 8]       1,609,786                 True
+│    │    └─AttentionBlock (4)                           [1, 256, 70, 10, 8]       [1, 256, 70, 10, 8]       1,380,096                 True
+│    └─ModuleList (2)                                    --                        --                        --                        True
+│    │    └─Upsample (3)                                 [1, 256, 70, 10, 8]       [1, 256, 70, 20, 16]      263,168                   True
+│    │    └─ResnetBlock (0)                              [1, 512, 70, 20, 16]      [1, 128, 70, 20, 16]      968,064                   True
+│    │    └─ModuleList (1)                               --                        --                        1,132,672                 True
+│    │    └─AttentionBlock (4)                           [1, 128, 70, 20, 16]      [1, 128, 70, 20, 16]      444,288                   True
+│    └─ModuleList (1)                                    --                        --                        --                        True
+│    │    └─Upsample (3)                                 [1, 128, 70, 20, 16]      [1, 128, 70, 40, 32]      66,048                    True
+│    │    └─ResnetBlock (0)                              [1, 256, 70, 40, 32]      [1, 64, 70, 40, 32]       258,752                   True
+│    │    └─ModuleList (1)                               --                        --                        283,712                   True
+│    │    └─AttentionBlock (4)                           [1, 64, 70, 40, 32]       [1, 64, 70, 40, 32]       160,704                   True
+│    └─ModuleList (0)                                    --                        --                        --                        True
+│    │    └─Upsample (3)                                 [1, 64, 70, 40, 32]       [1, 64, 70, 80, 64]       16,640                    True
+│    │    └─ResnetBlock (0)                              [1, 128, 70, 80, 64]      [1, 64, 70, 80, 64]       176,832                   True
+│    │    └─ModuleList (1)                               --                        --                        242,752                   True
+│    │    └─AttentionBlock (4)                           [1, 64, 70, 80, 64]       [1, 64, 70, 80, 64]       160,704                   True
+├─PseudoConv3d (conv_out)                                [1, 64, 70, 80, 64]       [1, 4, 70, 80, 64]        --                        True
+│    └─Conv2d (spatial_conv)                             [70, 64, 80, 64]          [70, 4, 80, 64]           2,308                     True
+│    └─Conv1d (temporal_conv)                            [5120, 4, 70]             [5120, 4, 70]             52                        True
+======================================================================================================================================================
+Total params: 71,671,548
+Trainable params: 71,671,548
+Non-trainable params: 0
+Total mult-adds (G): 732.56
+======================================================================================================================================================
+Input size (MB): 5.89
+Forward/backward pass size (MB): 18136.46
+Params size (MB): 286.69
+Estimated Total Size (MB): 18429.04
+======================================================================================================================================================
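The summary above follows torchinfo's layout. Assuming that is the tool used (the repository does not say), and that the conditioning tensor has the CLIP hidden size of 768 with an assumed sequence length, a table like this could be regenerated roughly as follows:

```python
import torch
from torchinfo import summary  # pip install torchinfo; assumed tool, layout matches
from models.diffusion_model import SpaceTimeUnet

net = SpaceTimeUnet(dim=64, channels=4, dim_mult=(1, 2, 4, 8),
                    temporal_compression=(False, False, False, True),
                    self_attns=(False, False, False, True),
                    condition_on_timestep=True)
x = torch.randn(1, 4, 70, 80, 64)   # latent video, matching the table's input shape
context = torch.randn(1, 50, 768)   # conditioning embeddings; sequence length assumed
summary(net, input_data=[x, context], timestep=torch.ones(1), depth=3,
        col_names=("input_size", "output_size", "num_params", "trainable"))
```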
models/diffusion_model.py
ADDED
@@ -0,0 +1,762 @@
+import math
+import functools
+from operator import mul
+
+import torch
+import torch.nn.functional as F
+from torch import nn, einsum
+
+from einops import rearrange, repeat, pack, unpack
+from einops.layers.torch import Rearrange
+# helper functions
+
+def exists(val):
+    return val is not None
+
+
+def default(val, d):
+    return val if exists(val) else d
+
+
+def mul_reduce(tup):
+    return functools.reduce(mul, tup)
+
+
+def divisible_by(numer, denom):
+    return (numer % denom) == 0
+
+
+mlist = nn.ModuleList
+
+# for time conditioning
+
+class SinusoidalPosEmb(nn.Module):
+    def __init__(self, dim, theta=10000):
+        super().__init__()
+        self.theta = theta
+        self.dim = dim
+
+    def forward(self, x):
+        dtype, device = x.dtype, x.device
+        assert dtype == torch.float, 'input to sinusoidal pos emb must be a float type'
+
+        half_dim = self.dim // 2
+        emb = math.log(self.theta) / (half_dim - 1)
+        emb = torch.exp(torch.arange(half_dim, device=device, dtype=dtype) * -emb)
+        emb = rearrange(x, 'i -> i 1') * rearrange(emb, 'j -> 1 j')
+        return torch.cat((emb.sin(), emb.cos()), dim=-1).type(dtype)
+
+
+# layernorm 3d
+
+class ChanLayerNorm(nn.Module):
+    def __init__(self, dim):
+        super().__init__()
+        self.g = nn.Parameter(torch.ones(dim, 1, 1, 1))
+
+    def forward(self, x):
+        eps = 1e-5 if x.dtype == torch.float32 else 1e-3
+        var = torch.var(x, dim=1, unbiased=False, keepdim=True)
+        mean = torch.mean(x, dim=1, keepdim=True)
+        return (x - mean) * var.clamp(min=eps).rsqrt() * self.g
+
+
+# feedforward
+
+def shift_token(t):
+    t, t_shift = t.chunk(2, dim=1)
+    t_shift = F.pad(t_shift, (0, 0, 0, 0, 1, -1), value=0.)
+    return torch.cat((t, t_shift), dim=1)
+
+
+class GEGLU(nn.Module):
+    def forward(self, x):
+        x, gate = x.chunk(2, dim=1)
+        return x * F.gelu(gate)
+
+
+class FeedForward(nn.Module):
+    def __init__(self, dim, mult=4):
+        super().__init__()
+
+        inner_dim = int(dim * mult * 2 / 3)
+        self.proj_in = nn.Sequential(
+            nn.Conv3d(dim, inner_dim * 2, 1, bias=False),
+            GEGLU()
+        )
+
+        self.proj_out = nn.Sequential(
+            ChanLayerNorm(inner_dim),
+            nn.Conv3d(inner_dim, dim, 1, bias=False)
+        )
+
+    def forward(self, x, enable_time=True):
+        x = self.proj_in(x)
+        if enable_time:
+            x = shift_token(x)
+        return self.proj_out(x)
+
+
+# best relative positional encoding
+
+class ContinuousPositionBias(nn.Module):
+    """ from https://arxiv.org/abs/2111.09883 """
+
+    def __init__(
+        self,
+        *,
+        dim,
+        heads,
+        num_dims=1,
+        layers=2
+    ):
+        super().__init__()
+        self.num_dims = num_dims
+
+        self.net = nn.ModuleList([])
+        self.net.append(nn.Sequential(nn.Linear(self.num_dims, dim), nn.SiLU()))
+
+        for _ in range(layers - 1):
+            self.net.append(nn.Sequential(nn.Linear(dim, dim), nn.SiLU()))
+
+        self.net.append(nn.Linear(dim, heads))
+
+    @property
+    def device(self):
+        return next(self.parameters()).device
+
+    def forward(self, *dimensions):
+        device = self.device
+
+        shape = torch.tensor(dimensions, device=device)
+        rel_pos_shape = 2 * shape - 1
+
+        # calculate strides
+
+        strides = torch.flip(rel_pos_shape, (0,)).cumprod(dim=-1)
+        strides = torch.flip(F.pad(strides, (1, -1), value=1), (0,))
+
+        # get all positions and calculate all the relative distances
+
+        positions = [torch.arange(d, device=device) for d in dimensions]
+        grid = torch.stack(torch.meshgrid(*positions, indexing='ij'), dim=-1)
+        grid = rearrange(grid, '... c -> (...) c')
+        rel_dist = rearrange(grid, 'i c -> i 1 c') - rearrange(grid, 'j c -> 1 j c')
+
+        # get all relative positions across all dimensions
+
+        rel_positions = [torch.arange(-d + 1, d, device=device) for d in dimensions]
+        rel_pos_grid = torch.stack(torch.meshgrid(*rel_positions, indexing='ij'), dim=-1)
+        rel_pos_grid = rearrange(rel_pos_grid, '... c -> (...) c')
+
+        # mlp input
+
+        bias = rel_pos_grid.float()
+
+        for layer in self.net:
+            bias = layer(bias)
+
+        # convert relative distances to indices of the bias
+
+        rel_dist += (shape - 1)  # make sure all positive
+        rel_dist *= strides
+        rel_dist_indices = rel_dist.sum(dim=-1)
+
+        # now select the bias for each unique relative position combination
+
+        bias = bias[rel_dist_indices]
+        return rearrange(bias, 'i j h -> h i j')
+
+
+# helper classes
+
+class CrossAttention(nn.Module):
+    def __init__(self, n_heads, d_embed, d_cross, in_proj_bias=True, out_proj_bias=True):
+        super().__init__()
+        self.q_proj = nn.Linear(d_embed, d_embed, bias=in_proj_bias)
+        self.k_proj = nn.Linear(d_cross, d_embed, bias=in_proj_bias)
+        self.v_proj = nn.Linear(d_cross, d_embed, bias=in_proj_bias)
+        self.out_proj = nn.Linear(d_embed, d_embed, bias=out_proj_bias)
+        self.n_heads = n_heads
+        self.d_head = d_embed // n_heads
+
+    def forward(self, x, y):
+        input_shape = x.shape
+        batch_size, sequence_length, d_embed = input_shape
+        interim_shape = (batch_size, -1, self.n_heads, self.d_head)
+
+        q = self.q_proj(x)
+        k = self.k_proj(y)
+        v = self.v_proj(y)
+
+        q = q.view(interim_shape).transpose(1, 2)
+        k = k.view(interim_shape).transpose(1, 2)
+        v = v.view(interim_shape).transpose(1, 2)
+
+        weight = q @ k.transpose(-1, -2)
+        weight /= math.sqrt(self.d_head)
+        weight = F.softmax(weight, dim=-1)
+
+        output = weight @ v
+        output = output.transpose(1, 2).contiguous()
+        output = output.view(input_shape)
+        output = self.out_proj(output)
+        return output
+
+class AttentionBlock(nn.Module):
+    def __init__(self, n_head: int, n_embd: int, d_context=768):
+        super().__init__()
+        channels = n_head * n_embd
+
+        #self.groupnorm = nn.GroupNorm(32, channels, eps=1e-6)
+        #self.conv_input = PseudoConv3d(channels, channels, 1)
+        self.layernorm_2 = nn.LayerNorm(channels)
+        self.attention_2 = CrossAttention(n_head, channels, d_context, in_proj_bias=False)
+        self.layernorm_3 = nn.LayerNorm(channels)
+        self.linear_geglu_1 = nn.Linear(channels, 4 * channels * 2)
+        self.linear_geglu_2 = nn.Linear(4 * channels, channels)
+        self.conv_output = PseudoConv3d(channels, channels, 1, bias=False)
+
+    def forward(self, x, context):
+        b, c, *_, h, w = x.shape
+        #x = self.groupnorm(x)
+        #x = self.conv_input(x)
+        x = rearrange(x, 'b c f h w -> b (h w f) c')
+
+        residue_short = x
+        x = self.layernorm_2(x)
+        x = self.attention_2(x, context)
+        x += residue_short
+
+        residue_short = x
+        x = self.layernorm_3(x)
+        x, gate = self.linear_geglu_1(x).chunk(2, dim=-1)
+        x = x * F.gelu(gate)
+        x = self.linear_geglu_2(x)
+        x += residue_short
+
+        x = rearrange(x, 'b (h w f) c -> b c f h w', b=b, c=c, h=h, w=w)
+        x = self.conv_output(x)
+        return x
+
+class Attention(nn.Module):
+    def __init__(
+        self,
+        dim,
+        dim_head=64,
+        heads=8
+    ):
+        super().__init__()
+        self.heads = heads
+        self.scale = dim_head ** -0.5
+        inner_dim = dim_head * heads
+
+        self.norm = nn.LayerNorm(dim)
+
+        self.to_q = nn.Linear(dim, inner_dim, bias=False)
+        self.to_kv = nn.Linear(dim, inner_dim * 2, bias=False)
+        self.to_out = nn.Linear(inner_dim, dim, bias=False)
+        nn.init.zeros_(self.to_out.weight.data)  # identity with skip connection
+
+    def forward(
+        self,
+        x,
+        rel_pos_bias=None
+    ):
+        x = self.norm(x)
+
+        q, k, v = self.to_q(x), *self.to_kv(x).chunk(2, dim=-1)
+        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=self.heads), (q, k, v))
+
+        q = q * self.scale
+
+        sim = einsum('b h i d, b h j d -> b h i j', q, k)
+
+        if exists(rel_pos_bias):
+            sim = sim + rel_pos_bias
+
+        attn = sim.softmax(dim=-1)
+
+        out = einsum('b h i j, b h j d -> b h i d', attn, v)
+
+        out = rearrange(out, 'b h n d -> b n (h d)')
+        return self.to_out(out)
+
+
+# main contribution - pseudo 3d conv
+
+class PseudoConv3d(nn.Module):
+    def __init__(
+        self,
+        dim,
+        dim_out=None,
+        kernel_size=3,
+        *,
+        temporal_kernel_size=None,
+        **kwargs
+    ):
+        super().__init__()
+        dim_out = default(dim_out, dim)
+        temporal_kernel_size = default(temporal_kernel_size, kernel_size)
+
+        self.spatial_conv = nn.Conv2d(dim, dim_out, kernel_size=kernel_size, padding=kernel_size // 2)
+        self.temporal_conv = nn.Conv1d(dim_out, dim_out, kernel_size=temporal_kernel_size,
+                                       padding=temporal_kernel_size // 2) if kernel_size > 1 else None
+
+        if exists(self.temporal_conv):
+            nn.init.dirac_(self.temporal_conv.weight.data)  # initialized to be identity
+            nn.init.zeros_(self.temporal_conv.bias.data)
+
+    def forward(
+        self,
+        x,
+        enable_time=True
+    ):
+        b, c, *_, h, w = x.shape
+
+        is_video = x.ndim == 5
+        enable_time &= is_video
+
+        if is_video:
+            x = rearrange(x, 'b c f h w -> (b f) c h w')
+
+        x = self.spatial_conv(x)
+
+        if is_video:
+            x = rearrange(x, '(b f) c h w -> b c f h w', b=b)
+
+        if not enable_time or not exists(self.temporal_conv):
+            return x
+
+        x = rearrange(x, 'b c f h w -> (b h w) c f')
+
+        x = self.temporal_conv(x)
+
+        x = rearrange(x, '(b h w) c f -> b c f h w', h=h, w=w)
+
+        return x
+
+
+# factorized spatial temporal attention from Ho et al.
+
+class SpatioTemporalAttention(nn.Module):
+    def __init__(
+        self,
+        dim,
+        *,
+        dim_head=64,
+        heads=8,
+        add_feed_forward=True,
+        ff_mult=4
+    ):
+        super().__init__()
+        self.spatial_attn = Attention(dim=dim, dim_head=dim_head, heads=heads)
+        self.spatial_rel_pos_bias = ContinuousPositionBias(dim=dim // 2, heads=heads, num_dims=2)
+
+        self.temporal_attn = Attention(dim=dim, dim_head=dim_head, heads=heads)
+        self.temporal_rel_pos_bias = ContinuousPositionBias(dim=dim // 2, heads=heads, num_dims=1)
+
+        self.has_feed_forward = add_feed_forward
+        if not add_feed_forward:
+            return
+
+        self.ff = FeedForward(dim=dim, mult=ff_mult)
+
+    def forward(
+        self,
+        x,
+        enable_time=True
+    ):
+        b, c, *_, h, w = x.shape
+        is_video = x.ndim == 5
+        enable_time &= is_video
+
+        if is_video:
+            x = rearrange(x, 'b c f h w -> (b f) (h w) c')
+        else:
+            x = rearrange(x, 'b c h w -> b (h w) c')
+
+        space_rel_pos_bias = self.spatial_rel_pos_bias(h, w)
+
+        x = self.spatial_attn(x, rel_pos_bias=space_rel_pos_bias) + x
+
+        if is_video:
+            x = rearrange(x, '(b f) (h w) c -> b c f h w', b=b, h=h, w=w)
+        else:
+            x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w)
+
+        if enable_time:
+            x = rearrange(x, 'b c f h w -> (b h w) f c')
+
+            time_rel_pos_bias = self.temporal_rel_pos_bias(x.shape[1])
+
+            x = self.temporal_attn(x, rel_pos_bias=time_rel_pos_bias) + x
+
+            x = rearrange(x, '(b h w) f c -> b c f h w', w=w, h=h)
+
+        if self.has_feed_forward:
+            x = self.ff(x, enable_time=enable_time) + x
+
+        return x
+
+
+# resnet block
+
+class Block(nn.Module):
+    def __init__(
+        self,
+        dim,
+        dim_out,
+        kernel_size=3,
+        temporal_kernel_size=None,
+        groups=8
+    ):
+        super().__init__()
+        self.project = PseudoConv3d(dim, dim_out, 3)
+        self.norm = nn.GroupNorm(groups, dim_out)
+        self.act = nn.SiLU()
+
+    def forward(
+        self,
+        x,
+        scale_shift=None,
+        enable_time=False
+    ):
+        x = self.project(x, enable_time=enable_time)
+        x = self.norm(x)
+
+        if exists(scale_shift):
+            scale, shift = scale_shift
+            x = x * (scale + 1) + shift
+
+        return self.act(x)
+
+
+class ResnetBlock(nn.Module):
+    def __init__(
+        self,
+        dim,
+        dim_out,
+        *,
+        timestep_cond_dim=None,
+        groups=8
+    ):
+        super().__init__()
+
+        self.timestep_mlp = None
+
+        if exists(timestep_cond_dim):
+            self.timestep_mlp = nn.Sequential(
+                nn.SiLU(),
+                nn.Linear(timestep_cond_dim, dim_out * 2)
+            )
+
+        self.block1 = Block(dim, dim_out, groups=groups)
+        self.block2 = Block(dim_out, dim_out, groups=groups)
+        self.res_conv = PseudoConv3d(dim, dim_out, 1) if dim != dim_out else nn.Identity()
+
+    def forward(
+        self,
+        x,
+        timestep_emb=None,
+        enable_time=True
+    ):
+        assert not (exists(timestep_emb) ^ exists(self.timestep_mlp))
+
+        scale_shift = None
+
+        if exists(self.timestep_mlp) and exists(timestep_emb):
+            time_emb = self.timestep_mlp(timestep_emb)
+            to_einsum_eq = 'b c 1 1 1' if x.ndim == 5 else 'b c 1 1'
+            time_emb = rearrange(time_emb, f'b c -> {to_einsum_eq}')
+            scale_shift = time_emb.chunk(2, dim=1)
+
+        h = self.block1(x, scale_shift=scale_shift, enable_time=enable_time)
+
+        h = self.block2(h, enable_time=enable_time)
+
+        return h + self.res_conv(x)
+
+
+# pixelshuffle upsamples and downsamples
+# where time dimension can be configured
+
+class Downsample(nn.Module):
+    def __init__(
+        self,
+        dim,
+        downsample_space=True,
+        downsample_time=False,
+        nonlin=False
+    ):
+        super().__init__()
+        assert downsample_space or downsample_time
+
+        self.down_space = nn.Sequential(
+            Rearrange('b c (h p1) (w p2) -> b (c p1 p2) h w', p1=2, p2=2),
+            nn.Conv2d(dim * 4, dim, 1, bias=False),
+            nn.SiLU() if nonlin else nn.Identity()
+        ) if downsample_space else None
+
+        self.down_time = nn.Sequential(
+            Rearrange('b c (f p) h w -> b (c p) f h w', p=2),
+            nn.Conv3d(dim * 2, dim, 1, bias=False),
+            nn.SiLU() if nonlin else nn.Identity()
+        ) if downsample_time else None
+
+    def forward(
+        self,
+        x,
+        enable_time=True
+    ):
+        is_video = x.ndim == 5
+
+        if is_video:
+            x = rearrange(x, 'b c f h w -> b f c h w')
+            x, ps = pack([x], '* c h w')
+
+        if exists(self.down_space):
+            x = self.down_space(x)
+
+        if is_video:
+            x, = unpack(x, ps, '* c h w')
+            x = rearrange(x, 'b f c h w -> b c f h w')
+
+        if not is_video or not exists(self.down_time) or not enable_time:
+            return x
+
+        x = self.down_time(x)
+
+        return x
+
+
+class Upsample(nn.Module):
+    def __init__(
+        self,
+        dim,
+        upsample_space=True,
+        upsample_time=False,
+        nonlin=False
+    ):
+        super().__init__()
+        assert upsample_space or upsample_time
+
+        self.up_space = nn.Sequential(
+            nn.Conv2d(dim, dim * 4, 1),
+            nn.SiLU() if nonlin else nn.Identity(),
+            Rearrange('b (c p1 p2) h w -> b c (h p1) (w p2)', p1=2, p2=2)
+        ) if upsample_space else None
+
+        self.up_time = nn.Sequential(
+            nn.Conv3d(dim, dim * 2, 1),
+            nn.SiLU() if nonlin else nn.Identity(),
+            Rearrange('b (c p) f h w -> b c (f p) h w', p=2)
+        ) if upsample_time else None
+
+        self.init_()
+
+    def init_(self):
+        if exists(self.up_space):
+            self.init_conv_(self.up_space[0], 4)
+
+        if exists(self.up_time):
+            self.init_conv_(self.up_time[0], 2)
+
+    def init_conv_(self, conv, factor):
+        o, *remain_dims = conv.weight.shape
+        conv_weight = torch.empty(o // factor, *remain_dims)
+        nn.init.kaiming_uniform_(conv_weight)
+        conv_weight = repeat(conv_weight, 'o ... -> (o r) ...', r=factor)
+
+        conv.weight.data.copy_(conv_weight)
+        nn.init.zeros_(conv.bias.data)
+
+    def forward(
+        self,
+        x,
+        enable_time=True
+    ):
+        is_video = x.ndim == 5
+
+        if is_video:
+            x = rearrange(x, 'b c f h w -> b f c h w')
+            x, ps = pack([x], '* c h w')
+
+        if exists(self.up_space):
+            x = self.up_space(x)
+
+        if is_video:
+            x, = unpack(x, ps, '* c h w')
+            x = rearrange(x, 'b f c h w -> b c f h w')
+
+        if not is_video or not exists(self.up_time) or not enable_time:
+            return x
+
+        x = self.up_time(x)
+
+        return x
+
+
+class SpaceTimeUnet(nn.Module):
+    def __init__(
+        self,
+        *,
+        dim,
+        channels=4,
+        dim_mult=(1, 2, 4, 8),
+        self_attns=(False, False, False, True),
+        temporal_compression=(False, True, True, True),
+        resnet_block_depths=(2, 2, 2, 2),
+        attn_dim_head=64,
+        attn_heads=8,
+        condition_on_timestep=False,
+    ):
+        super().__init__()
+        assert len(dim_mult) == len(self_attns) == len(temporal_compression) == len(resnet_block_depths)
+        num_layers = len(dim_mult)
+
+        dims = [dim, *map(lambda mult: mult * dim, dim_mult)]
+        dim_in_out = zip(dims[:-1], dims[1:])
+
+
+        # determine the valid multiples of the image size and frames of the video
+        self.frame_multiple = 2 ** sum(tuple(map(int, temporal_compression)))
+        self.image_size_multiple = 2 ** num_layers
+
+        # timestep conditioning for DDPM, not to be confused with the time dimension of the video
+
+        self.to_timestep_cond = None
+        timestep_cond_dim = (dim * 4) if condition_on_timestep else None
+
+        if condition_on_timestep:
+            self.to_timestep_cond = nn.Sequential(
+                SinusoidalPosEmb(dim),
+                nn.Linear(dim, timestep_cond_dim),
+                nn.SiLU()
+            )
+
+        # Cross Attention
+        cross_attention_D1 = AttentionBlock(1, 64)  # 64
+        cross_attention_D2 = AttentionBlock(1, 128)  # 128
+        cross_attention_D3 = AttentionBlock(2, 128)  # 256
+        cross_attention_D4 = AttentionBlock(4, 128)  # 512
+
+        cross_attention_U1 = AttentionBlock(4, 64)  # 256
+        cross_attention_U2 = AttentionBlock(2, 64)  # 128
+        cross_attention_U3 = AttentionBlock(1, 64)  # 64
+        cross_attention_U4 = AttentionBlock(1, 64)  # 64
+
+        cross_attns_down = (cross_attention_D1, cross_attention_D2, cross_attention_D3, cross_attention_D4)
+        cross_attns_up = (cross_attention_U4, cross_attention_U3, cross_attention_U2, cross_attention_U1)
+        # layers
+
+        self.downs = mlist([])
+        self.ups = mlist([])
| 655 |
+
|
| 656 |
+
attn_kwargs = dict(
|
| 657 |
+
dim_head=attn_dim_head,
|
| 658 |
+
heads=attn_heads
|
| 659 |
+
)
|
| 660 |
+
|
| 661 |
+
mid_dim = dims[-1]
|
| 662 |
+
|
| 663 |
+
self.mid_block1 = ResnetBlock(mid_dim, mid_dim, timestep_cond_dim=timestep_cond_dim)
|
| 664 |
+
self.mid_attn = SpatioTemporalAttention(dim=mid_dim)
|
| 665 |
+
self.mid_block2 = ResnetBlock(mid_dim, mid_dim, timestep_cond_dim=timestep_cond_dim)
|
| 666 |
+
for _, self_attend, (dim_in, dim_out), compress_time, resnet_block_depth, cross_attns_d, cross_attns_u in zip(range(num_layers),
|
| 667 |
+
self_attns,
|
| 668 |
+
dim_in_out,
|
| 669 |
+
temporal_compression,
|
| 670 |
+
resnet_block_depths,
|
| 671 |
+
cross_attns_down,
|
| 672 |
+
cross_attns_up):
|
| 673 |
+
assert resnet_block_depth >= 1
|
| 674 |
+
self.downs.append(mlist([
|
| 675 |
+
ResnetBlock(dim_in, dim_out, timestep_cond_dim=timestep_cond_dim),
|
| 676 |
+
mlist([ResnetBlock(dim_out, dim_out) for _ in range(resnet_block_depth)]),
|
| 677 |
+
SpatioTemporalAttention(dim=dim_out, **attn_kwargs) if self_attend else None,
|
| 678 |
+
Downsample(dim_out, downsample_time=compress_time),
|
| 679 |
+
cross_attns_d if exists(cross_attns_d) else None
|
| 680 |
+
]))
|
| 681 |
+
self.ups.append(mlist([
|
| 682 |
+
ResnetBlock(dim_out * 2, dim_in, timestep_cond_dim=timestep_cond_dim),
|
| 683 |
+
mlist(
|
| 684 |
+
[ResnetBlock(dim_in + (dim_out if ind == 0 else 0), dim_in) for ind in range(resnet_block_depth)]),
|
| 685 |
+
SpatioTemporalAttention(dim=dim_in, **attn_kwargs) if self_attend else None,
|
| 686 |
+
Upsample(dim_out, upsample_time=compress_time),
|
| 687 |
+
cross_attns_u if exists(cross_attns_u) else None
|
| 688 |
+
|
| 689 |
+
]))
|
| 690 |
+
self.skip_scale = 2 ** -0.5 # paper shows faster convergence
|
| 691 |
+
|
| 692 |
+
self.conv_in = PseudoConv3d(dim=channels, dim_out=dim, kernel_size=7, temporal_kernel_size=3)
|
| 693 |
+
self.conv_out = PseudoConv3d(dim=dim, dim_out=channels, kernel_size=3, temporal_kernel_size=3)
|
| 694 |
+
|
| 695 |
+
def forward(
|
| 696 |
+
self,
|
| 697 |
+
x,
|
| 698 |
+
clip_vae_embed,
|
| 699 |
+
timestep=None,
|
| 700 |
+
enable_time=True
|
| 701 |
+
):
|
| 702 |
+
|
| 703 |
+
assert not (exists(self.to_timestep_cond) ^ exists(timestep))
|
| 704 |
+
is_video = x.ndim == 5
|
| 705 |
+
|
| 706 |
+
if enable_time and is_video:
|
| 707 |
+
frames = x.shape[2]
|
| 708 |
+
assert divisible_by(frames,
|
| 709 |
+
self.frame_multiple), f'number of frames on the video ({frames}) must be divisible by the frame multiple ({self.frame_multiple})'
|
| 710 |
+
|
| 711 |
+
height, width = x.shape[-2:]
|
| 712 |
+
assert divisible_by(height, self.image_size_multiple) and divisible_by(width,
|
| 713 |
+
self.image_size_multiple), f'height and width of the image or video must be a multiple of {self.image_size_multiple}'
|
| 714 |
+
|
| 715 |
+
# main logic
|
| 716 |
+
|
| 717 |
+
t = self.to_timestep_cond(rearrange(timestep, '... -> (...)')) if exists(timestep) else None
|
| 718 |
+
x = self.conv_in(x, enable_time=enable_time)
|
| 719 |
+
|
| 720 |
+
hiddens = []
|
| 721 |
+
for init_block, blocks, maybe_attention, downsample, cross_attn in self.downs:
|
| 722 |
+
x = init_block(x, t, enable_time=enable_time)
|
| 723 |
+
hiddens.append(x.clone())
|
| 724 |
+
for block in blocks:
|
| 725 |
+
x = block(x, enable_time=enable_time)
|
| 726 |
+
if exists(maybe_attention):
|
| 727 |
+
x = maybe_attention(x, enable_time=enable_time) # only happens in the last layer
|
| 728 |
+
hiddens.append(x.clone())
|
| 729 |
+
x = downsample(x, enable_time=enable_time)
|
| 730 |
+
if exists(cross_attn):
|
| 731 |
+
x = cross_attn(x, clip_vae_embed)
|
| 732 |
+
|
| 733 |
+
x = self.mid_block1(x, t, enable_time=enable_time)
|
| 734 |
+
x = self.mid_attn(x, enable_time=enable_time)
|
| 735 |
+
x = self.mid_block2(x, t, enable_time=enable_time)
|
| 736 |
+
|
| 737 |
+
for init_block, blocks, maybe_attention, upsample, cross_attn in reversed(self.ups):
|
| 738 |
+
x = upsample(x, enable_time=enable_time)
|
| 739 |
+
x = torch.cat((hiddens.pop() * self.skip_scale, x), dim=1)
|
| 740 |
+
x = init_block(x, t, enable_time=enable_time)
|
| 741 |
+
x = torch.cat((hiddens.pop() * self.skip_scale, x), dim=1)
|
| 742 |
+
for block in blocks:
|
| 743 |
+
x = block(x, enable_time=enable_time)
|
| 744 |
+
if exists(maybe_attention):
|
| 745 |
+
x = maybe_attention(x, enable_time=enable_time)
|
| 746 |
+
if exists(cross_attn):
|
| 747 |
+
x = cross_attn(x, clip_vae_embed)
|
| 748 |
+
|
| 749 |
+
x = self.conv_out(x, enable_time=enable_time)
|
| 750 |
+
return x
|
| 751 |
+
|
| 752 |
+
if __name__ == '__main__':
|
| 753 |
+
Net = SpaceTimeUnet(
|
| 754 |
+
dim=64,
|
| 755 |
+
channels=3,
|
| 756 |
+
dim_mult=(1, 2, 4, 8),
|
| 757 |
+
temporal_compression=(False, False, False, True),
|
| 758 |
+
self_attns=(False, False, False, True),
|
| 759 |
+
condition_on_timestep=False)
|
| 760 |
+
|
| 761 |
+
x = torch.randn([1,8,3,32,32])
|
| 762 |
+
sample_output = Net(x.permute(0, 2, 1, 3, 4))
|
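
A quick shape check of the pixel-shuffle resamplers above (a minimal sketch; it assumes this file is importable as `models.diffusion_model`, which matches its path in this commit):

import torch
from models.diffusion_model import Downsample, Upsample

x = torch.randn(1, 32, 8, 64, 64)            # (batch, channels, frames, height, width)
down = Downsample(32, downsample_time=True)  # halves H and W, and halves the frame count
up = Upsample(32, upsample_time=True)        # the inverse factors of Downsample

y = down(x)
print(y.shape)      # torch.Size([1, 32, 4, 32, 32])
print(up(y).shape)  # torch.Size([1, 32, 8, 64, 64])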
models/unet_dual_encoder.py
ADDED
@@ -0,0 +1,62 @@

# Load pretrained 2D UNet and modify with temporal attention
import torch
import torch.nn as nn
import torch.utils.checkpoint
from einops import rearrange

from diffusers.models import UNet2DConditionModel

def get_unet(pretrained_model_name_or_path, revision, resolution=256, n_poses=5):
    # Load pretrained UNet layers
    unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4",
                                                subfolder="unet",
                                                revision="ebb811dd71cdc38a204ecbdd6ac5d580f529fd8c",
                                                cache_dir="checkpoints/unet")

    # Modify the input layer to accept 2*n_poses additional channels (pose maps)
    weights = unet.conv_in.weight.clone()
    unet.conv_in = nn.Conv2d(4 + 2*n_poses, weights.shape[0], kernel_size=3, padding=(1, 1))  # input noise + n poses
    with torch.no_grad():
        unet.conv_in.weight[:, :4] = weights  # original weights
        unet.conv_in.weight[:, 3:] = torch.zeros(unet.conv_in.weight[:, 3:].shape)  # new weights initialized to zero
        # NOTE: the slice above starts at channel 3, so it also zeroes the last
        # pretrained channel; `weight[:, 4:]` would keep all four original channels.

    return unet

'''
This module takes in CLIP + VAE embeddings and outputs CLIP-compatible embeddings.
'''
class Embedding_Adapter(nn.Module):
    def __init__(self, input_nc=38, output_nc=4, norm_layer=nn.InstanceNorm2d, chkpt=None):
        super(Embedding_Adapter, self).__init__()

        self.save_method_name = "adapter"

        self.pool = nn.MaxPool2d(2)
        self.vae2clip = nn.Linear(1280, 768)

        self.linear1 = nn.Linear(54, 50)  # 50 x 54 shape

        # initialize weights
        with torch.no_grad():
            self.linear1.weight = nn.Parameter(torch.eye(50, 54))

        if chkpt is not None:
            pass

    def forward(self, clip, vae):

        vae = self.pool(vae)  # 1 4 80 64 --> 1 4 40 32
        vae = rearrange(vae, 'b c h w -> b c (h w)')  # 1 4 40 32 --> 1 4 1280

        vae = self.vae2clip(vae)  # 1 4 768

        # Concatenate
        concat = torch.cat((clip, vae), 1)

        # Encode

        concat = rearrange(concat, 'b c d -> b d c')
        concat = self.linear1(concat)
        concat = rearrange(concat, 'b d c -> b c d')

        return concat
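
A shape sketch for Embedding_Adapter; the input shapes below are inferred from the comments in forward() and should be treated as assumptions rather than a documented contract:

import torch

adapter = Embedding_Adapter()
clip_emb = torch.randn(1, 50, 768)   # CLIP hidden states for the conditioning image
vae_emb = torch.randn(1, 4, 80, 64)  # VAE latent of a 640x512 image
print(adapter(clip_emb, vae_emb).shape)  # torch.Size([1, 50, 768])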
project_latent_space.py
ADDED
@@ -0,0 +1,75 @@

import torchvision.transforms as transforms
import os.path as osp
import cv2
import torch
import os, argparse
import tqdm
from PIL import Image
from diffusers import AutoencoderKL
import random
device = torch.device("cuda")

parser = argparse.ArgumentParser(description="Configuration of the tensor projection.")
parser.add_argument('--dataset', default="fashion_dataset/train", help="Path to the dataset")
parser.add_argument('--output_dir', default="fashion_dataset_tensor", help="Path to save the tensors")
args = parser.parse_args()

vae = AutoencoderKL.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    subfolder="vae",
    revision="ebb811dd71cdc38a204ecbdd6ac5d580f529fd8c"
).to(device)
vae.requires_grad_(False)

@torch.no_grad()
def VAE_encode(video):
    # Encode each frame independently and stack the latents along a new frame axis.
    for i in range(video.shape[0]):
        image = video[i, :, :, :]
        image = image.unsqueeze(0)
        if i == 0:
            init_latent_dist = vae.encode(image).latent_dist.sample()
            init_latent_dist *= 0.18215
            encoded_video = (init_latent_dist).unsqueeze(1)
        else:
            init_latent_dist = vae.encode(image).latent_dist.sample()
            init_latent_dist *= 0.18215
            encoded_video = torch.cat([encoded_video, (init_latent_dist).unsqueeze(1)], 1)
    return encoded_video

def get_transform():
    image_transforms = transforms.Compose(
        [
            transforms.Resize((640, 512), interpolation=transforms.InterpolationMode.BILINEAR),
            transforms.ToTensor(),
        ])
    return image_transforms


path = osp.join(args.dataset)
video_names = os.listdir(path)
transform = get_transform()

if not os.path.exists(args.output_dir):
    os.makedirs(args.output_dir)

for video_name in tqdm.tqdm(video_names):
    cap = cv2.VideoCapture(osp.join(path, video_name))
    numberOfFrames = 241  # assumes every clip has at least 241 frames
    number = random.randint(0, numberOfFrames - 70)  # random 70-frame window
    for i in range(number, number + 70):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        _, frame = cap.read()  # note: the read-success flag is ignored here
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = Image.fromarray(frame)
        frame = transform(frame)
        if i == number:
            inputImage = frame
            torch.save(inputImage, args.output_dir + "/" + video_name[:-4] + "_image.pt")
            frame = frame.unsqueeze(0)
            restOfVideo = torch.clone(frame)
        else:
            frame = frame.unsqueeze(0)
            restOfVideo = torch.cat([restOfVideo, frame], 0)
    restOfVideo = restOfVideo.to(device=device)
    vae_video = VAE_encode(restOfVideo).detach().cpu()[0]
    torch.save(vae_video, args.output_dir + "/" + video_name[:-4] + ".pt")
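
For reference, a round-trip sketch that can be run after the script above: it decodes one saved latent frame back to pixel space (the file name is illustrative; dividing by 0.18215 undoes the Stable Diffusion latent scaling applied in VAE_encode):

latents = torch.load("fashion_dataset_tensor/example.pt")  # (70, 4, 80, 64)
frame = latents[0].unsqueeze(0).to(device) / 0.18215
with torch.no_grad():
    image = vae.decode(frame).sample                       # (1, 3, 640, 512)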
requirements.txt
ADDED
@@ -0,0 +1,30 @@

accelerate==0.26.1
certifi==2023.11.17
charset-normalizer==3.3.2
diffusers==0.14.0
einops==0.7.0
filelock==3.13.1
fsspec==2023.12.2
huggingface-hub==0.20.2
idna==3.6
importlib-metadata==7.0.1
numpy==1.26.3
opencv-python==4.9.0.80
packaging==23.2
pillow==10.2.0
protobuf==4.25.2
psutil==5.9.7
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
safetensors==0.4.1
tensorboardX==2.6.2.2
tokenizers==0.15.0
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
tqdm==4.66.1
transformers==4.36.2
typing_extensions==4.9.0
urllib3==2.1.0
zipp==3.17.0
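
Note that the `+cu113` wheels pinned above are not hosted on PyPI, so installing this file as-is generally requires the PyTorch wheel index, e.g. `pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113`.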
sample/blue.jpg
ADDED

sample/green.jpg
ADDED

sample/silver.jpg
ADDED

src/deps/__init__.py
ADDED
File without changes
src/deps/facial_recognition/__init__.py
ADDED
@@ -0,0 +1,3 @@

"""
Copy-pasted from https://github.com/orpatashnik/StyleCLIP/tree/main/models/facial_recognition/__init__.py
"""
src/deps/facial_recognition/helpers.py
ADDED
@@ -0,0 +1,123 @@

"""
Copy-pasted from https://github.com/orpatashnik/StyleCLIP/tree/main/models/facial_recognition/helpers.py
"""

from collections import namedtuple
import torch
from torch.nn import Conv2d, BatchNorm2d, PReLU, ReLU, Sigmoid, MaxPool2d, AdaptiveAvgPool2d, Sequential, Module

"""
ArcFace implementation from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch)
"""


class Flatten(Module):
    def forward(self, input):
        return input.view(input.size(0), -1)


def l2_norm(input, axis=1):
    norm = torch.norm(input, 2, axis, True)
    output = torch.div(input, norm)
    return output


class Bottleneck(namedtuple('Block', ['in_channel', 'depth', 'stride'])):
    """ A named tuple describing a ResNet block. """


def get_block(in_channel, depth, num_units, stride=2):
    return [Bottleneck(in_channel, depth, stride)] + [Bottleneck(depth, depth, 1) for i in range(num_units - 1)]


def get_blocks(num_layers):
    if num_layers == 50:
        blocks = [
            get_block(in_channel=64, depth=64, num_units=3),
            get_block(in_channel=64, depth=128, num_units=4),
            get_block(in_channel=128, depth=256, num_units=14),
            get_block(in_channel=256, depth=512, num_units=3)
        ]
    elif num_layers == 100:
        blocks = [
            get_block(in_channel=64, depth=64, num_units=3),
            get_block(in_channel=64, depth=128, num_units=13),
            get_block(in_channel=128, depth=256, num_units=30),
            get_block(in_channel=256, depth=512, num_units=3)
        ]
    elif num_layers == 152:
        blocks = [
            get_block(in_channel=64, depth=64, num_units=3),
            get_block(in_channel=64, depth=128, num_units=8),
            get_block(in_channel=128, depth=256, num_units=36),
            get_block(in_channel=256, depth=512, num_units=3)
        ]
    else:
        raise ValueError("Invalid number of layers: {}. Must be one of [50, 100, 152]".format(num_layers))
    return blocks


class SEModule(Module):
    def __init__(self, channels, reduction):
        super(SEModule, self).__init__()
        self.avg_pool = AdaptiveAvgPool2d(1)
        self.fc1 = Conv2d(channels, channels // reduction, kernel_size=1, padding=0, bias=False)
        self.relu = ReLU(inplace=True)
        self.fc2 = Conv2d(channels // reduction, channels, kernel_size=1, padding=0, bias=False)
        self.sigmoid = Sigmoid()

    def forward(self, x):
        module_input = x
        x = self.avg_pool(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return module_input * x


class bottleneck_IR(Module):
    def __init__(self, in_channel, depth, stride):
        super(bottleneck_IR, self).__init__()
        if in_channel == depth:
            self.shortcut_layer = MaxPool2d(1, stride)
        else:
            self.shortcut_layer = Sequential(
                Conv2d(in_channel, depth, (1, 1), stride, bias=False),
                BatchNorm2d(depth)
            )
        self.res_layer = Sequential(
            BatchNorm2d(in_channel),
            Conv2d(in_channel, depth, (3, 3), (1, 1), 1, bias=False), PReLU(depth),
            Conv2d(depth, depth, (3, 3), stride, 1, bias=False), BatchNorm2d(depth)
        )

    def forward(self, x):
        shortcut = self.shortcut_layer(x)
        res = self.res_layer(x)
        return res + shortcut


class bottleneck_IR_SE(Module):
    def __init__(self, in_channel, depth, stride):
        super(bottleneck_IR_SE, self).__init__()
        if in_channel == depth:
            self.shortcut_layer = MaxPool2d(1, stride)
        else:
            self.shortcut_layer = Sequential(
                Conv2d(in_channel, depth, (1, 1), stride, bias=False),
                BatchNorm2d(depth)
            )
        self.res_layer = Sequential(
            BatchNorm2d(in_channel),
            Conv2d(in_channel, depth, (3, 3), (1, 1), 1, bias=False),
            PReLU(depth),
            Conv2d(depth, depth, (3, 3), stride, 1, bias=False),
            BatchNorm2d(depth),
            SEModule(depth, 16)
        )

    def forward(self, x):
        shortcut = self.shortcut_layer(x)
        res = self.res_layer(x)
        return res + shortcut
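
A small sanity check on the block schedule above (pure bookkeeping, no weights involved):

blocks = get_blocks(50)
print(len(blocks))                          # 4 stages
print(sum(len(stage) for stage in blocks))  # 24 bottleneck units for ir-50 (3 + 4 + 14 + 3)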
src/deps/facial_recognition/model_irse.py
ADDED
@@ -0,0 +1,88 @@

"""
Copy-pasted from https://github.com/orpatashnik/StyleCLIP/tree/main/models/facial_recognition/model_irse.py
"""
from torch.nn import Linear, Conv2d, BatchNorm1d, BatchNorm2d, PReLU, Dropout, Sequential, Module
from .helpers import get_blocks, Flatten, bottleneck_IR, bottleneck_IR_SE, l2_norm

"""
Modified Backbone implementation from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch)
"""

class Backbone(Module):
    WEIGHTS_URL = "https://www.dropbox.com/s/n6xicva1lrghb5w/model_ir_se50.pth?dl=1"

    def __init__(self, input_size, num_layers, mode='ir', drop_ratio=0.4, affine=True):
        super(Backbone, self).__init__()
        assert input_size in [112, 224], "input_size should be 112 or 224"
        assert num_layers in [50, 100, 152], "num_layers should be 50, 100 or 152"
        assert mode in ['ir', 'ir_se'], "mode should be ir or ir_se"
        blocks = get_blocks(num_layers)
        if mode == 'ir':
            unit_module = bottleneck_IR
        elif mode == 'ir_se':
            unit_module = bottleneck_IR_SE
        self.input_layer = Sequential(Conv2d(3, 64, (3, 3), 1, 1, bias=False),
                                      BatchNorm2d(64),
                                      PReLU(64))
        if input_size == 112:
            self.output_layer = Sequential(BatchNorm2d(512),
                                           Dropout(drop_ratio),
                                           Flatten(),
                                           Linear(512 * 7 * 7, 512),
                                           BatchNorm1d(512, affine=affine))
        else:
            self.output_layer = Sequential(BatchNorm2d(512),
                                           Dropout(drop_ratio),
                                           Flatten(),
                                           Linear(512 * 14 * 14, 512),
                                           BatchNorm1d(512, affine=affine))

        modules = []
        for block in blocks:
            for bottleneck in block:
                modules.append(unit_module(bottleneck.in_channel,
                                           bottleneck.depth,
                                           bottleneck.stride))
        self.body = Sequential(*modules)

    def forward(self, x):
        x = self.input_layer(x)
        x = self.body(x)
        x = self.output_layer(x)
        return l2_norm(x)


def IR_50(input_size):
    """Constructs a ir-50 model."""
    model = Backbone(input_size, num_layers=50, mode='ir', drop_ratio=0.4, affine=False)
    return model


def IR_101(input_size):
    """Constructs a ir-101 model."""
    model = Backbone(input_size, num_layers=100, mode='ir', drop_ratio=0.4, affine=False)
    return model


def IR_152(input_size):
    """Constructs a ir-152 model."""
    model = Backbone(input_size, num_layers=152, mode='ir', drop_ratio=0.4, affine=False)
    return model


def IR_SE_50(input_size):
    """Constructs a ir_se-50 model."""
    model = Backbone(input_size, num_layers=50, mode='ir_se', drop_ratio=0.4, affine=False)
    return model


def IR_SE_101(input_size):
    """Constructs a ir_se-101 model."""
    model = Backbone(input_size, num_layers=100, mode='ir_se', drop_ratio=0.4, affine=False)
    return model


def IR_SE_152(input_size):
    """Constructs a ir_se-152 model."""
    model = Backbone(input_size, num_layers=152, mode='ir_se', drop_ratio=0.4, affine=False)
    return model
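
A minimal forward-pass sketch with randomly initialized weights; real use would load the ArcFace checkpoint referenced by Backbone.WEIGHTS_URL:

import torch

net = IR_SE_50(112).eval()
with torch.no_grad():
    emb = net(torch.randn(1, 3, 112, 112))
print(emb.shape)        # torch.Size([1, 512])
print(emb.norm(dim=1))  # 1.0, since l2_norm normalizes the embedding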
src/dnnlib/__init__.py
ADDED
@@ -0,0 +1,9 @@

# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

from .util import EasyDict, make_cache_dir_path
src/dnnlib/util.py
ADDED
@@ -0,0 +1,480 @@

# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

"""Miscellaneous utility classes and functions."""

import ctypes
import fnmatch
import importlib
import inspect
import numpy as np
import os
import shutil
import sys
import types
import io
import pickle
import re
import requests
import html
import hashlib
import glob
import tempfile
import urllib
import urllib.request
import uuid

from distutils.util import strtobool
from typing import Any, List, Tuple, Union, Dict


# Util classes
# ------------------------------------------------------------------------------------------


class EasyDict(dict):
    """Convenience class that behaves like a dict but allows access with the attribute syntax."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value

    def __delattr__(self, name: str) -> None:
        del self[name]

    def to_dict(self) -> Dict:
        return {k: (v.to_dict() if isinstance(v, EasyDict) else v) for (k, v) in self.items()}


class Logger(object):
    """Redirect stderr to stdout, optionally print stdout to a file, and optionally force flushing on both stdout and the file."""

    def __init__(self, file_name: str = None, file_mode: str = "w", should_flush: bool = True):
        self.file = None

        if file_name is not None:
            self.file = open(file_name, file_mode)

        self.should_flush = should_flush
        self.stdout = sys.stdout
        self.stderr = sys.stderr

        sys.stdout = self
        sys.stderr = self

    def __enter__(self) -> "Logger":
        return self

    def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any) -> None:
        self.close()

    def write(self, text: Union[str, bytes]) -> None:
        """Write text to stdout (and a file) and optionally flush."""
        if isinstance(text, bytes):
            text = text.decode()
        if len(text) == 0:  # workaround for a bug in VSCode debugger: sys.stdout.write(''); sys.stdout.flush() => crash
            return

        if self.file is not None:
            self.file.write(text)

        self.stdout.write(text)

        if self.should_flush:
            self.flush()

    def flush(self) -> None:
        """Flush written text to both stdout and a file, if open."""
        if self.file is not None:
            self.file.flush()

        self.stdout.flush()

    def close(self) -> None:
        """Flush, close possible files, and remove stdout/stderr mirroring."""
        self.flush()

        # if using multiple loggers, prevent closing in wrong order
        if sys.stdout is self:
            sys.stdout = self.stdout
        if sys.stderr is self:
            sys.stderr = self.stderr

        if self.file is not None:
            self.file.close()
            self.file = None


# Cache directories
# ------------------------------------------------------------------------------------------

_dnnlib_cache_dir = None

def set_cache_dir(path: str) -> None:
    global _dnnlib_cache_dir
    _dnnlib_cache_dir = path

def make_cache_dir_path(*paths: str) -> str:
    if _dnnlib_cache_dir is not None:
        return os.path.join(_dnnlib_cache_dir, *paths)
    if 'DNNLIB_CACHE_DIR' in os.environ:
        return os.path.join(os.environ['DNNLIB_CACHE_DIR'], *paths)
    if 'HOME' in os.environ:
        return os.path.join(os.environ['HOME'], '.cache', 'dnnlib', *paths)
    if 'USERPROFILE' in os.environ:
        return os.path.join(os.environ['USERPROFILE'], '.cache', 'dnnlib', *paths)
    return os.path.join(tempfile.gettempdir(), '.cache', 'dnnlib', *paths)

# Small util functions
# ------------------------------------------------------------------------------------------


def format_time(seconds: Union[int, float]) -> str:
    """Convert the seconds to human readable string with days, hours, minutes and seconds."""
    s = int(np.rint(seconds))

    if s < 60:
        return "{0}s".format(s)
    elif s < 60 * 60:
        return "{0}m {1:02}s".format(s // 60, s % 60)
    elif s < 24 * 60 * 60:
        return "{0}h {1:02}m {2:02}s".format(s // (60 * 60), (s // 60) % 60, s % 60)
    else:
        return "{0}d {1:02}h {2:02}m".format(s // (24 * 60 * 60), (s // (60 * 60)) % 24, (s // 60) % 60)


def ask_yes_no(question: str) -> bool:
    """Ask the user the question until the user inputs a valid answer."""
    while True:
        try:
            print("{0} [y/n]".format(question))
            return strtobool(input().lower())
        except ValueError:
            pass


def tuple_product(t: Tuple) -> Any:
    """Calculate the product of the tuple elements."""
    result = 1

    for v in t:
        result *= v

    return result


_str_to_ctype = {
    "uint8": ctypes.c_ubyte,
    "uint16": ctypes.c_uint16,
    "uint32": ctypes.c_uint32,
    "uint64": ctypes.c_uint64,
    "int8": ctypes.c_byte,
    "int16": ctypes.c_int16,
    "int32": ctypes.c_int32,
    "int64": ctypes.c_int64,
    "float32": ctypes.c_float,
    "float64": ctypes.c_double
}


def get_dtype_and_ctype(type_obj: Any) -> Tuple[np.dtype, Any]:
    """Given a type name string (or an object having a __name__ attribute), return matching Numpy and ctypes types that have the same size in bytes."""
    type_str = None

    if isinstance(type_obj, str):
        type_str = type_obj
    elif hasattr(type_obj, "__name__"):
        type_str = type_obj.__name__
    elif hasattr(type_obj, "name"):
        type_str = type_obj.name
    else:
        raise RuntimeError("Cannot infer type name from input")

    assert type_str in _str_to_ctype.keys()

    my_dtype = np.dtype(type_str)
    my_ctype = _str_to_ctype[type_str]

    assert my_dtype.itemsize == ctypes.sizeof(my_ctype)

    return my_dtype, my_ctype


def is_pickleable(obj: Any) -> bool:
    try:
        with io.BytesIO() as stream:
            pickle.dump(obj, stream)
        return True
    except:
        return False


# Functionality to import modules/objects by name, and call functions by name
# ------------------------------------------------------------------------------------------

def get_module_from_obj_name(obj_name: str) -> Tuple[types.ModuleType, str]:
    """Searches for the underlying module behind the name to some python object.
    Returns the module and the object name (original name with module part removed)."""

    # allow convenience shorthands, substitute them by full names
    obj_name = re.sub("^np.", "numpy.", obj_name)
    obj_name = re.sub("^tf.", "tensorflow.", obj_name)

    # list alternatives for (module_name, local_obj_name)
    parts = obj_name.split(".")
    name_pairs = [(".".join(parts[:i]), ".".join(parts[i:])) for i in range(len(parts), 0, -1)]

    # try each alternative in turn
    for module_name, local_obj_name in name_pairs:
        try:
            module = importlib.import_module(module_name)  # may raise ImportError
            get_obj_from_module(module, local_obj_name)  # may raise AttributeError
            return module, local_obj_name
        except:
            pass

    # maybe some of the modules themselves contain errors?
    for module_name, _local_obj_name in name_pairs:
        try:
            importlib.import_module(module_name)  # may raise ImportError
        except ImportError:
            if not str(sys.exc_info()[1]).startswith("No module named '" + module_name + "'"):
                raise

    # maybe the requested attribute is missing?
    for module_name, local_obj_name in name_pairs:
        try:
            module = importlib.import_module(module_name)  # may raise ImportError
            get_obj_from_module(module, local_obj_name)  # may raise AttributeError
        except ImportError:
            pass

    # we are out of luck, but we have no idea why
    raise ImportError(obj_name)


def get_obj_from_module(module: types.ModuleType, obj_name: str) -> Any:
    """Traverses the object name and returns the last (rightmost) python object."""
    if obj_name == '':
        return module
    obj = module
    for part in obj_name.split("."):
        obj = getattr(obj, part)
    return obj


def get_obj_by_name(name: str) -> Any:
    """Finds the python object with the given name."""
    module, obj_name = get_module_from_obj_name(name)
    return get_obj_from_module(module, obj_name)


def call_func_by_name(*args, func_name: str = None, **kwargs) -> Any:
    """Finds the python object with the given name and calls it as a function."""
    assert func_name is not None
    func_obj = get_obj_by_name(func_name)
    assert callable(func_obj)
    return func_obj(*args, **kwargs)


def construct_class_by_name(*args, class_name: str = None, **kwargs) -> Any:
    """Finds the python class with the given name and constructs it with the given arguments."""
    return call_func_by_name(*args, func_name=class_name, **kwargs)


def get_module_dir_by_obj_name(obj_name: str) -> str:
    """Get the directory path of the module containing the given object name."""
    module, _ = get_module_from_obj_name(obj_name)
    return os.path.dirname(inspect.getfile(module))


def is_top_level_function(obj: Any) -> bool:
    """Determine whether the given object is a top-level function, i.e., defined at module scope using 'def'."""
    return callable(obj) and obj.__name__ in sys.modules[obj.__module__].__dict__


def get_top_level_function_name(obj: Any) -> str:
    """Return the fully-qualified name of a top-level function."""
    assert is_top_level_function(obj)
    module = obj.__module__
    if module == '__main__':
        module = os.path.splitext(os.path.basename(sys.modules[module].__file__))[0]
    return module + "." + obj.__name__


# File system helpers
# ------------------------------------------------------------------------------------------

def list_dir_recursively_with_ignore(dir_path: str, ignores: List[str] = None, add_base_to_relative: bool = False) -> List[Tuple[str, str]]:
    """List all files recursively in a given directory while ignoring given file and directory names.
    Returns list of tuples containing both absolute and relative paths."""
    assert os.path.isdir(dir_path)
    base_name = os.path.basename(os.path.normpath(dir_path))

    if ignores is None:
        ignores = []

    result = []

    for root, dirs, files in os.walk(dir_path, topdown=True):
        for ignore_ in ignores:
            dirs_to_remove = [d for d in dirs if fnmatch.fnmatch(d, ignore_)]

            # dirs need to be edited in-place
            for d in dirs_to_remove:
                dirs.remove(d)

            files = [f for f in files if not fnmatch.fnmatch(f, ignore_)]

        absolute_paths = [os.path.join(root, f) for f in files]
        relative_paths = [os.path.relpath(p, dir_path) for p in absolute_paths]

        if add_base_to_relative:
            relative_paths = [os.path.join(base_name, p) for p in relative_paths]

        assert len(absolute_paths) == len(relative_paths)
        result += zip(absolute_paths, relative_paths)

    return result


def copy_files_and_create_dirs(files: List[Tuple[str, str]]) -> None:
    """Takes in a list of tuples of (src, dst) paths and copies files.
    Will create all necessary directories."""
    for file in files:
        target_dir_name = os.path.dirname(file[1])

        # will create all intermediate-level directories
        if not os.path.exists(target_dir_name):
            os.makedirs(target_dir_name)

        shutil.copyfile(file[0], file[1])


# URL helpers
# ------------------------------------------------------------------------------------------

def is_url(obj: Any, allow_file_urls: bool = False) -> bool:
    """Determine whether the given object is a valid URL string."""
    if not isinstance(obj, str) or not "://" in obj:
        return False
    if allow_file_urls and obj.startswith('file://'):
        return True
    try:
        res = requests.compat.urlparse(obj)
        if not res.scheme or not res.netloc or not "." in res.netloc:
            return False
        res = requests.compat.urlparse(requests.compat.urljoin(obj, "/"))
        if not res.scheme or not res.netloc or not "." in res.netloc:
            return False
    except:
        return False
    return True


def open_url(url: str, cache_dir: str = None, num_attempts: int = 10, verbose: bool = True, return_filename: bool = False, cache: bool = True) -> Any:
    """Download the given URL and return a binary-mode file object to access the data."""
    assert num_attempts >= 1
    assert not (return_filename and (not cache))

    # Doesn't look like an URL scheme so interpret it as a local filename.
    if not re.match('^[a-z]+://', url):
        return url if return_filename else open(url, "rb")

    # Handle file URLs. This code handles unusual file:// patterns that
    # arise on Windows:
    #
    # file:///c:/foo.txt
    #
    # which would translate to a local '/c:/foo.txt' filename that's
    # invalid. Drop the forward slash for such pathnames.
    #
    # If you touch this code path, you should test it on both Linux and
    # Windows.
    #
    # Some internet resources suggest using urllib.request.url2pathname() but
    # but that converts forward slashes to backslashes and this causes
    # its own set of problems.
    if url.startswith('file://'):
        filename = urllib.parse.urlparse(url).path
        if re.match(r'^/[a-zA-Z]:', filename):
            filename = filename[1:]
        return filename if return_filename else open(filename, "rb")

    assert is_url(url)

    # Lookup from cache.
    if cache_dir is None:
        cache_dir = make_cache_dir_path('downloads')

    url_md5 = hashlib.md5(url.encode("utf-8")).hexdigest()
    if cache:
        cache_files = glob.glob(os.path.join(cache_dir, url_md5 + "_*"))
        if len(cache_files) == 1:
            filename = cache_files[0]
            return filename if return_filename else open(filename, "rb")

    # Download.
    url_name = None
    url_data = None
    with requests.Session() as session:
        if verbose:
            print("Downloading %s ..." % url, end="", flush=True)
        for attempts_left in reversed(range(num_attempts)):
            try:
                with session.get(url) as res:
                    res.raise_for_status()
                    if len(res.content) == 0:
                        raise IOError("No data received")

                    if len(res.content) < 8192:
                        content_str = res.content.decode("utf-8")
                        if "download_warning" in res.headers.get("Set-Cookie", ""):
                            links = [html.unescape(link) for link in content_str.split('"') if "export=download" in link]
                            if len(links) == 1:
                                url = requests.compat.urljoin(url, links[0])
                                raise IOError("Google Drive virus checker nag")
                        if "Google Drive - Quota exceeded" in content_str:
                            raise IOError("Google Drive download quota exceeded -- please try again later")

                    match = re.search(r'filename="([^"]*)"', res.headers.get("Content-Disposition", ""))
                    url_name = match[1] if match else url
                    url_data = res.content
                    if verbose:
                        print(" done")
                    break
            except KeyboardInterrupt:
                raise
            except:
                if not attempts_left:
                    if verbose:
                        print(" failed")
                    raise
                if verbose:
                    print(".", end="", flush=True)

    # Save to cache.
    if cache:
        safe_name = re.sub(r"[^0-9a-zA-Z-._]", "_", url_name)
        cache_file = os.path.join(cache_dir, url_md5 + "_" + safe_name)
        temp_file = os.path.join(cache_dir, "tmp_" + uuid.uuid4().hex + "_" + url_md5 + "_" + safe_name)
        os.makedirs(cache_dir, exist_ok=True)
        with open(temp_file, "wb") as f:
            f.write(url_data)
        os.replace(temp_file, cache_file)  # atomic
        if return_filename:
            return cache_file

    # Return data as file object.
    assert not return_filename
    return io.BytesIO(url_data)
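
EasyDict in a few lines (a sketch; it assumes `src` is importable as a package from the repository root):

from src.dnnlib import EasyDict

cfg = EasyDict(lr=1e-4, batch=16)
cfg.num_gpus = 4                           # attribute writes go into the dict
print(cfg["lr"], cfg.batch, cfg.num_gpus)  # 0.0001 16 4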
src/infra/__init__.py
ADDED
File without changes
src/infra/experiments.yaml
ADDED
@@ -0,0 +1,60 @@

#----------------------------------------------------------------------------
# Here, we keep the experiments HPs in case we want to do mass-launching via SLURM
#----------------------------------------------------------------------------

mocogan_sg2:
  common_args:
    model: mocogan
    training.batch: 16
    dataset.max_num_frames: 32
  experiments:
    b16_mnf16:
      sampling: traditional_16
      dataset.max_num_frames: 16
      model.generator.motion.long_history: false

#----------------------------------------------------------------------------

ffs:
  common_args:
    sampling.num_frames_per_video: 3
  experiments:
    mnf1024_sfpm32_minperiod16: {}
    mnf1024_sfpm32_minperiod32:
      model.generator.time_enc.min_period_len: 32

#----------------------------------------------------------------------------

sky_timelapse:
  common_args:
    sampling.num_frames_per_video: 3
  experiments:
    mnf1024_sfpm32_minperiod16: {}
    mnf1024_sfpm256_minperiod256:
      model.generator.motion.motion_z_distance: 256
      model.generator.time_enc.min_period_len: 256

#----------------------------------------------------------------------------

highres:
  common_args:
    training.metrics: \"fvd2048_16f,fvd2048_128f_subsample,fid50k_full\"
    training.batch: 16
    sampling.num_frames_per_video: 2
  experiments:
    mnf1024_sfpm32_minperiod16_batch16: {}
    mnf32_sfpm32_minperiod16_batch16:
      dataset.max_num_frames: 32

#----------------------------------------------------------------------------

cond_ablation_ffs:
  common_args:
    sampling.num_frames_per_video: 3
  experiments:
    hyper_mod:
      model.discriminator.hyper_type: hyper
    without_proj_cond:
      model.discriminator.dummy_c: true

#----------------------------------------------------------------------------
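
A sketch that enumerates the experiment grids defined above (the path is relative to the repository root):

from omegaconf import OmegaConf

cfg = OmegaConf.load("src/infra/experiments.yaml")
for group, spec in cfg.items():
    print(group, "->", list(spec.experiments.keys()))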
src/infra/launch.py
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+"""
+Run a __reproducible__ experiment on __allocated__ resources.
+It submits slurm job(s) with the given hyperparams, which will then execute `slurm_job.py`.
+This is the main entry point.
+"""
+
+import os
+os.environ["HYDRA_FULL_ERROR"] = "1"
+
+import subprocess
+import re
+
+import hydra
+from omegaconf import DictConfig, OmegaConf
+from pathlib import Path
+
+from utils import create_project_dir, recursive_instantiate
+
+#----------------------------------------------------------------------------
+
+HYDRA_ARGS = "hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled"
+
+#----------------------------------------------------------------------------
+
+@hydra.main(config_path="../../configs", config_name="config.yaml")
+def main(cfg: DictConfig):
+    recursive_instantiate(cfg)
+    OmegaConf.set_struct(cfg, True)
+    cfg.env.project_path = str(cfg.env.project_path) # This is needed to evaluate ${hydra:runtime.cwd}
+
+    before_train_cmd = '\n'.join(cfg.env.before_train_commands)
+    before_train_cmd = before_train_cmd + '\n' if len(before_train_cmd) > 0 else ''
+    torch_extensions_dir = os.environ.get('TORCH_EXTENSIONS_DIR', cfg.env.torch_extensions_dir)
+    training_cmd = f'{before_train_cmd}TORCH_EXTENSIONS_DIR={torch_extensions_dir} cd {cfg.project_release_dir} && {cfg.env.python_bin} src/train.py {HYDRA_ARGS}'
+    quiet = cfg.get('quiet', False)
+    training_cmd_save_path = os.path.join(cfg.project_release_dir, 'training_cmd.sh')
+    cfg_save_path = os.path.join(cfg.project_release_dir, 'experiment_config.yaml')
+
+    if not quiet:
+        print('<=== TRAINING COMMAND START ===>')
+        print(training_cmd)
+        print('<=== TRAINING COMMAND END ===>')
+
+    is_running_from_scratch = True
+
+    if cfg.training.resume == "latest" and os.path.isdir(cfg.project_release_dir) and os.path.isfile(training_cmd_save_path) and os.path.isfile(cfg_save_path):
+        is_running_from_scratch = False
+        if not quiet:
+            print("We are going to resume the training and the experiment already exists. " \
+                  "That's why the provided config/training_cmd are discarded and the project dir is not created.")
+
+    if is_running_from_scratch and not cfg.print_only:
+        create_project_dir(
+            cfg.project_release_dir,
+            cfg.env.objects_to_copy,
+            cfg.env.symlinks_to_create,
+            quiet=quiet,
+            ignore_uncommited_changes=cfg.get('ignore_uncommited_changes', False),
+            overwrite=cfg.get('overwrite', False))
+
+        with open(training_cmd_save_path, 'w') as f:
+            f.write(training_cmd + '\n')
+        if not quiet:
+            print(f'Saved training command in {training_cmd_save_path}')
+
+        with open(cfg_save_path, 'w') as f:
+            OmegaConf.save(config=cfg, f=f)
+        if not quiet:
+            print(f'Saved config in {cfg_save_path}')
+
+    if not cfg.print_only:
+        os.chdir(cfg.project_release_dir)
+
+    if cfg.slurm:
+        assert Path(cfg.dataset.path_for_slurm_job).exists()
+
+        curr_job_id = None
+
+        for i in range(cfg.job_sequence_length):
+            if i == 0:
+                deps_args_str = ''
+            else:
+                deps_args_str = f'--dependency=afterany:{curr_job_id}'
+
+            # Submitting the slurm job
+            qos_arg_str = f'--account {os.environ["PRIORITY_BOOST_ACC"]}' if cfg.use_qos else ''
+            output_file_arg_str = f'--output {cfg.project_release_dir}/slurm_{i}.log'
+            submit_job_cmd = f'sbatch {cfg.sbatch_args_str} {output_file_arg_str} {qos_arg_str} --export=ALL,{cfg.env_args_str} {deps_args_str} src/infra/slurm_job_proxy.sh'
+
+            if cfg.print_only:
+                print(submit_job_cmd)
+                curr_job_id = "DUMMY_JOB_ID"
+            else:
+                result = subprocess.run(submit_job_cmd, stdout=subprocess.PIPE, shell=True)
+                output_str = result.stdout.decode("utf-8").strip("\n") # It has a format of "Submitted batch job 17033559"
+                if not quiet or i == 0:
+                    print(output_str)
+                curr_job_id = re.findall(r"^Submitted\ batch\ job\ \d{5,8}$", output_str)
+                assert len(curr_job_id) == 1, f"Bad output: `{output_str}`"
+                curr_job_id = int(curr_job_id[0][len('Submitted batch job '):])
+    else:
+        assert cfg.job_sequence_length == 1, "You can use a job sequence only when running via slurm."
+        if cfg.print_only:
+            print(training_cmd)
+        else:
+            os.system(training_cmd)
+
+#----------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    main()
+
+#----------------------------------------------------------------------------
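
For reference, a minimal sketch of how the `sbatch` output is turned into a job id in the loop above (the id is the hypothetical example from the code comment):

import re

output_str = "Submitted batch job 17033559"  # example sbatch output (hypothetical id)
matches = re.findall(r"^Submitted\ batch\ job\ \d{5,8}$", output_str)
assert len(matches) == 1
curr_job_id = int(matches[0][len('Submitted batch job '):])  # -> 17033559

The extracted id is then fed into `--dependency=afterany:{curr_job_id}` so that each job in the sequence starts only after the previous one terminates.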
src/infra/slurm_batch_launch.py
ADDED
@@ -0,0 +1,96 @@
+import os
+import argparse
+import copy
+from typing import List, Dict, Optional
+from omegaconf import OmegaConf, DictConfig
+from src.infra.utils import cfg_to_args_str
+
+#----------------------------------------------------------------------------
+
+HYDRA_ARGS = "hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled"
+
+#----------------------------------------------------------------------------
+
+def batch_launch(launcher: str, experiments_dir: os.PathLike, cfg: DictConfig, datasets: List[str], print_only: bool, time: str, use_qos: bool=False, other_args: Dict={}, num_gpus: int=4, *args, **kwargs):
+    for dataset in datasets:
+        for exp_args in construct_experiments_args(cfg, *args, **kwargs):
+            exp_args['sbatch_args.time'] = time
+            exp_args['experiments_dir'] = experiments_dir
+            exp_args['dataset'] = dataset
+            exp_args['env'] = 'ibex'
+            exp_args['use_qos'] = use_qos
+            exp_args = {**exp_args, **other_args}
+            curr_exp_args_str = cfg_to_args_str(exp_args, use_dashes=False)
+            launching_command = f"{launcher} num_gpus={num_gpus} {curr_exp_args_str}"
+
+            if print_only:
+                os.makedirs(exp_args['experiments_dir'], exist_ok=True)
+                print(launching_command)
+            else:
+                os.system(launching_command)
+
+#----------------------------------------------------------------------------
+
+def construct_experiments_args(cfg: DictConfig, experiments_list: Optional[List[str]]=None, suffix: str="") -> List[Dict]:
+    args_dicts = []
+    common_cfg = cfg.get('common_args', {})
+
+    for exp_name, exp_cfg in to_dict(cfg.experiments).items():
+        if experiments_list is not None and exp_name not in experiments_list:
+            continue
+        curr_exp_cfg = {**copy.deepcopy(to_dict(common_cfg)), **to_dict(exp_cfg)}
+        curr_exp_cfg['exp_suffix'] = f'{exp_name}{suffix}'
+        args_dicts.append(curr_exp_cfg)
+
+    return args_dicts
+
+#----------------------------------------------------------------------------
+
+def to_dict(cfg) -> Dict:
+    return OmegaConf.to_container(OmegaConf.create({**cfg}))
+
+#----------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Experiments launcher")
+    parser.add_argument('-e', '--series_name', type=str, required=True, help="Which experiments series to launch?")
+    parser.add_argument('-d', '--datasets', required=True, type=str, help='Comma-separated list of datasets')
+    parser.add_argument('-p', '--print_only', action='store_true', help='Just print commands and exit?')
+    parser.add_argument('-t', '--time', type=str, default='1-0', help='Which time to specify for the sbatch command?')
+    parser.add_argument('-q', '--use_qos', action='store_true', help='Should we use QoS to launch jobs?')
+    parser.add_argument('--experiments_list', type=str, help='Should we run only some specific experiments from this experiments series?')
+    parser.add_argument('--other_args', type=str, default="", help='Additional arguments for the experiments')
+    parser.add_argument('--suffix', type=str, default="", help='Additional suffix for the experiments')
+    parser.add_argument('--num_gpus', type=int, default=4, help='Number of GPUs to use per each experiment')
+    parser.add_argument('--project_dir', type=str, default=os.getcwd(), help='Project directory path')
+    parser.add_argument('--project_dir_for_exps_cfg', type=str, help="Overwrite the project directory to use for experiments.yaml. Useful for debugging the config.")
+    args = parser.parse_args()
+
+    os.chdir(args.project_dir)
+    user = os.environ.get('USER', 'unknown')
+    python_bin = os.path.join(args.project_dir, 'env/bin/python')
+    launcher = f"{python_bin} src/infra/launch.py {HYDRA_ARGS} +quiet=true slurm=true"
+    experiments_dir = f'experiments/{user}/{args.series_name}'
+    exps_cfg_path = os.path.join(args.project_dir if args.project_dir_for_exps_cfg is None else args.project_dir_for_exps_cfg, 'src/infra/experiments.yaml')
+    all_exp_series = OmegaConf.load(exps_cfg_path)
+    assert args.series_name in all_exp_series, f"Experiments series not found: {args.series_name}"
+    cfg = all_exp_series[args.series_name]
+    datasets = args.datasets.split(',')
+    experiments_list = None if args.experiments_list is None else args.experiments_list.split(',')
+    other_args = {kv.split('=')[0]: kv.split('=')[1] for kv in args.other_args.split(',') if len(kv.split('=')) == 2}
+
+    batch_launch(
+        launcher=launcher,
+        experiments_dir=experiments_dir,
+        cfg=cfg,
+        datasets=datasets,
+        print_only=args.print_only,
+        time=args.time,
+        use_qos=args.use_qos,
+        experiments_list=experiments_list,
+        other_args=other_args,
+        suffix=args.suffix,
+        num_gpus=args.num_gpus,
+    )
+
+#----------------------------------------------------------------------------
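
Given the argparse interface above, a typical (hypothetical) invocation would look like:

python src/infra/slurm_batch_launch.py -e my_series -d ffs_256 -t 1-0 --num_gpus 4 -p

where `my_series` must be a key in `src/infra/experiments.yaml` (the series and dataset names here are made up) and `-p` only prints the composed `launch.py` commands instead of executing them.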
src/infra/slurm_job.py
ADDED
@@ -0,0 +1,46 @@
+"""
+Must be launched from the released project dir
+"""
+
+import os
+import time
+import random
+import subprocess
+from shutil import copyfile
+
+import hydra
+from omegaconf import DictConfig
+
+# Unfortunately (AFAIK), we cannot pass arguments normally (to parse them with argparse),
+# so we read them from the environment instead.
+SLURM_JOB_ID = os.getenv('SLURM_JOB_ID')
+project_dir = os.getenv('project_dir')
+python_bin = os.getenv('python_bin')
+
+# Printing the environment
+print('PROJECT DIR:', project_dir)
+print(f'SLURM_JOB_ID: {SLURM_JOB_ID}')
+print('HOSTNAME:', subprocess.run(['hostname'], stdout=subprocess.PIPE).stdout.decode('utf-8'))
+print(subprocess.run([os.path.join(os.path.dirname(python_bin), 'gpustat')], stdout=subprocess.PIPE).stdout.decode('utf-8'))
+
+@hydra.main(config_name=os.path.join(project_dir, 'experiment_config.yaml'))
+def main(cfg: DictConfig):
+    os.chdir(project_dir)
+
+    target_data_dir_base = os.path.dirname(cfg.dataset.path)
+    if os.path.islink(target_data_dir_base):
+        os.makedirs(os.readlink(target_data_dir_base), exist_ok=True)
+    else:
+        os.makedirs(target_data_dir_base, exist_ok=True)
+
+    copyfile(cfg.dataset.path_for_slurm_job, cfg.dataset.path)
+    print(f'Copied the data: {cfg.dataset.path_for_slurm_job} => {cfg.dataset.path}. Starting the training...')
+
+    training_cmd = open('training_cmd.sh').read()
+    print('<=== TRAINING COMMAND ===>')
+    print(training_cmd)
+    os.system(training_cmd)
+
+
+if __name__ == "__main__":
+    main()
src/infra/slurm_job_proxy.sh
ADDED
@@ -0,0 +1,4 @@
+#!/bin/bash
+# We need this proxy so as not to put a shebang into `slurm_job.py`.
+# We cannot put a shebang there since we use different python executables for it.
+$python_bin $python_script
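
For context: `launch.py` passes the proxy's variables through `sbatch --export=ALL,{cfg.env_args_str}`. Judging from `slurm_job.py`, which reads `project_dir` and `python_bin` back from the environment, `env_args_str` presumably carries at least `python_bin`, `project_dir`, and a `python_script` pointing at `src/infra/slurm_job.py`; a hypothetical submission would look like:

sbatch --export=ALL,project_dir=...,python_bin=...,python_script=src/infra/slurm_job.py src/infra/slurm_job_proxy.sh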
src/infra/utils.py
ADDED
@@ -0,0 +1,140 @@
+import os
+import shutil
+import subprocess
+from distutils.dir_util import copy_tree
+from shutil import copyfile
+from typing import List, Optional
+
+from hydra.utils import instantiate
+import click
+import git
+from omegaconf import DictConfig
+
+#----------------------------------------------------------------------------
+
+def copy_objects(target_dir: os.PathLike, objects_to_copy: List[os.PathLike]):
+    for src_path in objects_to_copy:
+        trg_path = os.path.join(target_dir, os.path.basename(src_path))
+
+        if os.path.islink(src_path):
+            os.symlink(os.readlink(src_path), trg_path)
+        elif os.path.isfile(src_path):
+            copyfile(src_path, trg_path)
+        elif os.path.isdir(src_path):
+            copy_tree(src_path, trg_path)
+        else:
+            raise NotImplementedError(f"Unknown object type: {src_path}")
+
+#----------------------------------------------------------------------------
+
+def create_symlinks(target_dir: os.PathLike, symlinks_to_create: List[os.PathLike]):
+    """
+    Creates symlinks to the given paths
+    """
+    for src_path in symlinks_to_create:
+        trg_path = os.path.join(target_dir, os.path.basename(src_path))
+
+        if os.path.islink(src_path):
+            # Let's not create symlinks to symlinks,
+            # since dropping the current symlink would break the experiment.
+            os.symlink(os.readlink(src_path), trg_path)
+        else:
+            print(f'Creating a symlink to {src_path}, so try not to delete it accidentally!')
+            os.symlink(src_path, trg_path)
+
+#----------------------------------------------------------------------------
+
+def is_git_repo(path: os.PathLike):
+    try:
+        _ = git.Repo(path).git_dir
+        return True
+    except git.exc.InvalidGitRepositoryError:
+        return False
+
+#----------------------------------------------------------------------------
+
+def create_project_dir(
+        project_dir: os.PathLike,
+        objects_to_copy: List[os.PathLike],
+        symlinks_to_create: List[os.PathLike],
+        quiet: bool=False,
+        ignore_uncommited_changes: bool=False,
+        overwrite: bool=False):
+
+    if is_git_repo(os.getcwd()) and are_there_uncommitted_changes():
+        if ignore_uncommited_changes or click.confirm("There are uncommitted changes. Continue?", default=False):
+            pass
+        else:
+            raise PermissionError("Cannot create a dir when there are uncommitted changes")
+
+    if os.path.exists(project_dir):
+        if overwrite or click.confirm(f'Dir {project_dir} already exists. Overwrite it?', default=False):
+            shutil.rmtree(project_dir)
+        else:
+            print('User refused to delete an existing project dir.')
+            raise PermissionError("There is an existing dir and I cannot delete it.")
+
+    os.makedirs(project_dir)
+    copy_objects(project_dir, objects_to_copy)
+    create_symlinks(project_dir, symlinks_to_create)
+
+    if not quiet:
+        print(f'Created a project dir: {project_dir}')
+
+#----------------------------------------------------------------------------
+
+def get_git_hash() -> Optional[str]:
+    if not is_git_repo(os.getcwd()):
+        return None
+
+    try:
+        return subprocess \
+            .check_output(['git', 'rev-parse', '--short', 'HEAD']) \
+            .decode("utf-8") \
+            .strip()
+    except Exception:
+        return None
+
+#----------------------------------------------------------------------------
+
+# def get_experiment_path(master_dir: os.PathLike, experiment_name: str) -> os.PathLike:
+#     return os.path.join(master_dir, f"{experiment_name}-{get_git_hash()}")
+
+#----------------------------------------------------------------------------
+
+def get_git_hash_suffix() -> str:
+    git_hash: Optional[str] = get_git_hash()
+    git_hash_suffix = "-nogit" if git_hash is None else f"-{git_hash}"
+
+    return git_hash_suffix
+
+#----------------------------------------------------------------------------
+
+def are_there_uncommitted_changes() -> bool:
+    return len(subprocess.check_output('git status -s'.split()).decode("utf-8")) > 0
+
+#----------------------------------------------------------------------------
+
+def cfg_to_args_str(cfg: DictConfig, use_dashes=True) -> str:
+    dashes = '--' if use_dashes else ''
+
+    return ' '.join([f'{dashes}{p}={cfg[p]}' for p in cfg])
+
+#----------------------------------------------------------------------------
+
+def recursive_instantiate(cfg: DictConfig):
+    for key in cfg:
+        if isinstance(cfg[key], DictConfig):
+            if '_target_' in cfg[key]:
+                cfg[key] = instantiate(cfg[key])
+            else:
+                recursive_instantiate(cfg[key])
+
+#----------------------------------------------------------------------------
+
+def num_gpus_to_mem(num_gpus: int, mem_per_gpu: int = 64) -> str:
+    # Doing it here since the hydra config cannot do formatting for ${...}
+    return f"{num_gpus * mem_per_gpu}G"
+
+#----------------------------------------------------------------------------
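
A quick illustration of `cfg_to_args_str`, which `slurm_batch_launch.py` uses to flatten an experiment config into a command line (the keys and values here are made up):

from omegaconf import OmegaConf
from src.infra.utils import cfg_to_args_str

cfg = OmegaConf.create({'dataset': 'ffs', 'num_gpus': 4})  # hypothetical values
print(cfg_to_args_str(cfg))                                # --dataset=ffs --num_gpus=4
print(cfg_to_args_str(cfg, use_dashes=False))              # dataset=ffs num_gpus=4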
src/metrics/__init__.py
ADDED
@@ -0,0 +1,9 @@
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# NVIDIA CORPORATION and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto. Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA CORPORATION is strictly prohibited.
+
+# empty
src/metrics/frechet_inception_distance.py
ADDED
@@ -0,0 +1,54 @@
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# NVIDIA CORPORATION and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto. Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA CORPORATION is strictly prohibited.
+
+"""Frechet Inception Distance (FID) from the paper
+"GANs trained by a two time-scale update rule converge to a local Nash
+equilibrium". Matches the original implementation by Heusel et al. at
+https://github.com/bioinf-jku/TTUR/blob/master/fid.py"""
+
+import numpy as np
+import scipy.linalg
+from . import metric_utils
+
+NUM_FRAMES_IN_BATCH = {128: 32, 256: 32, 512: 8, 1024: 2}
+
+#----------------------------------------------------------------------------
+
+def compute_fid(opts, max_real, num_gen):
+    # Direct TorchScript translation of http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
+    detector_url = 'https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/metrics/inception-2015-12-05.pkl'
+    detector_kwargs = dict(return_features=True) # Return raw features before the softmax layer.
+
+    batch_size = NUM_FRAMES_IN_BATCH[opts.dataset_kwargs.resolution]
+
+    mu_real, sigma_real = metric_utils.compute_feature_stats_for_dataset(
+        opts=opts, detector_url=detector_url, detector_kwargs=detector_kwargs,
+        rel_lo=0, rel_hi=0, capture_mean_cov=True, max_items=max_real, use_image_dataset=True).get_mean_cov()
+
+    if opts.generator_as_dataset:
+        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_dataset
+        gen_opts = metric_utils.rewrite_opts_for_gen_dataset(opts)
+        gen_kwargs = dict(use_image_dataset=True)
+    else:
+        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_generator
+        gen_opts = opts
+        gen_kwargs = dict()
+
+    mu_gen, sigma_gen = compute_gen_stats_fn(
+        opts=gen_opts, detector_url=detector_url, detector_kwargs=detector_kwargs, batch_size=batch_size,
+        rel_lo=0, rel_hi=1, capture_mean_cov=True, max_items=num_gen, **gen_kwargs).get_mean_cov()
+
+    if opts.rank != 0:
+        return float('nan')
+
+    m = np.square(mu_gen - mu_real).sum()
+    s, _ = scipy.linalg.sqrtm(np.dot(sigma_gen, sigma_real), disp=False) # pylint: disable=no-member
+    fid = np.real(m + np.trace(sigma_gen + sigma_real - s * 2))
+    return float(fid)
+
+#----------------------------------------------------------------------------
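
For reference, the last four lines of `compute_fid` evaluate the closed-form Fréchet distance between two Gaussians, FID = ||mu_r − mu_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)); a self-contained sketch of the same computation:

import numpy as np
import scipy.linalg

def fid_from_stats(mu1, sigma1, mu2, sigma2):
    m = np.square(mu1 - mu2).sum()                              # squared distance between the means
    s, _ = scipy.linalg.sqrtm(np.dot(sigma1, sigma2), disp=False)  # matrix square root of the covariance product
    return float(np.real(m + np.trace(sigma1 + sigma2 - s * 2)))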
src/metrics/frechet_video_distance.py
ADDED
@@ -0,0 +1,59 @@
+"""
+Frechet Video Distance (FVD). Matches the original tensorflow implementation from
+https://github.com/google-research/google-research/blob/master/frechet_video_distance/frechet_video_distance.py
+up to the upsampling operation. Note that this tf.hub I3D model is different from the one released in the I3D repo.
+"""
+
+import copy
+import numpy as np
+import scipy.linalg
+from . import metric_utils
+
+#----------------------------------------------------------------------------
+
+NUM_FRAMES_IN_BATCH = {128: 128, 256: 128, 512: 64, 1024: 32}
+
+#----------------------------------------------------------------------------
+
+def compute_fvd(opts, max_real: int, num_gen: int, num_frames: int, subsample_factor: int=1):
+    # Perfectly reproduced torchscript version of the I3D model, trained on Kinetics-400, used here:
+    # https://github.com/google-research/google-research/blob/master/frechet_video_distance/frechet_video_distance.py
+    # Note that the weights on tf.hub (used in the script above) differ from the original released weights
+    detector_url = 'https://www.dropbox.com/s/ge9e5ujwgetktms/i3d_torchscript.pt?dl=1'
+    detector_kwargs = dict(rescale=True, resize=True, return_features=True) # Return raw features before the softmax layer.
+
+    opts = copy.deepcopy(opts)
+    opts.dataset_kwargs.load_n_consecutive = num_frames
+    opts.dataset_kwargs.subsample_factor = subsample_factor
+    opts.dataset_kwargs.discard_short_videos = True
+    batch_size = NUM_FRAMES_IN_BATCH[opts.dataset_kwargs.resolution] // num_frames
+
+    mu_real, sigma_real = metric_utils.compute_feature_stats_for_dataset(
+        opts=opts, detector_url=detector_url, detector_kwargs=detector_kwargs, rel_lo=0, rel_hi=0,
+        capture_mean_cov=True, max_items=max_real, temporal_detector=True, batch_size=batch_size).get_mean_cov()
+
+    if opts.generator_as_dataset:
+        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_dataset
+        gen_opts = metric_utils.rewrite_opts_for_gen_dataset(opts)
+        gen_opts.dataset_kwargs.load_n_consecutive = num_frames
+        gen_opts.dataset_kwargs.load_n_consecutive_random_offset = False
+        gen_opts.dataset_kwargs.subsample_factor = subsample_factor
+        gen_kwargs = dict()
+    else:
+        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_generator
+        gen_opts = opts
+        gen_kwargs = dict(num_video_frames=num_frames, subsample_factor=subsample_factor)
+
+    mu_gen, sigma_gen = compute_gen_stats_fn(
+        opts=gen_opts, detector_url=detector_url, detector_kwargs=detector_kwargs, rel_lo=0, rel_hi=1, capture_mean_cov=True,
+        max_items=num_gen, temporal_detector=True, batch_size=batch_size, **gen_kwargs).get_mean_cov()
+
+    if opts.rank != 0:
+        return float('nan')
+
+    m = np.square(mu_gen - mu_real).sum()
+    s, _ = scipy.linalg.sqrtm(np.dot(sigma_gen, sigma_real), disp=False) # pylint: disable=no-member
+    fid = np.real(m + np.trace(sigma_gen + sigma_real - s * 2))
+    return float(fid)
+
+#----------------------------------------------------------------------------
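
A note on the batch sizing above: unlike FID, which batches individual frames, dividing the per-resolution frame budget by the clip length keeps the total number of frames per batch roughly constant across resolutions, e.g.:

NUM_FRAMES_IN_BATCH = {128: 128, 256: 128, 512: 64, 1024: 32}
batch_size = NUM_FRAMES_IN_BATCH[256] // 16   # -> 8 videos of 16 frames = 128 frames per batch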
src/metrics/inception_score.py
ADDED
@@ -0,0 +1,47 @@
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# NVIDIA CORPORATION and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto. Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA CORPORATION is strictly prohibited.
+
+"""Inception Score (IS) from the paper "Improved techniques for training
+GANs". Matches the original implementation by Salimans et al. at
+https://github.com/openai/improved-gan/blob/master/inception_score/model.py"""
+
+import numpy as np
+from . import metric_utils
+
+#----------------------------------------------------------------------------
+
+def compute_is(opts, num_gen, num_splits):
+    # Direct TorchScript translation of http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
+    detector_url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt'
+    detector_kwargs = dict(no_output_bias=True) # Match the original implementation by not applying bias in the softmax layer.
+
+    if opts.generator_as_dataset:
+        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_dataset
+        gen_opts = metric_utils.rewrite_opts_for_gen_dataset(opts)
+        gen_kwargs = dict(use_image_dataset=True)
+    else:
+        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_generator
+        gen_opts = opts
+        gen_kwargs = dict()
+
+    gen_probs = compute_gen_stats_fn(
+        opts=gen_opts, detector_url=detector_url, detector_kwargs=detector_kwargs,
+        capture_all=True, max_items=num_gen, **gen_kwargs).get_all()
+
+    if opts.rank != 0:
+        return float('nan'), float('nan')
+
+    scores = []
+    for i in range(num_splits):
+        part = gen_probs[i * num_gen // num_splits : (i + 1) * num_gen // num_splits]
+        kl = part * (np.log(part) - np.log(np.mean(part, axis=0, keepdims=True)))
+        kl = np.mean(np.sum(kl, axis=1))
+        scores.append(np.exp(kl))
+    return float(np.mean(scores)), float(np.std(scores))
+
+#----------------------------------------------------------------------------
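
Each split above evaluates exp(E_x[KL(p(y|x) || p(y))]), with the marginal p(y) estimated as the per-split mean of the predicted class probabilities; a self-contained sketch of one split, with random stand-in probabilities (the class count is arbitrary):

import numpy as np

part = np.random.dirichlet(np.ones(1000), size=5000)   # stand-in for detector softmax outputs p(y|x)
kl = part * (np.log(part) - np.log(np.mean(part, axis=0, keepdims=True)))
score = np.exp(np.mean(np.sum(kl, axis=1)))            # this split's score (close to 1 for random probs)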
src/metrics/kernel_inception_distance.py
ADDED
@@ -0,0 +1,46 @@
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# NVIDIA CORPORATION and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto. Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA CORPORATION is strictly prohibited.
+
+"""Kernel Inception Distance (KID) from the paper "Demystifying MMD
+GANs". Matches the original implementation by Binkowski et al. at
+https://github.com/mbinkowski/MMD-GAN/blob/master/gan/compute_scores.py"""
+
+import numpy as np
+from . import metric_utils
+
+#----------------------------------------------------------------------------
+
+def compute_kid(opts, max_real, num_gen, num_subsets, max_subset_size):
+    # Direct TorchScript translation of http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
+    detector_url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt'
+    detector_kwargs = dict(return_features=True) # Return raw features before the softmax layer.
+
+    real_features = metric_utils.compute_feature_stats_for_dataset(
+        opts=opts, detector_url=detector_url, detector_kwargs=detector_kwargs,
+        rel_lo=0, rel_hi=0, capture_all=True, max_items=max_real, use_image_dataset=True).get_all()
+
+    gen_features = metric_utils.compute_feature_stats_for_generator(
+        opts=opts, detector_url=detector_url, detector_kwargs=detector_kwargs,
+        rel_lo=0, rel_hi=1, capture_all=True, max_items=num_gen).get_all()
+
+    if opts.rank != 0:
+        return float('nan')
+
+    n = real_features.shape[1]
+    m = min(min(real_features.shape[0], gen_features.shape[0]), max_subset_size)
+    t = 0
+    for _subset_idx in range(num_subsets):
+        x = gen_features[np.random.choice(gen_features.shape[0], m, replace=False)]
+        y = real_features[np.random.choice(real_features.shape[0], m, replace=False)]
+        a = (x @ x.T / n + 1) ** 3 + (y @ y.T / n + 1) ** 3
+        b = (x @ y.T / n + 1) ** 3
+        t += (a.sum() - np.diag(a).sum()) / (m - 1) - b.sum() * 2 / m
+    kid = t / num_subsets / m
+    return float(kid) * 1000.0
+
+#----------------------------------------------------------------------------
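
The loop above is a subset-wise unbiased MMD² estimator with the cubic polynomial kernel k(x, y) = (x·y / n + 1)³, where n is the feature dimension; one subset in isolation (the dimensions here are made up, and random features stand in for detector outputs):

import numpy as np

n, m = 2048, 100                                     # hypothetical feature dim / subset size
x = np.random.randn(m, n)                            # stand-in for generated features
y = np.random.randn(m, n)                            # stand-in for real features
a = (x @ x.T / n + 1) ** 3 + (y @ y.T / n + 1) ** 3  # within-set kernel sums
b = (x @ y.T / n + 1) ** 3                           # cross-set kernel
mmd2 = (a.sum() - np.diag(a).sum()) / (m - 1) - b.sum() * 2 / m

`compute_kid` then averages this quantity over `num_subsets`, divides by `m`, and reports it scaled by 1000.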
src/metrics/metric_main.py
ADDED
@@ -0,0 +1,154 @@
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# NVIDIA CORPORATION and its licensors retain all intellectual property
+# and proprietary rights in and to this software, related documentation
+# and any modifications thereto. Any use, reproduction, disclosure or
+# distribution of this software and related documentation without an express
+# license agreement from NVIDIA CORPORATION is strictly prohibited.
+
+import os
+import time
+import json
+import torch
+import numpy as np
+from src import dnnlib
+
+from . import metric_utils
+from . import frechet_inception_distance
+from . import kernel_inception_distance
+from . import inception_score
+from . import video_inception_score
+from . import frechet_video_distance
+
+#----------------------------------------------------------------------------
+
+_metric_dict = dict() # name => fn
+
+def register_metric(fn):
+    assert callable(fn)
+    _metric_dict[fn.__name__] = fn
+    return fn
+
+def is_valid_metric(metric):
+    return metric in _metric_dict
+
+def list_valid_metrics():
+    return list(_metric_dict.keys())
+
+def is_power_of_two(n: int) -> bool:
+    return (n & (n-1) == 0) and n != 0
+
+#----------------------------------------------------------------------------
+
+def calc_metric(metric, num_runs: int=1, **kwargs): # See metric_utils.MetricOptions for the full list of arguments.
+    assert is_valid_metric(metric)
+    opts = metric_utils.MetricOptions(**kwargs)
+
+    # Calculate.
+    start_time = time.time()
+    all_runs_results = [_metric_dict[metric](opts) for _ in range(num_runs)]
+    total_time = time.time() - start_time
+
+    # Broadcast results.
+    for results in all_runs_results:
+        for key, value in list(results.items()):
+            if opts.num_gpus > 1:
+                value = torch.as_tensor(value, dtype=torch.float64, device=opts.device)
+                torch.distributed.broadcast(tensor=value, src=0)
+                value = float(value.cpu())
+            results[key] = value
+
+    if num_runs > 1:
+        results = {f'{key}_run{i+1:02d}': value for i, results in enumerate(all_runs_results) for key, value in results.items()}
+        for key, value in all_runs_results[0].items():
+            all_runs_values = [r[key] for r in all_runs_results]
+            results[f'{key}_mean'] = np.mean(all_runs_values)
+            results[f'{key}_std'] = np.std(all_runs_values)
+    else:
+        results = all_runs_results[0]
+
+    # Decorate with metadata.
+    return dnnlib.EasyDict(
+        results = dnnlib.EasyDict(results),
+        metric = metric,
+        total_time = total_time,
+        total_time_str = dnnlib.util.format_time(total_time),
+        num_gpus = opts.num_gpus,
+    )
+
+#----------------------------------------------------------------------------
+
+def report_metric(result_dict, run_dir=None, snapshot_pkl=None):
+    metric = result_dict['metric']
+    assert is_valid_metric(metric)
+    if run_dir is not None and snapshot_pkl is not None:
+        snapshot_pkl = os.path.relpath(snapshot_pkl, run_dir)
+
+    jsonl_line = json.dumps(dict(result_dict, snapshot_pkl=snapshot_pkl, timestamp=time.time()))
+    print(jsonl_line)
+    if run_dir is not None and os.path.isdir(run_dir):
+        with open(os.path.join(run_dir, f'metric-{metric}.jsonl'), 'at') as f:
+            f.write(jsonl_line + '\n')
+
+#----------------------------------------------------------------------------
+# Primary metrics.
+
+@register_metric
+def fid50k_full(opts):
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    fid = frechet_inception_distance.compute_fid(opts, max_real=None, num_gen=50000)
+    return dict(fid50k_full=fid)
+
+@register_metric
+def kid50k_full(opts):
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    kid = kernel_inception_distance.compute_kid(opts, max_real=1000000, num_gen=50000, num_subsets=100, max_subset_size=1000)
+    return dict(kid50k_full=kid)
+
+@register_metric
+def is50k(opts):
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    mean, std = inception_score.compute_is(opts, num_gen=50000, num_splits=10)
+    return dict(is50k_mean=mean, is50k_std=std)
+
+@register_metric
+def fvd2048_16f(opts):
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    fvd = frechet_video_distance.compute_fvd(opts, max_real=2048, num_gen=2048, num_frames=16)
+    return dict(fvd2048_16f=fvd)
+
+@register_metric
+def fvd2048_128f(opts):
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    fvd = frechet_video_distance.compute_fvd(opts, max_real=2048, num_gen=2048, num_frames=128)
+    return dict(fvd2048_128f=fvd)
+
+@register_metric
+def fvd2048_128f_subsample8f(opts):
+    """Similar to `fvd2048_128f`, but we sample every 8th frame"""
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    fvd = frechet_video_distance.compute_fvd(opts, max_real=2048, num_gen=2048, num_frames=16, subsample_factor=8)
+    return dict(fvd2048_128f_subsample8f=fvd)
+
+@register_metric
+def isv2048_ucf(opts):
+    opts.dataset_kwargs.update(max_size=None, xflip=False)
+    mean, std = video_inception_score.compute_isv(opts, num_gen=2048, num_splits=10, backbone='c3d_ucf101')
+    return dict(isv2048_ucf_mean=mean, isv2048_ucf_std=std)
+
+#----------------------------------------------------------------------------
+# Legacy metrics.
+
+@register_metric
+def fid50k(opts):
+    opts.dataset_kwargs.update(max_size=None)
+    fid = frechet_inception_distance.compute_fid(opts, max_real=50000, num_gen=50000)
+    return dict(fid50k=fid)
+
+@register_metric
+def kid50k(opts):
+    opts.dataset_kwargs.update(max_size=None)
+    kid = kernel_inception_distance.compute_kid(opts, max_real=50000, num_gen=50000, num_subsets=100, max_subset_size=1000)
+    return dict(kid50k=kid)
+
+#----------------------------------------------------------------------------
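
Because metrics self-register through the decorator, adding a variant is a one-function change that `calc_metric` / `report_metric` pick up by name; a hypothetical 32-frame FVD in the same style as the functions above (the name and frame count are made up, but `compute_fvd` already accepts them):

@register_metric
def fvd2048_32f(opts):
    """Hypothetical FVD variant over 32-frame clips, following the pattern above."""
    opts.dataset_kwargs.update(max_size=None, xflip=False)
    fvd = frechet_video_distance.compute_fvd(opts, max_real=2048, num_gen=2048, num_frames=32)
    return dict(fvd2048_32f=fvd)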
src/metrics/metric_utils.py
ADDED
|
@@ -0,0 +1,332 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
|
| 2 |
+
#
|
| 3 |
+
# NVIDIA CORPORATION and its licensors retain all intellectual property
|
| 4 |
+
# and proprietary rights in and to this software, related documentation
|
| 5 |
+
# and any modifications thereto. Any use, reproduction, disclosure or
|
| 6 |
+
# distribution of this software and related documentation without an express
|
| 7 |
+
# license agreement from NVIDIA CORPORATION is strictly prohibited.
|
| 8 |
+
|
| 9 |
+
import os
|
| 10 |
+
import time
|
| 11 |
+
import hashlib
|
| 12 |
+
import pickle
|
| 13 |
+
import copy
|
| 14 |
+
import uuid
|
| 15 |
+
from urllib.parse import urlparse
|
| 16 |
+
import numpy as np
|
| 17 |
+
import torch
|
| 18 |
+
from src import dnnlib
|
| 19 |
+
from src.training.dataset import video_to_image_dataset_kwargs
|
| 20 |
+
|
| 21 |
+
#----------------------------------------------------------------------------
|
| 22 |
+
|
| 23 |
+
class MetricOptions:
|
| 24 |
+
def __init__(self, G=None, G_kwargs={}, dataset_kwargs={}, num_gpus=1, rank=0, device=None,
|
| 25 |
+
progress=None, cache=True, gen_dataset_kwargs={}, generator_as_dataset=False):
|
| 26 |
+
assert 0 <= rank < num_gpus
|
| 27 |
+
self.G = G
|
| 28 |
+
self.G_kwargs = dnnlib.EasyDict(G_kwargs)
|
| 29 |
+
self.dataset_kwargs = dnnlib.EasyDict(dataset_kwargs)
|
| 30 |
+
self.num_gpus = num_gpus
|
| 31 |
+
self.rank = rank
|
| 32 |
+
self.device = device if device is not None else torch.device('cuda', rank)
|
| 33 |
+
self.progress = progress.sub() if progress is not None and rank == 0 else ProgressMonitor()
|
| 34 |
+
self.cache = cache
|
| 35 |
+
self.gen_dataset_kwargs = gen_dataset_kwargs
|
| 36 |
+
self.generator_as_dataset = generator_as_dataset
|
| 37 |
+
|
| 38 |
+
#----------------------------------------------------------------------------
|
| 39 |
+
|
| 40 |
+
_feature_detector_cache = dict()
|
| 41 |
+
|
| 42 |
+
def get_feature_detector_name(url):
|
| 43 |
+
return os.path.splitext(url.split('/')[-1])[0]
|
| 44 |
+
|
| 45 |
+
def get_feature_detector(url, device=torch.device('cpu'), num_gpus=1, rank=0, verbose=False):
|
| 46 |
+
assert 0 <= rank < num_gpus
|
| 47 |
+
key = (url, device)
|
| 48 |
+
if key not in _feature_detector_cache:
|
| 49 |
+
is_leader = (rank == 0)
|
| 50 |
+
if not is_leader and num_gpus > 1:
|
| 51 |
+
torch.distributed.barrier() # leader goes first
|
| 52 |
+
with dnnlib.util.open_url(url, verbose=(verbose and is_leader)) as f:
|
| 53 |
+
if urlparse(url).path.endswith('.pkl'):
|
| 54 |
+
_feature_detector_cache[key] = pickle.load(f).to(device)
|
| 55 |
+
else:
|
| 56 |
+
_feature_detector_cache[key] = torch.jit.load(f).eval().to(device)
|
| 57 |
+
if is_leader and num_gpus > 1:
|
| 58 |
+
torch.distributed.barrier() # others follow
|
| 59 |
+
return _feature_detector_cache[key]
|
| 60 |
+
|
| 61 |
+
#----------------------------------------------------------------------------
|
| 62 |
+
|
| 63 |
+
class FeatureStats:
|
| 64 |
+
def __init__(self, capture_all=False, capture_mean_cov=False, max_items=None):
|
| 65 |
+
self.capture_all = capture_all
|
| 66 |
+
self.capture_mean_cov = capture_mean_cov
|
| 67 |
+
self.max_items = max_items
|
| 68 |
+
self.num_items = 0
|
| 69 |
+
self.num_features = None
|
| 70 |
+
self.all_features = None
|
| 71 |
+
self.raw_mean = None
|
| 72 |
+
self.raw_cov = None
|
| 73 |
+
|
| 74 |
+
def set_num_features(self, num_features):
|
| 75 |
+
if self.num_features is not None:
|
| 76 |
+
assert num_features == self.num_features
|
| 77 |
+
else:
|
| 78 |
+
self.num_features = num_features
|
| 79 |
+
self.all_features = []
|
| 80 |
+
self.raw_mean = np.zeros([num_features], dtype=np.float64)
|
| 81 |
+
self.raw_cov = np.zeros([num_features, num_features], dtype=np.float64)
|
| 82 |
+
|
| 83 |
+
def is_full(self):
|
| 84 |
+
return (self.max_items is not None) and (self.num_items >= self.max_items)
|
| 85 |
+
|
| 86 |
+
def append(self, x):
|
| 87 |
+
x = np.asarray(x, dtype=np.float32)
|
| 88 |
+
assert x.ndim == 2
|
| 89 |
+
if (self.max_items is not None) and (self.num_items + x.shape[0] > self.max_items):
|
| 90 |
+
if self.num_items >= self.max_items:
|
| 91 |
+
return
|
| 92 |
+
x = x[:self.max_items - self.num_items]
|
| 93 |
+
|
| 94 |
+
self.set_num_features(x.shape[1])
|
| 95 |
+
self.num_items += x.shape[0]
|
| 96 |
+
if self.capture_all:
|
| 97 |
+
self.all_features.append(x)
|
| 98 |
+
if self.capture_mean_cov:
|
| 99 |
+
x64 = x.astype(np.float64)
|
| 100 |
+
self.raw_mean += x64.sum(axis=0)
|
| 101 |
+
self.raw_cov += x64.T @ x64
|
| 102 |
+
|
| 103 |
+
def append_torch(self, x, num_gpus=1, rank=0):
|
| 104 |
+
assert isinstance(x, torch.Tensor) and x.ndim == 2
|
| 105 |
+
assert 0 <= rank < num_gpus
|
| 106 |
+
if num_gpus > 1:
|
| 107 |
+
ys = []
|
| 108 |
+
for src in range(num_gpus):
|
| 109 |
+
y = x.clone()
|
| 110 |
+
torch.distributed.broadcast(y, src=src)
|
| 111 |
+
ys.append(y)
|
| 112 |
+
x = torch.stack(ys, dim=1).flatten(0, 1) # interleave samples
|
| 113 |
+
self.append(x.cpu().numpy())
|
| 114 |
+
|
| 115 |
+
def get_all(self):
|
| 116 |
+
assert self.capture_all
|
| 117 |
+
return np.concatenate(self.all_features, axis=0)
|
| 118 |
+
|
| 119 |
+
def get_all_torch(self):
|
| 120 |
+
return torch.from_numpy(self.get_all())
|
| 121 |
+
|
| 122 |
+
def get_mean_cov(self):
|
| 123 |
+
assert self.capture_mean_cov
|
| 124 |
+
mean = self.raw_mean / self.num_items
|
| 125 |
+
cov = self.raw_cov / self.num_items
|
| 126 |
+
cov = cov - np.outer(mean, mean)
|
| 127 |
+
return mean, cov
|
| 128 |
+
|
| 129 |
+
def save(self, pkl_file):
|
| 130 |
+
with open(pkl_file, 'wb') as f:
|
| 131 |
+
pickle.dump(self.__dict__, f)
|
| 132 |
+
|
| 133 |
+
@staticmethod
|
| 134 |
+
def load(pkl_file):
|
| 135 |
+
with open(pkl_file, 'rb') as f:
|
| 136 |
+
s = dnnlib.EasyDict(pickle.load(f))
|
| 137 |
+
obj = FeatureStats(capture_all=s.capture_all, max_items=s.max_items)
|
| 138 |
+
obj.__dict__.update(s)
|
| 139 |
+
return obj
|
| 140 |
+
|
| 141 |
+
#----------------------------------------------------------------------------
|
| 142 |
+
|
| 143 |
+
class ProgressMonitor:
|
| 144 |
+
def __init__(self, tag=None, num_items=None, flush_interval=1000, verbose=False, progress_fn=None, pfn_lo=0, pfn_hi=1000, pfn_total=1000):
|
| 145 |
+
self.tag = tag
|
| 146 |
+
self.num_items = num_items
|
| 147 |
+
self.verbose = verbose
|
| 148 |
+
self.flush_interval = flush_interval
|
| 149 |
+
self.progress_fn = progress_fn
|
| 150 |
+
self.pfn_lo = pfn_lo
|
| 151 |
+
self.pfn_hi = pfn_hi
|
| 152 |
+
self.pfn_total = pfn_total
|
| 153 |
+
self.start_time = time.time()
|
| 154 |
+
self.batch_time = self.start_time
|
| 155 |
+
self.batch_items = 0
|
| 156 |
+
if self.progress_fn is not None:
|
| 157 |
+
self.progress_fn(self.pfn_lo, self.pfn_total)
|
| 158 |
+
|
| 159 |
+
def update(self, cur_items: int):
|
| 160 |
+
assert (self.num_items is None) or (cur_items <= self.num_items), f"Wrong `items` values: cur_items={cur_items}, self.num_items={self.num_items}"
|
| 161 |
+
if (cur_items < self.batch_items + self.flush_interval) and (self.num_items is None or cur_items < self.num_items):
|
| 162 |
+
return
|
| 163 |
+
cur_time = time.time()
|
| 164 |
+
total_time = cur_time - self.start_time
|
| 165 |
+
time_per_item = (cur_time - self.batch_time) / max(cur_items - self.batch_items, 1)
|
| 166 |
+
if (self.verbose) and (self.tag is not None):
|
| 167 |
+
print(f'{self.tag:<19s} items {cur_items:<7d} time {dnnlib.util.format_time(total_time):<12s} ms/item {time_per_item*1e3:.2f}')
|
| 168 |
+
self.batch_time = cur_time
|
| 169 |
+
self.batch_items = cur_items
|
| 170 |
+
|
| 171 |
+
if (self.progress_fn is not None) and (self.num_items is not None):
|
| 172 |
+
self.progress_fn(self.pfn_lo + (self.pfn_hi - self.pfn_lo) * (cur_items / self.num_items), self.pfn_total)
|
| 173 |
+
|
| 174 |
+
def sub(self, tag=None, num_items=None, flush_interval=1000, rel_lo=0, rel_hi=1):
|
| 175 |
+
return ProgressMonitor(
|
| 176 |
+
tag = tag,
|
| 177 |
+
num_items = num_items,
|
| 178 |
+
flush_interval = flush_interval,
|
| 179 |
+
verbose = self.verbose,
|
| 180 |
+
progress_fn = self.progress_fn,
|
| 181 |
+
pfn_lo = self.pfn_lo + (self.pfn_hi - self.pfn_lo) * rel_lo,
|
| 182 |
+
pfn_hi = self.pfn_lo + (self.pfn_hi - self.pfn_lo) * rel_hi,
|
| 183 |
+
pfn_total = self.pfn_total,
|
| 184 |
+
)
|
| 185 |
+
|
| 186 |
+
#----------------------------------------------------------------------------
|
| 187 |
+
|
| 188 |
+
@torch.no_grad()
|
| 189 |
+
def compute_feature_stats_for_dataset(
|
| 190 |
+
opts, detector_url, detector_kwargs, rel_lo=0, rel_hi=1, batch_size=64,
|
| 191 |
+
data_loader_kwargs=None, max_items=None, temporal_detector=False, use_image_dataset=False,
|
| 192 |
+
feature_stats_cls=FeatureStats, **stats_kwargs):
|
| 193 |
+
|
| 194 |
+
dataset_kwargs = video_to_image_dataset_kwargs(opts.dataset_kwargs) if use_image_dataset else opts.dataset_kwargs
|
| 195 |
+
dataset = dnnlib.util.construct_class_by_name(**dataset_kwargs)
|
| 196 |
+
|
| 197 |
+
if data_loader_kwargs is None:
|
| 198 |
+
data_loader_kwargs = dict(pin_memory=True, num_workers=3, prefetch_factor=2)
|
| 199 |
+
|
| 200 |
+
# Try to lookup from cache.
|
| 201 |
+
cache_file = None
|
| 202 |
+
if opts.cache:
|
| 203 |
+
# Choose cache file name.
|
| 204 |
+
args = dict(dataset_kwargs=opts.dataset_kwargs, detector_url=detector_url, detector_kwargs=detector_kwargs,
|
| 205 |
+
stats_kwargs=stats_kwargs, feature_stats_cls=feature_stats_cls.__name__)
|
| 206 |
+
md5 = hashlib.md5(repr(sorted(args.items())).encode('utf-8'))
|
| 207 |
+
cache_tag = f'{dataset.name}-{get_feature_detector_name(detector_url)}-{md5.hexdigest()}'
|
| 208 |
+
cache_file = dnnlib.make_cache_dir_path('gan-metrics', cache_tag + '.pkl')
|
| 209 |
+
|
| 210 |
+
# Check if the file exists (all processes must agree).
|
| 211 |
+
flag = os.path.isfile(cache_file) if opts.rank == 0 else False
|
| 212 |
+
if opts.num_gpus > 1:
|
| 213 |
+
flag = torch.as_tensor(flag, dtype=torch.float32, device=opts.device)
|
| 214 |
+
torch.distributed.broadcast(tensor=flag, src=0)
|
| 215 |
+
flag = (float(flag.cpu()) != 0)
|
| 216 |
+
|
| 217 |
+
# Load.
|
| 218 |
+
if flag:
|
| 219 |
+
return feature_stats_cls.load(cache_file)
|
| 220 |
+
|
| 221 |
+
# Initialize.
|
| 222 |
+
num_items = len(dataset)
|
| 223 |
+
if max_items is not None:
|
| 224 |
+
num_items = min(num_items, max_items)
|
| 225 |
+
stats = feature_stats_cls(max_items=num_items, **stats_kwargs)
|
| 226 |
+
progress = opts.progress.sub(tag='dataset features', num_items=num_items, rel_lo=rel_lo, rel_hi=rel_hi)
|
| 227 |
+
detector = get_feature_detector(url=detector_url, device=opts.device, num_gpus=opts.num_gpus, rank=opts.rank, verbose=progress.verbose)
|
| 228 |
+
|
| 229 |
+
# Main loop.
|
| 230 |
+
item_subset = [(i * opts.num_gpus + opts.rank) % num_items for i in range((num_items - 1) // opts.num_gpus + 1)]
|
| 231 |
+
for batch in torch.utils.data.DataLoader(dataset=dataset, sampler=item_subset, batch_size=batch_size, **data_loader_kwargs):
|
| 232 |
+
images = batch['image']
|
| 233 |
+
if temporal_detector:
|
| 234 |
+
images = images.permute(0, 2, 1, 3, 4).contiguous() # [batch_size, c, t, h, w]
|
| 235 |
+
|
| 236 |
+
# images = images.float() / 255
|
| 237 |
+
# images = torch.nn.functional.interpolate(images, size=(images.shape[2], 128, 128), mode='trilinear', align_corners=False) # downsample
|
| 238 |
+
# images = torch.nn.functional.interpolate(images, size=(images.shape[2], 256, 256), mode='trilinear', align_corners=False) # upsample
|
| 239 |
+
# images = (images * 255).to(torch.uint8)
|
| 240 |
+
else:
|
| 241 |
+
images = images.view(-1, *images.shape[-3:]) # [-1, c, h, w]
|
| 242 |
+
|
| 243 |
+
if images.shape[1] == 1:
|
| 244 |
+
            images = images.repeat([1, 3, *([1] * (images.ndim - 2))])
        features = detector(images.to(opts.device), **detector_kwargs)
        stats.append_torch(features, num_gpus=opts.num_gpus, rank=opts.rank)
        progress.update(stats.num_items)

    # Save to cache.
    if cache_file is not None and opts.rank == 0:
        os.makedirs(os.path.dirname(cache_file), exist_ok=True)
        temp_file = cache_file + '.' + uuid.uuid4().hex
        stats.save(temp_file)
        os.replace(temp_file, cache_file) # atomic
    return stats

#----------------------------------------------------------------------------

@torch.no_grad()
def compute_feature_stats_for_generator(
    opts, detector_url, detector_kwargs, rel_lo=0, rel_hi=1, batch_size: int=16,
    batch_gen=None, jit=False, temporal_detector=False, num_video_frames: int=16,
    feature_stats_cls=FeatureStats, subsample_factor: int=1, **stats_kwargs):

    if batch_gen is None:
        batch_gen = min(batch_size, 4)
    assert batch_size % batch_gen == 0

    # Setup generator and load labels.
    G = copy.deepcopy(opts.G).eval().requires_grad_(False).to(opts.device)
    dataset = dnnlib.util.construct_class_by_name(**opts.dataset_kwargs)

    # Image generation func.
    def run_generator(z, c, t):
        img = G(z=z, c=c, t=t, **opts.G_kwargs)
        bt, c, h, w = img.shape

        if temporal_detector:
            img = img.view(bt // num_video_frames, num_video_frames, c, h, w) # [batch_size, t, c, h, w]
            img = img.permute(0, 2, 1, 3, 4).contiguous() # [batch_size, c, t, h, w]

            # img = torch.nn.functional.interpolate(img, size=(img.shape[2], 128, 128), mode='trilinear', align_corners=False) # downsample
            # img = torch.nn.functional.interpolate(img, size=(img.shape[2], 256, 256), mode='trilinear', align_corners=False) # upsample

        img = (img * 127.5 + 128).clamp(0, 255).to(torch.uint8)
        return img

    # JIT.
    if jit:
        z = torch.zeros([batch_gen, G.z_dim], device=opts.device)
        c = torch.zeros([batch_gen, G.c_dim], device=opts.device)
        t = torch.zeros([batch_gen, G.cfg.sampling.num_frames_per_video], device=opts.device)
        run_generator = torch.jit.trace(run_generator, [z, c, t], check_trace=False)

    # Initialize.
    stats = feature_stats_cls(**stats_kwargs)
    assert stats.max_items is not None
    progress = opts.progress.sub(tag='generator features', num_items=stats.max_items, rel_lo=rel_lo, rel_hi=rel_hi)
    detector = get_feature_detector(url=detector_url, device=opts.device, num_gpus=opts.num_gpus, rank=opts.rank, verbose=progress.verbose)

    # Main loop.
    while not stats.is_full():
        images = []
        for _i in range(batch_size // batch_gen):
            z = torch.randn([batch_gen, G.z_dim], device=opts.device)
            cond_sample_idx = [np.random.randint(len(dataset)) for _ in range(batch_gen)]
            c = [dataset.get_label(i) for i in cond_sample_idx]
            c = torch.from_numpy(np.stack(c)).pin_memory().to(opts.device)
            t = [list(range(0, num_video_frames * subsample_factor, subsample_factor)) for _i in range(batch_gen)]
            t = torch.from_numpy(np.stack(t)).pin_memory().to(opts.device)
            images.append(run_generator(z, c, t))
        images = torch.cat(images)
        if images.shape[1] == 1:
            images = images.repeat([1, 3, *([1] * (images.ndim - 2))])
        features = detector(images, **detector_kwargs)
        stats.append_torch(features, num_gpus=opts.num_gpus, rank=opts.rank)
        progress.update(stats.num_items)
    return stats

#----------------------------------------------------------------------------

def rewrite_opts_for_gen_dataset(opts):
    """
    Updates dataset arguments in the opts to enable the second dataset stats computation
    """
    new_opts = copy.deepcopy(opts)
    new_opts.dataset_kwargs = new_opts.gen_dataset_kwargs
    new_opts.cache = False

    return new_opts

#----------------------------------------------------------------------------
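
A note on the frame regrouping in `run_generator` above (an illustration, not part of the commit): the generator emits a flat batch of `batch_size * num_video_frames` frames, which must be regrouped into `[batch, channels, time, height, width]` before a temporal detector such as C3D can consume it. A minimal sketch on dummy data, assuming only standard PyTorch:

import torch

batch_size, num_video_frames, c, h, w = 2, 16, 3, 64, 64
flat_frames = torch.rand(batch_size * num_video_frames, c, h, w)  # [bt, c, h, w], as produced by G

# Regroup consecutive frames into videos, then move channels before time,
# matching the view/permute pair used in compute_feature_stats_for_generator.
videos = flat_frames.view(batch_size, num_video_frames, c, h, w)  # [b, t, c, h, w]
videos = videos.permute(0, 2, 1, 3, 4).contiguous()               # [b, c, t, h, w]
assert videos.shape == (batch_size, c, num_video_frames, h, w)
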
src/metrics/video_inception_score.py
ADDED
@@ -0,0 +1,54 @@
"""Inception Score (IS) from the paper "Improved techniques for training
GANs". Matches the original implementation by Salimans et al. at
https://github.com/openai/improved-gan/blob/master/inception_score/model.py"""

import numpy as np
from . import metric_utils

#----------------------------------------------------------------------------

NUM_FRAMES_IN_BATCH = {128: 128, 256: 128, 512: 64, 1024: 32}

#----------------------------------------------------------------------------

def compute_isv(opts, num_gen: int, num_splits: int, backbone: str):
    if backbone == 'c3d_ucf101':
        # Perfectly reproduced torchscript version of the original chainer checkpoint:
        # https://github.com/pfnet-research/tgan2/blob/f892bc432da315d4f6b6ae9448f69d046ef6fe01/tgan2/models/c3d/c3d_ucf101.py
        # It is a UCF-101-finetuned C3D model.
        detector_url = 'https://www.dropbox.com/s/jxpu7avzdc9n97q/c3d_ucf101.pt?dl=1'
    else:
        raise NotImplementedError(f'Backbone {backbone} is not supported.')

    num_frames = 16
    batch_size = NUM_FRAMES_IN_BATCH[opts.dataset_kwargs.resolution] // num_frames

    if opts.generator_as_dataset:
        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_dataset
        gen_opts = metric_utils.rewrite_opts_for_gen_dataset(opts)
        gen_opts.dataset_kwargs.load_n_consecutive = num_frames
        gen_opts.dataset_kwargs.load_n_consecutive_random_offset = False
        gen_opts.dataset_kwargs.subsample_factor = 1
        gen_kwargs = dict()
    else:
        compute_gen_stats_fn = metric_utils.compute_feature_stats_for_generator
        gen_opts = opts
        gen_kwargs = dict(num_video_frames=num_frames, subsample_factor=1)

    gen_probs = compute_gen_stats_fn(
        opts=gen_opts, detector_url=detector_url, detector_kwargs={},
        capture_all=True, max_items=num_gen, temporal_detector=True, **gen_kwargs).get_all() # [num_gen, num_classes]

    if opts.rank != 0:
        return float('nan'), float('nan')

    scores = []
    np.random.RandomState(42).shuffle(gen_probs)
    for i in range(num_splits):
        part = gen_probs[i * num_gen // num_splits : (i + 1) * num_gen // num_splits]
        kl = part * (np.log(part) - np.log(np.mean(part, axis=0, keepdims=True)))
        kl = np.mean(np.sum(kl, axis=1))
        scores.append(np.exp(kl))
    return float(np.mean(scores)), float(np.std(scores))

#----------------------------------------------------------------------------
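
The split loop above computes the standard Inception Score, IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y) is approximated by the mean prediction within the split. A toy check of that computation (illustration only, not part of the commit; assumes only numpy):

import numpy as np

num_gen, num_classes = 512, 101

def inception_score(probs: np.ndarray) -> float:
    # Same computation as one split in compute_isv above:
    # KL between each conditional p(y|x) and the marginal p(y),
    # averaged over samples, then exponentiated.
    kl = probs * (np.log(probs) - np.log(np.mean(probs, axis=0, keepdims=True)))
    return float(np.exp(np.mean(np.sum(kl, axis=1))))

uniform = np.full((num_gen, num_classes), 1.0 / num_classes)
confident = np.full((num_gen, num_classes), 1e-6)
confident[np.arange(num_gen), np.arange(num_gen) % num_classes] = 1.0 - 1e-6 * (num_classes - 1)

print(inception_score(uniform))    # 1.0: uniform predictions carry no information
print(inception_score(confident))  # ~num_classes: confident and diverse predictions
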
src/scripts/__init__.py
ADDED
File without changes

src/scripts/calc_metrics.py
ADDED
@@ -0,0 +1,250 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

"""Calculate quality metrics for previous training run or pretrained network pickle."""

import sys; sys.path.extend(['.', 'src'])
import os
import re
import click
import json
import tempfile
import copy
import torch
from src import dnnlib
from omegaconf import OmegaConf

import legacy
from metrics import metric_main
from metrics import metric_utils
from src.torch_utils import training_stats
from src.torch_utils import custom_ops
from src.torch_utils import misc

#----------------------------------------------------------------------------

def subprocess_fn(rank, args, temp_dir):
    dnnlib.util.Logger(should_flush=True)

    # Init torch.distributed.
    if args.num_gpus > 1:
        init_file = os.path.abspath(os.path.join(temp_dir, '.torch_distributed_init'))
        if os.name == 'nt':
            init_method = 'file:///' + init_file.replace('\\', '/')
            torch.distributed.init_process_group(backend='gloo', init_method=init_method, rank=rank, world_size=args.num_gpus)
        else:
            init_method = f'file://{init_file}'
            torch.distributed.init_process_group(backend='nccl', init_method=init_method, rank=rank, world_size=args.num_gpus)

    # Init torch_utils.
    sync_device = torch.device('cuda', rank) if args.num_gpus > 1 else None
    training_stats.init_multiprocessing(rank=rank, sync_device=sync_device)
    if rank != 0 or not args.verbose:
        custom_ops.verbosity = 'none'

    # Print network summary.
    device = torch.device('cuda', rank)
    torch.backends.cudnn.benchmark = True
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    G = copy.deepcopy(args.G).eval().requires_grad_(False).to(device)
    if rank == 0 and args.verbose:
        z = torch.empty([8, G.z_dim], device=device)
        c = torch.empty([8, G.c_dim], device=device)
        t = torch.zeros([8, G.cfg.sampling.num_frames_per_video], device=device).long()
        misc.print_module_summary(G, [z, c, t])

    # Calculate each metric.
    for metric in args.metrics:
        if rank == 0 and args.verbose:
            print(f'Calculating {metric}...')
        progress = metric_utils.ProgressMonitor(verbose=args.verbose)
        result_dict = metric_main.calc_metric(
            metric=metric,
            G=G,
            dataset_kwargs=args.dataset_kwargs,
            num_gpus=args.num_gpus,
            rank=rank,
            device=device,
            progress=progress,
            cache=args.use_cache,
            num_runs=(1 if metric == 'fid50k_full' else args.num_runs),
        )
        if rank == 0:
            metric_main.report_metric(result_dict, run_dir=args.run_dir, snapshot_pkl=args.network_pkl)
        if rank == 0 and args.verbose:
            print()

    # Done.
    if rank == 0 and args.verbose:
        print('Exiting...')

#----------------------------------------------------------------------------

class CommaSeparatedList(click.ParamType):
    name = 'list'

    def convert(self, value, param, ctx):
        _ = param, ctx
        if value is None or value.lower() == 'none' or value == '':
            return []
        return value.split(',')

#----------------------------------------------------------------------------

@click.command()
@click.pass_context
@click.option('--network_pkl', '--network', help='Network pickle filename or URL', metavar='PATH')
@click.option('--networks_dir', help='Path to the experiment directory if the latest checkpoint is requested.', metavar='PATH')
@click.option('--metrics', help='Comma-separated list or "none"', type=CommaSeparatedList(), default='fid50k_full', show_default=True)
@click.option('--data', help='Dataset to evaluate metrics against (directory or zip) [default: same as training data]', metavar='PATH')
@click.option('--mirror', help='Whether the dataset was augmented with x-flips during training [default: look up]', type=bool, metavar='BOOL')
@click.option('--gpus', help='Number of GPUs to use', type=int, default=1, metavar='INT', show_default=True)
@click.option('--cfg_path', help='Path to the experiments config', type=str, default="auto", metavar='PATH')
@click.option('--verbose', help='Print optional information', type=bool, default=False, metavar='BOOL', show_default=True)
@click.option('--use_cache', help='Should we use the cache file?', type=bool, default=True, metavar='BOOL', show_default=True)
@click.option('--num_runs', help='Number of runs', type=int, default=1, metavar='INT', show_default=True)
def calc_metrics(ctx, network_pkl, networks_dir, metrics, data, mirror, gpus, cfg_path, verbose, use_cache: bool, num_runs: int):
    """Calculate quality metrics for previous training run or pretrained network pickle.

    Examples:

    \b
    # Previous training run: look up options automatically, save result to JSONL file.
    python calc_metrics.py --metrics=pr50k3_full \\
        --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

    \b
    # Pre-trained network pickle: specify dataset explicitly, print result to stdout.
    python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq.zip --mirror=1 \\
        --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

    Available metrics:

    \b
      ADA paper:
        fid50k_full  Frechet inception distance against the full dataset.
        kid50k_full  Kernel inception distance against the full dataset.
        pr50k3_full  Precision and recall against the full dataset.
        is50k        Inception score for CIFAR-10.

    \b
      StyleGAN and StyleGAN2 papers:
        fid50k       Frechet inception distance against 50k real images.
        kid50k       Kernel inception distance against 50k real images.
        pr50k3       Precision and recall against 50k real images.
        ppl2_wend    Perceptual path length in W at path endpoints against full image.
        ppl_zfull    Perceptual path length in Z for full paths against cropped image.
        ppl_wfull    Perceptual path length in W for full paths against cropped image.
        ppl_zend     Perceptual path length in Z at path endpoints against cropped image.
        ppl_wend     Perceptual path length in W at path endpoints against cropped image.
    """
    dnnlib.util.Logger(should_flush=True)

    if network_pkl is None:
        # Selecting the checkpoint with the best score.
        ckpt_regex = re.compile(r"^network-snapshot-\d{6}.pkl$")
        # ckpts = sorted([f for f in os.listdir(networks_dir) if ckpt_regex.match(f)])
        # network_pkl = os.path.join(networks_dir, ckpts[-1])
        metrics_file = os.path.join(networks_dir, 'metric-fvd2048_16f.jsonl')
        with open(metrics_file, 'r') as f:
            snapshot_metrics_vals = [json.loads(line) for line in f.read().splitlines()]
        best_snapshot = sorted(snapshot_metrics_vals, key=lambda m: m['results']['fvd2048_16f'])[0]
        network_pkl = os.path.join(networks_dir, best_snapshot['snapshot_pkl'])
        print(f'Using checkpoint: {network_pkl} with FVD16 of', best_snapshot['results']['fvd2048_16f'])

    # Validate arguments.
    args = dnnlib.EasyDict(metrics=metrics, num_gpus=gpus, network_pkl=network_pkl, verbose=verbose)
    if cfg_path == "auto":
        # Assuming that `network_pkl` has the structure /path/to/experiment/output/network-X.pkl
        output_path = os.path.dirname(network_pkl)
        assert os.path.basename(output_path) == "output", f"Unknown path structure: {output_path}"
        experiment_path = os.path.dirname(output_path)
        cfg_path = os.path.join(experiment_path, 'experiment_config.yaml')

    cfg = OmegaConf.load(cfg_path)
    if not all(metric_main.is_valid_metric(metric) for metric in args.metrics):
        ctx.fail('\n'.join(['--metrics can only contain the following values:'] + metric_main.list_valid_metrics()))
    if not args.num_gpus >= 1:
        ctx.fail('--gpus must be at least 1')

    # Load network.
    if not dnnlib.util.is_url(network_pkl, allow_file_urls=True) and not os.path.isfile(network_pkl):
        ctx.fail('--network must point to a file or URL')
    if args.verbose:
        print(f'Loading network from "{network_pkl}"...')
    with dnnlib.util.open_url(network_pkl, verbose=args.verbose) as f:
        network_dict = legacy.load_network_pkl(f)
        args.G = network_dict['G_ema'] # subclass of torch.nn.Module

    from src.training.networks import Generator
    G = args.G
    G.cfg.z_dim = G.z_dim
    G_new = Generator(
        w_dim=G.cfg.w_dim,
        mapping_kwargs=dnnlib.EasyDict(num_layers=G.cfg.get('mapping_net_n_layers', 2), cfg=G.cfg),
        synthesis_kwargs=dnnlib.EasyDict(
            channel_base=int(G.cfg.get('fmaps', 0.5) * 32768),
            channel_max=G.cfg.get('channel_max', 512),
            num_fp16_res=4,
            conv_clamp=256,
        ),
        cfg=G.cfg,
        img_resolution=256,
        img_channels=3,
        c_dim=G.cfg.c_dim,
    ).eval()
    G_new.load_state_dict(G.state_dict())
    args.G = G_new

    # Initialize dataset options.
    if data is not None:
        args.dataset_kwargs = dnnlib.EasyDict(class_name='training.dataset.VideoFramesFolderDataset', cfg=cfg.dataset, path=data)
    elif network_dict['training_set_kwargs'] is not None:
        args.dataset_kwargs = dnnlib.EasyDict(network_dict['training_set_kwargs'])
    else:
        ctx.fail('Could not look up dataset options; please specify --data')

    # Finalize dataset options.
    args.dataset_kwargs.resolution = args.G.img_resolution
    args.dataset_kwargs.use_labels = (args.G.c_dim != 0)
    if mirror is not None:
        args.dataset_kwargs.xflip = mirror
    args.use_cache = use_cache
    args.num_runs = num_runs

    # Print dataset options.
    if args.verbose:
        print('Dataset options:')
        print(cfg.dataset)

    # Locate run dir.
    args.run_dir = None
    if os.path.isfile(network_pkl):
        pkl_dir = os.path.dirname(network_pkl)
        if os.path.isfile(os.path.join(pkl_dir, 'training_options.json')):
            args.run_dir = pkl_dir

    # Launch processes.
    if args.verbose:
        print('Launching processes...')
    torch.multiprocessing.set_start_method('spawn')
    with tempfile.TemporaryDirectory() as temp_dir:
        if args.num_gpus == 1:
            subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
        else:
            torch.multiprocessing.spawn(fn=subprocess_fn, args=(args, temp_dir), nprocs=args.num_gpus)

#----------------------------------------------------------------------------

if __name__ == "__main__":
    calc_metrics() # pylint: disable=no-value-for-parameter

#----------------------------------------------------------------------------
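
A side note on the checkpoint auto-selection above (illustration only): it assumes a `metric-fvd2048_16f.jsonl` file where each line is a JSON record containing at least `snapshot_pkl` and `results.fvd2048_16f`. A minimal sketch of that selection logic on made-up records:

import json

# Hypothetical JSONL contents; in practice these lines are written by metric_main.report_metric.
lines = [
    '{"results": {"fvd2048_16f": 312.4}, "snapshot_pkl": "network-snapshot-000100.pkl"}',
    '{"results": {"fvd2048_16f": 187.9}, "snapshot_pkl": "network-snapshot-000200.pkl"}',
]
records = [json.loads(line) for line in lines]
best = min(records, key=lambda m: m['results']['fvd2048_16f'])  # lower FVD is better
print(best['snapshot_pkl'])  # network-snapshot-000200.pkl
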
src/scripts/calc_metrics_for_dataset.py
ADDED
@@ -0,0 +1,169 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

"""Calculate quality metrics between a real and a generated dataset of video frames."""

import sys; sys.path.extend(['.', 'src'])
import os
import click
import tempfile
import torch
from omegaconf import OmegaConf
from src import dnnlib

from metrics import metric_main
from metrics import metric_utils
from src.torch_utils import training_stats
from src.torch_utils import custom_ops

#----------------------------------------------------------------------------

def subprocess_fn(rank, args, temp_dir):
    dnnlib.util.Logger(should_flush=True)

    # Init torch.distributed.
    if args.num_gpus > 1:
        init_file = os.path.abspath(os.path.join(temp_dir, '.torch_distributed_init'))
        if os.name == 'nt':
            init_method = 'file:///' + init_file.replace('\\', '/')
            torch.distributed.init_process_group(backend='gloo', init_method=init_method, rank=rank, world_size=args.num_gpus)
        else:
            init_method = f'file://{init_file}'
            torch.distributed.init_process_group(backend='nccl', init_method=init_method, rank=rank, world_size=args.num_gpus)

    # Init torch_utils.
    sync_device = torch.device('cuda', rank) if args.num_gpus > 1 else None
    training_stats.init_multiprocessing(rank=rank, sync_device=sync_device)
    if rank != 0 or not args.verbose:
        custom_ops.verbosity = 'none'

    # Configure torch.
    device = torch.device('cuda', rank)
    torch.backends.cudnn.benchmark = True
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False

    # Calculate each metric.
    for metric in args.metrics:
        if rank == 0 and args.verbose:
            print(f'Calculating {metric}...')
        progress = metric_utils.ProgressMonitor(verbose=args.verbose)
        result_dict = metric_main.calc_metric(
            metric=metric,
            dataset_kwargs=args.dataset_kwargs,
            gen_dataset_kwargs=args.gen_dataset_kwargs,
            generator_as_dataset=args.generator_as_dataset,
            num_gpus=args.num_gpus,
            rank=rank,
            device=device,
            progress=progress,
            cache=args.use_cache,
            num_runs=args.num_runs,
        )

        if rank == 0:
            metric_main.report_metric(result_dict, run_dir=args.run_dir)

        if rank == 0 and args.verbose:
            print()

    # Done.
    if rank == 0 and args.verbose:
        print('Exiting...')

#----------------------------------------------------------------------------

class CommaSeparatedList(click.ParamType):
    name = 'list'

    def convert(self, value, param, ctx):
        _ = param, ctx
        if value is None or value.lower() == 'none' or value == '':
            return []
        return value.split(',')

#----------------------------------------------------------------------------

def calc_metrics_for_dataset(ctx, metrics, real_data_path, fake_data_path, mirror, resolution, gpus, verbose, use_cache: bool, num_runs: int):
    dnnlib.util.Logger(should_flush=True)

    # Validate arguments.
    args = dnnlib.EasyDict(metrics=metrics, num_gpus=gpus, verbose=verbose)
    if not all(metric_main.is_valid_metric(metric) for metric in args.metrics):
        ctx.fail('\n'.join(['--metrics can only contain the following values:'] + metric_main.list_valid_metrics()))
    if not args.num_gpus >= 1:
        ctx.fail('--gpus must be at least 1')

    dummy_dataset_cfg = OmegaConf.create({'max_num_frames': 10000})

    # Initialize dataset options for real data.
    args.dataset_kwargs = dnnlib.EasyDict(
        class_name='training.dataset.VideoFramesFolderDataset',
        path=real_data_path,
        cfg=dummy_dataset_cfg,
        xflip=mirror,
        resolution=resolution,
        use_labels=False,
    )

    # Initialize dataset options for fake data.
    args.gen_dataset_kwargs = dnnlib.EasyDict(
        class_name='training.dataset.VideoFramesFolderDataset',
        path=fake_data_path,
        cfg=dummy_dataset_cfg,
        xflip=False,
        resolution=resolution,
        use_labels=False,
    )
    args.generator_as_dataset = True

    # Print dataset options.
    if args.verbose:
        print('Real data options:')
        print(args.dataset_kwargs)

        print('Fake data options:')
        print(args.gen_dataset_kwargs)

    # Locate run dir.
    args.run_dir = None
    args.use_cache = use_cache
    args.num_runs = num_runs

    # Launch processes.
    if args.verbose:
        print('Launching processes...')
    torch.multiprocessing.set_start_method('spawn')
    with tempfile.TemporaryDirectory() as temp_dir:
        if args.num_gpus == 1:
            subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
        else:
            torch.multiprocessing.spawn(fn=subprocess_fn, args=(args, temp_dir), nprocs=args.num_gpus)

#----------------------------------------------------------------------------

@click.command()
@click.pass_context
@click.option('--metrics', help='Comma-separated list or "none"', type=CommaSeparatedList(), default='fvd2048_16f,fid50k_full', show_default=True)
@click.option('--real_data_path', help='Dataset to evaluate metrics against (directory or zip) [default: same as training data]', metavar='PATH')
@click.option('--fake_data_path', help='Generated images (directory or zip)', metavar='PATH')
@click.option('--mirror', help='Should we mirror the real data?', type=bool, metavar='BOOL')
@click.option('--resolution', help='Resolution for the source dataset', type=int, metavar='INT')
@click.option('--gpus', help='Number of GPUs to use', type=int, default=1, metavar='INT', show_default=True)
@click.option('--verbose', help='Print optional information', type=bool, default=False, metavar='BOOL', show_default=True)
@click.option('--use_cache', help='Use stats cache', type=bool, default=True, metavar='BOOL', show_default=True)
@click.option('--num_runs', help='Number of runs', type=int, default=1, metavar='INT', show_default=True)
def calc_metrics_cli_wrapper(ctx, *args, **kwargs):
    calc_metrics_for_dataset(ctx, *args, **kwargs)

#----------------------------------------------------------------------------

if __name__ == "__main__":
    calc_metrics_cli_wrapper() # pylint: disable=no-value-for-parameter

#----------------------------------------------------------------------------
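
In the spirit of the usage examples in calc_metrics.py's docstring above, a hypothetical invocation of this script (the flags are the ones defined by the click options above; paths are placeholders):

python src/scripts/calc_metrics_for_dataset.py --real_data_path=~/datasets/real_frames \
    --fake_data_path=~/results/generated_frames --resolution=256 --metrics=fvd2048_16f --verbose=true
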
src/scripts/clip_edit.py
ADDED
@@ -0,0 +1,403 @@
# import sys; sys.path.extend(['.', 'src', '/home/skoroki/StyleCLIP'])
import argparse
import math
import os
from typing import List
import json
import re
import random
import yaml
import itertools

import torchvision
from torch import optim
from PIL import Image
import click
import numpy as np
import torch
from tqdm import tqdm
from omegaconf import OmegaConf
import torch.nn as nn
import torch.nn.functional as F
from torchvision import utils
from torch import Tensor
import torchvision.transforms.functional as TVF
from torchvision.utils import save_image

from src.deps.facial_recognition.model_irse import Backbone

try:
    import clip
except ImportError:
    raise ImportError(
        "To edit videos with CLIP, you need to install the `clip` library. " \
        "Please follow the instructions in https://github.com/openai/CLIP")

from src import dnnlib
import legacy
from src.scripts.project import save_edited_w


#----------------------------------------------------------------------------

def get_lr(t, initial_lr, rampdown=0.25, rampup=0.05):
    lr_ramp = min(1, (1 - t) / rampdown)
    lr_ramp = 0.5 - 0.5 * math.cos(lr_ramp * math.pi)
    lr_ramp = lr_ramp * min(1, t / rampup)

    return initial_lr * lr_ramp

#----------------------------------------------------------------------------

class CLIPLoss(torch.nn.Module):
    """
    Copy-pasted and adapted from StyleCLIP
    """
    def __init__(self):
        super(CLIPLoss, self).__init__()
        self.model, self.preprocess = clip.load("ViT-B/32", device="cuda")
        #self.upsample = torch.nn.Upsample(scale_factor=7)
        #self.avg_pool = torch.nn.AvgPool2d(kernel_size=opts.stylegan_size // 32)

    def forward(self, image, text):
        #image = self.avg_pool(self.upsample(image))
        image = F.interpolate(image, size=(224, 224), mode='area')
        similarity = 1 - self.model(image, text)[0] / 100
        similarity = similarity.diag()

        return similarity

#----------------------------------------------------------------------------

class IDLoss(nn.Module):
    """
    Copy-pasted from StyleCLIP
    """
    def __init__(self):
        super(IDLoss, self).__init__()
        self.facenet = Backbone(input_size=112, num_layers=50, drop_ratio=0.6, mode='ir_se')
        with dnnlib.util.open_url(Backbone.WEIGHTS_URL, verbose=True) as f:
            ir_se50_weights = torch.load(f)
        self.facenet.load_state_dict(ir_se50_weights)
        self.pool = torch.nn.AdaptiveAvgPool2d((256, 256))
        self.face_pool = torch.nn.AdaptiveAvgPool2d((112, 112))
        self.facenet.eval()
        self.facenet.cuda()

    def extract_feats(self, x):
        if x.shape[2] != 256:
            x = self.pool(x)
        x = x[:, :, 35:223, 32:220] # Crop interesting region
        x = self.face_pool(x)
        x_feats = self.facenet(x)
        return x_feats

    def forward(self, y_hat, y):
        n_samples = y.shape[0]
        y_feats = self.extract_feats(y) # Otherwise use the feature from there
        y_hat_feats = self.extract_feats(y_hat)
        y_feats = y_feats.detach()
        loss = 0

        for i in range(n_samples):
            diff_target = y_hat_feats[i].dot(y_feats[i])
            loss += 1 - diff_target

        return loss / n_samples

#----------------------------------------------------------------------------

def run_edit_optimization(
    _sentinel=None,
    G: nn.Module=None,
    w_orig: Tensor=None,
    descriptions: List[str]=None,
    # ckpt: float="stylegan2-ffhq-config-f.pt",
    lr: float=0.1,
    num_steps: int=40,
    l2_lambda: float=0.001,
    id_lambda: float=0.005,
    # latent_path: float=latent_path,
    # truncation: float=0.7,
    # save_intermediate_image_every: float=1 if create_video else 20,
    # results_dir: float="results",
    mask: Tensor=None,
    mask_lambda: float=0.0,
    verbose: bool=False,
) -> Tensor:
    assert _sentinel is None
    # text_inputs = torch.cat([clip.tokenize(d) for d in descriptions]).to(device)
    num_prompts = len(descriptions)
    num_images = len(w_orig)
    device = w_orig.device

    text_inputs = clip.tokenize(descriptions).to(device) # [num_prompts, 77]
    text_inputs = text_inputs.repeat_interleave(len(w_orig), dim=0) # [num_prompts * num_images, 77]

    c = torch.zeros(num_prompts * num_images, 0, device=device)
    ts = torch.zeros(num_prompts * num_images, 1, device=device)
    w_orig = w_orig.repeat(num_prompts, 1, 1) # [num_prompts * num_images, num_ws, w_dim]

    with torch.no_grad():
        img_orig = G.synthesis(ws=w_orig, c=c, t=ts) # [num_prompts * num_images, 3, c, h, w]

    w = w_orig.detach().clone() # [num_prompts * num_images, num_ws, w_dim]
    w.requires_grad = True

    if mask_lambda > 0:
        # NOTE: this branch relies on a `vgg16` LPIPS feature extractor that is not set up in this script.
        target_image = img_orig * (1 - mask) # [num_prompts * num_images, 3, c, h, w]
        #target_image = img_orig[:, :, -128:, :128]
        target_image = (target_image * 0.5 + 0.5) * 255.0 # [num_prompts * num_images, 3, c, h, w]
        if target_image.shape[2] > 256:
            target_image = F.interpolate(target_image, size=(256, 256), mode='area')
        target_features = vgg16(target_image, resize_images=False, return_lpips=True)
        #dist = (target_features - synth_features).square().sum()
    else:
        target_features = None

    clip_loss = CLIPLoss()
    id_loss = IDLoss()
    optimizer = optim.Adam([w], lr=lr)

    if verbose:
        pbar = tqdm(range(num_steps))
    else:
        pbar = range(num_steps)

    for curr_iter in pbar:
        curr_lr = get_lr(curr_iter / num_steps, lr)
        # optimizer.param_groups[0]["lr"] = lr
        for param_group in optimizer.param_groups:
            param_group['lr'] = curr_lr

        #img_gen, _ = g_ema([latent], input_is_latent=True, randomize_noise=False, input_is_stylespace=work_in_stylespace)
        img_gen = G.synthesis(ws=w, c=c, t=ts) # [num_prompts * num_images, 3, c, h, w]

        if mask_lambda > 0:
            raise NotImplementedError
            synth_image = img_gen * (1 - mask)
            #synth_image = img_gen[:, :, -128:, :128]
            synth_image = (synth_image * 0.5 + 0.5) * 255.0
            if synth_image.shape[2] > 256:
                synth_image = F.interpolate(synth_image, size=(256, 256), mode='area')
            synth_features = vgg16(synth_image, resize_images=False, return_lpips=True)
            mask_loss = (target_features - synth_features).square().sum()
        else:
            mask_loss = 0

        if not mask is None:
            img_gen = img_gen * mask.unsqueeze(0) # [num_prompts * num_images, 3, c, h, w]

        c_loss = clip_loss(img_gen, text_inputs) # [num_prompts * num_images]

        if id_lambda > 0:
            i_loss = id_loss(img_gen, img_orig)
        else:
            i_loss = 0

        l2_loss = ((w_orig - w) ** 2) # [1]
        loss = c_loss.sum() + l2_lambda * l2_loss.sum() + id_lambda * i_loss + mask_lambda * mask_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if verbose:
            pbar.set_description(f"loss: {loss.item():.4f};")

    final_result = torch.stack([img_orig, img_gen]) # [2, num_prompts * num_images, c, h, w]

    return final_result, w

# x, new_w = main(args)

# pair = torch.cat([img for img in x], dim=2)
# TVF.to_pil_image((pair.cpu().detach() * 0.5 + 0.5).clamp(0, 1))

#----------------------------------------------------------------------------

@click.command()
@click.pass_context
@click.option('--network_pkl', help='Network pickle filename', metavar='PATH')
@click.option('--networks_dir', help='Network pickles directory', metavar='PATH')
# @click.option('--truncation_psi', type=float, help='Truncation psi', default=1.0, show_default=True)
# @click.option('--noise_mode', help='Noise mode', type=click.Choice(['const', 'random', 'none']), default='const', show_default=True)
# @click.option('--same_motion_codes', type=bool, help='Should we use the same motion codes for all videos?', default=False, show_default=True)
@click.option('--w_dir', help='A directory leading to latent codes.', type=str, required=False, metavar='DIR')
@click.option('--results_dir', help='A directory to save the results in.', type=str, required=False, metavar='DIR')
@click.option('--truncation_psi', help='If we use new w, what truncation to use.', type=float, required=False, metavar='FLOAT', default=1.0)
@click.option('--num_w', help='If we use new w, how many to sample?', type=int, required=False, metavar='INT', default=16)
@click.option('--prompts', help='A path to prompts or a string of prompts.', type=str, required=True, metavar='DIR')
@click.option('--seed', type=int, help='Random seed', default=42, metavar='INT')
@click.option('--zero_periods', help='Zero-out periods predictor?', default=False, type=bool, metavar='BOOL')
@click.option('--num_weights_to_slice', help='Number of high-frequency coords to remove.', default=0, type=int, metavar='INT')
@click.option('--num_steps', help='Number of the optimization steps to perform.', default=40, type=int, metavar='INT')
@click.option('--stack_samples', help='When saving, should we stack samples together?', default=False, type=bool, metavar='BOOL')
# l2_lambda=0.001,
# id_lambda=0.005,
# l2_lambda=0.0005,
# id_lambda=0.0,
@click.option('--l2_lambda', help='L2 loss coef', default=0.001, type=float, metavar='FLOAT')
@click.option('--id_lambda', help='ID loss coef', default=0.005, type=float, metavar='FLOAT')
@click.option('--lr', help='Learning rate', default=0.1, type=float, metavar='FLOAT')
@click.option('--mask_lambda', help='If we use a mask, specify the loss coef', default=0.0, type=float, metavar='FLOAT')
@click.option('--use_id_lambda', help='Should we use id lambda in HPO?', default=False, type=bool, metavar='BOOL')
def main(
    ctx: click.Context,
    network_pkl: str,
    networks_dir: str,
    seed: int,
    w_dir: str,
    results_dir: str,
    truncation_psi: float,
    num_w: int,
    # save_as_mp4: bool,
    # video_len: int,
    # fps: int,
    # as_grids: bool,
    zero_periods: bool,
    num_weights_to_slice: int,
    num_steps: int,
    stack_samples: bool,
    l2_lambda: float,
    id_lambda: float,
    lr: float,
    prompts: str,
    mask_lambda: float,
    use_id_lambda: bool,
):
    if network_pkl is None:
        # Selecting the checkpoint with the best score.
        ckpt_regex = re.compile(r"^network-snapshot-\d{6}.pkl$")
        # ckpts = sorted([f for f in os.listdir(networks_dir) if ckpt_regex.match(f)])
        # network_pkl = os.path.join(networks_dir, ckpts[-1])
        metrics_file = os.path.join(networks_dir, 'metric-fvd2048_16f.jsonl')
        with open(metrics_file, 'r') as f:
            snapshot_metrics_vals = [json.loads(line) for line in f.read().splitlines()]
        best_snapshot = sorted(snapshot_metrics_vals, key=lambda m: m['results']['fvd2048_16f'])[0]
        network_pkl = os.path.join(networks_dir, best_snapshot['snapshot_pkl'])
        print(f'Using checkpoint: {network_pkl} with FVD16 of', best_snapshot['results']['fvd2048_16f'])
    else:
        assert networks_dir is None, "Can't have both parameters: network_pkl and networks_dir"

    print('Loading networks from "%s"...' % network_pkl, end='')
    device = torch.device('cuda')
    with dnnlib.util.open_url(network_pkl) as f:
        G = legacy.load_network_pkl(f)['G_ema'].to(device).eval() # type: ignore
    print('Loaded!')

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    if zero_periods:
        G.synthesis.motion_encoder.time_encoder.periods_predictor.weight.data.zero_()

    if num_weights_to_slice > 0:
        G.synthesis.motion_encoder.time_encoder.weights[:, -num_weights_to_slice:] = 0.0

    # description = "Bright sunny sky and mountains far away"
    # experiment_type = 'edit' #@param ['edit', 'free_generation']
    # mask = torch.zeros(3, 256, 256, device=device)
    # mask[:, :, 64+32 : 128+32] = 1.0
    # mask[:, :-128, :] = 1.0
    # mask[:, :, 128:] = 1.0

    if w_dir is None:
        print('Sampling new w')
        z = torch.randn(num_w, G.z_dim, device=device)
        c = torch.zeros(len(z), G.c_dim, device=device)
        w_orig = G.mapping(z=z, c=c, truncation_psi=truncation_psi)
        os.makedirs(results_dir, exist_ok=True)
        torch.save(w_orig.cpu(), f'{results_dir}_w_orig.pt')
        w_save_dir = os.path.join(results_dir, 'w_edit')
        samples_save_dir = os.path.join(results_dir, 'edited_samples')
    else:
        w_paths = sorted([os.path.join(w_dir, f) for f in os.listdir(w_dir) if f.endswith('_w.pt')])
        w_names = [os.path.basename(f) for f in w_paths]
        w_orig = [torch.load(f) for f in w_paths]
        w_orig = torch.stack(w_orig).to(device) # [num_images, num_ws, w_dim]
        w_save_dir = f'{w_dir}_edited_w'
        samples_save_dir = f'{w_dir}_edited_samples'

    os.makedirs(w_save_dir, exist_ok=True)
    os.makedirs(samples_save_dir, exist_ok=True)

    print(f'Loading prompts from file: {prompts}')
    with open(prompts, 'r') as f:
        descs_dict = yaml.load(f, Loader=yaml.SafeLoader)
    edit_names, descriptions = list(zip(*descs_dict.items()))

    # Hyperparameter search grid; the CLI values of these arguments are ignored.
    del id_lambda, num_steps, l2_lambda
    l2_lambdas = [1000000.0, 0.0025, 0.001, 0.00025, 0.0005, 0.0001]
    if use_id_lambda:
        id_lambdas = [0.005, 0.0025, 0.001, 0.00025, 0.0005, 0.0001, 0.0]
    else:
        id_lambdas = [0.0]
    all_num_steps = [40]

    for curr_edit_name, curr_prompt in zip(edit_names, descriptions):
        all_images = []
        all_w_edited = []

        for l2_lambda, id_lambda, num_steps in tqdm(list(itertools.product(l2_lambdas, id_lambdas, all_num_steps)), desc=f'Performing HPO for {curr_edit_name}'):
            final_image, w_edited = run_edit_optimization(
                G=G,
                w_orig=w_orig,
                descriptions=[curr_prompt],
                # ckpt="stylegan2-ffhq-config-f.pt",
                lr=lr,
                num_steps=num_steps,
                l2_lambda=l2_lambda,
                id_lambda=id_lambda,
                mask_lambda=mask_lambda,
                verbose=False,
                # latent_path=latent_path,
                # truncation=0.7,
                # mask=None,
                # mask_lambda=0.1,
            )

            all_images.extend((final_image[1].cpu() * 0.5 + 0.5).clamp(0, 1))
            all_w_edited.append({
                "w_edit": w_edited.cpu(),
                "l2_lambda": l2_lambda,
                "id_lambda": id_lambda,
                "num_steps": num_steps,
                "prompt": curr_prompt,
                "edit_name": curr_edit_name,
            })

        # img_names = [f'{w_name}_{edit_name}' for edit_name in edit_names for w_name in w_names]

        # save_edited_w(
        #     G=G,
        #     w_outdir = f'{w_dir}_edited',
        #     samples_outdir = f'{w_dir}_projected_samples',
        #     img_names=img_names,
        #     stack_samples=stack_samples,
        #     all_w = w_edited,
        #     all_motion_z = None,
        #     stacked_samples_out_path = f'{w_dir}_edited_samples.png'
        # )

        torch.save(all_w_edited, f"{w_save_dir}/{curr_edit_name}_w.pt")
        grid = utils.make_grid(torch.stack(all_images), nrow=len(w_orig))
        print('Saving into', f"{samples_save_dir}/{curr_edit_name}.png")
        save_image(grid, f"{samples_save_dir}/{curr_edit_name}.png")

    print('Done!')


#----------------------------------------------------------------------------

if __name__ == "__main__":
    main() # pylint: disable=no-value-for-parameter

#----------------------------------------------------------------------------
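
An aside on `get_lr` above (illustration only, not part of the commit): the schedule ramps the learning rate up linearly over the first 5% of steps and cosine-ramps it down over the last 25%, holding the full rate in between. A quick check of the shape, assuming only the standard library:

import math

def get_lr(t, initial_lr, rampdown=0.25, rampup=0.05):
    # Same schedule as in clip_edit.py: cosine ramp-down, linear ramp-up.
    lr_ramp = min(1, (1 - t) / rampdown)
    lr_ramp = 0.5 - 0.5 * math.cos(lr_ramp * math.pi)
    lr_ramp = lr_ramp * min(1, t / rampup)
    return initial_lr * lr_ramp

for t in [0.0, 0.05, 0.5, 0.8, 0.95]:
    print(f'{t:.2f} -> {get_lr(t, 0.1):.4f}')
# 0.00 -> 0.0000, 0.05 -> 0.1000, 0.50 -> 0.1000, 0.80 -> ~0.0905, 0.95 -> ~0.0095
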
src/scripts/construct_static_videos_dataset.py
ADDED
@@ -0,0 +1,46 @@
"""
Takes a dataset directory and, for each video, repeats a single random frame
to build a frozen (static) version of that video.
This is needed to calculate same-frame FVD and DiFID
"""
import os
import random
import argparse
from typing import List
import shutil
from tqdm import tqdm


def construct_static_videos_dataset(videos_dir: os.PathLike, max_len: int=None, output_dir: os.PathLike=None, force_len: int=None):
    output_dir = output_dir if not output_dir is None else f'{videos_dir}_freeze'
    clips_paths = [os.path.join(videos_dir, d) for d in os.listdir(videos_dir)]

    print(f'Saving into {output_dir}')

    for video_idx, clip_path in enumerate(tqdm(clips_paths)):
        frames_paths = os.listdir(clip_path)
        frame_to_repeat = random.choice(frames_paths)
        curr_output_dir = os.path.join(output_dir, f'{video_idx:05d}')
        os.makedirs(curr_output_dir, exist_ok=True)
        num_frames_to_create = force_len if not force_len is None else min(len(frames_paths), max_len)

        for i in range(num_frames_to_create):
            ext = os.path.splitext(frame_to_repeat)[1].lower()
            target_file_path = os.path.join(curr_output_dir, f'{i:06d}{ext}')
            shutil.copy(os.path.join(clip_path, frame_to_repeat), target_file_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--directory', type=str, help='Directory with video frames')
    parser.add_argument('-o', '--output_dir', type=str, help='Where to save the resulting dataset.')
    parser.add_argument('-l', '--max_len', type=int, help='Max video length')
    parser.add_argument('-fl', '--force_len', type=int, help='Force video length')

    args = parser.parse_args()

    construct_static_videos_dataset(
        videos_dir=args.directory,
        max_len=args.max_len,
        output_dir=args.output_dir,
        force_len=args.force_len,
    )
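
A hypothetical invocation, using the flags defined above (paths are placeholders):

python src/scripts/construct_static_videos_dataset.py -d ~/datasets/frames -o ~/datasets/frames_freeze -l 16
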
src/scripts/convert_video_to_dataset.py
ADDED
@@ -0,0 +1,87 @@
"""
Converts a single long video into a dataset of video frame directories,
i.e. one video file becomes a directory of directories of frames.
This speeds up loading during training because we do not need to decode the video on the fly.
"""
import os
from typing import List
import argparse
from pathlib import Path
from multiprocessing import Pool
from collections import Counter

import numpy as np
from PIL import Image
import torchvision.transforms.functional as TVF
from moviepy.editor import VideoFileClip
from tqdm import tqdm


def convert_videos_into_dataset(video_path: os.PathLike, target_dir: os.PathLike, num_chunks: int, chunk_size: int, start_frame: int, target_size: int, force_fps: int):
    assert (num_chunks is None) or (chunk_size is None), "Can't use both num_chunks and chunk_size"

    os.makedirs(target_dir, exist_ok=True)
    clip = VideoFileClip(video_path)
    fps = clip.fps if force_fps is None else force_fps
    num_frames_total = int(np.floor(clip.duration * fps)) - start_frame

    if num_chunks is None:
        num_chunks = num_frames_total // chunk_size
    else:
        chunk_size = num_frames_total // num_chunks

    num_frames_to_save = chunk_size * num_chunks

    print(f'Processing the video at {fps} fps. {num_frames_total} frames in total. We have {num_chunks} videos of {chunk_size} frames each.')

    current_chunk_idx = 0
    frame_idx = -start_frame
    curr_chunk_dir = os.path.join(target_dir, f'{current_chunk_idx:06d}')

    for frame in tqdm(clip.iter_frames(fps=fps), total=num_frames_total + start_frame):
        if frame_idx >= 0:
            os.makedirs(curr_chunk_dir, exist_ok=True)
            frame = Image.fromarray(frame)
            frame = TVF.center_crop(frame, output_size=min(frame.size))
            frame = TVF.resize(frame, size=target_size, interpolation=Image.LANCZOS)
            frame.save(os.path.join(curr_chunk_dir, f'{frame_idx % chunk_size:06d}.jpg'), q=95)

        frame_idx += 1
        if frame_idx % chunk_size == 0 and frame_idx > 0:
            current_chunk_idx += 1
            curr_chunk_dir = os.path.join(target_dir, f'{current_chunk_idx:06d}')

        if frame_idx == num_frames_to_save:
            # Stop here so as not to produce a partially-filled chunk
            break

    chunk_sizes = [len(os.listdir(d)) for d in listdir_full_paths(target_dir)]
    assert len(set(chunk_sizes)) == 1, f"Bad chunk sizes: {set(chunk_sizes)}"

    print('Finished successfully!')


def listdir_full_paths(d) -> List[os.PathLike]:
    return sorted([os.path.join(d, x) for x in os.listdir(d)])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert a long video into a dataset of frame dirs')
    parser.add_argument('-s', '--source_video_path', type=str, help='Path to the source video')
    parser.add_argument('-t', '--target_dir', type=str, help='Where to save the new dataset')
    parser.add_argument('-n', '--num_chunks', type=int, help='How many samples should there be in the dataset?')
    parser.add_argument('-cs', '--chunk_size', type=int, help='Each video length. Should be used separately from num_chunks')
    parser.add_argument('-sf', '--start_frame', type=int, default=0, help='Start frame idx. Should we skip several frames?')
    parser.add_argument('--target_size', type=int, default=128, help='What size should we resize to?')
    parser.add_argument('--force_fps', type=int, help='What fps should we run videos with?')
    args = parser.parse_args()

    convert_videos_into_dataset(
        video_path=args.source_video_path,
        target_dir=args.target_dir,
        num_chunks=args.num_chunks,
        chunk_size=args.chunk_size,
        start_frame=args.start_frame,
        target_size=args.target_size,
        force_fps=args.force_fps,
    )
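
A note on the chunking arithmetic above (illustration only, hypothetical numbers): exactly one of `num_chunks` / `chunk_size` is given, and the other is derived so that only whole chunks are written; any trailing remainder frames are dropped.

# A 10,000-frame video split into 64-frame clips:
num_frames_total = 10_000
chunk_size = 64
num_chunks = num_frames_total // chunk_size   # 156 complete clips
num_frames_to_save = chunk_size * num_chunks  # 9,984 frames; the last 16 are dropped
print(num_chunks, num_frames_to_save)
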
src/scripts/convert_videos_to_frames.py
ADDED
@@ -0,0 +1,105 @@
"""
Converts a dataset of mp4 videos into a dataset of video frames,
i.e. a directory of mp4 files becomes a directory of directories of frames.
This speeds up loading during training because we do not need to decode the videos on the fly.
"""
import os
from typing import List
import argparse
from pathlib import Path
from multiprocessing import Pool
from collections import Counter

from PIL import Image
import torchvision.transforms.functional as TVF
from moviepy.editor import VideoFileClip
from tqdm import tqdm


def convert_videos_to_frames(source_dir: os.PathLike, target_dir: os.PathLike, num_workers: int, video_ext: str, **process_video_kwargs):
    broken_clips_dir = f'{target_dir}_broken_clips'
    os.makedirs(target_dir, exist_ok=True)
    os.makedirs(broken_clips_dir, exist_ok=True)

    clips_paths = [cp for cp in listdir_full_paths(source_dir) if cp.endswith(video_ext)]
    clips_fps = []
    tasks_kwargs = [dict(
        clip_path=cp,
        target_dir=target_dir,
        broken_clips_dir=broken_clips_dir,
        **process_video_kwargs,
    ) for cp in clips_paths]
    pool = Pool(processes=num_workers)

    for fps in tqdm(pool.imap_unordered(task_proxy, tasks_kwargs), total=len(clips_paths)):
        clips_fps.append(fps)

    print(f'All possible fps: {Counter(clips_fps).most_common()}')


def task_proxy(kwargs):
    """Proxy function: Pool.imap_unordered passes a single argument, so we unpack the kwargs dict here."""
    return process_video(**kwargs)


def process_video(
        clip_path: os.PathLike, target_dir: os.PathLike, force_fps: int=None, target_size: int=None,
        broken_clips_dir: os.PathLike=None, compute_fps_only: bool=False) -> int:

    clip_name = os.path.basename(clip_path)
    clip_name = clip_name[:clip_name.rfind('.')]

    try:
        clip = VideoFileClip(clip_path)
    except KeyboardInterrupt:
        raise
    except Exception:
        print(f"Couldn't process clip: {clip_path}")
        if broken_clips_dir is not None:
            Path(os.path.join(broken_clips_dir, clip_name)).touch()
        return 0

    if compute_fps_only:
        return clip.fps

    fps = clip.fps if force_fps is None else force_fps
    clip_target_dir = os.path.join(target_dir, clip_name)
    clip_target_dir = clip_target_dir.replace('#', '_')
    os.makedirs(clip_target_dir, exist_ok=True)

    frame_idx = 0
    for frame in clip.iter_frames(fps=fps):
        frame = Image.fromarray(frame)
        if target_size is not None:
            frame = TVF.resize(frame, size=target_size, interpolation=Image.LANCZOS)
            frame = TVF.center_crop(frame, output_size=(target_size, target_size))
        frame.save(os.path.join(clip_target_dir, f'{frame_idx:06d}.jpg'), q=95)
        frame_idx += 1

    return clip.fps


def listdir_full_paths(d) -> List[os.PathLike]:
    return sorted([os.path.join(d, x) for x in os.listdir(d)])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert a dataset of mp4 files into a dataset of individual frames')
    parser.add_argument('-s', '--source_dir', type=str, help='Path to the source dataset')
    parser.add_argument('-t', '--target_dir', type=str, help='Where to save the new dataset')
    parser.add_argument('--video_ext', type=str, default='mp4', help='Video extension')
    parser.add_argument('--target_size', type=int, default=128, help='What size should we resize to?')
    parser.add_argument('--force_fps', type=int, help='What fps should we run videos with?')
    parser.add_argument('--num_workers', type=int, default=8, help='Number of processes to launch')
    parser.add_argument('--compute_fps_only', action='store_true', help='Should we just compute fps?')
    args = parser.parse_args()

    convert_videos_to_frames(
        source_dir=args.source_dir,
        target_dir=args.target_dir,
        target_size=args.target_size,
        force_fps=args.force_fps,
        num_workers=args.num_workers,
        video_ext=args.video_ext,
        compute_fps_only=args.compute_fps_only,
    )
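A side note on the `task_proxy` pattern: `Pool.imap_unordered` maps a single-argument function over one iterable, which is why each task's arguments are bundled into a dict. A `functools.partial`-based alternative achieves the same effect; everything below (function name, paths) is a hypothetical sketch, not repo code:

# --- alternative sketch with functools.partial (hypothetical) ---
from functools import partial
from multiprocessing import Pool

def process_one_clip(clip_path, target_dir, target_size):
    ...  # decode clip_path and dump resized frames into target_dir

if __name__ == '__main__':
    fn = partial(process_one_clip, target_dir='data/frames', target_size=128)
    with Pool(processes=8) as pool:
        for _ in pool.imap_unordered(fn, ['a.mp4', 'b.mp4']):
            pass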
src/scripts/crop_video_dataset.py
ADDED
@@ -0,0 +1,69 @@
import os
import shutil
import argparse
from typing import List

import numpy as np
from tqdm import tqdm
from PIL import Image


def crop_video_dataset(source_dir: str, max_num_frames: int=None, slice_n_left_frames: int=0, resize: int=None, target_dir: str=None):
    dataset_name = os.path.basename(source_dir)
    if target_dir is None:
        max_num_frames_prefix = "" if max_num_frames is None else f"_cut{max_num_frames}"
        slice_prefix = "" if slice_n_left_frames == 0 else f"_slice{slice_n_left_frames}"
        new_dataset_name = f"{dataset_name}{max_num_frames_prefix}{slice_prefix}"
        target_dir = os.path.join(os.path.dirname(source_dir), new_dataset_name)
    all_clips_paths = listdir_full_paths(source_dir)
    os.makedirs(target_dir, exist_ok=True)
    slice_proportions = []

    total_num_frames = 0

    for source_clip_dir in tqdm(all_clips_paths, desc=f'Cropping the dataset into {target_dir}'):
        all_frames = listdir_full_paths(source_clip_dir)
        if len(all_frames) == 0:
            print(f'{source_clip_dir} is empty. Skipping it.')
            continue
        target_clip_dir = os.path.join(target_dir, os.path.basename(source_clip_dir))
        os.makedirs(target_clip_dir, exist_ok=True)
        total_num_frames += len(all_frames)
        slice_proportions.append(slice_n_left_frames / len(all_frames))
        all_frames = all_frames[slice_n_left_frames:]

        if max_num_frames is not None:
            all_frames = all_frames[:max_num_frames]

        for source_frame_path in all_frames:
            target_frame_path = os.path.join(target_clip_dir, os.path.basename(source_frame_path))

            if resize is None:
                shutil.copy(source_frame_path, target_frame_path)
            else:
                assert target_frame_path.endswith('.jpg')
                Image.open(source_frame_path).resize((resize, resize), resample=Image.LANCZOS).save(target_frame_path, q=95)

    print(f'Done! Sliced {np.mean(slice_proportions) * 100.0 : .02f}% on average. {len(all_clips_paths) * slice_n_left_frames / total_num_frames * 100.0 : .02f}% of total num frames.')


def listdir_full_paths(d) -> List[os.PathLike]:
    return sorted([os.path.join(d, x) for x in os.listdir(d)])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Temporally crops a video dataset, trimming each clip to a fixed number of frames')
    parser.add_argument('source_dir', type=str, help='Path to the dataset')
    parser.add_argument('-n', '--max_num_frames', type=int, default=None, help='Number of frames to preserve')
    parser.add_argument('--slice_n_left_frames', type=int, default=0, help='Number of frames to slice from the left')
    parser.add_argument('--resize', type=int, default=None, help='Should we resize the dataset?')
    parser.add_argument('--target_dir', type=str, default=None, help='Where to save the new dataset')
    args = parser.parse_args()

    crop_video_dataset(
        source_dir=args.source_dir,
        max_num_frames=args.max_num_frames,
        slice_n_left_frames=args.slice_n_left_frames,
        resize=args.resize,
        target_dir=args.target_dir,
    )
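When --target_dir is omitted, the output directory name encodes the crop settings via the _cut{n} and _slice{n} suffixes and is created next to the source. For example (the dataset name here is made up):

# --- naming illustration (hypothetical paths) ---
# source_dir='data/fashion', max_num_frames=16, slice_n_left_frames=8
# => target_dir='data/fashion_cut16_slice8'
crop_video_dataset('data/fashion', max_num_frames=16, slice_n_left_frames=8)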
src/scripts/frames_to_video_grid.py
ADDED
@@ -0,0 +1,78 @@
"""
Converts a directory of video frames into an mp4 grid
"""
import sys; sys.path.extend(['.'])
import os
import argparse
import random

import numpy as np
import torch
from torch import Tensor
import torchvision.transforms.functional as TVF
from torchvision import utils
from PIL import Image
from tqdm import tqdm
import torchvision


def frames_to_video_grid(videos_dir: os.PathLike, num_videos: int, length: int, fps: int, output_path: os.PathLike, select_random: bool=False, random_seed: int=None):
    clips_paths = [os.path.join(videos_dir, d) for d in os.listdir(videos_dir)]

    # bad_idx = [0, 9, 11, 16]
    # clips_paths = [c for i, c in enumerate(clips_paths) if not i in bad_idx]

    if select_random:
        random.seed(random_seed)
        clips_paths = random.sample(clips_paths, k=num_videos)
    else:
        clips_paths = clips_paths[:num_videos]
    videos = [read_first_n_frames(d, length) for d in tqdm(clips_paths, desc='Reading data...')]  # [num_videos, length, c, h, w]
    videos = [fill_with_black_squares(v, length) for v in tqdm(videos, desc='Adding empty frames')]  # [num_videos, length, c, h, w]
    frame_grids = torch.stack(videos).permute(1, 0, 2, 3, 4)  # [video_len, num_videos, c, h, w]
    frame_grids = [utils.make_grid(fs, nrow=int(np.ceil(np.sqrt(num_videos)))) for fs in tqdm(frame_grids, desc='Making grids')]

    if os.path.dirname(output_path) != "":
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
    frame_grids = (torch.stack(frame_grids) * 255).to(torch.uint8).permute(0, 2, 3, 1)  # [T, H, W, C]
    torchvision.io.write_video(output_path, frame_grids, fps=fps, video_codec='h264', options={'crf': '10'})


def read_first_n_frames(d: os.PathLike, num_frames: int) -> Tensor:
    images = [Image.open(os.path.join(d, f)) for f in sorted(os.listdir(d))[:num_frames]]
    images = [TVF.to_tensor(x) for x in images]

    return torch.stack(images)


def fill_with_black_squares(video, desired_len: int) -> Tensor:
    if len(video) >= desired_len:
        return video

    return torch.cat([
        video,
        torch.zeros_like(video[0]).unsqueeze(0).repeat(desired_len - len(video), 1, 1, 1),
    ], dim=0)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--directory', type=str, help='Directory with video frames')
    parser.add_argument('-n', '--num_videos', type=int, help='Number of videos to consider')
    parser.add_argument('-l', '--length', type=int, help='Video length (in frames)')
    parser.add_argument('--fps', type=int, default=25, help='FPS to save with')
    parser.add_argument('-o', '--output_path', type=str, help='Where to save the file')
    parser.add_argument('--select_random', action='store_true', help='Select videos at random?')
    parser.add_argument('--random_seed', type=int, default=None, help='Random seed when selecting videos at random')

    args = parser.parse_args()

    frames_to_video_grid(
        videos_dir=args.directory,
        num_videos=args.num_videos,
        length=args.length,
        fps=args.fps,
        output_path=args.output_path,
        select_random=args.select_random,
        random_seed=args.random_seed,
    )
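The grid is kept as close to square as possible: for N videos, make_grid is called with nrow = ceil(sqrt(N)) columns per row. A quick standalone check of that layout rule:

# --- layout check (standalone) ---
import numpy as np
for n in [4, 9, 10, 16]:
    print(n, 'videos ->', int(np.ceil(np.sqrt(n))), 'columns')
# 4 -> 2, 9 -> 3, 10 -> 4, 16 -> 4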
src/scripts/generate.py
ADDED
@@ -0,0 +1,148 @@
"""Generates a dataset of videos using a pretrained network pickle."""

import sys; sys.path.extend(['.', 'src'])
import os
import json
import random
import warnings

import click
from src import dnnlib
import numpy as np
import torch
from tqdm import tqdm
from omegaconf import OmegaConf

import src.legacy as legacy
from src.training.logging import generate_videos, save_video_frames_as_mp4, save_video_frames_as_frames_parallel

torch.set_grad_enabled(False)


#----------------------------------------------------------------------------

@click.command()
@click.pass_context
@click.option('--network_pkl', help='Network pickle filename', metavar='PATH')
@click.option('--networks_dir', help='Network pickles directory. Selects a checkpoint from it automatically based on the fvd2048_16f metric.', metavar='PATH')
@click.option('--truncation_psi', type=float, help='Truncation psi', default=1.0, show_default=True)
@click.option('--noise_mode', help='Noise mode', type=click.Choice(['const', 'random', 'none']), default='const', show_default=True)
@click.option('--num_videos', type=int, help='Number of videos to generate', default=50000, show_default=True)
@click.option('--batch_size', type=int, help='Batch size to use for generation', default=32, show_default=True)
@click.option('--moco_decomposition', type=bool, help='Should we do content/motion decomposition (available only for `--as_grids 1` generation)?', default=False, show_default=True)
@click.option('--seed', type=int, help='Random seed', default=42, metavar='INT')
@click.option('--outdir', help='Where to save the output images', type=str, required=True, metavar='DIR')
@click.option('--save_as_mp4', help='Should we save as independent frames or mp4?', type=bool, default=False, metavar='BOOL')
@click.option('--video_len', help='Number of frames to generate', type=int, default=16, metavar='INT')
@click.option('--fps', help='FPS for mp4 saving', type=int, default=25, metavar='INT')
@click.option('--as_grids', help='Save videos as grids', type=bool, default=False, metavar='BOOL')
@click.option('--time_offset', help='Additional time offset', default=0, type=int, metavar='INT')
@click.option('--dataset_path', help='Dataset path. In case we want to use the conditioning signal.', default="", type=str, metavar='PATH')
@click.option('--hydra_cfg_path', help='Config path', default="", type=str, metavar='PATH')
@click.option('--slowmo_coef', help='Increase this value if you want to produce slow-motion videos.', default=1, type=int, metavar='INT')
def generate(
    ctx: click.Context,
    network_pkl: str,
    networks_dir: str,
    truncation_psi: float,
    noise_mode: str,
    num_videos: int,
    batch_size: int,
    moco_decomposition: bool,
    seed: int,
    outdir: str,
    save_as_mp4: bool,
    video_len: int,
    fps: int,
    as_grids: bool,
    time_offset: int,
    dataset_path: os.PathLike,
    hydra_cfg_path: os.PathLike,
    slowmo_coef: int,
):
    if network_pkl is None:
        # Selecting the checkpoint with the best (i.e. lowest) FVD score
        # output_regex = "^network-snapshot-\d{6}.pkl$"
        # ckpt_regex = re.compile("^network-snapshot-\d{6}.pkl$")
        # ckpts = sorted([f for f in os.listdir(networks_dir) if ckpt_regex.match(f)])
        # network_pkl = os.path.join(networks_dir, ckpts[-1])
        ckpt_select_metric = 'fvd2048_16f'
        metrics_file = os.path.join(networks_dir, f'metric-{ckpt_select_metric}.jsonl')
        with open(metrics_file, 'r') as f:
            snapshot_metrics_vals = [json.loads(line) for line in f.read().splitlines()]
        best_snapshot = sorted(snapshot_metrics_vals, key=lambda m: m['results'][ckpt_select_metric])[0]
        network_pkl = os.path.join(networks_dir, best_snapshot['snapshot_pkl'])
        print(f'Using checkpoint: {network_pkl} with FVD16 of', best_snapshot['results'][ckpt_select_metric])
    else:
        assert networks_dir is None, "Can't have both parameters: network_pkl and networks_dir"

    if moco_decomposition:
        assert as_grids, "Content/motion decomposition is available only when we generate as grids."
        assert batch_size == num_videos, "Same motion is supported only for batch_size == num_videos"

    print('Loading networks from "%s"...' % network_pkl)
    device = torch.device('cuda')
    with dnnlib.util.open_url(network_pkl) as f:
        G = legacy.load_network_pkl(f)['G_ema'].to(device).eval()  # type: ignore

    os.makedirs(outdir, exist_ok=True)

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    all_z = torch.randn(num_videos, G.z_dim, device=device)  # [num_videos, z_dim]
    if dataset_path and G.c_dim > 0:
        hydra_cfg_path = hydra_cfg_path or os.path.join(networks_dir, '..', "experiment_config.yaml")
        hydra_cfg = OmegaConf.load(hydra_cfg_path)
        training_set_kwargs = dnnlib.EasyDict(
            class_name='training.dataset.VideoFramesFolderDataset',
            path=dataset_path, cfg=hydra_cfg.dataset, use_labels=True, max_size=None, xflip=False)
        training_set = dnnlib.util.construct_class_by_name(**training_set_kwargs)
        all_c = [training_set.get_label(random.choice(range(len(training_set)))) for _ in range(num_videos)]  # [num_videos, c_dim]
        all_c = torch.from_numpy(np.array(all_c)).to(device)  # [num_videos, c_dim]
    elif G.c_dim > 0:
        warnings.warn('Assuming that the conditioning is one-hot!')
        c_idx = torch.randint(low=0, high=G.c_dim, size=(num_videos, 1), device=device)
        all_c = torch.zeros(num_videos, G.c_dim, device=device)  # [num_videos, c_dim]
        all_c.scatter_(1, c_idx, 1)
    else:
        all_c = torch.zeros(num_videos, G.c_dim, device=device)  # [num_videos, c_dim]
    ts = time_offset + torch.arange(video_len, device=device).float().unsqueeze(0).repeat(batch_size, 1) / slowmo_coef  # [batch_size, video_len]
    if moco_decomposition:
        num_rows = num_cols = int(np.sqrt(num_videos))
        motion_z = G.synthesis.motion_encoder(c=all_c[:num_rows], t=ts[:num_rows])['motion_z']  # [num_rows, *motion_dims]
        motion_z = motion_z.repeat_interleave(num_cols, dim=0)  # [batch_size, *motion_dims]

        all_z = all_z[:num_cols].repeat(num_rows, 1)  # [num_videos, z_dim]
        all_c = all_c[:num_cols].repeat(num_rows, 1)  # [num_videos, c_dim]
    else:
        motion_z = None

    # Generate videos.
    for batch_idx in tqdm(range((num_videos + batch_size - 1) // batch_size), desc='Generating videos'):
        curr_batch_size = batch_size if batch_size * (batch_idx + 1) <= num_videos else num_videos % batch_size
        z = all_z[batch_idx * batch_size:batch_idx * batch_size + curr_batch_size]  # [curr_batch_size, z_dim]
        c = all_c[batch_idx * batch_size:batch_idx * batch_size + curr_batch_size]  # [curr_batch_size, c_dim]
        videos = generate_videos(
            G, z, c, ts, motion_z=motion_z, noise_mode=noise_mode,
            truncation_psi=truncation_psi, as_grids=as_grids, batch_size_num_frames=128)

        if as_grids:
            videos = [videos]

        for video_idx, video in enumerate(videos):
            if save_as_mp4:
                save_path = os.path.join(outdir, f'{batch_idx * batch_size + video_idx:06d}.mp4')
                save_video_frames_as_mp4(video, fps, save_path)
            else:
                save_dir = os.path.join(outdir, f'{batch_idx * batch_size + video_idx:06d}')
                video = (video * 255).permute(0, 2, 3, 1).to(torch.uint8).numpy()  # [video_len, h, w, c]
                save_video_frames_as_frames_parallel(video, save_dir, time_offset=time_offset, num_processes=8)

#----------------------------------------------------------------------------

if __name__ == "__main__":
    generate()  # pylint: disable=no-value-for-parameter

#----------------------------------------------------------------------------
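The automatic checkpoint selection assumes the training run produced a metric-fvd2048_16f.jsonl file with one JSON record per snapshot; sorting ascending picks the lowest (best) FVD. A sketch of that record format, with made-up values:

# --- record-format sketch (values are illustrative) ---
import json
lines = [
    '{"snapshot_pkl": "network-snapshot-000100.pkl", "results": {"fvd2048_16f": 310.2}}',
    '{"snapshot_pkl": "network-snapshot-000200.pkl", "results": {"fvd2048_16f": 268.7}}',
]
records = [json.loads(l) for l in lines]
best = min(records, key=lambda m: m['results']['fvd2048_16f'])
print(best['snapshot_pkl'])  # network-snapshot-000200.pkl, since lower FVD is better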
src/scripts/preprocess_ffs.py
ADDED
@@ -0,0 +1,204 @@
"""
This file preprocesses the FaceForensics dataset by cropping it.
Copied from https://github.com/pfnet-research/tgan2/blob/master/scripts/make_face_forensics.py
"""

import argparse
import os
from typing import List
from multiprocessing import Pool
from PIL import Image

import cv2
# import h5py
import imageio
import numpy as np
import pandas
from tqdm import tqdm


def parse_videos(source_dir, splits: List[str], categories: List[str]):
    results = []
    for split in splits:
        for category in categories:
            target_dir = os.path.join(source_dir, split, category)
            filenames = sorted(os.listdir(target_dir))
            for filename in filenames:
                results.append({
                    'split': split,
                    'category': category,
                    'filename': filename,
                    'filepath': os.path.join(split, category, filename),
                })
    return pandas.DataFrame(results)


def crop(img, left, right, top, bottom, margin):
    cols = right - left
    rows = bottom - top
    if cols < rows:
        padding = rows - cols
        left -= padding // 2
        right += (padding // 2) + (padding % 2)
        cols = right - left
    else:
        padding = cols - rows
        top -= padding // 2
        bottom += (padding // 2) + (padding % 2)
        rows = bottom - top
    assert rows == cols
    return img[top:bottom, left:right]


def job_proxy(kwargs):
    process_and_save_video(**kwargs)


def process_and_save_video(video_path: os.PathLike, mask_path: os.PathLike, img_size: int, wide_crop: bool, output_dir: os.PathLike):
    try:
        video = process_video(video_path, mask_path, img_size=img_size, wide_crop=wide_crop)
    except KeyboardInterrupt:
        raise
    except Exception:
        print(f"Couldn't process {video_path}")
        return

    os.makedirs(output_dir, exist_ok=True)

    # if os.path.isdir(output_dir) and len(os.listdir(output_dir)) > 0:
    #     return

    for i, frame in enumerate(video):
        frame = frame.transpose(1, 2, 0)
        Image.fromarray(frame).save(os.path.join(output_dir, f'{i:06d}.jpg'), q=95)


def process_video(video_path, mask_path, img_size, threshold=5, margin=0.02, wide_crop: bool=False):
    video_reader = imageio.get_reader(video_path)
    mask_reader = imageio.get_reader(mask_path)
    assert video_reader.get_length() == mask_reader.get_length()

    # Searching for the widest crop which would work for the whole video
    if wide_crop:
        left_most = float('inf')
        top_most = float('inf')
        right_most = float('-inf')
        bottom_most = float('-inf')

        for img, mask in zip(video_reader, mask_reader):
            hist = (255 - mask).astype(np.float64).sum(axis=2)
            horiz_hist = np.where(hist.mean(axis=0) > threshold)[0]
            vert_hist = np.where(hist.mean(axis=1) > threshold)[0]
            left, right = horiz_hist[0], horiz_hist[-1]
            top, bottom = vert_hist[0], vert_hist[-1]
            left_most = min(left_most, left)
            top_most = min(top_most, top)
            right_most = max(right_most, right)
            bottom_most = max(bottom_most, bottom)

    video = []
    for img, mask in zip(video_reader, mask_reader):
        if wide_crop:
            left, right, top, bottom = left_most, right_most, top_most, bottom_most
        else:
            hist = (255 - mask).astype(np.float64).sum(axis=2)
            horiz_hist = np.where(hist.mean(axis=0) > threshold)[0]
            vert_hist = np.where(hist.mean(axis=1) > threshold)[0]
            left, right = horiz_hist[0], horiz_hist[-1]
            top, bottom = vert_hist[0], vert_hist[-1]

        dst_img = crop(img, left, right, top, bottom, margin)

        try:
            dst_img = cv2.resize(
                dst_img, (img_size, img_size),
                interpolation=cv2.INTER_LANCZOS4).transpose(2, 0, 1)
            video.append(dst_img)
        except KeyboardInterrupt:
            raise
        except Exception:
            print(img.shape, dst_img.shape, left, right, top, bottom)

    T = len(video)
    video = np.concatenate(video).reshape(T, 3, img_size, img_size)
    return video


# def count_frames(path):
#     reader = imageio.get_reader(path)
#     n_frames = 0
#     while True:
#         try:
#             img = reader.get_next_data()
#         except IndexError as e:
#             break
#         else:
#             n_frames += 1
#     return n_frames


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--source_dir', type=str, default='data/FaceForensics_compressed')
    parser.add_argument('--output_dir', type=str, default='data/ffs_processed')
    parser.add_argument('--img_size', type=int, default=256)
    parser.add_argument('--num_workers', type=int, default=8)
    parser.add_argument('--wide_crop', action='store_true', help="Use a single wide crop for the whole video (cropping each frame independently makes the video shake)")
    args = parser.parse_args()

    # splits = ['train', 'val', 'test']
    # categories = ['original', 'mask', 'altered']
    splits = ['train']
    categories = ['original', 'mask']
    df = parse_videos(args.source_dir, splits, categories)
    os.makedirs(args.output_dir, exist_ok=True)

    for split in splits:
        target_frame = df[df['split'] == split]
        filenames = target_frame['filename'].unique()

        # print('Count # of frames')
        # rets = []
        # for i, filename in enumerate(filenames):
        #     fn_frame = target_frame[target_frame['filename'] == filename]
        #     video_path = os.path.join(
        #         args.source_dir, fn_frame[fn_frame['category'] == 'original'].iloc[0]['filepath'])
        #     rets.append(p.apply_async(count_frames, args=(video_path,)))
        # n_frames = 0
        # for ret in tqdm(rets):
        #     n_frames += ret.get()
        # print('# of frames: {}'.format(n_frames))

        # h5file = h5py.File(os.path.join(args.output_dir, '{}.h5'.format(split)), 'w')
        # dset = h5file.create_dataset('image', (n_frames, 3, args.img_size, args.img_size), dtype=np.uint8)
        # conf = []
        # start = 0

        pool = Pool(processes=args.num_workers)
        job_kwargs_list = []

        for i, filename in enumerate(filenames):
            fn_frame = target_frame[target_frame['filename'] == filename]
            video_path = os.path.join(args.source_dir, fn_frame[fn_frame['category'] == 'original'].iloc[0]['filepath'])
            mask_path = os.path.join(args.source_dir, fn_frame[fn_frame['category'] == 'mask'].iloc[0]['filepath'])

            job_kwargs_list.append(dict(
                video_path=video_path,
                mask_path=mask_path,
                img_size=args.img_size,
                wide_crop=args.wide_crop,
                output_dir=os.path.join(args.output_dir, filename[:filename.rfind('.')]),
            ))

        for _ in tqdm(pool.imap_unordered(job_proxy, job_kwargs_list), desc=f'Processing {split}', total=len(job_kwargs_list)):
            pass
        # T = len(video)
        # dset[start:(start + T)] = video
        # conf.append({'start': start, 'end': (start + T)})
        # start += T
    # conf = pandas.DataFrame(conf)
    # conf.to_json(os.path.join(args.output_dir, '{}.json'.format(split)), orient='records')


if __name__ == '__main__':
    main()
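The crop in process_video is driven entirely by the mask: (255 - mask) is nonzero wherever the mask is not pure white, so the first and last rows/columns whose mean exceeds the threshold give the bounding box edges. A toy demonstration with a fabricated 5x5 mask:

# --- bounding-box logic on a toy mask (fabricated data) ---
import numpy as np
mask = np.full((5, 5, 3), 255, dtype=np.uint8)   # all-white mask...
mask[1:3, 2:4] = 0                                # ...with a 2x2 non-white patch
hist = (255 - mask).astype(np.float64).sum(axis=2)
horiz = np.where(hist.mean(axis=0) > 5)[0]
vert = np.where(hist.mean(axis=1) > 5)[0]
print(horiz[0], horiz[-1], vert[0], vert[-1])     # left=2, right=3, top=1, bottom=2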
src/scripts/profile_model.py
ADDED
@@ -0,0 +1,104 @@
"""
This script computes imgs/sec for a generator in eval mode
for different batch sizes
"""
import sys; sys.path.extend(['..', '.', 'src'])
import time

import numpy as np
import torch
import torch.nn as nn
import hydra
from hydra.experimental import initialize
from omegaconf import DictConfig, OmegaConf
from tqdm import tqdm
import torch.autograd.profiler as profiler

from src import dnnlib
from src.infra.utils import recursive_instantiate


DEVICE = 'cuda'
BATCH_SIZES = [32]
NUM_WARMUP_ITERS = 5
NUM_PROFILE_ITERS = 25


def instantiate_G(cfg: DictConfig) -> nn.Module:
    G_kwargs = dnnlib.EasyDict(class_name='training.networks.Generator', w_dim=512, mapping_kwargs=dnnlib.EasyDict(), synthesis_kwargs=dnnlib.EasyDict())
    G_kwargs.synthesis_kwargs.channel_base = int(cfg.model.generator.get('fmaps', 0.5) * 32768)
    G_kwargs.synthesis_kwargs.channel_max = 512
    G_kwargs.mapping_kwargs.num_layers = cfg.model.generator.get('mapping_net_n_layers', 2)
    if cfg.get('num_fp16_res', 0) > 0:
        G_kwargs.synthesis_kwargs.num_fp16_res = cfg.num_fp16_res
        G_kwargs.synthesis_kwargs.conv_clamp = 256
    G_kwargs.cfg = cfg.model.generator
    G_kwargs.c_dim = 0
    G_kwargs.img_resolution = cfg.get('resolution', 256)
    G_kwargs.img_channels = 3

    G = dnnlib.util.construct_class_by_name(**G_kwargs).eval().requires_grad_(False).to(DEVICE)

    return G


@torch.no_grad()
def profile_for_batch_size(G: nn.Module, cfg: DictConfig, batch_size: int):
    z = torch.randn(batch_size, G.z_dim, device=DEVICE)
    c = torch.zeros(batch_size, G.c_dim, device=DEVICE)
    t = torch.zeros(batch_size, 2, device=DEVICE)
    times = []

    for i in tqdm(range(NUM_WARMUP_ITERS), desc='Warming up'):
        torch.cuda.synchronize()
        fake_img = G(z, c=c, t=t).contiguous()
        y = fake_img[0, 0, 0, 0].item()  # sync
        torch.cuda.synchronize()

    time.sleep(1)

    torch.cuda.reset_peak_memory_stats()

    with profiler.profile(record_shapes=True, use_cuda=True) as prof:
        for i in tqdm(range(NUM_PROFILE_ITERS), desc='Profiling'):
            torch.cuda.synchronize()
            start_time = time.time()
            with profiler.record_function("forward"):
                fake_img = G(z, c=c, t=t).contiguous()
                y = fake_img[0, 0, 0, 0].item()  # sync
                torch.cuda.synchronize()
            times.append(time.time() - start_time)

    torch.cuda.empty_cache()
    num_imgs_processed = len(times) * batch_size
    total_time_spent = np.sum(times)
    bandwidth = num_imgs_processed / total_time_spent
    summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)

    print(f'[Batch size: {batch_size}] Mean: {np.mean(times):.05f}s/it. Std: {np.std(times):.05f}s')
    print(f'[Batch size: {batch_size}] Imgs/sec: {bandwidth:.03f}')
    print(f'[Batch size: {batch_size}] Max mem: {torch.cuda.max_memory_allocated(DEVICE) / 2**30:<6.2f} gb')

    return bandwidth, summary


@hydra.main(config_path="../../configs", config_name="config.yaml")
def profile(cfg: DictConfig):
    recursive_instantiate(cfg)
    G = instantiate_G(cfg)
    bandwidths = []
    summaries = []
    print(f'Number of parameters: {sum(p.numel() for p in G.parameters())}')

    for batch_size in BATCH_SIZES:
        bandwidth, summary = profile_for_batch_size(G, cfg, batch_size)
        bandwidths.append(bandwidth)
        summaries.append(summary)

    best_batch_size_idx = int(np.argmax(bandwidths))
    print(f'------------ Best batch size is {BATCH_SIZES[best_batch_size_idx]} ------------')
    print(summaries[best_batch_size_idx])


if __name__ == '__main__':
    profile()
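Because CUDA kernels launch asynchronously, the timing loop above brackets each forward pass with torch.cuda.synchronize() (the .item() read also forces a sync), so time.time() measures completed GPU work rather than just kernel launches. The pattern in isolation, with a stand-in model:

# --- timing pattern in isolation (stand-in model) ---
import time
import torch

model = torch.nn.Linear(512, 512).cuda()
x = torch.randn(64, 512, device='cuda')

torch.cuda.synchronize()              # drain pending work before starting the clock
start = time.time()
y = model(x)
torch.cuda.synchronize()              # wait until the kernel has actually finished
print(f'{time.time() - start:.6f}s')  # without the syncs this would time only the launch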
src/scripts/project.py
ADDED
@@ -0,0 +1,479 @@
| 1 |
+
"""
|
| 2 |
+
Given a dataset of images, it (optionally crops it) and embeds into the model
|
| 3 |
+
Also optionally generates random videos from the found w
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import sys; sys.path.extend(['.', 'src'])
|
| 7 |
+
import os
|
| 8 |
+
import re
|
| 9 |
+
import json
|
| 10 |
+
import random
|
| 11 |
+
from typing import List, Optional, Callable
|
| 12 |
+
from typing import List
|
| 13 |
+
|
| 14 |
+
from PIL import Image
|
| 15 |
+
import click
|
| 16 |
+
from src import dnnlib
|
| 17 |
+
import numpy as np
|
| 18 |
+
import torch
|
| 19 |
+
from tqdm import tqdm
|
| 20 |
+
from omegaconf import OmegaConf
|
| 21 |
+
import torch.nn as nn
|
| 22 |
+
import torch.nn.functional as F
|
| 23 |
+
from torchvision import utils
|
| 24 |
+
from torch import Tensor
|
| 25 |
+
import torchvision.transforms.functional as TVF
|
| 26 |
+
from torchvision.utils import save_image
|
| 27 |
+
|
| 28 |
+
import legacy
|
| 29 |
+
from src.training.logging import generate_videos, save_video_frames_as_mp4, save_video_frames_as_frames
|
| 30 |
+
from src.torch_utils import misc
|
| 31 |
+
|
| 32 |
+
#----------------------------------------------------------------------------
|
| 33 |
+
|
| 34 |
+
def project(
|
| 35 |
+
_sentinel=None,
|
| 36 |
+
G: Callable=None,
|
| 37 |
+
vgg16: nn.Module=None,
|
| 38 |
+
target_images: List[Tensor]=None,
|
| 39 |
+
device: str='cuda',
|
| 40 |
+
use_w_init: bool=False,
|
| 41 |
+
use_motion_init: bool=False,
|
| 42 |
+
w_avg_samples = 10000,
|
| 43 |
+
num_steps = 1000,
|
| 44 |
+
initial_learning_rate = 0.1,
|
| 45 |
+
initial_noise_factor = 0.05,
|
| 46 |
+
noise_ramp_length = 0.75,
|
| 47 |
+
lr_rampdown_length = 0.25,
|
| 48 |
+
lr_rampup_length = 0.05,
|
| 49 |
+
#regularize_noise_weight = 1e5,
|
| 50 |
+
regularize_noise_weight = 0.0001,
|
| 51 |
+
motion_reg_type: str=None,
|
| 52 |
+
):
|
| 53 |
+
num_videos = len(target_images)
|
| 54 |
+
|
| 55 |
+
# misc.assert_shape(target_images, [None, G.img_channels, G.img_resolution, G.img_resolution])
|
| 56 |
+
G = G.eval().requires_grad_(False).to(device) # type: ignore
|
| 57 |
+
|
| 58 |
+
c = torch.zeros(num_videos, G.c_dim, device=device)
|
| 59 |
+
ts = torch.zeros(num_videos, 1, device=device)
|
| 60 |
+
|
| 61 |
+
# Compute w stats.
|
| 62 |
+
z_samples = np.random.RandomState(123).randn(w_avg_samples, G.z_dim)
|
| 63 |
+
w_samples = G.mapping(torch.from_numpy(z_samples).to(device), None) # [N, L, C]
|
| 64 |
+
w_samples = w_samples[:, :1, :].cpu().numpy().astype(np.float32) # [N, 1, C]
|
| 65 |
+
w_avg = np.mean(w_samples, axis=0, keepdims=True) # [1, 1, C]
|
| 66 |
+
w_std = (np.sum((w_samples - w_avg) ** 2) / w_avg_samples) ** 0.5
|
| 67 |
+
|
| 68 |
+
# img_mean = G.synthesis(
|
| 69 |
+
# ws=torch.from_numpy(w_avg).repeat(1, G.num_ws, 1).to(device),
|
| 70 |
+
# c=c[0], t=ts[[0]],
|
| 71 |
+
# )
|
| 72 |
+
# img_mean = (img_mean * 0.5 + 0.5).cpu().detach()
|
| 73 |
+
# TVF.to_pil_image(img_mean[0]).save('/tmp/data/mean.png')
|
| 74 |
+
# print('saved!')
|
| 75 |
+
|
| 76 |
+
# Load VGG16 feature detector.
|
| 77 |
+
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/vgg16.pt'
|
| 78 |
+
with dnnlib.util.open_url(url) as f:
|
| 79 |
+
vgg16 = torch.jit.load(f).eval().to(device)
|
| 80 |
+
|
| 81 |
+
# Features for target image.
|
| 82 |
+
target_features = []
|
| 83 |
+
for img in target_images:
|
| 84 |
+
img = img.to(device).to(torch.float32).unsqueeze(0) * 255.0
|
| 85 |
+
if img.shape[2] > 256:
|
| 86 |
+
img = F.interpolate(img, size=(256, 256), mode='area')
|
| 87 |
+
target_features.append(vgg16(img, resize_images=False, return_lpips=True).squeeze(0))
|
| 88 |
+
target_features = torch.stack(target_features) # [num_images, lpips_dim]
|
| 89 |
+
|
| 90 |
+
if use_w_init:
|
| 91 |
+
w_opt = find_w_init() # [num_videos, 1, w_dim]
|
| 92 |
+
w_opt = w_opt.detach().requires_grad_(True) # [num_videos, num_ws, w_dim]
|
| 93 |
+
else:
|
| 94 |
+
w_opt = torch.tensor(w_avg, dtype=torch.float32, device=device, requires_grad=True) # pylint: disable=not-callable
|
| 95 |
+
w_opt = w_opt.repeat(num_videos, G.num_ws, 1).detach().requires_grad_(True) # [num_videos, num_ws, w_dim]
|
| 96 |
+
|
| 97 |
+
# w_opt_to_ws = lambda w_opt: torch.cat([w_opt[:, [0]].repeat(1, G.num_ws // 2, 1), w_opt[:, 1:]], dim=1)
|
| 98 |
+
|
| 99 |
+
# Trying a lot of motions to find which one works best
|
| 100 |
+
if use_motion_init:
|
| 101 |
+
motion_z_opt = select_motions(motion_codes)
|
| 102 |
+
else:
|
| 103 |
+
motion_z_opt = G.synthesis.motion_encoder(c=c, t=ts)['motion_z']
|
| 104 |
+
# motion_z_opt.data = torch.randn_like(motion_z_opt.data) * 1e-3
|
| 105 |
+
|
| 106 |
+
motion_z_opt.requires_grad_(True)
|
| 107 |
+
|
| 108 |
+
w_result = torch.zeros([num_steps] + list(w_opt.shape), dtype=torch.float32, device=device)
|
| 109 |
+
# optimizer = torch.optim.Adam([w_opt] + [motion_z_opt], betas=(0.9, 0.999), lr=initial_learning_rate)
|
| 110 |
+
optimizer = torch.optim.Adam([w_opt], betas=(0.9, 0.999), lr=initial_learning_rate)
|
| 111 |
+
|
| 112 |
+
for step in tqdm(range(num_steps)):
|
| 113 |
+
# Learning rate schedule.
|
| 114 |
+
t = step / num_steps
|
| 115 |
+
w_noise_scale = w_std * initial_noise_factor * max(0.0, 1.0 - t / noise_ramp_length) ** 2
|
| 116 |
+
lr_ramp = min(1.0, (1.0 - t) / lr_rampdown_length)
|
| 117 |
+
lr_ramp = 0.5 - 0.5 * np.cos(lr_ramp * np.pi)
|
| 118 |
+
lr_ramp = lr_ramp * min(1.0, t / lr_rampup_length)
|
| 119 |
+
lr = initial_learning_rate * lr_ramp
|
| 120 |
+
|
| 121 |
+
for param_group in optimizer.param_groups:
|
| 122 |
+
param_group['lr'] = lr
|
| 123 |
+
|
| 124 |
+
# Synth images from opt_w.
|
| 125 |
+
w_noise = torch.randn_like(w_opt) * w_noise_scale
|
| 126 |
+
ws = w_opt + w_noise
|
| 127 |
+
#ws = w_opt_to_ws(w_opt + w_noise)
|
| 128 |
+
#ws = (w_opt + w_noise).repeat([1, G.mapping.num_ws, 1])
|
| 129 |
+
#synth_images = G.synthesis(ws, c=c, t=ts, motion_z=motion_z_opt + torch.randn_like(motion_z_opt) * w_noise_scale)
|
| 130 |
+
synth_images = G.synthesis(ws, c=c, t=ts, motion_z=motion_z_opt)
|
| 131 |
+
#synth_images = G.synthesis(ws, c=c, t=ts)
|
| 132 |
+
|
| 133 |
+
# Downsample image to 256x256 if it's larger than that. VGG was built for 224x224 images.
|
| 134 |
+
synth_images = (synth_images * 0.5 + 0.5) * 255.0
|
| 135 |
+
if synth_images.shape[2] > 256:
|
| 136 |
+
synth_images = F.interpolate(synth_images, size=(256, 256), mode='area')
|
| 137 |
+
|
| 138 |
+
# Features for synth images.
|
| 139 |
+
synth_features = vgg16(synth_images, resize_images=False, return_lpips=True)
|
| 140 |
+
dist = (target_features - synth_features).square().sum()
|
| 141 |
+
|
| 142 |
+
# Noise regularization.
|
| 143 |
+
if motion_reg_type is None:
|
| 144 |
+
reg_loss = 0.0
|
| 145 |
+
elif motion_reg_type == "norm":
|
| 146 |
+
reg_loss = motion_z_opt.norm(dim=2).mean()
|
| 147 |
+
elif motion_reg_type == "dist":
|
| 148 |
+
reg_loss = motion_z_opt.mean().pow(2) + (motion_z_opt.var() - 1).pow(2)
|
| 149 |
+
elif motion_reg_type == "sg2":
|
| 150 |
+
for v in noise_bufs.values():
|
| 151 |
+
noise = v[None,None,:,:] # must be [1,1,H,W] for F.avg_pool2d()
|
| 152 |
+
while True:
|
| 153 |
+
reg_loss += (noise*torch.roll(noise, shifts=1, dims=3)).mean()**2
|
| 154 |
+
reg_loss += (noise*torch.roll(noise, shifts=1, dims=2)).mean()**2
|
| 155 |
+
if noise.shape[2] <= 8:
|
| 156 |
+
break
|
| 157 |
+
noise = F.avg_pool2d(noise, kernel_size=2)
|
| 158 |
+
else:
|
| 159 |
+
raise NotImplementedError(f"Uknown motion_reg_type: {motion_reg_type}")
|
| 160 |
+
|
| 161 |
+
loss = dist + reg_loss * regularize_noise_weight
|
| 162 |
+
|
| 163 |
+
# Step
|
| 164 |
+
optimizer.zero_grad(set_to_none=True)
|
| 165 |
+
loss.backward()
|
| 166 |
+
optimizer.step()
|
| 167 |
+
|
| 168 |
+
# Save projected W for each optimization step.
|
| 169 |
+
w_result[step] = w_opt.detach()
|
| 170 |
+
|
| 171 |
+
# Normalize noise.
|
| 172 |
+
# with torch.no_grad():
|
| 173 |
+
# for buf in motion_z_opt.values():
|
| 174 |
+
# buf -= buf.mean()
|
| 175 |
+
# buf *= buf.square().mean().rsqrt()
|
| 176 |
+
|
| 177 |
+
return w_result, motion_z_opt
|
| 178 |
+
|
| 179 |
+
#----------------------------------------------------------------------------
|
| 180 |
+
|
| 181 |
+
@torch.no_grad()
|
| 182 |
+
def find_motions_init(G: Callable, vgg16: nn.Module, target_features: Tensor, c: Tensor, t: Tensor, num_motions_to_try: int=128):
|
| 183 |
+
motions = G.synthesis.motion_encoder(
|
| 184 |
+
c=c.repeat_interleave(num_motions_to_try, dim=0),
|
| 185 |
+
t=t.repeat_interleave(num_motions_to_try, dim=0))['motion_z'] # [num_videos * num_motions_to_try, ...]
|
| 186 |
+
|
| 187 |
+
synth_images = G.synthesis(
|
| 188 |
+
w_opt.repeat_interleave(num_motions_to_try, dim=0),
|
| 189 |
+
c=c.repeat_interleave(num_motions_to_try, dim=0),
|
| 190 |
+
t=t.repeat_interleave(num_motions_to_try, dim=0),
|
| 191 |
+
motion_z=motions)
|
| 192 |
+
|
| 193 |
+
if synth_images.shape[2] > 256:
|
| 194 |
+
synth_images = F.interpolate(synth_images, size=(256, 256), mode='area')
|
| 195 |
+
|
| 196 |
+
synth_images = (synth_images * 0.5 + 0.5) * 255.0
|
| 197 |
+
synth_features = vgg16(synth_images, resize_images=False, return_lpips=True) # [num_videos * num_motions_to_try, ...]
|
| 198 |
+
dist = (target_features.repeat_interleave(num_motions_to_try, dim=0) - synth_features).square().sum(dim=1) # [num_videos * num_motions_to_try]
|
| 199 |
+
best_motions_idx = dist.view(num_videos, num_motions_to_try).argmin(dim=1) # [num_videos]
|
| 200 |
+
motion_z_opt = motions[best_motions_idx] # [num_videos, ...]
|
| 201 |
+
|
| 202 |
+
return motion_z_opt
|
| 203 |
+
|
| 204 |
+
#----------------------------------------------------------------------------
|
| 205 |
+
|
| 206 |
+
@torch.no_grad()
|
| 207 |
+
def find_w_init(G: Callable, vgg16: nn.Module, target_features: Tensor, c: Tensor, t: Tensor, l: Tensor, num_w_to_try: int=128):
|
| 208 |
+
z = torch.randn(num_videos * num_w_to_try, G.z_dim, device=device)
|
| 209 |
+
w = G.mapping(z=z, c=None) # [N, L, C]
|
| 210 |
+
|
| 211 |
+
synth_images = G.synthesis(
|
| 212 |
+
ws=w,
|
| 213 |
+
c=c.repeat_interleave(num_w_to_try, dim=0),
|
| 214 |
+
t=t.repeat_interleave(num_w_to_try, dim=0))
|
| 215 |
+
if synth_images.shape[2] > 256:
|
| 216 |
+
synth_images = F.interpolate(synth_images, size=(256, 256), mode='area')
|
| 217 |
+
synth_images = (synth_images * 0.5 + 0.5) * 255.0
|
| 218 |
+
synth_features = vgg16(synth_images, resize_images=False, return_lpips=True) # [num_videos * num_motions_to_try, ...]
|
| 219 |
+
dist = (target_features.repeat_interleave(num_w_to_try, dim=0) - synth_features).square().sum(dim=1) # [num_videos * num_motions_to_try]
|
| 220 |
+
best_w_idx = dist.view(num_videos, num_w_to_try).argmin(dim=1) # [num_videos]
|
| 221 |
+
w_opt = w[best_w_idx] # [num_videos, num_ws, w_dim]
|
| 222 |
+
|
| 223 |
+
return w_opt
|
| 224 |
+
|
| 225 |
+
#----------------------------------------------------------------------------
|
| 226 |
+
|
| 227 |
+
@torch.no_grad()
|
| 228 |
+
def load_target_images(img_paths: List[os.PathLike], extract_faces: bool=False, ref_image: Tensor=None):
|
| 229 |
+
images = [Image.open(f) for f in tqdm(img_paths, desc='Loading images')]
|
| 230 |
+
|
| 231 |
+
if extract_faces:
|
| 232 |
+
images = extract_faces_from_images(imgs=images, ref_image=ref_image)
|
| 233 |
+
for p, img in zip(img_paths, images):
|
| 234 |
+
img.save('/tmp/data/faces_extracted/' + os.path.basename(p), q=95)
|
| 235 |
+
assert False
|
| 236 |
+
# grid = torch.stack([TVF.to_tensor(x) for x in images])
|
| 237 |
+
# grid = utils.make_grid(grid, nrow=8)
|
| 238 |
+
# save_image(grid, f'/tmp/data/faces_extracted.png')
|
| 239 |
+
# print('Saved the extracted images!')
|
| 240 |
+
|
| 241 |
+
# images = [x[:, 200:-400, 450:-200] for x in images]
|
| 242 |
+
images = [TVF.to_tensor(x) for x in images]
|
| 243 |
+
images = [TVF.resize(x, size=(256, 256)) for x in images]
|
| 244 |
+
|
| 245 |
+
return images
|
| 246 |
+
|
| 247 |
+
#----------------------------------------------------------------------------
|
| 248 |
+
|
| 249 |
+
@torch.no_grad()
|
| 250 |
+
def extract_faces_from_images(_sentinel=None, imgs: List=None, ref_image: "Image"=None, device: str='cuda'):
|
| 251 |
+
assert _sentinel is None
|
| 252 |
+
try:
|
| 253 |
+
import face_alignment
|
| 254 |
+
except ImportError:
|
| 255 |
+
raise ImportError("To project images with alignment, you need to install the `face_alignment` library.")
|
| 256 |
+
|
| 257 |
+
SELECTED_LANDMARKS = [38, 44]
|
| 258 |
+
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False, device=device)
|
| 259 |
+
|
| 260 |
+
ref_landmarks = fa.get_landmarks_from_image(np.array(ref_image))[0][SELECTED_LANDMARKS] # [2, 2]
|
| 261 |
+
landmarks = [fa.get_landmarks_from_image(np.array(x))[0][SELECTED_LANDMARKS] for x in imgs] # [num_imgs, 2, 2]
|
| 262 |
+
ref_dist = ((ref_landmarks[0] - ref_landmarks[1]) ** 2).sum() ** 0.5 # [1]
|
| 263 |
+
dists = [((p[0] - p[1]) ** 2).sum() ** 0.5 for p in landmarks] # [num_imgs]
|
| 264 |
+
resize_ratios = [ref_dist / d for d in dists] # [num_imgs]
|
| 265 |
+
new_sizes = [(int(r * x.size[1]), int(r * x.size[0])) for r, x in zip(resize_ratios, imgs)]
|
| 266 |
+
imgs_resized = [TVF.resize(x, size=s, interpolation=Image.LANCZOS) for x, s in zip(imgs, new_sizes)] # [num_imgs, Image]
|
| 267 |
+
bbox_left = [p[0][0] * r - ref_landmarks[0][0] for p, r in zip(landmarks, resize_ratios)]
|
| 268 |
+
bbox_top = [p[0][1] * r - ref_landmarks[0][1] for p, r in zip(landmarks, resize_ratios)]
|
| 269 |
+
|
| 270 |
+
out = [x.crop(box=(l, t, l + ref_image.size[0], t + ref_image.size[1])) for x, l, t in zip(imgs_resized, bbox_left, bbox_top)]
|
| 271 |
+
|
| 272 |
+
return out
|
| 273 |
+
|
| 274 |
+
#----------------------------------------------------------------------------
|
| 275 |
+
|
| 276 |
+
def pad_box_to_square(left, upper, right, lower):
|
| 277 |
+
h = lower - upper
|
| 278 |
+
w = right - left
|
| 279 |
+
|
| 280 |
+
if h == w:
|
| 281 |
+
return left, upper, right, lower
|
| 282 |
+
elif w > h:
|
| 283 |
+
diff = w - h
|
| 284 |
+
assert False, "Not implemented"
|
| 285 |
+
else:
|
| 286 |
+
pad = (h - w) // 2
|
| 287 |
+
|
| 288 |
+
return (left - pad, upper, right + pad, lower)
|
| 289 |
+
|
| 290 |
+
#----------------------------------------------------------------------------
|
| 291 |
+
|
| 292 |
+
def add_margins(box, margin, width: int=float('inf'), height: int=float('inf')):
|
| 293 |
+
left, upper, right, lower = box
|
| 294 |
+
|
| 295 |
+
return (
|
| 296 |
+
max(0, left - margin[0]),
|
| 297 |
+
max(0, upper - margin[1]),
|
| 298 |
+
min(width, right + margin[2]),
|
| 299 |
+
min(height, lower + margin[3]),
|
| 300 |
+
)
|
| 301 |
+
|
| 302 |
+
#----------------------------------------------------------------------------
|
| 303 |
+
|
| 304 |
+
def add_top_margin(box, margin_ratio: float=0.0):
|
| 305 |
+
left, upper, right, lower = box
|
| 306 |
+
height = lower - upper
|
| 307 |
+
margin = int(height * margin_ratio)
|
| 308 |
+
|
| 309 |
+
return (left, max(0, upper - margin), right, lower)
|
| 310 |
+
#----------------------------------------------------------------------------

def save_edited_w(
    _sentinel=None, # forces all arguments below to be passed by keyword
    G: Callable=None,
    w_outdir: os.PathLike=None,
    samples_outdir: os.PathLike=None,
    img_names: List[str]=None,
    stack_samples: bool=False,
    num_frames: int=16,
    each_nth_frame: int=3,
    all_w: Tensor=None,
    all_motion_z: Tensor=None,
    stacked_samples_out_path: os.PathLike=None,
):
    assert _sentinel is None

    # w_outdir = os.path.join(os.path.basename(images_dir))
    os.makedirs(w_outdir, exist_ok=True)
    num_videos = len(img_names)
    device = all_w.device

    if not stack_samples:
        os.makedirs(samples_outdir, exist_ok=True)
    else:
        all_samples = []

    # Generate samples from the given w and save them.
    with torch.no_grad():
        z = torch.randn(num_videos, G.z_dim, device=device) # [num_videos, z_dim] (unused: synthesis below is driven by w)
        c = torch.zeros(num_videos, G.c_dim, device=device) # [num_videos, c_dim]

        for i, w in enumerate(all_w):
            torch.save(w.cpu(), os.path.join(w_outdir, f'{img_names[i]}_w.pt'))

            if all_motion_z is None:
                motion_z = None
            else:
                motion_z = all_motion_z[i] # [...<any>...]
                torch.save(motion_z.cpu(), os.path.join(w_outdir, f'{img_names[i]}_motion.pt'))
                motion_z = motion_z.unsqueeze(0).to(device) # [1, ...<any>...]
                motion_z = torch.randn_like(motion_z) # NOTE: the saved motion code is replaced with fresh noise before synthesis

            w = w.unsqueeze(0).to(device) # [1, num_ws, w_dim]
            # Spread num_frames timesteps over a window that is (1 + each_nth_frame) times longer.
            t = torch.linspace(0, num_frames * (1 + each_nth_frame), num_frames, device=device).unsqueeze(0)
            imgs = G.synthesis(w, c=c[[i]], t=t, motion_z=motion_z)
            imgs = (imgs * 0.5 + 0.5).clamp(0, 1)
            grid = utils.make_grid(imgs, nrow=num_frames).cpu()

            if stack_samples:
                all_samples.append(grid)
            else:
                # TVF.to_pil_image(grid).save(os.path.join(samples_outdir, img_names[i]) + '.jpg', q=95)
                save_image(grid, os.path.join(samples_outdir, img_names[i]) + '.png')

    if stack_samples:
        main_grid = torch.stack(all_samples) # [num_videos, c, h, w * num_frames]
        main_grid = utils.make_grid(main_grid, nrow=1)
        # TVF.to_pil_image(main_grid).save(f'{images_dir}.jpg', q=95)
        save_image(main_grid, stacked_samples_out_path)
#----------------------------------------------------------------------------

@click.command()
@click.pass_context
@click.option('--network_pkl', help='Network pickle filename', metavar='PATH')
@click.option('--networks_dir', help='Network pickles directory', metavar='PATH')
# @click.option('--truncation_psi', type=float, help='Truncation psi', default=1.0, show_default=True)
# @click.option('--noise_mode', help='Noise mode', type=click.Choice(['const', 'random', 'none']), default='const', show_default=True)
# @click.option('--same_motion_codes', type=bool, help='Should we use the same motion codes for all videos?', default=False, show_default=True)
@click.option('--seed', type=int, help='Random seed', default=42, metavar='INT')
@click.option('--images_dir', help='Directory with the target images (also used as the prefix for the output paths)', type=str, required=True, metavar='DIR')
# @click.option('--save_as_mp4', help='Should we save as independent frames or mp4?', type=bool, default=False, metavar='BOOL')
# @click.option('--video_len', help='Number of frames to generate', type=int, default=16, metavar='INT')
# @click.option('--fps', help='FPS for mp4 saving', type=int, default=25, metavar='INT')
# @click.option('--as_grids', help='Save videos as grids', type=bool, default=False, metavar='BOOL')
@click.option('--zero_periods', help='Zero-out periods predictor?', default=False, type=bool, metavar='BOOL')
@click.option('--num_weights_to_slice', help='Number of high-frequency coords to remove.', default=0, type=int, metavar='INT')
@click.option('--use_w_init', help='Init w by LPIPS.', default=False, type=bool, metavar='BOOL')
@click.option('--use_motion_init', help='Init motions by LPIPS.', default=False, type=bool, metavar='BOOL')
@click.option('--motion_reg_type', help='Type of the regularization for motion', default=None, type=str, metavar='STR')
@click.option('--num_steps', help='Number of the optimization steps to perform.', default=1000, type=int, metavar='INT')
@click.option('--stack_samples', help='When saving, should we stack samples together?', default=False, type=bool, metavar='BOOL')
@click.option('--extract_faces', help='Use FaceNet to extract the face?', default=False, type=bool, metavar='BOOL')
def main(
    ctx: click.Context,
    network_pkl: str,
    networks_dir: str,
    seed: int,
    images_dir: str,
    # save_as_mp4: bool,
    # video_len: int,
    # fps: int,
    # as_grids: bool,
    zero_periods: bool,
    num_weights_to_slice: int,
    use_w_init: bool,
    use_motion_init: bool,
    motion_reg_type: str,
    num_steps: int,
    stack_samples: bool,
    extract_faces: bool,
):
    if network_pkl is None:
        # Select the checkpoint with the best FVD score.
        ckpt_regex = re.compile(r"^network-snapshot-\d{6}\.pkl$")
        # ckpts = sorted([f for f in os.listdir(networks_dir) if ckpt_regex.match(f)])
        # network_pkl = os.path.join(networks_dir, ckpts[-1])
        metrics_file = os.path.join(networks_dir, 'metric-fvd2048_16f.jsonl')
        with open(metrics_file, 'r') as f:
            snapshot_metrics_vals = [json.loads(line) for line in f.read().splitlines()]
        best_snapshot = sorted(snapshot_metrics_vals, key=lambda m: m['results']['fvd2048_16f'])[0]
        network_pkl = os.path.join(networks_dir, best_snapshot['snapshot_pkl'])
        print(f'Using checkpoint: {network_pkl} with FVD16 of', best_snapshot['results']['fvd2048_16f'])
    else:
        assert networks_dir is None, "Cannot pass both parameters: network_pkl and networks_dir"
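    # For reference, each line of the metrics file is a JSON record along the lines of
    # (hypothetical values): {"results": {"fvd2048_16f": 183.7}, "snapshot_pkl": "network-snapshot-025000.pkl"}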

    print('Loading networks from "%s"...' % network_pkl, end='')
    device = torch.device('cuda')
    with dnnlib.util.open_url(network_pkl) as f:
        G = legacy.load_network_pkl(f)['G_ema'].to(device).eval() # type: ignore
    print('Loaded!')

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    if zero_periods:
        G.synthesis.motion_encoder.time_encoder.periods_predictor.weight.data.zero_()

    if num_weights_to_slice > 0:
        G.synthesis.motion_encoder.time_encoder.weights[:, -num_weights_to_slice:] = 0.0

    img_paths = sorted([os.path.join(images_dir, p) for p in os.listdir(images_dir) if p.endswith('.jpg')])
    img_names = [n[:n.rfind('.')] for n in [os.path.basename(p) for p in img_paths]]
    target_images = load_target_images(img_paths, extract_faces, ref_image=Image.open('/tmp/data/mean.png')) # [b, c, h, w]

    assert G.c_dim == 0, "G.c_dim > 0 is not supported"

    w_all_iters, motion_z_final = project(
        G=G,
        target_images=target_images,
        num_steps=num_steps,
        device=device,
        use_w_init=use_w_init,
        use_motion_init=use_motion_init,
        motion_reg_type=motion_reg_type,
    ) # [num_videos, num_ws, w_dim]

    save_edited_w(
        G=G,
        w_outdir=f'{images_dir}_projected',
        samples_outdir=f'{images_dir}_projected_samples',
        img_names=img_names,
        stack_samples=stack_samples,
        all_w=w_all_iters[-1],
        all_motion_z=motion_z_final,
        stacked_samples_out_path=f'{images_dir}.png',
    )

#----------------------------------------------------------------------------

if __name__ == "__main__":
    main() # pylint: disable=no-value-for-parameter

#----------------------------------------------------------------------------
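# Example invocation (hypothetical paths; see the click options above):
#   python src/scripts/project.py --networks_dir=experiments/my_run/checkpoints \
#       --images_dir=data/targets --num_steps=1000 --stack_samples=true
# This selects the best-FVD snapshot from --networks_dir, projects every .jpg in
# --images_dir into W (plus motion codes), and saves the codes and sample grids
# next to the input directory.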
src/torch_utils/__init__.py
ADDED
@@ -0,0 +1,9 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

# empty
src/torch_utils/custom_ops.py
ADDED
@@ -0,0 +1,126 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

import os
import glob
import torch
import torch.utils.cpp_extension
import importlib
import hashlib
import shutil
from pathlib import Path

from torch.utils.file_baton import FileBaton

#----------------------------------------------------------------------------
# Global options.

verbosity = 'brief' # Verbosity level: 'none', 'brief', 'full'

#----------------------------------------------------------------------------
# Internal helper funcs.

def _find_compiler_bindir():
    patterns = [
        'C:/Program Files (x86)/Microsoft Visual Studio/*/Professional/VC/Tools/MSVC/*/bin/Hostx64/x64',
        'C:/Program Files (x86)/Microsoft Visual Studio/*/BuildTools/VC/Tools/MSVC/*/bin/Hostx64/x64',
        'C:/Program Files (x86)/Microsoft Visual Studio/*/Community/VC/Tools/MSVC/*/bin/Hostx64/x64',
        'C:/Program Files (x86)/Microsoft Visual Studio */vc/bin',
    ]
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if len(matches):
            return matches[-1]
    return None

#----------------------------------------------------------------------------
# Main entry point for compiling and loading C++/CUDA plugins.

_cached_plugins = dict()

def get_plugin(module_name, sources, **build_kwargs):
    assert verbosity in ['none', 'brief', 'full']

    # Already cached?
    if module_name in _cached_plugins:
        return _cached_plugins[module_name]

    # Print status.
    if verbosity == 'full':
        print(f'Setting up PyTorch plugin "{module_name}"...')
    elif verbosity == 'brief':
        print(f'Setting up PyTorch plugin "{module_name}"... ', end='', flush=True)

    try: # pylint: disable=too-many-nested-blocks
        # Make sure we can find the necessary compiler binaries.
        if os.name == 'nt' and os.system("where cl.exe >nul 2>nul") != 0:
            compiler_bindir = _find_compiler_bindir()
            if compiler_bindir is None:
                raise RuntimeError(f'Could not find MSVC/GCC/CLANG installation on this computer. Check _find_compiler_bindir() in "{__file__}".')
            os.environ['PATH'] += ';' + compiler_bindir

        # Compile and load.
        verbose_build = (verbosity == 'full')

        # Incremental build md5sum trickery. Copies all the input source files
        # into a cached build directory under a combined md5 digest of the input
        # source files. Copying is done only if the combined digest has changed.
        # This keeps input file timestamps and filenames the same as in previous
        # extension builds, allowing for fast incremental rebuilds.
        #
        # This optimization is done only in case all the source files reside in
        # a single directory (just for simplicity) and if the TORCH_EXTENSIONS_DIR
        # environment variable is set (we take this as a signal that the user
        # actually cares about this.)
        source_dirs_set = set(os.path.dirname(source) for source in sources)
        if len(source_dirs_set) == 1 and ('TORCH_EXTENSIONS_DIR' in os.environ):
            all_source_files = sorted(list(x for x in Path(list(source_dirs_set)[0]).iterdir() if x.is_file()))

            # Compute a combined hash digest for all source files in the same
            # custom op directory (usually .cu, .cpp, .py and .h files).
            hash_md5 = hashlib.md5()
            for src in all_source_files:
                with open(src, 'rb') as f:
                    hash_md5.update(f.read())
            build_dir = torch.utils.cpp_extension._get_build_directory(module_name, verbose=verbose_build) # pylint: disable=protected-access
            digest_build_dir = os.path.join(build_dir, hash_md5.hexdigest())

            if not os.path.isdir(digest_build_dir):
                os.makedirs(digest_build_dir, exist_ok=True)
                baton = FileBaton(os.path.join(digest_build_dir, 'lock'))
                if baton.try_acquire():
                    try:
                        for src in all_source_files:
                            shutil.copyfile(src, os.path.join(digest_build_dir, os.path.basename(src)))
                    finally:
                        baton.release()
                else:
                    # Someone else is copying source files under the digest dir,
                    # wait until done and continue.
                    baton.wait()
            digest_sources = [os.path.join(digest_build_dir, os.path.basename(x)) for x in sources]
            torch.utils.cpp_extension.load(name=module_name, build_directory=build_dir,
                verbose=verbose_build, sources=digest_sources, **build_kwargs)
        else:
            torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
        module = importlib.import_module(module_name)

    except:
        if verbosity == 'brief':
            print('Failed!')
        raise

    # Print status and add to cache.
    if verbosity == 'full':
        print(f'Done setting up PyTorch plugin "{module_name}".')
    elif verbosity == 'brief':
        print('Done.')
    _cached_plugins[module_name] = module
    return module

#----------------------------------------------------------------------------
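# Usage sketch (hypothetical plugin name and source files; the real call sites
# live in the sibling torch_utils/ops modules, e.g. the upfirdn2d/bias_act kernels):
#   _plugin = get_plugin(
#       module_name='my_op_plugin',
#       sources=[os.path.join(os.path.dirname(__file__), 'ops', s) for s in ['my_op.cpp', 'my_op.cu']],
#       extra_cuda_cflags=['--use_fast_math'],
#   )
#   y = _plugin.my_op(x)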