Upload folder using huggingface_hub

Files changed (3)

README.md +177 -30
train_celeba64.py +207 -0
train_flow_matching_on_images.py +198 -0

README.md CHANGED

@@ -9,38 +9,42 @@ datasets:
   - mnist
   - cifar10
   - celeba
 ---
 # UNet Flow Matching Models
 Pre-trained UNet models for Flow Matching on MNIST, CIFAR-10, and CelebA datasets.
 ## Models
 This repository contains three UNet-based velocity field models trained with Flow Matching:
 ### MNIST (28×28 Grayscale)
-- **Checkpoint**: `mnist/ckpt.pth`
 - **Parameters**: 6.2M
 - **Architecture**: UNet with num_channels=64, num_res_blocks=2
-- **Conditional**: Yes (10 classes)
-- **Training**: 50 epochs on MNIST dataset
 - **Hardware**: NVIDIA H100 GPU
-### CIFAR-10 (32×32 RGB)
-- **Checkpoint**: `cifar10/ckpt.pth`
 - **Parameters**: 9.0M
 - **Architecture**: UNet with num_channels=64, num_res_blocks=2
 - **Conditional**: Yes (10 classes)
-- **Training**: 50 epochs on CIFAR-10 dataset
 - **Hardware**: NVIDIA H100 GPU
 ### CelebA (64×64 RGB)
-- **Checkpoint**: `celeba64/ckpt.pth`
 - **Parameters**: 83.0M
 - **Architecture**: UNet with num_channels=128, num_res_blocks=2
 - **Conditional**: No (unconditional face generation)
-- **Training**: 50 epochs on CelebA dataset (202,599 images)
 - **Final loss**: 0.114
 - **Hardware**: NVIDIA H100 GPU
@@ -48,49 +52,192 @@ This repository contains three UNet-based velocity field models trained with Flo
 ### MNIST
 ![MNIST Samples](mnist_with_diff.png)
 ### CIFAR-10
 ![CIFAR-10 Samples](cifar10_with_diff.png)
 ### CelebA 64×64
 ![CelebA Samples](celeba_with_diff.png)
 ## Usage
 ```python
 import torch
-# Load checkpoint (example for CelebA)
-checkpoint = torch.load("celeba64/ckpt.pth", map_location="cuda")
-# The checkpoint contains the model state dict
-# You need to create the UNet model first, then load the weights
 ```
-## Architecture
-Based on the UNet architecture from OpenAI Guided Diffusion, adapted for Flow Matching.
-**Key components**:
-- Time embedding layers
-- ResNet blocks with adaptive normalization
-- Self-attention blocks
-- U-Net skip connections
-- Class conditioning (for MNIST and CIFAR-10)
-## Training Details
-All models trained with:
-- **Optimizer**: AdamW
-- **Epochs**: 50
-- **GPU**: NVIDIA H100
-- **Loss function**: MSE between predicted and target velocity fields
-Dataset-specific:
-- **MNIST**: batch_size=128, lr=1e-3
-- **CIFAR-10**: batch_size=128, lr=1e-3, horizontal_flip=True
-- **CelebA**: batch_size=512, lr=1e-4, horizontal_flip=True
 ## License
 CC BY-NC-SA 4.0 - Non-commercial use only.

   - mnist
   - cifar10
   - celeba
+base_model: keishihara/flow-matching
 ---
 # UNet Flow Matching Models
 Pre-trained UNet models for Flow Matching on MNIST, CIFAR-10, and CelebA datasets.
+**Training code based on**: [keishihara/flow-matching](https://github.com/keishihara/flow-matching.git)
 ## Models
 This repository contains three UNet-based velocity field models trained with Flow Matching:
 ### MNIST (28×28 Grayscale)
+- **Checkpoint**: `mnist/ckpt.pth` (24 MB)
 - **Parameters**: 6.2M
 - **Architecture**: UNet with num_channels=64, num_res_blocks=2
+- **Conditional**: Yes (10 classes, 0-9 digits)
+- **Training**: 50 epochs, batch_size=128, lr=1e-3
 - **Hardware**: NVIDIA H100 GPU
+### CIFAR-10 (32×32 RGB)
+- **Checkpoint**: `cifar10/ckpt.pth` (35 MB)
 - **Parameters**: 9.0M
 - **Architecture**: UNet with num_channels=64, num_res_blocks=2
 - **Conditional**: Yes (10 classes)
+- **Training**: 50 epochs, batch_size=128, lr=1e-3
 - **Hardware**: NVIDIA H100 GPU
 ### CelebA (64×64 RGB)
+- **Checkpoint**: `celeba64/ckpt.pth` (332 MB)
 - **Parameters**: 83.0M
 - **Architecture**: UNet with num_channels=128, num_res_blocks=2
 - **Conditional**: No (unconditional face generation)
+- **Training**: 50 epochs, batch_size=512, lr=1e-4
+- **Dataset**: 202,599 CelebA training images
 - **Final loss**: 0.114
 - **Hardware**: NVIDIA H100 GPU
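
As a rough sanity check on the checkpoint sizes listed above, the parameter counts can be multiplied by the bytes per weight. This is only a sketch: it assumes the state dicts store float32 weights (4 bytes each), which the README does not state, and it ignores buffers and file metadata. The parameter counts are the rounded figures from the model cards.

```python
# Rough check: checkpoint size ~ parameter count x bytes per parameter.
# Assumption (not stated in the README): weights are stored as float32.
BYTES_PER_PARAM = 4

# Rounded figures copied from the model cards above.
params_millions = {"mnist": 6.2, "cifar10": 9.0, "celeba64": 83.0}
listed_mb = {"mnist": 24, "cifar10": 35, "celeba64": 332}

# Convert millions of parameters to megabytes (1 MB = 1e6 bytes here).
estimated_mb = {
    name: millions * 1e6 * BYTES_PER_PARAM / 1e6
    for name, millions in params_millions.items()
}

for name, est in estimated_mb.items():
    print(f"{name}: estimated {est:.1f} MB, listed {listed_mb[name]} MB")
```

The estimates land within about one megabyte of the listed sizes, which is consistent with float32 storage given that the parameter counts are rounded.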
 ### MNIST
 ![MNIST Samples](mnist_with_diff.png)
+*Generated MNIST digits at different velocity reuse thresholds*
 ### CIFAR-10
 ![CIFAR-10 Samples](cifar10_with_diff.png)
+*Generated CIFAR-10 images at different velocity reuse thresholds*
 ### CelebA 64×64
 ![CelebA Samples](celeba_with_diff.png)
+*Generated 64×64 faces at different velocity reuse thresholds*
+## Training Code
+The models were trained using the Flow Matching implementation based on [keishihara/flow-matching](https://github.com/keishihara/flow-matching.git).
+### Training Scripts
+**MNIST**:
+```bash
+# train_flow_matching_on_images.py
+python train_flow_matching_on_images.py \
+    --do_train \
+    --dataset mnist \
+    --n_epochs 50 \
+    --batch_size 128 \
+    --learning_rate 1e-3
+```
+**CIFAR-10**:
+```bash
+python train_flow_matching_on_images.py \
+    --do_train \
+    --dataset cifar10 \
+    --n_epochs 50 \
+    --batch_size 128 \
+    --learning_rate 1e-3 \
+    --horizontal_flip
+```
+**CelebA**:
+```bash
+# train_celeba64.py
+python train_celeba64.py \
+    --do_train \
+    --n_epochs 50 \
+    --batch_size 512 \
+    --learning_rate 1e-4 \
+    --horizontal_flip
+```
+Training code files included:
+- `train_flow_matching_on_images.py` - For MNIST and CIFAR-10
+- `train_celeba64.py` - For CelebA 64×64
 ## Usage
+### Load Model
+```python
+import torch
+from huggingface_hub import hf_hub_download
+# Download checkpoint
+ckpt_path = hf_hub_download(
+    repo_id="WayBob/FlowMatching-Unet-Celeb-64x64",
+    filename="celeba64/ckpt.pth"
+)
+# Load checkpoint
+checkpoint = torch.load(ckpt_path, map_location="cuda")
+```
+### Inference (Sampling)
 ```python
 import torch
+from flow_matching.models import UNetModel
+from flow_matching.solver import ODESolver, ModelWrapper
+device = "cuda"
+# Create model (CelebA example)
+flow = UNetModel(
+    dim=(3, 64, 64),
+    num_channels=128,
+    num_res_blocks=2,
+    num_classes=0,
+    class_cond=False,
+).to(device)
+# Load weights
+flow.load_state_dict(checkpoint)
+flow.eval()
+# Create solver
+model_wrapper = ModelWrapper(flow)
+solver = ODESolver(model_wrapper)
+# Sample from Gaussian noise
+batch_size = 4
+x_init = torch.randn(batch_size, 3, 64, 64).to(device)
+time_grid = torch.linspace(0, 1, 21).to(device)  # 20 steps
+with torch.no_grad():
+    samples = solver.sample(
+        x_init=x_init,
+        step_size=0.05,
+        method="euler",
+        time_grid=time_grid
+    )
+# Denormalize from [-1, 1] to [0, 1]
+samples = (samples + 1) / 2
+samples = samples.clamp(0, 1)
+# Save or visualize
+from torchvision.utils import save_image
+save_image(samples, "generated_faces.png", nrow=2)
+```
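
To make the solver settings above concrete, here is a minimal, library-free sketch of explicit Euler integration over a 21-point time grid (20 steps of size 0.05). It is not the `ODESolver` implementation. The function `euler_sample` and the scalar `toy_velocity` field are hypothetical stand-ins for the solver and the trained UNet.

```python
def euler_sample(velocity, x_init, time_grid):
    """Integrate dx/dt = velocity(x, t) with explicit Euler over time_grid."""
    x = x_init
    for t_cur, t_next in zip(time_grid[:-1], time_grid[1:]):
        # One Euler step: move along the velocity evaluated at the current time.
        x = x + (t_next - t_cur) * velocity(x, t_cur)
    return x


# Hypothetical toy field: if the "dataset" is the single point X_DATA and
# sigma_min = 0, the straight-line flow has velocity (X_DATA - x) / (1 - t).
X_DATA = 2.0


def toy_velocity(x, t):
    return (X_DATA - x) / (1.0 - t)


n_steps = 20
time_grid = [k / n_steps for k in range(n_steps + 1)]  # 21 points, step 0.05
x_final = euler_sample(toy_velocity, x_init=-0.7, time_grid=time_grid)
print(len(time_grid), x_final)
```

The velocity is only evaluated at the left end of each interval, so the last evaluation happens at t = 0.95 and the singularity of the toy field at t = 1 is never touched.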
+### Conditional Generation (MNIST/CIFAR-10)
+```python
+# For class-conditional models
+flow = UNetModel(
+    dim=(3, 32, 32),  # CIFAR-10
+    num_channels=64,
+    num_res_blocks=2,
+    num_classes=10,
+    class_cond=True,
+).to(device)
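+# Load CIFAR-10 checkpoint
+ckpt = torch.load("cifar10/ckpt.pth", map_location=device)
+flow.load_state_dict(ckpt)
+flow.eval()
+# Generate specific class (e.g., class 3)
+y = torch.tensor([3, 3, 3, 3]).to(device)  # Batch of 4, all class 3
+# Forward the class labels to the model, as train_flow_matching_on_images.py does
+class WrappedModel(ModelWrapper):
+    def forward(self, x, t, **extras):
+        return self.model(x=x, t=t, **extras)
+solver = ODESolver(WrappedModel(flow))
+# Then call solver.sample(..., y=y) as before, with x_init of shape (4, 3, 32, 32)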
 ```
+## Architecture Details
+**UNet** based on OpenAI Guided Diffusion:
+- Encoder-Decoder structure with skip connections
+- ResNet blocks with GroupNorm
+- Self-attention at multiple resolutions
+- Time embedding via sinusoidal position encoding
+- Optional class embedding for conditional generation
+## Flow Matching
+Flow Matching learns a velocity field that transports samples from source to target:
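+$$\frac{dx_t}{dt} = v_\theta(x_t, t), \quad x_0 \sim \mathcal{N}(0, I), \quad x_1 \sim p_{\text{data}}$$
+Training uses Conditional Flow Matching (CFM) with straight-line paths $x_t = (1 - (1-\sigma)t)\,x_0 + t\,x_1$, where $\sigma$ is the `sigma_min` argument (0.0 by default in the training scripts) and $t \sim \mathcal{U}[0, 1]$. The regression target is the time derivative of this path:
+$$\mathcal{L} = \mathbb{E}_{t, x_0, x_1} \left[ \| v_\theta(x_t, t) - (x_1 - (1-\sigma)x_0) \|^2 \right]$$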
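
The regression target in the loss above can be illustrated with a small pure-Python sketch. It assumes the standard straight-line path x_t = (1 - (1 - sigma_min) t) x_0 + t x_1, whose time derivative is exactly the target x_1 - (1 - sigma_min) x_0. The helpers `sample_path` and `mse` are hypothetical and only mimic, on plain lists, what `PathSampler.sample` and `F.mse_loss` are used for in the training scripts.

```python
import random


def sample_path(x0, x1, t, sigma_min=0.0):
    """Return (x_t, dx_t) on the straight-line path from noise x0 to data x1."""
    x_t = [(1 - (1 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    dx_t = [b - (1 - sigma_min) * a for a, b in zip(x0, x1)]  # d x_t / d t
    return x_t, dx_t


def mse(pred, target):
    """Mean squared error between two equally long lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


random.seed(0)
x1 = [0.5, -0.25, 1.0]                     # stand-in for one data sample
x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
t = random.random()                        # t ~ U[0, 1]

x_t, dx_t = sample_path(x0, x1, t)

# Finite-difference check that dx_t really is the time derivative of x_t.
eps = 1e-6
x_t_eps, _ = sample_path(x0, x1, t + eps)
finite_diff = [(a - b) / eps for a, b in zip(x_t_eps, x_t)]

# A model that predicts the target exactly has zero loss.
loss_perfect = mse(dx_t, dx_t)
print(x_t, dx_t, loss_perfect)
```

With `sigma_min=0.0` the path starts at the noise sample at t = 0 and ends at the data sample at t = 1, so the target reduces to `x1 - x0`.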
+## Requirements
+```bash
+pip install torch torchvision
+pip install torchdiffeq einops
+```
 ## License
 CC BY-NC-SA 4.0 - Non-commercial use only.
+## Acknowledgments
+- Training code based on [keishihara/flow-matching](https://github.com/keishihara/flow-matching.git)
+- UNet architecture from [OpenAI Guided Diffusion](https://github.com/openai/guided-diffusion)
+## Citation
+```bibtex
+@misc{flowmatching-unet-2024,
+  title={UNet Flow Matching Models for Image Generation},
+  author={WayBob},
+  year={2024},
+  howpublished={\url{https://huggingface.co/WayBob/FlowMatching-Unet-Celeb-64x64}}
+}
+```

train_celeba64.py ADDED

	@@ -0,0 +1,207 @@

+"""
+Training script for CelebA 64x64 Flow Matching model.
+Usage:
+    python train_celeba64.py --do_train --n_epochs 50 --batch_size 128
+    python train_celeba64.py --do_sample
+"""
+from dataclasses import dataclass
+from functools import partial
+from pathlib import Path
+import matplotlib.animation as animation
+import matplotlib.pyplot as plt
+import torch
+import torch.nn.functional as F
+from torch import Tensor
+from torch.amp import GradScaler
+from torch.utils.data import DataLoader
+from torchvision.utils import make_grid, save_image
+from tqdm import tqdm as std_tqdm
+from transformers import HfArgumentParser
+from flow_matching.datasets.image_datasets import (
+    get_image_dataset,
+    get_test_transform,
+    get_train_transform,
+)
+from flow_matching.models import UNetModel
+from flow_matching.sampler import PathSampler
+from flow_matching.solver import ModelWrapper, ODESolver
+from flow_matching.utils import model_size_summary, set_seed
+tqdm = partial(std_tqdm, dynamic_ncols=True)
+@dataclass
+class ScriptArguments:
+    do_train: bool = False
+    do_sample: bool = False
+    dataset: str = "celeba"
+    image_size: int = 64  # Key parameter for CelebA
+    batch_size: int = 128
+    n_epochs: int = 50
+    learning_rate: float = 1e-4
+    sigma_min: float = 0.0
+    seed: int = 42
+    output_dir: str = "outputs"
+    horizontal_flip: bool = True  # Important for faces
+def train(args: ScriptArguments):
+    """Train the flow matching model on CelebA 64x64."""
+    output_dir = Path(args.output_dir) / "cfm" / f"{args.dataset}{args.image_size}"
+    output_dir.mkdir(parents=True, exist_ok=True)
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    set_seed(args.seed)
+    print(f"Using device: {device}")
+    print(f"Training CelebA at {args.image_size}x{args.image_size} resolution")
+    # Load the dataset with resize
+    dataset = get_image_dataset(
+        args.dataset,
+        train=True,
+        transform=get_train_transform(
+            horizontal_flip=args.horizontal_flip,
+            image_size=args.image_size
+        ),
+    )
+    dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True, drop_last=True, num_workers=4)
+    print(f"Loaded {args.dataset} dataset with {len(dataset):,} samples")
+    # CelebA doesn't have classes, so we set num_classes=0 and class_cond=False
+    input_shape = dataset[0][0].size()
+    print(f"{input_shape=}")
+    # Load the UNet model WITHOUT class conditioning for CelebA
+    flow = UNetModel(
+        input_shape,
+        num_channels=128,  # Larger model for 64x64
+        num_res_blocks=2,
+        num_classes=0,  # No class conditioning
+        class_cond=False,
+    ).to(device)
+    path_sampler = PathSampler(sigma_min=args.sigma_min)
+    # Load the optimizer
+    optimizer = torch.optim.AdamW(flow.parameters(), lr=args.learning_rate)
+    scaler = GradScaler(enabled=device.type == "cuda")
+    print("GradScaler enabled:", scaler.is_enabled())
+    model_size_summary(flow)
+    for epoch in range(args.n_epochs):
+        flow.train()
+        pbar = tqdm(dataloader, desc=f"Epoch {epoch+1:2d}/{args.n_epochs}")
+        for x_1, _ in pbar:  # CelebA returns (img, label) but we ignore label
+            x_1 = x_1.to(device)
+            # Compute the probability path samples
+            x_0 = torch.randn_like(x_1)
+            t = torch.rand(x_1.size(0), device=device, dtype=x_1.dtype)
+            x_t, dx_t = path_sampler.sample(x_0, x_1, t)
+            flow.zero_grad(set_to_none=True)
+            # Compute the conditional flow matching loss WITHOUT class conditioning
+            with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
+                vf_t = flow(t=t, x=x_t)  # No y parameter
+                loss = F.mse_loss(vf_t, dx_t)
+            # Gradient scaling and backprop
+            scaler.scale(loss).backward()
+            scaler.unscale_(optimizer)  # unscale first so clipping sees the true gradient norm
+            torch.nn.utils.clip_grad_norm_(flow.parameters(), max_norm=1.0)
+            scaler.step(optimizer)
+            scaler.update()
+            pbar.set_postfix({"loss": loss.item()})
+    torch.save(flow.state_dict(), output_dir / "ckpt.pth")
+    print(f"Final checkpoint saved to {output_dir / 'ckpt.pth'}")
+def generate_samples_and_save_animation(args: ScriptArguments):
+    """Generate samples following the flow and save the animation."""
+    output_dir = Path(args.output_dir) / "cfm" / f"{args.dataset}{args.image_size}"
+    assert output_dir.is_dir(), f"Output directory {output_dir} does not exist"
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    set_seed(args.seed)
+    print(f"Using device: {device}")
+    # Load the dataset
+    dataset = get_image_dataset(
+        args.dataset,
+        train=False,
+        transform=get_test_transform(image_size=args.image_size),
+    )
+    input_shape = dataset[0][0].size()
+    # Load the flow model
+    flow = UNetModel(
+        input_shape,
+        num_channels=128,
+        num_res_blocks=2,
+        num_classes=0,
+        class_cond=False,
+    ).to(device)
+    state_dict = torch.load(output_dir / "ckpt.pth", weights_only=True)
+    flow.load_state_dict(state_dict)
+    flow.eval()
+    # Use ODE solver to sample trajectories
+    class WrappedModel(ModelWrapper):
+        def forward(self, x: Tensor, t: Tensor, **extras) -> Tensor:
+            return self.model(x=x, t=t)
+    samples_count = 64  # 8x8 grid
+    sample_steps = 101
+    time_steps = torch.linspace(0, 1, sample_steps).to(device)
+    wrapped_model = WrappedModel(flow)
+    step_size = 0.05
+    x_init = torch.randn((samples_count, *input_shape), dtype=torch.float32, device=device)
+    solver = ODESolver(wrapped_model)
+    sol = solver.sample(
+        x_init=x_init,
+        step_size=step_size,
+        method="midpoint",
+        time_grid=time_steps,
+        return_intermediates=True,
+    )
+    sol = sol.detach().cpu()
+    final_samples = sol[-1]
+    save_image(final_samples, output_dir / "final_samples.png", nrow=8, normalize=True)
+    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
+    grid = make_grid(final_samples, nrow=8, normalize=True)
+    ax[0].imshow(grid.permute(1, 2, 0))
+    ax[0].set_title("Final samples (t = 1.0)", fontsize=16)
+    ax[0].axis("off")
+    def update(frame: int):
+        grid = make_grid(sol[frame], nrow=8, normalize=True)
+        ax[1].clear()
+        ax[1].imshow(grid.permute(1, 2, 0))
+        ax[1].set_title(f"t = {time_steps[frame].item():.2f}", fontsize=16)
+        ax[1].axis("off")
+    fig.subplots_adjust(left=0.02, right=0.98, top=0.90, bottom=0.05, wspace=0.1)
+    ani = animation.FuncAnimation(fig, update, frames=sample_steps)
+    ani.save(output_dir / "trajectory.gif", writer="pillow", fps=20)
+    print(f"Generated trajectory saved to {output_dir / 'trajectory.gif'}")
+if __name__ == "__main__":
+    parser = HfArgumentParser(ScriptArguments)
+    script_args, *_ = parser.parse_args_into_dataclasses()
+    if script_args.do_train:
+        train(script_args)
+    if script_args.do_sample:
+        generate_samples_and_save_animation(script_args)

train_flow_matching_on_images.py ADDED

	@@ -0,0 +1,198 @@

+from dataclasses import dataclass
+from functools import partial
+from pathlib import Path
+import matplotlib.animation as animation
+import matplotlib.pyplot as plt
+import torch
+import torch.nn.functional as F
+from torch import Tensor
+from torch.amp import GradScaler
+from torch.utils.data import DataLoader
+from torchvision.utils import make_grid, save_image
+from tqdm import tqdm as std_tqdm
+from transformers import HfArgumentParser
+from flow_matching.datasets.image_datasets import (
+    get_image_dataset,
+    get_test_transform,
+    get_train_transform,
+)
+from flow_matching.models import UNetModel
+from flow_matching.sampler import PathSampler
+from flow_matching.solver import ModelWrapper, ODESolver
+from flow_matching.utils import model_size_summary, set_seed
+tqdm = partial(std_tqdm, dynamic_ncols=True)
+@dataclass
+class ScriptArguments:
+    do_train: bool = False
+    do_sample: bool = False
+    dataset: str = "mnist"
+    batch_size: int = 128
+    n_epochs: int = 10
+    learning_rate: float = 1e-3
+    sigma_min: float = 0.0
+    seed: int = 42
+    output_dir: str = "outputs"
+    horizontal_flip: bool = False
+def train(args: ScriptArguments):
+    """Train the flow matching model on the given dataset."""
+    output_dir = Path(args.output_dir) / "cfm" / args.dataset
+    output_dir.mkdir(parents=True, exist_ok=True)
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    set_seed(args.seed)
+    print(f"Using device: {device}")
+    # Load the dataset
+    dataset = get_image_dataset(
+        args.dataset,
+        train=True,
+        transform=get_train_transform(horizontal_flip=args.horizontal_flip),
+    )
+    dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True, drop_last=True)
+    print(f"Loaded {args.dataset} dataset with {len(dataset):,} samples")
+    num_classes = len(dataset.classes)
+    input_shape = dataset[0][0].size()
+    print(f"{input_shape=}, {num_classes=}")
+    # Load the UNet model with class conditioning for flow matching
+    flow = UNetModel(
+        input_shape,
+        num_channels=64,
+        num_res_blocks=2,
+        num_classes=num_classes,
+        class_cond=True,
+    ).to(device)
+    path_sampler = PathSampler(sigma_min=args.sigma_min)
+    # Load the optimizer
+    optimizer = torch.optim.AdamW(flow.parameters(), lr=args.learning_rate)
+    scaler = GradScaler(enabled=device.type == "cuda")
+    print("GradScaler enabled:", scaler.is_enabled())
+    model_size_summary(flow)
+    for epoch in range(args.n_epochs):
+        flow.train()
+        pbar = tqdm(dataloader, desc=f"Epoch {epoch+1:2d}/{args.n_epochs}")
+        for x_1, y in pbar:
+            x_1, y = x_1.to(device), y.to(device)
+            # Compute the probability path samples
+            x_0 = torch.randn_like(x_1)
+            t = torch.rand(x_1.size(0), device=device, dtype=x_1.dtype)
+            x_t, dx_t = path_sampler.sample(x_0, x_1, t)
+            flow.zero_grad(set_to_none=True)
+            # Compute the conditional flow matching loss with class conditioning
+            with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
+                vf_t = flow(t=t, x=x_t, y=y)
+                loss = F.mse_loss(vf_t, dx_t)
+            # Gradient scaling and backprop
+            scaler.scale(loss).backward()
+            scaler.unscale_(optimizer)  # unscale first so clipping sees the true gradient norm
+            torch.nn.utils.clip_grad_norm_(flow.parameters(), max_norm=1.0)  # clip gradients
+            scaler.step(optimizer)
+            scaler.update()
+            pbar.set_postfix({"loss": loss.item()})
+    torch.save(flow.state_dict(), output_dir / "ckpt.pth")
+    print(f"Final checkpoint saved to {output_dir / 'ckpt.pth'}")
+def generate_samples_and_save_animation(args: ScriptArguments):
+    """Generate samples following the flow and save the animation."""
+    output_dir = Path(args.output_dir) / "cfm" / args.dataset
+    assert output_dir.is_dir(), f"Output directory {output_dir} does not exist"
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    set_seed(args.seed)
+    print(f"Using device: {device}")
+    # Load the dataset
+    dataset = get_image_dataset(
+        args.dataset,
+        train=False,
+        transform=get_test_transform(),
+    )
+    input_shape = dataset[0][0].size()
+    num_classes = len(dataset.classes)
+    # Load the flow model
+    flow = UNetModel(
+        input_shape,
+        num_channels=64,
+        num_res_blocks=2,
+        num_classes=num_classes,
+        class_cond=True,
+    ).to(device)
+    state_dict = torch.load(output_dir / "ckpt.pth", weights_only=True)
+    flow.load_state_dict(state_dict)
+    flow.eval()
+    # Use ODE solver to sample trajectories
+    class WrappedModel(ModelWrapper):
+        def forward(self, x: Tensor, t: Tensor, **extras) -> Tensor:
+            return self.model(x=x, t=t, **extras)
+    samples_per_class = 10
+    sample_steps = 101
+    time_steps = torch.linspace(0, 1, sample_steps).to(device)
+    class_list = torch.arange(num_classes, device=device).repeat(samples_per_class)
+    wrapped_model = WrappedModel(flow)
+    step_size = 0.05
+    x_init = torch.randn((class_list.size(0), *input_shape), dtype=torch.float32, device=device)
+    solver = ODESolver(wrapped_model)
+    sol = solver.sample(
+        x_init=x_init,
+        step_size=step_size,
+        method="midpoint",
+        time_grid=time_steps,
+        return_intermediates=True,
+        y=class_list,
+    )
+    sol = sol.detach().cpu()
+    final_samples = sol[-1]
+    save_image(final_samples, output_dir / "final_samples.png", nrow=num_classes, normalize=True)
+    fig, ax = plt.subplots(1, 2, figsize=(8, 4))
+    grid = make_grid(final_samples, nrow=num_classes, normalize=True)
+    ax[0].imshow(grid.permute(1, 2, 0))
+    ax[0].set_title("Final samples (t = 1.0)", fontsize=16)
+    ax[0].axis("off")
+    def update(frame: int):
+        grid = make_grid(sol[frame], nrow=num_classes, normalize=True)
+        ax[1].clear()
+        ax[1].imshow(grid.permute(1, 2, 0))
+        ax[1].set_title(f"t = {time_steps[frame].item():.2f}", fontsize=16)
+        ax[1].axis("off")
+    fig.subplots_adjust(left=0.02, right=0.98, top=0.90, bottom=0.05, wspace=0.1)
+    ani = animation.FuncAnimation(fig, update, frames=sample_steps)
+    ani.save(output_dir / "trajectory.gif", writer="pillow", fps=20)
+    print(f"Generated trajectory saved to {output_dir / 'trajectory.gif'}")
+if __name__ == "__main__":
+    parser = HfArgumentParser(ScriptArguments)
+    script_args, *_ = parser.parse_args_into_dataclasses()
+    if script_args.do_train:
+        train(script_args)
+    if script_args.do_sample:
+        generate_samples_and_save_animation(script_args)
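
In the sampling code of `train_flow_matching_on_images.py`, the label layout comes from `torch.arange(num_classes).repeat(samples_per_class)` combined with `nrow=num_classes` in the image grid. The sketch below reproduces that layout with plain Python lists; it is an illustration, not the script's code. It shows that each grid row cycles through all classes, so each column holds a single class.

```python
num_classes = 10
samples_per_class = 10

# List equivalent of torch.arange(num_classes).repeat(samples_per_class):
# the sequence 0..9 tiled 10 times.
class_list = list(range(num_classes)) * samples_per_class

# make_grid(..., nrow=num_classes) places num_classes images in each row.
rows = [
    class_list[i:i + num_classes]
    for i in range(0, len(class_list), num_classes)
]

for row in rows[:2]:
    print(row)

# Column c of the grid therefore contains only class c.
column_3 = [row[3] for row in rows]
print(column_3)
```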