{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Motion Latent Analysis\n",
    "\n",
    "This notebook demonstrates how to work with motion latent representations from the MLD model:\n",
    "\n",
    "1. **Generate variations** - Create 20 similar \"jump\" motions\n",
    "2. **Compute mean latent** - Average the latent representations\n",
    "3. **Distance computation** - Compare motions using L2 distance\n",
    "4. **Classification** - Distinguish jump from non-jump motions\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup and Imports\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import torch\n",
    "from pathlib import Path\n",
    "from standalone_demo import StandaloneConfig, load_model\n",
    "\n",
    "# Configuration\n",
    "OUTPUT_DIR = Path(\"outputs/jump\")\n",
    "NUM_VARIATIONS = 20\n",
    "MOTION_LENGTH = 120  # frames (6 seconds at 20fps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Model\n",
    "\n",
    "Load the MLD model for motion generation. This will auto-download models if needed.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading MLD model...\n",
      "Model initialized on cuda\n",
      "Loading checkpoint from resources/checkpoints/model.ckpt\n",
      "Checkpoint loaded successfully\n",
      "✓ Model loaded successfully\n"
     ]
    }
   ],
   "source": [
    "print(\"Loading MLD model...\")\n",
    "config = StandaloneConfig()\n",
    "config.resolve_paths(Path(\".\"))\n",
    "model = load_model(config)\n",
    "print(\"✓ Model loaded successfully\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Generate Jump Variations\n",
    "\n",
    "Generate 20 variations of \"jump\" motions from a list of similar prompts (some prompts repeat).\n",
    "Each generation saves:\n",
    "- `.npy` - 3D joint positions\n",
    "- `.latent.pt` - Latent representation\n",
    "- `.mp4` - Video of the motion\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generating 20 jump variations...\n",
      "\n",
      "[1/20] a person does a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_00\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[2/20] someone performs a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_01\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[3/20] a person jumps in the air\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_02\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[4/20] doing a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_03\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[5/20] performing a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_04\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[6/20] a person does a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_05\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[7/20] someone jumps backward\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_06\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[8/20] a person executes a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_07\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[9/20] doing an acrobatic jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_08\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[10/20] a person jumps forward\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_09\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[11/20] a person does a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_10\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[12/20] someone performs a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_11\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[13/20] a person jumps in the air\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_12\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[14/20] doing a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_13\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[15/20] performing a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_14\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[16/20] a person does a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_15\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[17/20] someone jumps backward\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_16\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[18/20] a person executes a jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_17\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[19/20] doing an acrobatic jump\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_18\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "[20/20] a person jumps forward\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ✓ Saved jump_var_19\n",
      "    Joints: (120, 22, 3), Latent: torch.Size([1, 1, 256])\n",
      "\n",
      "✓ Generated 20 jump variations\n"
     ]
    }
   ],
   "source": [
    "import shutil\n",
    "\n",
    "# Create output directory\n",
    "OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# Define prompt variations\n",
    "jump_prompts = [\n",
    "    \"a person does a jump\",\n",
    "    \"someone performs a jump\",\n",
    "    \"a person jumps in the air\",\n",
    "    \"doing a jump\",\n",
    "    \"performing a jump\",\n",
    "    \"a person does a jump\",\n",
    "    \"someone jumps backward\",\n",
    "    \"a person executes a jump\",\n",
    "    \"doing an acrobatic jump\",\n",
    "    \"a person jumps forward\",\n",
    "    \"a person does a jump\",\n",
    "    \"someone performs a jump\",\n",
    "    \"a person jumps in the air\",\n",
    "    \"doing a jump\",\n",
    "    \"performing a jump\",\n",
    "    \"a person does a jump\",\n",
    "    \"someone jumps backward\",\n",
    "    \"a person executes a jump\",\n",
    "    \"doing an acrobatic jump\",\n",
    "    \"a person jumps forward\",\n",
    "    \"a person does a jump\",\n",
    "    \"someone performs a jump\",\n",
    "    \"a person jumps in the air\",\n",
    "    \"doing a jump\",\n",
    "    \"performing a jump\",\n",
    "    \"a person does a jump\",\n",
    "    \"someone jumps backward\",\n",
    "    \"a person executes a jump\",\n",
    "    \"doing an acrobatic jump\",\n",
    "    \"a person jumps forward\",\n",
    "]\n",
    "\n",
    "print(f\"Generating {NUM_VARIATIONS} jump variations...\\n\")\n",
    "\n",
    "latent_paths = []\n",
    "\n",
    "for i, prompt in enumerate(jump_prompts[:NUM_VARIATIONS]):\n",
    "    print(f\"[{i + 1}/{NUM_VARIATIONS}] {prompt}\")\n",
    "\n",
    "    # Generate motion with latent\n",
    "    (joints, latent, video_path) = model.generate(\n",
    "        prompt, MOTION_LENGTH, return_latent=True, create_video=True\n",
    "    )\n",
    "\n",
    "    # Save files\n",
    "    base_name = f\"jump_var_{i:02d}\"\n",
    "    npy_path = OUTPUT_DIR / f\"{base_name}.npy\"\n",
    "    latent_path = OUTPUT_DIR / f\"{base_name}.latent.pt\"\n",
    "\n",
    "    np.save(npy_path, joints)\n",
    "    torch.save(latent, latent_path)\n",
    "    latent_paths.append(latent_path)\n",
    "\n",
    "    # Save video\n",
    "    video_path_target = OUTPUT_DIR / f\"{base_name}.mp4\"\n",
    "    shutil.copy(video_path, video_path_target)\n",
    "\n",
    "    print(f\"  ✓ Saved {base_name}\")\n",
    "    print(f\"    Joints: {joints.shape}, Latent: {latent.shape}\")\n",
    "\n",
    "print(f\"\\n✓ Generated {len(latent_paths)} jump variations\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Compute Mean Latent\n",
    "\n",
    "Average all jump latents to create a \"prototype\" jump representation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing mean latent from 20 samples...\n",
      "✓ Mean latent shape: torch.Size([1, 1, 256])\n",
      "✓ Saved to: outputs/jump/jump_mean.latent.pt\n"
     ]
    }
   ],
   "source": [
    "print(f\"Computing mean latent from {len(latent_paths)} samples...\")\n",
    "\n",
    "# Load all latents\n",
    "latents = [torch.load(path) for path in latent_paths]\n",
    "\n",
    "# Stack and compute mean\n",
    "latents_stacked = torch.stack(latents)\n",
    "mean_latent = latents_stacked.mean(dim=0)\n",
    "\n",
    "# Save mean latent\n",
    "mean_latent_path = OUTPUT_DIR / \"jump_mean.latent.pt\"\n",
    "torch.save(mean_latent, mean_latent_path)\n",
    "\n",
    "print(f\"✓ Mean latent shape: {mean_latent.shape}\")\n",
    "print(f\"✓ Saved to: {mean_latent_path}\")"
   ]
  },
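  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The averaging step above can be written as a formula. The symbols $z_i$, $\\bar{z}$, and $N$ are notation introduced here for illustration; they are not names from the code.\n",
    "\n",
    "$$\\bar{z} = \\frac{1}{N} \\sum_{i=1}^{N} z_i, \\qquad N = 20, \\quad z_i \\in \\mathbb{R}^{1 \\times 1 \\times 256}$$\n",
    "\n",
    "`torch.stack` produces a tensor of shape `(20, 1, 1, 256)`, and `.mean(dim=0)` averages over that first axis, so the mean keeps the per-sample shape `(1, 1, 256)`.\n"
   ]
  },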
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Define Distance Function\n",
    "\n",
    "L2 distance measures dissimilarity between latent representations: the smaller the distance, the more similar the latents.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✓ Distance function defined\n"
     ]
    }
   ],
   "source": [
    "def compute_latent_distance(latent1, latent2):\n",
    "    \"\"\"\n",
    "    Compute L2 (Euclidean) distance between two latent representations.\n",
    "\n",
    "    Args:\n",
    "        latent1: First latent tensor or path\n",
    "        latent2: Second latent tensor or path\n",
    "\n",
    "    Returns:\n",
    "        L2 distance (float)\n",
    "    \"\"\"\n",
    "    # Load if paths provided\n",
    "    if isinstance(latent1, (str, Path)):\n",
    "        latent1 = torch.load(latent1)\n",
    "    if isinstance(latent2, (str, Path)):\n",
    "        latent2 = torch.load(latent2)\n",
    "\n",
    "    # Compute L2 norm of difference\n",
    "    distance = torch.norm(latent1 - latent2, p=2).item()\n",
    "\n",
    "    return distance\n",
    "\n",
    "\n",
    "print(\"✓ Distance function defined\")"
   ]
  },
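  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the distance computed by `compute_latent_distance` can be written as follows. The symbols $z_a$, $z_b$, and $d$ are notation introduced here, not names from the code.\n",
    "\n",
    "$$d(z_a, z_b) = \\lVert z_a - z_b \\rVert_2 = \\sqrt{\\sum_{k=1}^{256} \\left(z_{a,k} - z_{b,k}\\right)^2}$$\n",
    "\n",
    "With `dim` left unset, `torch.norm(..., p=2)` reduces over every element of the difference tensor, so the `(1, 1, 256)` latents are treated as flat 256-dimensional vectors. A distance of 0 means the two latents are identical.\n"
   ]
  },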
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Generate Test Motions\n",
    "\n",
    "Generate:\n",
    "- A jump motion (should be close to mean)\n",
    "- A walk motion (should be far from mean)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generating test motions...\n",
      "\n",
      "1. Generating jump-like motion...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   ✓ Saved test jump motion\n",
      "\n",
      "2. Generating non-jump motion (walking)...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/workspace/ai-toolkit/motion-latent-diffusion/standalone_demo/src/standalone_demo/models/utils.py:23: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
      "  lengths = torch.tensor(lengths, device=device)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   ✓ Saved test walk motion\n"
     ]
    }
   ],
   "source": [
    "print(\"Generating test motions...\\n\")\n",
    "\n",
    "# Test 1: jump-like motion\n",
    "print(\"1. Generating jump-like motion...\")\n",
    "joints_jump, latent_jump, video_path_jump = model.generate(\n",
    "    \"a person does a jump\", MOTION_LENGTH, return_latent=True, create_video=True\n",
    ")\n",
    "jump_latent_path = OUTPUT_DIR / \"test_jump.latent.pt\"\n",
    "torch.save(latent_jump, jump_latent_path)\n",
    "np.save(OUTPUT_DIR / \"test_jump.npy\", joints_jump)\n",
    "\n",
    "video_path_target = OUTPUT_DIR / \"test_jump.mp4\"\n",
    "shutil.copy(video_path_jump, video_path_target)\n",
    "\n",
    "print(\"   ✓ Saved test jump motion\")\n",
    "\n",
    "# Test 2: Non-jump motion (walking)\n",
    "print(\"\\n2. Generating non-jump motion (walking)...\")\n",
    "joints_walk, latent_walk, video_path_walk = model.generate(\n",
    "    \"a person walks forward\", MOTION_LENGTH, return_latent=True, create_video=True\n",
    ")\n",
    "walk_latent_path = OUTPUT_DIR / \"test_walk.latent.pt\"\n",
    "torch.save(latent_walk, walk_latent_path)\n",
    "np.save(OUTPUT_DIR / \"test_walk.npy\", joints_walk)\n",
    "\n",
    "video_path_target = OUTPUT_DIR / \"test_walk.mp4\"\n",
    "shutil.copy(video_path_walk, video_path_target)\n",
    "\n",
    "print(\"   ✓ Saved test walk motion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Compare Distances\n",
    "\n",
    "Measure how close each test motion is to the mean jump latent.\n",
    "\n",
    "**Hypothesis**: jump motion should have smaller distance than walk motion.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing distances to mean jump latent...\n",
      "\n",
      "============================================================\n",
      "📊 RESULTS\n",
      "============================================================\n",
      "Distance (jump → mean jump):  12.6496\n",
      "Distance (walk → mean jump):  42.3448\n",
      "\n",
      "Ratio (walk/jump):            3.35x\n",
      "============================================================\n",
      "\n",
      "✅ SUCCESS: jump is closer to mean jump latent!\n",
      "   The model can distinguish jump from non-jump motions.\n"
     ]
    }
   ],
   "source": [
    "print(\"Computing distances to mean jump latent...\\n\")\n",
    "\n",
    "# Distance: Test jump → Mean jump\n",
    "dist_jump_to_mean = compute_latent_distance(latent_jump, mean_latent)\n",
    "\n",
    "# Distance: Test walk → Mean jump\n",
    "dist_walk_to_mean = compute_latent_distance(latent_walk, mean_latent)\n",
    "\n",
    "# Display results\n",
    "print(\"=\" * 60)\n",
    "print(\"📊 RESULTS\")\n",
    "print(\"=\" * 60)\n",
    "print(f\"Distance (jump → mean jump):  {dist_jump_to_mean:.4f}\")\n",
    "print(f\"Distance (walk → mean jump):  {dist_walk_to_mean:.4f}\")\n",
    "print(f\"\\nRatio (walk/jump):            {dist_walk_to_mean / dist_jump_to_mean:.2f}x\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "if dist_jump_to_mean < dist_walk_to_mean:\n",
    "    print(\"\\n✅ SUCCESS: jump is closer to mean jump latent!\")\n",
    "    print(f\"   The model can distinguish jump from non-jump motions.\")\n",
    "else:\n",
    "    print(\"\\n⚠️  UNEXPECTED: Walk is closer to mean jump latent.\")\n",
    "    print(f\"   This suggests the latent space may not capture this distinction.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bonus: Analyze Individual Variation Distances\n",
    "\n",
    "See how much each jump variation differs from the mean.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Analyzing variation distances...\n",
      "\n",
      "  Variation 00: 17.7083\n",
      "  Variation 01: 23.6372\n",
      "  Variation 02: 23.7708\n",
      "  Variation 03: 27.0579\n",
      "  Variation 04: 17.2911\n",
      "  Variation 05: 18.6115\n",
      "  Variation 06: 43.8279\n",
      "  Variation 07: 29.0473\n",
      "  Variation 08: 23.5446\n",
      "  Variation 09: 20.4132\n",
      "  Variation 10: 14.3313\n",
      "  Variation 11: 19.8556\n",
      "  Variation 12: 31.8104\n",
      "  Variation 13: 20.7619\n",
      "  Variation 14: 22.4498\n",
      "  Variation 15: 34.5026\n",
      "  Variation 16: 26.5776\n",
      "  Variation 17: 38.9580\n",
      "  Variation 18: 28.6006\n",
      "  Variation 19: 24.1094\n",
      "\n",
      "Variation statistics:\n",
      "  Mean distance: 25.3433\n",
      "  Std deviation: 7.2979\n",
      "\n",
      "Comparison:\n",
      "  Test jump: 12.6496 (0.50x mean variation)\n",
      "  Test walk: 42.3448 (1.67x mean variation)\n"
     ]
    }
   ],
   "source": [
    "print(\"Analyzing variation distances...\\n\")\n",
    "\n",
    "variation_distances = []\n",
    "for i, latent_path in enumerate(latent_paths):\n",
    "    dist = compute_latent_distance(latent_path, mean_latent)\n",
    "    variation_distances.append(dist)\n",
    "    print(f\"  Variation {i:02d}: {dist:.4f}\")\n",
    "\n",
    "avg_variation = np.mean(variation_distances)\n",
    "std_variation = np.std(variation_distances)\n",
    "\n",
    "print(\"\\nVariation statistics:\")\n",
    "print(f\"  Mean distance: {avg_variation:.4f}\")\n",
    "print(f\"  Std deviation: {std_variation:.4f}\")\n",
    "print(\"\\nComparison:\")\n",
    "print(\n",
    "    f\"  Test jump: {dist_jump_to_mean:.4f} ({dist_jump_to_mean / avg_variation:.2f}x mean variation)\"\n",
    ")\n",
    "print(\n",
    "    f\"  Test walk: {dist_walk_to_mean:.4f} ({dist_walk_to_mean / avg_variation:.2f}x mean variation)\"\n",
    ")"
   ]
  },
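  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to turn the statistics above into a decision rule is a distance threshold. The cell below is a minimal sketch, not part of the original analysis: the cutoff of mean plus two standard deviations and the helper name `is_jump` are assumptions chosen for illustration. It reuses `avg_variation`, `std_variation`, `mean_latent`, `latent_jump`, and `latent_walk` from earlier cells.\n",
    "\n",
    "With the values printed above, the assumed cutoff works out to:\n",
    "\n",
    "$$\\tau = 25.3433 + 2 \\times 7.2979 = 39.9391$$\n",
    "\n",
    "Under that cutoff the test jump (12.6496) falls inside the threshold and the test walk (42.3448) falls outside. Variation 06 (43.8279) also falls outside, so this rule would not cleanly separate the classes on this data.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "\n",
    "def is_jump(latent, prototype, threshold):\n",
    "    \"\"\"Hypothetical helper: True if `latent` is within `threshold` (L2) of `prototype`.\"\"\"\n",
    "    return torch.norm(latent - prototype, p=2).item() <= threshold\n",
    "\n",
    "\n",
    "# Assumed cutoff (not from the original analysis): mean + 2 std of the variation distances\n",
    "threshold = avg_variation + 2 * std_variation\n",
    "\n",
    "print(f\"Threshold: {threshold:.4f}\")\n",
    "print(f\"Test jump classified as jump: {is_jump(latent_jump, mean_latent, threshold)}\")\n",
    "print(f\"Test walk classified as jump: {is_jump(latent_walk, mean_latent, threshold)}\")"
   ]
  },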
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "### 📁 Files Created\n",
    "\n",
    "In `outputs/jump/`:\n",
    "- `jump_var_00` to `jump_var_19` (.npy + .latent.pt + .mp4) - 20 jump variations\n",
    "- `jump_mean.latent.pt` - Mean latent of all variations ⭐\n",
    "- `test_jump` (.npy + .latent.pt + .mp4) - Test jump motion\n",
    "- `test_walk` (.npy + .latent.pt + .mp4) - Test walk motion\n",
    "\n",
    "**Total**: 67 files (20 variations × 3 + 2 tests × 3 + 1 mean)\n",
    "\n",
    "### 🔬 Key Findings\n",
    "\n",
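    "1. **Latent space clustering**: In this run, the test jump (distance 12.65) landed much closer to the mean jump latent than the test walk (42.34)\n",
    "2. **Distance metric**: L2 distance separated this one jump/walk pair, but not every jump: Variation 06 (\"someone jumps backward\", 43.83) is farther from the mean than the walk\n",
    "3. **Mean latent**: Averaging latents gives a usable prototype, with the jump variations spread around it (mean distance 25.34, std 7.30)\n",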
    "\n",
    "### 🎯 Applications\n",
    "\n",
    "- **Motion classification**: Identify motion types (jump, walk, etc.)\n",
    "- **Motion retrieval**: Find similar motions in a database\n",
    "- **Quality control**: Detect outlier/corrupted generations\n",
    "- **Interpolation**: Blend between different motions\n",
    "- **Style transfer**: Map motions to similar but different styles\n",
    "- **Few-shot learning**: Create classifiers from few examples\n",
    "\n",
    "### 💡 Next Steps\n",
    "\n",
    "Try this analysis with other motion types, such as spins, kicks, or dances, then:\n",
    "- Compare multiple motion classes\n",
    "- Build a motion classifier\n",
    "- Create a motion search engine\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}