katefgroup/3d_diffuser_actor


twke committed on Feb 12, 2024

Commit f7c5dfc (verified) · 1 parent: 528fb15

Update README.md


README.md +47 -7


@@ -19,9 +19,9 @@ The models released are the following:
 | Benchmark | Embedding dimension | Diffusion timestep |
 |------|------|------|
-| [RLBench (PerAct)]() | 120 | 100 |
-| [RLBench (GNFactor)]() | 120| 100 |
-| [CALVIN]() | 192 | 25 |
 ### Model Description
@@ -46,13 +46,53 @@ The models released are the following:
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-TODO
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-TODO
 ## Evaluation

 | Benchmark | Embedding dimension | Diffusion timestep |
 |------|------|------|
+| [RLBench (PerAct)](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_peract.pth) | 120 | 100 |
+| [RLBench (GNFactor)](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_gnfactor.pth) | 120| 100 |
+| [CALVIN](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_calvin.pth) | 192 | 25 |
 ### Model Description
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Input format
+3D Diffuser Actor takes the following inputs:
+1. `RGB observations`: a tensor of shape (batch_size, num_cameras, 3, H, W). Pixel values are in the range [0, 1].
+2. `Point cloud observation`: a tensor of shape (batch_size, num_cameras, 3, H, W).
+3. `Instruction encodings`: a tensor of shape (batch_size, max_instruction_length, C). In this code base, the embedding dimension `C` is set to 512.
+4. `curr_gripper`: a tensor of shape (batch_size, history_length, 7), where the last dimension holds the xyz-action (3D) followed by the quaternion (4D).
+5. `trajectory_mask`: a tensor of shape (batch_size, trajectory_length), used only to indicate the length of each trajectory. To predict keyposes, set its shape to (batch_size, 1).
+6. `gt_trajectory`: a tensor of shape (batch_size, trajectory_length, 7), where the last dimension holds the xyz-action (3D) followed by the quaternion (4D). This input is used only during training.
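The input shapes listed above can be collected into a small lookup for sanity-checking a batch before calling the model. This is a minimal sketch in plain Python: the helpers `expected_input_shapes` and `shape_mismatches` are hypothetical and not part of the 3D Diffuser Actor code base, the keys `rgb_obs`, `pcd_obs`, and `instruction` follow the argument names in this README's usage snippets, and all sizes in the example call are arbitrary.

```python
def expected_input_shapes(batch_size, num_cameras, height, width,
                          max_instruction_length, history_length,
                          trajectory_length, instruction_dim=512):
    """Return the documented shape of each model input as a tuple."""
    return {
        "rgb_obs": (batch_size, num_cameras, 3, height, width),
        "pcd_obs": (batch_size, num_cameras, 3, height, width),
        "instruction": (batch_size, max_instruction_length, instruction_dim),
        "curr_gripper": (batch_size, history_length, 7),
        "trajectory_mask": (batch_size, trajectory_length),
        "gt_trajectory": (batch_size, trajectory_length, 7),
    }


def shape_mismatches(actual_shapes, expected_shapes):
    """List the input names whose actual shape differs from the expected one."""
    return [name for name, shape in expected_shapes.items()
            if tuple(actual_shapes.get(name, ())) != shape]


# Keypose prediction: trajectory_length is 1, so the mask is (batch_size, 1).
# Every size here is an arbitrary illustration value.
shapes = expected_input_shapes(batch_size=2, num_cameras=4, height=256, width=256,
                               max_instruction_length=53, history_length=3,
                               trajectory_length=1)
```

With real tensors, build `actual_shapes` as `{name: tuple(t.shape) for name, t in inputs.items()}` and check that `shape_mismatches` returns an empty list.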
+### Output format
+The model returns the diffusion loss when `run_inference=False`. When `run_inference=True`, it returns a pose trajectory of shape (batch_size, trajectory_length, 8).
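Each predicted step has 8 values, one more than the 7-value input pose, and the README does not say what the extra value is. The sketch below assumes the first seven values follow the input layout (xyz, then quaternion) and treats the eighth as a gripper-open value; that interpretation is an assumption to confirm against the code base, and `split_pose_step` is a hypothetical helper.

```python
def split_pose_step(step):
    """Split one 8-value predicted step into named parts.

    Assumed layout (not confirmed by the README): xyz (3), quaternion (4),
    then one extra value, interpreted here as gripper openness.
    """
    values = list(step)
    if len(values) != 8:
        raise ValueError(f"expected 8 values, got {len(values)}")
    return {
        "xyz": values[0:3],
        "quaternion": values[3:7],
        "gripper_open": values[7],
    }


# Made-up values, for illustration only.
parts = split_pose_step([0.1, 0.2, 0.3, 0.0, 0.0, 0.0, 1.0, 1.0])
```

For a full tensor of shape (batch_size, trajectory_length, 8), the same split corresponds to slicing the last dimension: `[..., :3]`, `[..., 3:7]`, and `[..., 7]`.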
+### Usage
+For training, call the model's `forward` with `run_inference=False`:
+```
+loss = model.forward(gt_trajectory,
+                     trajectory_mask,
+                     rgb_obs,
+                     pcd_obs,
+                     instruction,
+                     curr_gripper,
+                     run_inference=False)
+```
+For evaluation, call the model's `forward` with `run_inference=True`:
+```
+# gt_trajectory is only used during training, so pass a zero-filled placeholder.
+fake_gt_trajectory = torch.full((1, trajectory_length, 7), 0).to(device)
+trajectory_mask = torch.full((1, trajectory_length), False).to(device)
+trajectory = model.forward(fake_gt_trajectory,
+                           trajectory_mask,
+                           rgb_obs,
+                           pcd_obs,
+                           instruction,
+                           curr_gripper,
+                           run_inference=True)
+```
+Alternatively, call the model's `compute_trajectory` function, which does not take a ground-truth trajectory:
+```
+trajectory_mask = torch.full((1, trajectory_length), False).to(device)
+trajectory = model.compute_trajectory(trajectory_mask,
+                                      rgb_obs,
+                                      pcd_obs,
+                                      instruction,
+                                      curr_gripper)
+```
 ## Evaluation