Evaluating on cuda
Report will be saved to: /mnt/project_rlinf/jzn/workspace/latest/RLinf/reward_model/ckpt/libero_10_256_zsq/task_embed/seed42_dim64/multi_task_evaluation_report.txt

================================================================================================================================
Instruction                                        | Acc     | TP (Success)   | FN (Miss)      | TN (Fail)      | FP (Ghost)
--------------------------------------------------------------------------------------------------------------------------------
put both the alphabet soup and the cream cheese... | 100.00% |     0 (  0.0%) |     0 (  0.0%) |  4617 (100.0%) |     0 (  0.0%)
put the yellow and white mug in the microwave a... |  99.98% |   478 (100.0%) |     0 (  0.0%) |  4138 (100.0%) |     1 (  0.0%)
--- Trajectory Success Frames (All samples) ---
[step_50_seed_4_traj_28.npy] Actual Success: 264, Pred Success: 264
[step_25_seed_6_traj_25.npy] Actual Success: 281, Pred Success: 281
--------------------------------------------------------------------------------------------------------------------------------
put both the alphabet soup and the tomato sauce... | 100.00% |     0 (  0.0%) |     0 (  0.0%) |  2052 (100.0%) |     0 (  0.0%)
put the white mug on the plate and put the choc... |  95.27% |   310 ( 89.3%) |    37 ( 10.7%) |  5066 ( 95.7%) |   230 (  4.3%)
--- Trajectory Success Frames (All samples) ---
[step_50_seed_3_traj_4.npy] Actual Success: N/A, Pred Success: 496
[step_50_seed_3_traj_36.npy] Actual Success: N/A, Pred Success: 255
[step_0_seed_1_traj_61.npy] Actual Success: 272, Pred Success: 270
[step_25_seed_7_traj_48.npy] Actual Success: 369, Pred Success: 304
[step_50_seed_7_traj_19.npy] Actual Success: 277, Pred Success: 269
--------------------------------------------------------------------------------------------------------------------------------
put the white mug on the left plate and put the... | 100.00% |     0 (  0.0%) |     0 (  0.0%) |  6669 (100.0%) |     0 (  0.0%)
pick up the book and place it in the back compa... |  85.54% |   312 ( 46.6%) |   357 ( 53.4%) |  1882 ( 99.3%) |    14 (  0.7%)
--- Trajectory Success Frames (All samples) ---
[step_50_seed_5_traj_54.npy] Actual Success: 191, Pred Success: 182
[step_25_seed_5_traj_54.npy] Actual Success: 177, Pred Success: 176
[step_0_seed_7_traj_30.npy] Actual Success: 198, Pred Success: 201
[step_25_seed_3_traj_33.npy] Actual Success: 190, Pred Success: 210
--------------------------------------------------------------------------------------------------------------------------------
turn on the stove and put the moka pot on it       |  98.16% |   805 ( 96.5%) |    29 (  3.5%) |  2720 ( 98.7%) |    37 (  1.3%)
--- Trajectory Success Frames (All samples) ---
[step_50_seed_0_traj_51.npy] Actual Success: 274, Pred Success: 276
[step_25_seed_5_traj_14.npy] Actual Success: 192, Pred Success: 191
[step_25_seed_6_traj_31.npy] Actual Success: 189, Pred Success: 185
--------------------------------------------------------------------------------------------------------------------------------
put both the cream cheese box and the butter in... |  96.32% |   284 (100.0%) |     0 (  0.0%) |  4163 ( 96.1%) |   170 (  3.9%)
--- Trajectory Success Frames (All samples) ---
[step_25_seed_3_traj_51.npy] Actual Success: N/A, Pred Success: 313
[step_25_seed_2_traj_6.npy] Actual Success: 229, Pred Success: 185
[step_50_seed_5_traj_19.npy] Actual Success: N/A, Pred Success: 490
--------------------------------------------------------------------------------------------------------------------------------
put both moka pots on the stove                    | 100.00% |     0 (  0.0%) |     0 (  0.0%) |  2052 (100.0%) |     0 (  0.0%)
put the black bowl in the bottom drawer of the ... |  99.95% |   536 (100.0%) |     0 (  0.0%) |  1515 ( 99.9%) |     1 (  0.1%)
--- Trajectory Success Frames (All samples) ---
[step_25_seed_1_traj_43.npy] Actual Success: 251, Pred Success: 251
[step_50_seed_1_traj_51.npy] Actual Success: 239, Pred Success: 238
--------------------------------------------------------------------------------------------------------------------------------
================================================================================================================================
OVERALL TOTAL                                      |  97.72% |  2725 ( 86.6%) |   423 ( 13.4%) | 34874 ( 98.7%) |   453 (  1.3%)
================================================================================================================================
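The percentages in each row can be reproduced from the raw counts, assuming each count is one per-sample (frame-level) prediction: Acc matches (TP + TN) divided by all samples of that task, the TP/FN percentages are shares of the samples labeled success, and the TN/FP percentages are shares of the samples labeled failure. The sketch below recomputes the OVERALL TOTAL row from the ten per-task rows under that reading. The `Counts` class and its methods are hypothetical helpers written for illustration, not part of the evaluation script that produced this report.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one task row (assumed to be per-sample/frame)."""
    tp: int  # labeled success, predicted success
    fn: int  # labeled success, predicted failure ("Miss")
    tn: int  # labeled failure, predicted failure
    fp: int  # labeled failure, predicted success ("Ghost")

    def __add__(self, other: "Counts") -> "Counts":
        # Element-wise sum, so sum() can pool rows before dividing.
        return Counts(self.tp + other.tp, self.fn + other.fn,
                      self.tn + other.tn, self.fp + other.fp)

    @staticmethod
    def _pct(part: int, whole: int) -> float:
        # Empty classes are shown as 0.0% in the report, so mirror that.
        return 100.0 * part / whole if whole else 0.0

    def accuracy(self) -> float:
        """(TP + TN) / all samples, as a percentage."""
        total = self.tp + self.fn + self.tn + self.fp
        return self._pct(self.tp + self.tn, total)

    def rates(self) -> Dict[str, float]:
        """TP/FN relative to success-labeled samples, TN/FP to fail-labeled."""
        pos = self.tp + self.fn
        neg = self.tn + self.fp
        return {"TP": self._pct(self.tp, pos), "FN": self._pct(self.fn, pos),
                "TN": self._pct(self.tn, neg), "FP": self._pct(self.fp, neg)}


# Per-task counts copied from the table, in report order.
ROWS = [
    Counts(0, 0, 4617, 0),
    Counts(478, 0, 4138, 1),
    Counts(0, 0, 2052, 0),
    Counts(310, 37, 5066, 230),
    Counts(0, 0, 6669, 0),
    Counts(312, 357, 1882, 14),
    Counts(805, 29, 2720, 37),
    Counts(284, 0, 4163, 170),
    Counts(0, 0, 2052, 0),
    Counts(536, 0, 1515, 1),
]

overall = sum(ROWS, Counts(0, 0, 0, 0))
print(overall)  # → Counts(tp=2725, fn=423, tn=34874, fp=453)
print(f"Acc {overall.accuracy():.2f}%")  # → Acc 97.72%
print({k: round(v, 1) for k, v in overall.rates().items()})
# → {'TP': 86.6, 'FN': 13.4, 'TN': 98.7, 'FP': 1.3}
```

Pooling the counts before dividing (micro-averaging) is what yields the overall TP rate of 86.6% rather than a mean of the per-task rates; tasks with no success-labeled samples contribute only to the TN/FP columns.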
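Each line under "--- Trajectory Success Frames (All samples) ---" pairs a trajectory file with two values. Reading them as the frame index at which success is labeled (Actual) and first predicted (Pred), with `N/A` meaning the trajectory has no labeled success, a small parser can turn those lines into signed frame offsets and list ghost detections. The regex, the `summarize` function and that interpretation are assumptions for illustration only; the sample lines are copied from the "put the white mug on the plate and put the choc..." block of the report.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern for lines such as
# "[step_0_seed_1_traj_61.npy] Actual Success: 272, Pred Success: 270"
LINE_RE = re.compile(
    r"\[(?P<name>[^\]]+)\]\s+Actual Success:\s*(?P<actual>\d+|N/A),"
    r"\s*Pred Success:\s*(?P<pred>\d+|N/A)"
)


def _frame(value: str) -> Optional[int]:
    """Convert a frame field to int, mapping 'N/A' to None."""
    return None if value == "N/A" else int(value)


def summarize(lines: List[str]) -> Tuple[List[int], List[str], List[str], float]:
    """Return (pred - actual offsets, ghost names, miss names, mean |offset|)."""
    offsets: List[int] = []
    ghosts: List[str] = []  # no labeled success, but a predicted one
    misses: List[str] = []  # labeled success, but none predicted
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip headers, separators and table rows
        actual, pred = _frame(m["actual"]), _frame(m["pred"])
        if actual is None and pred is not None:
            ghosts.append(m["name"])
        elif actual is not None and pred is None:
            misses.append(m["name"])
        elif actual is not None and pred is not None:
            offsets.append(pred - actual)  # negative = predicted early
    mae = sum(abs(o) for o in offsets) / len(offsets) if offsets else 0.0
    return offsets, ghosts, misses, mae


SAMPLE = [
    "[step_50_seed_3_traj_4.npy] Actual Success: N/A, Pred Success: 496",
    "[step_50_seed_3_traj_36.npy] Actual Success: N/A, Pred Success: 255",
    "[step_0_seed_1_traj_61.npy] Actual Success: 272, Pred Success: 270",
    "[step_25_seed_7_traj_48.npy] Actual Success: 369, Pred Success: 304",
    "[step_50_seed_7_traj_19.npy] Actual Success: 277, Pred Success: 269",
]

offsets, ghosts, misses, mae = summarize(SAMPLE)
print(offsets, ghosts, misses, mae)
# → [-2, -65, -8] ['step_50_seed_3_traj_4.npy', 'step_50_seed_3_traj_36.npy'] [] 25.0
```

On these five lines the predictions land 2 to 65 frames early, and the two `N/A` trajectories are flagged as ghosts, which may account for part of that task's 230 FP samples.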