Evaluating on cuda
Report will be saved to: /mnt/project_rlinf/jzn/workspace/latest/RLinf/reward_model/ckpt/libero_10_256_zsq/task_embed/seed42_dim64/multi_task_evaluation_report.txt

=================================================================================================================================================
Instruction                                        | Acc    | TP (Success)    | FN (Miss)       | TN (Fail)       | FP (Ghost)     
-------------------------------------------------------------------------------------------------------------------------------------------------
put both the alphabet soup and the cream cheese... | 100.00% |    0 (  0.0%) |    0 (  0.0%) | 4617 (100.0%) |    0 (  0.0%)
put the yellow and white mug in the microwave a... | 99.98% |  478 (100.0%) |    0 (  0.0%) | 4138 (100.0%) |    1 (  0.0%)
--- Trajectory Success Frames (All samples) ---
    [step_50_seed_4_traj_28.npy] Actual Success: 264, Pred Success: 264
    [step_25_seed_6_traj_25.npy] Actual Success: 281, Pred Success: 281
-------------------------------------------------------------------------------------------------------------------------------------------------
put both the alphabet soup and the tomato sauce... | 100.00% |    0 (  0.0%) |    0 (  0.0%) | 2052 (100.0%) |    0 (  0.0%)
put the white mug on the plate and put the choc... | 95.27% |  310 ( 89.3%) |   37 ( 10.7%) | 5066 ( 95.7%) |  230 (  4.3%)
--- Trajectory Success Frames (All samples) ---
    [step_50_seed_3_traj_4.npy] Actual Success: N/A, Pred Success: 496
    [step_50_seed_3_traj_36.npy] Actual Success: N/A, Pred Success: 255
    [step_0_seed_1_traj_61.npy] Actual Success: 272, Pred Success: 270
    [step_25_seed_7_traj_48.npy] Actual Success: 369, Pred Success: 304
    [step_50_seed_7_traj_19.npy] Actual Success: 277, Pred Success: 269
-------------------------------------------------------------------------------------------------------------------------------------------------
put the white mug on the left plate and put the... | 100.00% |    0 (  0.0%) |    0 (  0.0%) | 6669 (100.0%) |    0 (  0.0%)
pick up the book and place it in the back compa... | 85.54% |  312 ( 46.6%) |  357 ( 53.4%) | 1882 ( 99.3%) |   14 (  0.7%)
--- Trajectory Success Frames (All samples) ---
    [step_50_seed_5_traj_54.npy] Actual Success: 191, Pred Success: 182
    [step_25_seed_5_traj_54.npy] Actual Success: 177, Pred Success: 176
    [step_0_seed_7_traj_30.npy] Actual Success: 198, Pred Success: 201
    [step_25_seed_3_traj_33.npy] Actual Success: 190, Pred Success: 210
-------------------------------------------------------------------------------------------------------------------------------------------------
turn on the stove and put the moka pot on it       | 98.16% |  805 ( 96.5%) |   29 (  3.5%) | 2720 ( 98.7%) |   37 (  1.3%)
--- Trajectory Success Frames (All samples) ---
    [step_50_seed_0_traj_51.npy] Actual Success: 274, Pred Success: 276
    [step_25_seed_5_traj_14.npy] Actual Success: 192, Pred Success: 191
    [step_25_seed_6_traj_31.npy] Actual Success: 189, Pred Success: 185
-------------------------------------------------------------------------------------------------------------------------------------------------
put both the cream cheese box and the butter in... | 96.32% |  284 (100.0%) |    0 (  0.0%) | 4163 ( 96.1%) |  170 (  3.9%)
--- Trajectory Success Frames (All samples) ---
    [step_25_seed_3_traj_51.npy] Actual Success: N/A, Pred Success: 313
    [step_25_seed_2_traj_6.npy] Actual Success: 229, Pred Success: 185
    [step_50_seed_5_traj_19.npy] Actual Success: N/A, Pred Success: 490
-------------------------------------------------------------------------------------------------------------------------------------------------
put both moka pots on the stove                    | 100.00% |    0 (  0.0%) |    0 (  0.0%) | 2052 (100.0%) |    0 (  0.0%)
put the black bowl in the bottom drawer of the ... | 99.95% |  536 (100.0%) |    0 (  0.0%) | 1515 ( 99.9%) |    1 (  0.1%)
--- Trajectory Success Frames (All samples) ---
    [step_25_seed_1_traj_43.npy] Actual Success: 251, Pred Success: 251
    [step_50_seed_1_traj_51.npy] Actual Success: 239, Pred Success: 238
-------------------------------------------------------------------------------------------------------------------------------------------------
=================================================================================================================================================
OVERALL TOTAL                                      | 97.72% | 2725 ( 86.6%) |  423 ( 13.4%) | 34874 ( 98.7%) |  453 (  1.3%)
=================================================================================================================================================
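
Note: the percentages in the OVERALL TOTAL row can be reproduced from the four frame counts. The sketch below assumes that Acc is (TP + TN) / all frames, that the TP and FN percentages are taken over actual-success frames, and that the TN and FP percentages are taken over actual-fail frames. That reading is consistent with the rows above, but the report itself does not state it. The function name `confusion_summary` is hypothetical and is not part of the evaluation script.

```python
def confusion_summary(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Derive the report's percentage columns from raw frame counts.

    Assumed definitions (not confirmed by the report):
      acc     = (TP + TN) / (TP + FN + TN + FP)
      tp_rate = TP / (TP + FN), fn_rate = FN / (TP + FN)
      tn_rate = TN / (TN + FP), fp_rate = FP / (TN + FP)
    """
    positives = tp + fn  # frames labeled as success
    negatives = tn + fp  # frames labeled as failure
    total = positives + negatives

    def pct(count: int, denom: int) -> float:
        # Rows with no success frames have positives == 0; report 0.0 there.
        return 100.0 * count / denom if denom else 0.0

    return {
        "acc": pct(tp + tn, total),
        "tp_rate": pct(tp, positives),
        "fn_rate": pct(fn, positives),
        "tn_rate": pct(tn, negatives),
        "fp_rate": pct(fp, negatives),
    }


# Counts copied from the OVERALL TOTAL row.
overall = confusion_summary(tp=2725, fn=423, tn=34874, fp=453)
print(f"Acc {overall['acc']:.2f}% | TP {overall['tp_rate']:.1f}% | "
      f"FN {overall['fn_rate']:.1f}% | TN {overall['tn_rate']:.1f}% | "
      f"FP {overall['fp_rate']:.1f}%")
```

The same function applied to a single row, for example the "pick up the book..." counts (312, 357, 1882, 14), gives the per-task figures under the same assumed definitions.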