What this shows: Each violin is the Bayesian posterior distribution over a policy's true success rate, given the observed rollouts. A violin spread over a wide range of success rates indicates high uncertainty; a tightly concentrated one indicates high certainty. The compact letter display (CLD) letters above the violins summarise which policies are statistically separable (STEP test, Bonferroni-corrected). Policies that share a letter are not significantly different.
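To make the link between rollout count and violin spread concrete, here is a minimal sketch of a posterior over a success rate. It assumes a uniform Beta(1, 1) prior and independent pass/fail rollouts; the caption does not state the prior used for the figure, so the prior, the function names, and the example counts are all illustrative.

```python
import random


def beta_posterior_samples(successes, trials, n_samples=20000, seed=0):
    """Sample from Beta(1 + successes, 1 + failures).

    This is the posterior over a success rate under an ASSUMED uniform
    Beta(1, 1) prior; the figure's actual prior is not stated.
    """
    rng = random.Random(seed)
    a = 1 + successes
    b = 1 + (trials - successes)
    return [rng.betavariate(a, b) for _ in range(n_samples)]


def central_interval(samples, mass=0.95):
    """Return the central interval holding `mass` of the samples."""
    ordered = sorted(samples)
    lo = ordered[int((1 - mass) / 2 * len(ordered))]
    hi = ordered[int((1 + mass) / 2 * len(ordered)) - 1]
    return lo, hi


# Same observed success rate (80%), different numbers of rollouts.
few = beta_posterior_samples(successes=4, trials=5)
many = beta_posterior_samples(successes=40, trials=50)

few_lo, few_hi = central_interval(few)
many_lo, many_hi = central_interval(many)

print(f"5 rollouts:  95% interval width {few_hi - few_lo:.2f}")
print(f"50 rollouts: 95% interval width {many_hi - many_lo:.2f}")
```

Under these assumptions, the policy with 50 rollouts yields the narrower interval, which is what a narrower violin reflects.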
Bold letter = CLD group (STEP, Bonferroni α=0.10/55, nmax=50)
Violins represent posterior uncertainty, not confidence intervals, and they are not the significance test itself: two overlapping violins can still be statistically distinct. Significance comes from the STEP sequential test with Bonferroni correction for 55 pairwise comparisons (family-wise α=0.10, per-pair α=0.10/55≈0.0018, nmax=50).
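The Bonferroni arithmetic can be checked in a few lines. This sketch assumes 11 policies, a count inferred from 55 = C(11, 2) rather than stated in the caption; the variable names are illustrative.

```python
from math import comb

# ASSUMPTION: 11 policies, inferred from the 55 pairwise comparisons.
n_policies = 11
family_alpha = 0.10  # family-wise error rate given in the caption

n_pairs = comb(n_policies, 2)            # number of unordered policy pairs
per_pair_alpha = family_alpha / n_pairs  # Bonferroni-corrected threshold

print(f"{n_pairs} pairs, per-pair alpha = {per_pair_alpha:.5f}")
```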