What this shows: Each violin is the Bayesian posterior distribution over a policy's true success rate, given the observed rollouts. Wide = high uncertainty; narrow = high certainty. CLD letters above summarise which policies are statistically separable (STEP, Bonferroni-corrected). Shared letter → not significantly different.
Posterior success rate distributions: Beta(1+k, 1+n-k)
Initial (1.x)
Fine-tuned (2.x)
Bold letter = CLD group (STEP, Bonferroni α=0.10/55, nmax=50)

Violins represent posterior uncertainty, not confidence intervals. Two overlapping violins can still be statistically distinct. STEP sequential test with Bonferroni correction for 55 pairwise comparisons (α=0.10, per-pair α<0.0018, nmax=50).