This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.3 models, including the configurations