Update benchmark scores to reflect VLMEvalKit evaluation setting (formatting, generation length)
Commit 44a8f27 (verified), committed by ankke about 1 month ago