Some weights of XLNetForQuestionAnsweringSimple were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. ***Training with "xlnet-base-cased" *** Map: 0%| | 0/10570 [00:00