transformers>=4.35
torch>=2.1,<2.4
soundfile
librosa
gradio>=4.0
datasets
optimum[openvino]