KV cache not used by default
#7
by jhy-123 - opened
Hi, I noticed that inference is quite slow when following the example in the model card. It seems that past_key_values is not used in the generate function. Is there a specific reason why the KV cache isn't enabled by default?
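One way to check this kind of observation is to look at whether a generate function exposes any cache-related arguments at all. The snippet below is only a minimal sketch: `kv_cache_params` is a hypothetical helper, and the two `toy_generate_*` functions are stand-ins I made up for illustration, not the model's actual generate implementation.

```python
import inspect


def kv_cache_params(fn):
    """Report which cache-related keyword names appear explicitly in fn's signature.

    Note: names that are only accepted through **kwargs will show up as False.
    """
    params = inspect.signature(fn).parameters
    return {name: name in params for name in ("past_key_values", "use_cache")}


# Hypothetical stand-ins for illustration only; these are NOT the real
# generate methods of inclusionAI/LLaDA2.0-flash.
def toy_generate_no_cache(input_ids, gen_length=128, steps=32):
    return input_ids


def toy_generate_with_cache(input_ids, past_key_values=None, use_cache=True):
    return input_ids


print(kv_cache_params(toy_generate_no_cache))
print(kv_cache_params(toy_generate_with_cache))
```

With the real model loaded, the same helper could presumably be pointed at `model.generate`. It only inspects the signature, so it cannot tell whether a cache is actually used inside the function body.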