Attention computation
#4
by Serpient - opened
While trying out the model, I noticed that it specifies three attention implementations: flex attention, sdpa, and eager. It seems eager is supposed to be the default option? But when I generate using the demo, I find that the default is actually sdpa, and when I set config._attn_implementation to eager, the generation output becomes gibberish.
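For anyone trying to narrow this down, here is a minimal sketch of how the eager and sdpa paths can be compared outside the model. It is not the model's own code. The `eager_attention` and `max_abs_diff` helpers, the tensor shapes, and the full bidirectional mask are all assumptions made for illustration. The only library call used is PyTorch's `torch.nn.functional.scaled_dot_product_attention`.

```python
import math

import torch
import torch.nn.functional as F


def eager_attention(q, k, v, mask=None):
    """Reference attention: softmax(q @ k^T / sqrt(d)) @ v, computed step by step."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Boolean mask where True means "may attend", matching SDPA's convention.
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


def max_abs_diff(batch=2, heads=4, seq=16, dim=32, seed=0):
    """Largest elementwise gap between the eager and SDPA outputs on random inputs."""
    torch.manual_seed(seed)
    q, k, v = (torch.randn(batch, heads, seq, dim) for _ in range(3))
    # Assumption: a full (bidirectional) mask; the model may build a different one.
    mask = torch.ones(seq, seq, dtype=torch.bool)
    ref = eager_attention(q, k, v, mask)
    fused = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    return (ref - fused).abs().max().item()


diff = max_abs_diff()
print(f"max abs difference: {diff:.2e}")
```

On the same inputs and mask, the two paths should differ only by floating-point noise. So if eager output from the model is gibberish while sdpa is fine, one plausible place to look is how the eager branch builds or applies its mask, rather than the attention math itself. Transformers also generally accepts `attn_implementation="eager"` in `from_pretrained`. Whether this model's custom code handles that differently from setting `config._attn_implementation` afterwards is not something I have verified.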
Thanks for the feedback. We'll look into it.