eos early stop
#6
by Serpient - opened
While checking your EOS early-stop mechanism, I noticed some logic that looks odd, and I'm not sure whether it is intentional. If transfer_index.any() is true, cur_x is updated; then, if all the EOS early-stop conditions are met, final_x is returned. However, final_x here is x[:, :total_length][:, : eos_pos + 1]. Since x is only updated once a block is completely unmasked, the tokens already decoded in the current block have not yet been written to x at the point where final_x is assigned.
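To make the concern concrete, here is a toy sketch of the scenario as I understand it. It is not the repository's code: it uses plain Python lists instead of tensors, and the layout (cur_x holding only the current block, a MASK placeholder of -1, an EOS id of 2, and the helper names) is assumed purely for illustration. Only the names x, cur_x, final_x, and eos_pos come from the original logic.

```python
# Hypothetical token ids, chosen only for this illustration.
MASK = -1
EOS = 2


def early_stop_from_x(x, eos_pos):
    """Slice the result straight from x, mirroring x[:, : eos_pos + 1].

    If x has not yet received the current block, the slice still
    contains MASK placeholders.
    """
    return x[: eos_pos + 1]


def early_stop_with_sync(x, cur_x, block_start, eos_pos):
    """Copy the working block into a copy of x first, then slice."""
    synced = list(x)
    synced[block_start : block_start + len(cur_x)] = cur_x
    return synced[: eos_pos + 1]


# Two prompt tokens followed by one block of four masked positions.
x = [10, 11, MASK, MASK, MASK, MASK]

# Working copy of the current block: three positions decoded, one still
# masked, so the block is not complete and x has not been updated.
cur_x = [20, 21, EOS, MASK]
block_start = 2
eos_pos = block_start + cur_x.index(EOS)

stale_final_x = early_stop_from_x(x, eos_pos)
synced_final_x = early_stop_with_sync(x, cur_x, block_start, eos_pos)

print(stale_final_x)
print(synced_final_x)
```

Under these assumptions, the first result still carries the mask placeholders for the current block, while the second contains the decoded tokens up to and including EOS. Whether the real implementation behaves like the first case depends on when x is actually written, which is what I am asking about.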
Oh, I see. Thanks for the clarification.