---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/fineweb-edu
base_model:
- GSAI-ML/LLaDA-8B-Base
tags:
- XDLM
- LLaDA
---
|
|
|
|
|
# [LLaDA-XDLM-8B-Base](https://arxiv.org/pdf/2602.01362) |
|
|
|
|
|
This repository contains the checkpoint after 600 training steps of ***continually pretraining LLaDA with XDLM***.
|
|
|
|
|
***LLaDA-XDLM with a sampling budget of 32.***
|
|
Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.
|
|
|
|
|
<div align="center">
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/oPbIv32EgvA1BbCqd2r6E.png" width="80%"> |
|
|
</div> |
|
|
|
|
|
|
|
|
For details and usage, see the [code repository](https://github.com/MzeroMiko/LLaDA-XDLM).
|
|
|
|
|
## TODO

- [ ] Update the model card to support standard Hugging Face `transformers` usage.
|
|
|
|
|
<!-- ## Updates --> |