---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/fineweb-edu
base_model:
- GSAI-ML/LLaDA-8B-Base
tags:
- XDLM
- LLaDA
---
# LLaDA-XDLM-8B-Base
|
This repository contains the checkpoint after 600 training steps of ***continual pretraining of LLaDA with XDLM***.
***LLaDA-XDLM with a sampling budget of 32.***
Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms the baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.
|
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/oPbIv32EgvA1BbCqd2r6E.png" width="80%">
</div>
|
For details and usage, see the [code repository](https://github.com/MzeroMiko/LLaDA-XDLM).
|
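Until the model card is updated, a minimal loading sketch may help; it assumes this checkpoint follows the same `transformers` loading convention as the upstream LLaDA-8B-Base (the repo id, function name, and bfloat16 dtype below are assumptions, not documented usage for this repository):

```python
# Hedged usage sketch, not taken from this repository: the repo id below is
# the upstream base model's; substitute this checkpoint's id once known.
MODEL_ID = "GSAI-ML/LLaDA-8B-Base"  # assumption: replace with this repo's id


def load_llada_xdlm(model_id: str = MODEL_ID):
    """Load tokenizer and model as the upstream LLaDA model card does.

    Imports are deferred so the sketch can be read without torch/transformers
    installed. trust_remote_code=True is required because LLaDA ships custom
    modeling code rather than a stock transformers architecture.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    )
    return tokenizer, model
```

Sampling then follows the diffusion-style decoding loop in the linked code repository rather than `model.generate`.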
## TODO
- [ ] Update the model card to support standard Hugging Face `transformers` usage.
|
<!-- ## Updates -->