---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
  - HuggingFaceFW/fineweb-edu
base_model:
  - GSAI-ML/LLaDA-8B-Base
tags:
  - XDLM
  - LLaDA
---

LLaDA-XDLM-8B-Base

This repository contains the checkpoint saved after 600 training steps of continual pretraining of LLaDA with XDLM.

LLaDA-XDLM with a sampling budget of 32. Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.
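To make the "sampling budget of 32" concrete, the sketch below shows one simple way a diffusion-style sampler could spread a fixed number of generated positions over a fixed number of steps. It is an illustration only: the helper `tokens_per_step` and the even-split schedule are assumptions made for this example, and the actual XDLM sampler in the linked code may schedule updates differently.

```python
from __future__ import annotations


def tokens_per_step(gen_length: int, steps: int = 32) -> list[int]:
    """Split `gen_length` positions as evenly as possible across `steps`.

    Hypothetical helper for illustration; not taken from the XDLM code base.
    """
    if gen_length < 0 or steps <= 0:
        raise ValueError("gen_length must be >= 0 and steps must be > 0")
    base, remainder = divmod(gen_length, steps)
    # The first `remainder` steps each finalize one extra position,
    # so the per-step counts always sum to gen_length.
    return [base + 1 if i < remainder else base for i in range(steps)]


# With 128 generated positions and a budget of 32 steps,
# every step finalizes 4 positions.
schedule = tokens_per_step(gen_length=128, steps=32)
print(schedule[:4], sum(schedule))  # [4, 4, 4, 4] 128
```

Under this assumed schedule, a smaller budget means more positions are committed per step, which is why results at a fixed budget such as 32 steps are only comparable across models evaluated with the same budget.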

For details and usage, see Code.

TODO:

Update the model card to support standard Hugging Face transformers usage.