---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/fineweb-edu
base_model:
- GSAI-ML/LLaDA-8B-Base
tags:
- XDLM
- LLaDA
---
# LLaDA-XDLM-8B-Base
|
This repository contains the checkpoint after 600 training steps of ***continual pretraining of LLaDA with XDLM***.
***LLaDA-XDLM with a sampling budget of 32.***
Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms the baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.
|
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/oPbIv32EgvA1BbCqd2r6E.png" width="80%">
</div>
|
For details and usage, see the [code repository](https://github.com/MzeroMiko/LLaDA-XDLM).
|
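Until the model card is updated, a minimal loading sketch may help; it assumes this checkpoint follows the same `transformers` loading convention as the upstream LLaDA-8B-Base (the repo id, function name, and bfloat16 dtype below are assumptions, not documented usage for this repository):

```python
# Hedged usage sketch, not taken from this repository: the repo id below is
# the upstream base model's; substitute this checkpoint's id once known.
MODEL_ID = "GSAI-ML/LLaDA-8B-Base"  # assumption: replace with this repo's id


def load_llada_xdlm(model_id: str = MODEL_ID):
    """Load tokenizer and model as the upstream LLaDA model card does.

    Imports are deferred so the sketch can be read without torch/transformers
    installed. trust_remote_code=True is required because LLaDA ships custom
    modeling code rather than a stock transformers architecture.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
    )
    return tokenizer, model
```

Sampling then follows the diffusion-style decoding loop in the linked code repository rather than `model.generate`.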
## TODO
- [ ] Update the model card to support standard Hugging Face `transformers` usage.
|
<!-- ## Updates -->