Mzero17 committed ae481b9 (verified) · 1 parent: 2940523

Update README.md

Files changed (1): README.md (+32 −3)
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/fineweb-edu
base_model:
- GSAI-ML/LLaDA-8B-Base
tags:
- XDLM
- LLaDA
---

# LLaDA-XDLM-8B-Base

This repository contains the checkpoint after 600 training steps of ***continual pretraining of LLaDA with XDLM***.

***LLaDA-XDLM with a sampling budget of 32.***
Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms baselines across diverse benchmarks with 32 sampling steps; (b) improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/oPbIv32EgvA1BbCqd2r6E.png" width="80%">
</div>
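The sampling budget above is the number of iterative denoising passes the diffusion LM makes: generation starts from a fully masked sequence and, over a fixed number of steps, keeps the model's most confident predictions while re-masking the rest. The following toy sketch illustrates that general recipe only; the function name, greedy decoding, and even-reveal schedule are illustrative assumptions, not this repository's actual sampler.

```python
import torch

def diffusion_sample(model, length, num_steps=32, mask_id=0, device="cpu"):
    """Toy confidence-based iterative unmasking with a fixed step budget."""
    # Start from a fully masked sequence of the requested length.
    x = torch.full((1, length), mask_id, dtype=torch.long, device=device)
    for step in range(num_steps):
        logits = model(x)                       # (1, length, vocab_size)
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)          # per-position confidence and argmax token
        still_masked = x == mask_id
        if not still_masked.any():
            break
        # Spread the remaining masked positions evenly over the remaining steps.
        n_reveal = max(1, int(still_masked.sum()) // (num_steps - step))
        # Only masked positions compete for being revealed this step.
        conf = conf.masked_fill(~still_masked, -1.0)
        idx = conf.topk(n_reveal, dim=-1).indices
        x.scatter_(1, idx, pred.gather(1, idx))
    return x
```

With a budget of 32, at most 32 forward passes produce the whole sequence, in contrast to one pass per token for autoregressive decoding; the real sampler additionally handles temperature, block-wise generation, and remasking strategies.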

For details and usage, see the [code repository](https://github.com/MzeroMiko/LLaDA-XDLM).

## TODO

- [ ] Update the `model_card` to support standard Hugging Face Transformers usage.

<!-- ## Updates -->