Files changed (1)
  1. README.md +23 -7
README.md CHANGED
@@ -1,16 +1,32 @@
  ---
- license: mit
+ license: apache-2.0
  library_name: transformers
  pipeline_tag: text-generation
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ base_model:
+ - GSAI-ML/LLaDA-8B-Base
+ tags:
+ - XDLM
+ - LLaDA
  ---
 
- # LLaDA-8B-Base
+ # LLaDA-XDLM-8B-Base
 
- We introduce LLaDA, a diffusion model with an unprecedented 8B scale, trained entirely from scratch, rivaling LLaMA3 8B in performance.
+ This repository contains the checkpoint after 600 training steps of ***continual pretraining of LLaDA with XDLM***.
 
- [Project Page](https://ml-gsai.github.io/LLaDA-demo/)
+ ***LLaDA-XDLM with a sampling budget of 32.***
+ Evaluation of adapting LLaDA-8B to our XDLM formulation (LLaDA-XDLM): (a) LLaDA-XDLM consistently outperforms the baselines across diverse benchmarks with 32 sampling steps; (b) the improvements are particularly pronounced in code generation (MBPP), where the model substantially reduces generation failures.
 
- [Code](https://github.com/ML-GSAI/LLaDA)
-
- ## Updates
- [2025-10-21] We have modified modeling_llada.py to support the input of attention_mask.
+ <div align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65aa76b1cb5b4fb08ecb087c/oPbIv32EgvA1BbCqd2r6E.png" width="80%">
+ </div>
+
+
+ For details and usage, see the [code](https://github.com/MzeroMiko/LLaDA-XDLM).
+
+ ## TODO:
+ - [ ] Update the `model_card` to support standard Hugging Face Transformers usage.
+
+ <!-- ## Updates -->
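The card's metadata declares `library_name: transformers` and `pipeline_tag: text-generation`, and LLaDA ships a custom `modeling_llada.py`, so loading presumably requires `trust_remote_code=True`. A minimal loading sketch under those assumptions; the repo id below is the *base* model named in the metadata (`GSAI-ML/LLaDA-8B-Base`), since this checkpoint's own repo id is not stated in the card:

```python
# Hedged sketch, not confirmed by this card: loading a LLaDA-style checkpoint
# with Hugging Face Transformers. trust_remote_code=True is assumed because
# the base model relies on a custom modeling_llada.py.

def load_llada(repo_id: str = "GSAI-ML/LLaDA-8B-Base"):
    """Return (tokenizer, model) for a LLaDA checkpoint on the Hub."""
    # Lazy import so this sketch can be inspected without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model

# Usage (downloads the full 8B weights; requires network and substantial RAM/VRAM):
# tokenizer, model = load_llada()
```

Once the TODO above lands, the standard `pipeline("text-generation", model=...)` path may work as well; until then, treat this as a best-effort guess rather than the documented API.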