tencent/WeDLM-8B-Instruct

Text Generation · parallel-decoding

exlaw committed 22 days ago

Commit 711665c (verified) · 1 parent: ba45e22

Upload folder using huggingface_hub

Files changed (1)

README.md (+2 -2)

@@ -19,7 +19,7 @@ tags:
 - 📈 Outperforms base Qwen3-8B-Instruct on most benchmarks
 - ✅ Native KV cache compatible (FlashAttention, PagedAttention, CUDA Graphs)
-For the base (pretrained) version, see [WeDLM-8B](https://huggingface.co/tencent/WeDLM-8B).
+For the base (pretrained) version, see [WeDLM-8B](https://huggingface.co/tencent/WeDLM-8B-Base).
 📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)
@@ -27,7 +27,7 @@ For the base (pretrained) version, see [WeDLM-8B](https://huggingface.co/tencent
 | Attribute | Value |
 |:----------|:------|
-| Base Model | [WeDLM-8B](https://huggingface.co/tencent/WeDLM-8B) |
+| Base Model | [WeDLM-8B](https://huggingface.co/tencent/WeDLM-8B-Base) |
 | Parameters | 8B |
 | Context Length | 32,768 |
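Both changed lines in this commit make the same edit: the base-model link moves from `tencent/WeDLM-8B` to `tencent/WeDLM-8B-Base`. The following is a minimal sketch, assuming you want to apply the same rewrite to other Markdown files that still link to the old id. The helper name `repoint_base_links` is hypothetical and is not part of the WeDLM repository. The regex only matches the old id when a `)` immediately follows it, so it leaves `tencent/WeDLM-8B-Instruct` untouched and does not change links that already point to `-Base`.

```python
import re

# Hypothetical helper (not part of the WeDLM repository): repoints Hugging Face
# links from the old base-model id to the "-Base" id, mirroring this commit.
OLD_ID = "tencent/WeDLM-8B"
NEW_ID = "tencent/WeDLM-8B-Base"

# Match the old id only when it ends the URL inside a Markdown link (followed
# by ")"), so ids like "tencent/WeDLM-8B-Instruct" or "-Base" are not touched.
_OLD_LINK = re.compile(r"https://huggingface\.co/" + re.escape(OLD_ID) + r"(?=\))")


def repoint_base_links(markdown: str) -> str:
    """Return `markdown` with every old base-model link rewritten to NEW_ID."""
    return _OLD_LINK.sub("https://huggingface.co/" + NEW_ID, markdown)


if __name__ == "__main__":
    row = "| Base Model | [WeDLM-8B](https://huggingface.co/tencent/WeDLM-8B) |"
    print(repoint_base_links(row))
```

Running the sketch again on its own output leaves that output unchanged, so you can safely run it repeatedly over a docs tree.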