---
license: apache-2.0
language:
- en
- zh
base_model: tencent/WeDLM-7B
pipeline_tag: text-generation
tags:
- language model
- parallel-decoding
- chat
- instruct
---

# WeDLM-7B-Instruct

**WeDLM-7B-Instruct** is an instruction-tuned diffusion language model that performs parallel decoding under standard causal attention. It is fine-tuned from [WeDLM-7B](https://huggingface.co/tencent/WeDLM-7B); see that page for the base (pretrained) model.

📄 Paper (Coming Soon) | 🌐 [Project Page](https://wedlm.github.io) | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|:----------|:------|
| Base Model | [WeDLM-7B](https://huggingface.co/tencent/WeDLM-7B) |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For **fast inference**, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from transformers import AutoTokenizer
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)

prompt = "Explain the difference between machine learning and deep learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0]["text"])
```

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language known for its simplicity and readability."},
    {"role": "user", "content": "Show me a hello world example."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
```

A sketch showing how to continue the conversation with the generated reply appears at the end of this card.

## HuggingFace Transformers

For **training** or simple forward passes:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B-Instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)
```

> ⚠️ **Note:** The HuggingFace interface is for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above. A sketch showing how to obtain a training loss from this interface appears at the end of this card.

## Performance

| Benchmark | Qwen2.5-7B-Instruct | WeDLM-7B-Instruct |
|:----------|:-------------------:|:-----------------:|
| ARC-C (0-shot) | 86.09 | 89.59 |
| GSM8K (3-shot) | 89.91 | 87.57 |
| MATH (4-shot) | 45.00 | 55.40 |
| HumanEval (4-shot) | 76.22 | 75.00 |
| MMLU (5-shot) | 71.98 | 70.52 |

## Citation

(Coming soon)

## License

Apache 2.0
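
## Example: Continuing a Multi-turn Conversation

Referenced from the Multi-turn Conversation section above. This is a minimal sketch, not an official example: it assumes the `generate` return format shown in the Quick Start (`outputs[0]["text"]`) and reuses only the `wedlm` calls demonstrated there. The follow-up user question is a hypothetical placeholder.

```python
# Reuses `llm`, `tokenizer`, `messages`, and `outputs` from the
# Multi-turn Conversation example above.
reply = outputs[0]["text"]  # assumed return format, as shown in Quick Start

# Append the model's reply, then the next user turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Now show the same program in C."})

# Re-apply the chat template and generate the next reply.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0]["text"])
```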
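
## Example: Computing a Training Loss

Referenced from the HuggingFace Transformers section above. This is a minimal sketch, assuming the model's remote code follows the standard `transformers` causal-LM convention of returning a `loss` when `labels` are supplied; a diffusion LM may use a different training objective, so consult the GitHub repository for the actual training recipe.

```python
# Reuses `tokenizer` and `model` from the HuggingFace Transformers example above.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)

# Standard causal-LM convention (assumed to hold here): passing
# labels=input_ids makes the model return a language-modeling loss
# alongside the logits.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)

outputs.loss.backward()  # gradients for one training step
```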