---
license: apache-2.0
language:
  - en
  - zh
base_model: tencent/WeDLM-7B
pipeline_tag: text-generation
tags:
  - language model
  - parallel-decoding
  - chat
  - instruct
---

# WeDLM-7B-Instruct

**WeDLM-7B-Instruct** is an instruction-tuned diffusion language model, fine-tuned from WeDLM-7B, that performs parallel decoding under standard causal attention.

For the base (pretrained) version, see WeDLM-7B.

📄 Paper (Coming Soon) | 🌐 Project Page | 💻 GitHub

## Model Details

| Attribute | Value |
| --- | --- |
| Base Model | WeDLM-7B |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For fast inference, use the wedlm engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from transformers import AutoTokenizer
from wedlm import LLM, SamplingParams

# Load the parallel-decoding engine and the matching tokenizer.
llm = LLM(model="tencent/WeDLM-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)

# Build a chat-formatted prompt with the model's chat template.
prompt = "Explain the difference between machine learning and deep learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0]["text"])
```
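Since `generate` takes a list of prompts, several requests can be batched in one call. A minimal sketch, assuming the same list-in/list-out interface shown above (the batching behavior itself is an assumption, not documented here):

```python
# Hypothetical batched usage: one chat-formatted prompt per question,
# assuming llm.generate returns one output dict per input prompt.
questions = ["What is entropy?", "Name three sorting algorithms."]
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}], tokenize=False, add_generation_prompt=True
    )
    for q in questions
]
outputs = llm.generate(prompts, SamplingParams(temperature=0.2, max_tokens=256))
for out in outputs:
    print(out["text"])
```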

## Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language known for its simplicity and readability."},
    {"role": "user", "content": "Show me a hello world example."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
```
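To continue the dialogue, append the model's reply to `messages` and re-apply the chat template. A minimal sketch, assuming `outputs[0]["text"]` holds the generated reply as in the Quick Start example:

```python
# Append the generated reply as an assistant turn, add the next user turn,
# then re-render the template and generate again.
messages.append({"role": "assistant", "content": outputs[0]["text"]})
messages.append({"role": "user", "content": "Now print it ten times in a loop."})
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
```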

## HuggingFace Transformers

For training or simple forward passes:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B-Instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)  # a single forward pass; returns logits, not generated text
```
โš ๏ธ Note: The HuggingFace interface is for training/forward pass convenience. For optimized inference throughput, use the wedlm engine above.

## Performance

| Benchmark | Qwen2.5-7B-Instruct | WeDLM-7B-Instruct |
| --- | --- | --- |
| ARC-C (0-shot) | 86.09 | 89.59 |
| GSM8K (3-shot) | 89.91 | 87.57 |
| MATH (4-shot) | 45.00 | 55.40 |
| HumanEval (4-shot) | 76.22 | 75.00 |
| MMLU (5-shot) | 71.98 | 70.52 |

## Citation

Coming soon.

## License

Apache 2.0