# WeDLM-7B
WeDLM-7B is a diffusion language model, initialized from Qwen2.5-7B, that performs parallel decoding under standard causal attention.
This is the base (pretrained) version. For the instruction-tuned version, see WeDLM-7B-Instruct.
Paper (Coming Soon) | Project Page | GitHub
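To make the parallel-decoding idea concrete, here is a toy sketch of iterative parallel unmasking. This is an illustration only, not WeDLM's actual algorithm: the function name, the dummy logits, and the confidence heuristic are all our own assumptions for exposition.

```python
# Toy illustration (NOT the paper's algorithm): diffusion-style decoding
# fills several masked positions per step instead of one token at a time.
# WeDLM's contribution is making this work under a standard causal mask;
# this sketch only conveys the iterative parallel-unmasking idea.
import torch

def parallel_unmask_step(logits, is_masked, k=4):
    """Fill the k masked positions the model is most confident about."""
    probs = torch.softmax(logits, dim=-1)      # (seq_len, vocab)
    conf, tokens = probs.max(dim=-1)           # per-position confidence
    # Only masked positions are candidates for unmasking this step.
    conf = torch.where(is_masked, conf, torch.full_like(conf, -1.0))
    pick = conf.topk(min(k, int(is_masked.sum()))).indices
    return pick, tokens[pick]

# Dummy example: 8 positions, 100-token vocab, everything still masked.
logits = torch.randn(8, 100)
is_masked = torch.ones(8, dtype=torch.bool)
positions, new_tokens = parallel_unmask_step(logits, is_masked)
print(positions, new_tokens)  # up to 4 positions filled in one step
```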
| Attribute | Value |
|---|---|
| Initialized From | Qwen2.5-7B |
| Parameters | 7B |
| Context Length | 32,768 |
For fast inference, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0]["text"])
```
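Since `generate` already takes a list, the same interface extends naturally to batches of prompts. A small sketch reusing only the calls shown above (the `prompts` list and loop are our own illustration):

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-7B")

# Batch several prompts in one call; the sampling settings apply to all.
prompts = [
    "The theory of relativity states that",
    "In number theory, a prime is",
]
params = SamplingParams(temperature=0.2, max_tokens=128)
for out in llm.generate(prompts, params):
    print(out["text"])
```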
For training or simple forward passes, you can load via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```
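If the remote-code model returns the usual Transformers causal-LM output with a `logits` field (an assumption worth verifying in your environment), you can inspect the next-token distribution from the forward pass above:

```python
import torch

# outputs.logits has shape (batch, seq_len, vocab); the last position
# holds the model's distribution over the next token.
next_token_logits = outputs.logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([idx])!r}: {p:.3f}")
```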
⚠️ Note: The HuggingFace interface is intended for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.
| Benchmark | Qwen2.5-7B | WeDLM-7B |
|---|---|---|
| ARC-C (0-shot) | 89.93 | 90.70 |
| GSM8K (3-shot) | 79.23 | 84.76 |
| MATH (4-shot) | 43.40 | 48.20 |
| HumanEval (4-shot) | 59.14 | 68.90 |
| MMLU (5-shot) | 71.62 | 71.93 |
```bibtex
@article{liu2025wedlm,
  title={WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference},
  author={Liu, Aiwei and He, Minghua and Zeng, Shaoxun and Zhang, Linhao and Wu, Chuhan and Jia, Wei and Liu, Yuan and Yu, Yang and Zhou, Xiao and Zhou, Jie},
  year={2025}
}
```
License: Apache 2.0