Open-Source Pangu openPangu-7B-Diffusion-DeepDiver
1. Introduction
openPangu-7B-Diffusion-DeepDiver is a 7B-parameter language model based on block diffusion large language models (Diffusion LLM), specifically trained and fine-tuned for multi-agent scenarios (including tool invocation, information retrieval, and multi-step decision-making). Its underlying architecture and inference pipeline follow the design of openPangu-R-7B-Diffusion (including block-wise denoising and bidirectional attention within blocks), thus maintaining consistent structures and interfaces for single-pass generation and parallel decoding.
For complete evaluations and training details, please refer to the technical report “DLLM Agent: See Farther, Run Faster” (arXiv:2602.07451v2).
- openPangu-7B-Diffusion-DeepDiver: Agent model with a context length of 32k.
Key Features
- Uses the same DLLM architecture and iterative inference pipeline as openPangu-R-7B-Diffusion.
- Trained and fine-tuned with data and objectives tailored for in-depth agent research, making it more robust in multi-round tool invocation and planning tasks.
- Introduces context-clean corruption and span-aware attention alignment to reduce noise propagation in multi-round agent dialogues and improve the reliability of tool invocation formats.
Inference
openPangu-7B-Diffusion-DeepDiver adopts context-causal block diffusion decoding, performing diffusion decoding block by block. Within each block, full attention is applied, while causal attention is used for preceding context. Once all tokens in a block are decoded, the entire block is stored in the historical KV cache with causal masking, and decoding proceeds to the first token of the next block.
- Supports variable-length inference and KV caching.
- Flexible context length, not limited by block size.
- Supports both autoregressive and block diffusion decoding.
- Uses confidence-threshold sampling, achieving up to 2.5× throughput improvement over standard autoregressive decoding.
- Similar to Fast dLLMv2, each block can be subdivided into small blocks to trade off throughput against quality; small-block sizes of 4 or 8 typically give the best results.
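The decoding loop described above can be sketched as a toy illustration. This is not the repository's implementation: `fake_model` is a hypothetical, deterministic stand-in for the denoiser, and the commit rule (accept every position above the confidence threshold, or the single most confident one so the loop always progresses) is one plausible reading of confidence-threshold sampling.

```python
# Toy sketch of context-causal block diffusion decoding with
# confidence-threshold sampling. `fake_model` is a hypothetical
# stand-in for the real denoiser; real confidences would come
# from the model's per-token predictions.

THRESHOLD = 0.9
BLOCK_SIZE = 4

def fake_model(context, block):
    """Score each still-masked slot given the committed causal context
    and the partially decoded block (bidirectional within the block)."""
    preds = []
    for i, tok in enumerate(block):
        if tok is None:
            token = (len(context) + i) % 100       # placeholder prediction
            conf = 0.95 if i % 2 == 0 else 0.5     # alternating toy confidences
            preds.append((i, token, conf))
    return preds

def decode_block(context, block_size=BLOCK_SIZE, threshold=THRESHOLD):
    block, steps = [None] * block_size, 0
    while any(t is None for t in block):
        preds = fake_model(context, block)
        # Commit every position whose confidence clears the threshold;
        # if none does, commit the single most confident one so each
        # iteration makes progress.
        committed = [(i, tok) for i, tok, c in preds if c >= threshold]
        if not committed:
            i, tok, _ = max(preds, key=lambda p: p[2])
            committed = [(i, tok)]
        for i, tok in committed:
            block[i] = tok
        steps += 1
    return block, steps

# Decode three blocks sequentially; each finished block joins the
# causal history (the stand-in for the KV cache).
history = []
for _ in range(3):
    block, steps = decode_block(history)
    history.extend(block)
```

Because several positions can be committed per iteration, a block finishes in fewer denoising steps than it has tokens, which is the source of the throughput gain over one-token-per-step autoregressive decoding.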
Inference in Agent Workflows
Integrated into the DeepDiver v2 agent workflow, the model applies the DLLM iterative denoising inference strategy for each round of tool-call generation.
DeepDiver v2 is a planner-centered multi-agent system (MAS) architecture that coordinates multiple executors.
For details, refer to its technical report.
Training
Training Corpus
The model is fine-tuned on 11k specially collected or synthesized agent trajectories (including planner → seeker multi-agent interactions, real tool calls, and tool-return traces). This data is intended to teach the model to generate semantically consistent, format-compliant tool invocation instructions across multi-round interactions. See the “Agent-oriented Fine-tuning” section of the technical report for details.
Supervision Method
Cross-entropy losses for both diffusion and autoregressive models are jointly optimized during training, ensuring stability and preserving reliable left-to-right generation.
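A minimal sketch of this joint objective follows. The weighting `lam` and the restriction of the diffusion loss to masked positions are assumptions for illustration; the technical report defines the exact formulation.

```python
import math

def cross_entropy(dist, target):
    """CE for one position: -log p(target) under the predicted distribution."""
    return -math.log(dist[target])

def joint_loss(diff_preds, ar_preds, targets, masked, lam=1.0):
    """Joint objective sketch: diffusion CE averaged over masked
    positions plus a lam-weighted autoregressive next-token CE
    averaged over all positions."""
    l_diff = sum(cross_entropy(diff_preds[i], targets[i]) for i in masked) / max(len(masked), 1)
    l_ar = sum(cross_entropy(ar_preds[i], targets[i]) for i in range(len(targets))) / len(targets)
    return l_diff + lam * l_ar
```

Optimizing both terms keeps the diffusion objective stable while preserving reliable left-to-right generation, as the section above describes.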
Masking and Attention Alignment
To address the information contamination that diffusion masking causes when multi-round contexts and tool outputs are mixed, training applies context-clean corruption, which keeps irrelevant context segments free of masking noise, and span-aware attention alignment restricted to the generation spans. Experiments on agent datasets show that both techniques improve final information-retrieval scores.
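A minimal sketch of how these two techniques could look, assuming context-clean corruption means restricting diffusion masking to generation spans and span-aware alignment means bidirectional attention inside a generation span with causal attention elsewhere. Both readings are interpretations of the technique names, not the report's exact definitions.

```python
import random

def context_clean_corruption(tokens, gen_spans, mask_id=-1, rate=0.5, rng=None):
    """Apply diffusion masking only inside generation spans, so that
    multi-round context and tool returns stay noise-free (assumption)."""
    rng = rng or random.Random(0)
    noised = list(tokens)
    for lo, hi in gen_spans:
        for i in range(lo, hi):
            if rng.random() < rate:
                noised[i] = mask_id
    return noised

def span_aware_mask(n, gen_spans):
    """allowed[i][j] is True when position i may attend to position j:
    bidirectional inside the same generation span, causal elsewhere
    (assumption about what span-aware alignment entails)."""
    span_of = {}
    for s, (lo, hi) in enumerate(gen_spans):
        for i in range(lo, hi):
            span_of[i] = s
    allowed = [[j <= i for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i in span_of and span_of.get(j) == span_of[i]:
                allowed[i][j] = True
    return allowed
```

Keeping tool outputs and prior rounds noise-free prevents corrupted context from propagating into later rounds, while the span-restricted attention keeps denoising within each generation range.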
2. Model Architecture
| openPangu-7B-Diffusion-DeepDiver | |
|---|---|
| Architecture | Dense |
| Parameters (Non-Embedding) | 7B |
| Number of Layers | 34 |
| Hidden Dimension | 12800 |
| Attention Mechanism | GQA |
| Number of Attention Heads | 32 for Q, 8 for KV |
| Vocabulary Size | 153k |
| Context Length | 32k |
| Continued Training Tokens | 700B |
3. Evaluation Results
Table 1. Comparison results on a 110-question subset of BrowseComp-zh.
| Method | Accuracy (%) | Tool Calls | Agent Rounds | Tool Failure Rate |
|---|---|---|---|---|
| AR Agent (autoregressive backbone) | 15.5 | 7.5 | 14.8 | 1.9% |
| DLLM Agent (diffusion backbone) | 15.5 | 6.7 | 13.0 | 6.4% |
Although final accuracy on this subset matches the AR agent, the DLLM agent needs fewer tool calls and sub-agent rounds and reduces average end-to-end latency by about 30%. However, its tool failure rate is higher (6.4% vs. 1.9%), indicating that it is still less stable than AR models.
4. Deployment and Usage
4.1 Environment Setup
Hardware Requirements
Atlas 800T A2 (64GB). For drivers and firmware, see: [Atlas 800T A2].
Software Environment
- OS: Linux (openEuler ≥ 24.03 recommended)
- CANN == 8.1.RC1. See [CANN Install]
- python == 3.10
- torch == 2.6.0
- torch-npu == 2.6.0
- transformers == 4.53.2
The above configurations have been verified. Higher versions may be supported. Please submit an issue if you have questions.
4.2 Inference Examples
Below are simple examples of using openPangu-7B-Diffusion-DeepDiver with the transformers framework and the DeepDiver v2 agent framework.
Loading and Running
Before running, modify generate.py to specify the model path.
```shell
cd inference
python generate.py
```
For optimal throughput, set the sampling parameters to `alg="confidence_threshold"`, `threshold=0.9`, `num_small_blocks=1`, and choose an appropriate batch size for your hardware.
Service Deployment
Download the lightweight service script, place it in the model directory, and start the service:
```shell
python launch_server.py --load /path/to/model --port 9999
```
DeepDiver v2
Download the DeepDiver v2 package (no model weights required) and install it following the official documentation.
Copy env.template to config/.env, set MODEL_REQUEST_URL to the model service URL, and set MODEL_NAME to the deployed model name (default: local-diffusion-llm).
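The resulting config/.env entries look like the fragment below; the URL and port are placeholders, and only the variable names and the default model name come from this document.

```shell
# config/.env
MODEL_REQUEST_URL=http://127.0.0.1:9999   # URL of the launched model service (example)
MODEL_NAME=local-diffusion-llm            # must match the deployed model name
```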
Start the MCP service:
```shell
python src/tools/mcp_server_standard.py
```
Send a query to Deepdiver v2:
```shell
python cli/demo.py -q "今天北京的天气怎么样?"  # "What's the weather like in Beijing today?"
```
For more usage details, refer to the official repository.
Currently, openPangu-7B-Diffusion-DeepDiver has been trained and tested only within the DeepDiver v2 framework. It has not been adapted to other agent frameworks or tasks, and performance in other setups is not guaranteed.
5. License
When using the model or its outputs, please cite the technical report: “DLLM Agent: See Farther, Run Faster” (arXiv:2602.07451v2).
Unless otherwise specified, openPangu-7B-Diffusion-DeepDiver is licensed under the OPENPANGU MODEL LICENSE AGREEMENT VERSION 1.0, which aims to promote the development of AI technologies. See the LICENSE file in the repository root for details.
6. Disclaimer
Due to inherent technical limitations and the nature of AI-generated content, Huawei makes no guarantees regarding the following:
- The generated outputs may contain defects, inaccuracies, or inappropriate content, and do not represent Huawei’s views.
- The model is not guaranteed to be 100% accurate, reliable, complete, timely, secure, error-free, uninterrupted, or stable.
- The outputs do not constitute advice or decisions and do not guarantee authenticity, completeness, accuracy, legality, or usefulness. They cannot replace professional advice in medical, legal, or other domains. Users must make independent judgments, and Huawei assumes no responsibility.
7. Feedback
For suggestions or feedback, please submit an issue or contact: openPangu@huawei.com.
8. Citation
```bibtex
@article{zhen2026dllm,
  title={DLLM Agent: See Farther, Run Faster},
  author={Zhen, Huiling and Lin, Weizhe and Liu, Renxi and Han, Kai and Li, Yiming and Tian, Yuchuan and Chen, Hanting and Li, Xiaoguang and Li, Xiaosong and Chen, Chen and others},
  journal={arXiv preprint arXiv:2602.07451},
  year={2026}
}
```