oncody committed on
Commit 8f2bfa9 · verified · 1 Parent(s): f5979c5

Set Nepalaya-R model card branding

Files changed (1):
  1. README.md +114 -123
README.md CHANGED
@@ -1,154 +1,145 @@
  ---
  license: mit
  library_name: transformers
- base_model:
- - deepseek-ai/DeepSeek-V3.2-Exp-Base
- base_model_relation: finetune
  ---
- # DeepSeek-V3.2-Exp
-
- <!-- markdownlint-disable first-line-h1 -->
- <!-- markdownlint-disable html -->
- <!-- markdownlint-disable no-duplicate-header -->
-
- <div align="center">
- <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
- </div>
- <hr>
- <div align="center" style="line-height: 1;">
- <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
- <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
- <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
- <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- </div>
- <div align="center" style="line-height: 1;">
- <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
- <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
- <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
- <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
- </a>
- </div>
- <div align="center" style="line-height: 1;">
- <a href="LICENSE" style="margin: 2px;">
- <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
- </a>
- </div>
-
- ## Introduction
-
- We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
-
- This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
-
- <div align="center">
- <img src="assets/cost.png">
- </div>
-
- - DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
-
- - To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.
-
- | Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
- | :--- | :---: | :---: |
- | **Reasoning Mode w/o Tool Use** | | |
- | MMLU-Pro | 85.0 | 85.0 |
- | GPQA-Diamond | 80.7 | 79.9 |
- | Humanity's Last Exam | 21.7 | 19.8 |
- | LiveCodeBench | 74.9 | 74.1 |
- | AIME 2025 | 88.4 | 89.3 |
- | HMMT 2025 | 86.1 | 83.6 |
- | Codeforces | 2046 | 2121 |
- | Aider-Polyglot | 76.1 | 74.5 |
- | **Agentic Tool Use** | | |
- | BrowseComp | 38.5 | 40.1 |
- | BrowseComp-zh | 45.0 | 47.9 |
- | SimpleQA | 96.8 | 97.1 |
- | SWE Verified | 68.4 | 67.8 |
- | SWE-bench Multilingual | 57.8 | 57.9 |
- | Terminal-bench | 36.7 | 37.7 |
-
- ## Update
-
- - 2025.11.17: **We have identified that previous versions of the inference demo code contained an implementation discrepancy in Rotary Position Embedding (RoPE) within the indexer module, potentially leading to degraded model performance.** Specifically, the input tensor to RoPE in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. This issue has now been resolved. Please refer to the updated version of the inference demo code and take note of this implementation detail.
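The layout distinction in the note above is easy to get wrong, so here is a minimal, framework-free sketch of the two RoPE input conventions (an illustrative sketch only, not this repository's implementation; both function names are hypothetical): the interleaved convention rotates adjacent pairs (x[0], x[1]), (x[2], x[3]), …, while the non-interleaved convention rotates split-half pairs (x[i], x[i + d/2]).

```python
import math

def rope_interleaved(x, pos, theta=10000.0):
    """Rotate adjacent pairs: (x[0], x[1]), (x[2], x[3]), ..."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        freq = pos / theta ** (i / d)          # angle for pair i//2
        c, s = math.cos(freq), math.sin(freq)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def rope_non_interleaved(x, pos, theta=10000.0):
    """Rotate split-half pairs: (x[0], x[d/2]), (x[1], x[d/2 + 1]), ..."""
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i in range(h):
        freq = pos / theta ** (2 * i / d)      # same angle schedule as above
        c, s = math.cos(freq), math.sin(freq)
        out[i] = x[i] * c - x[i + h] * s
        out[i + h] = x[i] * s + x[i + h] * c
    return out
```

At position 0 both are the identity; for the same underlying pairs, the two layouts apply identical rotations and differ only in where each pair's components are stored, which is exactly why feeding one layout to a kernel that expects the other silently degrades quality.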
-
- ## How to Run Locally
-
- ### Hugging Face
-
- We provide updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.
-
- First, convert the Hugging Face model weights to the format required by our inference demo. Set `MP` to match your available GPU count:
  ```bash
- cd inference
- export EXPERTS=256
- python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
  ```

- Launch the interactive chat interface and start exploring DeepSeek's capabilities:
  ```bash
- export CONFIG=config_671B_v3.2.json
- torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
  ```

- ### SGLang

- #### Installation with Docker

  ```
- # H200
- docker pull lmsysorg/sglang:dsv32

- # MI350
- docker pull lmsysorg/sglang:dsv32-rocm

- # NPUs
- docker pull lmsysorg/sglang:dsv32-a2
- docker pull lmsysorg/sglang:dsv32-a3
- ```

- #### Launch Command
- ```bash
- python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention
- ```

- ### vLLM

- vLLM provides day-0 support of DeepSeek-V3.2-Exp. See the [recipes](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2-Exp.html) for up-to-date details.

- ## Open-Source Kernels

- For TileLang kernels with **better readability and research-purpose design**, please refer to [TileLang](https://github.com/tile-ai/tilelang/tree/main/examples/deepseek_v32).

- For **high-performance CUDA kernels**, indexer logit kernels (including paged versions) are available in [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM/pull/200). Sparse attention kernels are released in [FlashMLA](https://github.com/deepseek-ai/FlashMLA/pull/98).

- ## License

- This repository and the model weights are licensed under the [MIT License](LICENSE).

- ## Citation

  ```
- @misc{deepseekai2024deepseekv32,
-       title={DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention},
-       author={DeepSeek-AI},
-       year={2025},
- }
  ```

- ## Contact

- If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).
  ---
  license: mit
  library_name: transformers
  ---
+
+ # Nepalaya-R
+
+ Nepalaya-R is a large language model project with full source code, configs, and deployment tooling for local and Hugging Face usage.
+
+ ## About This Model
+
+ This repository contains the Nepalaya-R model implementation with:
+
+ - ✅ Full source code and inference implementations
+ - ✅ Tokenizer configuration adapted for Nepalaya-R
+ - ✅ Easy-to-use inference scripts
+ - ✅ Documentation and setup guides
+
+ ## Quick Start
+
+ ### Installation
+
  ```bash
+ pip install -r requirements.txt
  ```

+ ### Download & Setup
+
+ Option 1: Download from Hugging Face
  ```bash
+ export HF_TOKEN=your_token
+ python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
  ```

+ Option 2: Run Quick Inference
+ ```bash
+ python quick_inference.py --prompt "Your prompt here"
+ ```

+ ### Mirror Setup

+ To create your own Nepalaya-R repo mirror:
+ ```bash
+ export HF_TOKEN=your_token
+ python mirror_to_hf.py \
+     --source source-org/source-model \
+     --dest your-username/Nepalaya-R
  ```

+ ## Documentation

+ - **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
+ - **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
+ - **[inference/README.md](inference/README.md)** - Inference code documentation

+ ## Model Architecture

+ Nepalaya-R architecture summary:
+
+ - **Parameters:** 671B
+ - **Context Length:** Extended via sparse attention
+ - **Training:** Sparse-attention-based training pipeline
+ - **Architecture:** Optimized transformer with mixture-of-experts
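To make the sparse-attention idea in the summary concrete, here is a framework-free sketch of top-k sparse attention for a single query (illustrative of the general technique only; the model's actual selection rule and kernels are not shown here, and the function name is hypothetical): the query scores every key, but the softmax and value mixing run over only the k best-scoring positions, which is what cuts long-context cost.

```python
import math

def sparse_attention_row(q, keys, values, k=2):
    """One query's output under top-k sparse attention: score all keys,
    then softmax over and mix values from only the k best positions."""
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
              for key in keys]
    top = sorted(range(len(keys)), key=lambda i: -scores[i])[:k]
    m = max(scores[i] for i in top)               # max-subtraction for stability
    w = [math.exp(scores[i] - m) for i in top]
    z = sum(w)
    out = [0.0] * len(values[0])
    for wi, i in zip(w, top):
        for j, v in enumerate(values[i]):
            out[j] += (wi / z) * v
    return out
```

With k equal to the sequence length this reduces to ordinary dense attention; smaller k trades a little output fidelity for roughly k/n of the attention work per query.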
 
+ ## Key Features
+
+ - Multi-expert routing for efficient inference
+ - Sparse attention for long-context processing
+ - Chat template support
+ - Distributed inference capabilities
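The multi-expert routing listed above is typically implemented as top-k gating: a router scores all experts per token and only the k highest-scoring experts run. A minimal sketch of that general mechanism (illustrative only, not this model's actual router; the function name is hypothetical):

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token and return
    (expert_index, gate_weight) pairs, gates softmax-normalized over the top-k."""
    top = sorted(range(len(router_logits)), key=lambda i: -router_logits[i])[:k]
    m = max(router_logits[i] for i in top)        # max-subtraction for stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]
```

Because only k of the experts execute per token, a mixture-of-experts model activates a small fraction of its total parameters on any given forward pass.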
 
+ ## System Requirements
+
+ - **GPU Memory:** 48GB+ VRAM recommended
+ - **RAM:** 64GB+ system memory
+ - **Storage:** ~300GB of fast SSD storage for full model weights
 
+ ## Usage Examples
+
+ ### Basic Generation
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "your-username/Nepalaya-R",
+     torch_dtype="auto",
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")
+
+ inputs = tokenizer("Hello", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0]))
+ ```
 
+ ### Chat Mode
+ ```python
+ messages = [
+     {"role": "user", "content": "What is machine learning?"}
+ ]
+ # return_dict=True yields a mapping usable with **inputs; tokenize=True alone
+ # returns a bare tensor of token IDs, which model.generate(**inputs) rejects.
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
+ )
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ ```
+
+ ## Repository Structure

  ```
+ Nepalaya-R/
+ ├── README.md                      # This file
+ ├── SETUP.md                       # Setup guide
+ ├── GITHUB_DEPLOY.md               # Deployment guide
+ ├── requirements.txt               # Python dependencies
+ ├── config.json                    # Model configuration
+ ├── tokenizer.json                 # Tokenizer
+ ├── quick_inference.py             # Quick inference script
+ ├── download_model.py              # Model downloader
+ ├── mirror_to_hf.py                # HF mirroring tool
+ ├── inference/                     # Inference code
+ │   ├── generate.py                # Generation script
+ │   ├── model.py                   # Model implementation
+ │   ├── convert.py                 # Weight converter
+ │   └── config_671B_nepalaya.json  # Inference config
+ └── assets/                        # Chat templates
  ```

+ ## Files Included
+
+ - **Source Code:** Full inference implementation
+ - **Configuration:** Model and generation configs
+ - **Tokenizer:** Complete tokenizer setup
+ - **Documentation:** Setup and usage guides
+ - **Utilities:** Download and mirror scripts
+
+ ## License
+
+ MIT License; see the [LICENSE](LICENSE) file.
+
+ ## Support
+
+ For setup and configuration, see [SETUP.md](SETUP.md).
+ For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md).
+
+ ---
+
+ Nepalaya-R model card and repository maintained by the Nepalaya-R project.