Update README.md
README.md
CHANGED
|
@@ -457,12 +457,33 @@ Wait, but let me check if there's another angle. Maybe the question is testing s
|
|
| 457 |
~~~
|
| 458 |
|
| 459 |
|
| 460 |
-
### Evaluate the model
|
| 461 |
-
|
| 462 |
-
We do not have enough resources to evaluate the model.
|
| 463 |
|
| 464 |
### Generate the model
|
| 465 |
|
| 466 |
Five 80 GB GPUs and 1.4 TB to 1.6 TB of memory are required.
|
| 467 |
|
| 468 |
~~~python
|
|
|
|
| 457 |
~~~
|
| 458 |
|
| 459 |
|
| 460 |
|
| 461 |
### Generate the model
|
| 462 |
|
| 463 |
+
**1. Add metadata to the bf16 model** https://huggingface.co/opensourcerelease/DeepSeek-R1-bf16
|
| 464 |
+
|
| 465 |
+
~~~python
import safetensors
from safetensors.torch import save_file

# Rewrite each of the 163 shards in place so that it carries {'format': 'pt'} metadata.
for i in range(1, 164):
    idx_str = f"{i:05d}"  # zero-pad the shard index to five digits
    safetensors_path = f"model-{idx_str}-of-000163.safetensors"
    print(safetensors_path)
    tensors = dict()
    # Load every tensor of the shard into memory before overwriting the file.
    with safetensors.safe_open(safetensors_path, framework="pt") as f:
        for key in f.keys():
            tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={'format': 'pt'})
~~~
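To check that a shard carries the metadata after step 1, its header can be read directly. This is a minimal sketch and not part of the original steps. It relies only on the safetensors file layout (an 8-byte little-endian header length, then a JSON header in which user metadata sits under `__metadata__`). The helper name `read_safetensors_metadata` is hypothetical.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the `__metadata__` dict from a safetensors header (empty if absent).

    Hypothetical helper for illustration; it reads only the header, not the tensors.
    """
    with open(path, "rb") as fh:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})
```

Calling `read_safetensors_metadata("model-00001-of-000163.safetensors")` on a rewritten shard should return a dict that contains `'format': 'pt'`.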
|
| 479 |
+
|
| 480 |
+
|
| 481 |
+
|
| 482 |
+
**2. Remove `torch.no_grad`** from modeling_deepseek.py, because AutoRound's tuning needs gradients.
|
| 483 |
+
|
| 484 |
+
https://github.com/intel/auto-round/blob/deepseekv3/modeling_deepseek.py
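The effect of `torch.no_grad` can be seen on a toy tensor. This is an illustrative sketch only and does not use the DeepSeek modeling code. The variable names are made up for the example.

```python
import torch

# A toy parameter standing in for a model weight (illustrative only).
w = torch.ones(3, requires_grad=True)

# Inside no_grad, the result is detached from the autograd graph,
# so nothing can be backpropagated through it.
with torch.no_grad():
    y_blocked = (w * 2).sum()

# Outside no_grad, the graph is recorded and backward() fills w.grad.
y_tracked = (w * 2).sum()
y_tracked.backward()

print(y_blocked.requires_grad, y_tracked.requires_grad, w.grad)
```

Any forward path wrapped in `torch.no_grad` behaves like `y_blocked` here, which is why gradient-based tuning cannot pass through it.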
|
| 485 |
+
|
| 486 |
+
|
| 487 |
Five 80 GB GPUs and 1.4 TB to 1.6 TB of memory are required.
|
| 488 |
|
| 489 |
~~~python
|