cicdatopea committed · Commit fced50c · verified · 1 Parent(s): d4dc034

Update README.md

Files changed (1): README.md +24 -3
@@ -457,12 +457,33 @@ Wait, but let me check if there's another angle. Maybe the question is testing s
  ~~~
 
 
- ### Evaluate the model
-
- we have no enough resource to evaluate the model
 
  ### Generate the model
 
+ **1 add meta data to bf16 model** https://huggingface.co/opensourcerelease/DeepSeek-R1-bf16
+
+ ~~~python
+ import safetensors
+ from safetensors.torch import save_file
+
+ for i in range(1, 164):
+     idx_str = "0" * (5-len(str(i))) + str(i)
+     safetensors_path = f"model-{idx_str}-of-000163.safetensors"
+     print(safetensors_path)
+     tensors = dict()
+     with safetensors.safe_open(safetensors_path, framework="pt") as f:
+         for key in f.keys():
+             tensors[key] = f.get_tensor(key)
+     save_file(tensors, safetensors_path, metadata={'format': 'pt'})
+ ~~~
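Before rewriting a shard, it can help to check whether its header already carries the `format` metadata. The sketch below is one way to do that with the standard library only. It relies on the safetensors file layout: an 8-byte little-endian header length, followed by a JSON header whose optional `__metadata__` key holds the string metadata. The helper names `read_safetensors_metadata` and `shard_name` are hypothetical; they are not part of this commit or of the safetensors API.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the `__metadata__` dict from a safetensors header, or {} if absent."""
    with open(path, "rb") as f:
        # The file starts with an unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The next `header_len` bytes are a UTF-8 JSON object describing the tensors.
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__") or {}


def shard_name(index, total=163):
    # Hypothetical helper: same naming scheme as the loop in step 1,
    # written with zero-padded format specs instead of manual padding.
    return f"model-{index:05d}-of-{total:06d}.safetensors"
```

A shard that step 1 has already rewritten should report `{"format": "pt"}` from `read_safetensors_metadata(shard_name(i))`, so the check can be used to skip shards that do not need a second pass.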
+
+
+
+ **2 remove torch.no_grad** in modeling_deepseek.py as we need some tuning in AutoRound.
+
+ https://github.com/intel/auto-round/blob/deepseekv3/modeling_deepseek.py
+
+
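The linked file is the authoritative patched version. Purely as an illustration of what step 2 involves, the sketch below strips decorator lines of the form `@torch.no_grad()` from Python source text. It assumes the gradient-disabling code appears as a standalone decorator line; `with torch.no_grad():` blocks are not handled, since removing them also requires re-indenting the body. The function name `strip_no_grad_decorators` is hypothetical.

```python
import re

# Matches a line that contains only the decorator, with optional indentation.
_NO_GRAD_DECORATOR = re.compile(
    r"^[ \t]*@torch\.no_grad\(\)[ \t]*\r?\n", re.MULTILINE
)


def strip_no_grad_decorators(source):
    """Return (new_source, count) with `@torch.no_grad()` decorator lines removed."""
    return _NO_GRAD_DECORATOR.subn("", source)
```

The returned count makes it easy to confirm how many decorators were removed before writing the modified text back to a copy of `modeling_deepseek.py`.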
  5*80g and 1.4T-1.6T memory is required
 
  ~~~python