Dr. Jorge Abreu Vicente committed
Commit · 561cc87
Parent(s): eb7f17a
Update README.md

README.md (CHANGED)
We provide in the repository an alternative version of the Python script so that any user can cross-check the validity of the model replicated in this repository.

The code below is a modification of the original [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py).

```python
import os
import json

import torch

# recursive_print is assumed to be provided by convert_biomegatron_checkpoint,
# as in the original Hugging Face conversion script.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert.
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of the converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)
```
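The conversion snippet uses `recursive_print` to inspect the converted weights. As an illustration of what that helper does (a minimal sketch, not the exact Hugging Face implementation), it walks the nested state dict and prints each tensor's dotted key path and shape:

```python
import torch

def recursive_print(name, val):
    # Walk a (possibly nested) state dict and print "dotted.key : shape"
    # for every tensor. Illustrative sketch; the helper in the original
    # conversion script formats its output differently.
    if isinstance(val, dict):
        for key, child in val.items():
            recursive_print(key if name is None else f"{name}.{key}", child)
    elif isinstance(val, torch.Tensor):
        print(f"{name} : {tuple(val.shape)}")
    else:
        print(f"{name} : {val!r}")

# A tiny nested dict shaped like a fragment of a Megatron checkpoint.
demo = {
    "model": {
        "language_model": {
            "embedding": {"word_embeddings": {"weight": torch.zeros(4, 8)}}
        }
    }
}
recursive_print(None, demo)  # prints: model.language_model.embedding.word_embeddings.weight : (4, 8)
```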

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m).

```python