# Checkpoint Upload

This model checkpoint was automatically uploaded from a distributed training run.
## Model Details

- Training step: 21
- Architecture: Llama-style model
- Hidden size: 2048
- Layers: 36
- Vocabulary size: 151,936
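
Since the card names a Llama-style architecture, these values can be mirrored in a Hugging Face `LlamaConfig` as a starting point. This is a minimal sketch, not the run's actual configuration: fields not listed on this card (attention heads, intermediate size, etc.) are left at `LlamaConfig` defaults and would need to be filled in from the real training setup.

```python
from transformers import LlamaConfig

# Hypothetical config mirroring the values listed above. Every field
# not shown here falls back to LlamaConfig defaults and must be set
# to match the actual training run before the weights will load.
config = LlamaConfig(
    hidden_size=2048,
    num_hidden_layers=36,
    vocab_size=151936,
)
```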
## Checkpoint Information

- Originally saved as a distributed checkpoint across 4 ranks
- Consolidated into a single checkpoint for easier use
- Contains model weights, optimizer states, and training configuration
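
This card does not record how the consolidation was performed, but PyTorch's distributed checkpoint (DCP) utilities offer one standard route. A minimal sketch, assuming the original shards were written with `torch.distributed.checkpoint` (the directory path here is a hypothetical placeholder):

```python
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Convert a sharded DCP checkpoint directory into a single
# torch.save()-style file. "checkpoint_dir/" is a placeholder for
# wherever the 4 rank shards were written.
dcp_to_torch_save("checkpoint_dir/", "pytorch_model.bin")
```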
## Usage

```python
import torch

# Load the consolidated checkpoint on CPU. (On PyTorch >= 2.6 you may
# need weights_only=False if the file stores non-tensor objects; only
# do that for checkpoints you trust.)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")

# Inspect the top-level keys: the file bundles model weights,
# optimizer states, and the training configuration.
print(list(checkpoint.keys()))

# You still need to initialize the matching model architecture and
# load these weights into it (see the note below).
```
## Note

This is a raw training checkpoint. For inference, you may need to:

1. Initialize the correct model architecture
2. Load the weights properly
3. Convert to the desired format (e.g., Hugging Face Transformers format)
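
Putting those steps together, here is a hedged sketch using `transformers`. It assumes the state-dict key names line up with the library's Llama implementation, which may not hold for a custom training stack; keys may need remapping first, and the config values beyond those listed on this card are placeholders.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Step 1: initialize an architecture matching the Model Details above.
# Unlisted fields fall back to LlamaConfig defaults (an assumption).
config = LlamaConfig(
    hidden_size=2048,
    num_hidden_layers=36,
    vocab_size=151936,
)
model = LlamaForCausalLM(config)

# Step 2: load the weights. The state dict may sit under a top-level
# key such as "model" (the exact layout is not documented here), and
# key names may need remapping to transformers' conventions, which is
# why strict=False is used in this sketch.
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)
model.load_state_dict(state_dict, strict=False)

# Step 3: save in Hugging Face Transformers format for easy reuse.
model.save_pretrained("converted_model/")
```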