derikk/training-checkpoint-step21

+# Checkpoint Upload
+This model checkpoint was automatically uploaded from a distributed training run.
+## Model Details
+- Training step: 21
+- Architecture: Llama-style model
+- Hidden size: 2048
+- Layers: 36
+- Vocabulary size: 151,936
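
For a rough sense of scale, the token embedding table implied by the figures above can be sized with simple arithmetic. This is only a sketch: the storage dtype is not stated here, so both byte counts are assumptions, and whether the output head shares (ties) these weights is unknown.

```python
# Values taken from the Model Details list.
vocab_size = 151_936
hidden_size = 2048

# One embedding row of hidden_size values per vocabulary entry.
embedding_params = vocab_size * hidden_size

# The storage dtype is not documented; both sizes below are assumptions.
bytes_bf16 = embedding_params * 2  # 2 bytes per value
bytes_fp32 = embedding_params * 4  # 4 bytes per value

print(f"{embedding_params:,} embedding parameters")
print(f"{bytes_bf16 / 2**20:.1f} MiB in bf16, {bytes_fp32 / 2**20:.1f} MiB in fp32")
```
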
+## Checkpoint Information
+- Originally saved as a distributed checkpoint across 4 ranks
+- Consolidated into a single checkpoint for easier use
+- Contains model weights, optimizer states, and training configuration
+## Usage
+```python
+import torch
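+
+# Load the consolidated checkpoint onto the CPU
+checkpoint = torch.load('pytorch_model.bin', map_location='cpu')
+
+# Inspect the top-level keys to locate the model state dict; the checkpoint
+# also holds optimizer states and training configuration
+print(list(checkpoint.keys()))
+
+# Initialize the matching model architecture, then load the weights into it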
+```
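
Because the checkpoint holds optimizer states and training configuration alongside the weights, the model state dict may sit under a top-level key rather than being the whole file. The helper below is a minimal sketch of how one might locate it. The key names in `CANDIDATE_KEYS` and both function names are hypothetical, since the actual layout of this checkpoint is not documented.

```python
from collections.abc import Mapping

# Hypothetical key names; the real layout of this checkpoint is not documented.
CANDIDATE_KEYS = ("model", "model_state_dict", "state_dict")


def looks_like_state_dict(obj):
    """Return True if obj maps string names to array-like values (anything with .shape)."""
    return (
        isinstance(obj, Mapping)
        and len(obj) > 0
        and all(isinstance(k, str) and hasattr(v, "shape") for k, v in obj.items())
    )


def extract_model_state_dict(checkpoint):
    """Find the model weights inside a loaded checkpoint object."""
    if looks_like_state_dict(checkpoint):
        return checkpoint  # the file is a bare state dict
    if isinstance(checkpoint, Mapping):
        for key in CANDIDATE_KEYS:
            if looks_like_state_dict(checkpoint.get(key)):
                return checkpoint[key]
    # Report what was found so the correct key can be identified by hand.
    found = list(checkpoint) if isinstance(checkpoint, Mapping) else type(checkpoint).__name__
    raise KeyError(f"no model state dict found; top-level content: {found}")
```

It would be called on the object returned by `torch.load`, for example `extract_model_state_dict(checkpoint)`. If it raises, the error message lists the top-level keys that are actually present.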
+## Note
+This is a raw training checkpoint. For inference, you may need to:
+1. Initialize the correct model architecture
+2. Load the model weights into that architecture (the optimizer states are not needed for inference)
+3. Convert to the desired format (e.g., Hugging Face Transformers format)
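
One common obstacle in the steps above is parameter names that carry wrapper prefixes from training: `torch.nn.parallel.DistributedDataParallel` adds `module.` and `torch.compile` adds `_orig_mod.`. Whether this checkpoint contains either prefix is an assumption. The sketch below, with the hypothetical helper `strip_wrapper_prefixes`, shows one way to normalize the names before loading.

```python
# Prefixes commonly added to parameter names by training wrappers.
# Their presence in this particular checkpoint is an assumption.
WRAPPER_PREFIXES = ("module.", "_orig_mod.")


def strip_wrapper_prefixes(state_dict, prefixes=WRAPPER_PREFIXES):
    """Return a new dict whose keys have any leading wrapper prefixes removed."""
    cleaned = {}
    for name, value in state_dict.items():
        stripped = True
        while stripped:  # prefixes can be nested, e.g. "module._orig_mod."
            stripped = False
            for prefix in prefixes:
                if name.startswith(prefix):
                    name = name[len(prefix):]
                    stripped = True
        if name in cleaned:
            raise ValueError(f"duplicate parameter name after stripping: {name}")
        cleaned[name] = value
    return cleaned
```

After cleaning, `model.load_state_dict(cleaned, strict=False)` returns the lists of missing and unexpected keys, which is a quick way to check that the names line up with the chosen architecture.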