# Mac M3 GRPO Model

Pure PyTorch implementation of GRPO (Group Relative Policy Optimization) for Mac M3.

## Model Details

- **Model Type**: GRPO (DreamerV3-inspired)
- **Framework**: PyTorch
- **Vocabulary Size**: 102
- **Embedding Dimension**: 64
- **Latent Dimension**: 32
- **Compatible With**: Mac M3, MPS acceleration
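
The dimensions above suggest the following tensor shapes. This is a minimal stand-in to illustrate the interfaces only; the real classes live in `examples.mac_m3_grpo`, and their actual architecture may differ:

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Hypothetical sketch: token embedding followed by a latent encoder."""
    def __init__(self, vocab_size=102, embed_dim=64, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # 102 x 64 lookup table
        self.encoder = nn.Linear(embed_dim, latent_dim)   # 64 -> 32 projection

    def forward(self, tokens):
        # tokens: (batch, seq) int64 ids -> (batch, seq, latent_dim) latents
        return self.encoder(self.embed(tokens))

tokens = torch.randint(0, 102, (1, 10))  # a batch of 10 token ids
latents = WorldModel()(tokens)
print(latents.shape)  # torch.Size([1, 10, 32])
```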

## Usage

```python
import torch
from examples.mac_m3_grpo import WorldModel, PolicyNetwork

# Initialize the model
world_model = WorldModel(
    vocab_size=102,
    embed_dim=64,
    latent_dim=32
)

# Load the weights (map_location keeps this portable across devices)
world_model.load_state_dict(torch.load("world_model.pt", map_location="cpu"))

# Create policy
policy = PolicyNetwork(world_model)

# Generate text
# [Your generation code here]
```
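
Since the model targets MPS acceleration on Apple silicon, a typical device-selection pattern (standard PyTorch, not specific to this repo) looks like:

```python
import torch

# Prefer the Apple-silicon MPS backend, fall back to CPU elsewhere
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Hypothetical example: move any nn.Module and its inputs to the chosen device
model = torch.nn.Linear(64, 32).to(device)
x = torch.randn(1, 64, device=device)
print(model(x).shape)  # torch.Size([1, 32])
```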

## Training Details

This model was trained using reinforcement learning with preference optimization, similar to the approach used in DreamerV3 but adapted for text generation.
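
As a rough illustration of the group-relative idea behind GRPO (not this repo's actual training loop): rewards for a group of sampled completions are normalized within the group, and the normalized scores serve as per-sample advantages:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize each sample's reward against its group's mean and std."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = torch.tensor([1.0, 3.0, 2.0, 2.0])  # rewards for 4 sampled completions
adv = group_relative_advantages(rewards)
print(adv.sum().item())  # ~0.0: advantages are centered within the group
```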