---
license: mit
tags:
- split-learning
- gpt2
- federated-learning
- lora
---
# DisLLM GPT-2 Split Learning - Client Model
This repository contains the **client-side model** (first 4 layers) for a split learning implementation of GPT-2 Small with LoRA fine-tuning.
## Model Details
- **Architecture**: GPT-2 Small (124M parameters)
- **Split Configuration**: First 4 layers out of 12 transformer blocks
- **LoRA Parameters**: 1,735,304 trainable parameters
- **Training Method**: Federated Split Learning
## Performance
- **Training PPL**: 30.64
- **Validation PPL**: 27.03
- **Test PPL**: 29.75
- **Training Epochs**: 5
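The PPL figures above are perplexities, i.e. the exponential of the mean cross-entropy loss per token. A quick sanity check of that relationship (the loss values here are derived from the reported PPLs, not taken from training logs):

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
def perplexity(mean_loss: float) -> float:
    return math.exp(mean_loss)

# A validation PPL of 27.03 corresponds to a mean loss of ln(27.03) ~ 3.297
print(round(math.log(27.03), 3))  # 3.297
print(round(perplexity(3.297), 1))  # 27.0
```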
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Download the client-model checkpoint from the Hub
model_path = hf_hub_download(repo_id="Chandij123/websockets", filename="client_model.pth")

# Load the checkpoint on CPU; move the model to GPU after loading if needed
checkpoint = torch.load(model_path, map_location="cpu")
model_state = checkpoint["model_state_dict"]
config = checkpoint["model_config"]

# Initialize and load your FirstPartModel here
# first_part_model.load_state_dict(model_state)
```
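The checkpoint does not ship the `FirstPartModel` class itself. A minimal structural sketch of what such a client module could look like, assuming GPT-2 Small dimensions (vocabulary 50,257, hidden size 768, 12 heads) and substituting a generic pre-norm encoder block for GPT-2's causal attention blocks — the state-dict keys of the real checkpoint will not match this sketch:

```python
import torch
import torch.nn as nn

# Hypothetical client-side model: token/position embeddings plus the first
# 4 transformer blocks. Dimensions follow GPT-2 Small; the blocks here are
# generic (non-causal) encoder layers used purely to illustrate the shape
# of the computation, not a drop-in replacement for the repo's model.
class FirstPartModel(nn.Module):
    def __init__(self, vocab_size=50257, n_embd=768, n_head=12, n_layer=4, n_ctx=1024):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)  # token embeddings
        self.wpe = nn.Embedding(n_ctx, n_embd)       # position embeddings
        block = nn.TransformerEncoderLayer(
            d_model=n_embd, nhead=n_head, dim_feedforward=4 * n_embd,
            batch_first=True, norm_first=True,       # GPT-2 uses pre-LayerNorm
        )
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layer)

    def forward(self, input_ids):
        pos = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.wte(input_ids) + self.wpe(pos)
        return self.blocks(h)  # hidden states handed off to the server

model = FirstPartModel()
hidden = model(torch.randint(0, 50257, (1, 16)))
print(hidden.shape)  # torch.Size([1, 16, 768])
```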
## Split Learning Architecture
This model works in conjunction with a server-side model that contains the remaining layers.
- **Client**: Processes input through first 4 layers
- **Server**: Continues processing through remaining 8 layers
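One training step of this split can be sketched as follows. The network transport is omitted (the repo name suggests WebSockets carry the activations and gradients), and the two halves are stand-in linear layers rather than the real GPT-2 partitions:

```python
import torch

# Minimal split-learning training step. `client` and `server` stand in for
# the 4-layer and 8-layer model halves; in the real system, steps 2 and 5
# would serialize tensors over the network instead of passing them locally.
client = torch.nn.Linear(8, 8)
server = torch.nn.Linear(8, 1)

x = torch.randn(4, 8)
smashed = client(x)                       # 1. client forward pass ("smashed" activations)
sent = smashed.detach().requires_grad_()  # 2. "transmit" activations to the server
loss = server(sent).pow(2).mean()         # 3. server forward pass + loss
loss.backward()                           # 4. server backward; gradient arrives at the cut
smashed.backward(sent.grad)               # 5. client backward using the returned gradient
print(client.weight.grad is not None)  # True
```

The `detach()` at the cut layer is what makes the split work: it severs the autograd graph so each party backpropagates only through its own half, exchanging just the boundary activations and their gradients.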
## Training Details
**Dataset**: WikiText-2
- **Training**: 2,359 examples
- **Validation**: 243 examples
- **Test**: 279 examples

**Training configuration**:
- **Batch size**: 2
- **Context length**: 1024
- **Learning rate**: 1e-6
- **Optimizer**: AdamW
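For reference, the LoRA adapters behind the trainable-parameter count above work by adding a trainable low-rank update to each frozen base weight. A self-contained sketch of one such layer — the rank and alpha here are illustrative, not the values used for this checkpoint:

```python
import torch
import torch.nn as nn

# Illustrative LoRA layer: output = W x + (alpha / r) * B A x, where the base
# weight W is frozen and only the low-rank factors A and B train. r=8 and
# alpha=16 are assumed values for the example.
class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero-init: no update at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288, i.e. 2 * r * 768
```

Because `B` starts at zero, the adapted layer is initially identical to the frozen base, and only the small `A`/`B` factors (plus any similarly wrapped layers) contribute to the trainable-parameter count.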
## License
MIT License