---
license: mit
tags:
- split-learning
- gpt2
- federated-learning
- lora
---

# DisLLM GPT-2 Split Learning - Client Model

This repository contains the **client-side model** (first 4 layers) for a split learning implementation of GPT-2 Small with LoRA fine-tuning.

## Model Details

- **Architecture**: GPT-2 Small (124M parameters)
- **Split Configuration**: first 4 of 12 transformer blocks
- **LoRA Parameters**: 1,735,304 trainable parameters
- **Training Method**: federated split learning

## Performance

- **Training PPL**: 30.64
- **Validation PPL**: 27.03
- **Test PPL**: 29.75
- **Training Epochs**: 5

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(repo_id="Chandij123/websockets", filename="client_model.pth")

# Load the checkpoint (map to CPU so loading works without a GPU)
checkpoint = torch.load(model_path, map_location="cpu")
model_state = checkpoint['model_state_dict']
config = checkpoint['model_config']

# Initialize and load your FirstPartModel here
# first_part_model.load_state_dict(model_state)
```

## Split Learning Architecture

This model works in conjunction with a server-side model that holds the remaining layers:

- **Client**: processes input through the first 4 layers
- **Server**: continues processing through the remaining 8 layers

## Training Details

Dataset: WikiText-2

- Training: 2,359 examples
- Validation: 243 examples
- Test: 279 examples

Training configuration:

- Batch size: 2
- Context length: 1024
- Learning rate: 1e-6
- Optimizer: AdamW

## License

MIT License
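## Appendix: Split Forward Pass Sketch

The client/server split described above can be illustrated with a minimal sketch. Plain Python functions stand in for transformer blocks (the real blocks, LoRA adapters, and network transport are not modeled); only the 4/8 layer partition mirrors this repository's configuration.

```python
# Minimal sketch of the split-learning forward pass. Stand-in "layers" are
# simple affine functions; the 4-layer client / 8-layer server partition
# mirrors the split configuration above.

def make_layers(n):
    # Each stand-in layer adds its own index to the running value.
    return [lambda x, i=i: x + i for i in range(n)]

ALL_LAYERS = make_layers(12)
CLIENT_LAYERS = ALL_LAYERS[:4]   # first 4 blocks run on the client
SERVER_LAYERS = ALL_LAYERS[4:]   # remaining 8 blocks run on the server

def client_forward(x):
    # Client processes the input through its layers and would then send
    # this intermediate ("smashed") activation to the server.
    for layer in CLIENT_LAYERS:
        x = layer(x)
    return x

def server_forward(activation):
    # Server continues from the received activation through its layers.
    for layer in SERVER_LAYERS:
        activation = layer(activation)
    return activation

# Running client then server matches running all 12 layers end to end.
full = 0.0
for layer in ALL_LAYERS:
    full = layer(full)
assert server_forward(client_forward(0.0)) == full
```

The key property the sketch checks is that the composed client-then-server pass is equivalent to the unsplit model; in the real system the client's output activation is what crosses the network boundary.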