---
license: mit
tags:
- split-learning
- gpt2
- federated-learning
- lora
---

# DisLLM GPT-2 Split Learning - Client Model

This repository contains the **client-side model** (the first 4 layers) for a split learning implementation of GPT-2 Small with LoRA fine-tuning.

## Model Details

- **Architecture**: GPT-2 Small (124M parameters)
- **Split Configuration**: First 4 layers out of 12 transformer blocks
- **LoRA Parameters**: 1,735,304 trainable parameters
- **Training Method**: Federated Split Learning
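
The exact adapter configuration behind the 1,735,304 trainable parameters (rank, alpha, target modules) is not stated on this card. As a minimal sketch only, here is how a LoRA setup for GPT-2 Small typically looks with the `peft` library; the hyperparameter values below are assumptions and will not necessarily reproduce the exact count above.

```python
from peft import LoraConfig, get_peft_model
from transformers import GPT2LMHeadModel

# Hypothetical LoRA configuration; the actual rank/alpha/target modules
# used for this checkpoint are not documented on this card.
base = GPT2LMHeadModel.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # prints the trainable-parameter count
```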

## Performance

- **Training PPL**: 30.64
- **Validation PPL**: 27.03
- **Test PPL**: 29.75
- **Training Epochs**: 5
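
Perplexity (PPL) is the exponential of the mean per-token cross-entropy loss, so the figures above map directly back to loss values. A quick illustration (the loss value below is made up to roughly match the validation PPL):

```python
import math

# Perplexity is exp(mean cross-entropy per token, in nats).
mean_loss = 3.30            # illustrative value, not a reported number
ppl = math.exp(mean_loss)
print(f"PPL = {ppl:.2f}")   # ~27.1, close to the validation PPL above
```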

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the client checkpoint from the Hub
model_path = hf_hub_download(repo_id="Chandij123/websockets", filename="client_model.pth")

# Load the checkpoint onto CPU; move tensors to GPU after loading if needed
checkpoint = torch.load(model_path, map_location="cpu")
model_state = checkpoint['model_state_dict']
config = checkpoint['model_config']

# Initialize your FirstPartModel (the first 4 GPT-2 blocks) and load the weights:
# first_part_model = FirstPartModel(config)
# first_part_model.load_state_dict(model_state)
```

## Split Learning Architecture

This model works in conjunction with a server-side model that contains the remaining layers; a minimal sketch of the hand-off follows the list.

- **Client**: processes input through the first 4 layers
- **Server**: continues processing through the remaining 8 layers
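
As a rough illustration, the sketch below splits a stock GPT-2 from `transformers` into a 4-block client stage and an 8-block server stage. The in-process "transfer" of the hidden states is an assumption for demonstration; this repository's actual `FirstPartModel`, server classes, and websocket transport are not shown here.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative split of a stock GPT-2 into a 4-block "client" and an
# 8-block "server"; not this repository's exact implementation.
full = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

ids = tokenizer("Split learning keeps raw text on the client.", return_tensors="pt")["input_ids"]

# --- Client side: token + position embeddings, then the first 4 blocks ---
positions = torch.arange(ids.size(1)).unsqueeze(0)
hidden = full.transformer.wte(ids) + full.transformer.wpe(positions)
for block in full.transformer.h[:4]:
    hidden = block(hidden)[0]

# `hidden` is the only tensor that would travel over the wire
# (e.g., via the websocket link this repo is named after).

# --- Server side: remaining 8 blocks, final layer norm, LM head ---
for block in full.transformer.h[4:]:
    hidden = block(hidden)[0]
hidden = full.transformer.ln_f(hidden)
logits = full.lm_head(hidden)
print(logits.shape)  # (1, seq_len, vocab_size)
```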

## Training Details

Dataset: WikiText-2

- Training: 2,359 examples
- Validation: 243 examples
- Test: 279 examples
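
The example counts above suggest custom preprocessing; the card does not specify how WikiText-2 was filtered or chunked. A minimal starting point, assuming the raw WikiText-2 configuration from the `datasets` library:

```python
from datasets import load_dataset

# Load WikiText-2. How the 2,359 / 243 / 279 examples above were derived
# (filtering, 1024-token chunking, etc.) is not specified on this card,
# so this only shows the raw splits.
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1")
print(wikitext)  # DatasetDict with 'train', 'validation', 'test' splits
```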

Training configuration:

- Batch size: 2
- Context length: 1024
- Learning rate: 1e-6
- Optimizer: AdamW
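
A minimal sketch of wiring these hyperparameters up, assuming only the trainable (LoRA) parameters of the client model are handed to AdamW; `client_model` is a placeholder, and settings not listed above (weight decay, scheduler, etc.) are unknown:

```python
import torch
import torch.nn as nn

# Hyperparameters reported above.
BATCH_SIZE = 2
CONTEXT_LENGTH = 1024
LEARNING_RATE = 1e-6

# Placeholder; in practice this is the loaded FirstPartModel with LoRA adapters.
client_model = nn.Linear(768, 768)

# Optimize only parameters that require gradients (the LoRA weights).
optimizer = torch.optim.AdamW(
    (p for p in client_model.parameters() if p.requires_grad),
    lr=LEARNING_RATE,
)
```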

## License

MIT License
|