Chandij123 committed on
Commit aa9d60f · verified · 1 Parent(s): 8282ffb

Upload README.md with huggingface_hub

  1. README.md +61 -0
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ tags:
+ - pytorch
+ - gpt2
+ - split-learning
+ - federated-learning
+ - lora
+ library_name: pytorch
+ ---
+
+ # DisLLM Split GPT-2 Model
+
+ This repository contains a split GPT-2 model trained using federated learning with LoRA fine-tuning.
+
+ ## Model Architecture
+
+ - **Base Model**: GPT-2 Small (124M parameters)
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+ - **Trainable Parameters**: ~1.7M
+ - **Split Configuration**:
+   - First Part (Client): 4 transformer blocks
+   - Second Part (Server): 8 transformer blocks
+
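For a sense of scale, a LoRA adapter on one linear layer adds `rank * (d_in + d_out)` parameters: a matrix A of shape (rank × d_in) and a matrix B of shape (d_out × rank). The sketch below applies that formula to GPT-2 Small's fused attention projection (768 → 2304). The rank and the choice of adapted layers are placeholders, because this README does not state the configuration behind the ~1.7M figure.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed, illustrative settings (not taken from this repository)
HIDDEN = 768    # GPT-2 Small hidden size
N_BLOCKS = 12   # 4 client blocks + 8 server blocks
RANK = 8        # hypothetical LoRA rank

# Adapting only the fused QKV projection (768 -> 3 * 768) in every block
per_block = lora_param_count(HIDDEN, 3 * HIDDEN, RANK)
total = N_BLOCKS * per_block
print(per_block, total)  # 24576 294912
```

Adapting more layers per block or raising the rank scales the total linearly, so the same helper can be used to check a known configuration against the reported count.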
+ ## Files
+
+ - `central_trained_first_part_20251210_162256.pth`: Client-side model (first 4 layers)
+ - `central_trained_second_part_20251210_162256.pth`: Server-side model (remaining 8 layers)
+
+ ## Training Details
+
+ - **Dataset**: WikiText-2
+ - **Training Method**: Federated Learning with Split Learning
+ - **Context Length**: 1024 tokens
+ - **Batch Size**: 2
+ - **Learning Rate**: 1e-6
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import GPT2Config
+
+ # Load the two model parts onto the CPU (move them to a GPU afterwards if needed)
+ first_part = torch.load('central_trained_first_part_20251210_162256.pth', map_location='cpu')
+ second_part = torch.load('central_trained_second_part_20251210_162256.pth', map_location='cpu')
+
+ # Use with the DisLLM architecture
+ # (Requires FirstPartModel and SecondPartModel class definitions)
+ ```
+
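The `FirstPartModel` and `SecondPartModel` classes are not included in this repository. As a rough illustration of how a 4 + 8 block split passes activations from client to server, here is a minimal PyTorch sketch. Everything in it is an assumption: the class names `FirstPartSketch` and `SecondPartSketch` are hypothetical, `nn.TransformerEncoderLayer` stands in for a GPT-2 block, and the tiny dimensions are chosen only so the example runs quickly. It is not the DisLLM implementation and will not load the checkpoints above.

```python
import torch
from torch import nn


def causal_mask(seq_len: int) -> torch.Tensor:
    """Additive mask: -inf above the diagonal hides future tokens."""
    return torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)


def make_block(d_model: int, n_head: int) -> nn.Module:
    # Stand-in for a GPT-2 block (pre-norm, GELU MLP); not the DisLLM block
    return nn.TransformerEncoderLayer(
        d_model, n_head, dim_feedforward=4 * d_model,
        activation="gelu", batch_first=True, norm_first=True,
    )


class FirstPartSketch(nn.Module):
    """Client side: token/position embeddings plus the first blocks."""

    def __init__(self, vocab_size, max_len, d_model, n_head, n_blocks):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, d_model)
        self.wpe = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList([make_block(d_model, n_head) for _ in range(n_blocks)])

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        positions = torch.arange(seq_len, device=input_ids.device)
        hidden = self.wte(input_ids) + self.wpe(positions)
        mask = causal_mask(seq_len).to(input_ids.device)
        for block in self.blocks:
            hidden = block(hidden, src_mask=mask)
        return hidden  # activations sent from the client to the server


class SecondPartSketch(nn.Module):
    """Server side: remaining blocks, final layer norm, and LM head."""

    def __init__(self, vocab_size, d_model, n_head, n_blocks):
        super().__init__()
        self.blocks = nn.ModuleList([make_block(d_model, n_head) for _ in range(n_blocks)])
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, hidden):
        mask = causal_mask(hidden.size(1)).to(hidden.device)
        for block in self.blocks:
            hidden = block(hidden, src_mask=mask)
        return self.lm_head(self.ln_f(hidden))


# Tiny dimensions for a quick run; GPT-2 Small uses d_model=768 and 12 heads
client = FirstPartSketch(vocab_size=100, max_len=32, d_model=16, n_head=2, n_blocks=4)
server = SecondPartSketch(vocab_size=100, d_model=16, n_head=2, n_blocks=8)
input_ids = torch.randint(0, 100, (2, 10))
logits = server(client(input_ids))
print(logits.shape)  # torch.Size([2, 10, 100])
```

In a split-learning setup, only the hidden activations (and their gradients during training) cross the client/server boundary, so the raw token IDs stay on the client.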
+ ## Performance
+
+ Training reduces perplexity from ~45 to ~30-35 across the train, validation, and test sets.
+
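To relate these numbers to a training loss: perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below assumes a natural-log cross-entropy loss, which is the usual PyTorch convention; this README does not say how its perplexity values were computed, and the helper name `perplexity` is illustrative.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Mean losses that correspond to the perplexities quoted above
for ppl in (45, 35, 30):
    print(f"perplexity {ppl} <-> mean loss {math.log(ppl):.2f}")
# perplexity 45 <-> mean loss 3.81
# perplexity 35 <-> mean loss 3.56
# perplexity 30 <-> mean loss 3.40
```

Under that assumption, the reported improvement corresponds to the mean loss falling from roughly 3.81 to between 3.40 and 3.56.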
+ ## Citation
+
+ If you use this model, please cite the original DisLLM work and the GPT-2 paper.
+
+ ## License
+
+ This model inherits the licenses of the GPT-2 model and the training code.