lemms committed on
Commit 74e80e3 · verified · 1 parent: d46268a

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +241 -23
README.md CHANGED
@@ -1,40 +1,258 @@
  ---
- title: OpenLLM Training Space
  emoji: 🚀
- colorFrom: blue
- colorTo: purple
  sdk: gradio
- sdk_version: 4.44.0
  app_file: app.py
  pinned: false
  license: gpl-3.0
  ---

- # OpenLLM Training Space

- This space provides training infrastructure for OpenLLM models.

- ## Features

- - 🎯 Model training pipeline
- - 📊 Training monitoring
- - 🔄 Model versioning
- - 📈 Performance tracking

- ## Usage

- 1. Upload your training data
- 2. Configure training parameters
- 3. Start training
- 4. Monitor progress
- 5. Download trained models

- ## Model Repositories

- - [openllm-small-extended-7k](https://huggingface.co/lemms/openllm-small-extended-7k)
- - [openllm-small-extended-8k](https://huggingface.co/lemms/openllm-small-extended-8k)
- - [openllm-training-data](https://huggingface.co/datasets/lemms/openllm-training-data)

- ## License

- GPL-3.0 - See [LICENSE](LICENSE) for details.
  ---
+ title: OpenLLM Live Training Space
  emoji: 🚀
+ colorFrom: green
+ colorTo: blue
  sdk: gradio
+ sdk_version: 4.44.1
  app_file: app.py
  pinned: false
  license: gpl-3.0
  ---

+ # 🚀 OpenLLM Live Training Space

+ ## 📚 What is This Space?

+ Welcome to the **OpenLLM Live Training Space**! This is an interactive web application where you can train new language models from existing checkpoints with customizable parameters. Think of it as a "training playground" where you can experiment with different training configurations in real-time.

+ ### 🎯 What Makes This Special?

+ Unlike most AI demos that only allow you to use pre-trained models, **this space lets you actually train new models** with your own settings:

+ - **Interactive Training**: Configure and start training sessions in real-time
+ - **Parameter Experimentation**: Try different learning rates, batch sizes, and optimization settings
+ - **Live Monitoring**: Watch training progress and metrics as they happen
+ - **Educational**: Learn how different parameters affect model training
+ - **No Setup Required**: Train models without installing anything locally

+ ## 🧠 Understanding Model Training

+ ### What is Model Training?

+ Model training is like teaching a student by showing them millions of examples. The model learns patterns from the data and gradually improves its ability to predict what comes next.

+ **Example Training Process:**
+ 1. **Input**: "The weather today is..."
+ 2. **Model Prediction**: "sunny" (might be wrong initially)
+ 3. **Correction**: "Actually, it's rainy"
+ 4. **Learning**: The model adjusts its "thinking" to do better next time
+ 5. **Repeat**: Millions of times until the model gets good at predictions
+
+ ### How Does Training Work?
+
+ #### The Training Loop
+ 1. **Forward Pass**: The model makes a prediction
+ 2. **Loss Calculation**: Measure how wrong the prediction was
+ 3. **Backward Pass**: Calculate how to adjust the model
+ 4. **Parameter Update**: Update model weights to improve
+ 5. **Repeat**: Continue until the model performs well
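The five steps above can be sketched in a few lines of plain Python. This is a toy one-parameter model with a hand-derived gradient, purely for illustration; it is not the space's actual PyTorch training code.

```python
# Toy training loop: fit w so that the prediction w * x matches y = 3 * x.
# Illustrates forward pass -> loss -> backward (gradient) -> update -> repeat.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = 0.0                # the model's single parameter, starts untrained
learning_rate = 0.05

for step in range(200):
    for x, y in data:
        pred = w * x                   # 1. forward pass
        loss = (pred - y) ** 2         # 2. loss: squared error
        grad = 2 * (pred - y) * x      # 3. backward pass: d(loss)/dw
        w -= learning_rate * grad      # 4. parameter update
                                       # 5. repeat for every example

print(round(w, 3))  # w converges toward 3.0
```

A real run replaces the hand-written gradient with automatic differentiation (`loss.backward()` in PyTorch) and the single parameter with millions of weights, but the loop shape is the same.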
+
+ #### Key Parameters
+ - **Learning Rate**: How big a step to take when learning (too big = overshooting, too small = slow learning)
+ - **Batch Size**: How many examples to process at once (affects memory usage and training speed)
+ - **Training Steps**: How long to train (more steps = potentially better performance)
+ - **Optimizer**: Algorithm for updating model weights (AdamW, Adam, SGD)
+
+ ## 🎮 How to Use This Space
+
+ ### Step-by-Step Guide
+
+ #### 1. Configure Training Parameters
+ - **Learning Rate**: Start with 3e-4 (0.0003) for most cases
+ - **Batch Size**: Choose based on your memory constraints (8-16 is usually good)
+ - **Training Steps**:
+   - 1000 steps = Quick experiment (10-30 minutes)
+   - 5000 steps = Medium training (1-3 hours)
+   - 10000 steps = Extended training (3-8 hours)
+
+ #### 2. Start Training
+ - Click the "🚀 Start Training" button
+ - Watch the status updates in real-time
+ - Monitor loss values and training progress
+
+ #### 3. Monitor Progress
+ - **Loss**: Should decrease over time (lower is better)
+ - **Learning Rate**: May change based on the scheduler
+ - **Steps**: Current progress through training
+
+ #### 4. Download Results
+ - Once training completes, download your trained model
+ - Use it for text generation or further fine-tuning
+
+ ### Training Scenarios
+
+ #### Quick Experiments (1000 steps)
+ - **Best for**: Testing different learning rates and configurations
+ - **Duration**: 10-30 minutes
+ - **Use case**: Hyperparameter exploration and rapid prototyping
+
+ #### Medium Training (5000 steps)
+ - **Best for**: Significant model improvement and fine-tuning
+ - **Duration**: 1-3 hours
+ - **Use case**: Model optimization and performance enhancement
+
+ #### Extended Training (10000 steps)
+ - **Best for**: Maximum performance improvement
+ - **Duration**: 3-8 hours
+ - **Use case**: Production model development and research
+
+ ## 📊 Understanding the Parameters
+
+ ### Learning Parameters
+ - **Learning Rate**: Controls how fast the model learns
+   - Too high: The model might overshoot and never converge
+   - Too low: Training takes forever
+   - Sweet spot: Usually between 1e-4 and 1e-3
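The overshoot-versus-too-slow trade-off is easy to see on a toy quadratic loss. The thresholds below are specific to this toy function, not to real models, so treat this only as an intuition pump:

```python
# Minimize loss(w) = w**2, whose gradient is 2*w, with plain gradient descent.
# Each update is w -= lr * 2*w, i.e. w is multiplied by (1 - 2*lr) per step.
def final_w(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(abs(final_w(0.1)))   # good rate: |w| shrinks toward the minimum at 0
print(abs(final_w(1.1)))   # too high: |w| grows every step (divergence)
print(abs(final_w(1e-4)))  # too low: after 50 steps w has barely moved
```

With `lr = 1.1` the multiplier is `(1 - 2.2) = -1.2`, so each step overshoots the minimum and lands farther away on the other side, which is exactly the "overshoot and never converge" failure mode.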
+
+ - **Batch Size**: Number of examples processed together
+   - Larger: More stable gradients, but uses more memory
+   - Smaller: Less memory, but potentially less stable training
+
+ ### Optimization Settings
+ - **Gradient Accumulation**: Simulates larger batch sizes with less memory
+ - **Optimizer**: Algorithm for updating weights
+   - AdamW: Usually the best choice for transformers
+   - Adam: Good general-purpose optimizer
+   - SGD: Simple but may need more tuning
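Why gradient accumulation "simulates" a larger batch: averaging the gradients of several micro-batches and taking one step gives the same update as one step on the combined batch. A toy sketch with a linear model and squared-error loss (the model, data, and sizes here are invented for the demonstration):

```python
# Toy model: prediction = w * x, squared-error loss, so d(loss)/dw = 2*(w*x - y)*x.
def grad(w, batch):
    """Mean gradient of the squared-error loss over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

examples = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w, lr, accum_steps = 0.5, 0.01, 2

# Accumulate over two micro-batches of 2, then take one optimizer step.
micro_batches = [examples[:2], examples[2:]]
accumulated = sum(grad(w, mb) for mb in micro_batches) / accum_steps
w_accum = w - lr * accumulated

# Equivalent single step on the full batch of 4 (needs 2x the memory at once).
w_full = w - lr * grad(w, examples)

print(abs(w_accum - w_full))  # ~0.0: the two updates match
```

Only one micro-batch has to be in memory at a time, which is the whole point on a memory-constrained GPU.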
+
+ - **Scheduler**: How the learning rate changes over time
+   - Cosine: Smooth decrease, often works well
+   - Linear: Straight-line decrease
+   - Constant: No change (rarely used)
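A cosine schedule (with the linear warmup discussed under Advanced Options) fits in a few lines. This is a generic sketch of the common formula; the step counts and peak rate are made-up illustration values, not the space's exact scheduler:

```python
import math

def lr_at(step, max_lr=3e-4, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:                      # warmup: ramp up linearly
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(0))     # 0.0   (start of warmup)
print(lr_at(100))   # 3e-4  (peak, end of warmup)
print(lr_at(1000))  # ~0.0  (fully decayed)
```

The cosine term starts at 1 (full learning rate) and ends at -1 (learning rate `min_lr`), giving the smooth decrease the bullet above describes.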
+
+ ### Advanced Options
+ - **Weight Decay**: Prevents overfitting by penalizing large weights
+ - **Gradient Clipping**: Prevents exploding gradients
+ - **Warmup Steps**: Gradually increase the learning rate at the start
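Gradient clipping is usually done by global L2 norm (the variant PyTorch exposes as `torch.nn.utils.clip_grad_norm_`). A standalone sketch of the idea, operating on a flat list of gradient values for simplicity:

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient values so their overall L2 norm
    is at most max_norm; gradients already within the limit pass through."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return grads
    scale = max_norm / total_norm
    return [g * scale for g in grads]

# An "exploding" gradient (norm 50) gets scaled down to norm 1;
# a normal gradient (norm 0.5) is left unchanged.
print(clip_by_global_norm([30.0, 40.0]))
print(clip_by_global_norm([0.3, 0.4]))
```

Because every component is scaled by the same factor, clipping limits the step size without changing the update's direction.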
+
+ ## 🎓 Educational Value
+
+ ### What You'll Learn
+
+ #### 1. Training Dynamics
+ - How loss decreases over time
+ - The relationship between learning rate and convergence
+ - When to stop training (avoiding overfitting)
+
+ #### 2. Hyperparameter Tuning
+ - How different parameters affect training
+ - The trade-offs between speed and quality
+ - Best practices for different scenarios
+
+ #### 3. Model Development
+ - The complete training workflow
+ - How to evaluate model performance
+ - When to use different training strategies
+
+ #### 4. Practical Skills
+ - Reading training logs and metrics
+ - Understanding model convergence
+ - Debugging training issues
+
+ ### Learning Path
+
+ #### Beginner Level
+ 1. Start with the default parameters
+ 2. Try different training step counts
+ 3. Observe how loss changes over time
+
+ #### Intermediate Level
+ 1. Experiment with different learning rates
+ 2. Try different optimizers and schedulers
+ 3. Understand the relationships between parameters
+
+ #### Advanced Level
+ 1. Fine-tune all parameters for optimal performance
+ 2. Understand the underlying training algorithms
+ 3. Apply these concepts to your own projects
+
+ ## 🔬 Research Applications
+
+ ### What Can You Do With This?
+
+ #### 1. Hyperparameter Research
+ - Study how different parameters affect training
+ - Find optimal configurations for specific tasks
+ - Understand parameter interactions
+
+ #### 2. Training Methodologies
+ - Compare different optimization strategies
+ - Study learning rate schedules
+ - Research training stability techniques
+
+ #### 3. Model Development
+ - Prototype new training approaches
+ - Test different architectures
+ - Develop custom training pipelines
+
+ #### 4. Educational Research
+ - Study how people learn about ML
+ - Develop better teaching methods
+ - Create interactive learning experiences
+
+ ## 🛠️ Technical Details
+
+ ### Base Model
+ This space uses the **lemms/openllm-small-extended-9k** model as the starting point; it is our best-performing model to date:
+ - **Architecture**: GPT-style transformer
+ - **Parameters**: ~35.8M
+ - **Training**: 9,000 steps on the SQuAD dataset
+ - **Performance**: ~5.2 loss, ~177 perplexity
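The two numbers above are directly related: perplexity is the exponential of the (natural-log) cross-entropy loss. Since the quoted loss is approximate, the perplexity is too:

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity = e^loss for a cross-entropy loss measured in nats."""
    return math.exp(cross_entropy_loss)

# A loss of roughly 5.18 nats corresponds to the ~177 perplexity quoted above.
print(round(perplexity(5.18), 1))  # -> 177.7
```

Intuitively, a perplexity of ~177 means the model is, on average, about as uncertain at each step as if it were choosing uniformly among 177 tokens; lower loss means lower perplexity and a more confident model.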
+
+ ### Training Infrastructure
+ - **Framework**: PyTorch with a custom training loop
+ - **Optimization**: AdamW optimizer with cosine scheduling
+ - **Memory Management**: Gradient checkpointing and accumulation
+ - **Monitoring**: Real-time loss and metric tracking
+
+ ### Limitations
+ - **Demo Mode**: This is a demonstration of training capabilities
+ - **Resource Constraints**: Limited GPU time per session
+ - **Model Size**: Currently supports small models only
+ - **Dataset**: Uses a pre-processed SQuAD dataset
+
+ ## 🔗 Related Resources
+
+ ### OpenLLM Project
+ - **[Model Demo Space](https://huggingface.co/spaces/lemms/llm)** - Test trained models
+ - **[GitHub Repository](https://github.com/louischua/osllm)** - Source code and documentation
+ - **[Training Documentation](../docs/TRAINING_IMPROVEMENTS.md)** - Detailed training guide
+
+ ### Learning Resources
+ - **PyTorch Tutorials**: Official PyTorch documentation
+ - **Transformer Papers**: "Attention Is All You Need" and follow-ups
+ - **Training Guides**: Hugging Face training tutorials
+
+ ### Community
+ - **GitHub Discussions**: Ask questions and share results
+ - **Discord/Slack**: Join our community chat
+ - **Twitter**: Follow for updates and announcements
+
+ ## 📞 Support and Contact
+
+ ### Getting Help
+ - **GitHub Issues**: For bugs and feature requests
+ - **Discussions**: For questions and general help
+ - **Email**: louischua@gmail.com for private matters
+
+ ### Contact Information
+ - **Author**: Louis Chua Bean Chong
+ - **GitHub**: https://github.com/louischua/openllm
+ - **Model Demo**: https://huggingface.co/spaces/lemms/llm
+
+ ## 📄 License
+
+ This space is part of the OpenLLM project and is available under the GPLv3 license for open-source use, with commercial licensing options available.
+
+ ---
+
+ ## 🎉 Start Training!
+
+ Ready to train your own language model? Configure your parameters and click "Start Training" to begin your AI learning journey!
+
+ **Remember**: This is a demonstration space. For production training, please refer to the full OpenLLM documentation and run training locally or on your own infrastructure.
+
+ ---
+
+ *This space is maintained by Louis Chua Bean Chong and the open-source community. Your feedback and contributions are welcome!*