Esha committed
Commit 0fa110d · 1 Parent(s): 4ea58ec

Update training pipeline, debug logic, and model outputs

Files changed (2):
  1. README.md +164 -9
  2. src/model/train.py +0 -13
README.md CHANGED
@@ -15,14 +15,169 @@ model-index:
 
 # Finetuning an Open-Source LLM
 
- This project adapts large language models to domain-specific tasks, leveraging parameter-efficient techniques (LoRA/QLoRA), cloud deployment, and workflow orchestration.
 
 ## Getting Started
- - Clone this repo
- - Install Python dependencies
- - See `demo_app/app.py` to launch the demo
-
- ## Structure
- - Models: Fine-tuned checkpoints
- - Demo App: Streamlit/Gradio interface
- - Configs: Training/deployment configs
 
 # Finetuning an Open-Source LLM
 
+ This project adapts large language models to domain-specific tasks, leveraging parameter-efficient techniques (LoRA/QLoRA), cloud deployment, and workflow orchestration. The repository contains code for fine-tuning LLMs on custom datasets, handling cloud orchestration, and uploading final models to the Hugging Face Hub.
+
+ ## Objective
+
+ Fine-tune large language models efficiently for domain-specific tasks and make them easy to deploy through cloud orchestration and an interactive demo application, using parameter-efficient fine-tuning (LoRA/QLoRA) and streamlined workflow automation.
 
 ## Getting Started
+
+ - Clone this repository
+ - Install Python dependencies
+ - See `demo_app/app.py` to launch the demo
+
+ ---
+
+ ## Project Structure
+
+ - `src/model/`: Core model training, evaluation, and upload scripts.
+ - `configs/train_config.yaml`: Configuration file for training hyperparameters and paths.
+ - `models/llm-finetuned/`: Output directory where trained model checkpoints and tokenizer files are saved.
+ - `upload_model.py`: Script to upload saved model files to the Hugging Face Hub.
+ - `src/eval/`: Evaluation scripts for the trained models.
+
+ ---
+
+ ## Setup Instructions
+
+ 1. Create and activate your Python environment:
+
+ ```bash
+ conda create -n llm-finetuning python=3.10 -y
+ conda activate llm-finetuning
+ ```
+
+ 2. Install the required dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Configure `configs/train_config.yaml` with your data paths and training parameters.
+
+ ---
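The commit does not show `configs/train_config.yaml` itself; as a hedged sketch for step 3, a config might look like the following, where every key name is an assumption to adapt to the actual training script:

```yaml
# Illustrative layout only; key names are assumptions, not the repo's schema.
base_model: "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint
dataset_path: "data/train.jsonl"        # assumed dataset location
output_dir: "models/llm-finetuned"      # matches the documented output folder
epochs: 3
learning_rate: 2.0e-4
lora:                                   # parameter-efficient fine-tuning settings
  r: 8
  alpha: 16
  dropout: 0.05
```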
+
+ ## Training
+
+ Run model training with:
+
+ ```bash
+ python src/model/train.py
+ ```
+
+ This will:
+
+ - Load your dataset from the configured path.
+ - Fine-tune the specified model.
+ - Save model checkpoints and tokenizer files in `models/llm-finetuned/`.
+
+ ---
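The three steps listed above could be sketched as a minimal training script; the config keys, dataset format, and model loading here are illustrative assumptions, not the repository's actual `train.py`:

```python
# Hedged sketch of the training flow. Config keys, model names, and dataset
# format are illustrative assumptions; adapt them to train_config.yaml.

def resolve_output_dir(config: dict) -> str:
    """Pick the checkpoint directory, defaulting to the documented folder."""
    return config.get("output_dir", "models/llm-finetuned")

def main():
    # Heavy imports are local so the helper above stays importable on its own.
    import yaml
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    with open("configs/train_config.yaml") as f:
        config = yaml.safe_load(f)

    output_dir = resolve_output_dir(config)
    tokenizer = AutoTokenizer.from_pretrained(config["base_model"])
    model = AutoModelForCausalLM.from_pretrained(config["base_model"])
    dataset = load_dataset("json", data_files=config["dataset_path"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir,
                               num_train_epochs=config.get("epochs", 3)),
        train_dataset=dataset["train"],
    )
    trainer.train()
    trainer.save_model(output_dir)         # model weights + config.json
    tokenizer.save_pretrained(output_dir)  # tokenizer files

# The real script would end with:
# if __name__ == "__main__":
#     main()
```

In practice the dataset would also need tokenization and a data collator before `Trainer` can consume it; this sketch only mirrors the documented load / fine-tune / save sequence.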
+
+ ## Uploading Model to Hugging Face
+
+ After training completes, upload your model files with:
+
+ ```bash
+ python upload_model.py
+ ```
+
+ Ensure your `upload_model.py` points to the correct local folder and Hugging Face repository:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ api = HfApi()
+ api.upload_folder(
+     repo_id="your-hf-username/your-model-repo",
+     folder_path="models/llm-finetuned",
+     path_in_repo="",
+     repo_type="model",
+ )
+ print("Upload completed")
+ ```
+
+ ---
+
+ ## Streamlit Demo Application
+
+ This project also includes a Streamlit app for easy demonstration and testing of the fine-tuned model.
+
+ ### Running the Streamlit App
+
+ 1. Install Streamlit if it is not already installed:
+
+ ```bash
+ pip install streamlit
+ ```
+
+ 2. Run the app:
+
+ ```bash
+ streamlit run src/demo_app/app.py
+ ```
+
+ 3. Open the local URL printed by Streamlit in your browser to interact with the model.
+
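A minimal `src/demo_app/app.py` compatible with the steps above might look like this sketch; the prompt template, model path, and widget layout are assumptions rather than the actual app:

```python
# Hypothetical minimal demo app; prompt format and model path are assumptions.

def build_prompt(user_input: str) -> str:
    """Wrap raw input in a simple instruction template (illustrative format)."""
    return f"### Instruction:\n{user_input}\n\n### Response:\n"

def render_app():
    # Imports are local so build_prompt stays testable without Streamlit installed.
    import streamlit as st
    from transformers import pipeline

    st.title("Fine-tuned LLM Demo")
    # Load the fine-tuned checkpoint saved by training (path per this README).
    generator = pipeline("text-generation", model="models/llm-finetuned")

    user_input = st.text_input("Enter a prompt")
    if st.button("Generate") and user_input:
        result = generator(build_prompt(user_input), max_new_tokens=128)
        st.write(result[0]["generated_text"])

# In the real app file, call render_app() at module top level so that
# `streamlit run src/demo_app/app.py` renders the page:
# render_app()
```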
+ ### Configuration
+
+ - Update API keys or model paths in the app configuration file if needed.
+ - Modify the app code to customize the UI or add features.
+
+ ---
+
+ ## Git Workflow for Updates
+
+ To commit and push changes safely when collaborating or syncing with a remote:
+
+ ```bash
+ git add .
+ git commit -m "Describe your changes"
+ git pull --rebase
+ ```
+
+ Resolve any conflicts, then:
+
+ ```bash
+ git push
+ ```
+
+ If a rebase gets stuck, clear the rebase state. In Git Bash or WSL:
+
+ ```bash
+ rm -rf .git/rebase-merge
+ ```
+
+ Or in PowerShell:
+
+ ```powershell
+ Remove-Item -Recurse -Force .git\rebase-merge
+ ```
+
+ ---
+
+ ## Troubleshooting
+
+ - If Git complains about `index.lock`, delete the lock file:
+
+ ```bash
+ rm -f .git/index.lock
+ ```
+
+ Or in PowerShell:
+
+ ```powershell
+ Remove-Item .git\index.lock
+ ```
+
+ - Always commit or stash changes before pulling:
+
+ ```bash
+ git add .
+ git commit -m "Save progress"
+ git pull --rebase
+ ```
+
+ ---
+
+ ## Future Scope
+
+ - Expand support for additional LLM architectures and datasets.
+ - Integrate advanced evaluation metrics and error-analysis tools.
+ - Develop fully featured web applications for user-friendly model interaction.
+ - Optimize cloud deployment pipelines for scalable inference.
+ - Implement AutoML capabilities for hyperparameter and architecture tuning.
+ - Add multilingual and multimodal fine-tuning workflows.
+
+ ---
+
+ ## Contact
+
+ For questions or issues, open an issue in this repository or reach out via email: [workwitheesha@gmail.com](mailto:workwitheesha@gmail.com)
+
+ ---
src/model/train.py CHANGED
@@ -64,19 +64,6 @@ def main():
     trainer.train()
     trainer.save_model(output_dir)  # Saves model files like pytorch_model.bin, config.json
     tokenizer.save_pretrained(output_dir)  # Saves tokenizer files like tokenizer_config.json, vocab files
-    print(f"Model and tokenizer saved in {output_dir}")
-
-    # Debug: Check if folder exists after saving
-    import os
-    if os.path.exists(output_dir):
-        print(f"Output directory exists: {output_dir}")
-    else:
-        print(f"Output directory DOES NOT exist: {output_dir}")
-
-    # Debug: List files in output_dir
-    saved_files = os.listdir(output_dir) if os.path.exists(output_dir) else []
-    print(f"Files saved in output directory: {saved_files}")
-
 
 if __name__ == "__main__":
     main()