FluegelQueen committed (verified)
Commit 85e8c5c · Parent(s): 54649eb
Update README.md

---
license: apache-2.0
---

# Welcome to my Computer Science Capstone Project!

This is the code for the training pipeline used during my multi-year Computer Science Capstone Project. It is a fine-tune of the most recent Command R model, trained with a custom Python training pipeline written from scratch.
My ultimate goal is to understand the process of training an LLM through the creation of an administrative assistant AI agent powered by my own custom model.

I started this project around the summer of my sophomore year in high school, when I was just getting around to studying the mechanics of LLMs. My school
offers a CS capstone class where you are allowed to work on a computer science related project of your choice for the year. This can be repeated in later years if taken
prior to senior year, in order to build a new project or continue a previous one.

# Technical Approach

- Multi-task Training: Curated custom dataset batches across various administrative capabilities such as tool calling, summarization, and RAG
- Iterative Fine-tuning: Progressive training runs with a small learning rate to prevent catastrophic forgetting (learned this the hard way after losing 20 credits)
- Knowledge Preservation: Mixed subsets of previous datasets into each new run
- Quantization: 8-bit loading via BitsAndBytes for efficient training on Google Colab L4 GPUs
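
The knowledge-preservation step can be sketched roughly as follows. This is a minimal illustration of mixing a replay sample of earlier datasets into each new run; the function and parameter names are my own, not taken from the actual pipeline:

```python
import random

def build_mixed_run(new_data, prior_datasets, replay_frac=0.25, seed=0):
    """Mix a replay sample of earlier datasets into a new training run.

    replay_frac: size of the replay sample relative to the new data,
    e.g. 0.25 adds roughly one old example per four new ones.
    """
    rng = random.Random(seed)
    # Pool every example from all previous runs, then sample from it.
    prior_pool = [ex for dataset in prior_datasets for ex in dataset]
    n_replay = min(int(len(new_data) * replay_frac), len(prior_pool))
    mixed = new_data + rng.sample(prior_pool, n_replay)
    rng.shuffle(mixed)  # interleave old and new examples
    return mixed
```

Keeping a fixed fraction of older data in every run is one common way to reduce the catastrophic forgetting mentioned above.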

# Some Challenges

- On the very first training run I forgot I was working with a dictionary and assigned the variables wrong, so the model was repeatedly trained on the literal strings "question" and "answer"
- Trying to train on long chain-of-thought data while heavily truncating the text resulted in barely coherent checkpoints
- CUDA dependencies were a struggle that cost a great many hours, nearly causing me to give up on quantization entirely
- Money management. I originally used expensive H100 GPUs from cloud providers before settling on Colab
- Finding tutorials. Since the subject is so new, I couldn't find many tutorials aimed at younger students. Unsloth notebooks ended up being very useful.
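
The dictionary bug from the first run can be reconstructed roughly like this; it is a hypothetical minimal reproduction, not the actual pipeline code:

```python
# Toy dataset in the {"question": ..., "answer": ...} shape.
examples = [{"question": "What is 2+2?", "answer": "4"}]

# Buggy version: unpacking a dict yields its KEYS, not its values,
# so every training text becomes the literal strings "question"/"answer".
buggy_texts = []
for ex in examples:
    q, a = ex  # q == "question", a == "answer"
    buggy_texts.append(f"{q}\n{a}")

# Fixed version: index the values explicitly.
fixed_texts = [f"{ex['question']}\n{ex['answer']}" for ex in examples]
```

Every example producing the identical two-word text explains why the resulting checkpoints were useless.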
41
+
42
+
43
+
44
+
45
+ # Model Rationale
46
+
47
+ -I was originally going to try Mistral Small 3 24B but it was too large and expensive
48
+
49
+ -Qwen models felt too stiff to me in testing despite recommendation
50
+
51
+ -Cohere models are advertised as good at tool calling and seemed good in practice
52
+
53
+ -I emailed Cohere to see if they were okay with me using this for things that could theoretically help me make money with it and they said I was fine
54
+
55
+ -This is still a research project first and foremost, so non commercial use wasn't really a dealbreaker for me.

# Current Goal

- My current goal this senior year is phase 2 of the project: building a custom agent on the smolagents framework for the model to use in day-to-day life