---
license: apache-2.0
---
# Welcome to my Computer Science Capstone Project!
This is the code for the training pipeline used during my multi-year Computer Science Capstone Project. It is a finetune of the most recent Command R model, trained with a custom Python training pipeline written from scratch.
My ultimate goal is to understand the process of training an LLM through the creation of an administrative assistant AI agent powered by my own custom model.
I started this project around the summer of my sophomore year in high school, when I was just getting around to studying the mechanics of LLMs. My school offers a CS capstone class where you work on a computer science project of your choice for the year. If taken prior to senior year, it can be repeated in later years to build a new project or continue a previous one.
# Technical Approach:
- Multi-task Training: Curated custom dataset batches across various administrative capabilities such as tool calling, summarization, and RAG
- Iterative Fine-tuning: Progressive training runs with a small learning rate to prevent catastrophic forgetting (learned this the hard way after losing 20 credits)
- Knowledge Preservation: Mixed subsets of previous datasets into each new run (replay mixing is sketched after this list)
- Quantization: 8-bit loading via BitsAndBytes for efficient training on Google Colab L4 GPUs (also sketched below, together with the small-learning-rate setup)
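Since 8-bit base weights are frozen during training, a bitsandbytes load is usually paired with LoRA adapters via peft; the sketch below shows that setup together with the small learning rate used for the iterative runs. The model ID, LoRA settings, and hyperparameters here are illustrative assumptions, not the exact values from my pipeline.

```python
# Minimal sketch: 8-bit loading + LoRA + a small learning rate
# (model ID and hyperparameters are illustrative, not my exact config)
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "CohereForAI/c4ai-command-r-v01"  # assumed checkpoint; substitute the actual base

# Load the base weights in 8-bit via bitsandbytes to cut GPU memory
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit weights are frozen, so gradients flow through LoRA adapters instead
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
))

# Deliberately small learning rate so each iterative run nudges the model
# gently, reducing the risk of catastrophic forgetting between phases
training_args = TrainingArguments(
    output_dir="checkpoints",
    learning_rate=5e-5,                    # illustrative; tune per run
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
```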
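The knowledge-preservation step can be sketched the same way: concatenate each new phase's data with a shuffled subset of a previous phase before training. The file names and the 10% replay ratio below are placeholders, not my actual splits.

```python
# Sketch of replay mixing with Hugging Face datasets
# (file names and the 10% ratio are placeholders)
from datasets import concatenate_datasets, load_dataset

new_data = load_dataset("json", data_files="phase2_tool_calling.json", split="train")
old_data = load_dataset("json", data_files="phase1_summarization.json", split="train")

# Rehearse ~10% of the earlier phase so previously learned skills survive
replay_size = min(len(old_data), max(1, int(0.10 * len(new_data))))
replay = old_data.shuffle(seed=42).select(range(replay_size))

train_data = concatenate_datasets([new_data, replay]).shuffle(seed=42)
```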
# Some Challenges:
- On the very first training run, I forgot I was working with a dictionary and assigned the variables wrong, so every model was trained on the literal strings "question" and "answer" repeatedly (a reconstruction follows this list)
- Trying to train on long chain-of-thought data while heavily truncating the text resulted in barely coherent checkpoints
- CUDA dependencies were a struggle that cost a great many hours and nearly made me give up on quantization entirely
- Money management: I originally used expensive H100 GPUs from cloud providers before settling on Colab
- Finding tutorials: since the subject is so new, I couldn't find many tutorials aimed at younger students. Unsloth notebooks ended up being very useful.
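The original code for that first bug is gone, but the failure mode was roughly the following (a hypothetical reconstruction, not the actual pipeline code):

```python
# Hypothetical reconstruction of the first-run dictionary bug
example = {"question": "What is attention?", "answer": "A weighted lookup over tokens."}

# Buggy: assigns the key names themselves, not the values behind them,
# so every training sample collapses to the literal text 'question\nanswer'
q = "question"          # meant: example["question"]
a = "answer"            # meant: example["answer"]
text = f"{q}\n{a}"

# Fixed: index into the dictionary to get the actual values
q = example["question"]
a = example["answer"]
text = f"{q}\n{a}"
```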
# Model Rationale
- I originally planned to try Mistral Small 3 24B, but it was too large and expensive
- Qwen models felt too stiff in my testing, despite recommendations
- Cohere models are advertised as good at tool calling, and they seemed good in practice
- I emailed Cohere to ask whether they were okay with me using the model for things that could theoretically help me make money, and they said it was fine
- This is still a research project first and foremost, so non-commercial use wasn't really a dealbreaker for me.
# Current Goal?
- My current goal, this senior year, is phase 2 of the project: a custom agent built on the smolagents framework for the model to use in day-to-day life (a minimal sketch follows)
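As a preview of phase 2, here is a minimal smolagents sketch of the kind of agent I have in mind. The tool, model path, and prompt are placeholders, and the API calls reflect smolagents as I currently understand it rather than finished project code.

```python
# Minimal smolagents sketch (tool, model path, and prompt are placeholders)
from smolagents import CodeAgent, TransformersModel, tool

@tool
def get_todo_list() -> str:
    """Return the user's current to-do list as plain text."""
    return "1. Finish capstone write-up\n2. Email advisor about phase 2"

# Point the agent at the finetuned checkpoint from phase 1
model = TransformersModel(model_id="path/to/finetuned-command-r")
agent = CodeAgent(tools=[get_todo_list], model=model)

agent.run("Look at my to-do list and suggest what to tackle first.")
```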