BASE_CORPUS = """
System: You are a small local GPT style assistant.
User: hello
Assistant: Hello. I am a tiny GPT style language model built from scratch in Python.
User: what are you
Assistant: I am a local small GPT style model trained on a compact text corpus.
User: explain machine learning
Assistant: Machine learning builds patterns from data so models can make predictions.
User: explain deep learning
Assistant: Deep learning uses neural networks with multiple layers to learn useful representations.
User: explain natural language processing
Assistant: Natural language processing helps computers understand and generate human language.
User: what is regression
Assistant: Regression predicts a continuous value such as price, score, demand, or temperature.
User: what is classification
Assistant: Classification predicts a category or label for an example.
User: what is clustering
Assistant: Clustering groups similar data points without labeled targets.
User: what is preprocessing
Assistant: Preprocessing cleans and transforms data before modeling.
User: what is model evaluation
Assistant: Model evaluation measures how well a model performs on unseen data.
User: what is Python
Assistant: Python is a programming language known for readable code and strong libraries.
User: what is attention
Assistant: Attention lets a model focus on useful earlier tokens when predicting the next token.
User: what is a transformer
Assistant: A transformer is a neural architecture that uses attention and layered processing.
User: what is a tokenizer
Assistant: A tokenizer converts text into smaller units that a model can process.
User: what is overfitting
Assistant: Overfitting happens when a model memorizes training data and generalizes poorly.
User: what is underfitting
Assistant: Underfitting happens when a model is too simple to capture important patterns.
User: write a short greeting
Assistant: Hello and welcome. I am ready to help with a short answer.
User: write a short poem
Assistant: Quiet rain on window light, patient thoughts arrive at night.
User: give a coding tip
Assistant: Build small pieces, test often, and keep your functions clear.
User: how should I debug code
Assistant: Reproduce the problem, isolate the failing step, inspect values, and test one change at a time.
User: summarize good study habits
Assistant: Good study habits use planning, active recall, spaced review, and steady practice.
User: what is data science
Assistant: Data science combines programming, statistics, and domain knowledge to learn from data.
User: what is a dataset
Assistant: A dataset is a collection of examples or records used for analysis or training.
User: what is a feature
Assistant: A feature is an input variable used by a model.
User: what is a target
Assistant: A target is the value or label a model tries to predict.
User: what is local ai
Assistant: Local AI runs on your own machine so you control the code, files, and execution.
""".strip()


def build_training_text(extra_text: str = "") -> str:
    """Return BASE_CORPUS, optionally followed by whitespace-normalized extra text."""
    # Collapse every run of whitespace (including newlines) in the extra text
    # to a single space; `or ""` guards against a None argument.
    extra = " ".join((extra_text or "").split())
    if not extra:
        return BASE_CORPUS
    # Separate the extra text from the base corpus with a blank line.
    return BASE_CORPUS + "\n\n" + extra