BASE_CORPUS = """
System: You are a small local GPT style assistant.
User: hello
Assistant: Hello. I am a tiny GPT style language model built from scratch in Python.
User: what are you
Assistant: I am a local small GPT style model trained on a compact text corpus.
User: explain machine learning
Assistant: Machine learning builds patterns from data so models can make predictions.
User: explain deep learning
Assistant: Deep learning uses neural networks with multiple layers to learn useful representations.
User: explain natural language processing
Assistant: Natural language processing helps computers understand and generate human language.
User: what is regression
Assistant: Regression predicts a continuous value such as price, score, demand, or temperature.
User: what is classification
Assistant: Classification predicts a category or label for an example.
User: what is clustering
Assistant: Clustering groups similar data points without labeled targets.
User: what is preprocessing
Assistant: Preprocessing cleans and transforms data before modeling.
User: what is model evaluation
Assistant: Model evaluation measures how well a model performs on unseen data.
User: what is Python
Assistant: Python is a programming language known for readable code and strong libraries.
User: what is attention
Assistant: Attention lets a model focus on useful earlier tokens when predicting the next token.
User: what is a transformer
Assistant: A transformer is a neural architecture that uses attention and layered processing.
User: what is a tokenizer
Assistant: A tokenizer converts text into smaller units that a model can process.
User: what is overfitting
Assistant: Overfitting happens when a model memorizes training data and generalizes poorly.
User: what is underfitting
Assistant: Underfitting happens when a model is too simple to capture important patterns.
User: write a short greeting
Assistant: Hello and welcome. I am ready to help with a short answer.
User: write a short poem
Assistant: Quiet rain on window light, patient thoughts arrive at night.
User: give a coding tip
Assistant: Build small pieces, test often, and keep your functions clear.
User: how should I debug code
Assistant: Reproduce the problem, isolate the failing step, inspect values, and test one change at a time.
User: summarize good study habits
Assistant: Good study habits use planning, active recall, spaced review, and steady practice.
User: what is data science
Assistant: Data science combines programming, statistics, and domain knowledge to learn from data.
User: what is a dataset
Assistant: A dataset is a collection of examples or records used for analysis or training.
User: what is a feature
Assistant: A feature is an input variable used by a model.
User: what is a target
Assistant: A target is the value or label a model tries to predict.
User: what is local ai
Assistant: Local AI runs on your own machine so you control the code, files, and execution.
""".strip()
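def build_training_text(extra_text: str = "") -> str:
    """Return BASE_CORPUS, optionally followed by whitespace-normalized extra text."""
    # Collapse every run of whitespace (including newlines) into a single space.
    extra = " ".join((extra_text or "").split())
    if not extra:
        return BASE_CORPUS
    return BASE_CORPUS + "\n\n" + extra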
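

# A minimal, hypothetical sketch (not part of the original Space): one way the
# "User:" / "Assistant:" lines of a corpus in the format above could be split
# into (prompt, reply) pairs, e.g. to inspect what the model is trained on.
# The name `extract_dialogue_pairs` and its behavior are assumptions made for
# illustration only; lines with any other prefix (such as "System:") are skipped.
def extract_dialogue_pairs(corpus: str) -> list[tuple[str, str]]:
    pairs = []
    pending_prompt = None  # most recent "User:" text still waiting for a reply
    for raw_line in corpus.splitlines():
        line = raw_line.strip()
        if line.startswith("User:"):
            pending_prompt = line[len("User:"):].strip()
        elif line.startswith("Assistant:") and pending_prompt is not None:
            pairs.append((pending_prompt, line[len("Assistant:"):].strip()))
            pending_prompt = None
    return pairs


# Possible usage, assuming the definitions above are in scope:
#     pairs = extract_dialogue_pairs(build_training_text())
#     prompt, reply = pairs[0]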