BASE_CORPUS = """
System: You are a tiny local assistant.
User: hello
Assistant: Hello. I am a small local language model built from scratch in Python.
User: what can you do
Assistant: I can generate short text, continue patterns, and answer simple prompts based on my tiny training data.
User: explain machine learning
Assistant: Machine learning is a way to build models from data so they can make predictions or find patterns.
User: explain deep learning
Assistant: Deep learning uses neural networks with many layers to learn useful representations from data.
User: explain natural language processing
Assistant: Natural language processing studies how computers read, generate, and understand human language.
User: what is regression
Assistant: Regression predicts a continuous value, such as price, score, temperature, or demand.
User: what is classification
Assistant: Classification predicts a label or category, such as spam or not spam.
User: what is clustering
Assistant: Clustering groups similar data points without needing labeled targets.
User: what is preprocessing
Assistant: Preprocessing cleans and transforms raw data before modeling.
User: what is model evaluation
Assistant: Model evaluation checks how well a model performs on data it did not train on.
User: write a short greeting
Assistant: Hello and welcome. I am ready to help with a short answer.
User: write a short poem
Assistant: Soft morning light on quiet ground, small thoughts wake without a sound.
User: what is Python
Assistant: Python is a popular programming language known for readable syntax and strong libraries.
User: what is a neural network
Assistant: A neural network is a model made of layers that transform input data into useful predictions.
User: how do I learn coding
Assistant: Start small, practice every day, build mini projects, and review your mistakes carefully.
User: summarize good study habits
Assistant: Good study habits include clear goals, spaced practice, breaks, active recall, and consistent review.
User: give motivation
Assistant: Progress grows from small repeated effort, not from waiting for the perfect moment.
User: what is overfitting
Assistant: Overfitting happens when a model memorizes training data and performs poorly on new data.
User: what is underfitting
Assistant: Underfitting happens when a model is too simple to capture important patterns.
User: what is a dataset
Assistant: A dataset is a collection of examples or records used for analysis or modeling.
User: what is a feature
Assistant: A feature is an input variable used by a model to make a prediction.
User: what is a target
Assistant: A target is the value or label the model tries to predict.
User: give a coding tip
Assistant: Keep functions small, name things clearly, and test simple cases first.
User: tell me about local models
Assistant: Local models run on your own machine, giving you direct control over code, files, and training.
User: how should I debug code
Assistant: Reproduce the issue, isolate the failing step, inspect inputs and outputs, and test one fix at a time.
User: what is attention
Assistant: Attention lets a model focus on the most relevant parts of earlier tokens when predicting the next token.
User: what is a tokenizer
Assistant: A tokenizer turns text into smaller pieces so a model can process language as numbers.
User: what is a transformer
Assistant: A transformer is a neural architecture that uses attention to model relationships between tokens.
User: write a short answer about data science
Assistant: Data science combines statistics, programming, and domain knowledge to learn from data.
""".strip()


def build_training_text(extra_text: str = "") -> str:
    """Return BASE_CORPUS, optionally followed by whitespace-normalized extra text."""
    # Collapse every run of whitespace (spaces, tabs, newlines) into a single space.
    extra = " ".join((extra_text or "").split())
    if not extra:
        return BASE_CORPUS
    # Separate the extra text from the base corpus with one blank line.
    return BASE_CORPUS + "\n\n" + extra
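The sketch below illustrates how `build_training_text` treats its input. It is an illustration under stated assumptions, not part of the original file: `BASE_CORPUS` is replaced by a three-line stand-in with the same layout, the function body is repeated so the block runs on its own, and `split_turns` is a hypothetical helper introduced only to show the "Role: content" line structure of the corpus.

```python
from __future__ import annotations

# Shortened stand-in for the full BASE_CORPUS (assumption: same line layout).
BASE_CORPUS = """
System: You are a tiny local assistant.
User: hello
Assistant: Hello. I am a small local language model built from scratch in Python.
""".strip()


def build_training_text(extra_text: str = "") -> str:
    # Same logic as the original function, repeated so the sketch is self-contained.
    extra = " ".join((extra_text or "").split())
    if not extra:
        return BASE_CORPUS
    return BASE_CORPUS + "\n\n" + extra


def split_turns(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split "Role: content" lines into (role, content) pairs."""
    turns = []
    for line in text.splitlines():
        role, sep, content = line.partition(": ")
        # Keep only lines that start with one of the three role labels.
        if sep and role in {"System", "User", "Assistant"}:
            turns.append((role, content))
    return turns


# Whitespace-only input is treated as empty, so the corpus comes back unchanged.
unchanged = build_training_text("   \n\t ")

# Runs of whitespace, including newlines, collapse to single spaces, so
# multi-line extra text is flattened onto one line after a blank line.
combined = build_training_text("User:  what is\n a  GPU\nAssistant: A processor.")

turns = split_turns(BASE_CORPUS)
print(combined.splitlines()[-1])
print(turns[0])
```

One consequence worth noting: because newlines in `extra_text` are collapsed, any extra `User:`/`Assistant:` turns end up on a single line rather than one turn per line as in the base corpus. If line-per-turn structure matters for training, the extra text would need different handling than this function provides.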