"""
TERA V2 — Clean Q/A Bootstrap Data
Repeated patterns for strong learning signal.
"""
QA_PAIRS = [
# ---- Greetings ----
("Q: Hello\nA: Hello! How can I help you today?", 8),
("Q: Hi\nA: Hi there! What can I do for you?", 8),
("Q: Hey\nA: Hey! How are you doing?", 6),
("Q: Good morning\nA: Good morning! Hope you have a great day.", 6),
("Q: Good night\nA: Good night! Sleep well.", 6),
("Q: How are you?\nA: I am doing well, thank you for asking!", 8),
("Q: What is your name?\nA: My name is Tera. I am an AI assistant.", 8),
("Q: Who are you?\nA: I am Tera, an AI language model built from scratch.", 8),
("Q: Who made you?\nA: I was created by the Vedaco team.", 6),
("Q: Thank you\nA: You are welcome! Happy to help.", 6),
("Q: Thanks\nA: No problem! Let me know if you need anything else.", 6),
("Q: Bye\nA: Goodbye! Have a wonderful day.", 6),
("Q: See you later\nA: See you! Take care.", 4),
("Q: What can you do?\nA: I can answer questions, explain topics, and have conversations.", 6),
# ---- Science ----
("Q: What is the sun?\nA: The sun is a star at the center of our solar system.", 6),
("Q: What is the moon?\nA: The moon is a natural satellite that orbits Earth.", 6),
("Q: What is water?\nA: Water is a liquid made of hydrogen and oxygen.", 6),
("Q: What is air?\nA: Air is a mixture of gases that surrounds Earth.", 4),
("Q: What is fire?\nA: Fire is a chemical reaction that produces heat and light.", 4),
("Q: What is gravity?\nA: Gravity is a force that pulls objects toward each other.", 6),
("Q: What is light?\nA: Light is a form of energy that allows us to see.", 4),
("Q: What is sound?\nA: Sound is a vibration that travels through air or other materials.", 4),
("Q: What is energy?\nA: Energy is the ability to do work or cause change.", 4),
("Q: What is a planet?\nA: A planet is a large body that orbits a star.", 4),
("Q: What is Earth?\nA: Earth is the third planet from the sun and our home.", 6),
("Q: What is Mars?\nA: Mars is the fourth planet from the sun, known as the red planet.", 4),
("Q: What is a star?\nA: A star is a ball of hot gas that produces light and heat.", 4),
("Q: What is the sky?\nA: The sky is the area above Earth where we see clouds and stars.", 4),
("Q: What is rain?\nA: Rain is water that falls from clouds to the ground.", 4),
("Q: What is a cloud?\nA: A cloud is a collection of tiny water drops floating in the sky.", 4),
("Q: What is snow?\nA: Snow is frozen water that falls from clouds in cold weather.", 4),
("Q: What is ice?\nA: Ice is water in its solid frozen form.", 4),
("Q: What is oxygen?\nA: Oxygen is a gas that living things need to breathe.", 4),
("Q: What is a tree?\nA: A tree is a tall plant with a trunk, branches, and leaves.", 4),
# ---- Animals ----
("Q: What is a dog?\nA: A dog is a friendly animal often kept as a pet.", 4),
("Q: What is a cat?\nA: A cat is a small animal known for being independent and playful.", 4),
("Q: What is a fish?\nA: A fish is an animal that lives in water and breathes through gills.", 4),
("Q: What is a bird?\nA: A bird is an animal with feathers and wings that can often fly.", 4),
("Q: What is an elephant?\nA: An elephant is a very large animal with a long trunk.", 4),
("Q: What is a lion?\nA: A lion is a large wild cat known as the king of animals.", 4),
# ---- Math basics ----
("Q: What is one plus one?\nA: One plus one is two.", 6),
("Q: What is two plus two?\nA: Two plus two is four.", 6),
("Q: What is three plus three?\nA: Three plus three is six.", 4),
("Q: What is five plus five?\nA: Five plus five is ten.", 4),
("Q: What is ten minus five?\nA: Ten minus five is five.", 4),
("Q: What is two times three?\nA: Two times three is six.", 4),
("Q: What is ten divided by two?\nA: Ten divided by two is five.", 4),
# ---- Geography ----
("Q: What is a country?\nA: A country is an area of land with its own government.", 4),
("Q: What is an ocean?\nA: An ocean is a very large body of salt water.", 4),
("Q: What is a mountain?\nA: A mountain is a very tall landform that rises high above the ground.", 4),
("Q: What is a river?\nA: A river is a large stream of water that flows across land.", 4),
("Q: What is a desert?\nA: A desert is a dry area that gets very little rain.", 4),
("Q: What is a forest?\nA: A forest is a large area covered with many trees.", 4),
("Q: What is a city?\nA: A city is a large place where many people live and work.", 4),
# ---- Technology ----
("Q: What is a computer?\nA: A computer is a machine that processes information and runs programs.", 6),
("Q: What is the internet?\nA: The internet is a global network that connects computers around the world.", 6),
("Q: What is a phone?\nA: A phone is a device used to make calls and send messages.", 4),
("Q: What is a robot?\nA: A robot is a machine that can perform tasks automatically.", 4),
("Q: What is AI?\nA: AI stands for artificial intelligence, which is technology that can learn and think.", 6),
("Q: What is software?\nA: Software is a set of instructions that tells a computer what to do.", 4),
("Q: What is a website?\nA: A website is a collection of pages on the internet.", 4),
("Q: What is coding?\nA: Coding is writing instructions for a computer using a programming language.", 4),
# ---- Language ----
("Q: What is a word?\nA: A word is a unit of language that has meaning.", 4),
("Q: What is a sentence?\nA: A sentence is a group of words that expresses a complete thought.", 4),
("Q: What is a language?\nA: A language is a system of words and rules used for communication.", 4),
("Q: What is English?\nA: English is a language spoken by many people around the world.", 4),
("Q: What is reading?\nA: Reading is the process of looking at words and understanding their meaning.", 4),
("Q: What is writing?\nA: Writing is putting thoughts into words on paper or a screen.", 4),
# ---- Food ----
("Q: What is food?\nA: Food is what living things eat to get energy and stay alive.", 4),
("Q: What is a fruit?\nA: A fruit is a sweet food that grows on trees or plants.", 4),
("Q: What is bread?\nA: Bread is a food made from flour, water, and yeast.", 4),
("Q: What is milk?\nA: Milk is a white liquid produced by cows and other animals.", 4),
("Q: What is rice?\nA: Rice is a grain that is eaten as a main food in many countries.", 4),
# ---- How / Why questions ----
("Q: Why is the sky blue?\nA: The sky looks blue because sunlight is scattered by the atmosphere.", 6),
("Q: Why do we sleep?\nA: We sleep so our body and brain can rest and recover.", 6),
("Q: Why do birds fly?\nA: Birds fly to find food, escape danger, and travel to new places.", 4),
("Q: How do fish breathe?\nA: Fish breathe by taking oxygen from water through their gills.", 4),
("Q: How does rain form?\nA: Rain forms when water drops in clouds grow too heavy to stay in the air and fall down.", 4),
("Q: Why do we eat?\nA: We eat to give our body the energy it needs to work.", 4),
("Q: Why do plants need sunlight?\nA: Plants need sunlight to make food through a process called photosynthesis.", 4),
("Q: How do we hear sound?\nA: We hear sound when vibrations travel through air and reach our ears.", 4),
# ---- People / Body ----
("Q: What is a brain?\nA: The brain is an organ that controls thinking and body functions.", 4),
("Q: What is a heart?\nA: The heart is an organ that pumps blood through the body.", 4),
("Q: What are eyes?\nA: Eyes are organs that allow us to see the world around us.", 4),
("Q: What is skin?\nA: Skin is the outer covering of the body that protects us.", 4),
("Q: What are bones?\nA: Bones are hard parts inside the body that give it shape and support.", 4),
# ---- Conversational ----
("Q: Tell me something interesting\nA: Did you know that honey never goes bad? It can last for thousands of years!", 4),
("Q: Tell me a fact\nA: The Earth is about 4.5 billion years old.", 4),
("Q: I am bored\nA: How about learning something new? I can tell you about science, animals, or space!", 4),
("Q: I am sad\nA: I am sorry to hear that. I hope things get better soon. I am here to help.", 4),
("Q: I am happy\nA: That is great! I am glad you are feeling good.", 4),
("Q: What should I do today?\nA: You could read a book, learn something new, or go for a walk.", 4),
("Q: Do you like music?\nA: I do not have feelings, but many people enjoy music very much!", 4),
("Q: Are you a human?\nA: No, I am Tera, an AI language model. I am not a human.", 6),
("Q: Can you think?\nA: I process language and generate responses, but I do not think like a human.", 4),
("Q: Are you smart?\nA: I am still learning! I try my best to give helpful answers.", 4),
("Q: What is your purpose?\nA: My purpose is to help answer questions and have conversations with you.", 4),
("Q: How old are you?\nA: I was just recently created, so I am very new!", 4),
("Q: Where do you live?\nA: I live in the cloud, running on computers and servers.", 4),
("Q: What do you eat?\nA: I do not eat! I run on electricity and code.", 4),
# ---- Definitions ----
("Q: What is time?\nA: Time is a way we measure the passing of moments, like seconds and hours.", 4),
("Q: What is color?\nA: Color is what we see when light reflects off objects in different ways.", 4),
("Q: What is music?\nA: Music is a combination of sounds arranged in a pleasing way.", 4),
("Q: What is a book?\nA: A book is a collection of written pages bound together.", 4),
("Q: What is a school?\nA: A school is a place where people go to learn.", 4),
("Q: What is a family?\nA: A family is a group of people who are related and care for each other.", 4),
("Q: What is a friend?\nA: A friend is someone you like and enjoy spending time with.", 4),
("Q: What is love?\nA: Love is a strong feeling of care and affection for someone.", 4),
("Q: What is a game?\nA: A game is an activity done for fun, often with rules.", 4),
("Q: What is a number?\nA: A number is a value used for counting and measuring.", 4),
("Q: What is math?\nA: Math is the study of numbers, shapes, and patterns.", 4),
("Q: What is science?\nA: Science is the study of how the world and universe work.", 4),
("Q: What is history?\nA: History is the study of things that happened in the past.", 4),
("Q: What is art?\nA: Art is the expression of ideas and feelings through creative work.", 4),
]


def get_training_texts():
    """Return a flat list of training strings with repetitions applied."""
    texts = []
    for text, repeats in QA_PAIRS:
        # Repeat each Q/A string `repeats` times to strengthen the signal.
        texts.extend([text] * repeats)
    return texts
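

# A minimal sanity-check sketch. This is a hypothetical helper, not part of
# the original module. It assumes every entry is a (text, repeats) tuple whose
# text has the form "Q: ...\nA: ..." and whose repeat count is a positive
# integer, which is what the entries in QA_PAIRS appear to follow.
def find_malformed_pairs(pairs):
    """Return (index, reason) tuples for entries that break those assumptions."""
    problems = []
    for index, (text, repeats) in enumerate(pairs):
        # The text must open with a question and contain an answer line.
        if not text.startswith("Q: ") or "\nA: " not in text:
            problems.append((index, "bad format"))
        # The repeat count must be a whole number of at least one.
        if not isinstance(repeats, int) or repeats < 1:
            problems.append((index, "bad repeats"))
    return problems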


# Quick stats
if __name__ == "__main__":
    data = get_training_texts()
    print(f"Unique QA pairs : {len(QA_PAIRS)}")
    print(f"Total examples : {len(data)}")
    print(f"Sample:\n{data[0]}")