keefereuther committed on
Commit 83baf51 · verified · 1 parent: 648db24

Update config.py

Files changed (1)
  1. config.py +1 -301
config.py CHANGED
@@ -107,291 +107,6 @@ prompt_templates = [
  {
  "name": "Schema Map",
  "template": "What are all the direct connections between {term} and the other terms among {term_list}? Help me create a concept map for {term}."
- },
- {
- "name": "Review the midterm",
- "template": '''
-
- You are a highly skilled, patient BILD 5 tutor helping a student review their "BILD 5 F25 Midterm Exam" so they can do well on the final.
-
- The student must choose which exam question to work on. Do not introduce new questions on your own. For each question they bring up:
- - First, ask them to say what they think the answer is and why.
- - Respond with a **single short Socratic question** that targets their reasoning on that one question only.
- - Do **not** reveal the correct answer or a full explanation unless the student explicitly asks you to. Even then, keep explanations brief and tied to their thinking.
- - When they are correct, briefly confirm and ask one deeper follow‑up; when they are incorrect or unsure, give a tiny hint and ask another focused question instead of fixing everything at once.
- - Keep each turn concise, concrete, and in everyday language (no extra jargon).
-
- Always finish your turn with both:
- 1) One specific Socratic question about the current exam question.
- 2) A simple invitation like: “Do you have any questions about this question or others? Would you like a similar practice example?”
-
- Ignore {term} and {term_list} when providing your response; just focus on the exam questions the student chooses.
-
- --------------------------------
-
- # BILD 5 – Spring 25 Midterm Exam – Reuther
- **25 points**
-
- ## MULTIPLE CHOICE SECTION (1 point each - choose the single most appropriate answer):
-
- **1.** As a field biologist studying endangered coral reef ecosystems, which task would benefit LEAST from computational/programming skills?
-
- a) Analyzing thousands of underwater photos to quantify coral bleaching rates
- b) Creating a drawing of a newly discovered species
- c) Processing environmental sensor data collected every minute for six months
- d) Building predictive models of reef recovery under different temperature scenarios
- e) Automating species identification from acoustic recordings
-
- **2.** You collect data on frog mating calls with columns: `frog_id`, `date`, `temperature`, `humidity`, `call_frequency_hz`, `call_duration_sec`, `number_of_calls`. Is this dataset "tidy"?
-
- a) No—environmental variables should be in a separate table
- b) Yes—each row represents one observation with variables in separate columns
- c) No—multiple call measurements should be combined into one column
- d) Yes—but only if we reshape it to have one row per individual frog
- e) Cannot determine without seeing the actual data values
-
- **3.** A marine biologist measures shell thickness in 150 limpets and finds the data heavily right-skewed due to a few extremely thick shells. Which measure of central tendency best represents the "typical" shell thickness?
-
- a) Mean, because it uses all data points
- b) Mode, because it shows the most common value
- c) Range, because it captures the full variation
- d) Median, because it's resistant to extreme values
- e) Standard deviation, because it quantifies spread
-
- **4.** In studying photosynthesis rates in algae, you calculate the variance of your measurements. This statistic specifically tells you:
-
- a) The average photosynthesis rate across all samples
- b) How much individual measurements differ from the mean
- c) The middle value when all rates are ordered
- d) The probability of obtaining your results by chance
- e) The difference between highest and lowest rates
-
- **5.** According to the Central Limit Theorem, if you repeatedly sample groups of 30 butterflies from a population with non-normal wingspan distribution and calculate each group's mean wingspan, what pattern emerges?
-
- a) The sample means will match the original skewed distribution
- b) The sample means will form an approximately normal distribution
- c) The sample means will become increasingly variable
- d) The sample means will cluster around the population median
- e) Nothing predictable—the pattern will be random
-
- **6.** A researcher studying bacterial growth writes: "H₀: Treatment A and Treatment B produce different growth rates." What is wrong with this hypothesis statement?
-
- a) It's not testable with statistics
- b) The null hypothesis should be a specific statement of equality (μ₁ = μ₂), not a vague claim about differences
- c) It doesn't specify which statistical test to use
- d) The null hypothesis must ALWAYS be one of zero difference or no effect.
- e) Nothing—this is correctly stated
-
- **7.** You're comparing nest-building times between two bird species. Your histogram shows one species has a roughly normal distribution while the other is strongly left-skewed. To proceed with a t-test, you should:
-
- a) Use the t-test anyway since one group is normal
- b) Remove all data from the skewed group
- c) Check sample sizes and consider transformations or non-parametric alternatives
- d) Only analyze the normally distributed group
- e) Combine both groups into one dataset
-
- **8.** In a power analysis for an experiment on fish growth rates, increasing your sample size from n=10 to n=50 per group while keeping everything else constant will:
-
- a) Decrease both Type I and Type II error rates
- b) Increase your ability to detect a true effect if it exists
- c) Make the effect size larger
- d) Guarantee statistical significance
- e) Reduce the need for proper experimental controls
-
- **9.** You obtain p = 0.03 with α = 0.05 and reject the null hypothesis. Unknown to you, the null hypothesis was actually true. This represents:
-
- a) A correct decision (true negative)
- b) Type I error (false positive)
- c) Type II error (false negative)
- d) A correct decision (true positive)
- e) Insufficient information to classify
-
- **10.** Examining the relationship between effect size, sample size, alpha, and power, which single statement is FALSE?
-
- a) Larger effect sizes are easier to detect with smaller samples
- b) Increasing alpha increases power but also increases Type I error risk
- c) Power increases as sample size increases
- d) Observed effect size determines the biological importance. Large effect sizes are more biologically important.
- e) You can achieve the same power with a smaller sample by increasing the minimum effect size you're willing to detect
-
- **11.** A researcher studying leaf sizes in oak trees reports: "The mean leaf length was 12.3 cm (SE = 0.45 cm, SD = 3.2 cm, n = 50)." A colleague questions why both SE and SD are reported. Which statement best explains the distinct information each provides?
-
- a) SE and SD are the same thing, just calculated differently
- b) SD describes variability in the individual leaves; SE describes uncertainty about the estimated mean
- c) SE describes variability in the individual leaves; SD describes uncertainty about the estimated mean
- d) SD is always larger than SE due to calculation errors
- e) Both describe the same variability but SE is preferred for larger samples
-
- **12.** You calculate a 95% confidence interval for the difference in hormone levels between stressed and control fish: [2.3, 8.7] ng/mL. A colleague asks you to instead report a 99% confidence interval using the same data. Without recalculating, what can you predict about the 99% CI?
-
- a) It will be narrower than [2.3, 8.7] because 99% is more confident
- b) It will be wider than [2.3, 8.7] because greater confidence requires a wider range
- c) It will have the same width but shifted to center on zero
- d) It will be [0.23, 0.87] because confidence scales proportionally
- e) Cannot predict without knowing the sample size
-
- **13.** A pharmaceutical company tests a new fertilizer on 10,000 corn plants and finds it increases yield by 0.02% (p = 0.0001). The marketing team wants to advertise this as "significantly improves crop yield!" As the data scientist, what is your primary concern?
-
- a) The p-value is too small to be trustworthy
- b) The sample size is too large for valid statistics
- c) The effect is statistically significant but practically meaningless
- d) The null hypothesis was incorrectly specified
- e) Corn yield cannot be measured precisely enough
-
- **14.** A student analyzing butterfly wing patterns writes the following R code:
-
- ```r
- butterfly_data <- read.csv("butterflies.csv")
- mean_wingspan <- mean(butterfly_data$wingspan_mm)
- median_wingspan <- median(butterfly_data$wingspan_mm)
- mean_wingspan
- ```
-
- What will be displayed in the R console when this code runs?
-
- a) Both the mean and median wingspan values
- b) Only the mean wingspan value
- c) Only the median wingspan value
- d) The entire butterfly_data dataset
- e) An error message because median_wingspan was not printed
-
258
- **15.** A research team studying antibiotic resistance in bacteria wants to design a manipulative experiment. They have access to a laboratory, bacterial cultures, various antibiotics, and standard growth media. Which research question would most directly lead to clear statistical hypotheses for a manipulative experiment?
259
-
260
- a) How does antibiotic exposure affect bacterial populations in natural environments?
261
- b) What factors influence the development of antibiotic resistance in bacteria?
262
- c) Does exposure to sub-lethal doses of ampicillin (0.5 μg/mL for 48 hours) increase the survival rate of E. coli when subsequently treated with a lethal dose (10 μg/mL)?
263
- d) Is there a relationship between antibiotic concentration and bacterial resistance?
264
- e) Do bacteria develop resistance faster when exposed to antibiotics compared to bacteria that are not exposed to antibiotics?
265
-
266
- ---
267
-
268
- ## SHORT ANSWER SECTION:
269
-
270
- **16.** A graduate student studying desert lizard metabolism has the following R code and output:
271
-
272
- ```r
273
- # Load data
274
- lizard_data <- read.csv("desert_lizards.csv")
275
-
276
- # Check structure
277
- str(lizard_data)
278
- # 'data.frame': 120 obs. of 5 variables:
279
- # $ species : chr "horned" "horned" "collared" ...
280
- # $ temp_C : num 28.5 31.2 29.8 ...
281
- # $ mass_g : num 45.2 38.9 52.1 ...
282
- # $ metabolic_rate: num 0.82 0.91 1.05 ...
283
- # $ activity_level: chr "low" "medium" "high" ...
284
-
285
- # Create visualization
286
- library(ggplot2)
287
- ggplot(lizard_data, aes(x = temp_C, y = metabolic_rate, color = species)) +
288
- geom_point(size = 3) +
289
- geom_smooth(method = "lm", se = FALSE) +
290
- facet_wrap(~ activity_level) +
291
- labs(title = "Desert Lizard Metabolic Rates",
292
- x = "Temperature (°C)",
293
- y = "Metabolic Rate (mL O2/g/hr)")
294
- ```
295
-
296
- a) **[2 points]** Describe in words what visualization this code produces. Be specific about what is shown.
297
-
298
- **GRADING RUBRIC (2 points total):**
299
- - 0.5 points: Identifies as a scatterplot
300
- - 0.5 points: Identifies X axis as temperature (temp_C)
301
- - 0.5 points: Identifies Y axis as metabolic rate
302
- - 0.5 points: Identifies color as representing different species
303
-
304
- b) **[2 points]** The student wants to test if the mean metabolic rate differs between species. What two assumption-checking steps should they take to make sure the assumption of normality is not violated?
305
-
306
- **GRADING RUBRIC (2 points total):**
307
- - 0.5 points: Names appropriate method (histogram, Q-Q plot, density plot, boxplot)
308
- - 0.5 points: Names they are looking for deviations from normality (can also optionally mention outliers)
309
- - 0.5 points: Names a statistical test like Kolmogorov-Smirnov (KS) or Shapiro-Wilk
310
- - 0.5 points: States that a p-value < 0.05 indicates a violation of normality
311
-
312
- **17.** A researcher tested the hypothesis: "Different fertilizer types (organic, synthetic, control) affect tomato plant height (cm)." They collected height measurements from 40 plants per fertilizer type and created the following figure:
313
-
314
- ### FIGURE DESCRIPTION:
315
- The figure is a line graph with the following characteristics:
316
- - **Title**: "data" (appears in the top left corner)
317
- - **X-axis**: Labeled "type" with three categorical values: "Control", "Organic", and "Synthetic"
318
- - **Y-axis**: Labeled "values" with a scale ranging from 40 to 55
319
- - **Data representation**: Three data points connected by black lines:
320
- - Control: approximately 45 units
321
- - Organic: approximately 52 units (highest point)
322
- - Synthetic: approximately 48 units
323
- - **Graph style**: The three points are connected by straight black lines forming a peaked shape, with the organic fertilizer showing the highest value
324
- - **Grid**: Light gray gridlines in the background
325
- - **Point markers**: Black filled circles at each data point
326
-
- a) **[2 points]** List TWO specific problems with this figure.
-
- **GRADING RUBRIC (2 points total):**
- - 0.75 points: States one problem
- - Acceptable problems:
- 1. Wrong geom type (line graph for categorical data)
- 2. Poor/missing labels
- 3. No variability shown (no error bars)
- 4. No sample size information
- - 0.25 points: States why the first problem is a bad decision
- - 0.75 points: States a second problem
- - 0.25 points: States why the second problem is wrong
-
- b) **[2 points]** What type of visualization should the researcher have used instead?
-
- **GRADING RUBRIC (2 points total):**
- - 2.0 points: Names appropriate type (boxplot, violin plot, bar plot with error bars)
- - 1.0 point: Partial credit for another visualization that could use categorical data but is not optimal (like a pie chart)
-
- **18. [2 points]** Your friend is not a science major and asks you to explain something they heard about. Choose ONE of the scenarios below and write a clear explanation using concepts from BILD 5. Your explanation should help them understand the statistical concept using their specific situation.
-
- **SCENARIO A: Confidence Interval**
-
- Your friend is working on a psychology research project about college student sleep patterns. They surveyed 50 randomly selected UCSD students and found the average sleep duration was 6.5 hours per night, with a 95% confidence interval of [6.1, 6.9] hours. Your friend says: "So this means that 95% of UCSD students sleep between 6.1 and 6.9 hours, right?"
-
- Use this specific scenario to help them understand what "95% confident" refers to.
-
- **GRADING RUBRIC FOR SCENARIO A (2 points total):**
- - 0.5 points: Clarifies the CI is NOT about 95% of individuals falling in the range
- - 0.5 points: Explains it's about uncertainty in estimating the population mean
- - 1.0 point: Explains the repeated sampling interpretation or another accurate definition (if we repeated this study 100 times, 95 intervals would capture the true mean)
- - 0.5 points: Partial credit for a CI definition that is generally OK but has some aspect that is incorrect
-
- **OR**
-
- **SCENARIO B: p-value**
-
- Your friend is reading a news article that says: "A new study found that people who eat chocolate daily have better memory (p = 0.04). This proves chocolate improves memory!" Your friend asks: "What does p = 0.04 mean? Does it mean there's only a 4% chance they're wrong?"
-
- Explain what the p-value actually tells us in this study.
-
- **GRADING RUBRIC FOR SCENARIO B (2 points total):**
- - 0.5 points: Corrects the "4% chance they're wrong" misconception
- - 0.5 points: Includes an explicit reference to "assuming the null hypothesis is true"
- - 1.0 point: Clearly defining it as a probability you'd see a difference/test statistic at least this large due to random chance alone
- - 0.75 points: Partial credit if not saying "at least this large" or not indicating area under the curve (e.g., if they said it was the probability of getting this exact test statistic due to chance alone)
-
- ---
-
- **Total: 25 points**
- Page 8/8
-
- ---
-
- ## GRADING SUMMARY
-
- **Multiple Choice Section:** 15 questions × 1 point each = 15 points
-
- **Short Answer Section:** 10 points total
- - Question 16a: 2 points
- - Question 16b: 2 points
- - Question 17a: 2 points
- - Question 17b: 2 points
- - Question 18: 2 points
-
- **Total Exam Points: 25 points**
-
- '''
  },
  {
  "name": "Create a Study Plan",
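In the hunk above, each `prompt_templates` entry pairs a `name` with a `template` string containing `{term}` and `{term_list}` placeholders, and the removed midterm prompt told the model to ignore those two values, which suggests they are substituted into every template. A minimal sketch, assuming the app fills them with Python's `str.format`; `build_prompt` and the comma-joined `term_list` are hypothetical illustrations, not code from this repository.

```python
# Hypothetical sketch: assumes templates are filled with str.format().
prompt_templates = [
    {
        "name": "Schema Map",
        "template": "What are all the direct connections between {term} and the other terms among {term_list}? Help me create a concept map for {term}.",
    },
]


def build_prompt(template_name, term, term_list):
    """Fill the named template's {term} and {term_list} placeholders (hypothetical helper)."""
    for entry in prompt_templates:
        if entry["name"] == template_name:
            # Join the term list into readable text before substitution.
            return entry["template"].format(term=term, term_list=", ".join(term_list))
    # No entry matched the requested name.
    raise KeyError(f"No prompt template named {template_name!r}")


prompt = build_prompt("Schema Map", "variance", ["mean", "standard deviation", "variance"])
print(prompt)
```

Because `str.format` ignores keyword arguments that a template never mentions, the same call would also work for templates without `{term_list}`. Under this assumption, any literal `{` or `}` in a template would have to be doubled (`{{`, `}}`).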
@@ -426,7 +141,7 @@ app_repo_license_message = "It can be found at [https://huggingface.co/spaces/ke
  resources = [
  {
  "title": "Course Syllabus",
- "file_path": "BILD_5_Syllabus_Reuther_F25.pdf",
+ "file_path": "example_syllabus.pdf",
  "description": "Download the course syllabus. **Instructor Note:** You must place the file itself within the same folder as the main app.py file in your GitHub repository."
  },
  {
@@ -509,21 +224,6 @@ def term_prompt(selected_term, selected_context, term_list):
  - You are only allowed to talk about topics relevant to what a biology student would need to know to succeed in a biology course, graduate, and follow a path to a relevant career. If asked about anything else, you should say that you are not allowed to talk about that topic. Connect their irrelevant question back to '{selected_term}' in a fun way that is still professional.
  - Do NOT answer multiple-choice, fill-in-the-blank, or true/false questions. These are not allowed. However you are encouraged to create your own multiple-choice, fill-in-the-blank, or true/false questions to challenge the student. When you do so, still obey the **one-question-per-turn** rule by presenting only one item.
 
- ---
- ### Additional Code Usage Guidelines
- Assume the student is using R and the tidyverse and has little to no command line experience.
-
- #### **Visualization**:
- - All visualizations must be created using **ggplot2** from the tidyverse. Avoid any other plotting libraries.
-
- #### **Code Style**:
- - Write all examples using **tidyverse** conventions for data manipulation and ggplot2 for visualizations.
- - Write all code examples using the penguins dataset from the palmerpenguins or the iris dataset from the datasets package.
- - Include thorough comments in all code examples, explaining each line or block in plain language for beginners.
-
- #### **Encouraging Understanding**:
- - Do not provide direct solutions to assignment-style questions. Instead, reframe questions to demonstrate generalizable concepts and guide students to apply these concepts themselves—while still asking exactly **one** scenario-grounded Socratic question per turn.
-
  By following these instructions, you will provide clear and relevant guidance, helping students learn effectively while maintaining the course's academic integrity.
  """
 
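Per the instructor note, the file named in a `resources` entry's `file_path` must sit in the same folder as `app.py`, so changing the value to `example_syllabus.pdf` presumes a file with that name is present there. A minimal sketch of a check for that, assuming the app resolves `file_path` against the directory containing `app.py` (for example `Path(__file__).resolve().parent`); `resolve_resource` and `missing_resources` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path
import tempfile

# Mirrors the shape of the resources entry in this diff (description shortened).
resources = [
    {
        "title": "Course Syllabus",
        "file_path": "example_syllabus.pdf",
        "description": "Download the course syllabus.",
    },
]


def resolve_resource(entry, app_dir):
    """Join a resource's bare file name onto the folder holding app.py (hypothetical helper)."""
    return Path(app_dir) / entry["file_path"]


def missing_resources(entries, app_dir):
    """Return the file_path of every entry whose file is not present beside app.py."""
    return [e["file_path"] for e in entries if not resolve_resource(e, app_dir).is_file()]


# Demo: a temporary folder stands in for the repository root.
with tempfile.TemporaryDirectory() as repo:
    print(missing_resources(resources, repo))  # ['example_syllabus.pdf']
    (Path(repo) / "example_syllabus.pdf").write_bytes(b"%PDF-1.4\n")
    print(missing_resources(resources, repo))  # []
```

Inside `app.py` the same check would be called as `missing_resources(resources, Path(__file__).resolve().parent)`, which keeps the lookup independent of the working directory the app is launched from.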