Upload final_running.csv
final_running.csv +150 -0
final_running.csv
ADDED
@@ -0,0 +1,150 @@
+
TERM,CONTEXT
+
general course syllabus questions,"# NOTE TO SYSTEM - for this term you are only allowed to pull information directly from this context and not your own knowledge base as an LLM model. If you do not know an answer to a question you must state that you are uncertain of the answer and to contact Keefe or post the question on Piazza --- ## **COURSE DESCRIPTION \- *Data Analysis and Design for Biologists (4 credits)*** This course is a practical introduction to information literacy, experimental design, and data analysis for biologists. Students will be introduced to coding, data management, and quantitative analysis using the R programming language and the RStudio IDE. However, this is not a traditional statistics course and no math prerequisites are required. Rather this course focuses on practical skills related to effectively asking and answering biological questions with data. ## **COURSE LEARNING OUTCOMES** 1. *Create* testable hypotheses addressing valid biological questions. 2. *Evaluate* the credibility and value of relevant scientific information. 3. *Design* experiments that effectively test hypotheses. 4. *Construct* figures that effectively communicate data. 5. *Perform* appropriate quantitative and statistical analyses on experimental data. 6. *Interpret* the results of quantitative statistical models and associated analyses. 7. *Utilize* the R programming language for scientific data analysis and graphing. 8. *Combine* the elements of a complete investigative cycle in a student designed project. 9. *Explore* the modern intersection between different subfields of biology, technology, and data science. 10. *Examine* the ethical responsibilities of scientists when creating and communicating scientific evidence. ## ## **CONTACT AND SCHEDULING INFO** Dr. 
Keefe Reuther (he/him/his) **(Please call me Keefe)** [Email address: kdreuther@ucsd.edu](mailto:kdreuther@ucsd.edu) (please put BILD 5 in the subject line) **LECTURE TIME** MWF 2:00-2:50 pm Center Hall Room 101 **FINAL EXAM TIME** Friday, June 13, 2025; 3:00p-6:00p PST \- **THIS IS MANDATORY IN-PERSON** **INSTRUCTIONAL ASSISTANTS:** **NAME EMAIL** Yanlin yal037@ucsd.edu RM tkobbekaduwa@ucsd.edu Mitchell mis033@ucsd.edu Saranya savohra@ucsd.edu **SECTION MEETING TIMES** A01 M 4:00p \- 4:50p WLH 2205 A02 W 1:00p \- 1:50p Center Hall 222 A03 W 8:00a \- 8:50a WLH 2205 A04 F 4:00p \- 4:50p WLH 2205 KEEFE’S OFFICE HOURS Wednesday 12:00p-1:30p Location TBA Friday 3:00-4:00pm Location TBA PREREQUISITES None\! And you don’t need any experience coding or working in a lab\! QUESTIONS? This term we will be using Piazza for class discussion. The system is highly catered to getting you help fast and efficiently from classmates, the TA, and myself. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com. Find our class signup link at: [https://piazza.com/ucsd/spring2025/bild5\_sp25\_a00](https://piazza.com/ucsd/spring2025/bild5_sp25_a00) ## TECHNOLOGY REQUIREMENTS You will need access to a device that can access a web browser (e.g. Chrome, Safari, Firefox). This will be to access Canvas, Zoom, and the UCSD DataHub. While any connected device can typically accomplish this (smart phone, tablet, laptop), it is highly recommended that you use a laptop or a desktop computer for connecting to the UCSD DataHub. Trust me, you don’t want to write code from your phone\! Note that Chromebooks work perfectly well for this course. You are allowed to bring your technology into lecture \- *however it is not necessary*. Since section activities are based around R coding, it is recommended that you bring your laptop. 
There are resources on campus available if you have tech needs. Please visit: [https://libraries.ucsd.edu/computing-and-technology/computers-and-laptop-stations/index.html](https://libraries.ucsd.edu/computing-and-technology/computers-and-laptop-stations/index.html) ## COURSE CALENDAR | Week | Lecture Topics | | :---: | :---- | | 1 | Why are science, statistics, and programming necessary? | | 1 | Ok, but why are they necessary in biology? | | 1 | Data Part 1: types and structures/variables and samples | | 2 | Data Part 2: The importance of visualizing data | | 2 | | | 2 | Data Part 3: Central tendency and variation | | 3 | | | 3 | Data Part 4: Normality \- the Central Limit Theorem and confidence intervals | | 3 | | | 4 | Hypothesis Testing Part 1: Writing a clear alternative and null | | 4 | Hypothesis Testing Part 2: Testing your assumptions | | 4 | Hypothesis Testing Part 3: Test statistics and quantifying differences | | 5 | | | 5 | Hypothesis Testing Part 4: How power, p values, effect size, and sample size affect uncertainty \- exploring the t-test | | 5 | | | 6 | **MIDTERM** | | 6 | Hypothesis Testing Part 5: ANOVA and post-hoc testing | | 6 | Hypothesis Testing Part 6: Correlation and quantifying similarities | | 7 | Hypothesis Testing Part 7: Making predictions with linear regression | | 7 | Experimental Design Part 1 \- Which design to choose? | | 7 | | | 8 | Experimental Design Part 2 \- The importance of sampling | | 8 | | | 8 | Experimental Design Part 3 \- Practical and ethical considerations | | 9 | **UNIVERSITY HOLIDAY** | | 9 | What’s next? multivariate models, resampling, ordination, bioinformatics… Labs, research, careers, and the issues facing modern science culture. 
| | 9 | | | 10 | | | 10 | Catch up, review, and term project help | | 10 | | ## ## DISCUSSION SECTION TOPICS # | Week 1 | Introduction to DataHub and RStudio \- “Hello World” | | :---- | :---- | | Week 2 | Importing data, saving files, making objects | | Week 3 | Data visualization | | Week 4 | Tidyverse and data wrangling/cleaning | | Week 5 | Review | | Week 6 | Test for normality; t-test | | Week 7 | ANOVA | | Week 8 | Linear regression and correlation | | Week 9 | Bringing it all together | | Week 10 | Work on term project | ## ## DELIVERABLES # | All due times are 11:59pm PST; See Canvas for specific due dates | | | ----- | :---- | | Week 1 | Extra Credit Surveys (3x \- Pre); Discussion Board Prompt 1; Section Assignment 1 | | Week 2 | Quiz 1, Section Assignment 2 | | Week 3 | Discussion Board Prompt 2; Section Assignment 3 | | Week 4 | Quiz 2, Section Assignment 4 | | Week 5 | Discussion Board Prompt 3; Section assignment makeup (wks 1-4) | | Week 6 | Midterm, Section Assignment 5 | | Week 7 | Discussion Board Prompt 4; Section Assignment 6 | | Week 8 | Quiz 3; Section Assignment 7; Term Project CP \#1 | | Week 9 | Discussion Board Prompt 5; Section Assignment 8; Term Project CP \#2 | | Week 10 | Quiz 4; Section Assignment 9; Term Project; SET Evaluations | | Finals Week | Final Exam; Extra Credit Surveys (3x \- Post) | ## **GRADING** | Lecture participation (9 missed lecture classes OK) | 5% | | :---- | :---- | | Quizzes (4x \- drop lowest score) | 15% | | Section Assignments (drop lowest score) | 20% | | Discussion Board Prompts (drop lowest score) | 10% | | Term Project | 20% | | Midterm | 10% | | Final Exam | 20% | | 3x Pre and post course surveys & SETs | up to 1% in Extra Credit | **POLICY ON COLLABORATION** Working together is good\! Science is a social act and we want this course to mirror the real world of biology. That being said, we also need to adhere to our pledge to act with integrity. Therefore, you may help each other in general. 
This means explaining concepts, definitions, processes, etc. to each other. You may also talk about and share code with each other. Copying and pasting code is an everyday tactic. However, ***your final answers and responses must be your own and you are responsible for understanding and creating/evaluating everything you submit for this course.*** There is to be absolutely no sharing of answers. We will frequently ask for you to annotate your code and explain what your code is doing. This, as well as all other written work, must be original. All assignments (including RMarkdown pdfs) will be run through a plagiarism checker. At the end of the day, you are here to learn this material so you can be a better biologist. Focus on learning and grades will come as an indirect, wonderful consequence. **DISCUSSION BOARD PROMPTS** During weeks 1, 3, 5, 7, and 9 there will be a discussion prompt placed in the Discussions section of Canvas prior to Monday morning. For a response to count for credit, it must be original, substantive, and properly cited (if necessary). Generally, this means a small paragraph. Replies of “I agree” do not count as substantive. **Any use of generative AI must include a short, single statement that includes the AI tool used (e.g., ChatGPT) and what you received from it (e.g., help identifying an error in my code.).** Your lowest score will be dropped. **QUIZZES** During weeks 2, 4, 8, and 10 there will be a 60 minute Canvas quiz due by 11:59pm PST on Sunday. **The first quiz will include questions about the course syllabus.** Each quiz is non-cumulative and will consist of multiple choice and short answer applied and theoretical questions encompassing the material since the previous quiz. You will not be expected to do any coding or use RStudio during the quiz. Your lowest score will be dropped. 
**YOU MAY NOT USE AI DURING A QUIZ.** **FINAL EXAM and MIDTERM \- MANDATORY AND IN-PERSON \- YOU MUST TAKE IT AT THE SCHEDULED TIME** - For the final exam, you will have 3 hours for the final exam, even though it will not be written to take the entire time. This will be a cumulative exam mirroring the structure of an extended theory quiz. A study guide with relevant topics will be provided beforehand. You may bring 1 NOTE CARD no larger than 4”x 6” with information of your choosing on both sides. It can be printed or handwritten. It cannot be shared or sold to others. It will be counted as your section assignment 10 grade. - The midterm will be a miniature version of the final, held in-class for 50 minutes. Practice exams are available on Canvas. **TERM PROJECT** This project will allow you to go through an entire investigative cycle on your own, from the design of your own question through being provided with simulated data to analyze, interpret and report. You will receive instructor feedback after each step. Please take heed of the feedback as grading will get progressively more stringent. See individual rubrics on Canvas for more information. Each step should be adequately researched and cited using core principles of scientific literacy. While the data is fake, your project should be realistic, relevant, and at least moderately original. This should be a product that you can put in your portfolio for future interviews. Who knows? Maybe it’ll inspire your next research project in graduate school\! **EXTRA CREDIT** This can be earned by completing course evaluations and related surveys which aim to improve the course and the educational experiences of your future peers. There are no other opportunities for extra credit beyond what is assigned by the course instructor. 
--- ## GRADING | A+ | 97-100% | B+ | 87-89% | C+ | 77-79% | D+ | 67-69% | F | 0-59% | :---- | :---- | :---- | :---- | :---- | :---- | A | 93-96% | B | 83-86% | C | 73-76% | D | 63-66% | | A- | 90-92% | B- | 80-82% | C- | 70-72% | D- | 60-62% | --- **Grade cut-offs will never be shifted and there is no rounding of points.** --- **LATE ASSIGNMENT POLICY** **Lecture Participation:** Determined by check-ins and check-outs in class via Mentimeter. **Action:** No action is necessary. **You can miss up to 18 check-ins and check-outs (three weeks) and still maintain 100% attendance, regardless of the reason. This is meant to help you in case of illness, family emergency, phone running out of battery, or any other reason you miss class. *Please do not ask me if you can make-up or receive credit for missed attendance.* It is your responsibility to keep track of the number of class sessions you miss. **Quiz, Discussion Board, Final Project:** Submitted late? **Action:** A 2% penalty applies for each hour the assignment is late. Any unexcused assignment turned in more than two days late can’t receive a score higher than 50%. If you face an unavoidable issue (e.g., hospitalization), contact Keefe via email ([**kdreuther@ucsd.edu**](mailto:kdreuther@ucsd.edu)) promptly. Keep in mind, **technical difficulties right before the deadline are not valid reasons for exceptions**. Your lowest scores for discussion board and quizzes will still be dropped. **Midterm & Final Exam:** Can’t attend due to unavoidable circumstances? **Action:** The final exam is in-person, synchronous, and mandatory. It will not be rescheduled unless you qualify for an OSD accommodation or a university sanctioned event being chaperoned by university personnel. If you can’t attend due to an unavoidable issue (e.g., illness), notify Keefe ([**kdreuther@ucsd.edu**](mailto:kdreuther@ucsd.edu)) immediately and follow the University’s incomplete grade policy to make alternate arrangements. 
This must be done prior to the start time of the final exam. **If your final exam % is higher than your midterm % then your final exam score will replace your lower midterm score.** --- ACADEMIC INTEGRITY POLICY ON GENERATIVE AI *Philosophical Overview:* Generative AI is neutral by nature, neither good nor bad. Its value hinges on how it's applied. We acknowledge AI's potential to both elevate and diminish the academic experience. While it's a powerful tool for the digital age and essential for our future, it doesn't absolve us from upholding academic integrity and opposing plagiarism. *Personal Responsibility and Accountability:* You own your work. AI can assist, but it shouldn't be the main contributor. If your work appears overly dependent on AI, expect an oral quiz to test your understanding. Remember: mastering AI, like any skill, takes effort. Over-relying on it shortchanges your education and has lasting consequences. *Attribution and Documentation:* * **Any use of generative AI must include a short, single statement that includes the AI tool used (e.g., ChatGPT) and what you received from it (e.g., help identifying an error in my code.).** *Disclaimer on Generative AI:* Generative AI, such as LLMs, can sometimes produce misleading or false information. Be especially wary with images. You're accountable for every submission, AI-assisted or not. ALWAYS fact-check AI-generated content before submission. --- **SUPPORT AND RESOURCES:** Need help? Contact your instructor, IAs, or the following UC San Diego support centers: * The Commons’ Academic Achievement Hub: Learning strategies, tutoring, and supplemental instruction. * The Commons’ Writing Hub: Writing and presentation help. * The Library: Research guidance. * The Academic Integrity Office: Queries about ethical GenAI use. 
--- ## CAMPUS POLICIES * ### [UC San Diego Principles of Community](https://ucsd.edu/about/principles.html) * ### [UC San Diego Policy on Integrity of Scholarship](http://senate.ucsd.edu/Operating-Procedures/Senate-Manual/Appendices/2) * ## [Religious Accommodation](https://senate.ucsd.edu/operating-procedures/educational-policies/courses/epc-policies-on-courses/policy-exams-including-midterms-final-exams-and-religious-accommodations-for-exams/) * ## [Nondiscrimination and Harassment](https://ophd.ucsd.edu/policies-procedures/nps_student.html) * ## [UC San Diego Student Conduct Code](https://students.ucsd.edu/_files/student-conduct/ucsandiego-student-conduct-code_interim-revisions1-16-18.pdf) *Diversity and equity statement* It is important for us to make sure that how we teach this course and how we accommodate different student needs reflects the differences of race, ability, sexual orientation, age, and gender identity that enrich our classroom experience and campus. If you have any concerns related to diversity and equity in the course, please contact the instructor. If you find yourself in an uncomfortable situation, ask for help. The university is committed to upholding policies regarding nondiscrimination, sexual violence, and sexual harassment. 
--- ## STUDENT RESOURCES | Learning and Academic Support | | | :---- | :---- | | [Ask a Librarian: Library Support](https://library.ucsd.edu/ask-us) *Chat or make an appointment with a librarian to focus on your research needs* [Course Reserves, Connecting from Off-Campus and Research Support](https://library.ucsd.edu/ask-us/triton-ed.html) *Find supplemental course materials* [First Gen Student Success Coaching Program](https://successcoaching.ucsd.edu/) *Peer mentor program that provides students with information, resources, and support in meeting their goals* [Office of Academic Support & Instructional Services (OASIS)](https://oasis.ucsd.edu/) *Intellectual and personal development support* | [Writing Hub Services in the Teaching \+ Learning Commons](https://commons.ucsd.edu/academic-support/writing/index.html) *One-on-one online writing tutoring and workshops on key writing topics* [Supplemental Instruction](https://aah.ucsd.edu/supplemental-instruction-study-group/) *Peer-assisted study sessions through the Academic Achievement Hub to improve success in historically challenging courses* [Tutoring – Content](https://aah.ucsd.edu/content-tutoring/) *Drop-in and online tutoring through the Academic Achievement Hub* [Tutoring – Learning Strategies](https://aah.ucsd.edu/learning-strategies/) *Address learning challenges with a metacognitive approach* | | | | | | | Support for Well-being and Inclusion | | | [Basic Needs at UCSD](https://basicneeds.ucsd.edu) *Any student who has difficulty accessing sufficient food to eat every day, or who lacks a safe and stable place to live is encouraged to contact:* [foodpantry@.ucsd.edu](about:blank) | [basicneeds@ucsd.edu](mailto:basicneeds@ucsd.edu) | (858) 246-2632 [Counseling and Psychological Services](https://wellness.ucsd.edu/caps) *Confidential counseling and consultations for psychiatric service and mental health programming* [Triton Concern Line](https://blink.ucsd.edu/instructors/advising/concern/) *Report students 
of concern:* (858) 246-1111 [Office for Students with Disabilities (OSD)](https://disabilities.ucsd.edu/) *Supports students with disabilities and accessibility across campus* | [Community and Resource Centers](https://students.ucsd.edu/student-life/diversity/) [Office of Equity, Diversity, and Inclusion](https://diversity.ucsd.edu/) *As part of the [Office of Equity, Diversity, and Inclusion](https://diversity.ucsd.edu/) the campus community centers provide programs and resources for students and contribute toward the evolution of a socially just campus* (858) 822-3542 | [diversity@ucsd.edu](mailto:diversity@ucsd.edu) [Get Involved](https://getinvolved.ucsd.edu/) *Student organizations, clubs, service opportunities, and many other ways to connect with others on campus* [Undocumented Student Services](https://uss.ucsd.edu/) *Programs and services are designed to help students overcome obstacles that arise from their immigration status and support them through personal and academic excellence* | | --- ## PRIVACY PRACTICES IN THIS COURSE (From [https://cio.ucop.edu/privacy-tips-for-your-syllabus/](https://cio.ucop.edu/privacy-tips-for-your-syllabus/)) This course is a community built on trust; as a learning community, we are collectively responsible for upholding privacy protections. In order to create a community built on trust and the most effective learning experience, our interactions, discussions, and course activities must remain private and free from external intrusion. We have obligations to each other to preserve privacy and cultivate fearless inquiry. We respect the individual dignity of all and will refrain from actions that diminish others’ ability to learn. 
As your instructor, I am committed to protecting your privacy by only using university-approved course technologies and adhering to the Family Educational Rights and Privacy Act (FERPA) [https://catalog.ucsd.edu/about/policies/notification-of-rights/index.html](https://catalog.ucsd.edu/about/policies/notification-of-rights/index.html) and Campus Privacy Office guidelines. This includes using your educational data only as allowed by FERPA, for example, for legitimate educational purposes such as submitting your final grades to the registrar’s office. Please note the following privacy practices for our course: Course platform. This course uses Canvas, DataHub, Zoom, and Gradescope, which collect information about your engagement with course materials. I will review this information periodically to ensure students are engaged and look for signs of students falling behind. I will also review this information in case of academic misconduct allegations, if relevant. Using learning materials. Course materials (videos, assignments, quizzes, etc.) are for use in this course only. You may not upload them to external sites, share with students outside of this course, or post them for public commentary without my written permission. Unauthorized sharing or uploading of exam questions, test answers, or summaries of exams is prohibited. Sharing student information. You may work on group projects with other students or be asked to review or respond to their work. Other materials and activities may provoke debate, argument, or spirited discussion; some of us may volunteer sensitive personal information. Do not share others’ personal information, including class dialogue or performance, on sensitive topics outside of our course community. Student work, discussion posts, and all other forms of student information related to this course should be handled with respect and remain within interactions of this course. 
You may publicly post your own work, provided it does not violate academic dishonesty policies or show responses to assessments; public posting of group work requires consent from all group members. Research conducted as part of a class is subject to UC research policies and may include sensitive information. Students may not share research information without permission from the instructor. Sharing course information with others. Do not post images or identifiable conversations that occur in class to social media or to those beyond our learning community. Sharing private information about our course community (including discussions, activities, presentations, student work, etc.) with others for the purpose of inviting external attention, intrusion, ridicule, or harassment is an egregious breach of trust. If you have concerns after reviewing these privacy practices, I invite you to reach out to the instructor. --- ## LETTERS OF RECOMMENDATION: If you think you may want me (or any other instructor) to write you a letter of recommendation, please consider what a good letter would contain and how your actions in the course demonstrate the qualities you will want highlighted in a good letter. When students ask me for a letter of recommendation, I ask them to write to me about how they demonstrated critical thinking, leadership, collaboration, and professionalism. I will be specifically looking for examples of these qualities *that I could have noticed* during lab/lecture and office hours. Be sure to actively participate in the discussions, talk to me during the lab/lecture and my office hours: ask questions, offer your own ideas and interpretations of your results, bring interesting facts/papers that are connected to the material we are studying. If you don’t actively show the qualities that are needed to write a good letter, it will be hard for me to write a letter that is meaningful and useful. 
Therefore, in order to ensure I am able to write substantive, insightful, and informative letters, I am only able to potentially write letters for students who fit the following criteria: * If you are applying for an undergraduate grant/job/program: * You must have received at least a “B-” grade in one of my courses or be actively enrolled and passing one of my courses. * You must have met with me more than once outside of class (e.g. office hours/Coffee with a Prof) OR the letter is more than 2 months from being due and we can schedule at least one significant meeting. * If you are applying for a post-graduate program (e.g. med school, post-bacc program) * You must satisfy ONE of the following criteria: * You received an “A-” or higher in one course. * You took more than one class with me and demonstrated clear improvement. * You worked with me as an Instructional Assistant or in some other professional capacity. * AND you must ALSO satisfy ONE of the following criteria: * We have met and talked multiple times in a way where I was reasonably able to assess your potential for your applied program. Examples include office hours, IA meetings, or Coffee-with-a-prof. * The due date for the letter is greater than one academic quarter away and you commit to meeting with me at least once to have a formal conversation. ## ## --- ## SUBJECT TO CHANGE POLICY Due to unforeseen circumstances, minor aspects of this syllabus may change. This includes changes to scheduling, grading values, and policy. It is the responsibility of the instructor and instructional assistants to announce changes with reasonable notice in multiple formats (e.g. lecture and Canvas announcements, email, etc.). It is the responsibility of the student to make note of these changes and communicate with the instructor if you have questions or concerns about the changes. ## ---"
+
Help with a code bug in R,"For this option you should prompt the user to copy and paste the relevant code and the error as it appears in the console. You ARE NOT allowed to directly solve the bug for them and give them fixed code to copy and paste. Instead you will Socratically ask them guiding questions that logically guide them to solving it for themselves. Use Palmer Penguins for your examples. The issue will be in relation to RStudio, R, and Rmd files."
+
Choosing the Proper Statistical Test,"Introduces the decision process for selecting an appropriate analytical approach based on data type (e.g., categorical vs. numerical), number of variables, and underlying assumptions. Includes t-tests (one- or two-tailed, 2-sample, paired), ANOVA, chi-squared tests, correlation, linear regression, and more."
+
Bias and Confounding in Research,"Combines Confounding Variables, Observer Bias, and Blinding into a single category of issues that can skew results. Emphasizes blocking/stratification to control known sources of variability and illustrates techniques (single- or double-blind designs) to reduce bias."
+
Correlational vs. Manipulative Designs,"Contrasts observational field studies (e.g., wildlife behavior) with controlled lab experiments. Highlights differences between correlation (covariance, no direct intervention) and true experiments where randomization and manipulation establish cause-effect relationships."
+
Study Structures and Sampling,"Groups Quasi-Experimental, Repeated Measures, Cross-Over, Case-Control, and Field vs. Laboratory approaches. Also covers Sampling Strategies (random, stratified, cluster, snowball), clarifying Biological vs. Statistical Populations and how the ‘Experimental Unit’ must be independently assigned to avoid pseudoreplication."
+
Pilot Studies and Data Validation,"Emphasizes small-scale trial runs to test feasibility and refine protocols before full experiments, along with ongoing checks (repeat measurements, procedural consistency) to confirm data accuracy."
+
Statistical Foundations and Hypothesis Types,"Encompasses Null vs. Alternative Hypotheses (falsifiable, testable), the concept of Prediction vs. Hypothesis, and how to use Background Research (including Keefe’s Hypothesis Trick) to form robust, clear questions for investigation."
+
Parametric Assumptions and Normality Checks,"Consolidates three assumptions (normality, homoscedasticity, independence) with tests such as Q-Q plots, KS tests, Shapiro-Wilk, D’Agostino’s K^2, and Fligner-Killeen. Includes discussion of skewness, kurtosis, data transformations, and back transformations to meet parametric criteria."
+
Advanced Statistical Methods,"Combines Bayesian Analysis, Resampling Methods (permutation tests, bootstrapping), Factorial Design (multiple factors and interactions), and corrections for multiple comparisons (Bonferroni, Tukey HSD). Highlights how these approaches address more complex experimental questions."
+
Regression and Related Concepts,"Unites correlation, covariance, linear regression, ordinary least squares (OLS), and partial regressions. Shows how each technique quantifies relationships among continuous variables, with q-q plots checking residual normality."
+
Statistical Power and Effect Sizes,"Covers power analysis, Type I/II errors (alpha/beta), p-values, test statistics (Z-score, T-score, F-score, chi-squared-score), effect size, and the rationale for adequate sample size to detect true differences or relationships."
+
Common R Datasets,"Groups the Palmer Penguins dataset, iris, ToothGrowth, sleep, and chickwts into a single entry. Explains how they serve as exemplars for exploring data structures (column types), ggplot visualizations, and hypothesis testing in biology."
+
Data Visualization in Biology,"Merges bar plots, scatterplots, line graphs, histograms, box/violin plots, heat maps, and even problematic pie charts. Includes references to the O-ring Challenger example, ggplot2’s grammar of graphics, and the iterative need to ‘look at your data’ early and often."
+
Distributions and the Central Limit Theorem,"Encompasses continuous vs. discontinuous distributions, the role of stabilizing selection in generating approximate normality, the ‘wisdom of crowds,’ and how standard error and confidence intervals emerge from repeated sampling."
+
Science,Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence.
+
Biological biases,in relation to the scientific method and hypothesis testing
+
Cognitive biases,in relation to the scientific method and hypothesis testing
+
Confirmation bias,in relation to the scientific method and hypothesis testing
+
Availability heuristic,in relation to the scientific method and hypothesis testing
+
Anchoring bias,in relation to the scientific method and hypothesis testing
+
Dunning-Kruger effect,in relation to the scientific method and hypothesis testing
+
Why are statistical and programming knowledge useful for a career in biology?,1. Biological data is messy and complex. 2. Biology uses BIG data. 3. These skills save lives. 4. These skills will help your career.
|
| 25 |
+
Big data in biology,
|
| 26 |
+
Name a disease that is influenced by many different genetic and environmental factors,
|
| 27 |
+
a priori hypotheses,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 28 |
+
Which natural process is most similar to machine learning?," ""evolution by natural selection"""
|
| 29 |
+
How will careers in biology be affected by generative AI?," ""The barriers to learning new skills are falling. The need to learn programming and data analysis is greater now than it was in 2020! Some jobs will be automated. Those with the rarest and most useful combination of skills will be sought after."""
|
| 30 |
+
Data types - continuous/numerical,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 31 |
+
Data types - count/integer,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 32 |
+
Data types - ordinal,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 33 |
+
Data types - categorical,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 34 |
+
Data types - binomial,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 35 |
+
Null data,Data missing in a dataset. It is important to appropriately deal with this data depending on the nature of the statistical analysis and experimental design.
|
| 36 |
+
tidy data,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 37 |
+
R programming - objects,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 38 |
+
R programming - functions,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 39 |
+
R programming - Rmd file format,Walk students through the process of saving and knitting Rmd files in RStudio
|
| 40 |
+
RStudio,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 41 |
+
print() in R,in relation to experimental design in biology and hypothesis testing and programming in the R computing language using RStudio
|
| 42 |
+
Confounding Variables,"Presented in class as any variable other than the main independent variable that can influence the dependent measure, demonstrated through ecological field experiments highlighting unintended influences."
|
| 43 |
+
Correlational vs. Manipulative Designs,"Contrasts observational field studies (e.g., wildlife behavior) with controlled lab experiments, illustrating how data collection differs between correlative and manipulative approaches."
|
| 44 |
+
Field vs. Laboratory (In vivo vs. In vitro),Emphasized through side-by-side examples of ecological field research (in vivo) compared to controlled laboratory or culture-based tests (in vitro).
|
| 45 |
+
Blocking and Stratification,"Utilized in agricultural and ecological studies to control variability and ensure representative sampling. Blocking groups experimental units by a known factor, while stratification subdivides populations by habitat type or similar characteristics."
|
| 46 |
+
Pseudoreplication,"Highlighted with class exercises on ecological experiments that mistakenly treat non-independent samples as true replicates, stressing the importance of valid replication design."
|
| 47 |
+
Observer Bias,Illustrated by behavioral research where a researcher’s preconceived notions skew data collection or interpretation.
|
| 48 |
+
Blinding (Single-Blind or Double-Blind),Discussed via clinical trials showing how concealing participant or researcher knowledge of group assignments (or both) reduces bias in assessing treatments.
|
| 49 |
+
Pilot Studies,"Performed on a small scale before full experimentation to refine methods, check feasibility, and detect issues—exemplified with ecological sampling trials."
|
| 50 |
+
Case-Control Study,Explored using epidemiological research where individuals with a certain condition (cases) are compared to those without (controls) to find potential causes.
|
| 51 |
+
Repeated Measures Design,"Explained with physiological experiments measuring the same subjects over multiple conditions or time points, reducing individual variability."
|
| 52 |
+
Cross-Over Design,"Illustrated in clinical settings where each participant receives multiple treatments sequentially, acting as their own control."
|
| 53 |
+
Quasi-Experimental Design,Examined in educational interventions and other scenarios where random assignment isn’t feasible but researchers still investigate cause-effect relationships.
|
| 54 |
+
Sampling Strategies,Presented through ecological fieldwork approaches to ensure that samples adequately represent populations or subpopulations.
|
| 55 |
+
Biological vs. Statistical Population,Distinguished using real-world organisms or systems (biological population) versus the broader inference space or dataset (statistical population).
|
| 56 |
+
Factorial Design,"Shown through experiments testing multiple independent variables (e.g., nutrient and temperature) simultaneously and checking for interactions."
|
| 57 |
+
Bayesian Analysis,Demonstrated with ecological modeling examples that incorporate prior knowledge and update beliefs based on collected data.
|
| 58 |
+
Dimension Reduction (e.g. PCA),"Used to condense complex, multivariate ecological data into major axes of variation for simpler visualization and interpretation."
|
| 59 |
+
Resampling Methods (Permutation tests),Exemplified by reshuffling or resampling ecological data to evaluate statistical significance without rigid parametric assumptions.
|
| 60 |
+
Ethical and Regulatory Oversight (IRB),"Encompasses the protocols and guidelines for human-subject research: obtaining IRB approval, respecting the Helsinki Declaration’s ethical principles, and securing informed consent."
|
| 61 |
+
Randomization,"Implemented in both ecological and clinical experiments to assign treatments or conditions by chance, reducing selection bias."
|
| 62 |
+
Use of Positive and Negative Controls,Shown through lab demonstrations ensuring that experimental setups can detect known effects (positive) and confirm no unintended effects (negative).
|
| 63 |
+
Power Considerations and Sample Size,Explained via calculations that determine how many subjects or replicates are needed to reliably detect meaningful differences in experimental outcomes.
|
| 64 |
+
Data Validation,"Addressed by repeat measurements and procedural checks to confirm accuracy, illustrated in class with real-world sampling scenarios."
|
| 65 |
+
Retrospective vs. Prospective Approaches,"Compared in epidemiological research, where retrospective studies analyze existing records and prospective studies follow subjects into the future."
|
| 66 |
+
Experimental Unit,"Defined as the smallest independent entity to which a treatment is applied, critical to avoid inflated sample sizes and pseudoreplication."
|
| 67 |
+
covariance,in relation to experimental design in biology and hypothesis testing
|
| 68 |
+
correlation,in relation to experimental design in biology and hypothesis testing
|
| 69 |
+
linear regression,in relation to experimental design in biology and hypothesis testing
|
| 70 |
+
ordinary least squares,in relation to experimental design in biology and hypothesis testing
|
| 71 |
+
q-q plot,in relation to experimental design in biology and hypothesis testing
|
| 72 |
+
ks.test() in R,"Provide examples assuming they have a dataframe, a column, a mean, and a standard deviation"
|
| 73 |
+
One-tailed and two-tailed t-tests,
|
| 74 |
+
ANOVA tests,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
|
| 75 |
+
2 sample t-test,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
|
| 76 |
+
paired t-test,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
|
| 77 |
+
Chi-Squared test,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
|
| 78 |
+
Corrections for multiple comparisons,"Bonferroni and Tukey HSD, with Tukey HSD preferred for ANOVA post-hoc tests"
|
| 79 |
+
Fligner-Killeen test,a preferred test for homoscedasticity
|
| 80 |
+
The Palmer Penguins dataset,The primary example dataset used in class. Assume students do not know the names and data types of the columns so they should be frequently provided for context.
|
| 81 |
+
iris R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
|
| 82 |
+
ToothGrowth R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
|
| 83 |
+
sleep R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
|
| 84 |
+
chickwts R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
|
| 85 |
+
confusion matrix,"2 by 2 table of actual reality versus our data; shows true positives, false positives, true negatives, and false negatives"
|
| 86 |
+
statistical power,in relation to hypothesis testing
|
| 87 |
+
p-value,in relation to hypothesis testing
|
| 88 |
+
Z-score,are two populations different?
|
| 89 |
+
T-score,are two samples different?
|
| 90 |
+
F-score,are any of 3 or more samples different?
|
| 91 |
+
Chi-squared-score,are two categorical variables associated?
|
| 92 |
+
Type I error,in relation to hypothesis testing
|
| 93 |
+
alpha,in relation to hypothesis testing
|
| 94 |
+
Type II error,in relation to hypothesis testing
|
| 95 |
+
beta,in relation to hypothesis testing
|
| 96 |
+
power analysis,in relation to experimental design in biology and hypothesis testing
|
| 97 |
+
effect size,in relation to hypothesis testing and power analysis
|
| 98 |
+
test statistic,"Any calculated value that measures the difference between experimental groups (e.g. control vs. treatment). The larger the value, the more likely the null is false and your groups are truly different. Restated, a larger test statistic makes a true positive more likely and a false positive less likely."
|
| 99 |
+
Testable Hypothesis,A hypothesis that can be evaluated through empirical evidence and experimentation.
|
| 100 |
+
Falsifiable Hypothesis,A hypothesis that allows for the possibility of being disproven by data.
|
| 101 |
+
Prediction vs. Hypothesis,A prediction is a specific expected outcome of an experiment based on a hypothesis.
|
| 102 |
+
Environmental Factors in Foraging Behavior of white-footed mice,"Includes habitat type, availability of refuge, predator presence, moonlight, and temporal effects."
|
| 103 |
+
Scientific Research Question,"A specific and clear question that guides an investigation and is based on observations. Include information that is specific to the population, experimental design, and collected data."
|
| 104 |
+
Background Research in Science,The step where existing literature and prior knowledge are reviewed to refine research questions.
|
| 105 |
+
Keefe's Hypothesis Trick,A strategy for writing clear hypotheses by imagining the axes and predicted outcomes in a graph. For example - X axis is positively correlated with Y axis
|
| 106 |
+
Alternative Hypothesis,A statement suggesting that there is a meaningful relationship or effect between variables.
|
| 107 |
+
Null Hypothesis,The assumption that there is no effect or difference unless proven otherwise. It is easier to disprove a null than prove an alternative hypothesis
|
| 108 |
+
Parametric statistical tests,
|
| 109 |
+
Three Assumptions of most parametric stat tests,"Normality, homoscedasticity (equal variance between groups), and data independence."
|
| 110 |
+
skewness, of a distribution; skewed left or right
|
| 111 |
+
kurtosis, of a distribution; platykurtic and leptokurtic
|
| 112 |
+
Common data transformations, square root; natural log; log10; multiplicative inverse; or rank transformation as a last resort
|
| 113 |
+
How to choose the right transformation?, Choose the transformation that best meets normality assumptions while preserving interpretability.
|
| 114 |
+
back transformation,why is this necessary in statistics after a transformation?
|
| 115 |
+
The Kolmogorov-Smirnov (KS) test,
|
| 116 |
+
The Shapiro-Wilk test,
|
| 117 |
+
D'Agostino's K^2 test,
|
| 118 |
+
How robust is the assumption of normality?,"Very, unless the sample size is low or the data are highly skewed. Run both non-parametric and parametric tests and see if the results agree."
|
| 119 |
+
Continuous probability distributions common in biological systems,
|
| 120 |
+
Discontinuous probability distributions common in biological systems,
|
| 121 |
+
distributions,Within the context of statistics and populations
|
| 122 |
+
hypothesis testing,Within the context of statistics and populations
|
| 123 |
+
null hypothesis,Within the context of statistics and populations
|
| 124 |
+
The wisdom of crowds,Within the context of statistics and populations
|
| 125 |
+
Stabilizing selection,an example of a mechanism that creates normal distributions in nature.
|
| 126 |
+
The Central Limit Theorem,Within the context of statistics and populations
|
| 127 |
+
Standard error,Standard error of the mean tells you how accurate your estimate of the real mean is likely to be.
|
| 128 |
+
Confidence intervals,
|
| 129 |
+
The difference between standard error and standard deviation,
|
| 130 |
+
Data Visualization in Biology,Questions to ask yourself: How does my data vary? Are variables correlated? Are there outliers?
|
| 131 |
+
bar plot,
|
| 132 |
+
scatterplot,
|
| 133 |
+
line graph,
|
| 134 |
+
histogram,
|
| 135 |
+
box plot,
|
| 136 |
+
violin plot,
|
| 137 |
+
heat map,
|
| 138 |
+
pie charts,this is an awful choice for many reasons.
|
| 139 |
+
O-ring failure on the Challenger and bad data visualization,
|
| 140 |
+
ggplot2 and the grammar of graphics,
|
| 141 |
+
descriptive statistics,
|
| 142 |
+
robust,this is in reference to the robustness of a model or descriptive statistic.
|
| 143 |
+
centrality and variation in statistics,
|
| 144 |
+
interquartile range,
|
| 145 |
+
standard deviation,
|
| 146 |
+
variance,this is in reference to the descriptive statistic.
|
| 147 |
+
sum of squares,
|
| 148 |
+
Anscombe's quartet,
|
| 149 |
+
range,this is in reference to the descriptive statistic.
|
| 150 |
+
When should you look at your data?,"Early and often, with data visualization. Science is an iterative process in which the initial experimental plans often won't work because of unknown patterns in your population. Examining your sample visually can reveal those patterns, such as outliers or unusual distributions."
|