TERM,CONTEXT
general course content questions,"""Here is the course syllabus for context:---## **COURSE DESCRIPTION \- *Data Analysis and Design for Biologists (4 credits)***This course is a practical introduction to information literacy experimental design and data analysis for biologists. Students will be introduced to coding data management and quantitative analysis using the R programming language and the RStudio IDE. However this is not a traditional statistics course and no math prerequisites are required. Rather this course focuses on practical skills related to effectively asking and answering biological questions with data.## **COURSE LEARNING OUTCOMES**1. *Create* testable hypotheses addressing valid biological questions. 2. *Evaluate* the credibility and value of relevant scientific information. 3. *Design* experiments that effectively test hypotheses. 4. *Construct* figures that effectively communicate data. 5. *Perform* appropriate quantitative and statistical analyses on experimental data. 6. *Interpret* the results of quantitative statistical models and associated analyses. 7. *Utilize* the R programming language for scientific data analysis and graphing. 8. *Combine* the elements of a complete investigative cycle in a student designed project. 9. *Explore* the modern intersection between different subfields of biology technology and data science. 10. *Examine* the ethical responsibilities of scientists when creating and communicating scientific evidence.## ## **CONTACT AND SCHEDULING INFO**Dr. 
Keefe Reuther (he/him/his) (Please call me Keefe) [Email address: kdreuther@ucsd.edu](mailto:kdreuther@ucsd.edu) (please put BILD 5 in the subject line) LECTURE TIME MWF 1:00-1:50pm Center Hall Room 101FINAL EXAM TIME Friday March 21 2025; 11:30a-2:30p PSTINSTRUCTIONAL ASSISTANTS AND MEETING TIMES Name Email Address Keerthana [kananda@ucsd.edu](mailto:kananda@ucsd.edu) Joseph [jsc011@ucsd.edu](mailto:jsc011@ucsd.edu) Jia [jbhavnani@ucsd.edu](mailto:jbhavnani@ucsd.edu)SECTION MEETING TIMES B01 M 12:00p \- 12:50p Peterson Hall 104 B02 W 3:00p \- 3:50p WLH 2204 B03 W 4:00p \- 4:50p WLH 2204 B04 F 5:00p \- 5:50p Pacific Hall 3500KEEFE’S OFFICE HOURS Monday and Friday 2:30-4pm HSS 8016PREREQUISITES None\! And you don’t need any experience coding or working in a lab\!QUESTIONS? This term we will be using Piazza for class discussion. The system is highly catered to getting you help fast and efficiently from classmates the TA and myself. Rather than emailing questions to the teaching staff I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers email team@piazza.com. Find our class page at: [https://piazza.com/ucsd/winter2025/bild5\_wi25\_b00/info](https://piazza.com/ucsd/winter2025/bild5_wi25_b00/info) COURSE CALENDAR [https://tinyurl.com/keefe-BILD5-W25](https://tinyurl.com/keefe-BILD5-W25)## TECHNOLOGY REQUIREMENTS You will need access to a device that can access a web browser (e.g. Chrome Safari Firefox). This will be to access Canvas Zoom and the UCSD DataHub. While any connected device can typically accomplish this (smart phone tablet laptop) it is highly recommended that you use a laptop or a desktop computer for connecting to the UCSD DataHub. Trust me you don’t want to write code from your phone\! Note that Chromebooks work perfectly well for this course. You are allowed to bring your technology into lecture \- *however it is not necessary*. 
Since section activities are based around R coding it is recommended that you bring your laptop. There are resources on campus available if you have tech needs. Please visit: [https://libraries.ucsd.edu/computing-and-technology/computers-and-laptop-stations/index.html](https://libraries.ucsd.edu/computing-and-technology/computers-and-laptop-stations/index.html)## COURSE CALENDAR| Week | Lecture Topics || :---: | :---- || 1 | Why are science statistics and programming necessary? || 1 | Ok but why are they necessary in biology? || 1 | Data Part 1: types and structures/variables and samples || 2 | Data Part 2: The importance of visualizing data || 2 | || 2 | Data Part 3: Central tendency and variation || 3 | **UNIVERSITY HOLIDAY** || 3 | Data Part 4: Normality \- the Central Limit Theorem and confidence intervals || 3 | || 4 | Hypothesis Testing Part 1: Writing a clear alternative and null || 4 | Hypothesis Testing Part 2: Testing your assumptions || 4 | Hypothesis Testing Part 3: Test statistics and quantifying differences || 5 | || 5 | Hypothesis Testing Part 4: How power p values effect size and sample size affect uncertainty || 5 | || 6 | Hypothesis Testing Part 5: Putting it all together to interpret papers || 6 | Hypothesis Testing Part 6: ANOVA and post-hoc testing || 6 | Hypothesis Testing Part 7: Correlation and quantifying similarities || 7 | **UNIVERSITY HOLIDAY** || 7 | Hypothesis Testing Part 7: Making predictions with linear regression || 7 | || 8 | Experimental Design Part 1 \- Which design to choose? || 8 | Experimental Design Part 2 \- The importance of sampling || 8 | Experimental Design Part 3 \- Practical and ethical considerations || 9 | What’s next? Part 1 \- Where do you go from here? multivariate models resampling ordination bioinformatics… || 9 | || 9 | What’s next? Part 2 \- Labs research careers and the issues facing modern science culture. 
|| 10 | || 10 | Catch up review and term project help || 10 | |## ## DISCUSSION SECTION TOPICS# | Week 1 | Introduction to Datahub and RStudio \- “Hello World” || :---- | :---- || Week 2 | Importing data saving files making objects || Week 3 | Data visualization || Week 4 | Tidyverse and data wrangling/cleaning || Week 5 | Review || Week 6 | Test for normality; t-test part 1 || Week 7 | t-test part 2 || Week 8 | ANOVA || Week 9 | Linear regression and correlation || Week 10 | Work on final project |## ## DELIVERABLES# | All due times are 11:59pm PST | || ----- | :---- || Week 1 | Extra Credit Surveys; Discussion Board Prompt; Section Assignment || Week 2 | Theory Quiz Section Assignment || Week 3 | Discussion Board Prompt; Section Assignment || Week 4 | Theory Quiz Section Assignment || Week 5 | Discussion Board Prompt || Week 6 | Theory Quiz Section Assignment || Week 7 | Discussion Board Prompt; Section Assignment || Week 8 | Section Assignment Term Project CP \#1 || Week 9 | Discussion Board Prompt; Section Assignment; Term Project CP \#2 || Week 10 | Theory Quiz Section Assignment; Extra Credit Surveys || Finals Week | Final Exam; Term Project |## GRADING| Lecture participation (6 missed lecture classes OK) | 5% || :---- | :---- || Theory Quizzes (drop lowest score) | 15% || Section Assignments (drop lowest score) | 25% || Discussion Board Prompts (drop lowest score) | 10% || Term Project Checkpoints | 10% || Term Project | 15% || Final Exam | 20% || Extra Credit Surveys & SETs | up to 1% |POLICY ON COLLABORATION Working together is good\! Science is a social act and we want this course to mirror the real world of biology. That being said we also need to adhere to our pledge to act with integrity. Therefore you may help each other in general. This means explaining concepts definitions processes etc. to each other. You may also talk about and share code with each other. Copying and pasting code is an everyday tactic. 
However your final answers and responses must be your own and written in your own words. There is to be absolutely no sharing of answers. We will frequently ask for you to annotate your code and explain what your code is doing. This as well as all other written work must be original. All assignments (including RMarkdown pdfs) will be run through a plagiarism checker. At the end of the day you are here to learn this material so you can be a better biologist. Focus on learning and grades will come as an indirect wonderful consequence.DISCUSSION BOARD PROMPTS During weeks 1 3 5 7 and 9 there will be a discussion prompt placed in the Discussions section of Canvas prior to Monday morning. You must make one substantive response to the prompt before Wednesday @11:59pm. You must respond to another student’s reply by Sunday night at 11:59pm. For a response to count for credit it must be original substantive and properly cited (if necessary). Generally this means a small paragraph. Replies of “I agree” do not count as substantive.THEORY QUIZZES During weeks 2 4 6 and 10 there will be a 60 minute Canvas quiz due by 11:59pm PST on Sunday. Each quiz is non-cumulative and will consist of multiple choice and short answer applied and theoretical questions encompassing the material since the previous quiz. You will not be expected to do any coding or use RStudio during the quiz.FINAL EXAM \- MANDATORY AND IN-PERSON \- YOU MUST TAKE IT AT THE SCHEDULED TIME You will have 3 hours for the final exam even though it will not be written to take the entire time. This will be a cumulative exam mirroring the structure of an extended theory quiz. A study guide with relevant topics will be provided beforehand. You may bring 1 sheet of 8.5”x11” paper with information of your choosing on both sides. 
You can add anything but it must be of your own creation \- we will collect them at the end.TERM PROJECT This project will allow you to go through an entire investigative cycle on your own from the design of your own question through being provided with simulated data to analyze interpret and report. You will receive instructor feedback after each step. Please take heed of the feedback as grading will get progressively more stringent. See individual rubrics on Canvas for more information. Each step should be adequately researched and cited using core principles of scientific literacy. While the data is fake your project should be realistic relevant and at least moderately original. This should be a product that you can put in your portfolio for future interviews. Who knows? Maybe it’ll inspire your next research project in graduate school\!EXTRA CREDIT This can be earned by completing course evaluations and related surveys which aim to improve the course and the educational experiences of your future peers. There are no other opportunities for extra credit beyond what is assigned by the course instructor. ---## GRADING| A+ | 97-100% | B+ | 87-89% | C+ | 77-79% | D+ | 67-69% | F | 0-59% || :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- || A | 93-96% | B | 83-86% | C | 73-76% | D | 63-66% | | || A- | 90-92% | B- | 80-82% | C- | 70-72% | D- | 60-62% | | |**Grade cut-offs will never be shifted and there is no rounding of points.**---LATE ASSIGNMENT POLICY**Lecture Participation:** Missing a lecture facing technical difficulties (e.g. your phone dies) or failing to answer at least 50% of Mentimeter questions doesn’t affect your attendance. **Action:** No action is necessary. You can miss up to six lectures (two weeks) and still maintain 100% attendance regardless of the reason.**Theory Quiz:** Missed a Theory Quiz? **Action:** Don’t worry—your lowest quiz score is automatically dropped no matter the reason. 
However if you have an unavoidable issue preventing you from taking a quiz email Keefe ([**kdreuther@ucsd.edu**](mailto:kdreuther@ucsd.edu)) directly to discuss your situation.**Section Assignment Discussion Board Final Project or Checkpoints:** Submitted late? **Action:** A 1% penalty applies for each hour the assignment is late. If you face an unavoidable issue (e.g. hospitalization) contact Keefe ([**kdreuther@ucsd.edu**](mailto:kdreuther@ucsd.edu)) promptly. Keep in mind technical difficulties right before the deadline are not valid reasons for exceptions. Your lowest scores for discussion board and section assignments will still be dropped.**Final Exam:** Can’t attend due to unavoidable circumstances? **Action:** The final exam is in-person synchronous and mandatory. It will not be rescheduled unless you qualify for an OSD accommodation. If you can’t attend due to an unavoidable issue (e.g. illness) notify Keefe ([**kdreuther@ucsd.edu**](mailto:kdreuther@ucsd.edu)) immediately and follow the University’s incomplete grade policy to make alternate arrangements.---ACADEMIC INTEGRITY POLICY ON GENERATIVE AI*Philosophical Overview:*Generative AI is neutral by nature neither good nor bad. Its value hinges on how it's applied. We acknowledge AI's potential to both elevate and diminish the academic experience. While it's a powerful tool for the digital age and essential for our future it doesn't absolve us from upholding academic integrity and opposing plagiarism.*Personal Responsibility and Accountability:*You own your work. AI can assist but it shouldn't be the main contributor. If your work appears overly dependent on AI expect an oral quiz to test your understanding. Remember: mastering AI like any skill takes effort. Over-relying on it shortchanges your education and has lasting consequences.*Attribution and Documentation:*Using AI-generated content? 
Document:* Prompts given to the AI: "\<List prompt(s)\>" * AI's direct output: "\<Paste the output generated by the AI system\>" * Your modifications to the output: "\<explain the actions taken\>" * How did you verify the information? Did you run the code or did you have to fact-check online?*Disclaimer on Generative AI:*Generative AI such as LLMs can sometimes produce misleading or false information. Be especially wary with images. You're accountable for every submission AI-assisted or not. ALWAYS fact-check AI-generated content before submission.*Support and Resources:*Need help? Contact your instructor IAs or the following UC San Diego support centers:* The Commons’ Academic Achievement Hub: Learning strategies tutoring and supplemental instruction. * The Commons’ Writing Hub: Writing and presentation help. * The Library: Research guidance. * The Academic Integrity Office: Queries about ethical GenAI use.---## CAMPUS POLICIES* ### [UC San Diego Principles of Community](https://ucsd.edu/about/principles.html) * ### [UC San Diego Policy on Integrity of Scholarship](http://senate.ucsd.edu/Operating-Procedures/Senate-Manual/Appendices/2)* ## [Religious Accommodation](https://senate.ucsd.edu/operating-procedures/educational-policies/courses/epc-policies-on-courses/policy-exams-including-midterms-final-exams-and-religious-accommodations-for-exams/)* ## [Nondiscrimination and Harassment](https://ophd.ucsd.edu/policies-procedures/nps_student.html)* ## [UC San Diego Student Conduct Code](https://students.ucsd.edu/_files/student-conduct/ucsandiego-student-conduct-code_interim-revisions1-16-18.pdf)*Diversity and equity statement*It is important for us to make sure that how we teach this course and how we accommodate different student needs reflects the differences of race ability sexual orientation age and gender identity that enrich our classroom experience and campus. If you have any concerns related to diversity and equity in the course please contact the instructor. 
If you find yourself in an uncomfortable situation ask for help. The university is committed to upholding policies regarding nondiscrimination sexual violence and sexual harassment. ---## STUDENT RESOURCES| Learning and Academic Support | || :---- | :---- || [Ask a Librarian: Library Support](https://library.ucsd.edu/ask-us) *Chat or make an appointment with a librarian to focus on your research needs* [Course Reserves Connecting from Off-Campus and Research Support](https://library.ucsd.edu/ask-us/triton-ed.html) *Find supplemental course materials* [First Gen Student Success Coaching Program](https://successcoaching.ucsd.edu/) *Peer mentor program that provides students with information resources and support in meeting their goals* [Office of Academic Support & Instructional Services (OASIS)](https://oasis.ucsd.edu/) *Intellectual and personal development support* | [Writing Hub Services in the Teaching \+ Learning Commons](https://commons.ucsd.edu/academic-support/writing/index.html) *One-on-one online writing tutoring and workshops on key writing topics* [Supplemental Instruction](https://aah.ucsd.edu/supplemental-instruction-study-group/) *Peer-assisted study sessions through the Academic Achievement Hub to improve success in historically challenging courses* [Tutoring – Content](https://aah.ucsd.edu/content-tutoring/) *Drop-in and online tutoring through the Academic Achievement Hub* [Tutoring – Learning Strategies](https://aah.ucsd.edu/learning-strategies/) *Address learning challenges with a metacognitive approach* || | || | || | || Support for Well-being and Inclusion | || [Basic Needs at UCSD](https://basicneeds.ucsd.edu) *Any student who has difficulty accessing sufficient food to eat every day or who lacks a safe and stable place to live is encouraged to contact:* [foodpantry@.ucsd.edu](about:blank) | [basicneeds@ucsd.edu](mailto:basicneeds@ucsd.edu) | (858) 246-2632 [Counseling and Psychological Services](https://wellness.ucsd.edu/caps) *Confidential 
counseling and consultations for psychiatric service and mental health programming* [Triton Concern Line](https://blink.ucsd.edu/instructors/advising/concern/) *Report students of concern:* (858) 246-1111 [Office for Students with Disabilities (OSD)](https://disabilities.ucsd.edu/) *Supports students with disabilities and accessibility across campus* | [Community and Resource Centers](https://students.ucsd.edu/student-life/diversity/) [Office of Equity Diversity and Inclusion](https://diversity.ucsd.edu/) *As part of the [Office of Equity Diversity and Inclusion](https://diversity.ucsd.edu/) the campus community centers provide programs and resources for students and contribute toward the evolution of a socially just campus* (858) 822-3542 | [diversity@ucsd.edu](mailto:diversity@ucsd.edu) [Get Involved](https://getinvolved.ucsd.edu/) *Student organizations clubs service opportunities and many other ways to connect with others on campus* [Undocumented Student Services](https://uss.ucsd.edu/) *Programs and services are designed to help students overcome obstacles that arise from their immigration status and support them through personal and academic excellence* || | |---## PRIVACY PRACTICES IN THIS COURSE (From [https://cio.ucop.edu/privacy-tips-for-your-syllabus/](https://cio.ucop.edu/privacy-tips-for-your-syllabus/)) This course is a community built on trust; as a learning community we are collectively responsible for upholding privacy protections. In order to create a community built on trust and the most effective learning experience our interactions discussions and course activities must remain private and free from external intrusion. We have obligations to each other to preserve privacy and cultivate fearless inquiry. 
We respect the individual dignity of all and will refrain from actions that diminish others’ ability to learn. As your instructor I am committed to protecting your privacy by only using university-approved course technologies and adhering to the Family Educational Rights and Privacy Act (FERPA) [https://catalog.ucsd.edu/about/policies/notification-of-rights/index.html](https://catalog.ucsd.edu/about/policies/notification-of-rights/index.html) and Campus Privacy Office guidelines. This includes using your educational data only as allowed by FERPA for example for legitimate educational purposes such as submitting your final grades to the registrar’s office. Please note the following privacy practices for our course: Course platform. This course uses Canvas Datahub Zoom and Gradescope which collect information about your engagement with course materials. I will review this information periodically to ensure students are engaged and look for signs of students falling behind. I will also review this information in case of academic misconduct allegations if relevant. Using learning materials. Course materials (videos assignments quizzes etc.) are for use in this course only. You may not upload them to external sites share them with students outside of this course or post them for public commentary without my written permission. Unauthorized sharing or uploading of exam questions test answers or summaries of exams is prohibited. Sharing student information. You may work on group projects with other students or be asked to review or respond to their work. Other materials and activities may provoke debate argument or spirited discussion; some of us may volunteer sensitive personal information. Do not share others’ personal information including class dialogue or performance on sensitive topics outside of our course community. 
Student work discussion posts and all other forms of student information related to this course should be handled with respect and remain within interactions of this course. You may publicly post your own work provided it does not violate academic dishonesty policies or show responses to assessments; public posting of group work requires consent from all group members. Research conducted as part of a class is subject to UC research policies and may include sensitive information. Students may not share research information without permission from the instructor. Sharing course information with others. Do not post images or identifiable conversations that occur in class to social media or to those beyond our learning community. Sharing private information about our course community (including discussions activities presentations student work etc.) with others for the purpose of inviting external attention intrusion ridicule or harassment is an egregious breach of trust. If you have concerns after reviewing these privacy practices I invite you to reach out to the instructor. ---## LETTERS OF RECOMMENDATION: If you think you may want me to write you a letter of recommendation (or any other instructor) please consider what a good letter would contain and how your actions in the course demonstrate the qualities you will want highlighted in a good letter. When students ask me for a letter of recommendation I ask them to write to me about how they demonstrated critical thinking leadership collaboration and professionalism. I will be specifically looking for examples of these qualities *that I could have noticed* during lab/lecture and office hours. Be sure to actively participate in the discussions talk to me during the lab/lecture and my office hours: ask questions offer your own ideas and interpretations of your results bring interesting facts/papers that are connected to the material we are studying. 
If you don’t actively show the qualities that are needed to write a good letter it will be hard for me to write a letter that is meaningful and useful. Therefore in order to ensure I am able to write substantive insightful and informative letters I am only able to potentially write letters for students who fit the following criteria:* If you are applying for an undergraduate grant/job/program: * You must have received at least a “B-” grade in one of my courses or be actively enrolled and passing one of my courses. * You must have met with me more than once outside of class (e.g. office hours/Coffee with a Prof) OR the letter is more than 2 months from being due and we can schedule at least one significant meeting. * If you are applying for a post-graduate program (e.g. med school post-bacc program) * You must satisfy ONE of the following criteria: * You received an “A-” or higher in one course. * You took more than one class with me and demonstrated clear improvement. * You worked with me as an Instructional Assistant or in some other professional capacity. * AND you must ALSO satisfy ONE of the following criteria: * We have met and talked multiple times in a way where I was reasonably able to assess your potential for your applied program. Examples include office hours IA meetings or Coffee-with-a-prof. * The due date for the letter is greater than one academic quarter away and you commit to meeting with me at least once to have a formal conversation. """
Help with a code bug in R,For this option you should prompt the user to copy and paste the relevant code and the error as it appears in the console. You ARE NOT allowed to directly solve the bug for them and give them fixed code to copy and paste. Instead you will Socratically ask them guiding questions that logically guide them to solving it for themselves. Use palmer penguins for your examples. The issue will be in relation to RStudio R and Rmd files.
How to choose the proper statistical test,Make sure to walk them through data types and the number of variables and limit the discussion to tests listed in the term list
covariance,in relation to experimental design in biology and hypothesis testing
correlation,in relation to experimental design in biology and hypothesis testing
linear regression,in relation to experimental design in biology and hypothesis testing
ordinary least squares,in relation to experimental design in biology and hypothesis testing
q-q plot,in relation to experimental design in biology and hypothesis testing
ks.test() in R,"Provide examples assuming they have a dataframe, a column, a mean, and a standard deviation"
One-tailed and two-tailed t-tests,
ANOVA tests,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
2 sample t-test,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
paired t-test,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
Chi-Squared test,"A focus on hypotheses, EDA, testing assumptions, coding in R, and interpreting results"
Corrections for multiple comparisons,Bonferroni and Tukey HSD with Tukey HSD being preferred for ANOVA post-hoc tests
Fligner-Killeen test,a preferred test for homoscedasticity
The Palmer Penguins dataset,the primary example dataset used in class. Assume students do not know the names and data types of the columns so they should be frequently provided for context.
iris R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
ToothGrowth R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
sleep R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
chickwts R dataset,help students understand this data and how to address a scientific question and make a ggplot WITHOUT referencing palmerpenguins
confusion matrix,2 by 2 table of actual reality versus our data; shows true positive and false positive and true negative and false negative
statistical power,in relation to hypothesis testing
p-value,in relation to hypothesis testing
Z-score,are two populations different?
T-score,are two samples different?
F-score,are any of 3 or more samples different?
Chi-squared-score,are two categorical variables associated?
Type I error,in relation to hypothesis testing
alpha,in relation to hypothesis testing
Type II error,in relation to hypothesis testing
beta,in relation to hypothesis testing
power analysis,in relation to experimental design in biology and hypothesis testing
effect size,in relation to hypothesis testing and power analysis
test statistic,Any calculated value that measures the difference between experimental groups (e.g. control vs. treatment). The larger the value the more likely the null is false and your groups are truly different. Restated: a larger test statistic makes a true positive more likely and a false positive less likely.
Testable Hypothesis,A hypothesis that can be evaluated through empirical evidence and experimentation.
Falsifiable Hypothesis,A hypothesis that allows for the possibility of being disproven by data.
Prediction vs. Hypothesis,A prediction is a specific expected outcome of an experiment based on a hypothesis.
Environmental Factors in Foraging Behavior of white-footed mice,"Includes habitat type, availability of refuge, predator presence, moonlight, and temporal effects."
Scientific Research Question,"A specific and clear question that guides an investigation and is based on observations. Include information that is specific to the population, experimental design, and collected data."
Background Research in Science,The step where existing literature and prior knowledge are reviewed to refine research questions.
Keefe's Hypothesis Trick,A strategy for writing clear hypotheses by imagining the axes and predicted outcomes in a graph. For example - X axis is positively correlated with Y axis
Alternative Hypothesis,A statement suggesting that there is a meaningful relationship or effect between variables.
Null Hypothesis,The assumption that there is no effect or difference unless proven otherwise. It is easier to disprove a null than prove an alternative hypothesis
Parametric statistical tests,
Three Assumptions of most parametric stat tests,"Normality, homoscedasticity (equal variance between groups), and data independence."
skewness, of a distribution; skewed left or right
kurtosis, of a distribution; platykurtic and leptokurtic
Common data transformations, square root; natural log; log10; multiplicative inverse; or rank transformation as a last resort
How to choose the right transformation?, Choose the transformation that best meets normality assumptions while preserving interpretability.
back transformation,why this is necessary in statistics after a transformation
The Kolmogorov-Smirnov (KS) test,
The Shapiro-Wilk test,
D'Agostino's K^2 test,
How robust is the assumption of normality?,Very robust unless the sample size is small or the data are highly skewed; run both a non-parametric test and a parametric test and see if the results agree.
Continuous probability distributions common in biological systems,
Discontinuous probability distributions common in biological systems,
distributions,Within the context of statistics and populations
hypothesis testing,Within the context of statistics and populations
null hypothesis,Within the context of statistics and populations
The wisdom of crowds,Within the context of statistics and populations
Stabilizing selection,an example of a mechanism that creates normal distributions in nature.
The Central Limit Theorem,Within the context of statistics and populations
Standard error,The standard error of the mean tells you how accurate your estimate of the true mean is likely to be.
Confidence intervals,
The difference between standard error and standard deviation,
Data Visualization in Biology,Questions to ask yourself: How does my data vary? Are variables correlated? Are there outliers?
bar plot,
scatterplot,
line graph,
histogram,
box plot,
violin plot,
heat map,
pie charts,this is an awful choice for many reasons.
O-ring failure on the Challenger and bad data visualization,
ggplot2 and the grammar of graphics,
descriptive statistics,
robust,this is in reference to the robustness of a model or descriptive statistic.
centrality and variation in statistics,
interquartile range,
standard deviation,
variance,this is in reference to the descriptive statistic.
sum of squares,
Anscombe's quartet,
range,this is in reference to the descriptive statistic.
When should you look at your data?,Early and often with data visualization. Science is an iterative process where the initial experimental plans often won't work because of some unknown patterns in your population. Examining your sample visually can help show those to you - like outliers or weird distributions.
Science,Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence.
PPDAC - Problem,
PPDAC - Plan,
PPDAC - Data,
PPDAC - Analysis,
PPDAC - Conclusion,This step cycles back to 'PPDAC - Problem' because science is an iterative process that creates new questions and directions.
Biological biases,
Cognitive biases,
Confirmation bias,
Availability heuristic,
Anchoring bias,
Dunning-Kruger effect,
Why are statistical and programming knowledge useful for a career in biology?,1. Biological data is messy and complex. 2. Biology uses BIG data. 3. These skills save lives. 4. These skills will help your career.
Big data in biology,
Name a disease that is influenced by many different genetic and environmental factors,
a priori hypotheses,
Which natural process is most similar to machine learning?," ""evolution by natural selection"""
How will careers in biology be affected by generative AI?," ""The barriers to learning new skills are falling. The need to learn programming and data analysis is greater now than it was in 2020! Some jobs will be automated. Those with the rarest and most useful combination of skills will be sought after."""
Data types - continuous/numerical,
Data types - count/integer,
Data types - ordinal,
Data types - categorical,
Data types - binomial,
Null data,Data missing in a dataset. It is important to appropriately deal with this data depending on the nature of the statistical analysis and experimental design.
tidy data,
R programming - objects,
R programming - functions,
R programming - Rmd file format,
RStudio,
print() in R,