-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path02-chap2.Rmd
24 lines (15 loc) · 4.36 KB
/
02-chap2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
---
output:
pdf_document: default
html_document: default
---
# Week 1: R Programming I{#chap2}
## Why Programming? Why Statistics? Why R?
Computer programming has become an increasingly valued skill for biologists. Computer programming allows us to express complicated ideas in a formal way such that they can be analysed and evaluated using a computer. Thus, the computer __keeps us honest__ with regard to our ideas. If we can't explain our scientific problem to a computer, we haven't thought it through well enough. Indeed, since [Alan Turing](https://en.wikipedia.org/wiki/Alan_Turing) showed that a computer can emulate any process that can be described by an algorithm, we might ask, "What are computers for?" Many of you will be used to thinking that computers are used for word processing, spreadsheets, email, Instagram etc. and that is true. But I think the answer to the question goes much deeper than this. The best answer that I can think of is, "__Computers are machines for thinking with.__" We use computers as an aid to thinking about the world. Computers can take the drudgery out of many scientific processes and leave us with room to think about the "big" questions. Computers are used in many situations in biology. In this course, we will be concerned with using computers to statistically and dynamically __model__ aspects of data and biological processes.
## Why Statistics?
The biological world is messy. There is no way we can treat organisms the way atoms and molecules are treated in physics and chemistry. Even if we knew all the physical and chemical properties of all the molecules in an organism, we still could not predict accurately what that organism looks like or how it will behave so we need to use models that incorporate uncertainty and randomness in a principled way. This is why we do statistics. Statistical models have probability "built in." Statistical models allow us to draw conclusions from data and form a way of thinking about organisms that takes randomness and uncertainty into account.
Statistics is at the very heart of biology: the mechanism of natural selection requires that organisms differ, and differences among organisms are ultimately due to the random process of mutation. Mutation is necessary for selection to work. In eukaryotic cells, assortment of chromosomes occur at random, and there can be random "crossing over" events along a chromosome. Thus, statistical models help us to understand how and why organisms have evolved.
We will be modelling the properties of __data__. Data are crucial to the scientific process and it is important to treat them fairly and gently. We will be using a variety of statistical methods to study data. From previous courses, you may be under the impression that statistics is about performing various tests on your data to draw conclusions. It is true that this is one aspect of statistics. But at a more fundamental level, we seek a good __model__ for our data. After obtaining a good model, we find that the statistical tests are easy. They pretty much look after themselves.
## Why R?
R is a computer language, similar in many respects to other computer languages that you may have heard of, such as Python, C, and C++. There are many reasons why R is the language of choice for doing statistical modelling. It is free (both free as in "free beer" and free as in "free speech"). It was originally written by statisticians for statisticians so there are many aspects of the language that are ideally suited for doing statistical modelling. The language is relatively easy to read, write, and understand. There are thousands of add-on packages available for R that can be used to do many diverse forms of analysis. R has become the "lingua franca" of statistics. R is used by research statisticians to develop new methods, so chances are that you will have access to the latest methods for doing data analysis. Other packages can take many years to become up to date with well-known methods.
Having a good knowledge of programming in R is a skill that employers will find valuable. Even if you don't end up using R, and instead use some other software, you will find that learning a new system is made much easier because you already have experience with R. Like spoken languages, knowing one language can help you learn another. You can compare and contrast. Often you will find that other systems for statistical analysis are very inferior to R!