-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path04-chap4.Rmd
103 lines (84 loc) · 4.19 KB
/
04-chap4.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
---
output:
html_document: default
---
# More R Programming {#chap4}
## Data Structures
Data structures are fundamental to any computer language. You have already seen some: vectors, matrices, and arrays. In fact, matrices and arrays are really special types of vectors. Matrices and arrays have a special property: they possess dimensions. Try this:
```{r, eval=FALSE}
v <- 1:9
v
vmat <- v; dim(vmat) <- c(3, 3)
vmat
dim(vmat) <- NULL
vmat
```
See that you can turn a vector into a matrix, and you can turn a matrix into a vector. The key is the ``dim()`` function which allows you to set and get the dimensions of a matrix or vector. The same applies to higher dimensional objects (arrays):
```{r, eval=FALSE}
avec <- 1:16
arr <- array(avec, c(4, 2, 2))
arr
dim(arr)
```
## Lists
Vectors, matrices and arrays are of limited usefulness: they only allow the elements of these data structures to be of the same __type__. That is, vectors (and matrices and arrays) can only have one type of data in them. For example, you can have a vector of numbers:
```{r}
1:10
```
or a vector of characters:
```{r}
LETTERS[1:10]
```
but you can't have a vector that mixes characters and numbers:
```{r}
vec <- c(1, "A")
vec
```
See that R turns the number (1) into a character ("1"). These are not the same thing! This behaviour may or may not be important to your program but it is crucial that you know about it. So how do you combine, say, characters and numbers? You use another data structure: the list.
```{r}
mylist <- list(1, "A")
mylist
```
Here we have a list with two __elements__. The first element is a numeric vector of length 1. It's just the number 1. The second element is a character vector of length 1. It is just the character "A." The way R prints out lists gives us a way to access elements, using the square bracket notation (``[[``, ``[``, etc). So say we want the first element of ``mylist``. We can do:
```{r}
mylist[[1]]
```
Similarly for the second element. (Try it!) Another useful aspect of lists is that the elements can have names:
```{r}
names(mylist) <- c("Number", "Letter")
mylist
mylist$Letter
```
See that if the list has named elements, we can use the ``$`` notation. This notation is less general: lists will always have an ordered set of elements but lists don't always have names attached to the elements. The ``$`` notation is more readable, however.
Note also that unlike vectors and matrices, you can have "nested" lists: lists within lists. This is a very useful property in many circumstances:
```{r}
mylist <- list(list("A", 1), list("B", 2))
mylist
names(mylist) <- c("Element1", "Element2")
mylist
mylist[[1]][[2]] ## accesses the second element of Element1.
mylist$Element1[[1]] ## mix and match!
```
## Data Frames
A data frame is a special type of list. Its main difference is that a data frame is a list with all its elements having the same length. This is the way that statistical data are usually represented: The columns of the data frame represent variables. The rows of the data frame represent observations. Thus, a data frame is "rectangular." Here's a simple example:
```{r}
library(ade4)
data(lizards)
dat <- lizards$traits
head(dat)
```
Here we have used the ``lizards`` data set from the ``ade4`` package. This data set is actually a list: It has elements ``traits`` and two phylogenies: ``hprA`` and ``hprB`` I extracted the data frame from this list and looked at the first 6 lines. It is a ``data.frame`` with 8 variables (columns) and 18 observations, one for each species of lizard in the data set. Since ``dat`` is also a list, we can refer to the columns using the square-bracket notation or the ``$`` notation. Also, data frames have another property: they have dimensions:
```{r}
dim(dat)
```
And you can refer to the elements of the data frame as if they were a matrix:
```{r}
dat[3, 4]
```
This gives the value at the 3rd row and 4th column.
Although each column (variable) has to be of the same length, if you have missing data, you can just put ``NA`` which is the missing data character in R:
```{r}
dat[3, 4] <- NA
dat[1:4, 1:5]
```
Usually you will import data from a file into a data.frame. You can have your NA values in the data file and R will understand them and import them with the rest of the data.