Week 2 Conditionals and IF Statements

In R, conditional statements or arguments are used to compare or analyse values/data based on certain conditions. In general, this is done with ‘relational operators’ (==, !=, >, <, >=, <=) and ‘logical operators’ (AND, OR).

2.1 Relational operators

The most basic of the ‘relational operators’ is the equality operator (==), which can be used to check if two objects (values, vectors, matrices etc.) are equal:

4 == 3+1
## [1] TRUE
5^2 == 25
## [1] TRUE
8 %% 5 == 3  # %% is the modulo operator: 8 mod 5 is the remainder of 8 / 5, i.e. 3
## [1] TRUE

This can also be performed on vectors on an element by element basis (as usual):

1:10 == c(1,2,3,4,5,6,7,8,9,10)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
1:10 == c(0,2,3,4,5,6,7,8,9,10)
##  [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Unsurprisingly, it also works element by element on matrices:

matrix(5, nrow = 3, ncol = 3)
##      [,1] [,2] [,3]
## [1,]    5    5    5
## [2,]    5    5    5
## [3,]    5    5    5
matrix(1:9, nrow = 3) == matrix(5, nrow = 3, ncol = 3)
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE  TRUE FALSE
## [3,] FALSE FALSE FALSE
diag(5, nrow = 3, ncol = 3)
##      [,1] [,2] [,3]
## [1,]    5    0    0
## [2,]    0    5    0
## [3,]    0    0    5
diag(5, nrow = 3, ncol = 3) == 5 * diag(1, nrow = 3)
##      [,1] [,2] [,3]
## [1,] TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE

Notice that the equality operator uses a double equals sign (==) rather than a single =. This is because the single equals sign is already used for assignment (similar to <-) and for naming function arguments, e.g. nrow = 3. Mixing the two up is confusing, can easily cause errors and is the main reason I always suggest using <- for variable assignment.
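A related pitfall with == is that it tests for exact equality, which can be surprising with decimal numbers: values such as 0.1 cannot be stored exactly in binary, so tiny rounding errors appear. A minimal sketch of the problem and of one common workaround, base R's all.equal(), which compares numbers up to a small tolerance:

```r
# Exact comparison of decimals can fail because of floating-point rounding
0.1 + 0.2 == 0.3
## [1] FALSE

# all.equal() compares up to a small numerical tolerance;
# wrapping it in isTRUE() guarantees a single TRUE/FALSE result
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```

For whole numbers, such as the examples above, == is perfectly safe.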

Conversely, you can use the not-equal operator (!=) in a similar way:

3 != 5
## [1] TRUE
seq(1, 10, by = 1) != 1:10
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Note - In general, the ! symbol negates a logical (Boolean) value, including the result of any relational comparison, e.g.

!TRUE
## [1] FALSE
!FALSE
## [1] TRUE
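Because ! negates the result of a comparison, it can be combined with any relational operator. A short sketch showing that !(3 == 5) is the same test as 3 != 5, and that ! also works element by element on a logical vector:

```r
# ! flips the result of a comparison, so these two lines are equivalent
!(3 == 5)
## [1] TRUE
3 != 5
## [1] TRUE

# ! also works element by element on a logical vector
!(1:5 > 2)
## [1]  TRUE  TRUE FALSE FALSE FALSE
```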

In a similar way, you should easily be able to understand how the rest of the relational operators (<, >, <=, >=) work. In the following example(s), I will introduce you to one of the many ready-made data sets that come with R in the base datasets package, i.e. mtcars; we will discuss data sets in more detail in the next few weeks.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Exercise 2.1 Assume we want to analyse the hp (horsepower) variable (column) only. Based on what we discussed last week regarding vector/matrix extraction, how can we extract the hp data only?

An alternative method of extraction for data sets (data frames) is to use the $ operator with the column/variable name. Note that this works on data frames (and lists) but not on matrices, whereas square-bracket extraction works for both:

(HP <- mtcars$hp)
##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
## [20]  65  97 150 150 245 175  66  91 113 264 175 335 109
HP > 200
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [25] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
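As a quick check of the claim that $ does not work on matrices, the sketch below converts mtcars to a plain numeric matrix with as.matrix() (the name mt_mat is only illustrative) and uses tryCatch() to capture the error that $ then raises, so the script keeps running:

```r
# mtcars is a data frame, so $ extraction by column name works
is.data.frame(mtcars)
## [1] TRUE

# The same data stored as a plain numeric matrix (illustrative name)
mt_mat <- as.matrix(mtcars)

# $ is not defined for atomic matrices, so R raises an error;
# tryCatch() captures the error object instead of stopping the script
res <- tryCatch(mt_mat$hp, error = function(e) e)
inherits(res, "error")
## [1] TRUE
```

The error message itself reads along the lines of "$ operator is invalid for atomic vectors".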

What do you think will happen if we execute the code sum(HP>200) and mean(HP>200)? Have a think about this then check out the solution when you’re ready.

Solution
sum(HP > 200)
## [1] 7
mean(HP > 200)
## [1] 0.21875
In both of these cases, the conditional statement has produced a vector of TRUE and FALSE (Boolean) values. When used in arithmetic, R treats these as 1 and 0, respectively. Hence, it is possible to take the sum() (a count of the TRUE values) or the mean() (the proportion of TRUE values) of the Boolean values themselves.

The above gives an example of how R treats the Boolean values (TRUE/FALSE) as 1 and 0, respectively, and also gives you an idea of how powerful such simple lines of conditional code can be when used in the right way.
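To make the 1/0 conversion explicit, here is a small sketch that converts a logical vector with as.numeric() and then checks that mean() of the hp test is just the count of TRUE values divided by the number of cars (hp_high is an illustrative name):

```r
# Logical values are converted to 1 (TRUE) and 0 (FALSE) in arithmetic
as.numeric(c(TRUE, FALSE, TRUE))
## [1] 1 0 1

# So sum() counts the TRUEs and mean() gives their proportion
hp_high <- mtcars$hp > 200
sum(hp_high)                    # number of cars with hp > 200
## [1] 7
sum(hp_high) / length(hp_high)  # same value as mean(hp_high), i.e. 7/32
## [1] 0.21875
```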

Exercise 2.2 Can you create a vector of all square numbers from 1 to 100 and count how many of these values are divisible by 3? Moreover, can you determine what percentage of them are NOT divisible by 5?

In the next few weeks, we will look in more detail at how we can use these relational operators (along with the logical operators discussed below) to conditionally extract data/values from a data.frame. This is a very helpful skill to learn for data handling and manipulation.

2.2 Logical operators

‘Logical operators’ are used to check whether several conditions are all satisfied at the same time (AND) or whether at least one of them is (OR). The key to understanding how these work in R is understanding how logical operators work in theory.

Let us begin with the logical operator ‘AND’, which in R is denoted by & or && (I will explain the difference later). For an AND statement/condition to evaluate to TRUE, both conditions in the statement must be TRUE. That is, the condition on the left is TRUE ‘AND’ the condition on the right is TRUE:

TRUE & TRUE
## [1] TRUE
TRUE & FALSE
## [1] FALSE
FALSE & FALSE
## [1] FALSE
pi
## [1] 3.141593
pi > 3
## [1] TRUE
pi < 4
## [1] TRUE
pi > 3 & pi < 4
## [1] TRUE
5 < 10 & 5 < 3
## [1] FALSE

It is also possible to combine more than two conditions and to mix different relational operators. What do you think the following expression will evaluate to, TRUE or FALSE?

pi > 0 & pi < 5 & !(pi %% 2 == 0)
## [1] TRUE

As with relational operators, logical operators can also be used in vector form, where the & operator evaluates on a term by term basis, e.g.

c(1,2,3) < c(2,3,4) & c(2,3,4) < c(3,4,5) # Think about this one a little!
## [1] TRUE TRUE TRUE
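In the example above every comparison happens to be TRUE, which hides the term-by-term behaviour. A sketch with vectors chosen so that the two sides disagree (lhs and rhs are illustrative names) makes the pairing of positions visible:

```r
lhs <- c(1, 5, 3) < c(2, 3, 4)   # TRUE FALSE  TRUE
rhs <- c(2, 3, 6) < c(3, 4, 5)   # TRUE  TRUE FALSE

# & compares position 1 with position 1, position 2 with position 2, ...
lhs & rhs
## [1]  TRUE FALSE FALSE
```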

In fact, these logical/relational operations can also be applied to objects other than numerical values, e.g. ‘character strings’ (note that the comparison is case-sensitive):

"Red" == "Red"
## [1] TRUE
"Red" == "Blue"
## [1] FALSE
"Red" == "red"
## [1] FALSE
c(1, 2, 3) < c(2, 3, 4) & "Red" == "Blue" # How has this worked? The left hand side is a 3 element vector but the right is a single logical element?
## [1] FALSE FALSE FALSE
c(1, 2, 3) < c(2, 1, 4) & "Red" == "Red"
## [1]  TRUE FALSE  TRUE
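The question in the comment above comes down to R's recycling rule: when one side of & is a single value, R reuses (recycles) it for every element of the longer vector. A small sketch checking this (lhs is an illustrative name):

```r
lhs <- c(1, 2, 3) < c(2, 1, 4)   # TRUE FALSE TRUE

# A single TRUE on the right is recycled to c(TRUE, TRUE, TRUE)
identical(lhs & TRUE, lhs & rep(TRUE, 3))
## [1] TRUE

# A single FALSE is recycled in the same way, so every element is FALSE
lhs & FALSE
## [1] FALSE FALSE FALSE
```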

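In contrast to &, which evaluates on a term-by-term basis, the double && is designed for single (length-one) TRUE/FALSE values: it reads from left to right and only evaluates the right-hand side when it needs to. When given vectors, R before version 4.2.0 silently used only the first element of each; R 4.2.x does the same but gives the warning shown below; and from R 4.3.0 onwards this is an error: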

c(1, 2, 3) < c(2, 1, 4) && "Red" == "Red"
## Warning in c(1, 2, 3) < c(2, 1, 4) && "Red" == "Red": 'length(x) = 3 > 1' in
## coercion to 'logical(1)'
## [1] TRUE
c(5, 2, 3) < c(2, 1, 4) && "Red" == "Red"
## Warning in c(5, 2, 3) < c(2, 1, 4) && "Red" == "Red": 'length(x) = 3 > 1' in
## coercion to 'logical(1)'
## [1] FALSE
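The "only evaluates the right-hand side when it needs to" part is called short-circuiting: if the left-hand side of && is FALSE, the result must be FALSE, so the right-hand side is skipped. The sketch below uses stop() as a stand-in for any expression that would fail if it were actually run:

```r
# The right-hand side is skipped, so no error is raised
FALSE && stop("this is never evaluated")
## [1] FALSE

# The single & always evaluates both sides, so the error is triggered;
# tryCatch() captures it so the script keeps running
res <- tryCatch(FALSE & stop("evaluated by &"), error = function(e) e)
inherits(res, "error")
## [1] TRUE
```

This is why && is often used to guard a test, e.g. checking !is.null(x) before comparing x with something.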

The second logical operator is the so-called OR operator, denoted by | or ||, which evaluates to TRUE as long as ‘at least one statement is TRUE’, e.g.

TRUE | TRUE
## [1] TRUE
TRUE | FALSE
## [1] TRUE
FALSE | TRUE
## [1] TRUE
FALSE | FALSE
## [1] FALSE
F | F | T | F  # T and F are shorthand for TRUE and FALSE
## [1] TRUE

The same ideas discussed above for & also apply to |: | evaluates element-wise, whilst || is meant for single values and, like &&, only evaluates its right-hand side when needed.
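A short sketch of both behaviours: | testing each element of a vector against two conditions, and || short-circuiting once its left-hand side is already TRUE (stop() again stands in for code that would fail if run):

```r
x <- c(1, 5, 10)

# | works element by element: is each value below 2 OR above 8?
x < 2 | x > 8
## [1]  TRUE FALSE  TRUE

# || short-circuits: once the left side is TRUE, the right side is skipped
TRUE || stop("this is never evaluated")
## [1] TRUE
```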

Exercise 2.3 With all this in mind, how can we calculate the number of cars in the mtcars data set that have horsepower greater than 200, mpg at most 30, are automatic (am = 0) but do not have 6 cylinders?

Exercise 2.4 The data set VADeaths contains death rates (measured per 1000 population per year) in Virginia, USA, in 1940. This data set is stored as a matrix (not a data frame), with the rows denoting age ranges and the columns sex/area.

  1. How can we find out this information (and possibly more) about the data set?
  2. Extract the two columns containing the female data, either together or separately.
  3. Using conditional arguments, determine how many age groups have a death rate larger than 20 for rural females and a death rate less than 30 for urban females.

2.3 IF statements

‘IF’ statements are extremely popular and powerful tools in programming that are used to execute certain commands based on given conditions. In most cases, the conditions used within IF statements are built up from combinations of the relational and logical operators seen above.

In general, an IF statement has the following form:

if (condition) {
  command
} else {
  command
}
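One layout detail in this template matters in practice: when code is run line by line at the top level (for example, typed into the console), else must sit on the same line as the closing brace } of the if block. Otherwise R treats the if statement as already finished and reports an unexpected else. The sketch below checks this with parse(), which only reads code without running it (bad and good are illustrative names):

```r
# else on its own line after the closing brace: a syntax error at top level
bad <- "if (TRUE) {\n  1\n}\nelse {\n  2\n}"
res <- tryCatch(parse(text = bad), error = function(e) e)
inherits(res, "error")
## [1] TRUE

# "} else {" on one line, as in the template: parses without problems
good <- "if (TRUE) {\n  1\n} else {\n  2\n}"
is.expression(parse(text = good))
## [1] TRUE
```

Inside an enclosing pair of braces (for example, within a function body), R keeps reading and else on a new line is accepted, but the "} else {" layout works everywhere.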

To see how an IF statement works in practice, let us look at a simple example that checks whether a number is odd or even:

x <- 8

if (x %% 2 == 0){
  print("This number is even")
} else {
  print("This number is odd")
}
## [1] "This number is even"

You can make the output even more informative in this example by including the value of \(x\) in the message, using the paste() function. Notice that the variable \(x\) is not in quotation marks but the ‘words’ are.

x <- 14

if (x %% 2 == 0){
  print(paste(x, "is an even number"))
} else {
  print(paste(x,"is an odd number"))
}
## [1] "14 is an even number"

This is quite a simple example, but it is very possible to have longer IF statements with more conditional possibilities. In this case, you can extend the IF statement by adding else if blocks between the if and the else. Once all conditions have been covered, you finish with a final else, which runs when none of the conditions above it are TRUE. For example:

x <- 7

if (x < 0) {
  print(paste(x, "is a negative number"))
} else if (x > 0) {
  print(paste(x, "is a positive number"))
} else {
  print(paste(x, "is Zero"))
}
## [1] "7 is a positive number"
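In R, an IF statement is itself an expression that returns the value of whichever branch runs, so its result can be stored in a variable rather than printed straight away. A short sketch reusing the sign check above (sign_label is only an illustrative name):

```r
x <- 7

# The value of the branch that runs is assigned to sign_label
sign_label <- if (x < 0) {
  "negative"
} else if (x > 0) {
  "positive"
} else {
  "zero"
}
sign_label
## [1] "positive"
```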

Exercise 2.5 Can you create an IF statement which tells you whether a number (x) is divisible by another number (y), where both x and y can be changed (not fixed)? Hint: Use the modulus operator %%.

Looking back at the previous two examples regarding even/odd and positive/negative numbers, we can actually combine these two statements by using logical operators within the IF conditions:

x <- 4

if (x < 0 && x %% 2 == 0) {
  print(paste(x, "is a negative even number"))
} else if (x < 0) {
  print(paste(x, "is a negative odd number"))
} else if (x > 0 && x %% 2 == 0) {
  print(paste(x, "is a positive even number"))
} else if (x > 0) {
  print(paste(x, "is a positive odd number"))
} else {
  print(paste(x, "is Zero"))
}
## [1] "4 is a positive even number"

In fact, you could do this another way by ‘nesting’ IF statements inside one another to make several ‘layers’. There is no right or wrong way to do this; with experience, you will see that either approach can be used depending on the situation.

x <- 3

if (x < 0) {
  if (x %% 2 == 0) {
    print(paste(x, "is a negative even number"))
  } else {
    print(paste(x, "is a negative odd number"))
  }
} else if (x > 0) {
  if (x %% 2 == 0) {
    print(paste(x, "is a positive even number"))
  } else {
    print(paste(x, "is a positive odd number"))
  }
} else {
  print(paste(x, "is Zero"))
}
## [1] "3 is a positive odd number"

What happens if we let \(x\) be a vector?

Note - In an IF statement, the condition (or ‘test statement’) must be a single TRUE/FALSE value. In versions of R before 4.2.0, giving it a vector produced a warning and only the first element of the vector was used, so the output would not be quite what we expect; since R 4.2.0, a condition of length greater than one is an error. With this in mind, if you use a logical operator in an IF statement, it is always best to use the double version, i.e. && or ||, which are designed for single values.
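A sketch of what happens in practice. Because the response depends on your R version (a warning before R 4.2.0, an error from 4.2.0 onwards), tryCatch() here captures either kind of condition. When a single decision about the whole vector is what you actually want, all() or any() reduce a logical vector to one TRUE/FALSE value:

```r
x <- c(1, 2, 3)

# A length-3 condition: a warning in older R, an error since R 4.2.0
res <- tryCatch(
  if (x %% 2 == 0) "even" else "odd",
  warning = function(w) w,
  error   = function(e) e
)
inherits(res, "condition")
## [1] TRUE

# all()/any() turn the vector of tests into a single value
if (all(x %% 2 == 0)) "all even" else "not all even"
## [1] "not all even"
if (any(x %% 2 == 0)) "at least one even" else "no even numbers"
## [1] "at least one even"
```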

That being said, it is possible to bypass such a problem using the ifelse() function. The ifelse() function allows us to create an IF statement which only has one condition but can be applied to a vector element-wise.

x <- c(1, 2, 3)
ifelse(x %% 2 == 0, "Even", "Odd")
## [1] "Odd"  "Even" "Odd"

Note - This works well for simple statements with two possible outcomes; more complicated conditions quickly become hard to read with ifelse().
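For a third outcome, one option is to nest one ifelse() inside another. A sketch that classifies each element of a vector as negative, zero or positive, mirroring the earlier IF statement:

```r
x <- c(-2, 0, 3)

# Where x < 0 is FALSE, the result is taken from the inner ifelse()
ifelse(x < 0, "negative", ifelse(x > 0, "positive", "zero"))
## [1] "negative" "zero"     "positive"
```

Each extra outcome adds another layer of nesting, which is where readability starts to suffer.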

It is possible to use a more complicated IF statement on a vector as we tried above but to do so we have to introduce the idea of FOR loops, which we will discuss next week!

2.4 Exercises

  1. Create an R script that calculates the square root of a value, x. If the value contained in x is negative it should return NA as output.
  1. Create an R script that returns the maximum value out of the elements of a numeric vector of length 2 (two elements), without using the min, max or sort functions.
  1. Use the command x <- rexp(20, rate = 0.5) to create a vector containing 20 simulations of an exponential random variable with mean 2. Return the number of values that are larger than the mean of the vector x. You are allowed to use the mean() function.

2.5 Applied exercises

Before proceeding with this exercise, you first need to generate 1,000 random values, which will represent your data throughout the questions. To do this, use the code yearly.returns <- rbeta(1000, 5, 2) - 0.7.

The values you have generated represent 1000 yearly returns from an asset. Using this data:

  1. Plot a histogram of the yearly returns for this asset.
  1. Calculate the sample mean and sample standard deviation (s.d.) for the yearly returns.

The Sharpe Ratio is a measure of the risk-adjusted return of a given asset: it compares the mean return in excess of the risk-free rate of interest with the variability (standard deviation) of the returns. That is, if we denote the mean return from an asset by \(r_A\), the standard deviation by \(\sigma_A\) and the risk-free rate of interest by \(r_f\), then the Sharpe ratio is given by \[SR = \frac{r_A-r_f}{\sigma_A}.\]
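As a purely illustrative piece of arithmetic (the numbers are made up and are not taken from the simulated yearly.returns data), an asset with mean return \(r_A = 8\%\) and standard deviation \(\sigma_A = 20\%\), compared against a risk-free rate of \(r_f = 4\%\), would have
\[SR = \frac{0.08 - 0.04}{0.20} = 0.2,\]
i.e. 0.2 units of excess return for every unit of risk (standard deviation) taken on.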

  1. Given that the risk-free rate of interest is \(r_f=4\%\), calculate the Sharpe Ratio for this asset. Comment on your result.
  1. Calculate the proportion of positive (gains) and negative (losses) yearly returns, respectively.
  1. Calculate the proportion of yearly returns that are more than 2 s.d. away from the mean.
  1. Calculate the mean yearly losses. HINT: You can extract elements from vectors/matrices using Boolean values, e.g. if x is a 2 element vector, then x[c(TRUE, FALSE)] will extract the first element but not the second.
  1. Calculate the s.d. of the losses (downside risk) of the yearly returns. Given your answer in part 2, comment on this result.
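The hint about extracting with Boolean values combines naturally with the relational operators from earlier in the chapter. A sketch on a tiny made-up vector (toy_returns and is_loss are hypothetical names, not the simulated yearly.returns data):

```r
# Hypothetical toy returns (not the simulated yearly.returns data)
toy_returns <- c(0.10, -0.20, 0.05, -0.03)

# The logical vector marks which elements are losses
is_loss <- toy_returns < 0
is_loss
## [1] FALSE  TRUE FALSE  TRUE

# Indexing with it keeps only the elements where it is TRUE
toy_returns[is_loss]
## [1] -0.20 -0.03
```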

The Sortino Ratio is another measure of risk-adjusted return for an asset, but it only takes into account the downside risk of an investment. That is, if we denote the downside risk (deviation) by \(\sigma_A^-\), then the Sortino ratio is given by \[SorR = \frac{r_A-r_f}{\sigma_A^-}.\]
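As made-up arithmetic only (not based on the simulated data): with \(r_A = 8\%\), \(r_f = 4\%\) and a downside deviation of \(\sigma_A^- = 10\%\),
\[SorR = \frac{0.08 - 0.04}{0.10} = 0.4.\]
Only the spread of the losses enters the denominator, so variability on the upside does not lower this ratio.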

  1. Given that the risk-free rate of interest is \(r_f=4\%\), calculate the Sortino Ratio for this asset. Comment on the difference between this measure and the Sharpe Ratio.

2.6 DataCamp course(s)