Week 2 Conditionals and IF Statements

In R, conditional statements are used to compare or analyse values/data based on certain conditions. In general, this is done with ‘relational operators’ (==, !=, >, <, >=, <=) and ‘logical operators’ (AND, OR and NOT, written &/&&, |/|| and ! in R).

2.1 Relational operators

The most basic of the ‘relational operators’ is the equality operator (==), which can be used to check if two objects (values, vectors, matrices etc.) are equal:

4 == 3+1

## [1] TRUE

5^2 == 25

## [1] TRUE

8 %% 5 == 3  # %% is the modulo operator: 8 %% 5 is the remainder of 8 divided by 5, i.e. 8 mod 5 = 3

## [1] TRUE
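One caution that the examples above do not show: == tests for exact equality, so decimal arithmetic affected by floating-point rounding can give a surprising FALSE. A minimal sketch using base R's all.equal(), which compares numbers up to a small tolerance:

```r
# Exact equality can be fooled by floating-point rounding
0.1 + 0.2 == 0.3                     # FALSE: 0.1 + 0.2 is not stored exactly as 0.3
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: all.equal() allows a small numerical tolerance
```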

This can also be performed on vectors on an element by element basis (as usual):

1:10 == c(1,2,3,4,5,6,7,8,9,10)

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

1:10 == c(0,2,3,4,5,6,7,8,9,10)

##  [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
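The right-hand side does not need to be a full vector: a single value is compared with every element in turn. A small sketch with an arbitrary example vector:

```r
x <- c(4, 7, 4, 1)
x == 4   # the single value 4 is compared with every element: TRUE FALSE TRUE FALSE
```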

Unsurprisingly, it also works on matrices on an element by element basis as well:

matrix(5, nrow = 3, ncol = 3)

##      [,1] [,2] [,3]
## [1,]    5    5    5
## [2,]    5    5    5
## [3,]    5    5    5

matrix(1:9, nrow = 3) == matrix(5, nrow = 3, ncol = 3)

##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE  TRUE FALSE
## [3,] FALSE FALSE FALSE

diag(5, nrow = 3, ncol = 3)

##      [,1] [,2] [,3]
## [1,]    5    0    0
## [2,]    0    5    0
## [3,]    0    0    5

diag(5, nrow = 3, ncol = 3) == 5 * diag(1, nrow = 3)

##      [,1] [,2] [,3]
## [1,] TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE

Notice that the equality operator uses a double equal sign (==) rather than a single =. This is because the single equal sign is already used for assignment (similar to <-) and for naming function arguments, e.g. nrow = 3. This can be confusing, can easily cause errors and is the main reason I always suggest using <- for variable assignment.

Conversely, you can use the not-equal operator (!=) in a similar way:

3 != 5

## [1] TRUE

seq(1, 10, by = 1) != 1:10

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Note - In general, the ! operator negates a logical (Boolean) value in R, including the result of any relational comparison, e.g.

!TRUE

## [1] FALSE

!FALSE

## [1] TRUE
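Because a comparison returns logical values, ! can be applied directly to its result, element by element. A short sketch showing that negating == gives the same answer as !=:

```r
x <- 1:5
x == 3                          # FALSE FALSE  TRUE FALSE FALSE
!(x == 3)                       #  TRUE  TRUE FALSE  TRUE  TRUE
identical(!(x == 3), x != 3)    # TRUE: negating == gives the same result as !=
```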

In a similar way, you should easily be able to understand how the rest of the relational operators (<, >, <=, >=) work. In the following example(s), I will introduce you to mtcars, one of the many built-in data sets that come with R (in the datasets package); we will discuss data sets in more detail in the next few weeks.

mtcars

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160.0	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160.0	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108.0	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258.0	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360.0	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225.0	105	2.76	3.460	20.22	1	0	3	1
Duster 360	14.3	8	360.0	245	3.21	3.570	15.84	0	0	3	4
Merc 240D	24.4	4	146.7	62	3.69	3.190	20.00	1	0	4	2
Merc 230	22.8	4	140.8	95	3.92	3.150	22.90	1	0	4	2
Merc 280	19.2	6	167.6	123	3.92	3.440	18.30	1	0	4	4
Merc 280C	17.8	6	167.6	123	3.92	3.440	18.90	1	0	4	4
Merc 450SE	16.4	8	275.8	180	3.07	4.070	17.40	0	0	3	3
Merc 450SL	17.3	8	275.8	180	3.07	3.730	17.60	0	0	3	3
Merc 450SLC	15.2	8	275.8	180	3.07	3.780	18.00	0	0	3	3
Cadillac Fleetwood	10.4	8	472.0	205	2.93	5.250	17.98	0	0	3	4
Lincoln Continental	10.4	8	460.0	215	3.00	5.424	17.82	0	0	3	4
Chrysler Imperial	14.7	8	440.0	230	3.23	5.345	17.42	0	0	3	4
Fiat 128	32.4	4	78.7	66	4.08	2.200	19.47	1	1	4	1
Honda Civic	30.4	4	75.7	52	4.93	1.615	18.52	1	1	4	2
Toyota Corolla	33.9	4	71.1	65	4.22	1.835	19.90	1	1	4	1
Toyota Corona	21.5	4	120.1	97	3.70	2.465	20.01	1	0	3	1
Dodge Challenger	15.5	8	318.0	150	2.76	3.520	16.87	0	0	3	2
AMC Javelin	15.2	8	304.0	150	3.15	3.435	17.30	0	0	3	2
Camaro Z28	13.3	8	350.0	245	3.73	3.840	15.41	0	0	3	4
Pontiac Firebird	19.2	8	400.0	175	3.08	3.845	17.05	0	0	3	2
Fiat X1-9	27.3	4	79.0	66	4.08	1.935	18.90	1	1	4	1
Porsche 914-2	26.0	4	120.3	91	4.43	2.140	16.70	0	1	5	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.90	1	1	5	2
Ford Pantera L	15.8	8	351.0	264	4.22	3.170	14.50	0	1	5	4
Ferrari Dino	19.7	6	145.0	175	3.62	2.770	15.50	0	1	5	6
Maserati Bora	15.0	8	301.0	335	3.54	3.570	14.60	0	1	5	8
Volvo 142E	21.4	4	121.0	109	4.11	2.780	18.60	1	1	4	2

Exercise 2.1 Assume we want to analyse the hp (horsepower) variable (column) only. Based on what we discussed last week regarding vector/matrix extraction, how can we extract the hp data only?

An alternative way to extract a column from a data set (data frame) is the $ operator, followed by the column/variable name. Note that $ works on data frames (and lists) but not on matrices, whereas square-bracket extraction works for both:

(HP <- mtcars$hp)

##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
## [20]  65  97 150 150 245 175  66  91 113 264 175 335 109

HP > 200

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [25] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE

What do you think will happen if we execute the code sum(HP>200) and mean(HP>200)? Have a think about this then check out the solution when you’re ready.

Solution

sum(HP > 200)

## [1] 7

mean(HP > 200)

## [1] 0.21875

In both cases, the conditional statement produces a vector of TRUE and FALSE values. When these are used in arithmetic, R treats TRUE as 1 and FALSE as 0, so sum() counts the TRUE values (here, the 7 cars with more than 200 hp) and mean() gives the proportion of TRUE values (7/32 = 0.21875).

This shows how R interprets the Boolean values TRUE/FALSE as 1 and 0, respectively, and gives you an idea of how powerful such simple lines of conditional code can be when used in the right way.
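The same coercion can be seen directly on a small, made-up logical vector:

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)
as.numeric(flags)   # 1 0 1 1: TRUE becomes 1 and FALSE becomes 0
sum(flags)          # 3, the number of TRUE values
mean(flags)         # 0.75, the proportion of TRUE values
```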

Exercise 2.2 Can you create a vector of all square numbers from 1 to 100 and count how many of these values are divisible by 3? Moreover, can you determine what percentage of them are NOT divisible by 5?

In the next few weeks, we will look in more detail at how we can use these relational operators (along with the logical operators discussed below) to conditionally extract data/values from a data frame. This is a very helpful skill to learn for data handling and manipulation.

2.2 Logical operators

‘Logical operators’ are used to check whether several conditions are all satisfied at the same time (AND) or whether at least one of them is satisfied (OR). The key to understanding how these work in R is understanding how logical operators work in theory.

Let us begin with the logical operator ‘AND’ which, in R, is denoted by & or && (I will explain the difference later). For an AND statement/condition to evaluate to TRUE, both conditions in the statement must be TRUE. That is, the condition on the left is TRUE ‘AND’ the condition on the right is TRUE:

TRUE & TRUE

## [1] TRUE

TRUE & FALSE

## [1] FALSE

FALSE & FALSE

## [1] FALSE
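The examples above leave out FALSE & TRUE. Since & works element-wise on vectors (as discussed further below), the whole truth table can be produced in one go; the column name a_and_b is just an illustrative label:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
# Every combination of a and b, with the result of a & b in the last column
data.frame(a, b, a_and_b = a & b)
```

Only the first row, where both a and b are TRUE, gives TRUE.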

pi

## [1] 3.141593

pi > 3

## [1] TRUE

pi < 4

## [1] TRUE

pi > 3 & pi < 4

## [1] TRUE

5 < 10 & 5 < 3

## [1] FALSE

It is actually possible to have more than two arguments and to include different relational operators as well. What do you think the following expression will evaluate to, TRUE or FALSE?

pi > 0 & pi < 5 & !(pi %% 2 == 0)

## [1] TRUE

As with relational operators, logical operators can also be used in vector form, where the & operator evaluates on a term by term basis, e.g.

c(1,2,3) < c(2,3,4) & c(2,3,4) < c(3,4,5) # Think about this one a little!

## [1] TRUE TRUE TRUE

In fact, these logical/relational operations can also be applied to objects other than numerical values, e.g. ‘character strings’. Note that string comparison is case-sensitive:

"Red" == "Red"

## [1] TRUE

"Red" == "Blue"

## [1] FALSE

"Red" == "red"

## [1] FALSE

c(1, 2, 3) < c(2, 3, 4) & "Red" == "Blue" # How has this worked? The left hand side is a 3 element vector but the right is a single logical element?

## [1] FALSE FALSE FALSE

c(1, 2, 3) < c(2, 1, 4) & "Red" == "Red"

## [1]  TRUE FALSE  TRUE
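The question in the comment above has a short answer: R recycles a length-one value so that it matches the length of the longer vector before combining them element-wise. A minimal sketch:

```r
c(TRUE, FALSE, TRUE) & TRUE    #  TRUE FALSE  TRUE: the single TRUE is recycled to length 3
c(TRUE, FALSE, TRUE) & FALSE   # FALSE FALSE FALSE
```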

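In contrast to &, which evaluates on a term-by-term basis, the double && is intended for single TRUE/FALSE values: it evaluates from left to right and stops as soon as the answer is known. In older versions of R, only the first element of each vector was used (R 4.2.x also gives the warning shown below); from R 4.3.0 onwards, supplying a vector longer than one is an error.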

c(1, 2, 3) < c(2, 1, 4) && "Red" == "Red"

## Warning in c(1, 2, 3) < c(2, 1, 4) && "Red" == "Red": 'length(x) = 3 > 1' in
## coercion to 'logical(1)'

## [1] TRUE

c(5, 2, 3) < c(2, 1, 4) && "Red" == "Red"

## Warning in c(5, 2, 3) < c(2, 1, 4) && "Red" == "Red": 'length(x) = 3 > 1' in
## coercion to 'logical(1)'

## [1] FALSE
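The ‘stops as soon as the answer is known’ behaviour (short-circuiting) is what makes && useful for guarding a condition: if the left-hand side is FALSE, the right-hand side is never evaluated. A small sketch, where stop() would throw an error if it were ever run:

```r
FALSE && stop("never evaluated")   # FALSE: the right-hand side is skipped
x <- numeric(0)                    # an empty vector
length(x) > 0 && x[1] > 0          # FALSE: x[1] is never looked at
```

With a single &, both sides are always evaluated, so FALSE & stop("...") would raise the error.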

The second logical operator is the so-called OR operator, denoted by | and ||, which evaluates to TRUE as long as ‘at least one statement is TRUE’, e.g.

TRUE | TRUE

## [1] TRUE

TRUE | FALSE

## [1] TRUE

FALSE | TRUE

## [1] TRUE

FALSE | FALSE

## [1] FALSE

F | F | T | F #etc.

## [1] TRUE

The same ideas discussed above for & also apply to |: | evaluates element-wise, whilst || is intended for single values (older versions of R only used the first element of a vector, and from R 4.3.0 a longer vector is an error).
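A short sketch of both forms, using an arbitrary example vector: | flags the values lying outside the interval [0, 10] element by element, while || skips its right-hand side once the left-hand side is TRUE:

```r
x <- c(-3, 0, 4, 12)
x < 0 | x > 10                     #  TRUE FALSE FALSE  TRUE: values outside [0, 10]
TRUE || stop("never evaluated")    # TRUE: the right-hand side is skipped
```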

Exercise 2.3 With all this in mind, how can we calculate the number of cars in the mtcars data set that have horsepower greater than 200, mpg at most 30, are automatic but do not have 6 cylinders?

Exercise 2.4 The VADeaths data set contains death rates (per 1000 population per year) in Virginia, USA, in 1940. It is stored as a matrix (not a data frame), with the rows denoting age groups and the columns denoting sex/area.

How can we find out this information (and possibly more) about the data set?
Extract the two columns containing the female data, either together or separately.
Using conditional arguments, determine how many age groups have a death rate larger than 20 for rural females and a death rate less than 30 for urban females.

2.3 IF statements

‘IF’ statements are extremely popular and powerful programming tools, used to execute certain commands only when given conditions are met. In most cases, the conditions used within IF statements are built up from combinations of the relational and logical operators seen above.

In general, an IF statement has the following form, where the else part is optional:

if (condition) {
  command
} else {
  command
}

To see how an IF statement works in practice, let us look at a simple example to check if a number is odd or even

x <- 8

if (x %% 2 == 0){
  print("This number is even")
} else {
  print("This number is odd")
}

## [1] "This number is even"
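Since the else part is optional, an IF statement can also stand on its own: when the condition is TRUE the command runs, and when it is FALSE nothing happens. A minimal sketch with arbitrary values of x:

```r
x <- 5
if (x > 3) {
  print("x is larger than 3")   # printed, because 5 > 3
}

x <- 1
if (x > 3) {
  print("x is larger than 3")   # condition is FALSE, so nothing is printed
}
```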

You can make the output even more informative by including the value of $x$ in the printed message, using the paste() function. Notice that the variable $x$ is not in quotation marks but the ‘words’ are.

x <- 14

if (x %% 2 == 0){
  print(paste(x, "is an even number"))
} else {
  print(paste(x, "is an odd number"))
}

## [1] "14 is an even number"

This is quite a simple example, but IF statements can be longer and contain more conditional possibilities. In that case, you can extend the IF statement by adding else if clauses, each with its own condition. Once you have covered all the conditions you need, you finish with a final else, which catches every remaining case. For example:

x <- 7

if (x < 0) {
  print(paste(x, "is a negative number"))
} else if (x > 0) {
  print(paste(x, "is a positive number"))
} else {
  print(paste(x, "is Zero"))
}

## [1] "7 is a positive number"

Exercise 2.5 Can you create an IF statement which tells you whether a number (x) is divisible by another number (y), where both x and y can be changed (not fixed)? Hint: Use the modulus operator %%.

Looking back at the previous two examples regarding even/odd and positive/negative numbers, we can actually combine these two statements by using logical operators within the IF conditions:

x <- 4

if (x < 0 && x %% 2 == 0) {
  print(paste(x, "is a negative even number"))
} else if (x < 0) {
  print(paste(x, "is a negative odd number"))
} else if (x > 0 && x %% 2 == 0) {
  print(paste(x, "is a positive even number"))
} else if (x > 0) {
  print(paste(x, "is a positive odd number"))
} else {
  print(paste(x, "is Zero"))
}

## [1] "4 is a positive even number"

Alternatively, you can ‘nest’ IF statements inside one another to create several ‘layers’. There is no right or wrong way to do this; with experience you will see that either approach can be used, depending on the situation.

x <- 3

if (x < 0) {
  if (x %% 2 == 0) {
    print(paste(x, "is a negative even number"))
  } else {
    print(paste(x, "is a negative odd number"))
  }
} else if (x > 0) {
  if (x %% 2 == 0) {
    print(paste(x, "is a positive even number"))
  } else {
    print(paste(x, "is a positive odd number"))
  }
} else {
  print(paste(x, "is Zero"))
}

## [1] "3 is a positive odd number"

What happens if we let $x$ be a vector?

Note - The condition (or ‘test statement’) of an IF statement must be a single TRUE or FALSE value. In versions of R before 4.2.0, a longer condition only triggered a warning and R silently used the first element of the vector, which is not quite what we expect; since R 4.2.0, a condition of length greater than one is an error. With this in mind, if you use a logical operator in an IF statement, it is always best to use the double version, i.e. && or ||.
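If what you actually want is a single answer about the whole vector, one option is to collapse the logical vector first with base R's all() (is every element TRUE?) or any() (is at least one element TRUE?). A minimal sketch:

```r
x <- c(1, 2, 3)
# if (x %% 2 == 0) { ... }   # an error in R >= 4.2.0: the condition has length > 1
if (all(x > 0)) {
  print("All elements are positive")        # printed: 1, 2 and 3 are all positive
}
if (any(x %% 2 == 0)) {
  print("At least one element is even")     # printed: 2 is even
}
```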

That being said, it is possible to get around this problem using the ifelse() function, which applies a single condition to a vector element-wise: ifelse(test, yes, no) returns the yes value wherever test is TRUE and the no value wherever it is FALSE.

x <- c(1, 2, 3)
ifelse(x %% 2 == 0, "Even", "Odd")

## [1] "Odd"  "Even" "Odd"

Note - This only works for quite simple statements.
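One way to stretch ifelse() a little further is to nest it, using another ifelse() as the no argument. A small sketch that mirrors the negative/positive/zero example from earlier on an arbitrary vector:

```r
x <- c(-2, 0, 5)
# The "no" argument of the outer ifelse() is itself another ifelse()
ifelse(x < 0, "Negative", ifelse(x > 0, "Positive", "Zero"))
# returns "Negative" "Zero" "Positive"
```

Beyond two or three levels, nested ifelse() calls quickly become hard to read.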

It is possible to apply a more complicated IF statement to a vector, as we tried above, but to do so we need to introduce the idea of FOR loops, which we will discuss next week!

2.4 Exercises

Create an R script that calculates the square root of a value, x. If the value contained in x is negative, it should return NA as output.

Create an R script that returns the maximum value out of the elements of a numeric vector of length 2 (two elements), without using the min, max or sort functions.

Use the command x <- rexp(20, rate = 0.5) to create a vector containing 20 simulations of an exponential random variable with mean 2. Return the number of values that are larger than the mean of the vector x. You are allowed to use the mean() function.

2.5 Applied exercises

Before proceeding with this exercise, you need to first generate 1,000 random values which will represent your data throughout the questions. To do this, use the code yearly.returns <- rbeta(1000, 5, 2) - 0.7.
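Because rbeta() draws random numbers, everyone will get different values. If you want your results to be reproducible, one option (not part of the original instructions) is to fix the random seed first with set.seed(); the seed 123 below is an arbitrary choice. Since rbeta() returns values between 0 and 1, every return lies between -0.7 and 0.3:

```r
set.seed(123)   # arbitrary seed; any fixed integer makes the draws reproducible
yearly.returns <- rbeta(1000, 5, 2) - 0.7
length(yearly.returns)    # 1000
range(yearly.returns)     # both ends lie inside (-0.7, 0.3)
```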

The values you have generated represent 1000 yearly returns from an asset. Using this data:

1. Plot a histogram of the yearly returns for this asset.

2. Calculate the sample mean and sample standard deviation (s.d.) of the yearly returns.

3. The Sharpe Ratio is a measure of risk-adjusted return: it compares the mean return in excess of the risk-free rate of interest with the volatility of the asset. That is, if we denote the mean return from an asset by $r_A$, its standard deviation by $\sigma_A$ and the risk-free rate of interest by $r_f$, then the Sharpe ratio is given by \[SR = \frac{r_A-r_f}{\sigma_A}.\]

   Given that the risk-free rate of interest is $r_f=4\%$, calculate the Sharpe Ratio for this asset. Comment on your result.

4. Calculate the proportion of positive (gains) and negative (losses) yearly returns, respectively.

5. Calculate the proportion of yearly returns that are more than 2 s.d. away from the mean.

6. Calculate the mean yearly loss. HINT: You can extract elements from vectors/matrices using Boolean values, e.g. if x is a 2-element vector, then x[c(TRUE, FALSE)] will extract the first element but not the second.

7. Calculate the s.d. of the losses (downside risk) of the yearly returns. Given your answer in part 2, comment on this result.

8. The Sortino Ratio is another measure of risk-adjusted return, but it only takes into account the downside risk of an investment. That is, if we denote the downside risk (deviation) by $\sigma_A^-$, then the Sortino ratio is given by \[SorR = \frac{r_A-r_f}{\sigma_A^-}.\]

   Given that the risk-free rate of interest is $r_f=4\%$, calculate the Sortino Ratio for this asset. Comment on the difference between this measure and the Sharpe Ratio.
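To check that you are applying the two formulas correctly, here is a worked sketch with made-up summary figures. The numbers below are hypothetical and are NOT computed from yearly.returns; in the exercise you would replace them with your own sample mean, s.d. and downside s.d.:

```r
# Hypothetical figures chosen for illustration, NOT computed from yearly.returns
r_A     <- 0.10   # mean yearly return
sigma_A <- 0.15   # standard deviation of yearly returns
sigma_m <- 0.10   # downside deviation (s.d. of the losses)
r_f     <- 0.04   # risk-free rate of 4%

(r_A - r_f) / sigma_A   # Sharpe ratio:  0.06 / 0.15 = 0.4
(r_A - r_f) / sigma_m   # Sortino ratio: 0.06 / 0.10 = 0.6
```

Both ratios share the same numerator (the excess return); they differ only in which measure of risk sits in the denominator.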

2.6 DataCamp course(s)

https://www.datacamp.com/courses/intermediate-r (Intermediate R Course)
https://app.datacamp.com/learn/courses/intermediate-r-for-finance (Intermediate R for Finance Course)