Week 2 Conditionals and IF Statements
In R, conditional statements are used to compare or analyse values/data based on certain conditions. In general, this is done using ‘relational operators’ (==, >, <, >=, <=, !=) and ‘logical operators’ (AND, OR, NOT).
2.1 Relational operators
The most basic of the ‘relational operators’ is the equality operator (==), which can be used to check if two objects (values, vectors, matrices etc.) are equal:
## [1] TRUE
## [1] TRUE
## [1] TRUE
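The code behind the output above is not shown. A minimal sketch of the kind of comparison involved, with values chosen purely for illustration, might look like this:

```r
# Illustrative values only; any pair of equal objects behaves the same way
3 == 3          # two numbers
"R" == "R"      # two character strings
TRUE == TRUE    # two logical values
```

Each of these three comparisons evaluates to TRUE.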
This can also be performed on vectors on an element by element basis (as usual):
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
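As a sketch of an element-wise comparison, assuming two hypothetical vectors `x` and `y` that differ only in their first entry:

```r
x <- 1:10          # hypothetical vector
x == 1:10          # every element matches: ten TRUE values

y <- c(0, 2:10)    # differs from x in the first position only
x == y             # FALSE first, then nine TRUE values
```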
Unsurprisingly, it also works on matrices on an element by element basis:
## [,1] [,2] [,3]
## [1,] 5 5 5
## [2,] 5 5 5
## [3,] 5 5 5
## [,1] [,2] [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE TRUE FALSE
## [3,] FALSE FALSE FALSE
## [,1] [,2] [,3]
## [1,] 5 0 0
## [2,] 0 5 0
## [3,] 0 0 5
## [,1] [,2] [,3]
## [1,] TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE
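A small sketch of matrix comparison follows. The matrices `A` and `B` are assumptions made for illustration and are not necessarily the ones used to produce the output above:

```r
A <- matrix(5, nrow = 3, ncol = 3)   # 3 x 3 matrix filled with 5
B <- diag(5, 3)                      # 5 on the diagonal, 0 elsewhere

A == B    # TRUE only where both matrices hold the same value (the diagonal)
A == 5    # the single value 5 is compared with every entry: all TRUE
```

In the second comparison the single value is recycled across the whole matrix, so the result has the same dimensions as `A`.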
Notice that the equality operator uses a double equals sign (==) rather than a single =. This is because the single equals sign is already used for assignment (similar to <-). This can be confusing and can easily cause errors, and it is the main reason I always suggest using <- for variable assignment.
Conversely, you can use the not equal operator (!=) in a similar way:
## [1] TRUE
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
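A sketch of the not equal operator on a single pair of values and on a vector, using made-up values:

```r
x <- 1:10    # hypothetical vector
3 != 4       # TRUE: the two values differ
x != x       # FALSE for every element, since each value equals itself
```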
Note - In general, the (!) symbol negates the result of any relational operator or any Boolean value in R, e.g.
## [1] FALSE
## [1] TRUE
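For instance, negation could be applied to a Boolean value directly or to the result of a comparison (illustrative values):

```r
!TRUE       # FALSE
!(3 > 5)    # TRUE, because 3 > 5 is FALSE
```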
In a similar way, you should easily be able to understand how the rest of the relational operators work, i.e. (<, >, <=, >=). In the following example(s), I will introduce you to one of the many built-in data sets that come with R, namely mtcars; we will discuss data sets in more detail in the next few weeks.
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
Exercise 2.1 Assume we want to analyse the hp (horsepower) variable (column) only. Based on what we discussed last week regarding vector/matrix extraction, how can we extract the hp data only?
An alternative method of extraction for data sets (data frames) is to use the $ extraction operator with the column/variable name. Note that this only works on data frames and not general matrices, whereas square bracket extraction works for both:
## [1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230 66 52
## [20] 65 97 150 150 245 175 66 91 113 264 175 335 109
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [25] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
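The output above is consistent with code along the following lines. The name `HP` is taken from the text that follows, and the exact original code is an assumption:

```r
HP <- mtcars$hp    # extract the hp column by name
HP                 # the 32 horsepower values
HP > 200           # TRUE where horsepower exceeds 200
```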
What do you think will happen if we execute the code sum(HP>200) and mean(HP>200)? Have a think about this then check out the solution when you’re ready.
Solution
## [1] 7
## [1] 0.21875
In both of these cases, the conditional statement has produced a vector of TRUE and FALSE Boolean values. In R, these are treated as the values 1 and 0, respectively, so it is possible to take the sum() or the mean() over the Boolean values themselves: the sum counts the TRUE values and the mean gives their proportion. This also gives you an idea of how powerful such simple lines of conditional code can be when used in the right way.
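A short sketch of this coercion, assuming `HP` holds the horsepower column of mtcars as above:

```r
HP <- mtcars$hp

as.numeric(c(TRUE, FALSE))    # 1 0
sum(HP > 200)                 # number of TRUE values, i.e. cars above 200 hp
mean(HP > 200)                # proportion of TRUE values
sum(HP > 200) / length(HP)    # the same proportion computed by hand
```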
Exercise 2.2 Can you create a vector of all square numbers from 1 to 100 and count how many of these values are divisible by 3? Moreover, can you determine what percentage of them are NOT divisible by 5?
In the next few weeks, we will look in more detail at how we can use these relational operators (along with the logical operators discussed below) to conditionally extract data/values from a data.frame. This is a very helpful skill to learn for data handling and manipulation.
2.2 Logical operators
‘Logical operators’ are used to check whether multiple conditions have been satisfied at the same time (AND) or whether at least one of them has (OR). The key to understanding how these work in R is understanding how logical operators work in theory.
Let us begin with the logical operator ‘AND’ which, in R, is denoted by & or && (I will explain the difference later). For an AND statement/condition to evaluate to TRUE, both conditions in the statement must be TRUE. That is, the condition on the left is TRUE ‘AND’ the condition on the right is TRUE:
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] 3.141593
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
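A sketch of the AND operator on single values. The comparisons with `pi` are illustrative assumptions rather than the original code:

```r
TRUE & TRUE        # TRUE
TRUE & FALSE       # FALSE
FALSE & FALSE      # FALSE

pi                 # 3.141593
pi > 3 & pi < 4    # TRUE: both conditions hold
pi > 3 & pi > 4    # FALSE: the second condition fails
```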
It is actually possible to have more than two arguments and to include different relational operators as well. What do we think the following expression will evaluate to, TRUE or FALSE?
## [1] TRUE
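An expression of this kind, chaining three conditions with different relational operators, might look as follows. The value of `x` and the conditions themselves are hypothetical:

```r
x <- 7    # hypothetical value

# positive AND odd AND not equal to 5: all three hold for x = 7
x > 0 & x %% 2 == 1 & x != 5
```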
As with relational operators, logical operators can also be used in vector form, where the & operator evaluates on a term by term basis, e.g.
## [1] TRUE TRUE TRUE
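As a sketch of the element-wise behaviour, assuming two hypothetical vectors `a` and `b`:

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)

a < b            # TRUE TRUE TRUE
b < 5            # TRUE TRUE TRUE
a < b & b < 5    # combined position by position: TRUE TRUE TRUE
```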
In fact, this sort of logical/relational operation can also be computed on other objects than just numerical values, i.e. ‘character strings’:
## [1] TRUE
## [1] FALSE
## [1] FALSE
c(1, 2, 3) < c(2, 3, 4) & "Red" == "Blue" # How has this worked? The left hand side is a 3 element vector but the right is a single logical element?
## [1] FALSE FALSE FALSE
## [1] TRUE FALSE TRUE
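The question raised in the code comment above comes down to recycling: when one side of & is shorter than the other, R repeats the shorter side until the lengths match. A minimal sketch, where `lhs` and `rhs` are names introduced here for illustration:

```r
"Red" == "Red"    # TRUE
"Red" == "red"    # FALSE: string comparison is case sensitive

lhs <- c(1, 2, 3) < c(2, 3, 4)    # TRUE TRUE TRUE
rhs <- "Red" == "Blue"            # a single FALSE

lhs & rhs                         # FALSE FALSE FALSE
lhs & rep(rhs, length(lhs))       # the same recycling written out explicitly
```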
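In contrast to &, which evaluates on a term by term basis, the double && expects a single TRUE/FALSE value on each side. It evaluates from left to right and skips the right-hand side if the left-hand side already determines the result. In the version of R used for the output below, supplying a longer vector produces a warning and only the first element is used; from R 4.3.0 onwards this is an error instead: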
## Warning in c(1, 2, 3) < c(2, 1, 4) && "Red" == "Red": 'length(x) = 3 > 1' in
## coercion to 'logical(1)'
## [1] TRUE
## Warning in c(5, 2, 3) < c(2, 1, 4) && "Red" == "Red": 'length(x) = 3 > 1' in
## coercion to 'logical(1)'
## [1] FALSE
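A sketch of && used on single values, which behaves the same in every version of R, together with all() and any() as one possible way to reduce a vector comparison to a single value first. The values are illustrative:

```r
x <- 5
x > 0 && x < 10    # TRUE: both single conditions hold

# The right-hand side is skipped when the left-hand side is already FALSE,
# so the stop() call below is never reached
FALSE && stop("this is never evaluated")

# Reduce a whole vector comparison to one TRUE/FALSE value
all(c(1, 2, 3) < c(2, 1, 4))    # FALSE: the second comparison fails
any(c(1, 2, 3) < c(2, 1, 4))    # TRUE: at least one comparison holds
```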
The second logical operator is the so-called OR operator, denoted by | and ||, which evaluates to TRUE as long as ‘at least one statement is TRUE’, e.g.
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] TRUE
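A sketch of the OR operator, again with illustrative values rather than the original code:

```r
TRUE | TRUE        # TRUE
TRUE | FALSE       # TRUE
FALSE | FALSE      # FALSE
pi > 4 | pi > 3    # TRUE: the second condition holds

# Element-wise on vectors: each position needs at least one TRUE
c(1, 5) > 3 | c(2, 0) > 1    # TRUE TRUE
```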
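The same ideas discussed above for & and && also apply to | and ||, i.e. | evaluates element-wise, whilst || works on single values only (older versions of R use just the first element of a longer vector, whereas from R 4.3.0 onwards a longer vector is an error).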
Exercise 2.3 With all this in mind, how can we calculate the number of cars in the mtcars data set that have horsepower greater than 200, mpg at most 30, are automatic but do not have 6 cylinders?
Exercise 2.4 The set of data VADeaths contains the death rates (measured per 1000 population per year), in Virginia, USA, in 1940. The structure of this data set is a matrix (not a data frame) with the rows denoting age ranges and the columns sex/area.
- How can we find out this information (and possibly more) about the data set?
- Extract the two columns containing the female data, either together or separately.
- Using conditional arguments, determine how many age groups have a death rate larger than 20 for rural females and a death rate less than 30 for urban females.
2.3 IF statements
‘IF’ Statements are extremely popular and powerful tools in programming that are used to execute certain commands, based on given conditions. In most cases, the conditions used within IF statements are built up from combinations of the relational and logical operators seen above.
In general, an IF statement has the following form:
if (condition) {
  command
} else {
  command
}
To see how an IF statement works in practice, let us look at a simple example to check if a number is odd or even:
## [1] "This number is even"
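An IF statement producing this kind of output could be sketched as follows. The value of `x` and the wording of the odd branch are assumptions:

```r
x <- 14    # hypothetical input

# x %% 2 is the remainder after dividing by 2, so it is 0 for even numbers
if (x %% 2 == 0) {
  print("This number is even")
} else {
  print("This number is odd")
}
```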
You can actually make the output even better in this example by asking it to print out the value of \(x\) that has been given, using the paste() function. Notice that the variable \(x\) is not in quotation marks but the ‘words’ are.
x <- 14
if (x %% 2 == 0) {
  print(paste(x, "is an even number"))
} else {
  print(paste(x, "is an odd number"))
}
## [1] "14 is an even number"
This is quite a simple example, but it is possible to have longer, more complicated IF statements that contain more conditional possibilities. In that case, you can simply extend the IF statement by adding else if branches (two words in R) before the final else. Once you have covered all of the other conditions, you finish with else. For example:
x <- 7
if (x < 0) {
  print(paste(x, "is a negative number"))
} else if (x > 0) {
  print(paste(x, "is a positive number"))
} else {
  print(paste(x, "is Zero"))
}
## [1] "7 is a positive number"
Exercise 2.5 Can you create an IF statement which tells you whether a number (x) is divisible by another number (y), where both x and y can be changed (not fixed)? Hint: Use the modulus operator %%.
Looking back at the previous two examples regarding even/odd and positive/negative numbers, we can actually combine these two statements by using logical operators within the IF conditions:
x <- 4
if (x < 0 & x %% 2 == 0) {
  print(paste(x, "is a negative even number"))
} else if (x < 0) {
  print(paste(x, "is a negative odd number"))
} else if (x > 0 & x %% 2 == 0) {
  print(paste(x, "is a positive even number"))
} else if (x > 0) {
  print(paste(x, "is a positive odd number"))
} else {
  print(paste(x, "is Zero"))
}
## [1] "4 is a positive even number"
In fact, you could do this an alternative way by ‘nesting’ IF statements inside one another to make several ‘layers’. There is no right or wrong way to do these but through experience you will see either can be used depending on the situation.
x <- 3
if (x < 0) {
  if (x %% 2 == 0) {
    print(paste(x, "is a negative even number"))
  } else {
    print(paste(x, "is a negative odd number"))
  }
} else if (x > 0) {
  if (x %% 2 == 0) {
    print(paste(x, "is a positive even number"))
  } else {
    print(paste(x, "is a positive odd number"))
  }
} else {
  print(paste(x, "is Zero"))
}
## [1] "3 is a positive odd number"
What happens if we let \(x\) be a vector?
Note - In older versions of R, the IF statement will technically work in the sense that it will print something out (with a warning), but it will not do quite what we expect; from R 4.2.0 onwards it stops with an error instead. This is because in an IF statement, the condition or ‘test statement’ must be a single element, so older versions of R only consider the first element of the vector. With this in mind, it is important to note that if you use a logical operator in an IF statement, it is always best to use the double version, i.e. && or ||.
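The following sketch shows what happens when a vector is used as an IF condition. It wraps the statement in tryCatch() so that it runs on any version of R and simply reports whether an error or a warning was raised. The vector `x` and the name `outcome` are introduced here for illustration:

```r
x <- c(1, 2, 3)

# The condition x %% 2 == 0 has length 3, which if() cannot handle properly
outcome <- tryCatch(
  {
    if (x %% 2 == 0) "even" else "odd"
  },
  error   = function(e) "error",
  warning = function(w) "warning"
)
outcome

# A condition reduced to a single value is always safe
if (all(x > 0)) print("every element is positive")
```

Reducing the condition with all() or any() is one way to make the intent explicit when the input may be a vector.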
That being said, it is possible to bypass this problem using the ifelse() function. The ifelse() function allows us to create an IF statement which only has one condition but can be applied to a vector element-wise.
## [1] "Odd" "Even" "Odd"
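Output of this form could come from a call such as the one below, assuming the vector is c(1, 2, 3). The three arguments of ifelse() are the test, the value used where the test is TRUE and the value used where it is FALSE:

```r
x <- c(1, 2, 3)    # assumed input vector

# Each element is tested separately and replaced by the matching label
ifelse(x %% 2 == 0, "Even", "Odd")
```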
Note - This only works for quite simple statements.
It is possible to use a more complicated IF statement on a vector as we tried above but to do so we have to introduce the idea of FOR loops, which we will discuss next week!
2.4 Exercises
- Create an R script that calculates the square root of a value, x. If the value contained in x is negative, it should return NA as output.
- Create an R script that returns the maximum value out of the elements of a numeric vector of length 2 (two elements), without using the min, max or sort functions.
- Use the command x <- rexp(20, rate = 0.5) to create a vector containing 20 simulations of an exponential random variable with mean 2. Return the number of values that are larger than the mean of the vector x. You are allowed to use the mean() function.
2.5 Applied exercises
Before proceeding with this exercise, you first need to generate 1,000 random values which will represent your data throughout the questions. To do this, use the code yearly.returns <- rbeta(1000, 5, 2) - 0.7.
The values you have generated represent 1000 yearly returns from an asset. Using this data:
- Plot a histogram of the yearly returns for this asset.
- Calculate the sample mean and sample standard deviation (s.d.) for the yearly returns.
The Sharpe Ratio is a measure of risk-adjusted return for a given asset, calculated by comparing the mean return in excess of the risk-free rate of interest with the standard deviation of the returns. That is, if we denote the mean return from an asset by \(r_A\), the standard deviation by \(\sigma_A\) and the risk-free rate of interest by \(r_f\), then the Sharpe Ratio is given by \[SR = \frac{r_A-r_f}{\sigma_A}.\]
- Given that the risk-free rate of interest is \(r_f=4\%\), calculate the Sharpe Ratio for this asset. Comment on your result.
- Calculate the proportion of positive (gains) and negative (losses) yearly returns, respectively.
- Calculate the proportion of yearly returns that are larger than 2 s.d. away from the mean.
- Calculate the mean yearly losses. HINT: You can extract elements from vectors/matrices using Boolean values, e.g. if x is a 2 element vector, then x[c(TRUE, FALSE)] will extract the first element but not the second.
- Calculate the s.d. of the losses (downside risk) of the yearly returns. Given your answer in part 2., comment on this result.
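The Boolean extraction mentioned in the hint above can be sketched on a small made-up vector. These values are purely illustrative and are not the simulated returns:

```r
x <- c(-0.2, 0.5, 0.1, -0.4)    # made-up values

keep <- x > 0    # FALSE TRUE TRUE FALSE
x[keep]          # only the elements where keep is TRUE: 0.5 0.1
mean(x[keep])    # mean of the selected elements only
```

The logical vector used for extraction is usually produced by a relational operator, as here, rather than typed out by hand.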
The Sortino Ratio is another measure of risk for an asset but only takes into account the downside risk of an investment. That is, if we denote the downside risk (deviation) by \(\sigma_A^-\), then the Sortino ratio is given by \[SorR = \frac{r_A-r_f}{\sigma_A^-}.\]
- Given that the risk-free rate of interest is \(r_f=4\%\), calculate the Sortino Ratio for this asset. Comment on the difference between this measure and the Sharpe Ratio.
2.6 DataCamp course(s)
- https://www.datacamp.com/courses/intermediate-r (Intermediate R Course)
- https://app.datacamp.com/learn/courses/intermediate-r-for-finance (Intermediate R for Finance Course)