Week 4 Functions

In the previous weeks, we have already encountered and worked with some of R’s pre-defined functions, which you can apply to your data/objects to produce certain results: for example, the mean(), var() and plot() functions. Each of these functions requires one (or more) input variables and then provides some output. Although these functions are readily available for you to use in the base packages, they have themselves been created from scratch and consist primarily of the basic programming techniques we have already discussed, e.g. loops, conditional statements etc.
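One way to see this for yourself is to type a function's name without the brackets, which prints its definition. The short sketch below does this for the base function sd(); the exact source printed may differ slightly between R versions, so no output is shown here.

```r
# Typing a function's name without brackets prints its definition
sd

# formals() lists the input variables a function expects
names(formals(sd))
```

Note that some base functions (for example sum()) are implemented in compiled code, so printing them shows less detail.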

4.1 Creating functions

In this section, we are going to discuss how to create our own functions. There are two main reasons for wanting to create your own functions:

  1. To reuse a block of code over and over again without having to re-write it (especially if the code is long and complex)

  2. For other people to use in their programming (similar to how we have already used some of the functions other people have created)

To create a function in R, we need the following steps:

  1. Choose a name for the function
  2. Consider the input variables that will be required for the function
  3. Use the following lines of code:

functionname <- function(input1, input2, ...){
  # Commands to execute for the function, using the input variables listed
}

As a basic example, let us re-create the mean() function ourselves from scratch:

mean(1:100) # Remind ourselves how the mean() function works
## [1] 50.5
mean_function <- function(x){
  sum(x)/length(x)
}

mean_function(1:100)
## [1] 50.5
mean_function(5:5000)
## [1] 2502.5

As you can see in the above, once the function has been created and given a name (in this case mean_function()), it can be called and used like any other pre-defined function.

(vec <- rexp(100, rate = 1))
##   [1] 0.233062571 0.287785502 0.635319289 0.665331448 0.099118779 0.319374929
##   [7] 0.030674642 1.862398766 0.189140707 0.399991057 0.112924471 0.402592935
##  [13] 0.300230652 1.007858773 1.133081333 0.177681225 0.959252605 0.796021482
##  [19] 0.860415824 0.309252647 0.571253570 0.113867391 0.138248821 2.041111238
##  [25] 0.657329419 3.031535724 1.174595118 0.010152092 0.865131355 0.758500003
##  [31] 1.113312119 0.420498796 0.026373402 0.295133093 0.424081943 0.202073025
##  [37] 0.360592224 2.477918086 1.987166641 1.925566325 1.849179139 1.307377508
##  [43] 1.264218544 0.187345587 0.878255324 0.116979238 0.211217160 0.805113044
##  [49] 0.004945884 0.955366756 0.199257377 1.067233632 0.612793300 0.893734290
##  [55] 0.536592008 1.602893557 1.501353474 3.579520688 0.463349212 1.719331898
##  [61] 0.062045055 0.350623705 0.976738160 1.247234407 0.248120417 1.872900145
##  [67] 0.198102107 0.369280057 1.147851179 0.960012237 0.924871236 0.623059175
##  [73] 0.925353289 0.468828540 0.296698745 3.807578346 0.066638329 0.534605446
##  [79] 2.539649268 1.436552302 0.836982440 0.603102823 0.177597011 0.495244812
##  [85] 0.103623640 0.409421449 0.099164369 0.308665173 5.306004584 0.345620009
##  [91] 0.262301464 2.016079178 3.418144993 0.240982255 0.480893278 1.417750526
##  [97] 1.776156085 0.546648572 0.141083874 0.850725687
mean(vec)
## [1] 0.8802494
mean_function(vec)
## [1] 0.8802494

As you can see from this simple example, it is even possible to use functions inside functions: we have used the sum() and length() functions inside our newly created function.
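The same applies to functions you have written yourself. As a minimal sketch, the function centre() below (a hypothetical name of our own, not a base R function) calls mean_function() to subtract the mean from every element of a vector:

```r
# Our own mean function, as defined above
mean_function <- function(x){
  sum(x)/length(x)
}

# Hypothetical helper: shift a vector so that its mean is zero
centre <- function(x){
  x - mean_function(x)
}

centre(c(2, 4, 6))
## [1] -2  0  2
```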

Exercise 4.1 Can you create a function called sum_function which sums up all of the values in a vector without using the predefined sum() function?

Exercise 4.2 Using IF statements, can you create a function that rounds a number to its nearest integer (.5 rounds up)? You cannot use the already pre-defined round() function.

Of course, some functions are much more complicated underneath the surface. For example, the lm() function executes a full ‘linear regression’ fitting to a set of data and returns a variety of information about the fitted model:

fit <- lm(mtcars$mpg ~ mtcars$hp)
summary(fit)
## 
## Call:
## lm(formula = mtcars$mpg ~ mtcars$hp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## mtcars$hp   -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

Note: Do not worry about understanding all of this output here, it is included purely for the sake of an example.

Although such functions may look much more complicated, they are still ultimately made up of combinations of basic commands (albeit many lines of them). The concept is much the same and only requires more thought. As an example of a slightly more complicated function, let us create a prime number calculator:

4.2 Prime number calculator - example

prime <- function(number){
  flag <- 0

  if(number == 2){
    flag <- 1
  } else if (number > 2) {
    # check for factors
    flag <- 1
    for(i in 2:(number-1)) {
      if ((number %% i) == 0) {
        flag <- 0
        break
      }
    }
  }

  if(flag == 1) {
    print(paste(number,"is a prime number"))
  } else {
    print(paste(number,"is not a prime number"))
  }
}

prime(7)
## [1] "7 is a prime number"
prime(986376383)
## [1] "986376383 is not a prime number"
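The function above prints a message, which is convenient to read but awkward to reuse in further calculations. As a sketch of one possible variation, the hypothetical function is_prime() below returns TRUE or FALSE instead. It also assumes that it is enough to test divisors up to the square root of the number, since any factor larger than the square root must be paired with one smaller than it.

```r
# Hypothetical variant of prime(): returns TRUE/FALSE instead of printing
is_prime <- function(number){
  if (number < 2) {
    return(FALSE)
  }
  if (number < 4) {
    return(TRUE)   # 2 and 3 are prime
  }
  # Only divisors up to sqrt(number) need to be checked
  for (i in 2:floor(sqrt(number))) {
    if ((number %% i) == 0) {
      return(FALSE)
    }
  }
  TRUE
}

is_prime(7)
## [1] TRUE
is_prime(91)   # 91 = 7 * 13
## [1] FALSE
```

Because the result is a logical value, it can be passed on to other functions, e.g. Filter(is_prime, 1:20) keeps only the primes between 1 and 20.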

4.3 Multiple Input Variables

In the functions we have created so far, we have only considered one input variable. However, it is possible to create functions with multiple inputs. For example, imagine we wanted to find the accumulated value of an investment over some time period under compound interest. In such a problem, you have three different possible inputs:

  1. Initial investment
  2. Annual interest rate
  3. Time (years)

Acc_value <- function(initial, interest, years){
  value <- initial*(1+interest)^years
  return(value)
}

Acc_value(100000, 0.05, 25)
## [1] 338635.5
Acc_value(100,0.05,10)
## [1] 162.8895

Notice how much easier this is now that we have created a function. Without a function, we would have had to define each variable as a set value and then run the calculation. Every time we wanted to repeat it for a new set of values, we would have to change them individually and run it all again, i.e.,

initial <- 100000
interest <- 0.05
years <- 25
initial*(1+interest)^years
## [1] 338635.5

Creating functions not only avoids this tedious problem but also allows us to use them inside other calculations or even other functions (see later).
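As a small illustration of using the function inside another calculation, the sketch below computes the total interest earned on an investment. The helper name interest_earned() is our own choice for this example.

```r
# Accumulated value under compound interest, as defined above
Acc_value <- function(initial, interest, years){
  value <- initial*(1+interest)^years
  return(value)
}

# Hypothetical helper built on top of Acc_value(): total interest earned
interest_earned <- function(initial, interest, years){
  Acc_value(initial, interest, years) - initial
}

interest_earned(100, 0.05, 10)
## [1] 62.88946
```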

Before we look at some examples of functions working inside of functions, we note that functions also work with vectors:

(x <- seq(0.01, 0.06, by = 0.005))
##  [1] 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060
Acc_value(100000, x, 25)
##  [1] 128243.2 145094.5 164060.6 185394.4 209377.8 236324.5 266583.6 300543.4
##  [9] 338635.5 381339.2 429187.1

In this case, the function works element-wise in the usual way to create a vector of outputs. To see this implemented inside another function, let us consider the following example of plotting the various accumulated values calculated above using the plot() function:

plot(x, Acc_value(100,x, 10), ylab = "Accumulated Value (100)", xlab = "Interest Rate")

This particular use of the function within the plot() function will prove to be very useful for future assessments, where you are typically asked to compare certain quantities under varying conditions (interest rates, terms, etc.).
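When two conditions vary at once, a table can be easier to read than a plot. As a rough sketch, the base function outer() can evaluate Acc_value() on every combination of interest rate and term. The rates and terms below are purely illustrative values.

```r
# Accumulated value under compound interest, as defined above
Acc_value <- function(initial, interest, years){
  initial*(1+interest)^years
}

rates <- c(0.02, 0.04, 0.06)   # illustrative interest rates
terms <- c(5, 10, 20)          # illustrative terms in years

# outer() evaluates the function on all combinations of rate and term
# (this relies on Acc_value() working element-wise on vectors)
values <- outer(rates, terms, function(i, n) Acc_value(100, i, n))
dimnames(values) <- list(rate = as.character(rates), years = as.character(terms))
round(values, 2)
```

Each row of the resulting matrix corresponds to an interest rate and each column to a term.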

As another example, we recall that there are actually two different types of interest (simple and compound). Of course, we could create a separate function for each of these. However, since they are related, it would be nice to have a single function that could deal with both. This can easily be done by adding a new variable:

Acc_value <- function(initial, interest, years, type){
  if (type == "compound"){
    value <- initial*(1+interest)^years
    return (value)
  } else if (type == "simple"){
    value <- initial*(1+(interest*years))
    return (value)
  } else {
    print("Invalid Interest Type. Must either be 'compound' or 'simple'")
  }
}

Acc_value(100, 0.05, 10, "simple")
## [1] 150
Acc_value(100,0.05,10, "compound")
## [1] 162.8895
Acc_value(11, 0.05, 10, "comp")
## [1] "Invalid Interest Type. Must either be 'compound' or 'simple'"
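Note that with print(), the function still finishes normally and hands back the message as a character string, which could then be used by mistake in a later calculation. One possible alternative, sketched below under the hypothetical name acc_value_strict(), is to raise an error with the base function stop():

```r
# Hypothetical stricter variant: invalid input raises an error
acc_value_strict <- function(initial, interest, years, type){
  if (type == "compound"){
    initial*(1+interest)^years
  } else if (type == "simple"){
    initial*(1+(interest*years))
  } else {
    stop("Invalid interest type. Must be either 'compound' or 'simple'")
  }
}

acc_value_strict(100, 0.05, 10, "simple")
## [1] 150

# tryCatch() lets us capture the error message without halting the script
tryCatch(
  acc_value_strict(100, 0.05, 10, "comp"),
  error = function(e) conditionMessage(e)
)
## [1] "Invalid interest type. Must be either 'compound' or 'simple'"
```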

An alternative way to do this is to let the type variable be a Boolean value, as we have seen in other functions:

Acc_value <- function(initial, interest, years, compound){
  if (compound == TRUE){
    value <- initial*(1+interest)^years
  } else if (compound == FALSE){
    value <- initial*(1+(interest*years))
  }
  return(value)
}

Acc_value(100, 0.05, 10, compound = FALSE)
## [1] 150

4.3.1 Default options

In some cases, a function has a variable that can be changed but, more often than not, takes a certain value. In this case, you can set a default value for the variable, which it will take if it is not explicitly specified in the function call:

Acc_value <- function(initial, interest, years, compound = TRUE){
  if (compound == TRUE){
    value <- initial*(1+interest)^years
  } else if (compound == FALSE){
    value <- initial*(1+(interest*years))
  }
  return(value)
}


Acc_value(100, 0.05, 10)
## [1] 162.8895
Acc_value(100, 0.05, 10, compound = FALSE)
## [1] 150
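Two related points are worth a quick sketch: inputs that are named in the call can be given in any order, and the base function args() displays a function's inputs together with any default values.

```r
# Same definition as above, with a default value for compound
Acc_value <- function(initial, interest, years, compound = TRUE){
  if (compound == TRUE){
    value <- initial*(1+interest)^years
  } else if (compound == FALSE){
    value <- initial*(1+(interest*years))
  }
  return(value)
}

# Named inputs can be given in any order
Acc_value(years = 10, interest = 0.05, initial = 100)
## [1] 162.8895

# args() shows the inputs of a function and their default values
args(Acc_value)
```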

Finally, just as a nice example of the above application, we could further develop the interest function above:

Acc_value <- function(initial, interest, years, compound = TRUE, compare = FALSE){

  # Accumulated values after 0, 1, ..., years under compound interest
  comp_values <- c(initial)
  for (i in 1:years){
    comp_values <- c(comp_values, initial*(1+interest)^i)
  }
  # The same under simple interest
  simp_values <- c(initial)
  for (i in 1:years){
    simp_values <- c(simp_values, initial*(1+(interest*i)))
  }

  if (compare == FALSE){
    # Return only the final accumulated value
    if (compound == TRUE){
      return(comp_values[length(comp_values)])
    } else {
      return(simp_values[length(simp_values)])
    }
  }

  if (compare == TRUE){
    # Plot both types of interest against time
    x <- 0:years
    plot(x, comp_values, ylab = "Accumulated Value", xlab = "Year", main = "Comparison of Interests", type = "l", col = "red")
    lines(x, simp_values, type = "l", col = "blue")
    legend("bottomright", legend = c("Compound", "Simple"), col = c("red", "blue"), lty = 1)
  }
}

Acc_value(100, 0.05, 10)
## [1] 162.8895
Acc_value(100, 0.05, 10, compound = FALSE)
## [1] 150
Acc_value(100,0.05, 10, compare = TRUE)

Acc_value(10000, 0.04, 50, compare = TRUE)
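The for loops inside this function build up the vectors of values one year at a time. Since the calculations work element-wise, the same vectors could also be produced in a single line each. The sketch below, using illustrative input values, checks that both approaches agree:

```r
initial <- 100
interest <- 0.05
years <- 10

# Loop version, as used inside Acc_value() above
comp_loop <- c(initial)
for (i in 1:years){
  comp_loop <- c(comp_loop, initial*(1+interest)^i)
}

# Vectorised version: evaluate the formula for years 0, 1, ..., years at once
comp_vec <- initial*(1+interest)^(0:years)
simp_vec <- initial*(1+interest*(0:years))

all.equal(comp_loop, comp_vec)
## [1] TRUE
```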

Now that you understand the basics of how functions work, try having a go at the following exercises.

4.4 Exercises

  1. In the above, we discussed how to re-create the sum() and mean() functions from programming basics. In a similar way, create a function called variance that calculates the variance of a vector of values. You are allowed to use the pre-defined sum() and mean() functions inside your variance function. Try doing this in two different ways:
     1. Using for loops
     2. Using vectorised calculations
  2. Create a function that, given an integer, will calculate how many divisors it has (other than 1 and itself). Make the divisors appear on the screen.
  3. From your ‘Introduction to Actuarial Science’ module, you should have come across the concept of ‘discounting’ and the ‘present value’ of money. Create a function in R called PV that takes 3 input variables: 1) Final value (F), 2) Annual interest rate and 3) Number of years, which calculates the present value of F.
  4. Recall that for a geometric summation, we have \[ \sum_{k = 0}^{n-1} x^k = \frac{1-x^n}{1-x}. \] Moreover, if \(|x| < 1\), the above summation converges as \(n \rightarrow \infty\), such that \[ \sum_{k = 0}^{\infty} x^k = \frac{1}{1-x}. \]

     Create a function in R called GeomSum that takes two input variables (x and n) and calculates the geometric sum of x from 0 up to n. It should also be possible to include the option that \(n = \infty\). Hint: You may have to include a Boolean value for this but remember, the above limit only exists under a given condition. If this condition is not satisfied, make the function print out a warning message.

  5. Recall from your ‘Introduction to Actuarial Science’ module that the ‘Accumulated Value’ of an annuity-due with unit payments is defined by \[ \ddot{s}_{n\rceil} = \sum_{k=1}^n (1+i)^k = \frac{(1+i)^n -1}{i}\times(1+i) \] Create a function that takes three input variables representing 1) The value of repeated payments, 2) The annual interest rate and 3) The number of years. The function should create a vector with the accumulated value of the investment after each year and plot it on a basic plot against time (see the R Script for a similar example).
  6. Recall the Stock price example from the Applied exercises of the previous chapter. Create a function called Stock that allows the user to input a starting amount, the standard deviation of percentage change (assume the change is normally distributed \(N(0, \sigma^2)\)) and the values of an upper and lower barrier. The function should then plot the movement of the stock and print out the number of days it takes to reach either the upper or lower barrier.

4.5 DataCamp course(s)