Descriptions of exercises are in green.
Homework assignments are in pink.
R is an extremely powerful programming language, broadly used in science for:
To install R for Windows, follow the link and then click on “Download R – version_number – for Windows”.
To install R for OS X (Macs), follow the link and then click on “Download R for (Mac) OS X”.
To install R for Linux, type the lines below in the terminal.
Run R. You should see a window similar to the one below.
It is the so-called R console. Anything written and executed within a console will be interpreted (calculated) by R, and the result or message will be printed out in the console.
RStudio is a shell and programming environment for the R language. It makes working with R much easier and more intuitive by providing a user interface to R features originally hidden behind R functions. However, remember that all actions can also be performed within the classical R console.
Follow the link, choose the appropriate operating system, and install the free RStudio Desktop. Keep in mind that R needs to be installed first.
The RStudio interface is divided into four main panels, but today we will focus only on the Console panel, located in the lower-left corner. In the following classes, we will explore the other three panels and learn how to use them effectively.
R understands standard mathematical operators: + (addition), - (subtraction), * (multiplication), / (division), and ^ (power).
We can perform mathematical operations on single values – numbers, as
well as other objects made up of numbers.
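As a quick illustration, each operator can be tried directly in the console. The numbers 7 and 2 are arbitrary, and the lines starting with ## show what R prints:

```r
7 + 2   # addition
## [1] 9
7 - 2   # subtraction
## [1] 5
7 * 2   # multiplication
## [1] 14
7 / 2   # division
## [1] 3.5
7 ^ 2   # power
## [1] 49
```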
Sum up all numbers from 1 to 10 using the + operator.
Expected result:
## [1] 55
Raise the result of exercise 1 to
the power of 5.
Expected result:
## [1] 503284375
R also provides 2 additional operators:
%% - modulo - returns the remainder from the division
%/% - integer division - returns how many times one number fits into another
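A short sketch of both operators, using arbitrary numbers. A remainder of 0 means that the first number is divisible by the second:

```r
17 %% 5    # 17 = 3 * 5 + 2, so the remainder is 2
## [1] 2
17 %/% 5   # 5 fits into 17 three whole times
## [1] 3
20 %% 5    # remainder 0: 20 is divisible by 5
## [1] 0
```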
For numbers 10, 156, 557, 777, and
1055, check which are divisible by 7.
Expected result:
## [1] 3
## [1] 2
## [1] 4
## [1] 0
## [1] 5
Calculate the area of a circle if the radius equals 40 meters.
Tip: Due to its unique role in science, the value of π can be obtained just by typing ‘pi’.
Expected result:
## [1] 5026.548
Advice: R follows the standard order of mathematical operations. However, it is usually a good practice to use parentheses.
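The small example below, with arbitrary numbers, shows how parentheses change the result and why they help avoid surprises:

```r
2 + 3 * 4     # multiplication is done before addition
## [1] 14
(2 + 3) * 4   # parentheses force the addition first
## [1] 20
-2 ^ 2        # the power is calculated before the minus sign is applied
## [1] -4
(-2) ^ 2      # parentheses make the intention explicit
## [1] 4
```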
Apart from that, there are also commonly used mathematical functions such as:
log() - natural logarithm
log10() - logarithm base 10
exp() - exponential function, Euler's number raised to a given power
sin(), cos(), tan() - trigonometric functions
abs() - absolute value
You can use them by placing the desired number inside the parentheses, e.g. the exponent of e for exp() or an angle in radians for any of the trigonometric functions.
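A few calls with easy-to-check values may help here. The inputs are chosen only for illustration:

```r
log(1)        # natural logarithm of 1
## [1] 0
log10(1000)   # 10 raised to the power of 3 gives 1000
## [1] 3
exp(0)        # e raised to the power of 0
## [1] 1
sin(pi / 2)   # the angle is given in radians (pi / 2 is 90 degrees)
## [1] 1
abs(-3.5)     # absolute value
## [1] 3.5
```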
Using the equation for the Shannon-Wiener index and the species frequencies shown below, calculate the diversity of each population separately. Which is more diverse (higher values reflect more diverse populations)?
\[ H_{SW} = \sum_{i=1}^{S} p_i \cdot \ln\left(\frac{1}{p_i}\right) \]
Where:
S - the number of species
p_i - the frequency of the i-th species
| Species | Population 1 | Population 2 |
|---|---|---|
| species 1 | 0.8 | 0.2 |
| species 2 | 0.1 | 0.2 |
| species 3 | 0.1 | 0.6 |
Expected result:
## [1] 0.6390319
## [1] 0.9502705
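To check that the Shannon-Wiener formula is typed correctly, it can help to try it first on a simpler, made-up case. The sketch below assumes a hypothetical community (not one of the populations from the table) with two species, each with a frequency of 0.5. For such a community the index equals ln(2):

```r
# Hypothetical community: two species, each with frequency 0.5
0.5 * log(1 / 0.5) + 0.5 * log(1 / 0.5)
## [1] 0.6931472

# The same value obtained directly as the natural logarithm of 2
log(2)
## [1] 0.6931472
```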
Until you name something, it does not exist in the computer's memory! The outcome of any command executed in the console is lost once the calculation is finished, unless it is assigned to a name. Named objects in the computer's memory are called variables. You can create one by using the arrow (assignment symbol) in the following manner:
chosen_name <- object_to_be_saved
You can easily recall the value of the variable later on by typing its name.
Try to save 5 as a variable. Choose a variable name on your own. Then, type your variable’s name and press Enter.
Expected result:
## [1] 5
Advice: Variable names are case sensitive and cannot contain blank spaces or start with a digit. When you want to combine several words into one name, use the underscore (_). By convention, dots are used for function names and should be used with caution.
Since the variable is saved, its name can replace the actual value in any R command, e.g. if 2 is assigned to x, both 2+3 and x+3 would result in 5.
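A minimal sketch of this idea. The variable name side and the value 3 are arbitrary choices:

```r
side <- 3   # save 3 under the name "side"
side        # typing the name recalls the value
## [1] 3
side * 4    # the name can be used wherever the number could be
## [1] 12
side ^ 2
## [1] 9
```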
Using the table from exercise 5, calculate the range of species frequencies for both populations and save them as separate variables. Then, using the chosen names, calculate the absolute difference between the ranges. Save it to a variable called range_diff and call it.
Expected result:
## [1] 0.3
Variables can also be overwritten. It is done by assigning a new object to the already used variable’s name. Remember, however, that once you overwrite the variable, the old value disappears for good.
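The example below sketches overwriting with an arbitrary variable called counter. Note that the old value can be used on the right-hand side of the arrow while the new value is being calculated:

```r
counter <- 10            # first assignment
counter <- counter + 1   # the old value (10) is used, then replaced by 11
counter
## [1] 11
```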
Change the variable range_diff by increasing its value by 20%. Call it.
Expected result:
## [1] 0.36
Variables can store more than just numbers. Another very popular type of data is a string. It is a piece of text that behaves as a single object regardless of its length. To distinguish strings from variable names, R requires quotation marks ("") around them.
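For instance, a string can be saved and recalled just like a number. The variable name species and its content are only an illustration:

```r
species <- "Parus major"   # the quotation marks tell R that this is text
species
## [1] "Parus major"

# Without the quotation marks, R would look for variables named Parus and major
# and report an error, because no such variables exist.
```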
Save your name to the variable my_name. Call it.
Expected result:
## [1] "Your Name"
One of the usual ways to deal with your data in R is to use functions. They are simply lines of code saved in the computer's memory that perform desired operations and often return a result. Some functions are built into R, and some require additional tools called packages to be installed. Think of apps pre-loaded on your phone, like a calculator or a calendar, versus apps for specific uses that you need to download separately. We will talk about packages in the next class.
The functions we used before were given only a single argument, e.g., log() was given a number. However, this is rarely the case. The list of a function’s arguments and its usage can be found in the function’s manual. It can be reached by typing a question mark (?) followed by the function’s name.
Open the manual for the paste() function.
Usually, the manual consists of 7 sections:
Use the paste() function to stick the following words together: ‘I’m’, ‘using’, and ‘R’. Don’t forget about quotation marks ("").
Expected result:
## [1] "I’m using R"
Arguments passed to functions often have their own names. Distinctive names are crucial because many functions take multiple arguments that need to be distinguished. Such named arguments are passed in the following pattern: argument_name = argument_value.
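As a sketch of this pattern, here are two base R functions called with named arguments. The argument names (x, digits, base) come from the manuals of round() and log(); the values are arbitrary:

```r
round(x = 3.14159, digits = 2)   # round to two decimal places
## [1] 3.14
log(x = 8, base = 2)             # logarithm of 8 with base 2
## [1] 3
```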
Use the same function as above, but set another argument called sep (separator) to ‘_’.
Expected result:
## [1] "I’m_using_R"
Note that the blank spaces were replaced with underscores. However, where did the blank spaces in the Exercise 11 result come from? The answer is that some arguments have default values that are used if no value is passed to the function. In the above case, the default value for the sep argument is a blank space (" ").
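Default values can be seen in other functions too. In the sketch below, round() is called first without and then with its digits argument, whose default is 0. The number 2.567 is arbitrary:

```r
round(2.567)               # digits is not given, so the default (0) is used
## [1] 3
round(2.567, digits = 1)   # the default is replaced by our own value
## [1] 2.6
```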
Advice: It is good practice to use argument names when calling a function. Although R can “guess” the argument name from the order in which arguments are typed, this can go wrong when the number of arguments is not strictly defined (the ... sign in the function description).
A vector is a series of numbers (or strings) saved as a single variable. A new vector can be created with the c() function in the following manner: c(value_1, value_2, value_3, …).
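A minimal sketch of c() in use. The variable names and values below are made up for illustration:

```r
heights <- c(1.62, 1.75, 1.80)           # a vector of three numbers
heights
## [1] 1.62 1.75 1.80

colours <- c("red", "green", "blue")     # a vector of three strings
```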
Create a vector containing integers from 5 to 10 and save it to a variable. Call it.
Expected result:
## [1] 5 6 7 8 9 10
Tip: To create a vector of consecutive integers, you can type the limits of the range separated by a colon.
Create a vector containing integers from 1 to 100 and save it to a variable one_to_hundred. Call it.
Expected result:
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
Advice: Note that concerning ranges, R is fully inclusive, which means that both limits of the range will be included in the outcome.
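The inclusive behaviour of the colon can be seen on a small, arbitrary range. The sketch also shows that a range can run downwards:

```r
2:6   # both limits, 2 and 6, are part of the result
## [1] 2 3 4 5 6
6:2   # when the first limit is larger, the sequence decreases
## [1] 6 5 4 3 2
```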
To create a vector of consecutive numbers that differ by a given value, use the seq() function. Note that the function will return a vector, so there is no need to use c().
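A short sketch of seq() with arbitrary limits. The argument names from, to, by, and length.out are taken from the seq() manual:

```r
seq(from = 10, to = 50, by = 10)        # step of 10 between neighbours
## [1] 10 20 30 40 50
seq(from = 1, to = 2, length.out = 5)   # ask for 5 evenly spaced numbers instead
## [1] 1.00 1.25 1.50 1.75 2.00
```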
Access the seq() manual. Using the seq() function, create a vector of numbers between 0 and 1 that differ by 0.1.
Expected result:
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
To create a vector of repeated values, use the rep() function.
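The sketch below shows two ways of repeating values with rep(). The arguments times and each are described in the rep() manual; the values 0 and 1 are arbitrary:

```r
rep(c(0, 1), times = 2)   # the whole vector is repeated twice
## [1] 0 1 0 1
rep(c(0, 1), each = 2)    # each element is repeated twice before moving on
## [1] 0 0 1 1
```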
Access the rep() manual. Using the rep() function, create a vector consisting of 1, 2, and 3 repeated 20 times. Save it as a variable repeated. Call it.
Expected result:
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
## [39] 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
Create a vector from 1 to 4 with odd numbers typed as digits and even numbers typed as words. What is the outcome: a vector of integers or a vector of strings?
Expected result:
## [1] "1" "two" "3" "four"
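The type of a vector can be checked with the class() function. This function is not covered elsewhere in this class and is used here only as an illustration. A vector holds values of one type only, so mixing numbers with text turns everything into strings:

```r
mixed <- c(10, "ten")   # a number and a string in one vector
class(mixed)            # the number was converted to a string
## [1] "character"
class(c(10, 20))        # numbers only
## [1] "numeric"
```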
Useful functions:
min() - minimum value
max() - maximum value
sum() - sum of all numbers in a vector
prod() - product of all numbers in a vector
mean() - average value
median() - central value
length() - number of elements in a vector
sort() - sort values (ascending order by default; use the decreasing argument to sort in descending order)
unique() - return unique values
round() - round numbers (to integers by default)
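Some of these functions can be tried on a small vector. The vector x below is made up for illustration:

```r
x <- c(4, 1, 3, 1, 2.6)

min(x)
## [1] 1
max(x)
## [1] 4
sum(x)
## [1] 11.6
mean(x)
## [1] 2.32
median(x)                    # middle value after sorting: 1 1 2.6 3 4
## [1] 2.6
length(x)
## [1] 5
sort(x, decreasing = TRUE)   # descending order
## [1] 4.0 3.0 2.6 1.0 1.0
unique(x)                    # the second 1 is dropped
## [1] 4.0 1.0 3.0 2.6
round(x)
## [1] 4 1 3 1 3
```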
Given the vector one_to_hundred, calculate its mean and median.
Expected results:
## [1] 50.5
## [1] 50.5
Given the vector repeated, sort it and return its unique values.
Expected results:
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [39] 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [1] 1 2 3
Vectors can also be used in typical mathematical operations. However, there is one important rule: the shorter vector will be repeated until it reaches the length of the longer one (a single number is treated as a vector of length 1).
Examples:
c(1,2,3,4) + 5 # 5 is added to each element, the same as c(1,2,3,4) + c(5,5,5,5)
## [1] 6 7 8 9
c(1,2,3,4) + c(1,2) # the same as c(1,2,3,4) + c(1,2,1,2)
## [1] 2 4 4 6
paste("text", 1:10) # Similarly here: the shorter vector with 1 element ("text") is repeated 10 times to reach the length of the longer vector.
## [1] "text 1" "text 2" "text 3" "text 4" "text 5" "text 6" "text 7"
## [8] "text 8" "text 9" "text 10"
The result:
## [1] "cat 1" "dog 2" "mouse 3" "cat 4" "dog 5" "mouse 6" "cat 7"
## [8] "dog 8" "mouse 9" "cat 10"
Given the vector one_to_hundred, raise each element to the power of 3. Save it to a variable called power_3.
Expected result:
## [1] 1 8 27 64 125 216 343 512 729
## [10] 1000 1331 1728 2197 2744 3375 4096 4913 5832
## [19] 6859 8000 9261 10648 12167 13824 15625 17576 19683
## [28] 21952 24389 27000 29791 32768 35937 39304 42875 46656
## [37] 50653 54872 59319 64000 68921 74088 79507 85184 91125
## [46] 97336 103823 110592 117649 125000 132651 140608 148877 157464
## [55] 166375 175616 185193 195112 205379 216000 226981 238328 250047
## [64] 262144 274625 287496 300763 314432 328509 343000 357911 373248
## [73] 389017 405224 421875 438976 456533 474552 493039 512000 531441
## [82] 551368 571787 592704 614125 636056 658503 681472 704969 729000
## [91] 753571 778688 804357 830584 857375 884736 912673 941192 970299
## [100] 1000000
A vector can be subsetted (i.e. accessed or displayed at a specific position) in the following manner: vector_name[element_index]. The index is the position of the value in the vector, e.g., 1st, 3rd, 25th, etc.
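A minimal sketch of subsetting on a made-up vector v. Note that in R the first element has index 1:

```r
v <- c(10, 20, 30, 40, 50)

v[2]         # a single position
## [1] 20
v[2:4]       # a range of positions
## [1] 20 30 40
v[c(1, 5)]   # several chosen positions, given as a vector
## [1] 10 50
```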
Return the 15th element of the vector power_3.
Expected result:
## [1] 3375
Return the 2nd to 20th elements of the vector power_3.
Tip: A colon can be used for ranges just as for vector creation.
Expected result:
## [1] 8 27 64 125 216 343 512 729 1000 1331 1728 2197 2744 3375 4096
## [16] 4913 5832 6859 8000
Return the 15th, 30th, and 45th elements of the vector power_3.
Tip: To obtain multiple values, provide a vector of positions instead of a single position index.
Expected result:
## [1] 3375 27000 91125
Create a vector including the numbers from 1 to 10, 40, and 55. Save it to a variable. Return the corresponding elements of the vector power_3 using the previously saved variable.
Tip: While creating a vector, you can combine both ranges and single indexes with the c() function.
Expected result:
## [1] 1 8 27 64 125 216 343 512 729 1000
## [11] 64000 166375
If you want to save your code written during the class, type the commands below to save the R history.
savehistory(file = "my_history.txt") # saves your R history to the file "my_history.txt"
getwd() # displays where the file was saved
Please save all your R commands in a plain text file and call it “class_1_homework_Your_Name.txt” (replace “Your_Name” with your actual name). Then, upload the file to the Class 1 tab on the Pegaz platform.
1. Create a vector of the number of days in each month called days. The vector should have 12 elements and contain only numbers. Assume it is not a leap year.
2. Using the vector days, find the median number of days in a month and save it to a new variable called median_days.
3. Using the vector days, find the difference (in days) between the longest and shortest month and save it to a new variable called range_days. Find the longest and shortest month using R functions.
4. Using the vector days, get a vector of unique month lengths and call the new vector month_length.
5. Overwrite the days vector by replacing it with the number of minutes in each month.