RStudio has three great advantages over the classical R console:
The RStudio window consists of four main panels:
Code editor - Here you can write your code and save it in a text file (.R). You can run it anytime by highlighting the given piece of code and clicking Run in the top-right corner (or pressing Ctrl + Enter).
Console - The same console as in standard R (see Class 1). Note that any line of code run from the Code editor will appear in the console. To recall the last line of code you ran, press Arrow-up (this also works as a general way of browsing through the history of executed commands).
Environment / History
- Environment - shows all the objects that you created or loaded into your computer’s memory (RAM). Keep in mind that they will be lost as soon as you close RStudio.
- History - access to all the code run in a given RStudio session.
Files / Plots / Help / Packages
- Files - manage files on your computer.
- Plots - preview generated plots before saving them.
- Help - access the help page (often with examples) for a given function.
- Packages - manage all additionally installed packages.
In the previous class, working directly in the R Console was sufficient — we simply typed and executed individual commands. However, imagine that you now want to perform a series of operations on multiple datasets or rerun the same analysis in the future. Typing the same commands over and over would be inefficient and error-prone.
This is where R scripts come in. Scripts allow you to save your code in a file, so you can organize, reuse, modify, and share your analyses easily. To create a new R script, go to File → New File → R Script or click the New File icon → R Script. Alternatively, use the keyboard shortcut Ctrl + Shift + N. Your new R script file will open in the Code editor (top-left panel). Remember to save your script when you are done (File → Save As...), so you can return to it later.
To run code from an R script, place the cursor on the desired line (or highlight multiple lines) and press Ctrl + Enter or click the Run button. The results will appear in the Console.
Create a new script file and save it. Call it as you wish. Write code for all subsequent exercises using the Code editor. Remember that to obtain any outcome/result, you need to run your code first.
To make your code easier to read and understand, it is a good practice to add comments. Comments help explain what specific sections of the code are doing. This may be useful for other people using your script, and for you as well. Even if you know what your code is doing, after some time, you may forget it, and you may have to spend a lot of time understanding its functionality again.
In R you can add comments to your script by starting the line with the hashtag #. Apart from adding comments to your script, the hashtag can also be used to “turn off” a chunk of your code. Anything to the right of the hashtag will not be taken by R as a command. To comment several lines, highlight them and use Ctrl + Shift + C. Repeat it to uncomment. When you remove the hashtag, the command will be executed when you run your script again.
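As a small illustration of both uses of the hashtag, here is a sketch with made-up values (the object names are hypothetical, chosen only for this example):

```r
# Calculate the mean body mass of three (made-up) individuals
body_mass <- c(21.5, 19.8, 23.1)   # values in grams; a comment can follow code

mean_mass <- mean(body_mass)

# The line below is "turned off": R skips it until the # is removed
# mean_mass <- median(body_mass)

print(mean_mass)
```

Removing the hashtag from the commented-out line and rerunning the script would overwrite mean_mass with the median instead.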
The working directory is a folder on your computer where R is operating at the moment. Think about it as a separate room for a given analysis. R will look for and save all files in the working directory by default. You can check the current working directory by typing getwd().
To set a new working directory, choose Session → Set Working Directory → Choose Directory, or use the setwd() function with a destination path provided as an argument. Unless you continue a previous analysis, always set a new working directory.
Note that if you are using the setwd() command, the Files panel will not update automatically. To display files from your working directory, use the rstudioapi::filesPaneNavigate(getwd()) command or choose the proper folder manually, by clicking the three dots (...) icon in the top-right corner of the Files panel.
Create a new folder on your computer and set it as a working directory.
Curiosity
Although it is usually easier to create folders traditionally, using your operating system, keep in mind that you can also do it through R by using the dir.create() function.
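The functions mentioned in this section can be combined as in the sketch below. The folder name is hypothetical, and tempdir() is used only so that the example runs on any computer; in your own work you would give a path of your choice.

```r
# Remember where R is working now, so we can return there at the end
old_wd <- getwd()

# Create a new folder (hypothetical name) inside R's temporary directory
analysis_dir <- file.path(tempdir(), "class_2_analysis")
dir.create(analysis_dir, showWarnings = FALSE)

# Set the new folder as the working directory and confirm the change
setwd(analysis_dir)
getwd()

# Files are now looked up and saved relative to this folder
list.files()

# Go back to the original working directory
setwd(old_wd)
```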
“So it begins” - Théoden, 3019 TE
A huge part of preventing mistakes and reducing the amount of frustration in your work is developing good programming habits and having an established order of doing things. Now we will go through some parts of the programming workflow.
One of the advantages of RStudio is that it can save a lot of your work in between coding sessions. However, this can also generate some problems, as it can be easy to forget what you were doing last time. Having consistent setup rules can help with this.
Every time you boot up RStudio, remember to:
- create and save a new script file – if you don’t have one already
- copy and update the name of your script file (e.g. with that day’s date) – if you are working on a previously created file and want to keep track of multiple versions
- set up a working directory – the easiest way to keep track of this is to have a line of code at the beginning of the script; if you set up a working directory through the RStudio user interface, you can copy the code from the console into the script
- clear the RStudio environment – this is important, especially when switching between scripts/projects you are working on
- load any packages that the script is using – further explanation of this will be in the section about packages; for now, remember that some functions we will use later on are not present in base R, so if RStudio suddenly doesn’t want to run code with a function that you’ve been using before, it might be because it comes from a package that isn’t currently loaded
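The checklist above could translate into the first lines of a script roughly as follows. This is only a sketch: the path and package names are placeholders, and rm(list = ls()) is one common way of clearing the environment, not the only one.

```r
# --- Setup section at the top of a script (paths and packages are placeholders) ---

# 1. Working directory: uncomment and replace with your own folder
# setwd("path/to/your/folder")

# An object left over from earlier work (created here only for the demonstration)
leftover <- 1

# 2. Clear the environment: remove all objects created so far
rm(list = ls())

# 3. Load the packages the script uses (they must already be installed)
# library(readxl)
# library(dplyr)

# The leftover object is gone, so the analysis starts from a clean state
exists("leftover")
```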
There are two main ways of uploading files into the RStudio environment – by calling a function in your code, and through the user interface. Which one you pick depends on you. If you want to upload an MS Excel file through the user interface, go to the files section of the screen (lower right square section). Then, pick the files tab.
Remember that, by default, the tab will display the content of your working directory. You can pick a file that is not in your working directory; however, it is good practice to keep a copy of the files you’re working on in the folder set as your working directory.
Once you find the file, you can click on it, select import dataset, and navigate through the upload process. Through the upload menu, you can set up the name of the file, the sheet in the Excel file you want to upload, and the range of data you want to include. R also generates code that you can then copy and include in your R script.
Uploading an MS Excel file to R should create an object called a data frame, which is basically a table with columns and rows. You can check whether the object you created is a proper data frame by using the class() function.
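A minimal sketch of this check, using a small made-up table in place of imported data (the object and column names are hypothetical):

```r
# A small made-up table standing in for imported data
example_data <- data.frame(
  id    = c(1, 2, 3),
  score = c(4.5, 3.0, 5.0)
)

class(example_data)          # the class of the object
is.data.frame(example_data)  # a direct yes/no check
dim(example_data)            # number of rows, then number of columns
```

For data imported with readxl, class() typically returns several values, one of which is "data.frame", so is.data.frame() can be the more convenient check.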
Download the class_2_data.xlsx file and save it in your working directory. Then, upload the file into R using the RStudio user interface.
Notice that, when you upload your data from an MS Excel file, one of the lines of code that RStudio generated is library(readxl). This is because there is no way in base R to upload .xlsx files – we need an external package, readxl.
A package is a collection of functions not available in base R. Think about it like a toolbox, or an app for your phone that you need to install for a specific function. Base R packages are installed at the same time as R itself; others you need to find and download yourself.
Not everything is loaded automatically, for two reasons. Firstly, if RStudio had to load every single function each time you opened it, even the fastest laptop would take many minutes to start R. This is why some packages are installed together with R, but are not loaded (= activated) until you ask RStudio to do it. You can browse a list of packages in the Packages tab, in the same lower-right panel as the file upload.
Secondly, the R programming language is a collective non-profit project – thousands of scientists and programmers have contributed to it since it was created. You do not need all of these packages. Many hyper-specific packages are being introduced every year for very particular research and analysis purposes. This allows people to personalize their R experience and contribute to its development.
Check the Packages tab (the lower right panel). You will see a list of installed packages; loaded packages have a checkmark (✅). Did any packages load automatically when you started R?
Remember, if you are using someone else’s package in your research, you need to cite it in your articles the same way as you would someone else’s lab protocol. Usually, when a new package is introduced, a manual is published at the same time, and the appropriate citation is provided in the package instructions.
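R can print a suggested citation itself through the citation() function. In the sketch below, "stats" is used only because it ships with R; replace it with the package you actually used.

```r
# Suggested citation for R itself
citation()

# Suggested citation for a specific installed package
pkg_citation <- citation("stats")
pkg_citation
```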
Curiosity
Although R was developed primarily for data science and statistics, it can be used for many different purposes. For example, there are some text-based games for R. Check out these (useless but fun) R packages.
If the package you want to use is already installed, all you need to do is load it using the library() function. However, if the package is not installed, you will need to install it first. You can do it using a line of code provided by the package creator or the install.packages() function. Remember to put the name of the package in quotation marks (""). Once a package is installed, you need to load it. An installed package stays on your computer between sessions, so installing it once is usually enough until you remove it or upgrade R.
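The two steps could look like the sketch below. The installation line is left commented out because it needs an internet connection; the tools package is used for loading only because it ships with R but is not loaded at startup, so it stands in for any installed package.

```r
# Step 1: install once (package name in quotation marks)
# install.packages("ggplot2")

# Step 2: load in every session in which you use the package
library(tools)

# A loaded package appears on R's search path
"package:tools" %in% search()
```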
As we learned in the first class, you can access the documentation of a package or function by typing ?package_or_function_name (for example, ?dplyr). Keep in mind that for functions from packages outside of base R, this short form works only after the package has been loaded; otherwise, name the package explicitly, e.g. ?dplyr::filter.
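A short sketch of the ways to reach a help page. The lines that open a page are commented out so that the example also runs outside RStudio; dplyr appears only as an example and would need to be installed.

```r
# In the console, either of these opens the help page in the Help panel:
# ?median
# help("median")

# help() also returns an object, so we can check that a page was found
median_help <- help("median")
length(median_help) > 0

# For a package that is installed but not loaded, name the package explicitly:
# help("filter", package = "dplyr")
```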
Remember that every time you close the RStudio app, it turns off any packages you were using (= puts away the toolbox). So every time you close and open the app again, you need to run the library() command again. Base R packages do not need to be loaded!
Install the ggplot2 package using the install.packages() function. Check the Packages tab - a newly installed package should appear there!
Install the tidyverse package. Then, see here to learn more about tidyverse. Load the dplyr package.
Advice: The more packages you install, the longer it will take for RStudio to open; it can also slow down the app. For most computers, this difference is pretty small; however, if you have an older laptop, it is better not to install every package under the sun.
For the sake of keeping data clean and files simple and compatible, it can be a good idea to work on more universal, plain-text files than complex .xlsx files:
They can be easily uploaded and downloaded from R. Additionally, .csv files are much lighter in “weight”, which can be crucial for working with very large datasets, e.g. output of genetic sequencers, continuous trackers of environmental conditions, or results of laboratory tests.
For Polish speakers: Using Polish versions of some software (e.g., Excel) with default settings of the operating system may generate problems with .csv files, as the comma is used as the decimal separator. Excel (in Polish) by default creates a .csv file with columns separated by semicolons (;). In that case, the arguments of the R import functions need to be adjusted accordingly.
In Excel, save the “class_2_data.xlsx” file as a “class_2_data.csv” file. Remember to set it as a comma-separated file.
Uploading a .csv file into R does not require any additional packages. We can do it using the read.csv() or read.table() function. read.table() takes one main argument – the name of the file (including the file extension!), written in quotation marks. Remember that if this file is not in your working directory, you will need to provide the full path to it.
Additionally, we need to provide extra arguments to make sure the file is read correctly. These arguments are, for example:
- header – set to TRUE if the top row corresponds to column descriptors and to FALSE if the top row starts immediately from data,
- sep – which character (e.g. , or ; or _) was used to separate values in the file; by default, .csv files should be comma (,) separated, but might be set to something else – you can check by opening the .csv file through a basic text editor
If you are trying to upload a file with non-Latin characters and R is displaying them as nonsense symbols, you can try the following: open your .csv as a text document in the most basic text editor and save it with the UTF-8 encoding; then, while uploading it to R, make sure to specify encoding as UTF-8; this should fix the problem.
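The read.csv() function can work with just one argument – the file name. This is because it is a version of read.table() with defaults already set for standard .csv files (a header row and comma-separated values), so it is the most convenient choice for standard files, while non-standard files require changing those defaults.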
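The sketch below compares the two functions on a tiny made-up file. The file is written to a temporary folder only so that the example is self-contained; the file name and its contents are hypothetical.

```r
# Write a tiny made-up table to a temporary .csv file
csv_path <- file.path(tempdir(), "example_data.csv")
writeLines(c("name,score", "Ala,4.5", "Ola,3.0"), csv_path)

# read.table() needs to be told about the header row and the separator
from_table <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv() assumes header = TRUE and sep = "," by default
from_csv <- read.csv(csv_path)

# Both calls should give the same data frame here
identical(from_table, from_csv)
```

For a semicolon-separated file with decimal commas, the same read.table() call would need sep = ";" and dec = ","; base R also offers read.csv2(), which uses these settings by default. For encoding problems, the fileEncoding = "UTF-8" argument is one option to try.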
Check the read.table() and the read.csv() function manuals. Upload the “class_2_data.csv” file into the R environment using both functions. Work out which additional arguments each function needs to upload the file properly.
In most of your analyses, you will not need to download a modified
data frame out of R – you will only be interested in the results of
certain tests or graphs that R will generate. However, if you want to
download the data frame, you can use:
- write.table() – for saving a .csv file; uses similar arguments to read.table()
- write_xlsx() – requires the writexl package to be installed and loaded
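A minimal sketch of saving a data frame with write.table(), using made-up data and a temporary file (both hypothetical). The arguments shown are one reasonable choice, not the only valid one.

```r
# Made-up data frame standing in for your results
results <- data.frame(name = c("Ala", "Ola"), score = c(4.5, 3.0))

# Temporary location, used only so the example runs anywhere
out_path <- file.path(tempdir(), "results.csv")

# sep = "," makes the file comma-separated;
# row.names = FALSE leaves out the column of row numbers
write.table(results, out_path, sep = ",", row.names = FALSE)

# Look at the raw text that was written
readLines(out_path)
```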
R has a constantly growing and evolving community. If you need something done in R, chances are, someone has made a package or a function for it.
Tips for finding R solutions on the Internet:
Search the Internet for an R function that can rename columns in a data frame. Install and load a package if necessary. Try using the RStudio manual to figure out how to use the function that can perform this task.
Remember that the manual for any installed function can be pulled up by typing a question mark (?) before its name.
If you are attempting to use someone else’s code and you need to adapt
it for your purposes, consider the following:
Find a script online that uses R to calculate time differences between two dates. Try adjusting it to calculate how many days are left in this year.
You will likely encounter warning messages and error messages daily while using R. This does not mean you are careless or bad at coding – errors and warnings exist to guide you in the process of using R. Because computers are very particular about details and can be unintuitive for us, it takes effort to understand them. Knowing how to react to errors is a big component of an effective coding workflow.
Whenever RStudio finds an issue in the code, it will display a message in the console. When in doubt, you can always copy the message and paste it into Google search.
Warning messages are alerts from R about things that it considers potentially problematic, but that do not prevent the code from running. This can happen if running the code introduces changes to files that R wants to alert you about, or when there are conflicts between certain functions, or when R detects unused bits in your code.
Some common warning messages are:
NA value instead, and warn you about this change
Some warning messages can be safely ignored, while some indicate potential issues in your code. When in doubt, always double-check on the internet.
Look at these lines of code. What kind of warning messages do you think they contain? Run the lines of code separately, then identify what warning is occurring and why. If you know how to fix the issue, edit the code.
Example One:
numbers <- c(10, -5, 0)
log(numbers)
Example Two:
log(100, by = 2)
Example Three:
vec <- c(1, 2, "three")
vec * 2
Error messages report issues in the code that actually prevent it from running. If you are attempting to run a whole script, the error message will show up in the console immediately after the line that caused the issue, and the code will not run past it. If you are unsure exactly what function is causing the issue, you can run the code line by line until it stops working.
Note: a message with the same content can be either a warning or an error. The difference is in how it affects the code – if the operation was still executed, it is a warning. If the operation failed, it is an error.
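The difference can be seen in the sketch below. The first call produces a warning but still returns a result; the second fails entirely. try() is used here only so that the script keeps running after the error and the outcome can be inspected; the object names are hypothetical.

```r
# A warning: the operation still runs, and the problem value becomes NA
converted <- as.numeric(c("1", "2", "three"))
converted

# An error: the operation fails and no result is produced;
# try() catches the failure so the rest of the script can continue
failed <- try(sqrt("four"), silent = TRUE)

# TRUE means the call inside try() ended with an error
inherits(failed, "try-error")
```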
Some common causes for errors in R can be:
The order of operations when encountering an error:
If all of the above fails, you can try looking up your specific situation – chances are, someone else had the same issue and posted it on Reddit or StackOverflow, where someone else has already explained how to fix it.
Look at these lines of code. What kind of errors do you think they contain? Run the lines of code separately, then identify what error is occurring and why. If you know how to fix the issue, edit the code.
Example One:
median(my_numbers)
Example Two:
x <- c(1, 2, 3; 4)
Example Three:
describe(class_2_data)
Example Four:
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30, 35))
As you can see, coding can be meticulous and sometimes frustrating work. To reduce this frustration, remember the following:
- use # to both turn off certain parts of the code and leave useful descriptions – this can be a lifesaver when using someone else’s code or even coming back to work you were doing weeks or months ago
- use dplyr to make your files tidier – we will learn more about it during the next lesson
1. Create a new script file and save it as “class_2_homework_Your_Name.R”. Create a separate folder on your computer and set that folder as your working directory. Include the code for setting up that directory at the beginning of your script.
2. Upload the homework_data.xlsx file into RStudio using your preferred method. Include the code for uploading the file into the script. Remember that some upload methods require packages to be loaded first.
3. Load the dplyr package. Include the loading of the package as a line of code in your script.
4. Find a dplyr function that can sort a data frame based on the values of a given column. Use it to sort the homework data frame according to the values in the column score from lowest to highest.
5. Save your result as a .csv file using the write.table() function. Call the .csv file “class_2_sorted_column_Your_Name.csv”. Check the write.table() function manual to find out what arguments you should use to save a .csv file properly. The file should have column names and fields should be separated by commas (,).
Upload both your R script and .csv file to the “Class 2” tab on the Pegaz platform.