Introduction to R - class 2 Working in RStudio


RStudio interface

RStudio has three great advantages over the classical R console:

- saving and editing your code as a text file - R scripts (.R)
- running any piece of previously written code (reanalysis)
- displaying all R objects and variables saved in your computer memory at any given moment

The RStudio window consists of four main panels:

Code editor - Here you can write your code and save it in a text file (.R). You can run it anytime by highlighting the given piece of code and clicking Run in the top-right corner (also Ctrl + Enter).
Console - The same console as in standard R (see Class 1). Note that any line of code run from the Code editor will appear in the console. To recall the last run line of code, use Arrow-up (it works as a general way of browsing through the history of the executed commands).
Environment / History
- Environment - shows all the objects that you created or loaded into your computer’s memory (RAM). Keep in mind that they will be lost when you close RStudio, unless you choose to save the workspace.
- History - access to the entire code run in a given RStudio session.
Files / Plots / Help / Packages
- Files - manage files on your computer.
- Plots - preview generated plots before saving them.
- Help - access help page (often with examples) for a given function.
- Packages - manage all additionally installed packages.

R Scripts

In the previous class, working directly in the R Console was sufficient — we simply typed and executed individual commands. However, imagine that you now want to perform a series of operations on multiple datasets or rerun the same analysis in the future. Typing the same commands over and over would be inefficient and error-prone.

This is where R scripts come in. Scripts allow you to save your code in a file, so you can organize, reuse, modify, and share your analyses easily. To create a new R script, go to File → New File → R Script, or click the New File icon and choose R Script. Alternatively, use the keyboard shortcut Ctrl + Shift + N. Your new R script file will open in the Code editor (top-left panel). Remember to save your script when you are done (File → Save As...), so you can return to it later.

To run code from an R script, place the cursor on the desired line (or highlight multiple lines) and press Ctrl + Enter or click the Run button. The results will appear in the Console.

Exercise 1

Create a new script file and save it. Call it as you wish. Write code for all subsequent exercises using the Code editor. Remember that to obtain any outcome/result, you need to run your code first.

To make your code easier to read and understand, it is a good practice to add comments. Comments help explain what specific sections of the code are doing. This may be useful for other people using your script, and for you as well. Even if you know what your code is doing, after some time, you may forget it, and you may have to spend a lot of time understanding its functionality again.

In R, you can add a comment to your script by starting it with the hashtag (#). Anything to the right of the hashtag on a line is not treated by R as a command. Apart from adding explanations, the hashtag can also be used to “turn off” a chunk of your code. To comment out several lines at once, highlight them and press Ctrl + Shift + C; repeat to uncomment them. Once you remove the hashtag, the command will be executed the next time you run your script.
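As an illustration of the points above, here is a minimal sketch of a commented script. The object names and values are made up for this example.

```r
# Calculate the mean height of three plants (made-up values)
heights <- c(12.5, 15.0, 13.5)   # heights in cm
mean_height <- mean(heights)     # a comment can also follow code on the same line

# The line below is "turned off": R skips it because it starts with #
# mean_height <- mean_height * 100

print(mean_height)
```

Removing the # from the "turned off" line and running the script again would execute that line as well.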

Working directory

The working directory is a folder on your computer where R is operating at the moment. Think about it as a separate room for a given analysis. R will look for and save all files in the working directory by default. You can check the current working directory by typing getwd().

To set a new working directory, choose Session → Set Working Directory → Choose Directory, or use the setwd() function with a destination path provided as an argument. Unless you continue a previous analysis, always set a new working directory.

Note that if you are using the setwd() command, the Files panel will not update automatically. To display files from your working directory, use the rstudioapi::filesPaneNavigate(getwd()) command or choose the proper folder manually, by clicking the three dots ... icon in the top-right corner of the Files panel.
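The functions mentioned in this section can be combined as in the sketch below. It uses a folder inside R's temporary directory so that it can run on any computer; the folder name class_2_example is made up, and in your own work you would use a path to a real project folder instead.

```r
# Remember where R is currently working, so we can go back afterwards
old_wd <- getwd()

# Create an example folder inside R's temporary directory
# ("class_2_example" is a made-up name used only for illustration)
new_dir <- file.path(tempdir(), "class_2_example")
dir.create(new_dir, showWarnings = FALSE)

# Move into the new folder and confirm the change
setwd(new_dir)
print(getwd())

# List the files in the working directory (nothing is listed for a new folder)
print(list.files())

# Return to the original working directory
setwd(old_wd)
```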

Exercise 2

Create a new folder on your computer and set it as a working directory.

Curiosity
Although it is usually easier to create folders the traditional way, through your operating system, keep in mind that you can also do it in R using the dir.create() function.

“So it begins” - Théoden, 3019 TE

A practical workflow in R

A huge part of preventing mistakes and reducing the amount of frustration in your work is developing good programming habits and having an established order of doing things. Now we will go through some parts of the programming workflow.

Setting up RStudio at the beginning of your working session

One of the advantages of RStudio is that it can save a lot of your work in between coding sessions. However, this can also generate some problems, as it can be easy to forget what you were doing last time. Having consistent setup rules can help with this.

Every time you boot up RStudio, remember to:
- create and save a new script file – if you don’t have one already
- copy and update the name of your script file (e.g. with that day’s date) – if you are working on a previously created file and want to keep track of multiple versions
- set up a working directory – easiest way to keep track of this is to have a line of code at the beginning of the script; if you set up a working directory through the RStudio user interface, you can copy the code from the console into the script
- clear the RStudio environment – this is important, especially when switching between scripts/projects you are working on
- load any packages that the script is using – further explanation of this will be in the section about packages; for now, remember that some functions we will use later on are not present in base R, so if RStudio suddenly doesn’t want to run code with a function that you’ve been using before, it might be because it comes from a package that isn’t currently loaded
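The checklist above could translate into the first lines of a script roughly as follows. This is only a sketch: the path in the commented-out setwd() line is hypothetical, and the stats package (which ships with R) stands in for whatever packages your script actually uses.

```r
# --- Session setup (a sketch; adjust the path and packages to your project) ---

# 1. Set the working directory. The path below is hypothetical, so the line
#    is commented out; replace it with the path to your own project folder.
# setwd("path/to/your/project")

# 2. Clear the environment: remove all objects left over from earlier work
rm(list = ls())

# 3. Load the packages the script needs. "stats" ships with R, so this line
#    always works; replace it with the packages you actually use.
library(stats)

# Check the result: list the objects in the environment and the attached packages
print(ls())
print(search())
```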

Uploading files into RStudio

There are two main ways of uploading files into the RStudio environment – by calling a function in your code, and through the user interface. Which one you pick is up to you. If you want to upload an MS Excel file through the user interface, go to the lower-right panel of the RStudio window and pick the Files tab.

Remember that, by default, the tab will display the content of your working directory. You can pick a file that is not in your working directory; however, it is good practice to keep a copy of the files you’re working on in the folder set as your working directory.

Once you find the file, you can click on it, select Import Dataset, and navigate through the upload process. Through the upload menu, you can set the name of the object that will be created in R, the sheet in the Excel file you want to upload, and the range of data you want to include. RStudio also generates code that you can then copy and include in your R script.

Uploading an MS Excel file to R should create an object called a data frame, which is basically a table with columns and rows. You can check whether the object you created is a proper data frame by using the class() function.
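The check described above can be sketched as follows. The data frame here is built by hand with made-up values and only stands in for imported data. Objects imported from Excel may report more than one class, so is.data.frame() is shown as an additional check.

```r
# A small data frame built by hand (made-up values), standing in for imported data
my_data <- data.frame(
  sample = c("A", "B", "C"),
  score  = c(4.2, 5.1, 3.8)
)

print(class(my_data))          # the type of the object
print(is.data.frame(my_data))  # TRUE if the object is a data frame
print(dim(my_data))            # number of rows and columns
str(my_data)                   # column names and column types
print(head(my_data))           # the first rows of the table
```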

Exercise 3

Download the class_2_data.xlsx file and save it in your working directory. Then, upload the file into R using the RStudio user interface.

Notice that, when you upload your data from an MS Excel file, one of the lines of code that RStudio generated is library(readxl). This is because there is no way in base R to upload .xlsx files – we need an external package, readxl.

Packages

A package is a collection of functions not available in base R. Think about it like a toolbox, or an app for your phone that you need to install for a specific function. Base R packages are installed together with R itself; others you need to find and install yourself.

Why are some R functions not available in base R?

For two reasons. Firstly, if RStudio had to load every single available function each time you opened it, starting R would be very slow, even on a fast laptop. This is why some packages are present from the moment you install R, but are not loaded (= activated) until you ask for them. You can browse the list of installed packages in the Packages tab, in the same lower-right panel as the Files tab.

Secondly, the R programming language is a collective non-profit project – thousands of scientists and programmers have contributed to it since it was created. You do not need all of these packages. Many hyper-specific packages are being introduced every year for very particular research and analysis purposes. This allows people to personalize their R experience and contribute to its development.

Exercise 4

Check the Packages tab (the lower right panel). You will see a list of installed packages; loaded packages have a checkmark (✅). Did any packages load automatically when you started R?

Remember, if you are using someone else’s package in your research, you need to cite it in your articles the same way as you would someone else’s lab protocol. Usually, when a new package is introduced, a manual is published at the same time, and the appropriate citation is provided in the package instructions.

Curiosity
Although R was developed primarily for data science and statistics, it can be used for many different purposes. For example, there are some text-based games for R. Check out these (useless but fun) R packages.

How to use a package?

If the package you want to use is already installed, all you need to do is load it using the library() function. However, if the package is not installed, you will need to install it first. You can do it using a line of code provided by the package creator or the install.packages() function. Remember to put the name of the package in quotation marks (""). Once a package is installed, you still need to load it. An installed package stays on your computer, so you do not need to install it again in every session (although you may need to reinstall packages after upgrading R to a new version).

As we learned in the first class, you can access the documentation of a package or function by typing ?package_or_function_name (for example, ?dplyr). Keep in mind that for functions from packages outside of base R, you can view the manual only after the package has been loaded.

Remember that every time you close the RStudio app, it turns off any packages you were using (= puts away the toolbox). So every time you close and open the app again, you need to run the library() command again. Base R packages do not need to be loaded!
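The difference between an installed package and a loaded one can be checked from code as well as from the Packages tab. The sketch below uses dplyr only as an example name; the installation line is left commented out because it needs an internet connection.

```r
# Packages attached (loaded) in the current session
loaded <- (.packages())
print(loaded)

# Is a given package installed on this computer? ("dplyr" is just an example)
pkg <- "dplyr"
is_installed <- requireNamespace(pkg, quietly = TRUE)
print(is_installed)

if (is_installed) {
  # Load it for this session; this must be repeated after restarting R
  library(pkg, character.only = TRUE)
} else {
  # Installing needs an internet connection, so it is left commented out here
  # install.packages(pkg)
  message("Package '", pkg, "' is not installed yet.")
}
```

In everyday scripts you would normally just write library(dplyr); the character.only = TRUE argument is needed here only because the package name is stored in a variable.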

Exercise 5

Install the ggplot2 package using the install.packages() function. Check the Packages tab - a newly installed package should appear there!

Exercise 6

Install the tidyverse package. Then, see here to learn more about tidyverse. Load the dplyr package.

Advice: The more packages you install, the longer it will take for RStudio to open; it can also slow down the app. For most computers, this difference is pretty small; however, if you have an older laptop, it is better not to install every package under the sun.

Coming back to data upload

For the sake of keeping data clean and files simple and compatible, it can be a good idea to work on more universal, plain-text files than complex .xlsx files:

.csv - comma-separated values - columns separated by commas
.tsv - tab-separated values - columns separated by tab

They can be easily uploaded and downloaded from R. Additionally, .csv files are much lighter in “weight”, which can be crucial for working with very large datasets, e.g. output of genetic sequencers, continuous trackers of environmental conditions, or results of laboratory tests.

For Polish speakers: Using Polish versions of some software (e.g., Excel) with default settings of the operating system may generate problems with .csv files, as the comma is used as the decimal delimiter. Excel (in Polish) by default creates a .csv file with columns separated by semicolon (;). Then, arguments of importing R functions need to be properly adjusted.
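As a sketch of how the import arguments can be adjusted for such files, the example below first writes a tiny semicolon-separated file with decimal commas (made-up values, saved in a temporary location) and then reads it in two equivalent ways.

```r
# Write a small example file in the style described above:
# columns separated by semicolons, decimals written with commas
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample;score", "A;4,2", "B;5,1"), csv_path)

# Option 1: read.csv2() assumes sep = ";" and dec = "," by default
data_a <- read.csv2(csv_path)

# Option 2: read.csv() with the separator and decimal mark set explicitly
data_b <- read.csv(csv_path, sep = ";", dec = ",")

print(data_a)
print(class(data_a$score))   # check that the scores were read as numbers
```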

Exercise 7

In Excel, save the “class_2_data.xlsx” file as a “class_2_data.csv” file. Remember to set it as a comma-separated file.

Uploading a .csv file into R does not require any additional packages. We can do it using the read.csv() or read.table() function. read.table() takes one main argument – the name of the file (including the file extension!), written in quotation marks. Remember that if this file is not in your working directory, you will need to provide the path to it (either the full path or a path relative to the working directory).

Additionally, we need to provide extra arguments to make sure the file is read correctly. These arguments are, for example:

header: specifies whether the top row should be considered the headers of each column; should be set as TRUE if the top row corresponds to column descriptors and to FALSE if the top row starts immediately from data
sep: stands for separator; specifies which symbol (for example, a comma, a semicolon, or a tab) was used to separate values in the file; by default, .csv files should be comma (,) separated, but might be set to something else – you can check by opening the .csv file in a basic text editor
encoding: specifies which encoding is used in the .csv file; can be important if you are using files with non-Latin characters, for example, but isn’t always needed

If you are trying to upload a file with non-Latin characters and R is displaying them as nonsense symbols, you can try the following: open your .csv as a text document in the most basic text editor and save it with the UTF-8 encoding; then, while uploading it to R, make sure to specify encoding as UTF-8; this should fix the problem.

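The read.csv() function can work with just one argument – the file name. As the name suggests, it is designed for standard .csv files: it works like read.table(), but with defaults already set for this format (header = TRUE and sep = ","). For files with a different layout, you need to change these arguments yourself or use read.table().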
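To make the comparison concrete, here is a minimal sketch that reads the same file with both functions. The file is a small made-up example written to a temporary location, not the class data set.

```r
# Create a small comma-separated file to read (made-up values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample,score", "A,4.2", "B,5.1", "C,3.8"), csv_path)

# read.table() needs to be told about the header row and the separator
data_table <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv() already assumes a header row and comma-separated columns
data_csv <- read.csv(csv_path)

print(data_table)

# Check that both approaches produced the same data frame
print(all.equal(data_table, data_csv))
```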

Exercise 8

Check the read.table() and read.csv() function manuals. Upload the “class_2_data.csv” file into the R environment using both functions. Work out which additional arguments each function needs in order to read the file properly.

In most of your analyses, you will not need to download a modified data frame out of R – you will only be interested in the results of certain tests or graphs that R will generate. However, if you want to download the data frame, you can use:
- write.table() – for saving a .csv file; uses similar arguments to read.table()
- write_xlsx() – requires the writexl package to be installed and loaded
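A possible way of saving a data frame as a .csv file with write.table() is sketched below. The data and the file name example_output.csv are made up, and the arguments shown are one reasonable choice rather than the only correct one; the file is read back in at the end to check the result.

```r
# A small data frame to save (made-up values)
results <- data.frame(sample = c("A", "B"), score = c(4.2, 5.1))

# Save it in R's temporary directory ("example_output.csv" is a made-up name)
out_path <- file.path(tempdir(), "example_output.csv")

# sep = ","          separates columns with commas
# row.names = FALSE  leaves out the row numbers that R would otherwise add
write.table(results, out_path, sep = ",", row.names = FALSE)

# Read the file back to check that it was saved correctly
check <- read.csv(out_path)
print(check)
```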

Searching for functions and code on the internet

The R community is constantly growing, and new tools appear all the time. If you need something done in R, chances are, someone has made a package or a function for it.

Tips for finding R solutions on the Internet:

improve your Google searches - identify keywords that will allow you to find the function you need and combine them using quotation marks and Boolean logic (e.g., “R package” AND “general linear model”)
use trusted sources for finding packages and getting advice – StackOverflow, RDocumentation, CRAN, and published research are the best
use published codes - many coders and researchers will share their code for free on GitHub and other websites – even if the code cannot solve your problem, it can point you in the right direction
be careful with AI-generated solutions: while tools like ChatGPT can be useful for finding specific functions, they rarely generate a whole block of code that does not require heavy editing; if the code uses functions you don’t know, you might not notice that it produces faulty results

Exercise 9

Search the Internet for an R function that can rename columns in a data frame. Install and load a package if necessary. Try using the RStudio manual to figure out how to use the function that can perform this task.

Remember that the manual for any function from an installed and loaded package can be pulled up by typing a question mark (?) before its name.

If you are attempting to use someone else’s code and you need to adapt it for your purposes, consider the following:

before changing anything in the code you copied, go through the entire script and make sure you understand, at least on a basic level, what every line is doing
remember to change the parts of the code that do not apply to your tasks: e.g., change the path to files being used, adjust function arguments and parameters, and remove parts that are not important for you
another coder might assume that everyone uses the same packages as they do – if a function is not working, it can be a result of R not recognizing it
you can find what package uses a specific function by searching “R [function name] package” – keep in mind that multiple packages can use the same word to describe different functions, so make sure to read their descriptions before downloading the package
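For functions from packages that are already loaded, R itself can tell you where a function comes from. The sketch below uses median() as an example; note that find() only searches packages attached in the current session, so for packages you have not installed yet an internet search is still needed.

```r
# Which attached package provides median()?
print(find("median"))

# The same information taken from the function itself
print(environmentName(environment(median)))

# Writing package::function makes it explicit which package is meant,
# which helps when two packages use the same function name
print(stats::median(c(3, 1, 2)))
```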

Exercise 10

Find a script online that uses R to calculate time differences between two dates. Try adjusting it to calculate how many days are left in this year.

Reading warning and error messages.

You will likely encounter warning messages and error messages daily while using R. This does not mean you are careless or bad at coding – errors and warnings exist to guide you in the process of using R. Because computers are very particular about details and can be unintuitive for us, it takes effort to understand them. Knowing how to react to errors is a big component of an effective coding workflow.

Whenever RStudio finds an issue in the code, it will display a message in the console. When in doubt, you can always copy the message and paste it into Google search.

Warnings vs Errors

Warning messages are alerts from R about things that it considers potentially problematic, but that do not prevent the code from running. This can happen when a calculation produces undefined values, when R has to change your data in order to complete an operation, or when there are conflicts between certain functions.

Some common messages of this kind are:

- NAs introduced by coercion – a warning that often appears when converting between data types, e.g., character to numeric; if R cannot convert a value, it will insert an NA value instead, and warn you about this change
- unused argument – appears when you give a function an argument that it does not accept (often a misspelled argument name); note that R reports this one as an error, so the line does not run
- non-numeric argument to binary operator – most likely means you are trying to perform a mathematical operation (addition, multiplication, etc.) on something that is not a number; remember that something can look like a number in a vector or a data frame but still be stored as a character; R reports this one as an error as well

Some warning messages can be safely ignored, while some indicate potential issues in your code. When in doubt, always double-check on the internet.
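The sketch below shows a typical warning in action, using made-up values: the conversion produces a warning, but the code keeps running and the later lines are still executed.

```r
# Converting text to numbers: "three" cannot be converted, so R inserts NA
# and issues the warning "NAs introduced by coercion"
values <- c("1", "2", "three")
converted <- as.numeric(values)
print(converted)

# The code keeps running after a warning, so later lines still execute
total <- sum(converted, na.rm = TRUE)   # na.rm = TRUE skips the NA
print(total)
```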

Exercise 11

Look at these lines of code. What kind of messages do you think they will produce? Run each example separately, then identify which warning (or error) occurs and why. If you know how to fix the issue, edit the code.

Example One:
numbers <- c(10, -5, 0)
log(numbers)
Example Two:
log(100, by = 2)
Example Three:
vec <- c(1, 2, "three")
vec * 2

Error messages are issues in the code that actually prevent the code from running. If you are attempting to run a whole script, the error message will show up on the console immediately after the line that caused the issue, and the code will not run past it. If you are unsure exactly what function is causing the issue, you can run the code line by line until it stops working.

Note: a message with the same content can be either a warning or an error. The difference is in how it affects the code – if the operation was still executed, it is a warning. If the operation failed, it is an error.

Some common causes for errors in R can be:

Error in [code]: object not found - you are trying to operate on an object that does not exist; this can happen because your file did not load into R (e.g. because the working directory is set incorrectly), because of a typo in the object’s name (remember that names in R are case-sensitive!) or because the object was deleted from the environment
Error: unexpected ‘[symbol]’ in “[code]” – there is some sort of syntax issue, e.g., a typo in the function, a misplaced comma, or missing parentheses
Error in [function()]: could not find function “function()” – either the function name is misspelled or you are trying to use a function from a package that isn’t installed or loaded; can also happen with old code when the name of a function updates – in this case, RStudio will sometimes suggest the new name, otherwise it can be found on the internet
Error in [data frame]: arguments imply differing number of rows – you are trying to create a data frame using vectors of different lengths or combine data frames of different sizes
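Some of the errors listed above can be reproduced safely with the sketch below. It relies on tryCatch(), which is not covered in this class and is used here only so that the script can continue after each error; get_error() is a hypothetical helper written for this example, and the object and function names are deliberately made up so that they do not exist.

```r
# Helper (hypothetical name) that runs an expression and returns the error
# message as text instead of stopping the script
get_error <- function(expr) {
  tryCatch(
    {
      expr
      "no error"
    },
    error = function(e) conditionMessage(e)
  )
}

# An object that was never created
msg_object <- get_error(mean(my_missing_numbers))

# A function that does not exist (or whose package is not loaded)
msg_function <- get_error(my_missing_function(1))

# Arithmetic on text that only looks like a number
msg_operator <- get_error("10" + 1)

print(msg_object)
print(msg_function)
print(msg_operator)
```

Running the same three expressions without get_error() would stop the script at the first of them, which is the behaviour described in this section.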

The order of operations when encountering an error:

Do not panic
Identify what line of code is causing the error
Identify what kind of error it is
Check for the most obvious issues as outlined above

If all of the above fails, you can try looking up your specific situation – chances are, someone else had the same issue and posted it on Reddit or StackOverflow, where it has already been explained how to fix it.

Exercise 12

Look at these lines of code. What kind of errors do you think they contain? Run the lines of code separately, then identify what error is occurring and why. If you know how to fix the issue, edit the code.

Example One:
median(my_numbers)
Example Two:
x <- c(1, 2, 3; 4)
Example Three:
describe(class_2_data)

Example Four:
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30, 35))

Keeping your code clean and tidy

As you can see, coding can be meticulous and sometimes frustrating work. To reduce this frustration, remember the following:

stick to habits of setting up your working session every time you open RStudio
if you are working on a script for multiple days, consider making a new copy of it every day or every week to keep previous versions in hand in case you need to revert the changes
include comments in your code! remember that you can use the hashtag symbol # to both turn off certain parts of the code and leave useful descriptions – this can be a lifesaver when using someone else’s code or even coming back to work you were doing weeks or months ago
make it a habit to go back through your scripts and simplify anything you can - the shorter the script, the less that can go wrong
give your objects short, unique, and self-describing names - things can quickly get out of hand if all of your data frames are called “dataframe”
make use of packages such as dplyr to make your files tidier – we will learn more about it during the next lesson

Homework

1. Create a new script file and save it as “class_2_homework_Your_Name.R”. Create a separate folder on your computer and set that folder as your working directory. Include the code for setting up that directory at the beginning of your script.
2. Upload the homework_data.xlsx file into RStudio using your preferred method. Include the code for uploading the file into the script. Remember that some upload methods require packages to be loaded first.
3. Load the dplyr package. Include the loading of the package as a line of code in your script.
4. Find a dplyr function that can sort a data frame based on the values of the given column of a data frame. Use it to sort the homework data frame according to the values in the column score from lowest to highest.
5. Save your result as a .csv file using the write.table() function. Call the .csv file “class_2_sorted_column_Your_Name.csv”. Check the write.table() function manual to find out what arguments you should use to save a .csv file properly. The file should have column names, and fields should be separated by commas (,).

Upload both your R script and .csv file to the “Class 2” tab on the Pegaz platform.

Mateusz Chechetkin, Tomasz Gaczorek, Wiesław Babik & Marzena Marszałek
marzena.marszalek@doctoral.uj.edu.pl

2025-09-30
