Summary and Schedule

Welcome to R! Working with a programming language (especially if it’s your first time) often feels intimidating, but the rewards outweigh any frustrations. An important secret of coding is that even experienced programmers find it difficult and frustrating at times – so if even the best feel that way, why let intimidation stop you? Given time and practice*, you will soon find it easier and easier to accomplish what you want.

Why learn to code? Bioinformatics – like biology – is messy. Different organisms, different systems, and different conditions all behave differently. Experiments at the bench require a variety of approaches – from tested protocols to trial and error. Bioinformatics is also an experimental science; otherwise, we could use the same software and the same parameters for every genome assembly. Learning to code opens up the full possibilities of computing, especially given that most bioinformatics tools exist only at the command line. Think of it this way: if you could only do molecular biology using a kit, you could probably accomplish a fair amount. However, if you don’t understand the biochemistry of the kit, how would you troubleshoot? How would you do experiments for which there are no kits?

R is one of the most widely used and powerful programming languages in bioinformatics. R especially shines where a variety of statistical tools are required (e.g. RNA-Seq, population genomics) and in the generation of publication-quality graphs and figures. Rather than get into an R vs. Python debate (both are useful), keep in mind that many of the concepts you will learn apply to Python and other programming languages.

Finally, we won’t lie; R is not the easiest-to-learn programming language ever created. So, don’t get discouraged! The truth is that even with the modest amount of R we will cover today, you can start using some sophisticated R software packages, and have a general sense of how to interpret an R script. Get through these lessons, and you are on your way to being an accomplished R user!

* We very intentionally used the word practice. One of the other “secrets” of programming is that you can only learn so much by reading about it. Do the exercises in class, re-do them on your own, and then work on your own problems.

Prerequisites

  • Experimenter’s Mindset: We define the “Experimenter’s mindset” as an approach to bioinformatics that treats it like any other experiment. There are probably a variety of metaphors we could employ (data are our reagents, scripts are our protocols, etc.), but the most important idea of the mindset is to remind you that, as a researcher, you need to apply all of your training at the bench or in the field to your analyses. Evaluate results critically, and don’t expect that things will always work the first time, or that they will always work in the same way.

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

R Genomics Workshop Setup Directions


Logging into Rivanna/Afton


To log into the Rivanna or Afton HPC cluster, start by visiting the HPC Login Instructions page:
https://www.rc.virginia.edu/userinfo/hpc/login/

You can follow the instructions for either Web-based Access or Secure Shell Access (SSH).

Web-based Access

For web-based access, scroll down to the Web-based Access section and click “Launch Open OnDemand”. Log in using your UVA credentials (uva_compute_id and NetBadge password). You now have point-and-click access to your account on UVA’s HPC! Next, click “>_ Open in Terminal” in the upper left of the screen. This opens a UNIX terminal in your web browser.

Secure Shell Access (SSH)

UVA HPC is accessible through ssh (Secure Shell) connections. Follow the instructions for Secure Shell Access (SSH) based on your operating system:

  • Windows: Install the recommended SSH client, MobaXterm, by clicking the Install MobaXterm button under the Windows section of the page and following the instructions.
  • macOS: Open the Terminal application, located in the Utilities folder.
    • On macOS Mojave and earlier, the default shell is Bash.
    • On macOS Catalina and later, the default shell is Zsh. To temporarily switch to Bash, open Terminal, type bash, and press Return.
  • Linux: Most Linux systems use Bash as the default shell. Open a terminal using the Applications menu or search bar (look for Terminal, Konsole, or xterm). If your system defaults to another shell, you can switch by typing bash and pressing Return.
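If you are unsure which shell your terminal is running, the short check below may help. It is a minimal sketch rather than part of the official instructions: it relies on the SHELL variable, which normally holds the login shell for your account, and on BASH_VERSION, which Bash sets for itself and other shells generally leave unset.

```shell
# $SHELL normally holds the login shell set for your account, e.g. /bin/zsh.
echo "Login shell: ${SHELL:-unknown}"

# BASH_VERSION is set by Bash itself, so a non-empty value means
# the shell you are typing into right now is Bash.
if [ -n "${BASH_VERSION:-}" ]; then
    echo "Currently running Bash ${BASH_VERSION}"
else
    echo "Not running Bash; type bash and press Return to switch"
fi
```

Note that the login shell and the running shell can differ, for example right after you type bash inside a Zsh terminal.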

Once you have a terminal open, connect to the HPC cluster by typing:

BASH

ssh -Y uva_compute_id@login.hpc.virginia.edu

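Replace uva_compute_id with your UVA computing ID. You will then be prompted to enter your UVA NetBadge password. The -Y option enables trusted X11 forwarding, so that graphical programs started on the cluster can open windows on your computer (this requires an X server on your machine).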

Workspace for This Lesson

First, make sure you are in your home directory. In the terminal, type:

BASH

cd

and press Return. (The cd command changes your location. With no arguments, it takes you back to your home directory.)

Next, create a new directory called day2 where you will do your work:

BASH

mkdir day2

and press Return. (The mkdir command makes a new folder named day2.)

Move into this directory:

BASH

cd day2

and press Return. (Now you are “inside” the day2 folder, and any files you create will be saved here.)

To confirm you are in the correct location, type:

BASH

pwd

and press Return. (The pwd command prints your current working directory, i.e., the folder you are in.)

The output should look like:

/home/your_computing_id/day2

with your own computing ID in place of your_computing_id.
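The workspace steps above can also be run as one sequence. This is a sketch with one addition that is not in the steps themselves: the -p option to mkdir, which makes the command safe to repeat because it does nothing (and reports no error) if day2 already exists.

```shell
# Go to the home directory (cd with no arguments goes home).
cd

# Create day2 if it is missing; -p suppresses the error when it already exists.
mkdir -p day2

# Move into the new directory.
cd day2

# Print the current location; the path should end in /day2.
pwd
```

If you come back to the lesson later, re-running these lines returns you to the same folder without disturbing any files already in it.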

Data for This Lesson

The data for today’s lesson is stored in /standard/bims6000/r_day2.

Let’s copy the data file, combined_tidy_vcf.csv, into your working directory so you have your own copy. Run:

BASH

cp /standard/bims6000/r_day2/combined_tidy_vcf.csv .

and press Return. The cp command copies files. Here, it copies the file combined_tidy_vcf.csv from the shared class folder into your current directory, day2. The . means “put it here.”
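After copying, it can be worth confirming that the file actually arrived. The sketch below assumes you are still inside day2; check_copy is a hypothetical helper name introduced only for this illustration, not a command from the lesson or the cluster.

```shell
# Hypothetical helper: report whether a file exists and is non-empty,
# then show its size and its first few lines.
check_copy() {
    local file="$1"
    if [ -s "$file" ]; then
        ls -lh "$file"       # size and modification time
        head -n 3 "$file"    # first three lines of the file
        return 0
    else
        echo "Missing or empty: $file" >&2
        return 1
    fi
}

# Check the lesson's data file; print a hint if it is not there.
check_copy combined_tidy_vcf.csv || echo "Re-run the cp command from inside day2."
```

The -s test is used instead of a plain existence check so that an interrupted copy that left an empty file is also reported.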