It is very helpful to produce an R Notebook when using R to analyze data. R Notebooks help journalists achieve two aims:

  1. Describe, on conceptual and logical levels, what they’re trying to do and follow it up with the code they used to accomplish that task. (This is really helpful for jogging their memory in the days/weeks after they write their code.)

  2. Produce reproducible code. This means that others can dissect how journalists did something and replicate their analyses, because those analyses can be made open for all to see. Transparency is becoming increasingly important within journalism, and R Notebooks offer a convenient way to be transparent. (Some news organizations will even upload the source materials behind big projects to GitHub.)

R Notebooks are an alternative to R Scripts. They effectively do the same things but in different ways. R Notebooks are designed to produce a clean and clear document at the end, while R Scripts are better for quick programming.

Installing R and RStudio

To create an R Notebook (and to make R much more pleasant to use), we’ll want to load RStudio, which in turn requires R. Briefly, “R” is the programming language we’ll use for loading data, reshaping objects, and performing our analyses. “RStudio” is an integrated development environment (IDE) that makes working in R more convenient.

At the time of writing, you have two options for doing this: installing the software on your computer or using your web browser to execute it remotely.

Local installation

There are many advantages to installing R locally. Two of the main ones are that (1) you can store everything securely on your local computer and (2) it tends to run faster than the current remote options.

If you want a local installation, start by downloading the latest version of R for your system. If you’re on Windows, click on “base” and select “Download R [version] for Windows”. If you’re on OSX, select “R-[version].pkg”. (You may need an older version if your version of OSX is 4+ years old.)

Then, download the latest version of RStudio for your operating system. The free, “Open Source License” version of RStudio is perfectly suitable for our purposes, so select that.

Once you have both files downloaded, use the two installer files to install them. For best results, install R before RStudio (after all, the latter is the one that depends on the former) and keep the default installation options as they are.
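
Once both installers finish, one quick way to confirm that everything works is to open RStudio and type a couple of commands into the Console pane. The sketch below is only a sanity check and is not part of the installation steps above. It uses functions that ship with base R, and the object names (r_version, r_number) are made up for illustration. The version text you see will depend on what you installed.

```r
# Report the full version string of the R installation RStudio is using
r_version <- R.version.string
print(r_version)

# The same information as a bare version number (e.g., major.minor.patch)
r_number <- as.character(getRversion())
print(r_number)
```

If both commands print a version instead of an error, R and RStudio are talking to each other.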

Remote/Browser option

RStudio recently made available beta access to its RStudio Cloud product. It is free for anyone to use at the time of writing, though they have expressed an intent to monetize it in the future—perhaps as a freemium product once it is stable.

RStudio Cloud is nifty because you can access it anytime, on any computer with an internet connection. Basically, you can switch computers and have access to the same environment. The main downsides are that (1) it is considerably slower than a local installation at the time of writing, (2) it has certain restrictions that may limit the size of the datasets you can work on, and (3) some packages might not work as expected. However, it has served some of my past students well.

Loading RStudio

The local and remote flavors of RStudio will look and work very similarly. When you load RStudio, you’ll be presented with an interface that has four major panes.

We’ll slowly cover the different features in this powerful program. However, here’s a handy “cheatsheet” for the different options available in this main interface of RStudio.

Creating an R Notebook

To create an R Notebook, simply load RStudio, click on File, New File, and R Notebook. RStudio may ask you to install some packages the first time you create an R Notebook. Select ‘yes’ to install them.
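
If you are curious whether the packages behind R Notebooks are already installed, you can check from the Console. This is a minimal sketch, and it rests on an assumption: that rmarkdown and knitr are among the packages RStudio offers to install. The prompt on your machine may list others. The object names are made up for illustration.

```r
# Packages assumed (not confirmed by the RStudio prompt) to be needed for notebooks
needed <- c("rmarkdown", "knitr")

# requireNamespace() returns TRUE when a package is installed and can be loaded
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of any packages that are still missing
missing_pkgs <- needed[!installed]

# Uncomment to install whatever is missing (requires an internet connection)
# install.packages(missing_pkgs)
```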

Notebook heading

At the start of the default R Notebook, you’ll see the following:

---
title: "R Notebook"
output: html_notebook
---

Everything appearing between the two lines of three dashes at the top of the file is treated as heading/metadata information (specifically, the “YAML header”). This block of text allows you to fine-tune the output of the document by specifying different key:value pairs that describe how the output file should be built from your R Notebook file.

For instance, title sets the title of the notebook, and output sets the format of the file that will be produced when we “knit” the notebook. We can also add keys like author and date to specify those details. Here’s a full list of YAML header options we can use in our R Notebook.

For example, this Notebook includes the following YAML header options:

---
title: "Creating an R Notebook"
author: "Rodrigo Zamith"
output:
  html_document:
    df_print: paged
    theme: spacelab
---
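
As a further sketch, a header can also carry a date and turn on a table of contents. The author and date values below are placeholders, and the toc line is an addition for illustration; it is not part of this Notebook’s actual header.

```yaml
---
title: "Creating an R Notebook"
author: "Your Name"       # placeholder
date: "January 1, 2024"   # placeholder
output:
  html_document:
    df_print: paged
    theme: spacelab
    toc: true             # illustrative addition: builds a table of contents from the headings
---
```

Note that indentation matters in YAML: the options that belong to html_document must be indented beneath it.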

Notebook styling

We can style the content of our R Notebook by using R Markdown syntax below the three dashes enclosing the YAML header. For a quick reference styling guide, click on Help and select Markdown Quick Reference. The nice thing about this setup is that you’ll be producing simple text files, which are small, easy to share, and platform independent.

You can surround text with backticks (`, which are also called grave accent marks) to delineate code (like this: `code`). You can use a single asterisk (*) to specify italics (*italics*) and two asterisks to specify bolding (**bolding**).

To create a heading, use pound signs (#) at the start of a new line (e.g., # Heading 1). You can create lower-order headings by adding more pound signs (e.g., ## Heading 2). The heading at the start of this section uses three pound signs (### Creating an R Notebook) while the heading of this subsection includes four (#### Notebook styling).

You can also create links by using the following syntax: [link text](URL).

There are many other styling options you can use by referencing the R Markdown Guide.
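
Putting the pieces above together, a short stretch of R Markdown might look like the sketch below. The wording is made up for illustration, and https://example.com is a placeholder URL.

```markdown
# Heading 1

## Heading 2

This sentence has *italics*, **bolding**, and inline `code`.

Here is a link: [link text](https://example.com)
```

Leave a blank line between paragraphs and around headings so that each one renders as its own block.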

Executing code

To execute code in an R Notebook, you’ll want to insert chunks into your Notebook. To insert a chunk, press Cmd+Option+I. (Note: If you use Windows or Linux, replace Cmd with Ctrl and Option with Alt for the remainder of the course. So, in this example, you’d press Ctrl+Alt+I.)

After you do that, you should see a chunk appear that looks like this:

```{r}
 
```

We’ll first enter any code we want to execute into the space in the middle. Then, we can execute it by pressing Cmd+Shift+Enter. The result of the operation will appear immediately below the code chunk in our Notebook.

Let’s try to create and execute our first code chunk. I’d like to know the result of the following arithmetic operation: 2+2.

2+2
## [1] 4

Behold, a line appears underneath the code chunk showing the output R gives us: the result of our operation is 4!
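
As a further sketch, the code below stores each result in an object and shows that parentheses change the order of operations. You would type it into the space in the middle of a chunk. The object names (no_parens, with_parens) are made up for illustration, and the lines starting with ## show the output you should see.

```r
# Without parentheses, multiplication happens before addition
no_parens <- 4 + 2 * 7
no_parens
## [1] 18

# Parentheses force the addition to happen first
with_parens <- (4 + 2) * 7
with_parens
## [1] 42
```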

Knitting your notebook

While we can execute code chunks within our editor and see the results in real time, we’ll want to use the Knit functionality in RStudio to check for markdown formatting and to produce a document that can be shared or uploaded elsewhere.

To knit your notebook, select the dropdown arrow next to Knit and select Knit to HTML. A new window should pop up that will show you a (hopefully) pretty document.

Note: There is also a similar Preview button that will seem to do the same thing. Preview shows you a rendered HTML copy of the contents of the editor. Unlike knitting, previewing does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed. Use previewing if you just want to quickly check markdown formatting; use knitting to create a file that can be shared and to make sure your code works independently from your environment.

Do note that when you knit an R Notebook file, it will execute everything in your Notebook from scratch. Thus, if you’ve loaded a package or created an object outside of the Notebook, it won’t be considered part of the document—which may result in error messages. So make sure you work within the Notebook for best results.
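
To make that concrete, here is a minimal sketch of chunk code that knits cleanly because it loads its package and creates its object inside the Notebook itself. The values and object names are hypothetical, and the stats package (which ships with R) is only a stand-in for whatever package you actually need.

```r
# Load packages inside the notebook so knitting can find them
library(stats)

# Create the data inside the notebook too (hypothetical values)
story_counts <- c(2, 4, 6)

# Use the object later in the same notebook
middle_count <- median(story_counts)
middle_count
## [1] 4
```

Had story_counts been created only in the Console, knitting would stop with an error saying the object could not be found.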

When it’s time for you to share your document, just upload or e-mail the .html file that will appear in your project directory after you knit your Notebook. You can also knit your documents into PDF and Microsoft Word files, though I’ve found HTML files to be the most versatile.
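
If you prefer typing to clicking, the rmarkdown package offers a render() function that does roughly what the Knit button does. The sketch below is hedged in two ways: the file name is hypothetical, and the code only attempts to render when both the package and the file are present. Word output uses the word_document format and PDF output uses pdf_document, though PDF also requires a LaTeX installation.

```r
# Hypothetical file name; replace it with the path to your own notebook
notebook_file <- "my_first_notebook.Rmd"

# Output formats provided by the rmarkdown package
formats <- c("html_document", "word_document", "pdf_document")

# Only attempt to render when the package and the file are both available
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(notebook_file)) {
  # Produce the HTML version; swap in formats[2] or formats[3] for Word or PDF
  rmarkdown::render(notebook_file, output_format = formats[1])
}
```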

Putting it into practice

Now that you’ve learned how to create an R Notebook, try to create an exact duplicate of the Notebook below.


My First R Notebook

Your Name

Hello Prof. Zamith,

I’m so happy to be taking this class because I love the idea of data-driven storytelling. I’ve spent the past six months anxiously awaiting the moment I’d produce my first R Notebook.

I mean, how cool is this?

I just really wanted to demonstrate some of my R Markdown and math skills.

Headings are cool

I expect you’ll give me questions to answer over the course of the semester, and it would probably be nice for me to organize my notebooks around those questions. A good way to do that, I think, is to create a separate top-level heading for each question.

Moreover, the Data Challenges will probably ask me to describe my thinking in addition to listing my code. Some questions will surely require multiple steps to be completed before the question can be fully answered. I think second- and even third-level headings will be really useful for organizing things!

Question 1

Objective/Rationale

It would be helpful for me to describe my rationale prior to writing any code. That’s because conceptually describing what I want to accomplish can help me organize my thoughts and make the information “stick” better. I can write that in plain text here.

Code

I’ll also probably be asked to write some code in R to reshape my data, perform calculations, and even compute some new variables. Here’s a simple code chunk to show you that I’m ready to take on the challenge!

The following code chunk will perform the following mathematical operation: I’ll add two to four and multiply the result by seven.

(4+2)*7
## [1] 42

Answer

The result of that operation is 42. Coincidentally, that is the answer to everything.

Summary

With this notebook, I’ve shown you that I can:

  1. Organize my notebook with section headers and paragraph breaks.
  2. Format text by bolding and italicizing, and using links to point to external resources.
  3. Execute R code, showing both the answer and how I arrived at it.
  4. Produce an ordered list.

Again, I’m so excited to be taking this course and I promise to give it my best!