4 Building parameterized reports
In the previous sections of the guide, you’ve learned how to start using R, how to import and clean data, and how to analyze and visualize data. You’ve written R scripts and generated outputs, that you can of course use to build a report using whatever software you would like.
This section, however, will introduce you to an open source publishing system that works with R to easily create reports using R code in a variety of formats.
4.1 Installing Quarto
Quarto is built-in to R Studio, so there’s likely nothing extra that you have to do. For Visual Studio Code, you’ll need to install the Quarto extension and Quarto itself from https://quarto.org/docs/get-started/. When you begin a Quarto project that includes .pdf
output, you’ll be prompted to install TinyTex by using the OS Terminal command quarto install tinytex
.
Using Quarto will require the R packages rmarkdown
and knitr
.
4.2 Creating a Quarto document
Quarto documents use the .qmd
extension, since Quarto makes extensive use of the markdown
syntax. Markdown is a relatively simple coding scheme that is designed to be both human- and machine-readable.
You can create a Quarto document simply by creating a new file with the extension .qmd
. However, both R Studio and Visual Studio Code include menus to utilize templates that give you a great place to start. Go ahead and start a new Quarto project.
Quarto document header
The first section of a Quarto document is the header. The header is comprised of a special series of tags that define how the document should be rendered, in a format called YAML1. A very simple such example is to simply define the document’s title:
---
title: "New Quarto Project"
---
The Quarto authoring tutorial introduces you to additional options that you can define in the YAML header.
Basic markdown
Next, we can begin to write out the explanatory text in our document. It is of course helpful to break our document into sections. It is simple to arrange cascading sections by using the #
symbol. Using two such symbols consecutively, ##
, defines a top-level section header, while additional ones (###
, ####
) define additional levels.
Within regular text you can add links in different ways:
<https://oairp.org/>
renders as https://oairp.org[the OAIRP website](https://oairp.org/)
renders as the OAIRP website
As well as emphasize text:
*italics*
renders as italics**bold**
renders as bold
We can use *
, +
, or -
to create lists – they just have to have a blank line between the list and any text above.
The Quarto documentation has a great overview of the basics of Markdown.
Let’s use some of these to continue to build out our .qmd
file by adding the following after the header:
---
title: "New Quarto Project"
---
## Quarto
Quarto is a fantastic way to get started writing reports
without spending **a ton of time** messing with things like:
* font sizes
* styles
* spacing
<https://quarto.org>. Quarto has excellent documentation at
Code blocks
Finally, we’ll get back to using R code to run some simple calculations and include the result. We do this by adding a code block. A code block opens and closes with the ```
sequence. Using curly brackets after the code block opening sequence ({}
), we define the language that the code block is running: for R, this is {R}
.2 Inside, we’ll include the code that will produce the output we’d like to show:
---
title: "New Quarto Project"
---
## Quarto
Quarto is a fantastic way to get started writing reports
without spending **a ton of time** messing with things like:
* font sizes
* styles
* spacing
<https://quarto.org>.
Quarto has excellent documentation at
```{R}
<- 1 + 1
calc print(calc)
```
Test this code yourself. You should see it render like it does in Appendix A.
By default, the code block prints both the code and the result. We can add special execution options at the start of the code block to define how it should be processed. Those options start with a special preface #|
. Adding the echo: false
execution option will suppress the code and only show the output:
```{R}
#| echo: false
<- 1 + 1
calc print(calc)
```
You can review additional Execution Options in the Quarto guide.
4.3 In Practice: A full report build
Now let’s build a full report, working with the data we’ve cleaned and analyzed in Section 2.4 and Chapter 3, projections_data
. If we pull the code together, we should have:
```{R}
# load libraries
library("readxl")
library("curl")
library("tidyverse")
# define file url
<- "https://ohiolmi.com/_docs/PROJ/JobsOhio/Central.xlsx"
projections_url
# download file
curl_download(projections_url, "projectionsdata.xlsx")
# read file with approrpiate settings
<- read_xlsx(
projections_data "projectionsdata.xlsx",
skip = 6,
col_names = c(
"SOC",
"Occupation",
"Current_jobs",
"Projected_jobs",
"Change_num",
"Change_pct",
"Growth",
"Exits",
"Transfers",
"Tot_openings",
"Wage",
"Notes"
),col_types = c(
"text",
"text",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"text"
)|>
)
# remove summary occupations and notes/blank rows
filter(
str_sub(SOC, -4, -1) != "0000",
str_length(SOC) == 7
|>
)
# create annual wage column so values are consistent
mutate(
annual_wage = case_when(
< 200 ~ Wage * 2080,
Wage .default = Wage
)|>
)
# create turnover rate variable
mutate(
turnover_rate = (Exits + Transfers) / Current_jobs
)
# run linear regression model to predict turnover rate using annual wage
<- lm(turnover_rate ~ log(annual_wage),
projections_model data = projections_data,
na.action = na.exclude)
# add the residuals back to projections_data
<- projections_data |>
projections_data mutate(residuals = residuals(projections_model))
```
Starting a new report
Now we’re going to start a new Quarto file (.qmd
). We’ll start with the header but add a title that describes our report, then add a short introduction to the report describing the data source we’re using, and then add the code from above with the execution option include: false
so that it is only used to prepare the R objects we’ll use.
---
title: "Report: The largest, fastest-growing, and lowest-turnover occupations in Central Ohio"
---
Source: ODJFS Bureau of Labor Market Information, Long-Term Occupational Projections
for the Central Ohio JobsOhio region, 2020-2030. <https://ohiolmi.com/Home/Projections/ProjectionsHome_08.16.23>
```{R}
#| include: false
# load libraries
library("readxl")
library("curl")
library("tidyverse")
# define file url
<- "https://ohiolmi.com/_docs/PROJ/JobsOhio/Central.xlsx"
projections_url
# download file
curl_download(projections_url, "projectionsdata.xlsx")
# read file with approrpiate settings
<- read_xlsx(
projections_data "projectionsdata.xlsx",
skip = 6,
col_names = c(
"SOC",
"Occupation",
"Current_jobs",
"Projected_jobs",
"Change_num",
"Change_pct",
"Growth",
"Exits",
"Transfers",
"Tot_openings",
"Wage",
"Notes"
),col_types = c(
"text",
"text",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"text"
)|>
)
# remove summary occupations and notes/blank rows
filter(
str_sub(SOC, -4, -1) != "0000",
str_length(SOC) == 7
|>
)
# create annual wage column so values are consistent
mutate(
annual_wage = case_when(
< 200 ~ Wage * 2080,
Wage .default = Wage
)|>
)
# create turnover rate variable
mutate(
turnover_rate = (Exits + Transfers) / Current_jobs
)
# run linear regression model to predict turnover rate using annual wage
<- lm(turnover_rate ~ log(annual_wage),
projections_model data = projections_data,
na.action = na.exclude)
# add the residuals back to projections_data
<- projections_data |>
projections_data mutate(residuals = residuals(projections_model))
```
Adding content
Now we’ll need to call upon a few libraries that we’ll use to prepare data tables for a report. We’ll need knitr
to use code to build tables for the report and scales
to format raw numbers appropriately.3
Though we’re using only data tables for this report, you can build a report with graphs like those you built in Chapter 3 in much the same way. See the Quarto guide for more details.
We’ll use the label_
functions from scales
to create new display variables we’ll use in our printed tables. Those label_
functions work similarly to Microsoft Excel’s TEXT()
functions. We’ll include the include: false
option again since we don’t need this code or any output in our report, just to continue to build out the R data objects we’re using.
```{R}
#| include: false
library("knitr")
library("scales")
<- projections_data |>
projections_data mutate(
Current_jobs_displ = label_number(big.mark = ",")(Current_jobs),
Change_pct_displ = label_percent(accuracy = .1)(Change_pct),
Tot_openings_displ = label_number(big.mark = ",")(Tot_openings),
annual_wage_displ = label_dollar()(annual_wage),
turnover_rate_displ = label_percent(accuracy = .1)(turnover_rate)
)
```
Let’s build out our first section. We’ll make a summary table of the 15 largest occupations, displaying the occupation name, code, current number of jobs, 10-yr change, annual openings, median earnings, and turnover rate.
We’ll start with a section header name (preceded by ##
). Next we’ll open a code block, and specify the echo: false
execution option so that only the result appears in the report, not the code.
Then we’ll use slice_max()
to pull the 15 largest occupations (by passing the Current_jobs
variable and an n = 15
parameter). We’ll sort (arrange) those occupations in decending order using arrange()
but using -
ahead of the variable name to indicate the sort should be descending, then use select()
to keep only those variables used in the report. If we were using Pivot Tables in Microsoft Excel, these steps would be equivalent to filtering and sorting on row values.
Finally, we’ll call the kable
function from the knitr
package to build a nice table designed to be read by humans. Within that function we’ll specify how the columns will align (using align =
) and give proper column headings (using col.names =
).
## Largest 15 occupations
```{R}
#| echo: false
|>
projections_data slice_max(Current_jobs, n = 15) |>
arrange(-Current_jobs) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
We’re only modifying projections_data
temporarily, not storing the result back as projections_data
. This allows us to build additional similar tables quickly.
For the fastest-growing and lowest-turnover, we’ll focus on modifying the slice_
and arrange
lines of code:
## 15 fastest-growing occupations
```{R}
#| echo: false
|>
projections_data slice_max(Change_pct, n = 15) |>
arrange(-Change_pct) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
## 15 occupations with the lowest turnover rates
```{R}
#| echo: false
|>
projections_data slice_min(turnover_rate, n = 15) |>
arrange(turnover_rate) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
Lastly, we’ll make use of the analysis we ran in Chapter 3. We’ll generate a table with the lowest turnover rates relative to median earnings by using the residuals
that we received from our regression model.
## 15 occupations with the lowest turnover rates relative to median earnings
```{R}
#| echo: false
|>
projections_data slice_min(residuals, n = 15) |>
arrange(residuals) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
Full render
Run the report. You should see it render and pull together the four tables we worked on above. Check your work against Appendix B
4.4 Parameterizing your report
Let’s prepare the report to run with different options by modifying just a few sections of the code.
The key here is to use Quarto’s powerful parameters capability. We can set parameters in the Quarto document header, and then reference them in our R code blocks.
In the Quarto document header, we’ll add the params:
option, followed by parameter names and values:
---
title: "Report: The largest, fastest-growing, and lowest-turnover occupations in our region"
params:
region: Southeast
top_n: 10
---
We’ll then change pieces of our code to call those parameters. We call them by using params$region
and params$top_n
.4
Outside of our code blocks, like in headers and regular text we can call the parameters using special inline code:`r params$value`
.5
We’ll need to:
- Change the reference to the region in the “Source:” paragraph;
- Modify the
projections_url
line usingstr_c
so that we include the region; - Change the
15
in every header; and - Change the
15
in theslice_
functions in each code block.
If successful, we’ll be able to run the report for a specified region (that matches the ODJFS projections URL) and a specified limit for the number of occupations returned for the tables. We’ll change from Central Ohio to Southeast Ohio and from 15 occupations to 10 occupations to test our code.
---
title: "Report: The largest, fastest-growing, and lowest-turnover occupations in our region"
params:
region: Southeast
top_n: 10
---
Source: ODJFS Bureau of Labor Market Information, `r params$region` Ohio JobsOhio region, 2020-2030.
Long-Term Occupational Projections for the <https://ohiolmi.com/Home/Projections/ProjectionsHome_08.16.23>
```{R}
#| include: false
# load libraries
library("readxl")
library("curl")
library("tidyverse")
# define file url
<- str_c(
projections_url "https://ohiolmi.com/_docs/PROJ/JobsOhio/",
$region,
params".xlsx"
)
# download file
curl_download(projections_url, "projectionsdata2.xlsx")
# read file with approrpiate settings
<- read_xlsx(
projections_data "projectionsdata2.xlsx",
skip = 6,
col_names = c(
"SOC",
"Occupation",
"Current_jobs",
"Projected_jobs",
"Change_num",
"Change_pct",
"Growth",
"Exits",
"Transfers",
"Tot_openings",
"Wage",
"Notes"
),col_types = c(
"text",
"text",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"text"
)|>
)
# remove summary occupations and notes/blank rows
filter(
str_sub(SOC, -4, -1) != "0000",
str_length(SOC) == 7
|>
)
# create annual wage column so values are consistent
mutate(
annual_wage = case_when(
< 200 ~ Wage * 2080,
Wage .default = Wage
)
)
```
```{R}
#| include: false
<- projections_data |>
projections_data mutate(
turnover_rate = (Exits + Transfers) / Current_jobs
)
<- lm(turnover_rate ~ log(annual_wage),
projections_model data = projections_data,
na.action = na.exclude)
<- projections_data |>
projections_data mutate(residuals = residuals(projections_model))
```
```{R}
#| include: false
library("knitr")
library("scales")
<- projections_data |>
projections_data mutate(
Current_jobs_displ = label_number(big.mark = ",")(Current_jobs),
Change_pct_displ = label_percent(accuracy = .1)(Change_pct),
Tot_openings_displ = label_number(big.mark = ",")(Tot_openings),
annual_wage_displ = label_dollar()(annual_wage),
turnover_rate_displ = label_percent(accuracy = .1)(turnover_rate)
)
```
## Largest `r params$top_n` occupations
```{R}
#| echo: false
|>
projections_data slice_max(Current_jobs, n = params$top_n) |>
arrange(-Current_jobs) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
## `r params$top_n` fastest-growing occupations
```{R}
#| echo: false
|>
projections_data slice_max(Change_pct, n = params$top_n) |>
arrange(-Change_pct) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
## `r params$top_n` occupations with the lowest turnover rates
```{R}
#| echo: false
|>
projections_data slice_min(turnover_rate, n = params$top_n) |>
arrange(turnover_rate) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
## `r params$top_n` occupations with the lowest turnover rates relative to median earnings
```{R}
#| echo: false
|>
projections_data slice_min(residuals, n = params$top_n) |>
arrange(residuals) |>
select(
Occupation, SOC, Current_jobs_displ, Change_pct_displ,
Tot_openings_displ, annual_wage_displ, turnover_rate_displ|>
) kable(
col.names = c(
"Occupation", "SOC", "Current jobs", "10-yr change",
"Annual openings", "Median earnings", "Turnover rate"
),align = c("l","l","r","r","r","r","r")
)```
Once again, you can check your code against Appendix C.
Exercises
Exercise 1
Practice using markdown. How would you:
- Emphasize text?
- Create a link with custom text?
- Add an image?
- Create a numbered list?
Take a look at the RStudio cheat sheet for Quarto and the cheet sheet for rmarkdown, specifically the section to the far right titled “Write with Markdown.”6
Exercise 2
Branstorm a few different ways you might use parameters to create a series of reports that utilize much of the same code.
Extra: Other Quarto document types
Though you may be most interested in creating a report or article, Quarto is capable of generating other document types, including powerpoints, websites, and ebooks.
Those types do get a bit more complicated, but rely on many of the basics you’ve learned in this section as building blocks. The [Quarto guide] details these possibilities.
This guide itself is written as a Quarto ebook, as are other resources like R for Data Science (Wickham, Çetinkaya-Rundel, and Grolemund 2023). If interested in seeing examples of quarto projects in action, you can browse their code on GitHub, which we’ll learn in Chapter 5.
YAML stands for Yet Another Markup Language, but don’t worry - there’s not actually much to learn↩︎
Quarto also works with other languages, including Python and Julia↩︎
scales
is actually used byggplot2
, but we’ll load it just in case.↩︎params
is an R data object being passed by Quarto, and$
tells R to pull out specific variables (columns) from that object.↩︎Inline code can also be used to call summary figures in your text paragraphs. We did this for the guide in Chapter 3 in the section about
summary()
.↩︎Quarto is an evolution of rmarkdown, adding additional functionality and programming languages.↩︎