Visualization of composite plots in R using a programmatic approach and smplot2
Seung Hyun Min
Now published as a research article:
Min, S. H. (2024). Visualization of composite plots in R using a programmatic approach and smplot2. Advances in Methods and Practices in Psychological Science, 7(3).
Abstract
In psychology and human neuroscience, the practice of creating multiple subplots and combining them into a composite plot has become common because the nature of research has become more multifaceted and sophisticated. In the last decade, the number of methods and tools for data visualization has surged. For example, R, a programming language, has become widely used in part due to ggplot2, a free, open-source and intuitive plotting library. However, despite its strength and ubiquity, it has some built-in restrictions that are most noticeable when one creates a composite plot, which currently involves a complex and repetitive process, with steps that go against the principles of open science out of necessity. To address this issue, I introduce smplot2, an open-source R package that integrates ggplot2’s declarative syntax and a programmatic approach to plotting. The package aims to enable users to create customizable composite plots by linearizing the process of complex visualization. The documentation and code examples of the smplot2 package can be found online (https://smin95.github.io/dataviz).
The Rise of ggplot2
With modern software tools, there has been a surge in the number of methods and tools through which researchers and clinicians can perform data visualization, an important skill in scientific research. For instance, R, a programming language (R Core Team, 2021), has become increasingly prevalent for statistical data visualization in the last 15 years, in part due to ggplot2, a plotting library that was introduced in 2009 by Hadley Wickham (Wickham, 2009). Its citation count has towered over that of Python’s matplotlib (see Figure 1), an extensive and flexible but challenging low-level plotting library that was first introduced by John Hunter in 2007 (Hunter, 2007). The reason for the recent rise of ggplot2 is that the library is free, open-source and intuitive for users. Layers of graphics can be added sequentially on a plotting space to produce complex plots. The details of the philosophy behind ggplot2, which is better known as the “grammar of graphics”, are well explained in a tutorial in this journal (Nordmann, McAleer, Toivo, Paterson, & DeBruine, 2022). In brief, as long as users know how to add a layer of points, a layer of lines, and other specific layers sequentially using ggplot2’s declarative syntax, they will be able to plot their data in both simple and complex fashions with a high level of customization without applying the programmatic approach, such as creating loops and functions (Hehman & Xie, 2021). Furthermore, due to the active community of users, there exist diverse third-party R packages (Mowinckel & Vidal-Piñeiro, 2020; Patil, 2021; Tang, Horikoshi, & Li, 2016), which complement ggplot2, that provide shortcut functions for plotting, allowing users to plot data in just a few lines of code in wide-ranging ways. These factors have made R, rather than Python, a preferable tool for data visualization for researchers and clinicians across disciplines and levels of experience.
Figure 1. Year-to-year citation count of two major plotting libraries in R and Python: ggplot2 and matplotlib. The year 2024 shows a partial count of the citations. Each point denotes the timepoint when the authors published an article regarding their software. Citation counts were collected from Google Scholar on April 4, 2024.
Built-in Restrictions of ggplot2
Figure 2. A comparison of the standard routines for subplotting between matplotlib from Python and ggplot2 from R. In Python, it is standard to generate multiple panels using an iterative or functional programming approach. After the plots have been combined, the specific aesthetics of the composite plot, such as the number of rows and columns, as well as the common legend, x-axis and y-axis labels, can be adjusted without modifying the individual plots. Furthermore, it provides full flexibility for text, shape and other types of annotations to be added on the combined image. So, the process of visualizing a composite plot is linear in Python’s matplotlib, with clear starting and ending points. However, in R’s ggplot2, the process often requires users to go back and forth between the stages of creating individual plots and then combining them. So, users are encouraged to plot one graph at a time and then combine all plots together as late as possible. The goal of smplot2 is to simplify the process of complex data visualizations by resolving these issues.
In psychology and human neuroscience, the practice of creating multiple subplots and combining them into one composite plot is common (Kubilius, 2014). This method of data visualization is known as subplotting. In the last few decades, it has become more widespread as research has become increasingly sophisticated, as demonstrated by the recent trend of including more variables and conditions in experiments, conducting collaborations with other laboratories if possible, and implementing multiple methodologies for data collection and analysis (Lin & Lu, 2023). These, in turn, create datasets with complicated structures, thereby requiring complex forms of data visualizations. However, as a high-level plotting library - which does not require users to plot each detail of the plot separately - ggplot2 has some built-in restrictions that are most noticeable when one creates a composite plot.
Currently, creating a composite plot in ggplot2 is complex for several reasons. First, although ggplot2 allows for flexible customization of individual plots with concise code, it is not compatible with the most well-known programmatic approach, iteration using a for loop, unless unorthodox methods are used. Consequently, users unfamiliar with proper methods may struggle with applying iterations in ggplot2.
Second, ggplot2 provides limited options for subplotting. A typical ggplot2 operation returns a single plot object that can be easily manipulated or stored. While facet_wrap() and facet_grid() support data allocation into multiple subplots (facets) within a single plot object, ggplot2 limits aesthetic customization of these subplots within a faceted plot. For example, assigning subsets of data to different subplots using multiple or hierarchical variables, or applying dynamic color schemes for each variable level is challenging. To circumvent this, users might need to restructure their data frames to visualize data as intended, but this only affects components that map data variables to aesthetics, not elements like axis limits, background themes, or coordinate systems.
For plotting unique visual elements that have no relation to the given data across panels, a third method has been used. It involves combining separate ggplot2 plot objects into one composite figure using libraries such as cowplot (Wilke, 2019) and patchwork (Pedersen, 2019). This method enables users to draw a composite plot flexibly but requires them to code each subplot separately (see Pseudocode 1), resulting in repetitive scripts albeit with minor differences (compare Plot 1 and Plot 8 in Pseudocode 1). Also, this approach restricts the aesthetic control of the composite figure, such as its layout, annotations (including legends) and marginal space (see Figure 2), further encouraging users to code each subplot individually.
Put together, although ggplot2 enjoys a widespread user base, users who want to visualize a composite plot have had to write repetitive scripts, seek third-party packages, or resort to a vector graphics editor, straying from the recommended practices for scientific reproducibility.
# Pseudocode 1: Composite plot in Figure 2 using ggplot2
# Generate each plot separately
plot1 <- ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
  ... +
  <THEME_FUNCTION>(<LESS SPACING>) + # unique for this panel
  <THEME_FUNCTION>(<REMOVE X-TICKS>) + # unique for this panel
  <ANNOTATE_FUNCTION>(<TEXT, SHAPE ANNOTATIONS>)
# Repeat for plot2, plot3, .... , plot7
plot8 <- ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
  ... +
  <THEME_FUNCTION>(<LESS SPACING>) +
  <THEME_FUNCTION>(<REMOVE X-TICKS>) + # unique for this panel
  <THEME_FUNCTION>(<REMOVE Y-TICKS>) + # unique for this panel
  <ANNOTATE_FUNCTION>(<TEXT, SHAPE ANNOTATIONS>)
library(<THIRD_PARTY_PACKAGES>)
multi_plot <- <COMBINE_FUNCTION>(plot1, plot2, plot3, plot4, plot5,
                                 plot6, plot7, plot8)
# Check if the multi_plot looks OK. If not, revise the code that generates each plot.
On the other hand, the workflow for subplotting and creating a composite plot is simpler and more concise in Python’s matplotlib, as it relies on programmatic approaches like building loops (see Pseudocode 2) and custom functions. Applying a programmatic approach ensures full flexibility for graphical aesthetics because users can then allocate subsets of data to unique panels using any number and combination of variables, as well as dynamically control the aesthetics, such as color, without writing repetitive scripts.
# Pseudocode 2: Composite plot in Figure 2 using Python's matplotlib
fig, ax = plt.subplots(nrows = 2, ncols = 4, sharex = True, sharey = True)
for <PLOT INDEX> in range(<NUMBER OF PLOTS>): # 8 iterations
    ax.flat[<PLOT INDEX>].<PLOTTING_FUNCTIONS>(<DATA>, <COLOR>) # ax is a 2 x 4 array, so index it flat
fig.subplots_adjust(hspace = 0.2, wspace = 0.1) # more spacing between rows than columns
fig.text(0.5, 0.95, 'Title of a Composite Figure') # x and y are given positionally
The first line of Pseudocode 2 determines the structure of the composite figure. The data themselves are plotted within a for loop at each panel, iterating for the length of the total number of subplots (eight total, see Figure 2). Due to the programmatic approach, the color and other aesthetics in the plot for each panel can be different, yielding more flexibility. In addition, although the code that generates the panels is identical, the panels actually look different from one another, as some have y-axis ticks or x-axis ticks (or both; see Figure 2), because the layout of the combined figure has already been established in the beginning. Furthermore, the aesthetics of the composite figure can be controlled, such as the amount of blank space between panels (hspace and wspace in Pseudocode 2). Finally, after the panels have been combined, a common legend and annotations (texts, shapes, points, patches, lines, etc.) can be added anywhere in the composite figure. In short, matplotlib offers flexibility both at the level of each panel and the composite figure, making it possible for the workflow of generating a composite plot to be linear, with a clear start and end. This versatility of control and a structured workflow for performing complex visualizations are missing in ggplot2.
A Need for a Solution: smplot2
Although the “grammar of graphics” interface in ggplot2 simplifies the code for a standalone figure, it can complicate the workflow when multiple ggplot2 outputs are combined into one composite figure, which has restricted flexibility for aesthetics. So, it has encouraged users to separately code each subplot and combine the subplots as late as possible. This is concerning, as ggplot2 has been widely used (see Figure 1) and research routines in psychology and human neuroscience have become more sophisticated.
To address this issue, I introduce smplot2, an open-source R package that integrates the practice of data visualization in ggplot2 and the programmatic approach to plotting. This package gives users equal levels of control over both individual subplots and a composite plot. It has over 40 functions at the time of writing (see 300+ examples in https://smin95.github.io/dataviz) but for brevity, in this tutorial, I will primarily discuss how it can linearize the workflow of visualizing elegant composite plots using a programmatic approach and maximize the flexibility for aesthetics in ggplot2. All examples here are created with aesthetic defaults of the smplot2 package, which are clean and appropriate for research articles across various fields and data structures. The functions of smplot2 have been optimized for subplotting to maximize the visibility of data in a composite plot by controlling the extent of blank spacing, scaling and the relative text size. I hope that this tutorial can empower readers to perform complex and expressive data visualizations of a composite plot using a structured workflow.
Aim and structure of the tutorial
The aim of this tutorial is not to reiterate the contents of the package’s documentation from the web in its entirety or introduce ggplot2 (Hehman & Xie, 2021; Wickham, 2016; Wickham, Çetinkaya-Rundel, & Grolemund, 2023). Instead, it aims to present a new workflow for the visualization of a composite plot in ggplot2 with a programmatic approach and the smplot2 package. In the first section, I will briefly introduce some of the visualization functions of smplot2, such as its background themes, which improve aesthetics for subplotting. Then, in the next three sections, I will demonstrate how we can produce subplots in ggplot2 iteratively, and then combine them into a composite plot using a linear process (similar to that shown in Figure 2 for Python’s matplotlib) with three examples. The examples will become increasingly sophisticated to demonstrate that there is no limit to how users can create and customize composite figures. The tutorial is summarized in Table 1 at the end.
Target audience
The tutorial assumes that readers have some basic knowledge of R and ggplot2, and some experience with working with data frames using functions such as filter(), group_by(), %>% and summarise(). They do not need to be familiar with concepts of programming or be fluent in any other programming languages, such as Python. Although the examples in this tutorial use randomly generated data based on human vision studies, readers across disciplines will be able to adapt the code and examples in this tutorial easily for their own purposes.
Readers who have not used ggplot2 and R should read Chapters 2-3 of the package’s documentation webpage (https://smin95.github.io/dataviz) before starting this tutorial. The chapters provide a step-by-step guide on how to install RStudio and use ggplot2. Those who have not worked with data frames in R are recommended to read the tutorial by Nordmann et al. (Nordmann et al., 2022) or the early sections of Chapter 7 of the documentation webpage (sections 7.1 & 7.2). Completing these two prerequisites for the tutorial would take about 2-3 hours.
Installation requirements for this tutorial
Two packages, tidyverse (Wickham et al., 2019) and smplot2, should be installed from the Comprehensive R Archive Network (CRAN) to complete the tutorial. The tidyverse package is a suite of multiple packages, such as ggplot2 (for plotting and saving visualizations), dplyr (for working with data frames), and readr (for reading external data files).
Open science practices
With more than 300 examples, smplot2 has been documented online in detail (https://smin95.github.io/dataviz) with 12 chapters devoted to the package at the time of writing. The documentation webpage was created using the bookdown package for reproducibility (source code at https://www.github.com/smin95/dataviz). The code in the tutorial is posted online (https://www.smin95.com/smplot2doc).
Introduction to smplot2 - Background Themes
First and foremost, we should load the two packages to memory.
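The loading step itself does not appear at this point in the text. A minimal sketch, assuming both packages have already been installed from CRAN as described in the installation requirements:

```r
# Assumption: tidyverse and smplot2 were installed beforehand, e.g. with
# install.packages(c("tidyverse", "smplot2"))
library(tidyverse) # attaches ggplot2, dplyr, readr and other packages
library(smplot2)   # provides the sm_*() functions used in this tutorial
```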
smplot2 offers various plotting and thematic functions. In this section, only the thematic functions will be discussed. For more information about the plotting functions (raincloud plot, slope chart, forest plot, Bland-Altman plot, etc), please see examples in Chapters 3-6 from the documentation webpage.
In this example, a randomly generated data set will be used as shown below:
set.seed(2022) # Set seed for generating random data
df <- data.frame(
  Subject = rep(paste0('S', 1:16), times = 3),
  Value = c(
    rnorm(n = 16, mean = 0, sd = 1.5), # Day 1
    rnorm(n = 16, mean = 5, sd = 1.7), # Day 2
    rnorm(n = 16, mean = 10, sd = 2.0) # Day 3
  ),
  Time = rep(paste("Day", 1:3), each = 16)
)
head(df)
## Subject Value Time
## 1 S1 1.3502130 Day 1
## 2 S2 -1.7600187 Day 1
## 3 S3 -1.3462280 Day 1
## 4 S4 -2.1667521 Day 1
## 5 S5 -0.4965204 Day 1
## 6 S6 -4.3509435 Day 1
The data frame is stored in the object df. Each row of the df object represents a single observation from each Subject and Time. The column Subject stores identifiers for all subjects in the form of character strings; the column Value stores the dependent variable, which is the value of interest in this example; the column Time contains all identifiers for the independent variable, which has three levels, in the form of character strings: Day 1, Day 2, and Day 3.
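As a quick check of the structure just described, the following base R sketch rebuilds the same data frame and counts the observations per level of Time and the number of distinct subjects. The object names n_per_day and n_subjects are introduced here only for illustration.

```r
set.seed(2022)
df <- data.frame(
  Subject = rep(paste0("S", 1:16), times = 3),
  Value = c(
    rnorm(n = 16, mean = 0, sd = 1.5),
    rnorm(n = 16, mean = 5, sd = 1.7),
    rnorm(n = 16, mean = 10, sd = 2.0)
  ),
  Time = rep(paste("Day", 1:3), each = 16)
)

# Number of rows for each level of the independent variable
n_per_day <- table(df$Time)
print(n_per_day)

# Number of distinct subject identifiers
n_subjects <- length(unique(df$Subject))
print(n_subjects)
```

Each subject should contribute exactly one row per day, so every count in n_per_day equals n_subjects.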
Figure 3. (A) A default raincloud plot with a background theme that has major horizontal grids. (B) A raincloud plot with a classic theme. (C) A raincloud plot with a minimal theme (i.e., no grids).
In this section, raincloud plots are drawn using the function sm_raincloud() to present the different themes (see Figure 3). Each subject’s data (as points), the sample’s distribution (in violin plots), median and first and third quartiles (in boxplots) are typically displayed in a raincloud plot. A black dot below the boxplot for Day 1 denotes that an outlier is present. Details of this function are described in Chapter 6 of the documentation webpage. Here, the data from the Value column are plotted as a function of Time. We can map the aesthetics (i.e., fill) within the ggplot() function so that each unique color represents each level of Time (see the code for Figure 3).
Each panel of Figure 3 shows a different background theme. The theme with major horizontal grids is used in Figure 3A by default because sm_raincloud() implements the theme automatically. However, this can be overridden if users add another theme function modularly to a ggplot2 object (ex. sm_classic() is added to generate Figure 3B). These thematic functions provide minimalistic aesthetics, and have borders and legends arguments. The former, if set to borders = TRUE, will print the border of the panel. The latter, if set to legends = TRUE, will print the legend of the standalone plot. There are several background themes in the package:
sm_hgrid() is a theme with horizontal major grids (Figure 3A).
sm_vgrid() is a theme with vertical major grids.
sm_hvgrid_minor() is a theme with horizontal and vertical grids (major and minor).
sm_classic() is a theme with a standard y-axis on the left side and x-axis at the bottom (Figure 3B).
sm_minimal() is a theme with no grids (Figure 3C).
# Figure 3A - Major horizontal grids
ggplot(data = df, mapping = aes(x = Time, y = Value, fill = Time)) +
  sm_raincloud() + # Default
  scale_fill_manual(values = sm_color('blue','darkred','viridian'))
# Figure 3B - Classic theme
ggplot(data = df, mapping = aes(x = Time, y = Value, fill = Time)) +
  sm_raincloud(sep_level = 3) + # Separates the graphical components
  sm_classic() +
  scale_fill_manual(values = sm_color('blue','darkred','viridian'))
# Figure 3C - White background with no grids
ggplot(data = df, mapping = aes(x = Time, y = Value, fill = Time)) +
  sm_raincloud(which_side = 'l') + # Changes the raincloud plot's facing direction
  sm_minimal() +
  scale_fill_manual(values = sm_color('blue','darkred','viridian'))
The themes have been developed to optimize the discernibility of each plotting feature (ex. relative text size, blank spacing, etc.) even when multiple subplots are combined into one composite figure. Here, for instance, the three examples of the raincloud plot have been combined into one figure using the function sm_put_together() (code not shown), which we will discuss extensively in the later sections. To foreshadow, sm_put_together(), which combines subplots into a composite figure (as described later in the text), essentially interacts with these themes and other functions to optimize the aesthetics so that each plotting feature is discernible in a multi-panel, composite figure. For this reason, these functions are discussed before we create a composite plot. The hex codes of the three colors in Figure 3 are from the sm_color() function, which primarily archives colors with high visibility, an important factor when one creates a composite plot.
In the next three examples where composite plots will be created, we will strictly use these thematic functions (ex. sm_hgrid() and sm_minimal()) and geom_*() functions to plot data in the form of lines and points (i.e., the simplest type of data visualization) so that users across all levels of experience and background can understand the code without knowing the plotting functions of smplot2.
Example 1: Subplotting Data Using One Variable
Simulated dataset
Amblyopia is a visual deficit with origins in the primary visual cortex (Min et al., 2022). The simulated data here represent visual health in individuals with amblyopia and normal vision at various experimental conditions and types of visual stimuli. They will be used throughout the rest of the tutorial.
df2 <- read_csv('https://www.smin95.com/amblyopia_random2.csv')
df2_amb <- df2 %>%
  filter(Group == 'Amblyopia') %>%
  mutate(logSF = log2(SF)) %>%
  mutate(Condition = factor(Condition, levels = c('One','Two','Three')))
head(df2_amb)
## # A tibble: 6 × 6
## Subject absBP SF Group Condition logSF
## <chr> <dbl> <dbl> <chr> <fct> <dbl>
## 1 A1 0.168 0.5 Amblyopia Three -1
## 2 A1 1.37 1 Amblyopia Three 0
## 3 A1 1.29 2 Amblyopia Three 1
## 4 A1 2.67 4 Amblyopia Three 2
## 5 A1 0.0111 8 Amblyopia Three 3
## 6 A2 0.0136 0.5 Amblyopia Three -1
This dataset should be loaded to memory using the code above. Throughout the tutorial, we will use the %>% operator, which is known as the pipe. It allows the data frame from the previous operation of data transformation to be carried over, or piped, to the next operation. This spares users from supplying the input data frame for each operation. To begin with, we extract the data from the df2 object only for individuals in the Amblyopia group (Group == 'Amblyopia') by using filter(). Next, the continuous variable SF, which is an acronym for spatial frequency, is converted into log2 scale using mutate(), which creates another column (logSF) based on the existing column (SF) in the piped data frame. Because each spatial frequency is double the previous one, the values in the new column logSF are equally spaced. Then, the data type of the Condition column is changed using mutate(); it is initially a string, but factor() within mutate() converts it into a factor and re-orders the levels of the variable to their numerical order ('One'-'Two'-'Three'). After the data transformations, the newly formed data frame is stored in the object df2_amb, whose first six rows are displayed in the tutorial. Readers can double-check by comparing their own printed values of the df2_amb object to those in the tutorial.
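The two transformations inside mutate() can be illustrated in isolation with base R. The sketch below uses small stand-in vectors (SF, cond_chr) rather than the tutorial's data frame, so it runs without downloading the dataset.

```r
# Spatial frequencies double at each step, so log2() spaces them equally
SF <- c(0.5, 1, 2, 4, 8)
logSF <- log2(SF)
print(logSF)
print(diff(logSF)) # constant step between neighbouring values

# A character vector converted to a factor with explicit level order
cond_chr <- c("Three", "One", "Two", "One")
cond_fct <- factor(cond_chr, levels = c("One", "Two", "Three"))
print(levels(cond_fct))

# Without the levels argument, levels are sorted alphabetically
print(levels(factor(cond_chr)))
```

The level order matters later because ggplot2 assigns manual colors and shapes to factor levels in that order.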
The column absBP, which is short for absolute balance point, contains data of the dependent variable (y-axis). It is a measure of visual health. The higher the value, the worse the vision.
In this example, we will allocate data to each panel by using the variable Subject (i.e., subplotting with one variable). In other words, each panel will display the data of each subject (absBP as a function of logSF).
lapply()
lapply() is one of the apply functions from base R. It applies a function to each element of a list or a vector, and returns a list with the same length as the input. A list is a data structure that can contain different types of elements, such as strings, numbers and other lists. Essentially, lapply() is similar to how a for loop works, but it returns a list as output. Since ggplot2 objects can be stored in a list but not in an atomic vector, we will use lapply() to perform iterations. Pseudocode 3 shows the basic syntax of lapply().
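A minimal base R sketch of how lapply() behaves, using a toy function that squares its input. The names squares and powers are illustrative only.

```r
# Apply an anonymous function to each element of the vector 1:5
squares <- lapply(1:5, function(i) i^2)

print(length(squares)) # same length as the input
print(class(squares))  # the output is always a list
print(squares[[3]])    # third element of the output

# Additional arguments are supplied after the function and passed on to it
powers <- lapply(1:3, function(i, p) i^p, p = 3)
print(powers[[2]])
```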
The input can be either a list or a vector. If the input has a length of five (i.e., five elements), then the function will be run five times, and an output list that has a length of five will be returned. In our case, the function will plot the data with specific mapping and aesthetics, and generate ggplot2 objects. We will plot each of the nine individuals’ data, so we will run the function nine times (i.e., nine iterations). The returned output should therefore have a length of nine, and each of its elements is a plot. Additional arguments can be passed to the function, but in our tutorial there will not be any additional arguments, so these can be ignored.
First, we create an input object that specifies the nine subjects in the Amblyopia Group. The data frame df2 contains a column of identifiers for subjects. We see that these subjects have identifiers A1 to A9. These can be recreated as the vector subj_list by concatenating the string 'A' with a sequence of numbers 1:9 (1 to 9 in integers) using the function paste0(). So, the elements within the vector subj_list will contain subject identifiers that are also found in the Subject column of the data frame df2.
In the lapply() structure, through which we will plot the data of each subject on a separate panel, there should be two parts. The approach will be used throughout the rest of the tutorial, and it can be widely applicable across designs and disciplines:
The first part filters data using the index of the iteration. Here, iSubj is the index of the iteration, and it starts from 1 and ends at 9 as specified by 1:length(subj_list). During each iteration, the index is used to retrieve the element of the object subj_list, ex. A9 from subj_list[iSubj] when iSubj = 9. The extracted subject identifier is then used to filter for each subject’s data before plotting begins (ex. filter(Subject == subj_list[iSubj])). The filtered data is stored in the object subj_data, which will be used by the subsequent plotting functions; so, plotting will only use the filtered data from each subject.
The second part of the lapply() function plots the filtered data. The variables are mapped to aesthetics, and the appearance of the plot is customized using functions from ggplot2. Here, the specifications are set so that the data frame to be used is subj_data, that x is logSF, and that y is absBP, which is the outcome of interest in this simulated dataset. Moreover, the aesthetic group is mapped to the variable Condition of the data frame subj_data, so that the points are connected with lines for each condition. Also, within the ggplot() function, the shape, the filling color of the points (fill), as well as color of the lines are all set to be unique for each condition. In other words, all three conditions will be plotted in one panel of each subject at once.
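The two-part structure can be sketched in base R with a toy data frame. To keep the sketch free of package dependencies, the second part computes a mean instead of building a plot; toy, ids and results are hypothetical names that do not appear in the tutorial.

```r
# Toy data: three subjects with four observations each
toy <- data.frame(
  Subject = rep(c("A1", "A2", "A3"), each = 4),
  x = rep(1:4, times = 3),
  y = c(1, 2, 3, 4, 2, 4, 6, 8, 3, 6, 9, 12)
)
ids <- paste0("A", 1:3) # "A1" "A2" "A3"

results <- lapply(seq_along(ids), function(i) {
  # First part: keep only the rows that belong to the i-th subject
  one_subj <- toy[toy$Subject == ids[i], ]
  # Second part: operate on the filtered rows
  # (a ggplot2 call in the tutorial; a simple mean in this sketch)
  mean(one_subj$y)
})

print(unlist(results))
```

Because the value of the last expression in the function body is what lapply() collects, a ggplot() call placed last would be stored in the output list in the same way.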
In this example, each condition is coded to a unique shape with the function scale_shape_manual(). The first condition is coded to the shape value of 21 (circle with borders), the second to the value of 22 (square with borders), and the third to the value of 23 (diamond with borders). Since these shapes have borders, the argument fill determines their filling color, and color determines their border color, which is set to transparent. Similarly, with scale_fill_manual(), each condition is color coded to a specific filling color. The colors are specified in the object cList, which is defined outside the lapply() function using the sm_palette() function that returns three colors here. This function returns default colors of the package and is equivalent to sm_color(), except that it takes the number of colors as input instead of character strings specifying the colors. Users are encouraged to find their own color schemes from other packages, such as RColorBrewer and viridis, available in the R ecosystem.
subj_list <- paste0("A", 1:9) # 9 subjects
cList <- sm_palette(3) # Three colors from smplot2 (defaults)
indv_plots <- lapply(1:length(subj_list), function(iSubj) {
  # First part: Filter for each subject's data during each iteration
  subj_data <- df2_amb %>%
    filter(Subject == subj_list[iSubj])
  # Second part: Plot each subject's data
  ggplot(data = subj_data, aes(
    x = logSF, y = absBP, group = Condition,
    shape = Condition, fill = Condition, color = Condition
  )) +
    geom_line(linewidth = 1) +
    geom_point(size = 5, color = "transparent") +
    scale_color_manual(values = cList) +
    scale_fill_manual(values = cList) +
    scale_shape_manual(values = c(21, 22, 23)) +
    sm_hgrid() +
    scale_y_continuous(limits = c(0, 3)) +
    scale_x_continuous(
      limits = c(-1.3, 3.3),
      labels = c(0.5, 1, 2, 4, 8)
    )
})
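For readers who want a color scheme other than the package defaults stored in cList, any character vector of three colors can be used instead. The sketch below uses base R's hcl.colors() as a stand-in for the RColorBrewer and viridis packages so that nothing extra needs to be installed; alt_colors is a hypothetical name.

```r
# Three colors from a base R palette (an approximation of viridis)
alt_colors <- hcl.colors(3, palette = "viridis")
print(alt_colors)

# alt_colors could then replace cList, e.g.
# scale_fill_manual(values = alt_colors) and scale_color_manual(values = alt_colors)
```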
Each time lapply() runs the function, a plot is generated. Each plot gets stored in the object indv_plots, which is a list. Since length(subj_list) is 9 and the input for the lapply() function is the sequence of integers from 1 to 9 (1:length(subj_list)), there will be nine iterations and hence nine plots.
When one codes for multiple subplots in a lapply() function (second part of the structure), it is important to make the limits of x- and y-axes identical. In the lapply() function, both have been specified using scale_y_continuous() (y-axis: 0 to 3) and scale_x_continuous() (x-axis: -1.3 to 3.3). If there is no specification of the limits, each plot will have its own limit based on each subject’s data. Also, notice that although we plot data as a function of the variable logSF, which has values of -1, 0, 1, 2 and 3, the tick labels of the x-axis are displayed as 0.5, 1, 2, 4 and 8. This is because the labels argument has been supplied with these specifications in the function scale_x_continuous(). Essentially, these inputs mask over the true values of the x ticks on the plot. This is a common method of plotting data in human vision studies because the visual system has been known to process information non-linearly (Baker, Wallis, Georgeson, & Meese, 2012), and it is specific to the examples in the tutorial.
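The relationship between the tick positions and the displayed labels can be made explicit with a short base R sketch. The names log_breaks and tick_labels are illustrative and do not appear in the tutorial's code.

```r
# Positions of the ticks on the log2 axis
log_breaks <- -1:3

# Labels shown to the reader: the original spatial frequencies
tick_labels <- 2^log_breaks

print(data.frame(position = log_breaks, label = tick_labels))
```

One optional variation, not used in the tutorial's code, is to pass breaks = log_breaks together with labels = tick_labels to scale_x_continuous(), which ties each label to a stated position instead of relying on the default breaks.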
To display a plot from the object indv_plots, users can type the name of the list indv_plots in the console, or subset for one specific subject’s plot using double brackets (ex. indv_plots[[3]]). This individual plot still has ticks for x- and y-axes as well as their labels. However, these will be removed or resized automatically later during the generation of a composite plot.
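The difference between single and double brackets on a list can be seen in a small base R sketch, where character strings stand in for the stored plots. The object names are hypothetical.

```r
# A list with three placeholder elements standing in for plots
plots <- list("plot 1", "plot 2", "plot 3")

sub_list <- plots[3]  # single brackets: a list of length one
element <- plots[[3]] # double brackets: the stored element itself

print(class(sub_list))
print(class(element))
```

With a list of ggplot2 objects, double brackets therefore return the plot object itself, which is what gets drawn when printed.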
Next, we define the title, as well as common x- and y-axes labels of the composite figure that we will create. As their names suggest, sm_common_title() sets the title of the combined figure, sm_common_xlabel() sets the common x-axis label of the combined plot, and sm_common_ylabel() sets the common y-axis label of the combined figure. In these three functions, x and y control the location of the texts. Their defaults are set to x = 0.5, y = 0.5, which refers to the center of their respective areas (x and y do not refer to the coordinate relative to the combined figure).
# Figure 4 - Set the title and axis labels of the composite figure
title <- sm_common_title("Individual data (subplotting with one variable)", x = 0.53, y = 0.52)
xlabel <- sm_common_xlabel("Spatial frequency (c/deg)", x = 0.52)
ylabel <- sm_common_ylabel("Visual deficit")
Notice that this process is highly similar to what is often used in Python’s matplotlib, where fig refers to the object that stores the combined figure (see Pseudocode 4).
# Pseudocode 4: Titles and axis labels in Python's matplotlib (Figure 4)
fig.suptitle('Individual data', x = x, y = y) # position is given by keyword
fig.text(x, y, 'Spatial frequency (c/deg)') # x-axis label
fig.text(x, y, 'Visual deficit', rotation = 90) # y-axis label
Figure 4. A composite plot with three columns and three rows. Each panel plots each subject’s data across all three conditions. A legend is absent in this figure.
In Python's matplotlib, the text labels and the title get added to fig
(the composite plot) using the object-oriented approach. Here, we will
use sm_put_together(), which is essentially a layout function that
creates a composite figure from individual plots. The output from
sm_put_together() must be stored in an output object (ex. plots_tgd).
Three inputs must be provided to run sm_put_together(): 1) the list
object that stores all plots (ex. all_plots = indv_plots), 2) the
number of columns (ex. ncol = 3), and 3) the number of rows (ex.
nrow = 3). There are optional arguments as well, such as title, xlabel
and ylabel. For instance, if title is not supplied as input, then no
space will be allocated for a common title in the composite figure.
Here, we will supply title, xlabel and ylabel in sm_put_together(). In
addition, the extent of blank spacing (i.e., margin) in both width
(wmargin) and height (hmargin) can be adjusted (see Figure 2). In this
example, they are set to negative values to minimize the spacing
between subpanels.
# Figure 4 - Combine subplots into a specified layout
plots_tgd <- sm_put_together(
all_plots = indv_plots, title = title, xlabel = xlabel,
ylabel = ylabel, ncol = 3, nrow = 3,
wmargin = -4.5, hmargin = -4.5
)
We can save the figure using ggsave() (see Figure 4). We supply the
name of the image file as a string ('together1.pdf') as the function's
first input, and the object of the composite figure (plots_tgd) as its
second input. This makes the function save the object plots_tgd as
together1.pdf in your working directory. Also, we set the dimensions of
the image so that it has a height and a width of 9 inches.
# Figure 4 - Save the composite output as a vector file
ggsave("together1.pdf", plots_tgd,
width = 9, # inches
height = 9
)
Immediately in Figure 4, we can notice that the function
sm_put_together() has removed extraneous tick labels and titles from
both axes in the inner panels. This was possible because we had
provided the layout of the composite figure that was to be constructed
in sm_put_together(), which is similar to how matplotlib controls the
layout of the figure (see Pseudocode 2). Although the function by
default removes the extraneous ticks in inner panels
(remove_ticks = 'some' in sm_put_together()), this option can be
overridden so that all ticks are kept (remove_ticks = 'none') or
removed (remove_ticks = 'all'). The order of the subpanels follows that
of the plots generated by the lapply() code chunk. In this case, we set
the layout to be ncol = 3 and nrow = 3, so the ticks of both axes are
removed in the 2nd, 3rd, 5th and 6th panels of the composite figure.
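The panel positions mentioned above can be worked out with a few lines of base R. The sketch below is only an illustration under stated assumptions: panel_ticks() is a hypothetical helper (not part of smplot2), the grid is assumed to be completely filled row by row, and only the leftmost column and the bottom row are assumed to keep their y- and x-axis ticks, respectively.

```r
# Sketch: which panels keep their axis ticks in a grid filled row by row.
# panel_ticks() is a hypothetical helper, not a function from smplot2.
# Assumption: only the leftmost column keeps y-axis ticks and only the
# bottom row keeps x-axis ticks; the grid has no empty panels.
panel_ticks <- function(ncol, nrow) {
  idx <- seq_len(ncol * nrow)
  col_pos <- (idx - 1) %% ncol + 1   # column of each panel
  row_pos <- (idx - 1) %/% ncol + 1  # row of each panel
  data.frame(
    panel = idx,
    keeps_y_ticks = col_pos == 1,
    keeps_x_ticks = row_pos == nrow
  )
}

ticks <- panel_ticks(ncol = 3, nrow = 3)

# Panels that lose the ticks of both axes
no_ticks <- ticks$panel[!ticks$keeps_y_ticks & !ticks$keeps_x_ticks]
no_ticks
```

For the 3 x 3 layout of Figure 4, no_ticks contains the 2nd, 3rd, 5th and 6th panels, which agrees with the description in the text.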
We can also label each panel by annotating each subject's identifier
(ex. A1 for Subject 1 in Amblyopia; see Figure 5). There are two ways
of achieving this. The first way is to revisit and modify the code
chunk that generates the nine plots iteratively using lapply(), but
this goes against our aim of linearizing the process of subplotting.
Therefore, we will use the second way, the function sm_panel_label(),
to label each panel.
# Figure 5 - Add subject identification label (ex. A1) in each panel
indv_plots_label1 <- sm_panel_label(
all_plots = indv_plots, x = 0.15, y = 0.85,
panel_pretag = "A", panel_tag = "1",
text_color = "black"
)
The function sm_panel_label() has a few arguments, some of which are
similar to those of the functions introduced above. For the first
argument all_plots, users must provide the list (indv_plots) that
stores all plots. Next, x and y determine the location of the panel
label; 0.5 is the center of each subplot. panel_tag sets the string for
enumeration. In this example, we set panel_tag = "1" so that the first
panel is labelled “1” and the next one “2”. There are other options to
enumerate each panel, such as: 1) panel_tag = "A" for uppercase
letters, 2) panel_tag = "a" for lowercase letters, 3) panel_tag = "I"
for uppercase Roman numerals, and 4) panel_tag = "i" for lowercase
Roman numerals. These options of the panel_tag argument were inspired
by the plot_annotation() function of the patchwork package (Pedersen,
2019). Also, there are tag labels that can be set to be consistent
across panels: panel_pretag and panel_posttag. As their names imply,
panel_pretag comes before panel_tag, and panel_posttag comes after
panel_tag. These two arguments can be strings of any length. To label
each panel using the subject's identifier that is consistent with those
in the data frame df2 (ex. A1 and A3), panel_pretag should be “A”.
Then, we store the output from sm_panel_label() in the object
indv_plots_label1. The differences between plot_annotation() and
sm_panel_label() are that in sm_panel_label() 1) the locations can be
specified in x and y coordinates within but not outside each panel, 2)
annotations can be added multiple times in sequence (as demonstrated in
this example), and 3) the plot input must be a single list object
rather than separate ggplot2 objects.
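To make the enumeration styles concrete, the base R sketch below builds the label strings that such tags would produce. It is not the implementation of sm_panel_label(); make_labels() is a hypothetical helper written only for illustration, and it merely mimics how a pretag, an enumeration style and a posttag combine into one label.

```r
# Sketch: build panel labels from a pretag, an enumeration style and a posttag.
# make_labels() is a hypothetical helper, not a function from smplot2.
make_labels <- function(n, tag = "1", pretag = "", posttag = "") {
  counter <- switch(tag,
    "1" = as.character(seq_len(n)),                        # 1, 2, 3, ...
    "A" = LETTERS[seq_len(n)],                             # A, B, C, ...
    "a" = letters[seq_len(n)],                             # a, b, c, ...
    "I" = as.character(utils::as.roman(seq_len(n))),       # I, II, III, ...
    "i" = tolower(as.character(utils::as.roman(seq_len(n)))),
    stop("Unsupported tag style: ", tag)
  )
  paste0(pretag, counter, posttag)
}

subject_labels <- make_labels(3, tag = "1", pretag = "A")  # "A1" "A2" "A3"
letter_labels <- make_labels(3, tag = "a", posttag = ")")  # "a)" "b)" "c)"
roman_labels <- make_labels(3, tag = "i")                  # "i" "ii" "iii"
```

The first call reproduces the subject identifiers used in this example (A1, A2, ...), and the second one produces labels in the style of “a)” and “b)”.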
We will also label each panel so that the first panel has “a)” and the
second panel has “b)”. To do so, we use the function sm_panel_label()
again, where we provide the indv_plots_label1 object as input and set
panel_tag = 'a' and panel_posttag = ')'. This creates a label in each
panel that consists of a lowercase letter followed by a closing
bracket. The final output is then saved in the indv_plots_label2
object, which is the end result of running sm_panel_label() twice.
Figure 5. A composite plot with two rows and five columns. Nine panels display each subject’s data in the Amblyopia group, while the last panel shows the legend, representing each condition with a unique color.
# Figure 5 - Add panel label in lowercase letters followed by a bracket
indv_plots_label2 <- sm_panel_label(
all_plots = indv_plots_label1, x = 0.15, y = 0.7,
panel_tag = "a", panel_posttag = ")",
text_color = "black", fontface = "bold"
)
Next, we sort the nine panels into a layout with five columns and two
rows (ncol = 5, nrow = 2) using the function sm_put_together(). We also
add the common title, common x-axis label and common y-axis label by
directly supplying character strings rather than using sm_common_*()
functions. This option is less flexible but more convenient: the size
of the text, but not its location, can still be adjusted using the
labelRatio argument, where 1 refers to the default size. The labelRatio
argument does not affect the size of text labels created from
sm_common_*() functions. sm_put_together() also supports combining
subplots with secondary x- and y-axes (not shown in the tutorial);
xlabel2 and ylabel2 should be provided to set the titles for these
axes.
# Figure 5 - Combine the subplots into one figure
plots_tgd2 <- sm_put_together(
all_plots = indv_plots_label2,
title = "Individual data (subplotting with one variable)",
xlabel = "Spatial frequency (c/deg)",
ylabel = "Visual deficit", ncol = 5, nrow = 2,
wmargin = -2, hmargin = -2, labelRatio = 0.9
)
Now that a composite figure has been created with individual subplots
and labels, we will add a common legend to the combined figure
plots_tgd2. There are two ways to do so using smplot2: a quick way, and
a slow but highly customizable way. They both involve the function
sm_add_legend(). To preview, readers can compare the legend in Figure 5
from the quick method with the legend in Figure 6 from the slow method.
The first method of adding a legend makes sm_add_legend() derive a
legend from a reference plot so that users do not have to make it
manually. To make the legend using the quick method, users should
provide inputs for the following arguments. The output from
sm_put_together() (plots_tgd2) must be supplied as input for the
argument combined_plot. The coordinate of the legend can be specified
using the x (horizontal coordinate of the legend) and y (vertical
coordinate of the legend) arguments. Also, a reference plot from which
the legend can be derived must be supplied for the argument sampleplot
(i.e., one plot from indv_plots). In this example, the coordinate is
set to be within the area of the empty 10th panel (x = 0.92, y = 0.35);
the sample plot is the first subject's plot (indv_plots[[1]]). The
direction argument (i.e., orientation) of the legend is specified to be
vertical, not horizontal. legend_spacing is an argument that sets the
extent of blank space within the legend to prevent overcrowding. If
border = FALSE, then the border of the legend will be removed. The font
size of the legend can be adjusted using the argument font_size. The
code below stores the output from sm_add_legend() in the object
plots_tgd2_legend, and then saves the figure as a vector file using the
ggsave() function with specified width and height.
# Figure 5 - Legend in the area of the 10th panel
plots_tgd2_legend <- sm_add_legend(
combined_plot = plots_tgd2, x = 0.92, y = 0.35,
sampleplot = indv_plots[[1]], direction = "vertical",
legend_spacing = 1, border = TRUE, font_size = 13
)
# Figure 5 - Save the composite figure as a vector file
ggsave("together2.pdf", plots_tgd2_legend,
width = 15, # inches
height = 6.6
)
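When choosing x and y for a legend that should sit inside an empty panel, a rough first guess can be computed from the layout. The sketch below is only an approximation under stated assumptions: panel_center() is a hypothetical helper (not part of smplot2), the figure is treated as an evenly divided grid filled row by row, and the space taken by the title and axis labels is ignored.

```r
# Sketch: approximate center of a panel in normalized figure coordinates (0 to 1).
# panel_center() is a hypothetical helper, not a function from smplot2.
# Assumptions: panels are filled row by row in an evenly divided grid, and
# the space taken by the common title and axis labels is ignored.
panel_center <- function(panel, ncol, nrow) {
  col_pos <- (panel - 1) %% ncol + 1
  row_pos <- (panel - 1) %/% ncol + 1
  c(
    x = (col_pos - 0.5) / ncol,
    y = 1 - (row_pos - 0.5) / nrow  # y increases from bottom to top
  )
}

# Empty 10th panel in the layout with five columns and two rows
guess <- panel_center(panel = 10, ncol = 5, nrow = 2)
guess
```

This guess (x = 0.9, y = 0.25) differs somewhat from the values used in the example (x = 0.92, y = 0.35), so it should be treated as a starting point that is then adjusted by eye.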
We can make two observations from this legend in Figure 5. First, the
legend's title matches the name of one of the columns (Condition) in
the data frame df2. Second, the labels within the legend are identical
to the character strings in the Condition column of df2. These
similarities indicate that the legend's title and labels have been
automatically generated from the given data frame. If the legend is
created with this quick approach (by making sm_add_legend() derive one
from a sample plot), the title and the labels cannot be customized,
although the title can be removed.
# Compute the average and standard error for each SF and Condition level
df2_amb_avg <- df2_amb %>%
group_by(logSF, Condition) %>%
summarise(
avgBP = mean(absBP),
stdErr = sm_stdErr(absBP), .groups = "drop"
)
head(df2_amb_avg)
## # A tibble: 6 × 4
## logSF Condition avgBP stdErr
## <dbl> <fct> <dbl> <dbl>
## 1 -1 One 0.0769 0.0182
## 2 -1 Two 0.283 0.0649
## 3 -1 Three 0.199 0.0720
## 4 0 One 0.234 0.151
## 5 0 Two 0.491 0.0705
## 6 0 Three 0.374 0.135
Since we have one empty panel that is available for plotting (10th
panel of Figure 5), we can add an additional panel, which shows the
average data of the nine subjects with error bars (ex. standard error).
This panel showing the average data should have the same x- and y-limits
as those of the individual subjects' panels. Next, we compute the
average and standard error of the data from the nine individuals for
each level of the independent variable (logSF) and experimental
condition (Condition), and store the resulting data frame in the object
df2_amb_avg. This initial step can be achieved using functions from the
dplyr package, such as group_by() and summarise(). group_by() does not
change the data frame at the surface level. Instead, it changes its
underlying structure so that the computations called later within
summarise() are done separately for each combination of levels of the
grouping variables. The two computations - mean and standard error -
are conducted using the functions mean() and sm_stdErr(), respectively.
The latter is a shortcut function from the smplot2 package. By default,
summarise() removes only the last grouping variable, so part of the
grouping would remain even after the computation has been performed; it
is therefore important to undo the grouping by setting .groups = 'drop'
in summarise(). More information about these functions can be found in
Chapter 7 of the documentation webpage
(https://smin95.github.io/dataviz).
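To illustrate what "computed separately for each group" means, the same kind of summary can be reproduced on a small toy data set with base R alone. This is a sketch under assumptions: the data values are made up, std_err() is a hypothetical stand-in that uses the conventional definition of the standard error (standard deviation divided by the square root of the sample size), which sm_stdErr() is assumed to follow, and tapply() is used here instead of the dplyr pipeline.

```r
# Sketch: group-wise mean and standard error on made-up data (base R only).
# std_err() is a hypothetical stand-in; it assumes the conventional
# definition sd(x) / sqrt(n), which sm_stdErr() is assumed to follow.
std_err <- function(x) sd(x) / sqrt(length(x))

toy <- data.frame(
  logSF = rep(c(-1, 0), each = 6),
  Condition = rep(rep(c("One", "Two"), each = 3), times = 2),
  absBP = c(1, 2, 3, 2, 4, 6, 1, 1, 4, 0, 3, 3)
)

# One value per combination of logSF (rows) and Condition (columns)
avg <- tapply(toy$absBP, list(toy$logSF, toy$Condition), mean)
se <- tapply(toy$absBP, list(toy$logSF, toy$Condition), std_err)

avg
se
```

Each cell of avg and se corresponds to one row of a summarised data frame such as df2_amb_avg, that is, one combination of logSF and Condition.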
# Figure 6 - 10th panel showing the average data
avg_plot <- ggplot(data = df2_amb_avg, aes(
x = logSF, y = avgBP, group = Condition,
shape = Condition, fill = Condition, color = Condition
)) +
geom_line(linewidth = 1) +
geom_point(size = 5, color = "white", stroke = 1) +
geom_linerange(aes(ymin = avgBP - stdErr, ymax = avgBP + stdErr), linewidth = 1) +
scale_color_manual(values = sm_palette(3)) +
scale_fill_manual(values = sm_palette(3)) +
scale_shape_manual(values = c(21, 22, 23)) +
sm_hgrid() +
scale_y_continuous(limits = c(0, 3)) +
scale_x_continuous(
limits = c(-1.3, 3.3),
labels = c(0.5, 1, 2, 4, 8)
) +
annotate("text", label = "Average", x = -0.3, y = 2.65, size = 5.5,
fontface='bold')
Figure 6. A composite plot with two rows and five columns with a common legend that is located at the bottom-right area of the figure. The first nine panels display each subject’s data from the Amblyopia group, while the last panel shows the average data with error bars, which represent standard errors.
With the newly created data frame df2_amb_avg, we can plot the average
data using the same mapping specifications as those in the individual
plots in the lapply() function. Average data are plotted as points with
white borders using the geom_point() function. The lines are drawn to
join the points with geom_line(), and the error bars without caps are
displayed using geom_linerange(), which is a useful function for
indicating intervals. The aesthetic mapping is defined in
geom_linerange() so that vertical lines can be plotted at each level of
logSF (x-axis); we explicitly specify the minimum
(ymin = avgBP - stdErr) and maximum (ymax = avgBP + stdErr) of the
vertical range so that each line spans one standard error below and
above the average. We do not use the lapply() function here because we
need to make only one plot.
# Figure 6 - Combine all the subplots into a composite plot
all_plots <- list(indv_plots_label1, avg_plot)
plots_tgd3 <- sm_put_together(
all_plots = all_plots,
title = "Individual data and average (subplotting with one variable)",
xlabel = "Spatial frequency (c/deg)",
ylabel = "Visual deficit", ncol = 5, nrow = 2,
wmargin = -4.5, hmargin = -4.5, labelRatio = 0.9
)
The limits of both x- and y-axes, as well as the thematic background
(i.e., sm_hgrid()), are set to be identical to those of the individual
plots. Also, we annotate the average plot with the bold text 'Average'
using annotate(), where we can specify its coordinate to be at the
top-left of the panel (x = -0.3, y = 2.65) in the units of the plotted
data (x = logSF, y = avgBP). The plot output is then saved in the
object avg_plot.
Then, we store all ten plots (nine individuals' plots in
indv_plots_label1 + one average plot in avg_plot) that we have
generated in one list using the function list() and then assign the
output to the object all_plots. The all_plots object will be the input
for sm_put_together(), which will create a composite plot using the
plots, title and axis labels with a layout of five columns and two rows
(ncol = 5 and nrow = 2).
Since we have ten panels to plot in a layout with five columns and two
rows, only a limited amount of space remains available for the legend
(see Figure 6 for our final output). So, to use the remaining plotting
space effectively, we will build and customize a legend using the
function sm_common_legend() rather than relying on the automatically
generated legend from sm_add_legend(). After creating a legend
manually, we can then add it to the composite plot using
sm_add_legend() at a specific location within the combined figure. This
option requires more work but it is more flexible.
To do so, we essentially need to create a new plot using the standard
procedure of ggplot2 (see the code below). This includes mapping the x
and y variables to certain aesthetics. Points are also drawn using
geom_point() so that they are included in the legend. The legend labels
have also been changed, as specified in the two scale_*() functions.
Finally, we finish creating the legend by using sm_common_legend(),
which essentially hides all features of a normal graph, such as points
and axis lines, that would be plotted otherwise. As a result, the
output legend2 only prints the legend components when it gets called.
We set the legend to have a horizontal orientation with no borders
(border = FALSE). The text size of the legend can also be adjusted
using the argument textRatio, which has been set to 1.1 in this
example; this means that the text size of the legend is 1.1 times the
default of the given theme. Lastly, legend_spacing controls the amount
of blank space in the legend.
# Figure 6 - Create a legend manually
legend2 <- ggplot(data = df2_amb, aes(
x = logSF, y = absBP, group = Condition,
shape = Condition, fill = Condition
)) +
geom_point(size = 4.5, color = "white") +
scale_fill_manual(
values = sm_palette(3),
labels = c(
"Condition 1 ", "Condition 2 ",
"Condition 3 "
)
) +
scale_shape_manual(
values = c(21, 22, 23),
labels = c(
"Condition 1 ", "Condition 2 ",
"Condition 3 "
)
) +
sm_common_legend(
title = FALSE, direction = "horizontal", border = FALSE,
textRatio = 1.1, legend_spacing = .9
)
The customized legend can be added to the composite plot with the
function sm_add_legend() at a specific coordinate (x = 0.84, y = 0.05;
bottom-right region of Figure 6). Since we have manually created the
legend with sm_common_legend(), there is no need for us to supply
inputs for other arguments in sm_add_legend(), such as direction,
border and sampleplot, all of which would be ignored. The final output,
a composite figure that shows both individual plots and a panel that
shows the average data (Figure 6), is saved using ggsave() from the
ggplot2 package.
# Figure 6 - Add the custom legend and save the figure as a vector file
plots_tgd3_legend <- sm_add_legend(
combined_plot = plots_tgd3, legend = legend2, x = 0.84,
y = 0.05
)
ggsave("together3.pdf", plots_tgd3_legend,
width = 15, # inches
height = 6.6
)
Readers might realize that they could also generate Figures 4-6 with
facet_wrap(). Indeed, when subplotting data using one variable, using
facet_wrap() might be simpler. However, the advantage of using
smplot2's pipeline with lapply() is that it remains very similar even
if more variables or lapply() functions are added (see the next two
examples).
Example 2: Subplotting Data Using Two Variables
Thus far, we have only explored a relatively simple way of assigning
data to each panel. In this example, we will allocate data to each
panel using two variables (Condition and Subject Group).
In this example, the same dataset (df2) will be used, albeit with some
data transformations. Average data at each level of condition and
subject group will be plotted. There are three experimental conditions
and two groups, totaling six combinations of levels from the two
variables. Therefore, the data will be allocated to six separate
panels.
df2_avg <- df2 %>%
mutate(logSF = log2(SF)) %>%
mutate(Condition = factor(Condition, levels = c("One", "Two", "Three"))) %>%
group_by(logSF, Condition, Group) %>%
summarise(
avgBP = mean(absBP),
stdErr = sm_stdErr(absBP), .groups = "drop"
)
head(df2_avg)
## # A tibble: 6 × 5
## logSF Condition Group avgBP stdErr
## <dbl> <fct> <chr> <dbl> <dbl>
## 1 -1 One Amblyopia 0.0769 0.0182
## 2 -1 One Normal 0.149 0.0491
## 3 -1 Two Amblyopia 0.283 0.0649
## 4 -1 Two Normal 0.287 0.0707
## 5 -1 Three Amblyopia 0.199 0.0720
## 6 -1 Three Normal 0.244 0.0868
To begin with, the original data frame df2 is transformed similarly as
in the previous example by creating another column for spatial
frequency in log-scale to achieve equal spacing (logSF column) and
re-ordering the levels of the Condition column into their proper,
numerical order ('One'-'Two'-'Three') by converting the column from
strings into a factor. Next, the code computes the average and standard
error for each combination of the two variables at each spatial
frequency. This is possible because the underlying structure of the
data frame is transformed using group_by() so that subsequent
computations for average and standard error on these data in
summarise() are performed according to the specified groupings. As in
Example 1, the mean is computed using mean() and the standard error is
computed using sm_stdErr().
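The two transformations described above can be checked in isolation with base R. The sketch below is a minimal illustration: the spatial frequency values are taken from the tick labels used in the plots (0.5 to 8 c/deg), while the short cond vector is made up for demonstration.

```r
# Sketch: why log2() gives equal spacing and why factor levels are set manually.
SF <- c(0.5, 1, 2, 4, 8)  # spatial frequencies shown as tick labels
logSF <- log2(SF)         # -1, 0, 1, 2, 3

spacing_raw <- diff(SF)     # unequal steps: 0.5, 1, 2, 4
spacing_log <- diff(logSF)  # equal steps: 1, 1, 1, 1

# Made-up condition labels stored as character strings
cond <- c("One", "Two", "Three", "Two")

# Without explicit levels, factor() sorts the levels alphabetically
default_levels <- levels(factor(cond))
# With explicit levels, the numerical order is preserved
ordered_levels <- levels(factor(cond, levels = c("One", "Two", "Three")))

default_levels
ordered_levels
```

Without the explicit levels argument, 'Three' would be placed before 'Two', which would change the order of the conditions in the legend and in any iteration that relies on the factor levels.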
In this example, there will be two levels of lapply() structure in the
code fragment because we will perform subplotting with two variables
(Group and Condition). Hence, the code structure will have one inner
function and one outer function. This is better known as a nested
structure, which involves using functions in a hierarchical fashion.
The outer function will iterate over the variable Group, and the inner
function over Condition. This structure of the nested functions will
affect the order in which the plots will be generated and stored in the
object avg_plots. Specifically, plots from the first level of Group and
all three levels of Condition will be generated first, followed by
those from the second level of Group.
With the structure of the nested lapply() functions in mind, we can
first create vectors that contain string elements that match the
identifiers in the Group and Condition columns of the df2_avg data
frame. These are then stored in the group_list and cond_list objects,
respectively. Each iteration of the nested functions will filter the
average data based on the element of group_list and cond_list that is
selected by its index, ex. Group == group_list[iGroup], where
iGroup = 1 and therefore selects Amblyopia.
# Figure 7 - Visualize each subplot
group_list <- c("Amblyopia", "Normal")
cond_list <- c("One", "Two", "Three")
shape_list <- c(21, 22, 23) # Shape for each condition
cList <- list(
c("#ddc7d8", "#d3a7c0", "#b7729a"), # Color for each subject group
c("#bababa", "#999999", "#636262")
)
avg_plots <- lapply(1:length(group_list), function(iGroup) {
lapply(1:length(cond_list), function(iCond) {
# First part: Filter average data for each group & condition during each iteration
currData <- df2_avg %>%
filter(Condition == cond_list[iCond]) %>%
filter(Group == group_list[iGroup])
# Second part: Plot the filtered average data
pp <- ggplot(data = currData, aes(x = logSF, y = avgBP)) +
geom_area(fill = cList[[iGroup]][[iCond]], alpha = 0.3) +
geom_line(linewidth = 1, color = cList[[iGroup]][[iCond]]) +
geom_point(
size = 5, shape = shape_list[[iCond]], color = "white",
fill = cList[[iGroup]][[iCond]], stroke = 1
) +
geom_linerange(aes(ymin = avgBP - stdErr, ymax = avgBP + stdErr),
linewidth = 1, color = cList[[iGroup]][[iCond]]
) +
scale_y_continuous(limits = c(0, 1.6)) +
scale_x_continuous(
limits = c(-1.3, 3.3),
labels = c(0.5, 1, 2, 4, 8)
) # pp is the intermediate plot output
# Third part (optional): Apply different themes based on subject grouping
if (group_list[iGroup] == "Amblyopia") {
pp + sm_minimal() # No grids for Amblyopia
} else {
pp + sm_hgrid() # Horizontal grids for Normal (control group)
}
})
})
In addition, shapes are set to be unique for each of the three
experimental conditions; their values are stored in the shape_list
vector, and each value will get selected during the iteration for each
condition to specify the shape when plotting (ex.
shape = shape_list[[iCond]]). The color palettes for the two subject
groups are set to be different, and the intensity of the color is set
to increase as a function of Condition. The color values (in hex codes)
are stored in the list cList, which contains six different colors that
have been separated into two vectors (one for each Group). Therefore,
if the iteration has an index for the first level of Group and the
second level of Condition, the corresponding color will be
cList[[1]][[2]], where cList[[1]] contains three colors from a pink
palette in increasing intensity. Here, the first level of Group is
Amblyopia because the first element of group_list is Amblyopia. Lastly,
using if conditional statements, we only allow subplots of the Normal
group's data to have horizontal grids but not those of Amblyopia. This
is possible because the intermediate plotting output is stored as pp in
the second part of the lapply() function. Thematic functions are then
added modularly to pp in the third part of the lapply() function,
creating a final output. The third part is optional for subplotting,
but it can be useful to set specific customizations. Integrating the
programmatic approach for plotting allows us to dynamically control
aesthetics, such as color, shape and theme (Figure 7), which is very
difficult to do in ggplot2 unless users code plots separately.
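The index-based selection of aesthetics can be traced without drawing any plot. The base R sketch below tabulates, for each of the six iterations, which color, shape and theme would be selected; the lookup data frame and the theme column (stored as plain text) are illustrative constructs and not part of the plotting code.

```r
# Sketch: trace which aesthetics each iteration of the nested lapply() selects.
group_list <- c("Amblyopia", "Normal")
cond_list <- c("One", "Two", "Three")
shape_list <- c(21, 22, 23)
cList <- list(
  c("#ddc7d8", "#d3a7c0", "#b7729a"), # Amblyopia
  c("#bababa", "#999999", "#636262")  # Normal
)

# expand.grid() varies its first column fastest, which mirrors the inner
# (Condition) loop running within each step of the outer (Group) loop
lookup <- expand.grid(
  iCond = seq_along(cond_list),
  iGroup = seq_along(group_list)
)
lookup$group <- group_list[lookup$iGroup]
lookup$condition <- cond_list[lookup$iCond]
lookup$color <- mapply(
  function(g, k) cList[[g]][[k]],
  lookup$iGroup, lookup$iCond
)
lookup$shape <- shape_list[lookup$iCond]
# Theme names are stored as text only to show the if/else branch taken
lookup$theme <- ifelse(lookup$group == "Amblyopia", "sm_minimal", "sm_hgrid")

lookup
```

The second row of lookup corresponds to cList[[1]][[2]], that is, the first level of Group and the second level of Condition.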
As in Example 1, the lapply() function must contain two parts. The
first part filters for the data of interest, which are average data at
each group and condition. The filtered data are stored in the object
currData. Then, the second part of the function plots the data from
currData. Specifically, the average data are plotted as points using
geom_point(), whereas the associated standard error values are drawn as
vertical lines using geom_linerange() (as explained in Example 1).
Here, we use an additional function from the ggplot2 package:
geom_area(), which plots areas (i.e., filled line plots). The function
essentially fills the area below the lines of the plot with colors.
Coloring the area is useful to illustrate the magnitude of the data. In
this example, we make it transparent to some extent by setting
alpha = 0.3; if alpha = 1, the colored area will be opaque.
Furthermore, the x- and y-axes limits are set to be identical for all
panels using scale_x_continuous() and scale_y_continuous() functions
because the panels will get combined into one composite figure with
shared tick labels. Finally, in the optional third part, the theme is
set to sm_minimal() for the Amblyopia group and to sm_hgrid() for the
Normal group.
Figure 7. A composite plot with two rows and three columns, showing the average data from each condition and group with error bars (standard error). The first row shows data of the Amblyopia group, whereas the second row shows the data of the Normal group, as specified in the lapply() function. The main and secondary titles have been added as annotations.
Notice that in this example, the object avg_plots is a list of lists,
where each element is a list containing three plots. So, it has a
length of 2 even though it stores six plots in total. However, the
function sm_put_together() still recognizes it as a list with six
elements (i.e., plots) because the function automatically flattens each
element if the element is a list itself. So, there is no need for us to
manually reorganize the structure of the object avg_plots for the
function sm_put_together() to operate. It is important to be aware that
the order of the plots that will be used in the composite figure from
sm_put_together() is: avg_plots[[1]][[1]], avg_plots[[1]][[2]],
avg_plots[[1]][[3]], avg_plots[[2]][[1]], avg_plots[[2]][[2]], and
avg_plots[[2]][[3]]. So, if we are to subplot these panels in a figure
with two rows and three columns, then the three plots from
avg_plots[[1]] will be on the first row, whereas the three plots from
avg_plots[[2]] will be on the second row.
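The structure of such a nested list, and the order obtained after flattening it by one level, can be inspected with base R. In the sketch below, placeholder strings stand in for the six ggplot2 objects, and unlist(recursive = FALSE) is used only to illustrate one-level flattening; how sm_put_together() flattens its input internally is not shown here and may differ.

```r
# Sketch: a list of lists and its flattened order (placeholders, not plots).
avg_plots_demo <- list(
  list("Amblyopia-One", "Amblyopia-Two", "Amblyopia-Three"),
  list("Normal-One", "Normal-Two", "Normal-Three")
)

n_outer <- length(avg_plots_demo)  # 2: one element per subject group

# Flatten by one level so that every plot becomes a top-level element
flat <- unlist(avg_plots_demo, recursive = FALSE)
n_flat <- length(flat)             # 6: one element per plot

# Arrange the flattened order row by row in a layout with 2 rows and 3 columns
layout_demo <- matrix(unlist(flat), nrow = 2, ncol = 3, byrow = TRUE)
layout_demo
```

The first row of layout_demo holds the three Amblyopia placeholders and the second row holds the three Normal placeholders, which matches the arrangement described for Figure 7.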
Next, we set the y-axis label of the combined figure as in Example 1 by
directly providing a character string in sm_put_together(). However,
for the xlabel, we use the output created from sm_common_xlabel(),
demonstrating that both options of labelling the axes can work in
concert. Here, the argument wRatio controls the ratio of the width of
the leftmost column to that of the other columns. The value exceeds 1
because the panels in the leftmost column have y-axis ticks, which take
up additional plotting space. If an input for this argument is missing,
the function by default adjusts the width ratio using information about
the composite plot, such as the number of lines and characters in the
tick labels. The argument ylabel2 has an input of an empty string
because, if the input is supplied in any form (even when it is empty),
some space will be spared on the right side of the composite plot (area
for labels of the secondary y-axis), where we will add labels of the
two subject groups. As previously noted, labelRatio only affects axis
labels that are created directly from sm_put_together(), so it will
adjust the size of ylabel but not xlabel.
# Figure 7 - Combine the subplots and specify the layout
xlabel <- sm_common_xlabel("Spatial frequency (c/deg)", x = 0.52)
avg_plots_tgd <- sm_put_together(
all_plots = avg_plots,
title = "", # Spare space for title
xlabel = xlabel,
ylabel = "Visual deficit",
ylabel2 = "", # Spare space for group label
ncol = 3, nrow = 2, wRatio = 1.1, wmargin = -2, hmargin = -2,
labelRatio = 0.95 # Text size of the ylabel
)
In this example, notice that we also did not supply a non-empty
character string for the main title (title argument) of the combined
plot in sm_put_together(). By putting an empty string, we merely
allocated some space for the title at the top of the figure, where we
will add text annotations using sm_add_text(). We can set the
coordinate of the title to be near the center along the x-axis and at
the top along the y-axis (x = 0.53, y = 0.98, where 0.5 represents the
center of the combined figure) and its fontface to be bold. The text
annotation itself can be defined using the label argument within
sm_add_text().
As for the group labels, we can also use sm_add_text() to denote the
two subject groups by setting the orientation of the text at 270
degrees relative to the horizontal axis using the angle argument. We
position them on the right side of the composite figure by setting
x = 0.93.
Next, because we have assigned data to multiple panels using two
variables (groups and conditions), it leaves us with one more variable
(Condition) to label in the composite plot. Here, we can add a
sub-title at the top of each column where we label each condition (as
shown in Figure 7) using sm_add_text(). When using sm_add_*()
functions, the coordinate system is uniform regardless of the size of
the composite plot output that is generated from sm_put_together() or
sm_add_legend() (it runs from 0 to 1; x = 0.5, y = 0.5 is the center);
essentially, the annotations can be added to the composite figure
similarly to how geom objects can be added together to form a ggplot2
object with a common coordinate system.
# Figure 7 - Add text annotations
avg_plots_tgd1 <-
avg_plots_tgd + # Composite plot
sm_add_text(
label = "Average data (subplotting with two variables)", # Main title
x = 0.53, y = 0.98, fontface = "bold", size = 17
) +
sm_add_text(label = "Condition 1", x = 0.25, y = 0.92, size = 14) + # Sub-title for Column 1
sm_add_text(label = "Condition 2", x = 0.51, y = 0.92, size = 14) + # Sub-title for Column 2
sm_add_text(label = "Condition 3", x = .78, y = .92, size = 14) + # Sub-title for Column 3
sm_add_text(label = "Control", x = 0.93, y = 0.33, angle = 270, size = 15) + # Group label
sm_add_text(label = "Amblyopia", x = 0.93, y = 0.71, angle = 270, size = 15) # Group label
The final figure is stored in the object avg_plots_tgd1, which then
gets saved as an image using the ggsave() function from the ggplot2
package.
Example 3: Complex Subplotting Using Separate lapply() Functions
In this example, using the data frames df2_amb and df2_avg from the
previous examples, we will create a composite figure that plots the
data of individuals in the Amblyopia group in a slightly more complex
way that is not currently possible with ggplot2 or its third-party
packages that enhance its faceting functions. This time, we will
allocate the data from each condition of each individual to a unique
panel, as well as plot the average data for each condition on a unique
panel.
# Figure 8 - Generate three subplots for each subject
subj_list <- paste0("A", 1:9) # 9 subjects
cond_list <- c("One", "Two", "Three")
shape_list <- c(21, 22, 23)
cond_cList <- c("#ddc7d8", "#d3a7c0", "#b7729a")
indv_plots <- lapply(1:length(subj_list), function(iSubj) {
  lapply(1:length(cond_list), function(iCond) {
    # First part: Filter data for each subject & condition during each iteration
    subj_data <- df2_amb %>%
      filter(Subject == subj_list[iSubj]) %>%
      filter(Condition == cond_list[iCond])
    # Second part: Plot the filtered data
    ggplot(data = subj_data, aes(x = logSF, y = absBP)) +
      geom_area(fill = cond_cList[[iCond]], alpha = 0.3) +
      geom_line(linewidth = 1, color = cond_cList[[iCond]]) +
      geom_point(
        size = 5, shape = shape_list[[iCond]],
        color = "transparent", fill = cond_cList[[iCond]]
      ) +
      sm_hgrid() +
      scale_y_continuous(limits = c(0, 3)) +
      scale_x_continuous(
        limits = c(-1.3, 3.3),
        labels = c(0.5, 1, 2, 4, 8)
      ) +
      annotate("text",
        label = paste0("A", iSubj), x = -0.9, y = 2.65, size = 5.5,
        hjust = 0
      )
  })
})
As the title of this example implies, we will create two separate
lapply() functions to build a composite figure. With the data frame
df2_amb, the first function will iterate through each subject’s data at
each condition, creating 27 plots (9 subjects x 3 conditions). With the
data frame df2_avg, another lapply() function will iterate through the
average data at each condition, generating 3 plots (3 conditions). These
outputs will then be combined and stored in a single object, which will
be passed to the layout function sm_put_together() to create a composite
plot of 30 subplots, all of which will have the same x- and y-axis
limits.
To generate a plot for each subject at each condition using df2_amb, a
nested lapply() structure should be used, with one inner function and
one outer function (as described in Example 2). Data can be filtered
during each iteration in the same way as in Example 2. The structure of
the nested functions determines the order in which the figure outputs
are created and stored in the output (i.e., indv_plots). In this
example, the first three outputs are plots that use data from the first
element of subj_list and all three elements of cond_list because the
inner lapply() function iterates through the latter. Aesthetics can also
be controlled dynamically using the programmatic approach. For example,
a different shape and color can be set to represent each condition, as
specified by the order of the elements in shape_list and cond_cList. The
figures that are generated from this lapply() function are stored in the
indv_plots object.
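The storage order described above can be checked without any data or
plotting. The following is a minimal sketch in base R that returns text
labels instead of plots; the object names label_list and flat_labels are
hypothetical, and only three subjects are used to keep the output short.

```r
# Toy stand-ins: text labels take the place of ggplot objects
subj_list <- paste0("A", 1:3) # 3 subjects only, for brevity
cond_list <- c("One", "Two", "Three")

# Outer function loops over subjects, inner function loops over conditions
label_list <- lapply(1:length(subj_list), function(iSubj) {
  lapply(1:length(cond_list), function(iCond) {
    paste(subj_list[iSubj], cond_list[iCond], sep = "-")
  })
})

# Flatten the nested list to inspect the order of the stored outputs
flat_labels <- unlist(label_list)
print(flat_labels[1:3]) # The first three labels all belong to subject A1
```

Because the conditions are iterated by the inner function, all three
conditions of the first subject are stored before any output of the
second subject. Swapping the inner and outer loops would instead store
the first condition of every subject first.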
In Example 1, we annotated each panel with the subject’s identifier
using the function sm_panel_label() because we had forgotten to include
code that adds a panel label inside the nested lapply() code fragment
(Figures 5 & 6). Here, we use an alternative method: the annotation
label on each panel is defined within the lapply() code fragment that
generates the individual panels in sequence. Specifically, this can be
done using the function annotate() from the ggplot2 package. We specify
its annotation type as text in the first input, and the label as each
subject’s identifier by concatenating the string A with the index of the
subject during each iteration (iSubj) using the function paste0(). The
coordinates x and y are set in the units of the plotted data so that the
label annotations are at the top-left of each panel. The argument hjust
aligns the text to the left because we have set it to 0. If hjust is set
to 1, the text label will be aligned to the right. After writing the
code, readers can check whether the indv_plots object correctly stores
each subject’s plot at each condition with the panel label of each
subject’s identifier (e.g., A1 in the first plot of indv_plots).
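The effect of hjust can be seen in isolation with a small, self-contained
sketch. The data frame toy_df and its values are hypothetical, and iSubj
is set by hand to mimic the first iteration of the outer lapply()
function; only ggplot2 is required.

```r
library(ggplot2)

# Hypothetical toy data on the same axis ranges as the example
toy_df <- data.frame(logSF = -1:3, absBP = c(0.5, 1.0, 1.6, 2.1, 2.4))
iSubj <- 1 # Index that lapply() would supply during the first iteration

p <- ggplot(data = toy_df, aes(x = logSF, y = absBP)) +
  geom_point(size = 3) +
  scale_y_continuous(limits = c(0, 3)) +
  scale_x_continuous(limits = c(-1.3, 3.3)) +
  # hjust = 0: the label starts at x = -0.9 and extends to the right
  annotate("text",
    label = paste0("A", iSubj), x = -0.9, y = 2.65, size = 5.5,
    hjust = 0
  ) +
  # hjust = 1: the label ends at x = 3.2 and extends to the left
  annotate("text",
    label = "right-aligned", x = 3.2, y = 2.65, size = 5.5,
    hjust = 1
  )

# print(p) draws the panel with both labels
```

In the actual example, a single panel can be inspected in the same way
by indexing the nested list, for instance indv_plots[[1]][[1]] for the
first subject at the first condition.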
# Figure 8 - Generate a subplot for each condition's average across subjects
avg_plots_amblyopia <- lapply(1:length(cond_list), function(iCond) {
  # First part: Filter average data for each condition
  currData <- df2_avg %>%
    filter(Group == "Amblyopia") %>%
    filter(Condition == cond_list[iCond])
  # Second part: plot the filtered data
  ggplot(data = currData, aes(x = logSF, y = avgBP)) +
    geom_area(fill = cond_cList[[iCond]], alpha = 0.25) +
    geom_line(linewidth = 1, color = cond_cList[[iCond]]) +
    geom_point(
      size = 5, shape = shape_list[[iCond]], color = "white",
      fill = cond_cList[[iCond]], stroke = 1
    ) +
    geom_linerange(aes(ymin = avgBP - stdErr, ymax = avgBP + stdErr),
      linewidth = 1, color = cond_cList[[iCond]]
    ) +
    sm_hgrid() +
    scale_y_continuous(limits = c(0, 3)) +
    scale_x_continuous(
      limits = c(-1.3, 3.3),
      labels = c(0.5, 1, 2, 4, 8)
    ) +
    annotate("text",
      label = "Average", x = -0.9, y = 2.65, size = 5.5,
      hjust = 0, fontface = "bold"
    )
})
Next, we write code that generates plots using the average data (from
df2_avg) at each condition. This requires a single lapply() structure
that loops through each condition, so there will be three iterations in
total. Data can be filtered in the same way as in the previous examples.
For the average panels, we set the aesthetics so that the points have
white border lines, thereby accentuating the error bars. We also add
annotations with the bolded text 'Average'. The order of the figure
outputs follows the order of the elements in cond_list. The three output
figures are stored in the object avg_plots_amblyopia.
In the lapply() function, as in Example 2, the average data are plotted
as points using geom_point(), the lines that connect the points are
drawn using geom_line(), the areas below the lines are filled with
semi-transparent color using geom_area(), and the range of the standard
error across subjects is displayed as vertical lines using
geom_linerange(). The ticks and limits of both the x- and y-axes are set
to be consistent across panels.
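As a reminder of what the vertical lines represent, the sketch below
computes a mean and a standard error per spatial frequency and derives
the ymin and ymax values that geom_linerange() receives. It assumes that
the standard error is the standard deviation divided by the square root
of the number of subjects; the data frame toy_df, its values and the
helper std_err() are hypothetical and are not taken from df2_avg.

```r
# Hypothetical toy data: 3 subjects measured at two spatial frequencies
toy_df <- data.frame(
  Subject = rep(paste0("A", 1:3), times = 2),
  logSF = rep(c(0, 1), each = 3),
  absBP = c(1.0, 1.2, 1.4, 2.0, 2.5, 3.0)
)

# Assumed definition: standard deviation divided by sqrt(sample size)
std_err <- function(x) sd(x) / sqrt(length(x))

# Mean and standard error across subjects at each spatial frequency
avgBP <- tapply(toy_df$absBP, toy_df$logSF, mean)
stdErr <- tapply(toy_df$absBP, toy_df$logSF, std_err)

toy_avg <- data.frame(
  logSF = as.numeric(names(avgBP)),
  avgBP = as.numeric(avgBP),
  stdErr = as.numeric(stdErr)
)

# Lower and upper ends of the vertical line at each point
toy_avg$ymin <- toy_avg$avgBP - toy_avg$stdErr
toy_avg$ymax <- toy_avg$avgBP + toy_avg$stdErr
print(toy_avg)
```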
# Figure 8 - Put together all subplots from individual and average data
all_plots1 <- list(indv_plots, avg_plots_amblyopia) # Combine all plot outputs in a list
composite_plot <- sm_put_together(
  all_plots = all_plots1,
  title = "Individual and average data (two separate functions)",
  xlabel = "Spatial frequency (c/deg)",
  ylabel = "Visual deficit",
  ncol = 6, nrow = 5, wmargin = -5, hmargin = -5,
  labelRatio = 0.95 # Text size of the axes' label
)
We then combine the two objects (indv_plots and avg_plots_amblyopia)
from the two lapply() structures into a single object (all_plots1) using
the function list(). The object all_plots1 will then be used as input
for sm_put_together(). Notice that since the indv_plots list has been
generated from a nested lapply() structure, each of the nine elements in
the list contains three plots (hence, 27 plots total). Conversely,
avg_plots_amblyopia is from a single lapply() function, so there are
three elements in the list, and each element stores one plot (hence,
three plots total). In other words, these two lists have different
underlying structures. However, this is not an issue because
sm_put_together() will automatically flatten lists with different
structures (e.g., a list of lists) into a uniform list structure,
thereby making the function easier to use when multiple, separate
lapply() structures have generated numerous subplots.
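To make the difference between the two list structures concrete, the
following base R sketch builds lists with the same shapes as indv_plots
and avg_plots_amblyopia, using text labels in place of plots, and then
flattens them. The helper flatten_list() is hypothetical and is written
only for illustration; it is not the code that sm_put_together() uses
internally.

```r
# Text labels stand in for plots: 9 elements, each a list of 3
nested_list <- lapply(1:9, function(iSubj) {
  lapply(1:3, function(iCond) paste0("A", iSubj, "_cond", iCond))
})
# 3 elements, each a single label
single_list <- lapply(1:3, function(iCond) paste0("Avg_cond", iCond))

all_items <- list(nested_list, single_list)

# Hypothetical helper: unpack lists recursively until only leaves remain.
# Note: with real plots, a plot object would have to be treated as a leaf
# (e.g., by checking its class first), since a plot can itself be a list.
flatten_list <- function(x) {
  if (!is.list(x)) {
    return(list(x))
  }
  do.call(c, lapply(x, flatten_list))
}

flat_items <- flatten_list(all_items)
print(length(flat_items)) # 27 individual labels + 3 average labels
```

The flattened list keeps the original order: the 27 individual entries
come first, followed by the three average entries, which is why the
average panels occupy the last three positions of the 6 x 5 layout.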