Chapter 4 Manual and Examples of smplot

This chapter is a manual for smplot and includes numerous examples. It also includes tutorials on sm_bar(), sm_bland_altman() and sm_raincloud(), none of which are mentioned in the preceding chapters. However, this chapter does not describe sm_effsize(), sm_power() and sm_common_axis(); these functions are described in Chapters 5-7.

  • If you are not sure about any of the functions, type ? before the function name, e.g., ?sm_bar.

4.1 Installation of the Package

  • The smplot package is NOT available on CRAN yet, so for now you will need to install it directly from my GitHub repository.
  • To install it, please type in the R console:
install.packages('devtools')
devtools::install_github('smin95/smplot')
  • To use the package, load it:
library(smplot)

What is smplot?

  • smplot is a package that provides functions that visually improve graphs produced from ggplot2.
    • So it does not work with plots made from base R.
  • It was first developed in May 2021.
  • It is free and open source (https://github.com/smin95/smplot).

4.2 Colors and Graph Themes of smplot

smplot’s color palette

  • Its color palette can be accessed via two functions: sm_color() and sm_palette().
  • sm_color() accepts the character string of the color name.
  • sm_palette() accepts the number of colors (up to 20) and returns the hex codes accordingly.

Figure 4.1: smplot’s color palette

  • For example, if you want blue and red, just type the input like this:
sm_color('blue','red')
## [1] "#1262b3" "#cc3d3d"
  • However, do not pass a single vector that contains two character strings. If you do, sm_color() will only return the first color.
sm_color(c('blue','red'))
## Warning in if (color == "blue") return("#1262b3"): the condition has length > 1 and
## only the first element will be used
## [1] "#1262b3"
  • If you need 5 colors, you can use sm_palette().
sm_palette(5)
## [1] "#cc1489" "#1262b3" "#5b4080" "#e57717" "#0f993d"

smplot’s graph themes

  • There are several graph themes. The text positions and the font are all similar.
  • sm_corr_theme() and sm_hvgrid() are equivalent. They have major horizontal and vertical grids.
  • sm_hgrid() has major horizontal grids. This is the default theme of sm_raincloud(), which creates a raincloud plot.
  • sm_minimal() has no major grid. This is useful when a graph has a lot of annotation, such as texts and arrows.
  • sm_slope_theme() is a theme for a slope chart. It removes everything except the y-axis. This is the default theme of sm_slope().
  • sm_classic() has no grid. This is useful for all types of plots, such as bar graph, correlation plot and Bland-Altman plot. It has a typical x-axis and y-axis without panel borders. The Bland-Altman plot created using smplot uses sm_classic() automatically.
  • sm_vgrid() has major vertical grids.
  • All of these functions, except for sm_slope_theme() and sm_minimal(), have two arguments: borders and legends. The two exceptions only have the argument legends.
    • For some of these functions, the defaults are set to borders = TRUE and legends = TRUE.
    • You can check the defaults by typing ? in front of the function. Ex: ?sm_corr_theme
  • Let’s explore these themes in depth. First, let’s create p1, which has the default theme of ggplot2.
library(tidyverse)
p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) + 
  geom_point(size = 2)
p1

  • Now we can change the theme using sm_corr_theme().
p1 + sm_corr_theme()

  • We can also remove borders and legends by setting them as FALSE.
p2 <- p1 + sm_corr_theme(borders = FALSE, legends = FALSE)
p2

  • You can also apply smplot’s colors by using scale_color_manual().
p2 + scale_color_manual(values = sm_palette(7))

  • We can also apply sm_hgrid().
p1 + sm_hgrid()

  • Let’s try sm_vgrid().
p1 + sm_vgrid()

  • You can remove the legend as shown below.
p1 + sm_vgrid(legends = FALSE)

  • sm_classic() is one of my favourites!
p1 + sm_classic()

  • You can add the legend as shown below.
p1 + sm_classic(legends = TRUE)

  • Another choice is sm_minimal().
p1 + sm_minimal()

  • You can choose to include the legend in sm_minimal().
p1 + sm_minimal(legends = TRUE)

4.3 Correlation Plot

  • sm_corr_theme() and sm_statCorr() can be used as a pair when plotting a correlation.

  • This is the plot using the default theme of ggplot2.

p1 <- ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
  geom_point(shape = 21, fill = sm_color('green'), color = 'white', size = 3) 
p1

  • The next plot uses sm_corr_theme() to apply smplot’s theme and sm_statCorr() to print the linear regression slope and the statistical results from a paired correlation test (Pearson’s).
  • Important: sm_statCorr() recognizes the data for the y- and x-axes from the mapping = aes() in ggplot() function.
    • There is no mapping argument in sm_statCorr().
p1 + sm_corr_theme() + 
  sm_statCorr(color = sm_color('green'))
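
To check the printed values by hand, the sketch below computes comparable quantities with base R's lm() and cor.test(). It is only an illustration: it assumes that sm_statCorr() reports a least-squares slope together with Pearson's r and its p-value, since the function's internals are not shown in this chapter. The variable names (fit, slope, r_value, p_value) are chosen here for illustration.

```r
# Base-R sketch (not smplot's internal code): regression slope and
# Pearson correlation for the same variables plotted above.
fit <- lm(mpg ~ drat, data = mtcars)        # least-squares regression of mpg on drat
slope <- unname(coef(fit)["drat"])          # slope of the regression line

res_cor <- cor.test(mtcars$drat, mtcars$mpg, method = "pearson")
r_value <- unname(res_cor$estimate)         # Pearson's r
p_value <- res_cor$p.value                  # p-value of the correlation test

round(c(slope = slope, r = r_value, p = p_value), 4)
```

For a simple linear regression, the slope equals r multiplied by sd(y) / sd(x), which gives a quick consistency check between the two numbers.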

  • You can also change the line_type to 'solid' in sm_statCorr(), and change the location of the printed texts by using label_x and label_y arguments.
  • You can also change the font size of the printed texts by setting text_size to a larger numerical value.
p1 + sm_corr_theme() + 
  sm_statCorr(color = sm_color('green'),
               line_type = 'solid',
               label_x = 3.7,
               label_y = 11,
               text_size = 5)

4.3.1 Data frame for a correlation plot

  • Column 1 has to be the data for x-axis.

  • Column 2 has to be the data for y-axis.

  • This structure of the data frame is slightly different from the one that is typically used in ggplot2 and smplot functions (ex. sm_boxplot(), sm_bar(), sm_violin() and sm_raincloud()).

  • A correlation plot and a bar plot require different data frame structures.

# Example
set.seed(11) # generate random data
method1 = c(rnorm(19,0,1),2.5)
method2 = c(rnorm(19,0,1),2.5)
Subject <- rep(paste0('S',seq(1:20)), 2)
Data <- data.frame(Value = matrix(c(method1,method2),ncol=1))
Method <- rep(c('Method 1', 'Method 2'), each = length(method1))
df_general <- cbind(Subject, Data, Method) # used for sm_bar(), sm_boxplot(), sm_violin(), etc

df_corr <- data.frame(first = method1, second = method2) # used for correlation
  • We have created two data frames: df_general and df_corr. Let’s take a look at their structures.
head(df_general)
##   Subject       Value   Method
## 1      S1 -0.59103110 Method 1
## 2      S2  0.02659437 Method 1
## 3      S3 -1.51655310 Method 1
## 4      S4 -1.36265335 Method 1
## 5      S5  1.17848916 Method 1
## 6      S6 -0.93415132 Method 1
  • Notice that df_general has three columns. The first column is the subject, the second column is the data (i.e., Value) and the third column is the measurement group.
head(df_corr)
##         first      second
## 1 -0.59103110 -0.65571812
## 2  0.02659437 -0.68251762
## 3 -1.51655310 -0.01585819
## 4 -1.36265335 -0.44260479
## 5  1.17848916  0.35255750
## 6 -0.93415132  0.07317058
  • Notice that df_corr has two columns, each of which represents a measurement group.
# correlation plot using data frame 'df_corr' 
ggplot(data = df_corr, mapping = aes(x = first, y = second)) + 
  geom_point(shape = 21, fill = sm_color('crimson'), color = 'white',
             size = 3) + sm_corr_theme(borders = FALSE) +
  scale_y_continuous(limits = c(-2.5,2.5)) +
  scale_x_continuous(limits = c(-2.5,2.5)) +
  sm_statCorr(color = sm_color('crimson'), corr_method = 'pearson',
              label_x = -2.2, label_y = 2.3) +
  ggtitle('Correlation plot') + 
  xlab('Method 1') + ylab('Method 2') 

# bar graph using data frame 'df_general'
ggplot(data = df_general, mapping = aes(x = Method, y = Value, fill = Method)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('crimson','green'))
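
If your data already sit in the two-column (correlation) layout, they can be rearranged into the three-column layout with base R alone. The sketch below is one possible way to do it; df_wide and df_long are hypothetical names used only in this example, and the data are regenerated so that the block runs on its own.

```r
# Regenerate the two measurement vectors used in the example above.
set.seed(11)
method1 <- c(rnorm(19, 0, 1), 2.5)
method2 <- c(rnorm(19, 0, 1), 2.5)

# Two-column layout: one column per measurement group (for correlation plots).
df_wide <- data.frame(first = method1, second = method2)

# Three-column layout: stack both columns into one Value column and
# record the subject and the group that each value belongs to.
df_long <- data.frame(
  Subject = rep(paste0("S", seq_len(nrow(df_wide))), times = 2),
  Value   = c(df_wide$first, df_wide$second),
  Method  = rep(c("Method 1", "Method 2"), each = nrow(df_wide))
)

str(df_long)
```

The tidyverse offers equivalent tools for this reshaping; base R is used here only to keep the sketch free of extra dependencies.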

4.3.2 Correlation plot with both regression and reference lines

  • You can also add a reference line (slope = 1) in a correlation plot.
  • This can be done with geom_abline(). In this example, the reference line’s slope is set to 1 and it has a dashed line style.
# correlation plot using data frame 'df_corr' 
ggplot(data = df_corr, mapping = aes(x = first, y = second)) + 
  geom_point(shape = 21, fill = sm_color('crimson'), color = 'white',
             size = 3) + sm_corr_theme(borders = FALSE) +
  geom_abline(slope = 1, linetype = 'dashed') +
  scale_y_continuous(limits = c(-2.8,2.8),  expand = c(0,0)) +
  scale_x_continuous(limits = c(-2.8,2.8), expand = c(0,0)) +
  sm_statCorr(color = sm_color('crimson'), corr_method = 'pearson',
              label_x = -2.2, label_y = 2.3) +
  ggtitle('Correlation plot') + 
  xlab('Method 1') + ylab('Method 2') 

4.4 Boxplot - sm_boxplot()

  • sm_boxplot() generates a boxplot and individual points at the same time.
  • It automatically uses sm_hgrid() as its default theme.
  • It also has arguments borders and legends, both of which have been set to FALSE as defaults.
  • First, let’s generate some random data.
set.seed(1) # generate random data
day1 = rnorm(16,0,1)
day2 = rnorm(16,5,1)
Subject <- rep(paste0('S',seq(1:16)), 2)
Data <- data.frame(Value = matrix(c(day1,day2),ncol=1))
Day <- rep(c('Day 1', 'Day 2'), each = length(day1))
df <- cbind(Subject, Data, Day)
  • Now, let’s make a boxplot using sm_boxplot().
# a boxplot with the random data, all black points
ggplot(data = df, mapping = aes(x = Day, y = Value)) +
  sm_boxplot(fill = 'black')

  • Now let’s apply a different color to each Day.
# a boxplot with different colored points
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_boxplot(shape = 21, color = 'white') +
  scale_fill_manual(values = sm_color('blue','orange'))

  • You can change the shape of the boxplot by setting notch = TRUE. You can also change the size of the individual points using the point_size argument.
  • A notched boxplot shows the confidence interval around the median (+/- 1.58 * interquartile range / sqrt(n)).
    • The notches are used for group comparison.
    • If the notches of two boxes do not overlap, there is strong evidence that the medians of the two groups differ.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_boxplot(shape = 21, point_size = 4, notch = TRUE, alpha = 0.5) +
  scale_fill_manual(values = sm_color('blue','orange'))
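
The notch formula quoted above can be evaluated directly. The following base-R sketch applies it to the Day 1 values; the variable names (med, half_width, notch) are illustrative, and the interquartile range is computed with IQR(), which is an assumption about how the quartiles are defined.

```r
# Recreate the Day 1 values from the random data above.
set.seed(1)
day1 <- rnorm(16, 0, 1)

n <- length(day1)
med <- median(day1)

# Half-width of the notch: 1.58 * interquartile range / sqrt(n)
half_width <- 1.58 * IQR(day1) / sqrt(n)

# Lower and upper ends of the notch around the median
notch <- c(lower = med - half_width, upper = med + half_width)
notch
```

Repeating the calculation for Day 2 and comparing the two intervals shows whether the notches overlap.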

4.4.1 Plotting individual points with unique colors

  • One can also use sm_boxplot() to plot individual points with unique colors.
  • However, sm_boxplot() cannot print distinct box colors across distinct x levels (i.e., in this example, all boxes are grey). This is because I think that it is not good practice to give the boxes different colors when each point already has a unique color.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Subject)) +
  sm_boxplot(shape = 21, color = 'white') +
  scale_fill_manual(values = sm_palette(16))

4.5 Violin Plot - sm_violin()

  • sm_violin() plots a violin plot, individual points and lines that indicate means and +/- 1 standard deviation at the same time.
  • It is very similar to sm_boxplot() except there is no option for notch = TRUE in sm_violin().
  • Also sm_violin() uses both color (for the lines of mean and SD) and fill (for the colors of the points) arguments.
  • The default border color of the points is white.
  • sm_violin() automatically uses sm_hgrid() as its default theme.
  • It also has arguments borders and legends, both of which have been set to FALSE as defaults.
# a violin plot with the random data, all black points and lines
ggplot(data = df, mapping = aes(x = Day, y = Value)) +
  sm_violin(fill = 'black') 

# a violin plot with different colored points and lines
ggplot(data = df, mapping = aes(x = Day, y = Value, color = Day)) +
  sm_violin() + 
  scale_color_manual(values = sm_color('blue','orange'))
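
The lines drawn by sm_violin() mark the mean and +/- 1 standard deviation of each group. As a rough cross-check, the same summary values can be computed with base R. This is only a sketch; df_demo and summary_tab are hypothetical names, and the data are regenerated to match the random data above.

```r
# Recreate the two-day data set in long format.
set.seed(1)
df_demo <- data.frame(
  Value = c(rnorm(16, 0, 1), rnorm(16, 5, 1)),
  Day   = rep(c("Day 1", "Day 2"), each = 16)
)

# Mean and standard deviation within each Day
group_mean <- tapply(df_demo$Value, df_demo$Day, mean)
group_sd   <- tapply(df_demo$Value, df_demo$Day, sd)

# Positions of the three summary marks: mean - SD, mean, mean + SD
summary_tab <- data.frame(
  Day   = names(group_mean),
  lower = as.numeric(group_mean - group_sd),
  mean  = as.numeric(group_mean),
  upper = as.numeric(group_mean + group_sd)
)
summary_tab
```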

4.5.1 Plotting individual points with unique colors

  • One can also use sm_violin() to plot individual points with unique colors.
  • The x-level has to be grouped in the aesthetics (ex. group = Day).
  • But sm_violin() cannot print distinct violin colors across distinct x levels (i.e., in this example, all violins are grey). This is because it is aesthetically distracting to assign different colors of the violin across distinct x-levels.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Subject,
                                group = Day)) +
  sm_violin(shape = 21, color = 'white', point_alpha = 0.6) + 
  scale_fill_manual(values = sm_palette(16)) 

ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Subject,
                                group = Day, color = Day)) +
  sm_violin(shape = 21, color = 'white', point_alpha = 0.6) + 
  scale_fill_manual(values = sm_palette(16)) +
  scale_color_manual(values = sm_color('blue', 'orange'))

4.6 Bar Plot - sm_bar()

  • sm_bar() automatically uses sm_bar_theme() / sm_hgrid().
  • It also has arguments borders and legends, both of which have been set to FALSE as defaults.
  • Let’s use data (df) we generated before.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange'))

  • In this case, the error bar represents standard error. If you prefer to show standard deviation, then you should set errorbar_type = 'sd' in sm_bar().
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80', errorbar_type = 'sd') +
  scale_fill_manual(values = sm_color('blue','orange'))

  • 95% confidence interval can also be displayed with errorbar_type = 'ci'.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80', errorbar_type = 'ci') +
  scale_fill_manual(values = sm_color('blue','orange'))
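
To see how the three error bar types differ in size, the sketch below computes each of them for the Day 1 values with base R. It assumes that the 95% confidence interval is based on the t-distribution; how sm_bar() computes it internally is not shown in this chapter. The variable names are illustrative.

```r
# Recreate the Day 1 values from the random data above.
set.seed(1)
day1 <- rnorm(16, 0, 1)
n <- length(day1)

sd_val <- sd(day1)                          # standard deviation
se_val <- sd_val / sqrt(n)                  # standard error of the mean
ci_val <- qt(0.975, df = n - 1) * se_val    # half-width of a t-based 95% CI (assumption)

# Length of the error bar on each side of the mean
c(se = se_val, ci = ci_val, sd = sd_val)
```

With 16 observations per group, the standard error gives the shortest bar and the standard deviation the longest, so the choice of errorbar_type changes the visual impression considerably and should be stated in the figure caption.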

  • You can also make the bar slightly transparent by setting bar_alpha to less than 1.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80', 
         errorbar_type = 'ci', bar_alpha = 0.7) +
  scale_fill_manual(values = sm_color('blue','orange'))

4.6.1 Plotting individual points with unique colors

  • One can also use sm_bar() to plot individual points with unique colors.
ggplot(data = df, mapping = aes(x = Day, y = Value, color = Subject)) +
  sm_bar(bar_fill_color = 'gray80') +
  scale_color_manual(values = sm_palette(16)) 

  • sm_bar() can also print distinct bar colors across distinct x levels.
ggplot(data = df, mapping = aes(x = Day, y = Value, color = Subject,
                                fill = Day)) +
  sm_bar() +
  scale_color_manual(values = sm_palette(16)) +
  scale_fill_manual(values = sm_color('yelloworange','skyblue'))

4.7 Slope Chart - sm_slope()

  • sm_slope() plots a slope chart.
  • It also has the argument legends, which has been set to FALSE as the default.
  • A slope chart is useful to describe changes between two different timepoints for each measurement (ex. a participant).
  • It automatically uses sm_slope_theme().
  • Let’s use df that we generated before.
  • Important: To make this function work, the mapping within ggplot() has to have a certain structure.
    • x- and y-axes have to be defined.
    • A slope chart groups each observation (ex. Subject) across x-axis. This has to be specified in mapping as group =.
  • The x-axis cannot be continuous. It has to be discrete. So, it should take the form of character or factor (ex. ‘One’, ‘Two’, ‘Three’). If the x-axis variable is numeric (i.e., of type double, such as 1.02, 1.05, 1.5), then sm_slope() will produce an error.
  • The labels argument is required to use sm_slope(). This refers to the labels of the ticks in the x-axis. Ex. labels = c('Day 1', 'Day 2').
ggplot(data = df, mapping = aes(x = Day, y = Value, group = Subject)) +
  sm_slope(labels = c('Day 1', 'Day 2'))
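
If the time point in your own data is stored as a number, it can be converted to a factor before plotting. The sketch below shows one way to do this with base R; df_num and its values are made up purely for illustration.

```r
# Hypothetical data in which the time point is stored as a number.
df_num <- data.frame(
  Subject = rep(paste0("S", 1:3), times = 2),
  Time    = rep(c(1, 2), each = 3),
  Value   = c(0.2, 0.5, 0.1, 1.4, 1.9, 1.2)
)

is.numeric(df_num$Time)   # the column starts out as numeric

# Convert to a factor with explicit levels and readable labels.
df_num$Time <- factor(df_num$Time, levels = c(1, 2),
                      labels = c("Day 1", "Day 2"))

is.factor(df_num$Time)
levels(df_num$Time)
```

After the conversion, the column is discrete and can be mapped to x in ggplot().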

  • Let’s set the shape to 21.
  • Let’s set the border color to white.
  • Let’s apply the same color to each Day.
ggplot(data = df, mapping = aes(x = Day, y = Value, group = Subject)) +
  sm_slope(labels = c('Day 1','Day 2'), shape = 21, color = 'white', fill = sm_color('blue'))

  • You could also apply a different color to each Day using scale_fill_manual().
ggplot(data = df, mapping = aes(x = Day, y = Value, group = Subject, fill = Day)) +
  sm_slope(labels = c('Day 1','Day 2'), shape = 21, color = 'white') +
  scale_fill_manual(values = sm_color('blue','orange'))

  • You can also change the line color and other aesthetics. For more information, please type ?sm_slope.
ggplot(data = df, mapping = aes(x = Day, y = Value, group = Subject, fill = Day)) +
  sm_slope(labels = c('Day 1','Day 2'), shape = 21, color = 'white',
           fill = sm_color('blue'), line_color = '#bfd5db',
           line_size = 0.6)

4.8 A Bland Altman Plot - sm_bland_altman()

  • sm_bland_altman() and sm_statBlandAlt() functions can be used to create a Bland-Altman plot.
  • The plot is used to measure agreement between two different measurements.
  • It is also used to measure test-retest variability of a method.
  • Let’s generate random data.
set.seed(1)
first <- rnorm(20)
second <- rnorm(20)
df3 <- as_tibble(cbind(first,second)) # requires library(tidyverse)
  • Now let’s draw a Bland-Altman plot using sm_bland_altman(), which requires two arguments: the first dataset and the second dataset. They have to be numerical vectors of equal length.
    • This function automatically uses sm_classic() theme.
sm_bland_altman(df3$first, df3$second, color = sm_color('green')) + 
  scale_y_continuous(limits = c(-4,4))

  • The upper dashed line represents the upper limit of the difference between two measurements (mean difference + 1.96 * standard deviation of the difference).
  • The lower dashed line represents the lower limit of the difference between two measurements (mean difference - 1.96 * standard deviation of the difference).
  • The middle dashed line represents the mean difference.
  • The shaded region is the 95% confidence interval of the mean difference between the two measurements, from a one-sample t-test (difference vs 0).
    • If the shaded region includes 0 on the y-axis, then the mean difference is not significantly different from 0 (p > 0.05).
    • If it does not include 0, then there is a significant difference. This indicates that the two measurements differ systematically.
  • I usually label them with annotate(), which is a function from ggplot2. This process can be tedious.
  • Also, sm_statBlandAlt() calculates the statistical values that are necessary to draw a Bland-Altman plot, such as the mean difference, upper and lower limits. This function is used to annotate the values in the plot.
    • The arguments for this function are first and second datasets, just like in sm_bland_altman().
res <- sm_statBlandAlt(df3$first,df3$second) # store the results in res variable 

sm_bland_altman(df3$first, df3$second, color = sm_color('green')) + 
  scale_y_continuous(limits = c(-4,4)) +
  annotate('text', label = 'Mean', x = -1, y = res$mean_diff + 0.4) +
  annotate('text', label = signif(res$mean_diff,3), x = -1, y = res$mean_diff - 0.4) +
  annotate('text', label = 'Upper limit', x = 1.2, y = res$upper_limit + 0.4) +
  annotate('text', label = signif(res$upper_limit,3), x = 1.2, y = res$upper_limit - 0.4) +
  annotate('text', label = 'Lower limit', x = 1.2, y = res$lower_limit + 0.4) +
  annotate('text', label = signif(res$lower_limit,3), x = 1.2, y = res$lower_limit-0.4)
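
The quantities described above can also be computed by hand, which helps when checking the annotated values. The base-R sketch below follows the formulas in the text. It assumes that the difference is taken as first minus second, which is not stated in this chapter, and its variable names are chosen only to mirror the fields used above.

```r
# Recreate the two measurement vectors.
set.seed(1)
first <- rnorm(20)
second <- rnorm(20)

d <- first - second                  # pairwise differences (assumed order)

mean_diff <- mean(d)                 # middle dashed line
sd_diff <- sd(d)
upper_limit <- mean_diff + 1.96 * sd_diff   # upper dashed line
lower_limit <- mean_diff - 1.96 * sd_diff   # lower dashed line

# 95% confidence interval of the mean difference (one-sample t-test vs 0)
ci <- t.test(d, mu = 0)$conf.int

signif(c(mean = mean_diff, upper = upper_limit, lower = lower_limit), 3)
ci
```

The confidence interval is much narrower than the band between the two limits, because it describes the uncertainty of the mean difference rather than the spread of the individual differences.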

  • Let’s change the border color of the circles to white. To do so, we will have to change their shape to 21.
sm_bland_altman(df3$first, df3$second, shape = 21, fill = sm_color('green'), color = 'white') + 
  scale_y_continuous(limits = c(-4,4)) +
  annotate('text', label = 'Mean', x = -1, y = res$mean_diff + 0.4) +
  annotate('text', label = signif(res$mean_diff,3), x = -1, y = res$mean_diff - 0.4) +
  annotate('text', label = 'Upper limit', x = 1.2, y = res$upper_limit + 0.4) +
  annotate('text', label = signif(res$upper_limit,3), x = 1.2, y = res$upper_limit - 0.4) +
  annotate('text', label = 'Lower limit', x = 1.2, y = res$lower_limit + 0.4) +
  annotate('text', label = signif(res$lower_limit,3), x = 1.2, y = res$lower_limit-0.4)

4.9 Raincloud plot - sm_raincloud()

  • sm_raincloud() generates a raincloud plot.
  • It also has arguments borders and legends. The default of the former has been set as TRUE and that of the latter as FALSE.
  • A raincloud plot is a combination of jittered points, a boxplot and a violin plot.
  • However, this plot can be visually crowded. Some people like to use raincloud plots, some do not. So, the choice to use it is entirely yours.
  • Let’s generate some random data.
set.seed(2) # generate random data
day1 = rnorm(20,0,1)
day2 = rnorm(20,5,1)
day3 = rnorm(20,6,1.5)
day4 = rnorm(20,7,2)
Subject <- rep(paste0('S',seq(1:20)), 4)
Data <- data.frame(Value = matrix(c(day1,day2,day3,day4),ncol=1))
Day <- rep(c('Day 1', 'Day 2', 'Day 3', 'Day 4'), each = length(day1))
df2 <- cbind(Subject, Data, Day)
  • The x-axis variable column must have its levels in the right order. If it does not, you should convert the column to a factor and set the levels explicitly.
  • Now let’s draw a raincloud plot using sm_raincloud().
sm_raincloud(data = df2, x = Day, y = Value) 
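
The note about levels matters most when the labels do not sort naturally as text. The sketch below uses a made-up Day vector (not the df2 data above) to show how explicit levels fix the order with base R's factor().

```r
# Hypothetical labels that do not sort in the intended order as text.
Day <- rep(c("Day 1", "Day 2", "Day 10"), each = 2)

# By default, factor() sorts the levels as text, so 'Day 10'
# typically ends up before 'Day 2'.
levels(factor(Day))

# Setting the levels explicitly fixes the order on the x-axis.
Day_f <- factor(Day, levels = c("Day 1", "Day 2", "Day 10"))
levels(Day_f)

# The integer codes now follow the intended order.
as.integer(Day_f)
```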

  • Let’s change the x-axis labels.
sm_raincloud(data = df2, x = Day, y = Value) +
  scale_x_continuous(limits = c(0.25,4.75), labels = c('First', 'Second', 'Third', 'Fourth'), breaks = c(1,2,3,4)) +
  xlab('Day') 

  • The filling colors of the violin plots and boxplots can be modified by using scale_fill_manual().
  • The border color of the violin plot can be changed by using scale_color_manual().
    • I will set it transparent to remove the border of the violin plots.
  • The color of the points can be set by either of the two functions depending on the shape, which can be set within sm_raincloud().
sm_raincloud(data = df2, x = Day, y = Value, boxplot_alpha = 0.5, 
              color = 'white', shape = 21, sep_level = 2) +
  scale_x_continuous(limits = c(0.25,4.75), labels = c('One', 'Two', 'Three', 'Four'), breaks = c(1,2,3,4)) +
  xlab('Day') +
  scale_color_manual(values = rep('transparent',4)) + 
  scale_fill_manual(values = sm_palette(4))

  • sep_level is an argument to specify the degree of separation among points, boxplots and violin plots. When sep_level = 0, they will all be crowded. When sep_level = 4, they will all be separated from each other.
    • I personally prefer when the boxplot and violin plots are together, but not the points. So I set the default to sep_level = 2.
    • Shown below is an example with sep_level = 4 and the sm_minimal() theme.
sm_raincloud(data = df2, x = Day, y = Value, boxplot_alpha = 0.5, 
              color = 'white', shape = 21, sep_level = 4) +
  scale_x_continuous(limits = c(0.25,4.75), labels = c('1', '2', '3', '4'), breaks = c(1,2,3,4)) +
  xlab('Day') +
  scale_color_manual(values = rep('transparent',4)) + 
  scale_fill_manual(values = sm_palette(4)) +
  sm_minimal()

  • You can also flip the raincloud plot by setting which_side to left.
sm_raincloud(data = df2, x = Day, y = Value, boxplot_alpha = 0.5, 
              color = 'white', shape = 21, sep_level = 2, which_side = 'left') +
  scale_x_continuous(limits = c(0.25,4.75), labels = c('1', '2', '3', '4'), breaks = c(1,2,3,4)) +
  xlab('Day') +
  scale_color_manual(values = rep('transparent',4)) + 
  scale_fill_manual(values = sm_palette(4)) +
  sm_minimal()

  • So far the distribution plots (violin plots) have been vertical. We can change their configuration by setting vertical = FALSE.
sm_raincloud(data = df2, x = Day, y = Value, boxplot_alpha = 0.5, 
              color = 'white', shape = 21, sep_level = 2, which_side = 'left', vertical = FALSE) +
  scale_x_continuous(limits = c(0.25,4.75), labels = c('1', '2', '3', '4'), breaks = c(1,2,3,4)) +
  xlab('Day') +
  scale_color_manual(values = rep('transparent',4)) + 
  scale_fill_manual(values = sm_palette(4)) +
  sm_minimal()

  • The orientation is not correct, so let’s change it by setting which_side = 'right'.
sm_raincloud(data = df2, x = Day, y = Value, boxplot_alpha = 0.5, 
              color = 'white', shape = 21, sep_level = 2, which_side = 'right', vertical = FALSE) +
  scale_x_continuous(limits = c(0.25,4.75), labels = c('1', '2', '3', '4'), breaks = c(1,2,3,4)) +
  xlab('Day') +
  scale_color_manual(values = rep('transparent',4)) + 
  scale_fill_manual(values = sm_palette(4))

4.10 Forest plot - sm_forest()

  • As the name suggests, sm_forest() draws a forest plot.
  • It has very similar arguments to those in sm_raincloud().
    • sep_level sets the separation level between the average point and individual points. When sep_level = 0, they overlap. When sep_level = 3, they are separated. The values range from 0 to 3.
  • The shape and the size of the average point can be set by the arguments avg_point_shape and avg_point_size.
  • The range of the error bar can be either at top or bottom, which could be set by the argument range_side.
  • The type of the error bar can be set using the argument errorbar_type, the default of which is ci, short for confidence interval (95%).
  • borders and legends can be used to remove the borders and legends as you can do in sm_raincloud().
  • Most importantly, x is required to be a continuous variable. y should be a categorical variable (i.e., factor).

First, generate random numbers.

set.seed(2) # generate random data
day1 = rnorm(20,0,1)
day2 = rnorm(20,5,1)
day3 = rnorm(20,6,1.5)
day4 = rnorm(20,7,2)
Subject <- rep(paste0('S',seq(1:20)), 4)
Data <- data.frame(Value = matrix(c(day1,day2,day3,day4),ncol=1))
Day <- rep(c('Day 1', 'Day 2', 'Day 3', 'Day 4'), each = length(day1))
df2 <- cbind(Subject, Data, Day)

When drawing a forest plot, the aspect ratio of the figure is very important, so we will work through an example all the way to saving it properly. Here x is set to the value and y to the categorical variable. In sm_raincloud(), it was the opposite.

sm_forest(data = df2, x = Value, y = Day) 

Therefore, the x and y scales have been flipped compared to the raincloud plots above. Notice that sm_forest() automatically calculates the mean, 95% confidence interval and plots individual points as well.

We can further customise the aesthetics.

sm_forest(data = df2, x = Value, y = Day, alpha = 0.2,
           avg_point_shape= 23, shape = 21) +
  scale_color_manual(values = rep('transparent',4)) +
  scale_fill_manual(values = sm_palette(4))

Having changed the shape of the average point to 23 (a diamond with an outline), we can add the line scale_color_manual(values = rep('transparent',4)).

You can also change jitter_width and sep_level.

sm_forest(data = df2, x = Value, y = Day, alpha = 0.2,
           avg_point_shape= 23, shape = 21,
          jitter_width = 0.1, sep_level = 0) +
  scale_color_manual(values = rep('transparent',4)) +
  scale_fill_manual(values = sm_palette(4)) +
  ylab('Day') +
  ggtitle('Forest plot')

Now we can save the plot as an image file. First store the plot in a variable, then set the correct width (base_width = 3) and height (base_height = 5) using save_plot() from the cowplot package.

forest_plot <- sm_forest(data = df2, x = Value, y = Day, alpha = 0.2,
           avg_point_shape= 23, shape = 21,
          jitter_width = 0.15, sep_level = 0) +
  scale_color_manual(values = rep('transparent',4)) +
  scale_fill_manual(values = sm_palette(4)) +
  ylab('Day') +
  ggtitle('Forest plot')

save_plot('forest.png', forest_plot,  base_height = 5,
          base_width = 3)

You can also set errorbar_type = 'sd' so that the error bar and the text denote the standard deviation of each sample.

sm_forest(data = df2, x = Value, y = Day, alpha = 0.2,
           avg_point_shape= 23, shape = 21,
          jitter_width = 0.1, sep_level = 0,
          errorbar_type = 'sd') +
  scale_color_manual(values = rep('transparent',4)) +
  scale_fill_manual(values = sm_palette(4)) +
  ylab('Day') +
  ggtitle('Forest plot')

4.11 Overriding Defaults of smplot’s Themes

  • You can override all the defaults by adding a theme() call to your ggplot2 graph.

  • Here is a bar graph.

ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange'))

  • Now let’s remove the x-axis title Day.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange')) +
  theme(axis.title.x = element_blank())

  • Let’s customise the graph more by changing the y-axis title and adding a main title.
ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange')) +
  theme(axis.title.x = element_blank()) +
  ylab('Reading speed') +
  ggtitle('Reading performance in children')

4.11.1 Scaling the y-axis

A bar plot is shown below using this random data.

set.seed(1) # generate random data
day1 = abs(rnorm(16,1,1))
day2 = abs(rnorm(16,5,1))
Subject <- rep(paste0('S',seq(1:16)), 2)
Data <- data.frame(Value = matrix(c(day1,day2),ncol=1))
Day <- rep(c('Day 1', 'Day 2'), each = length(day1))
df <- cbind(Subject, Data, Day)

However, there is something weird going on regarding the y-axis limit.

ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange'))

The y-axis limit does not start from 0. Let’s specify the y-axis limit manually using scale_y_continuous().

ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange')) +
  scale_y_continuous(limits = c(0,7))

Although we have specified that the y-axis limit begins from 0 and ends at 7, there is still a small margin below 0. What is going on here?

By default, ggplot2 always leaves a small margin below the lowest point of the y-axis limit and above the largest point of the y-axis limit. The empty space below 0 and above 7 can be removed by using expand = c(0,0) within scale_y_continuous().

ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = sm_color('blue','orange')) +
  scale_y_continuous(limits = c(0,7), expand = c(0,0))
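
The size of the default margin can be worked out by hand. For continuous scales, ggplot2 by default pads each end of the axis by 5% of the range of the limits. The sketch below is plain arithmetic in base R, not a call into ggplot2, and its variable names are illustrative.

```r
limits <- c(0, 7)   # limits passed to scale_y_continuous()
mult <- 0.05        # default multiplicative padding for continuous scales

# Padding added to each end of the axis
pad <- mult * diff(limits)

# Range that is actually drawn with the default setting
default_range <- c(limits[1] - pad, limits[2] + pad)
default_range

# Range that is drawn with expand = c(0, 0): exactly the limits
no_expand_range <- limits
no_expand_range
```

If you want to remove the margin only at the bottom while keeping some space at the top, recent versions of ggplot2 provide expansion(), for example expand = expansion(mult = c(0, 0.05)).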

Note that expand = c(0,0) has already been used in this chapter where it discusses the correlation plot (with both regression and reference lines). You can plot it without expand = c(0,0) and see what happens.

# correlation plot using data frame 'df_corr' 
ggplot(data = df_corr, mapping = aes(x = first, y = second)) + 
  geom_point(shape = 21, fill = sm_color('crimson'), color = 'white',
             size = 3) + sm_corr_theme(borders = FALSE) +
  geom_abline(slope = 1, linetype = 'dashed') +
  scale_y_continuous(limits = c(-2.8,2.8)) +
  scale_x_continuous(limits = c(-2.8,2.8)) +
  sm_statCorr(color = sm_color('crimson'), corr_method = 'pearson',
              label_x = -2.2, label_y = 2.3) +
  ggtitle('Correlation plot') + 
  xlab('Method 1') + ylab('Method 2') 

Notice that even though we have specified the x- and y-limits from -2.8 to 2.8, we still see -3 and 3. This is because ggplot2 adds extra margin space by default. Also, the plot above is not pretty because the grid lines at the outer ends act as pseudo-borders, so it is best to remove them. We can do this with expand = c(0,0), which removes the margin on both the x- and y-axes.

# correlation plot using data frame 'df_corr' 
ggplot(data = df_corr, mapping = aes(x = first, y = second)) + 
  geom_point(shape = 21, fill = sm_color('crimson'), color = 'white',
             size = 3) + sm_corr_theme(borders = FALSE) +
  geom_abline(slope = 1, linetype = 'dashed') +
  scale_y_continuous(limits = c(-2.8,2.8),  expand = c(0,0)) +
  scale_x_continuous(limits = c(-2.8,2.8), expand = c(0,0)) +
  sm_statCorr(color = sm_color('crimson'), corr_method = 'pearson',
              label_x = -2.2, label_y = 2.3) +
  ggtitle('Correlation plot') + 
  xlab('Method 1') + ylab('Method 2') 

I think this looks much nicer.

4.12 Overriding Defaults of smplot Colors

  • sm_color('blue') prints the hex code of blue. Likewise, sm_color('blue','orange') prints out two hex codes, one for blue and another for orange.

  • Therefore, instead of using sm_color() function to call forth the colors, you can directly write the hex codes of colors that are not included in smplot.

my_colors <- c('#ff1493', '#483d8B') # deep pink and dark slate blue

ggplot(data = df, mapping = aes(x = Day, y = Value, fill = Day)) +
  sm_bar(shape = 21, color = 'white', bar_fill_color = 'gray80') +
  scale_fill_manual(values = my_colors) +
  scale_y_continuous(limits = c(0,7), expand = c(0,0.05))
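
If you know a color by its R name rather than its hex code, base R can do the conversion. The sketch below defines a small helper, to_hex(), which is a hypothetical function written for this example and not part of smplot. It relies on col2rgb() and rgb() from the grDevices package.

```r
# Hypothetical helper: convert R color names to hex codes.
to_hex <- function(color_name) {
  rgb_vals <- grDevices::col2rgb(color_name)   # 3 x n matrix of 0-255 values
  # rgb() accepts a matrix with one row per color (columns: red, green, blue)
  unname(grDevices::rgb(t(rgb_vals), maxColorValue = 255))
}

my_colors <- to_hex(c("deeppink", "darkslateblue"))
my_colors
```

The resulting vector can be passed to scale_fill_manual(values = my_colors) in the same way as the hex codes written out above.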