# Chapter 6 Calculating Area under a Curve

In this chapter, we will continue to use data from this paper (PDF):

**Seung Hyun Min, Alex S. Baldwin and Robert F. Hess. Ocular dominance plasticity: A binocular combination task finds no cumulative effect with repeated patching (2019). Vision Research, 161, 36-42.**

In Chapter 5, we only dealt with data from Days 1 and 5. In this chapter, we will use data from Days 1, 2, 3, 4 and 5.

Let’s begin by loading the `tidyverse`

and other libraries, and uploading the csv file `min2019b.csv`

using `read_csv()`

from the `tidyverse`

package.

`head()`

returns the first 6 rows of the data frame. `tail()`

returns the last 6 rows of the data frame.

```
library(tidyverse)
library(smplot)
library(cowplot)
<- read_csv('https://www.smin95.com/min2019b.csv')
df head(df)
```

```
## # A tibble: 6 x 4
## Subject Day Time Cbratio
## <chr> <dbl> <dbl> <dbl>
## 1 a 1 0 -0.421
## 2 b 1 0 2.82
## 3 c 1 0 1.69
## 4 d 1 0 2.55
## 5 e 1 0 -0.217
## 6 f 1 0 0.626
```

`unique()`

returns unique values in the data set. Let’s see how many unique values there are in the `Day`

column of the data set. A column can be extracted by using `$`

.

```
$Day <- factor(df$Day)
dfunique(df$Day)
```

```
## [1] 1 2 3 4 5
## Levels: 1 2 3 4 5
```

Notice that there are now five different days in the data set.

As previously mentioned, there are four columns in this data frame:

First,

`Subject`

refers to each participant. There are 10 participants total.Next,

`Day`

refers to the day of testing. The participants were tested on Days 1, 2, 3, 4 and 5, ad we will using all data.`Time`

refers to the number of minutes after an experimental manipulation (ex. monocular deprivation). These are 0, 3, 6, 12, 24 and 48 minutes, but in the data frame, it says 0, 1, 2, 3, 4 and 5; we will change the labels manually.The

`Cbratio`

column refers to the actual data that will be used for data analysis.

## 6.1 Pipe `%>%`

Let’s plot data from Day 1 only.

To do so, we will need to first filter for data from Day 1 using `filter()`

, compute the average and standard error across subjects using `summarise()`

and `sm_stdErr()`

, and save the changed
format of the dataframe into a new variable using `<-`

or `->`

.

```
<- filter(df, Day == 1)
df_day1 <- group_by(df_day1, Time)
by_time_day1 <- summarise(by_time_day1, Average = mean(Cbratio),
avg_day1 StdError = sm_stdErr(Cbratio))
```

The steps above (`filter()`

, `group_by()`

and `summarise()`

) are coded separately. Each line has one of these steps. However, these can be combined in one line by using pipe `%>%`

as shown below.

```
<- df %>% filter(Day == 1) %>% group_by(Time) %>%
avg_day1 summarise(Average = mean(Cbratio),
StdError = sm_stdErr(Cbratio))
```

Now the steps (`filter()`

, `group_by()`

and `summarise()`

) are written from left-to-right in a continuous fashion, rather than a separate step.

Pipe `%>%`

is part of the `tidyverse`

package, and it is deemed a necessary tool for data analysis using R. It is extremely useful because it allows you to write codes that are easier to understand. Notice that if you use `%>%`

, the input does not have to written because the input is actually the code that is written just before. For example:

`%>% filter(Day == 1) df `

```
## # A tibble: 60 x 4
## Subject Day Time Cbratio
## <chr> <fct> <dbl> <dbl>
## 1 a 1 0 -0.421
## 2 b 1 0 2.82
## 3 c 1 0 1.69
## 4 d 1 0 2.55
## 5 e 1 0 -0.217
## 6 f 1 0 0.626
## 7 g 1 0 2.62
## 8 h 1 0 1.42
## 9 u 1 0 1.54
## 10 j 1 0 3.05
## # ... with 50 more rows
```

`filter(df, Day == 1)`

```
## # A tibble: 60 x 4
## Subject Day Time Cbratio
## <chr> <fct> <dbl> <dbl>
## 1 a 1 0 -0.421
## 2 b 1 0 2.82
## 3 c 1 0 1.69
## 4 d 1 0 2.55
## 5 e 1 0 -0.217
## 6 f 1 0 0.626
## 7 g 1 0 2.62
## 8 h 1 0 1.42
## 9 u 1 0 1.54
## 10 j 1 0 3.05
## # ... with 50 more rows
```

These two are identical. But notice that if you do not use pipe,
you might have to save the intermediate variables (such as `df_day1`

and `by_time_day1`

) multiple times, which can be tedious. Notice that when pipe `%>%`

is used, the need to save these variables disappears.

I will use `%>%`

from now on due to my personal preference.

## 6.2 Area under a curve

Let’s use the data set `avg_day1`

to plot data only from Day 1.

```
%>% ggplot(aes(x = Time, y = Average)) +
avg_day1 geom_area(fill = sm_color('skyblue'), alpha = 0.4) +
geom_point(size = 4.5, color = sm_color('skyblue')) +
geom_line(size = 1, color = sm_color('skyblue')) +
geom_errorbar(aes(ymin = Average - StdError, ymax = Average + StdError), size = .75, width = .05, color = sm_color('skyblue')) +
scale_x_continuous(breaks = unique(df$Time),
labels = c("0", "3", "6", "12", "24", "48")) +
sm_hgrid(legends = FALSE) +
ggtitle("Day 1") +
xlab("Minutes after monocular deprivation") +
ylab("\u0394 Contrast balance ratio (dB)") +
annotate('text', x = 16, y = 0.3, label = 'Area under curve')
```

One way to show data is to calculate the area under a curve (as shown by the colored area) in the figure above. Area under a curve is useful because it reduces a number of data points (6 in this case due to 6 time points) into one. In our context, a large area under a curve represents a large change in contrast balance ratio over time. In short, it captures both the magnitude and longevity of the effect.

Let’s calculate area under a curve for each subject and day across 0 to 48 minutes in the x-axis (minutes after monocular deprivation) using `sm_auc()`

.

`<- c(0,3,6,12,24,48) x `

On the x-axis of the graph, there are six time points. You will need to store these values in a new variable, which I will call `x`

here because it represents the x-axis. Instead of manually writing them down, you could extract the unique number of `Time`

from the data frame.

`<- unique(df$Time) # this is my preferred way x `

Now let’s filter for one subject’s data (subject `b`

) on Day 1 using `filter()`

and pipe `%>%`

.

```
<- df %>% filter(Day == 1 & Subject == 'b')
subj_b_day1 subj_b_day1
```

```
## # A tibble: 6 x 4
## Subject Day Time Cbratio
## <chr> <fct> <dbl> <dbl>
## 1 b 1 0 2.82
## 2 b 1 3 0.881
## 3 b 1 6 1.83
## 4 b 1 12 0.877
## 5 b 1 24 0.693
## 6 b 1 48 2.09
```

```
%>% ggplot(aes(x = Time, y = Cbratio)) +
subj_b_day1 geom_area(fill = sm_color('skyblue'), alpha = 0.4) +
geom_point(size = 4.5, color = sm_color('skyblue')) +
geom_line(size = 1, color = sm_color('skyblue')) +
scale_x_continuous(breaks = unique(df$Time),
labels = c("0", "3", "6", "12", "24", "48")) +
sm_hgrid(legends = FALSE) +
ggtitle("Day 1, Subject b") +
xlab("Minutes after monocular deprivation") +
ylab("\u0394 Contrast balance ratio (dB)") +
annotate('text', x = 16, y = 0.2, label = 'Area under curve = ?')
```

The variable `subj_b_day1`

now stores the data of subject `b`

on Day 1. Now let’s calculate the area under a curve (AUC) from `subj_b_day1`

using `sm_auc()`

.

`sm_auc()`

calculates the AUC using the method of trapezoid integration; this is equivalent to `trapz`

function in Matlab and `numpy.trapz`

in Python. It has two arguments:
- The first argument is the x point. In this case, it is the minutes after monocular deprivation (0, 3, 6, 12, 24 and 48). If this argument is left empty, it will use 0, 1, 2, 3 etc as default. For more information, please type `?sm_auc`

in the command console.
- The second argument is the data, which is shown by the y-axis in the blue graph. In this case, it is the change of contrast balance ratio (dB). These values can be extracted from `subj_b_day1`

using `$`

.

`sm_auc(x, subj_b_day1$Cbratio)`

`## [1] 60.5286`

The AUC equals 60.53. This shows that for subject `b`

, the effect is large and long-lasting.

```
%>% ggplot(aes(x = Time, y = Cbratio)) +
subj_b_day1 geom_area(fill = sm_color('skyblue'), alpha = 0.4) +
geom_point(size = 4.5, color = sm_color('skyblue')) +
geom_line(size = 1, color = sm_color('skyblue')) +
scale_x_continuous(breaks = unique(df$Time),
labels = c("0", "3", "6", "12", "24", "48")) +
sm_hgrid(legends = FALSE) +
ggtitle("Day 1, Subject b") +
xlab("Minutes after monocular deprivation") +
ylab("\u0394 Contrast balance ratio (dB)") +
annotate('text', x = 17.5, y = 0.2, label = 'Area under curve = 60.5')
```

Now let’s visualize the data of subject `b`

from Day 2.

```
<- df %>% filter(Day == 2 & Subject == 'b')
subj_b_day2
%>% ggplot(aes(x = Time, y = Cbratio)) +
subj_b_day2 geom_area(fill = sm_color('yelloworange'), alpha = 0.4) +
geom_point(size = 4.5, color = sm_color('yelloworange')) +
geom_line(size = 1, color = sm_color('yelloworange')) +
scale_x_continuous(breaks = unique(df$Time),
labels = c("0", "3", "6", "12", "24", "48")) +
sm_hgrid(legends = FALSE) +
ggtitle("Day 2, Subject b") +
xlab("Minutes after monocular deprivation") +
ylab("\u0394 Contrast balance ratio (dB)") +
annotate('text', x = 16, y = 0.2, label = 'Area under curve = ?')
```

Now, let’s calculate the AUC for subject `b`

from Day 2.

`sm_auc(x, subj_b_day2$Cbratio)`

`## [1] 80.38215`

The AUC equals 80.38. In the case of subject `b`

, the effect seems to have increased on Day 2 relative to Day 1.

```
<- df %>% filter(Day == 2 & Subject == 'b')
subj_b_day2
%>% ggplot(aes(x = Time, y = Cbratio)) +
subj_b_day2 geom_area(fill = sm_color('yelloworange'), alpha = 0.4) +
geom_point(size = 4.5, color = sm_color('yelloworange')) +
geom_line(size = 1, color = sm_color('yelloworange')) +
scale_x_continuous(breaks = unique(df$Time),
labels = c("0", "3", "6", "12", "24", "48")) +
sm_hgrid(legends = FALSE) +
ggtitle("Day 2, Subject b") +
xlab("Minutes after monocular deprivation") +
ylab("\u0394 Contrast balance ratio (dB)") +
annotate('text', x = 17.5, y = 0.2, label = 'Area under curve = 80.4')
```

So far,we have 9 more subjects across five days and subject `b`

from three other days. This will require us to compute 48 more AUCs total. This process is rater repetitive. Should we manually compute AUC for each one? Luckily, the answer is no. Let’s use `sm_auc_list()`

.

```
sm_auc_list(subjects = 'Subject', conditions = 'Day',
x = 'Time', values = 'Cbratio',
data = df)
```

`## [1] "AUC = Cbratio * Time"`

```
## Subject Day AUC_Cbratio
## 1 a 1 -2.36910
## 2 a 2 0.68265
## 3 a 3 -9.54795
## 4 a 4 27.12165
## 5 a 5 41.95680
## 6 b 1 60.52860
## 7 b 2 80.38215
## 8 b 3 94.68870
## 9 b 4 34.05225
## 10 b 5 86.51805
## 11 c 1 94.84035
## 12 c 2 1.78695
## 13 c 3 44.94750
## 14 c 4 24.83535
## 15 c 5 67.99770
## 16 d 1 100.62510
## 17 d 2 -19.60125
## 18 d 3 85.39140
## 19 d 4 9.37800
## 20 d 5 -16.12920
## 21 e 1 55.32240
## 22 e 2 132.99825
## 23 e 3 67.24410
## 24 e 4 -0.93135
## 25 e 5 27.36150
## 26 f 1 108.13770
## 27 f 2 31.14285
## 28 f 3 21.66375
## 29 f 4 116.81430
## 30 f 5 48.95040
## 31 g 1 63.27840
## 32 g 2 80.24685
## 33 g 3 33.71850
## 34 g 4 131.84340
## 35 g 5 32.61525
## 36 h 1 48.71625
## 37 h 2 75.98130
## 38 h 3 73.61250
## 39 h 4 21.84735
## 40 h 5 71.04780
## 41 u 1 39.11865
## 42 u 2 64.56690
## 43 u 3 15.80355
## 44 u 4 28.84035
## 45 u 5 15.56220
## 46 j 1 74.90610
## 47 j 2 101.06550
## 48 j 3 52.90590
## 49 j 4 40.73910
## 50 j 5 36.71595
```

`sm_auc_list()`

creates a new data frame with AUC for each subject and condition (in this case, Day). It has five arguments:

`data`

= this argument requires the variable that stores the data frame. In our case, it is`df`

.`subjects`

= this argument requires the name of the column of the data frame that contains subjects. It must strings, ex.`'Subject'`

, not`Subject`

.`groups`

= this argument requires the name of the column of the data frame that contains each group. In this example, there is no group. An example would be`Group`

column that contains two groups:`Normal`

and`Amblyopia`

(see next chapter on Slope Analysis).`conditions`

= this argument requires name of the column of the data frame that contains each condition. In our example, the condition is the day. It must strings, ex.`'Day'`

, not`Day`

.`x`

= this argument requires the name of the column of the data frame that contains the x-axis points from which the AUC can be calculated. In our case, these are 0, 3, 6, 12, 24 minutes. It must be strings, ex.`'Time'`

, not`Time`

. Also, it must be**numeric**/**double**, NOT**factor**. Make sure you check that the column is numeric. If its not, convert the column of the dataframe into**double**beforehand. ex.`df$Time <- as.numeric(df$Time)`

or`df$SpatialFreq <- as.numeric(df$Time)`

.`values`

= this argument requires the name of the column of the data frame that contains the actual data, which are the y-axis points from which the AUC can be calculated. In our case, it is the change in contrast balance ratio. It must strings, ex.`'Cbratio'`

, not`Cbratio`

.

Before using `sm_auc_list()`

, you will need to check for a few things. First, check if the x column is numeric, not factor, by using `is.numeric()`

function. If it is numeric, proceed with using `sm_auc_list()`

. Second, see if the x levels (ex. 0, 3, 6, 12, 24 and 48 etc) are identical for each subject and condition (ex. no 7, 16, 29 minutes for subject 10).

```
# check if the x column is numeric
is.numeric(df$Time)
```

`## [1] TRUE`

```
# check if the x column is factor
is.factor(df$Time)
```

`## [1] FALSE`

```
# if it is factor and not numeric:
$Time <- as.numeric(df$Time) df
```

After checking, we can store the results from `sm_auc_list()`

into a new variable. I will call the new variable `auc_df`

.

```
<- sm_auc_list(subjects = 'Subject', conditions = 'Day',
auc_df x = 'Time', values = 'Cbratio',
data = df)
```

`## [1] "AUC = Cbratio * Time"`

Now we can plot the AUCs using `sm_bar()`

, `sm_boxplot`

etc. Let’s have a try.

Here is a bar graph showing `auc_df`

’s data. This figure is similar to Figure 3E in the original paper (Min, Baldwin and Hess, 2019).

- re: means relative to X. The data are the differences compared to the baseline (re: Baseline).

```
%>% ggplot(aes(x = Day, y = AUC_Cbratio, fill = Day)) +
auc_df sm_bar(shape = 21, color = 'white',
bar_fill_color = 'gray80',
point_alpha = 1) +
scale_fill_manual(values = sm_palette(5)) +
ggtitle('\u0394 AUC re: Each day\'s baseline')
```

Here is a box plot showing `auc_df`

’s data.

```
%>% ggplot(mapping = aes(x = Day, y = AUC_Cbratio, color = Day)) +
auc_df sm_boxplot(alpha = 0.6) +
scale_color_manual(values = sm_palette(5)) +
ggtitle('\u0394 AUC re: Each day\'s baseline')
```

## 6.3 Reproducing figures in the paper (Min et al., 2019)

Let’s redraw some of the figures in the paper. In 2019, when the paper was published, there was no smplot. However, the themes I developed during then have become part of smplot. So, I have incorporated functions from smplot to reproduce the figures here.

### 6.3.1 Figure 3B (a best-fit line with points and error bars)

```
<- read_csv('https://www.smin95.com/min2019b.csv')
df # filter data from Days 2-4. | means OR.
<- df %>% filter(Day == 2 | Day == 3 | Day == 4) %>% group_by(Day, Time) %>%
df1 summarise(avg = mean(Cbratio), se = sm_stdErr(Cbratio))
# mean and standard error computed for each day and time, both of which have been grouped.
$Time <- as.numeric(df1$Time)
df1$Day <- as.factor(df1$Day) # Day has to be factored here because they will be labelled as different colors. Factor means categorical variables.
df1
<- c("#55A868", "#C44E52", "#8172B2")
colors
%>% ggplot(aes(x = Time, y = avg, color = Day)) +
df1 geom_point(size = 4.5) +
geom_errorbar(aes(ymin = avg - se, ymax = avg + se), size = .5, width = 1) +
geom_smooth(method = 'lm', se = F, size = 0.9) +
# lm = linear regression method
scale_x_continuous(breaks = unique(df$Time),
labels = c("0", "3", "6", "12", "24", "48")) +
sm_hgrid(legends = TRUE) +
scale_color_manual(values = colors,
labels = c("Day 2", "Day 3", "Day 4")) +
ggtitle("Recovery of the patching effect") +
xlab("Time after monocular deprivation (min)") +
ylab("\u0394 Contrast balance ratio (dB)") +
theme(legend.justification = c(1,0),
legend.position = c(0.96, 0.67),
legend.title = element_blank())
```

### 6.3.2 Figure 3D (a bar graph)

This a bar graph that has been plotted with `sm_bar()`

. The y-axis has been set with the function `coord_cartesian()`

.

```
<- read_csv('https://www.smin95.com/baseline.csv')
baseline
<- c("#4C72B0", "#55A868", "#C44E52",
colors "#8172B2", "#D55E00")
$Day <- as.factor(baseline$Day)
baseline
%>% ggplot(aes(x = Day, y= Baseline, fill = Day)) +
baseline sm_bar(points = FALSE, bar_width = 0.9, errSize = 0.5) +
scale_fill_manual(values = colors) +
ggtitle("Baseline across days") +
coord_cartesian(ylim = c(-2, 2)) +
ylab("Contrast balance ratio (dB)") +
xlab('Patching day')
```

### 6.3.3 Figure 3E (modeling in Matlab and plotting in R)

The grey shades have actually been approximated using Matlab. The results have been saved in the .csv file `fit_error_shade.csv`

. They are then imported into R, where the final figure is plotted.

Notice that two data frames are used to create this plot, `df_shade1`

and `auc_df1`

. The former contains the values from the modeling in Matlab, the latter contains AUC values.

So do not worry if your field does not use R for data analysis. One can still perform data visualization using R.

```
<- read_csv('https://www.smin95.com/fit_error_shade.csv')
df_shade <- df_shade %>% mutate(yUpp = y_mean + y_se, yLow = y_mean - y_se)
df_shade1
<- read_csv('https://www.smin95.com/min2019b.csv')
df <- sm_auc_list(subjects = 'Subject', conditions = 'Day',
auc_df x = 'Time', values = 'Cbratio',
data = df)
```

`## [1] "AUC = Cbratio * Time"`

```
<- auc_df %>% group_by(Day) %>%
auc_df1 summarise(y_mean = mean(AUC_Cbratio), std_err = sm_stdErr(AUC_Cbratio))
$Day <- as.numeric(auc_df1$Day)
auc_df1
<- c("#4C72B0", "#55A868", "#C44E52",
colors "#8172B2", "#D55E00")
%>% ggplot(aes(x=x, y=y_mean)) +
df_shade1 geom_ribbon(aes(ymin = yLow, ymax = yUpp), alpha = .08) +
geom_linerange(data = auc_df1, aes(Day, ymin = y_mean - std_err, ymax = y_mean + std_err), size = .5) +
geom_point(data = auc_df1, aes(Day, y_mean, color = as.factor(Day), size = 5)) +
sm_hgrid(legends = FALSE) +
scale_x_continuous(limits = c(1, 5)) +
scale_y_continuous(limits = c(0, 90)) +
scale_color_manual(values = colors) +
ylab("Area Under Curve (dB min)") +
xlab("Patching Day") +
ggtitle("\u0394 AUC re: Each Day's Baseline") +
annotate("text", label = "y = -5.8(Day) + 67.8", x = 2.0, y= 5) +
annotate("text", label = "***", x = 1, y = 82, size = 3.5, fontface="bold") +
annotate("text", label = "**", x = 2, y = 75, size = 3.5, fontface="bold") +
annotate("text", label = "**", x = 3, y = 68, size = 3.5, fontface="bold") +
annotate("text", label = "*", x = 4, y = 65, size = 3.5, fontface="bold") +
annotate("text", label = "**", x = 5, y = 60, size = 3.5, fontface="bold")
```