Invasive range calls were taken through a pre-processing workflow similar to previously published native range calls[1]. Previously published invasive range calls for 2004 were provided by Susannah Buhrman-Deever[2], and were pre-processed in the same way. See the appendix of the associated article in Animal Behavior for details on pre-processing steps. The package warbleR[3] was used for pre-processing, making extended selection tables and spectrograms, and calculating acoustic measurements for downstream analyses. Some function names and other code in this script (and subsequent scripts) may need to be updated in order to reproduce results with more recent versions of the packages used. Here, we generated summary statistics of the combined native and invasive range datasets, then analyzed differences in nest and social group size estimates between ranges.

Install packages needed to run analyses in this script and the subsequent 3 scripts. Packages MASS, parallel, and grid are provided with the base R installation.

# Linux users may need to run the following terminal commands to install devtools and CRAN packages: 
# sudo apt-get install libcurl4-openssl-dev 
# sudo apt-get install libssl-dev
# sudo apt-get install libxml2-dev
# sudo apt install libgeos-dev
# sudo apt install libgdal-dev
# sudo apt install libudunits2-dev

# Install packages from CRAN
# Note that until orddom is back on CRAN, you may have to download an earlier version of the package from source
X <- c("ggplot2", "pbapply", "data.table", "tidyverse", "scales", "egg", "ggplotify", "grid", "rgdal", "ranger", "caret", "rgeos", "Rmisc", "Rraven", "lubridate", "knitr", "coin", "cowplot", "dplyr", "corrplot", "MLmetrics", "e1071", "gbm", "pdp", "pracma", "gridExtra", "facetscales", "grDevices", "gtable", "rstatix", "orddom", "IDmeasurer", "utils")

is_installed <- function(p) is.element(p, installed.packages()[,1])

lapply(1:length(X), function(x){
  if(!is_installed(X[x])){
    install.packages(X[x], repos = "http://lib.stat.cmu.edu/R/CRAN")
  }
})

# Install the most recent version of the warbleR package from GitHub
# Install devtools if not installed
if (!"devtools" %in% installed.packages()[,"Package"]) install.packages("devtools")

# Linux users may have to install fftw:
# sudo apt-get install libfftw3-dev libfftw3-doc
# as well as cmake for bioacoustics to be installed:
# sudo apt-get install cmake
library(devtools)
install_github("maRce10/warbleR")
install_github("zmjones/edarf", subdir = "pkg")

# facetscales has a bug when making boxplots, we did not use this package to make boxplots
install_github("zeehio/facetscales")

# To install a previous version of orddom from source:

# May need to install package psych from source depending on R version
pkgurl <- "https://cran.r-project.org/src/contrib/Archive/psych/psych_2.0.12.tar.gz"
install.packages(pkgurl, repos = NULL, type="source")

pkgurl <- "https://cran.r-project.org/src/contrib/Archive/orddom/orddom_3.1.tar.gz"
install.packages(pkgurl, repos = NULL, type="source")

Load packages and external functions.

rm(list = ls())

X <- c("warbleR", "pbapply", "Rraven", "tidyverse", "ggplot2", "data.table", "lubridate", "knitr", "coin", "rgdal", "rgeos", "cowplot", "egg", "ggplotify")

invisible(lapply(X, library, character.only = TRUE))

path <- "/media/gsvidaurre/MYIOPSITTA/R/VocalLearning_PostInvasion/Data"
gpath <- "/home/gsvidaurre/Desktop/MANUSCRIPTS/SimplerSignatures_PostInvasion/FIGURES"

seed <- 401

cores <- parallel::detectCores() - 2

# Call R code to make raincloud plots, downloaded from https://github.com/RainCloudPlots/RainCloudPlots
# Loading these scripts will install packages if these are not already on your machine
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/R_rainclouds.R")
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/summarySE.R")
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/simulateData.R")

Read in the extended selection table that contains metadata and wave objects for pre-processed native and invasive range calls. Throughout this script, you can reproduce analyses by reading in the file “nat_inv_indiv_site_seltbl.csv”, which is a selection table that contains the metadata employed here.

nat_inv_est <- readRDS(file.path(path, "nat_inv_indiv_site_EST.RDS"))
glimpse(nat_inv_est)
## Rows: 1,596
## Columns: 26
## $ sound.files          <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017…
## $ selec                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ start                <dbl> 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0…
## $ end                  <dbl> 0.2646363, 0.2310324, 0.2509257, 0.2378781, 0.220…
## $ date                 <date> 2017-07-17, 2017-07-17, 2017-07-17, 2017-07-17, …
## $ range                <fct> Native, Native, Native, Native, Native, Native, N…
## $ social_scale         <chr> "Individual", "Individual", "Individual", "Indivi…
## $ site                 <chr> "EMBR", "EMBR", "EMBR", "EMBR", "EMBR", "EMBR", "…
## $ Bird_ID              <chr> "NAT-RAW", "NAT-RAW", "NAT-RAW", "NAT-RAW", "NAT-…
## $ lat                  <dbl> -34.44437, -34.44437, -34.44437, -34.44437, -34.4…
## $ lon                  <dbl> -57.72845, -57.72845, -57.72845, -57.72845, -57.7…
## $ Visual_quality_score <fct> H, M, H, H, M, M, M, H, M, H, H, H, M, M, H, H, H…
## $ Overlapping_signal   <fct> N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N…
## $ Truncated            <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",…
## $ tailored             <fct> y, y, y, y, y, y, y, y, y, y, y, y, y, y, y, y, y…
## $ prev_SNR             <dbl> 20.10350, 14.90409, 18.55063, 18.49727, 18.74672,…
## $ SNR_margin_works     <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
## $ SNR_before_only      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ old.sound.file.name  <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017…
## $ SNR                  <dbl> 23.45415, 22.14815, 15.36660, 19.25200, 21.86770,…
## $ region               <chr> "West", "West", "West", "West", "West", "West", "…
## $ country              <chr> "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Urug…
## $ site_year            <chr> "EMBR_2017", "EMBR_2017", "EMBR_2017", "EMBR_2017…
## $ year                 <chr> "2017", "2017", "2017", "2017", "2017", "2017", "…
## $ dept_state           <chr> "Colonia", "Colonia", "Colonia", "Colonia", "Colo…
## $ invasive_city        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# Write out selection table for sharing data (not in extended format, just a .csv)
# nat_inv_est %>%
#   as_tibble %>%
#   write.csv(., file.path(path, "nat_inv_indiv_site_seltbl.csv"), row.names = FALSE)

Repeatedly sampled individuals

Total number of repeatedly sampled individual calls between ranges and by range.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  dplyr::count() %>%
  kable(align = rep("c", ncol(.)))
n
229
nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  group_by(range) %>%
  dplyr::count() %>%
  kable(align = rep("c", ncol(.)))
range n
Native 97
Invasive 132

Number of known repeatedly sampled individuals across ranges.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  group_by(range) %>%
  dplyr::summarise(count = n_distinct(Bird_ID)) %>%
  kable(align = rep("c", ncol(.)))
range count
Native 8
Invasive 9

Number of sites at which we sampled known repeatedly sampled individuals across ranges.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  group_by(range) %>%
  dplyr::summarise(count = n_distinct(site)) %>%
  kable(align = rep("c", ncol(.)))
range count
Native 3
Invasive 7

Number of calls per known repeatedly sampled individual across ranges, and recording dates. Used for a table in the appendix.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  group_by(range, Bird_ID, lat, lon, site, dept_state) %>%
  dplyr::count() %>%
  arrange(range, site) %>%
  inner_join(
    nat_inv_est %>%
      as_tibble() %>%
      filter(social_scale == "Individual") %>%
      group_by(range, Bird_ID) %>%
      distinct(date),
    by = c("range", "Bird_ID")
  ) %>%
  ungroup() %>%
  dplyr::mutate(
    lat = round(lat, 3),
    lon = round(lon, 3)
  ) %>%
  dplyr::select(Bird_ID, site, dept_state, lat, lon, n, date) %>%
  kable(align = rep("c", ncol(.)))
Bird_ID site dept_state lat lon n date
NAT-AAT 1145 Colonia -34.376 -57.500 12 2017-07-29
NAT-UM1 1145 Colonia -34.375 -57.502 25 2017-07-28
NAT-UM2 1145 Colonia -34.375 -57.502 23 2017-07-24
NAT-UM3 1145 Colonia -34.375 -57.502 5 2017-07-24
NAT-UM4 1145 Colonia -34.376 -57.500 13 2017-07-26
NAT-UM5 CHAC Colonia -34.413 -57.843 7 2017-08-21
NAT-RAW EMBR Colonia -34.444 -57.728 4 2017-07-17
NAT-ZW8 EMBR Colonia -34.444 -57.728 8 2017-07-21
INV-UM6 ASCA Texas 31.754 -106.405 25 2019-03-10
INV-UM1 BART Texas 30.305 -97.695 23 2011-02-15
INV-UM19 CAME Louisiana 30.022 -90.065 12 2004-03-30
INV-UM7 ELEM Texas 30.260 -97.718 28 2019-08-06
INV-UM9 ELEM Texas 30.260 -97.718 5 2019-08-06
INV-UM10 INTR Texas 30.317 -97.728 6 2019-08-08
INV-UM5 ROBE Louisiana 30.021 -90.069 20 2011-02-18
INV-UM16 SOCC Texas 30.270 -97.761 8 2019-08-09
INV-UM17 SOCC Texas 30.270 -97.761 5 2019-08-09

Median and range of the number of calls per known repeatedly sampled individual across ranges.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  group_by(range, Bird_ID) %>%
  dplyr::summarise(num_calls = length(Bird_ID)) %>%
  dplyr::summarise(
    median_calls = round(median(num_calls), 2),
    range_calls = paste(round(range(num_calls), 2), collapse = " - ")
  ) %>%
  kable(align = rep("c", ncol(.)))
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
range median_calls range_calls
Native 10 4 - 25
Invasive 12 5 - 28

Median and range of signal to noise ratio (SNR) across ranges.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Individual") %>%
  group_by(range) %>%
  filter(!is.na(SNR)) %>%
  dplyr::summarise(
    median_SNR = round(median(SNR), 2),
    range_SNR = paste(round(range(SNR), 2), collapse = " - ")
  ) %>%
  kable(align = rep("c", ncol(.)))
range median_SNR range_SNR
Native 16.80 9.69 - 26.6
Invasive 17.23 7.5 - 29.11

Call dataset of broader geographic resolution

As described in the main text of the associated publication, this larger dataset contained one call per unique individual (most birds recorded for this dataset were unmarked). Get the total number of sites sampled across ranges for this dataset.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  dplyr::summarise(count = n_distinct(site_year)) %>%
  kable(align = rep("c", ncol(.)))
count
63

Number of site-years across ranges (e.g. unique sites).

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  group_by(range) %>%
  dplyr::summarise(count = n_distinct(site_year)) %>%
  kable(align = rep("c", ncol(.)))
range count
Native 37
Invasive 26

Number of sites across ranges and years.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  group_by(range, year) %>%
  dplyr::summarise(count = n_distinct(site_year)) %>%
  kable(align = rep("c", ncol(.)))
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
range year count
Native 2017 37
Invasive 2004 11
Invasive 2011 8
Invasive 2018 1
Invasive 2019 6

Total number of calls per site-year across ranges.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  group_by(range) %>%
  dplyr::summarise(num_calls = length(range)) %>%
  kable(align = rep("c", ncol(.)))
range num_calls
Native 610
Invasive 757

Median and range of the number of calls per site-year.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  group_by(range, site_year) %>%
  dplyr::summarise(num_calls = length(site_year)) %>%
  dplyr::summarise(
    median_calls = round(median(num_calls), 2),
    range_calls = paste(round(range(num_calls), 2), collapse = " - ")
  ) %>%
  kable(align = rep("c", ncol(.)))
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
range median_calls range_calls
Native 15.0 5 - 53
Invasive 15.5 5 - 93

Median and range of SNR of site-years across ranges, by range.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  group_by(range) %>%
  filter(!is.na(SNR)) %>%
  dplyr::summarise(
    median_SNR = round(median(SNR), 2),
    range_SNR = paste(round(range(SNR), 2), collapse = " - ")
  ) %>%
  kable(align = rep("c", ncol(.)))
range median_SNR range_SNR
Native 18.77 7.98 - 37.19
Invasive 13.48 7.02 - 33.96

Number of calls, sampling dates and geographic coordinates per site-year across ranges. Used for appendix tables as well.

nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  group_by(range, site, year, dept_state, invasive_city, lat, lon) %>%
  distinct(date) %>%
  arrange(range, site, dept_state, year) %>%
  ungroup() %>%
  inner_join(
    nat_inv_est %>%
      as_tibble() %>%
      filter(social_scale == "Site") %>%
      group_by(range, site, date, lat, lon) %>%
      dplyr::summarise(
        n_calls = length(sound.files)
      ),
  by = c("range", "site", "date", "lat", "lon")
  ) %>%
  dplyr::mutate(
    lat = round(lat, 3),
    lon = round(lon, 3)
  ) %>%
  dplyr::select(range, site, dept_state, invasive_city, lat, lon, n_calls, date) %>%
  kable(align = rep("c", ncol(.)))
## `summarise()` has grouped output by 'range', 'site', 'date', 'lat'. You can override using the `.groups` argument.
range site dept_state invasive_city lat lon n_calls date
Native 1145 Colonia NA -34.375 -57.502 6 2017-07-26
Native 1145 Colonia NA -34.376 -57.500 7 2017-07-26
Native 1145 Colonia NA -34.375 -57.502 2 2017-07-24
Native 1145 Colonia NA -34.375 -57.502 1 2017-07-28
Native 1145 Colonia NA -34.376 -57.500 1 2017-07-29
Native ARAP Salto NA -30.946 -57.520 12 2017-05-07
Native ARAZ San Jose NA -34.535 -56.812 15 2017-11-03
Native BAGU Montevideo NA -34.848 -56.384 20 2017-10-09
Native BCAR Montevideo NA -34.788 -56.223 13 2017-10-20
Native CEME Montevideo NA -34.913 -56.187 6 2017-10-18
Native CHAC Colonia NA -34.413 -57.843 12 2017-08-21
Native CISN Maldonado NA -34.861 -55.150 28 2017-09-13
Native ECIL San Jose NA -34.361 -57.060 4 2017-07-28
Native ECIL San Jose NA -34.360 -57.060 13 2017-07-28
Native ELTE Maldonado NA -34.889 -54.863 23 2017-09-13
Native EMBR Colonia NA -34.444 -57.728 22 2017-07-21
Native EMBR Colonia NA -34.444 -57.728 1 2017-07-17
Native FAGR Montevideo NA -34.838 -56.219 7 2017-09-05
Native GOLF Montevideo NA -34.923 -56.164 22 2017-11-20
Native HIPE Maldonado NA -34.825 -55.010 5 2017-09-12
Native INBR Canelones NA -34.668 -56.330 19 2017-09-03
Native INES-01 Colonia NA -34.349 -57.727 12 2017-07-03
Native INES-03 Colonia NA -34.336 -57.668 15 2017-07-11
Native INES-04 Colonia NA -34.335 -57.668 9 2017-07-11
Native INES-05 Colonia NA -34.340 -57.690 6 2017-07-15
Native INES-06 Colonia NA -34.344 -57.708 6 2017-07-13
Native INES-07 Colonia NA -34.346 -57.710 9 2017-07-13
Native INES-08 Colonia NA -34.345 -57.733 27 2017-07-13
Native KIYU San Jose NA -34.607 -56.715 8 2017-11-03
Native LENA Colonia NA -34.411 -57.838 19 2017-10-23
Native OJOS Rocha NA -33.804 -53.506 23 2017-11-16
Native PAVO San Jose NA -34.442 -56.967 25 2017-10-17
Native PEIX Montevideo NA -34.765 -56.279 19 2017-10-06
Native PFER Colonia NA -34.468 -57.831 34 2017-06-19
Native PFER Colonia NA -34.465 -57.827 19 2017-06-21
Native PIED Colonia NA -34.413 -57.849 21 2017-10-25
Native PLVE Maldonado NA -34.870 -55.264 11 2017-05-21
Native PROO Canelones NA -34.855 -56.022 12 2017-09-14
Native QUEB Maldonado NA -34.834 -55.260 16 2017-09-13
Native RIAC Colonia NA -34.437 -57.706 17 2017-06-28
Native RIAC Colonia NA -34.436 -57.706 8 2017-06-28
Native ROSA Colonia NA -34.338 -57.336 15 2017-07-27
Native SAUC Maldonado NA -34.857 -55.041 6 2017-09-12
Native SEMI Colonia NA -34.326 -57.680 11 2017-07-25
Native VALI Rocha NA -34.334 -53.803 23 2017-11-16
Invasive AIRP Texas Austin 30.285 -97.705 9 2019-08-07
Invasive AUDU Connecticut Milford 41.176 -73.102 17 2004-03-30
Invasive BALL Louisiana New Orleans 30.013 -90.132 26 2004-03-30
Invasive BAPT Florida Miami 25.688 -80.338 40 2004-03-30
Invasive BUCK Connecticut Milford 41.217 -73.038 60 2004-03-30
Invasive CANA Louisiana New Orleans 29.981 -90.110 13 2004-03-30
Invasive COMM Texas Austin 30.404 -97.705 11 2004-03-30
Invasive ELEM Texas Austin 30.260 -97.718 12 2011-02-15
Invasive ELEM Texas Austin 30.260 -97.718 61 2019-08-06
Invasive FOLS Louisiana New Orleans 30.027 -90.205 10 2004-03-30
Invasive GILB Arizona Gilbert 33.331 -111.791 16 2018-04-09
Invasive INTR Texas Austin 30.316 -97.719 15 2011-02-15
Invasive INTR Texas Austin 30.317 -97.728 82 2019-08-08
Invasive LAKE Louisiana New Orleans 30.029 -90.077 6 2011-02-18
Invasive LAWT Texas Dallas 32.820 -96.730 9 2011-02-20
Invasive MANO Texas Austin 30.299 -97.686 5 2019-08-09
Invasive MART Texas Austin 30.253 -97.731 14 2011-02-15
Invasive MART Texas Austin 30.251 -97.731 50 2019-08-10
Invasive MEAD Connecticut Milford 41.210 -73.071 28 2004-03-30
Invasive ROBE Louisiana New Orleans 30.021 -90.069 24 2011-02-18
Invasive SHAK Connecticut Stratford 41.184 -73.126 50 2004-03-30
Invasive SOCC Texas Austin 30.272 -97.767 77 2004-03-30
Invasive SOCC Texas Austin 30.270 -97.761 93 2019-08-09
Invasive SOFT Texas Austin 30.281 -97.725 14 2011-02-15
Invasive VALL Texas Austin 30.261 -97.711 5 2004-03-30
Invasive VALL Texas Austin 30.261 -97.711 10 2011-02-15

Nest estimates

Read in nest estimates to get nest density estimates at recording sites per range. Used for a table in the appendix.

nests <- read.csv(file.path(path, "NestEstimates.csv")) %>%
  dplyr::select(c(Range, Year, Department_orCityState, Site_Code, Estimated_Nests))

Get median and interquartile range of number of estimated nests per range.

nests %>%
  group_by(Range) %>%
  dplyr::mutate(
    Range = factor(Range, levels = c("Native", "Invasive"))
  ) %>%
  dplyr::summarise(
    median_nests = round(median(Estimated_Nests), 2),
    IQR_nests = round(IQR(Estimated_Nests), 2)
  ) %>%
  kable(align = rep("c", ncol(.)))
Range median_nests IQR_nests
Native 19.5 24.25
Invasive 4.0 6.00

Were nest number estimates significantly different between ranges? First check normality of nest data using histograms and a Shapiro-Wilks test.

hist(nests %>%
  pull(Estimated_Nests))

shapiro.test(nests %>%
  pull(Estimated_Nests))
## 
##  Shapiro-Wilk normality test
## 
## data:  nests %>% pull(Estimated_Nests)
## W = 0.45358, p-value = 1.39e-10

The nest data was skewed, so I asked whether the population distributions were identical using a Mann-Whitney-Wilcoxon test (Wilcoxon rank sum test). This test does not assume the data are normally distributed, but does assume the samples are independent. In order to meet the assumption that samples are independent of each other, I dropped some invasive range site-years for which we had nest estimates in more than one year.

# 4 invasive range sites had counts for 2 years
n2drop <- nests %>%
  group_by(Site_Code) %>%
  dplyr::summarise(n = n()) %>%
  filter(n > 1) %>%
  pull(Site_Code) %>%
  as.character()

n2drop
## [1] "AIRP" "ELEM" "INTR" "MART"
# Drop the 2011 estimates for these sites
nests2 <- nests %>%
  dplyr::mutate(site_year = paste(Site_Code, Year, sep = "-")) %>%
  filter(!site_year %in% paste(n2drop, "2011", sep = "-")) %>%
  droplevels() %>%
  dplyr::mutate(
    Range = factor(Range, levels = c("Native", "Invasive"))
  )
nrow(nests) - nrow(nests2)
## [1] 4
# Checking, looks good
# nests2 %>%
#   group_by(Site_Code) %>%
#   dplyr::summarise(n = n()) %>%
#   pull(n) %>%
#   unique()

# Still non-normal after samples above were dropped
shapiro.test(nests2 %>%
  pull(Estimated_Nests))
## 
##  Shapiro-Wilk normality test
## 
## data:  nests2 %>% pull(Estimated_Nests)
## W = 0.46941, p-value = 8.462e-10

The median and interquartile range of these filtered data per range were the same as the full dataset.

nests2 %>%
  group_by(Range) %>%
  dplyr::summarise(
    median_nests = round(median(Estimated_Nests), 2),
    IQR_nests = round(IQR(Estimated_Nests), 2)
  ) %>%
  kable(align = rep("c", ncol(.)))
Range median_nests IQR_nests
Native 19.5 24.25
Invasive 4.0 6.00

Mann-Whitney test with independent nest estimates. Computed exact p-values and 95% CIs to avoid using normal approximations. stats::wilcox.test uses the normal approximation for data with ties, as is the case here. Used coin::wilcox_test instead, which handles data with ties and can either get the conditional null distribution by Monte Carlo resampling, or the exact distribution. The confidence intervals are for the difference in location reported below.

levels(nests2$Range)
## [1] "Native"   "Invasive"
mw_res <- coin::wilcox_test(Estimated_Nests ~ Range, data = nests2, conf.int = TRUE, conf.level = 0.95, ties.method = "mid-ranks", alternative = "two.sided", distribution = "exact")

mw_res
## 
##  Exact Wilcoxon-Mann-Whitney Test
## 
## data:  Estimated_Nests by Range (Native, Invasive)
## Z = 4.2055, p-value = 2.872e-06
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##   7 26
## sample estimates:
## difference in location 
##                     14

Nest densities

Read in shapefiles for nesting sites obtained by tracing site areas on Google My Maps.

# Read in polygons, no z-dimension saved but that doesn't matter here
nest_site_polys <- readOGR(file.path(path, "URY_US_Polygons.kml"), "URY_US_Polygons")
## OGR data source with driver: LIBKML 
## Source: "/media/gsvidaurre/MYIOPSITTA/R/VocalLearning_PostInvasion/Data/URY_US_Polygons.kml", layer: "URY_US_Polygons"
## with 25 features
## It has 11 fields
## Warning in readOGR(file.path(path, "URY_US_Polygons.kml"), "URY_US_Polygons"):
## Z-dimension discarded
# str(nest_site_polys)
length(nest_site_polys)
## [1] 25
# Repoject to EPSG 3857, a projected coordinate system used by Google Maps that has units in meters
nest_site_polys <- spTransform(nest_site_polys, CRSobj = CRS("+init=epsg:3857"))

# Looks good
proj4string(nest_site_polys)
## [1] "+init=epsg:3857 +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"
bbox(nest_site_polys)
##         min      max
## x -12444548 -6123569
## y  -4146383  3939279
# Get area in meters squared per polyon
ns_areas <- gArea(nest_site_polys, byid = TRUE)
ns_areas 
##           1           2           3           4           5           6 
## 320484.6326   4799.3819   7724.8855  31521.7993   4357.3265   8459.8991 
##           7           8           9          10          11          12 
##   5254.0449  13691.6860  11709.7432  45114.1232   7733.4930   3597.5193 
##          13          14          15          16          17          18 
## 214495.9551  12704.6108  11368.1638   8720.9933   3764.1841    701.7397 
##          19          20          21          22          23          24 
##  10549.7286   2819.3939  21274.8657    670.4065  28646.5755   6449.2127 
##          25 
##   2228.0292
# Convert to km squared (1 meter squared = 1e-6 km squared)
# ns_km2 <- ns_areas/1e6
# ns_km2

# Convert to hectares (1 square meter = 0.0001 hectares)
ns_hect <- ns_areas/1e4

# Check out nest site names
# nest_site_polys$Name

# Convert to character and replace the new line symbol in PLVE
# Doesn't disappear after correction on Google My Maps
nest_site_polys$Name <- as.character(nest_site_polys$Name)
nest_site_polys$Name
##  [1] "INTR_2011"    "CHAC_2017"    "PIED_2017"    "RIAC_2017"    "SEMI_2017"   
##  [6] "INES-08_2017" "INES-07_2017" "INES-06_2017" "INES-03_2017" "1145_2017"   
## [11] "INBR_2017"    "BCAR_2017"    "ECIL_2017"    "PLVE_2017"    "QUEB_2017"   
## [16] "CISN_2017"    "HIPE_2017"    "GILB_2018"    "INTR_2019"    "AIRP_2019"   
## [21] "SOCC_2019"    "MANO_2019"    "MART_2019"    "SOFT_2011"    "LAWT_2011"
# Add nest areas to the data frame with nest numbers
# Calculate nest density at each site (total nests / total area)
nests2 <- nests %>%
  dplyr::mutate(
    Site_Year = paste(Site_Code, Year, sep = "_")
  ) %>%
  left_join(
    data.frame(
      Site_Year = nest_site_polys$Name,
      Nest_Site_Area = ns_hect
    ) %>%
    # Use the same nest site polygon for MART 2011 and 2019
    bind_rows(
      data.frame(
        Site_Year = "MART_2011",
        Nest_Site_Area = ns_hect[grep("MART", nest_site_polys$Name)]
      )
    ),
    by = "Site_Year"
  ) %>%
  dplyr::mutate(
    Nest_Density = Estimated_Nests/Nest_Site_Area
  ) 

nests2 %>%
  dplyr::mutate(
    Nest_Site_Area = round(Nest_Site_Area, 2),
    Nest_Density = round(Nest_Density, 2)
  ) %>%
  dplyr::select(c(Range, Year, Department_orCityState, Site_Code, Estimated_Nests, Nest_Site_Area, Nest_Density)) %>%
  kable(align = rep("c", ncol(.)))
Range Year Department_orCityState Site_Code Estimated_Nests Nest_Site_Area Nest_Density
Native 2017 Maldonado PLVE 10 1.27 7.87
Native 2017 Colonia RIAC 109 3.15 34.58
Native 2017 San Jose ECIL 247 21.45 11.52
Native 2017 Colonia INES-01 10 NA NA
Native 2017 Colonia SEMI 29 0.44 66.55
Native 2017 Colonia INES-03 50 1.17 42.70
Native 2017 Colonia INES-07 15 0.53 28.55
Native 2017 Colonia INES-06 20 1.37 14.61
Native 2017 Colonia INES-08 25 0.85 29.55
Native 2017 Colonia INES-05 6 NA NA
Native 2017 Colonia 1145 8 4.51 1.77
Native 2017 Colonia ROSA 41 NA NA
Native 2017 Colonia CHAC 19 0.48 39.59
Native 2017 Canelones INBR 20 0.77 25.86
Native 2017 Montevideo BCAR 33 0.36 91.73
Native 2017 Maldonado HIPE 15 0.38 39.85
Native 2017 Maldonado QUEB 10 1.14 8.80
Native 2017 Maldonado CISN 9 0.87 10.32
Native 2017 Colonia PIED 38 0.77 49.19
Native 2017 Rocha VALI 13 NA NA
Invasive 2018 Gilbert AZ GILB 3 0.07 42.75
Invasive 2019 Austin TX INTR 13 1.05 12.32
Invasive 2019 Austin TX ELEM 1 NA NA
Invasive 2019 Austin TX AIRP 5 0.28 17.73
Invasive 2019 Austin TX SOCC 12 2.13 5.64
Invasive 2019 Austin TX MANO 4 0.07 59.67
Invasive 2019 Austin TX MART 8 2.86 2.79
Invasive 2011 Austin TX MART 6 2.86 2.09
Invasive 2011 Austin TX VALL 2 NA NA
Invasive 2011 Austin TX ELEM 4 NA NA
Invasive 2011 Austin TX SOFT 6 0.64 9.30
Invasive 2011 Austin TX AIRP 2 NA NA
Invasive 2011 Austin TX BART 1 NA NA
Invasive 2011 Austin TX INTR 20 32.05 0.62
Invasive 2011 New Orleans LA ROBE 8 NA NA
Invasive 2011 New Orleans LA LAKE 2 NA NA
Invasive 2011 Dallas TX LAWT 4 0.22 17.95
# Summary statistics
nests2 %>%
  dplyr::mutate(
    Nest_Site_Area = round(Nest_Site_Area, 2),
    Nest_Density = round(Nest_Density, 2)
  ) %>%
  dplyr::select(c(Range, Estimated_Nests, Nest_Site_Area, Nest_Density)) %>%
  pivot_longer(
    cols = c("Estimated_Nests", "Nest_Site_Area", "Nest_Density"),
    names_to = "statistic",
    values_to = "values"
  ) %>%
  dplyr::mutate(
    Range = factor(Range, levels = c("Native", "Invasive")),
    statistic = factor(statistic, levels = c("Estimated_Nests", "Nest_Site_Area", "Nest_Density"))
  ) %>%
  group_by(Range, statistic) %>%
  dplyr::summarise(
    median_nests = round(median(values, na.rm = TRUE), 2),
    IQR_nests = round(IQR(values, na.rm = TRUE), 2)
  ) %>%
  kable(align = rep("c", ncol(.)))
## `summarise()` has grouped output by 'Range'. You can override using the `.groups` argument.
Range statistic median_nests IQR_nests
Native Estimated_Nests 19.50 24.25
Native Nest_Site_Area 0.86 0.78
Native Nest_Density 29.05 29.34
Invasive Estimated_Nests 4.00 6.00
Invasive Nest_Site_Area 0.84 2.44
Invasive Nest_Density 10.81 14.39

Statistical tests of the difference in nest site area and density per range.

# Only 2 invasive range sites (INTR, MART) had counts for 2 years, as expected
n2drop <- nests2 %>%
  # Remove sites for which area was not obtained
  filter(!is.na(Nest_Site_Area)) %>%
  group_by(Site_Code) %>%
  dplyr::summarise(n = n()) %>%
  filter(n > 1) %>%
  pull(Site_Code) %>%
  as.character()

n2drop
## [1] "INTR" "MART"
# Drop the 2011 estimates for these sites
nests3 <- nests2 %>%
  # Remove sites for which area was not obtained
  filter(!is.na(Nest_Site_Area)) %>%
  dplyr::mutate(site_year = paste(Site_Code, Year, sep = "-")) %>%
  filter(!site_year %in% paste(n2drop, "2011", sep = "-")) %>%
  droplevels() %>%
  dplyr::mutate(
    Range = factor(Range, levels = c("Native", "Invasive"))
  )
nrow(nests2) - nrow(nests3)
## [1] 13
# Checking, looks good
# nests3 %>%
#   group_by(Site_Code) %>%
#   dplyr::summarise(n = n()) %>%
#   pull(n) %>%
#   unique()

16 native and 8 invasive range sites remain for statistical tests of differences in nest site area and density. Sample sizes were not matched between ranges.

nests3 %>%
  group_by(Range) %>%
  dplyr::summarise(n = n())
## # A tibble: 2 x 2
##   Range        n
##   <fct>    <int>
## 1 Native      16
## 2 Invasive     8

Check normality of nest site areas using histograms and Shapiro-Wilks test.

hist(nests3 %>%
  pull(Nest_Site_Area))

shapiro.test(nests3 %>%
  pull(Nest_Site_Area))
## 
##  Shapiro-Wilk normality test
## 
## data:  nests3 %>% pull(Nest_Site_Area)
## W = 0.39782, p-value = 6.579e-09

Repeated for nest densities.

hist(nests3 %>%
  pull(Nest_Density))

shapiro.test(nests3 %>%
  pull(Nest_Density))
## 
##  Shapiro-Wilk normality test
## 
## data:  nests3 %>% pull(Nest_Density)
## W = 0.89756, p-value = 0.01912

Both nest variables were skewed (not normally distributed). As for total nests above, I asked whether the population distributions per nest variable were identical using a Mann-Whitney-Wilcoxon test (Wilcoxon rank sum test) as above.

levels(nests3$Range)
## [1] "Native"   "Invasive"
mw_nestArea_res <- coin::wilcox_test(Nest_Site_Area ~ Range, data = nests3, conf.int = TRUE, conf.level = 0.95, ties.method = "mid-ranks", alternative = "two.sided", distribution = "exact")

mw_nestArea_res
## 
##  Exact Wilcoxon-Mann-Whitney Test
## 
## data:  Nest_Site_Area by Range (Native, Invasive)
## Z = 1.4697, p-value = 0.1531
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -0.2851693  1.0666424
## sample estimates:
## difference in location 
##              0.3671253
mw_nestDensity_res <- coin::wilcox_test(Nest_Density ~ Range, data = nests3, conf.int = TRUE, conf.level = 0.95, ties.method = "mid-ranks", alternative = "two.sided", distribution = "exact")

mw_nestDensity_res
## 
##  Exact Wilcoxon-Mann-Whitney Test
## 
## data:  Nest_Density by Range (Native, Invasive)
## Z = 0.91856, p-value = 0.3826
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -7.633161 28.938786
## sample estimates:
## difference in location 
##                8.42497

Social group sizes

We obtained the largest social group sizes observed during fieldwork in the native and invasive ranges. Some native range fieldwork observations were made during trapping for an independent study of genetic variation. All group sizes greater than or equal to 100 were set as estimates of 100 birds. This was an underestimate in most cases and was therefore more conservative. Group sizes were observed during staging, foraging, and flying by in each range.

sgs <- read.csv(file.path(path, "SocialGroupSize_Observations.csv"))
# glimpse(sgs)

# unique(sgs$Estimated_Flock_Size)
unique(sgs$Behavioral_Context)
## [1] "Staging; flying"             "Foraging"                   
## [3] "Foraging; staging; flying"   "Foraging; flying"           
## [5] "Perched"                     "Flying"                     
## [7] "Foraging in flock on ground" "Flock in stadium lights"
unique(sgs$Observer)
## [1] "GSV" "VP"  "TW"
# Set all estimates of 100 birds or more to a cap of 100
sgs2 <- sgs %>%
  dplyr::mutate(
    Estimated_Flock_Size = gsub("Several hundred|Over 100", "100", Estimated_Flock_Size), 
    Estimated_Flock_Size = gsub("Over |More than |About ", "", Estimated_Flock_Size),
    Estimated_Flock_Size = as.numeric(Estimated_Flock_Size),
    Range = factor(Range, levels = c("Native", "Invasive"))
  )

glimpse(sgs2)
## Rows: 14
## Columns: 14
## $ Row_ID               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
## $ Range                <fct> Native, Native, Native, Native, Native, Native, N…
## $ Country              <chr> "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Urug…
## $ Dept_State           <chr> "Colonia", "Colonia", "Colonia", "Colonia", "Colo…
## $ Estimated_Flock_Size <dbl> 48, 100, 25, 50, 50, 100, 100, 100, 15, 30, 30, 3…
## $ Behavioral_Context   <chr> "Staging; flying", "Foraging", "Foraging; staging…
## $ Observer             <chr> "GSV", "GSV", "GSV", "GSV", "GSV", "GSV", "GSV", …
## $ Year                 <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2…
## $ Date                 <chr> "04-June-2017", "04-June-2017", "05-June-2017", "…
## $ Site_Code            <chr> "CHAC", "DOMA", "DOMA", "DOMA", "DOMA", "ELGE", "…
## $ Site_Name            <chr> "La Chacra de los Olivos", "Estancia Dona Manuela…
## $ Notebook             <chr> "GSV Notebook 1", "GSV Notebook 1", "GSV Notebook…
## $ Notebook_page        <int> 8, 9, 11, 16, 21, 23, 42, 74, 8, 20, 25, 21, 25, …
## $ Notes                <chr> "", "", "", "", "", "", "", "", "", "", "", "Scan…
sgs2 %>%
  dplyr::select(Range, Country, Dept_State, Estimated_Flock_Size, Behavioral_Context, Date, Site_Code) %>%
  kable(align = rep("c", ncol(.)))
Range Country Dept_State Estimated_Flock_Size Behavioral_Context Date Site_Code
Native Uruguay Colonia 48 Staging; flying 04-June-2017 CHAC
Native Uruguay Colonia 100 Foraging 04-June-2017 DOMA
Native Uruguay Colonia 25 Foraging; staging; flying 05-June-2017 DOMA
Native Uruguay Colonia 50 Foraging; staging; flying 07-June-2017 DOMA
Native Uruguay Colonia 50 Foraging 08-June-2017 DOMA
Native Uruguay Colonia 100 Foraging; flying 08-June-2017 ELGE
Native Uruguay Colonia 100 Foraging; flying 12-June-2017 ELGE
Native Uruguay Colonia 100 Foraging 19-June-2017 SOLE
Invasive U.S. TX 15 Perched 06-August-2019 INTR
Invasive U.S. TX 30 Flying 09-August-2019 SOCC
Invasive U.S. TX 30 Flying 10-August-2019 MART
Invasive U.S. TX 37 Foraging in flock on ground 15-February-2011 MART
Invasive U.S. TX 30 Foraging in flock on ground 15-February-2011 INTR
Invasive U.S. LA 14 Flock in stadium lights 17-February-2011 JEFF

Calculate median and interquartile range of the largest social groups estimated in each range.

sgs2 %>%
  group_by(Range) %>%
  dplyr::summarise(
    median_vals = median(Estimated_Flock_Size),
    IQR_vals = IQR(Estimated_Flock_Size)
  )
## # A tibble: 2 x 3
##   Range    median_vals IQR_vals
##   <fct>          <dbl>    <dbl>
## 1 Native            75     50.5
## 2 Invasive          30     11.2

Were estimated flock sizes equivalent between ranges? Checked normality of social group size data from our fieldwork in each range.

hist(sgs2 %>%
  pull(Estimated_Flock_Size))

shapiro.test(sgs2 %>%
  pull(Estimated_Flock_Size))
## 
##  Shapiro-Wilk normality test
## 
## data:  sgs2 %>% pull(Estimated_Flock_Size)
## W = 0.81595, p-value = 0.007915

The social group size data was not normally distributed, and some flock sizes were obtained at the same sites in the same or different years. I retained one unique flock size estimate per site, then asked whether the population distributions were identical using a Mann-Whitney-Wilcoxon test (Wilcoxon rank sum test).

# 4 observations should remain per range
unique(as.character(sgs2$Site_Code))

# Randomly select one observation per site
set.seed(seed)
sgs3 <- sgs2 %>%
  dplyr::select(-c(Notebook, Notebook_page, Notes), Site_Name) %>%
  group_by(Site_Code) %>%
  nest() %>%
  ungroup() %>%
  # Randomly sample 1 observation (if just 1, then this single observation will be returned) 
  dplyr::mutate(
    rsfs = purrr::map2(data, 1, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsfs)

glimpse(sgs3)

# Results I compute here on my local machine are different to those computed when knitting RMarkdown output. I think it has to do with setting the seed because the randomly sampled data points by RMarkdown are also different. I wrote out a new .csv with the randomly sampled social group size data to account for this
write.csv(sgs3, file.path(path, "SocialGroupSizes_randomSampling.csv"), row.names = FALSE)
sgs3 <- read.csv(file.path(path, "SocialGroupSizes_randomSampling.csv")) %>%
  dplyr::mutate(
    Range = factor(Range, levels = c("Native", "Invasive"))
  )

# Print the data frame
sgs3 %>%
  dplyr::select(Range, Country, Dept_State, Estimated_Flock_Size, Behavioral_Context, Date, Site_Code) %>%
  kable(align = rep("c", ncol(.)))
Range Country Dept_State Estimated_Flock_Size Behavioral_Context Date Site_Code
Native Uruguay Colonia 48 Staging; flying 04-June-2017 CHAC
Native Uruguay Colonia 50 Foraging 08-June-2017 DOMA
Native Uruguay Colonia 100 Foraging; flying 12-June-2017 ELGE
Native Uruguay Colonia 100 Foraging 19-June-2017 SOLE
Invasive U.S. TX 30 Foraging in flock on ground 15-February-2011 INTR
Invasive U.S. TX 30 Flying 09-August-2019 SOCC
Invasive U.S. TX 30 Flying 10-August-2019 MART
Invasive U.S. LA 14 Flock in stadium lights 17-February-2011 JEFF
# Perform the Mann-Whitney test with this reduced dataset to meet non-independence assumption
levels(sgs3$Range)
## [1] "Native"   "Invasive"
mw_flockSize_res <- coin::wilcox_test(Estimated_Flock_Size ~ Range, data = sgs3, conf.int = TRUE, conf.level = 0.95, ties.method = "mid-ranks", alternative = "two.sided", distribution = "exact")

mw_flockSize_res
## Warning in cci(alpha): cannot compute confidence interval
## 
##  Exact Wilcoxon-Mann-Whitney Test
## 
## data:  Estimated_Flock_Size by Range (Native, Invasive)
## Z = 2.3814, p-value = 0.02857
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  NA NA
## sample estimates:
## difference in location 
##                     53

The distributions of estimated flock sizes per range were not identical. Native range estimated flock sizes were significantly larger. Note that the 95% CI could not be computed, likely due to the small sample size per range.

Figure 1

Made raincloud plots and saved these to an image file for Figure 1. Added asterisks for variables that were significantly different between ranges.

# Combine total nests estimated, areas of nest sites, nest densities, estimated flock sizes per range
social_sumstats <- nests2 %>%
  dplyr::select(Range, Year, Site_Code, Estimated_Nests, Nest_Site_Area, Nest_Density) %>%
  full_join(
    sgs2 %>%
      dplyr::select(Range, Year, Site_Code, Estimated_Flock_Size),
    by = c("Range", "Site_Code", "Year")
  ) %>%
  pivot_longer(
    cols = c("Estimated_Nests", "Nest_Site_Area", "Nest_Density", "Estimated_Flock_Size"),
    names_to = "statistic",
    values_to = "values"
  ) %>%
  dplyr::mutate(
    Range = factor(Range, levels = c("Native", "Invasive")),
    # Rename summary statistics for plotting
    statistic = recode(
      statistic,
      `Estimated_Nests` = "Total nests",
      `Nest_Site_Area` = "Nest site area\n (hectares)",
      `Nest_Density` = "Nest density\n (nests/hectare)",
      `Estimated_Flock_Size` = "Maximum\n flock sizes" 
    ),
    statistic = factor(statistic, levels = c("Total nests", "Nest site area\n (hectares)", "Nest density\n (nests/hectare)", "Maximum\n flock sizes"))
  ) 
    
social_sumstats  


cols <- scales::alpha(c("navy", "orange"), 0.65)

# for superscript: `Nest_Site_Area` = "Nest~density~(nests/km^{2})
# with labeller = label_parsed inside facet_wrap

topms <- levels(social_sumstats$statistic)
topms

# Make a list of plots
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()

# i <- 3
invisible(pblapply(1:length(topms), function(i){
  
  tmp_df <- social_sumstats %>%
    # Drop NAs
    filter(!is.na(values)) %>%
    filter(statistic == topms[i]) %>%
    dplyr::mutate(
      statistic = as.character(statistic),
      statistic = factor(statistic, levels = unique(statistic))
    ) 
  
  # Get ymin and ymax values, add a buffer of 2 evenly spaced values before and after 
  ymin <- min(tmp_df$values)
  ymax <- max(tmp_df$values)
  
  # If the maximum value is over or equal to 90, set the max o 100 and use 25 as breaks for y-axis. If less than 90, use 10 as the breaks. Use this rule to set buffers too
  if(ymax >= 90 & ymax <= 110){
    ymax <- 100
    y_brks <- 25
    buf <- 10
  } else if(ymax > 110){
    y_brks <- 25
    buf <- 10
  } else if(ymax < 90){
    y_brks <- 10
    buf <- 10
  }
  
  # If on the first plot, add the y-axis label
  if(i == 1){
    yal <- "Values"
  } else{
    yal <- ""
  }
  
  # Initialize plot margins (top, right, bottom, left)
  tm <- 0 # -0.5
  rm <- 0
  bm <- 1
  lm <- 0
  
  # If on the first plot, make the left margin larger
  if(i == 1){
    lm <- 0.5
  # If on last plot, make the right margin larger
  } else if(i == length(topms)){
    rm <- 0.5
  }
  
  # Raincloud plot: Kernel smoothed with boxplot
  gg <- ggplot(tmp_df, aes(x = Range, y = values, fill = Range, color = Range)) + 
    geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.25)+
    geom_point(position = position_jitter(width = 0.05), size = 1, stroke = 0.5) +
    geom_boxplot(aes(x = as.numeric(Range) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.2, colour = "black", size = 0.25) +
    facet_wrap(~ statistic) +
    scale_fill_manual(values = cols) +
    scale_color_manual(values = cols) +
    guides(fill = FALSE, color = FALSE) +
    theme_bw() +
    xlab("") + ylab(yal) +
    scale_y_continuous(limits = round(c(0, ymax + buf)), breaks = round(seq(0, (ymax + buf), y_brks))) +
    theme(
      axis.title = element_text(size = 14),
      axis.text.y = element_text(size = 11),
      axis.text.x = element_text(size = 11),
      strip.text = element_text(size = 12, margin = margin(0.5, 1, 0.5, 1, "lines")),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      axis.ticks = element_line(size = 0.25),
      plot.margin = unit(c(tm, rm, bm, lm), "lines")
    )
  
  # If on total nests or social group size estimates, add an asterisk to denote statistical significance
  
  if(any(grepl("Total|flock", topms[i]))){
    
    gg <- gg +
      geom_text(aes(label = "*", x = 1.5, y = ymax - (ymax/10)), size = 10, color = "black")
  }
  
  gg
  
  # Return the given plot
  gg_list[[i]] <<- gg
  
}))

gg_list

cols <- scales::alpha(c("navy", "orange"), 0.75)

# Make a legend
gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
  social_sumstats %>%
    ggplot(aes(x = Range, y = values)) +
    geom_errorbar(aes(ymin = values, ymax = values, color = Range), size = 2, width = 0.25) +
    scale_color_manual(values = cols) +
    guides(color = guide_legend(title = "", override.aes = list(size = 2))) +
    theme_bw() +
    theme(
      legend.text = element_text(size = 12),
      legend.position = "top",
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-10, -10, -10, -10),
      legend.key.width = unit(3, "lines")
    )
)), "guide-box")

dev.off()

# Arrange the legend and plots into a single image file
# Animal Behavior double column figure = 190 mm or 7.48 in
jpeg(file.path(gpath, "Figure01_NestFlockSizeEstimates_final.jpeg"), units = "in", width = 7.48, height = 4, res = 1000)

ggarrange(
  as.ggplot(gg_leg),
  as.ggplot(ggarrange(
    as.ggplot(gg_list[[1]]),
    as.ggplot(gg_list[[2]]),
    as.ggplot(gg_list[[3]]),
    as.ggplot(gg_list[[4]]),
    nrow = 1
  )),
  nrow = 2,
  heights = c(1, 9)
)

dev.off()

References

1. Smith-Vidaurre, G., Araya-Salas, M., and T.F. Wright. 2020. Individual signatures outweigh social group identity in contact calls of a communally nesting parrot. Behavioral Ecology 31(2), 448-458.

2. Buhrman-Deever, S.C., Rappaport, A.R. and J.W. Bradbury. 2007. Geographic variation in contact calls of feral North American populations of the monk parakeet. The Condor 109(2), 389-398.

3. Araya‐Salas, M., and G. Smith‐Vidaurre. 2017. warbleR: An R package to streamline analysis of animal acoustic signals. Methods in Ecology and Evolution 8(2), 184-191.

The session info printed here represents the environment used for the final RMarkdown knitting. The software and package versions employed for main results are reported in the appendix of the associated article in Animal Behavior.

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] plyr_1.8.6         ggplotify_0.0.7    egg_0.4.5          gridExtra_2.3     
##  [5] cowplot_1.1.1      rgeos_0.5-5        rgdal_1.5-23       sp_1.4-5          
##  [9] coin_1.4-1         survival_3.2-11    lubridate_1.7.10   data.table_1.14.0 
## [13] forcats_0.5.1      stringr_1.4.0      dplyr_1.0.6        purrr_0.3.4       
## [17] readr_1.4.0        tidyr_1.1.3        tibble_3.1.2       ggplot2_3.3.3     
## [21] tidyverse_1.3.1    Rraven_1.0.13      pbapply_1.4-3      warbleR_1.1.27    
## [25] NatureSounds_1.0.4 knitr_1.33         seewave_2.1.6      tuneR_1.3.3       
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7        matrixStats_0.59.0  fs_1.5.0           
##  [4] httr_1.4.2          tools_4.1.0         backports_1.2.1    
##  [7] utf8_1.2.1          R6_2.5.0            DBI_1.1.1          
## [10] colorspace_2.0-1    withr_2.4.2         tidyselect_1.1.1   
## [13] compiler_4.1.0      cli_2.5.0           rvest_1.0.0        
## [16] xml2_1.3.2          sandwich_3.0-1      scales_1.1.1       
## [19] mvtnorm_1.1-1       proxy_0.4-25        dtw_1.22-3         
## [22] digest_0.6.27       rmarkdown_2.8       pkgconfig_2.0.3    
## [25] htmltools_0.5.1.1   highr_0.9           dbplyr_2.1.1       
## [28] rlang_0.4.11        readxl_1.3.1        rstudioapi_0.13    
## [31] gridGraphics_0.5-1  generics_0.1.0      zoo_1.8-9          
## [34] jsonlite_1.7.2      RCurl_1.98-1.3      magrittr_2.0.1     
## [37] modeltools_0.2-23   Matrix_1.3-4        Rcpp_1.0.6         
## [40] munsell_0.5.0       fansi_0.5.0         lifecycle_1.0.0    
## [43] stringi_1.6.2       multcomp_1.4-17     yaml_2.2.1         
## [46] MASS_7.3-54         grid_4.1.0          parallel_4.1.0     
## [49] crayon_1.4.1        lattice_0.20-44     haven_2.4.1        
## [52] splines_4.1.0       hms_1.1.0           pillar_1.6.1       
## [55] rjson_0.2.20        fftw_1.0-6          codetools_0.2-18   
## [58] stats4_4.1.0        reprex_2.0.0        glue_1.4.2         
## [61] evaluate_0.14       BiocManager_1.30.15 modelr_0.1.8       
## [64] vctrs_0.3.8         cellranger_1.1.0    gtable_0.3.0       
## [67] assertthat_0.2.1    xfun_0.23           libcoin_1.0-8      
## [70] broom_0.7.6         signal_0.7-7        rvcheck_0.1.8      
## [73] TH.data_1.0-10      ellipsis_0.3.2