This script was the second part of our analysis of structural differences between ranges. Here we asked whether frequency modulation patterns, a structural component of learned signals that can partially encode individual identity, were significantly simpler in invasive range calls. We traced second harmonic frequency contours for a subset of native and invasive range calls, then obtained 3 frequency modulation measurements from these contours. We also obtained a standard set of spectral acoustic measurements from the full signals, and evaluated the effect sizes of range across all acoustic measurements, as well as the independent effects of habitat and range on these measurements. Some of the code in this script may need to be updated in order to reproduce results with more recent versions of the packages used.

rm(list = ls())

# load plyr before dplyr to avoid masking of certain functions
library(plyr)

X <- c("warbleR", "ggplot2", "tidyverse", "pbapply", "dplyr", "data.table", "pracma", "egg", "gridExtra", "facetscales", "grDevices", "ggplotify", "grid", "gtable", "MASS", "rstatix", "orddom")

invisible(lapply(X, library, character.only = TRUE))

path <- "/media/gsvidaurre/MYIOPSITTA/R/VocalLearning_PostInvasion/Data"
gpath <- "/home/gsvidaurre/Desktop/MANUSCRIPTS/SimplerSignatures_PostInvasion/FIGURES"
seed <- 401
cores <- parallel::detectCores() - 2

# Call R code to make raincloud plots, downloaded from https://github.com/RainCloudPlots/RainCloudPlots
# Loading these scripts will install packages if these are not already on your machine
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/R_rainclouds.R")
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/summarySE.R")
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/simulateData.R")

Read in extended selection table (EST). Throughout this script, you can reproduce analyses by reading in the file “nat_inv_indiv_site_seltbl.csv”, which is a selection table made available on figshare that contains metadata needed here. Measurements needed to reproduce analyses here have also been provided on figshare.

nat_inv_est <- readRDS(file.path(path, "nat_inv_indiv_site_EST.RDS"))
# glimpse(nat_inv_est)

Decent number of samples over time. We can assess change over time at two scales, either the city or sites within cities sampled over time. New Orleans can be assessed only at the city scale, Austin can be assessed at specific sites over time.

# Which sites per city were sampled in each year (larger dataset of more geographic spread only)
nat_inv_est %>%
  filter(social_scale == "Site") %>%
  filter(!is.na(invasive_city)) %>%
  filter(invasive_city %in% c("Austin", "New Orleans")) %>%
  group_by(invasive_city, year) %>%
  dplyr::summarise(
    uniq_sites = paste(unique(site), collapse = "; ")
  )
## `summarise()` has grouped output by 'invasive_city'. You can override using the `.groups` argument.
## # A tibble: 5 x 3
## # Groups:   invasive_city [2]
##   invasive_city year  uniq_sites                        
##   <chr>         <chr> <chr>                             
## 1 Austin        2004  SOCC; COMM; VALL                  
## 2 Austin        2011  ELEM; INTR; MART; VALL; SOFT      
## 3 Austin        2019  ELEM; AIRP; INTR; SOCC; MANO; MART
## 4 New Orleans   2004  CANA; BALL; FOLS                  
## 5 New Orleans   2011  LAKE; ROBE

Which invasive range site-years were sampled over time?

# Get site-years in Austin sampled over time, and New Orleans sites over years (larger dataset of more geographic spread only)
austin <- nat_inv_est %>%
  filter(social_scale == "Site") %>%
  filter(invasive_city == "Austin") %>%
  group_by(invasive_city, site) %>%
  dplyr::summarise(
    n_years = n_distinct(year)
  ) %>%
  filter(n_years > 1) %>%
  pull(site)
## `summarise()` has grouped output by 'invasive_city'. You can override using the `.groups` argument.
new_orleans <- nat_inv_est %>%
  filter(social_scale == "Site") %>%
  filter(invasive_city == "New Orleans") %>%
  pull(site) %>%
  unique()

temporal_inv_sites <- c(austin, new_orleans)
temporal_inv_sites
##  [1] "ELEM" "INTR" "MART" "SOCC" "VALL" "LAKE" "ROBE" "CANA" "BALL" "FOLS"

Repeatedly sampled individuals

We included calls of repeatedly sampled individuals per range in frequency tracing, as this allowed us to ask whether frequency modulation patterns account for the high individual information content in calls (next script with Beecher’s statistic).

Randomly sampled 5 calls per known repeatedly sampled individual per range, and took all calls if less than 5 total for any bird.

# Random sampling for the repeatedly sampled individuals (5 calls, or the total if less, per bird)

# Get birds with less than 5 calls
ids_few_calls <- nat_inv_est %>%
  filter(social_scale == "Individual") %>%
  # Drop site-years with less than 4 calls
  group_by(Bird_ID) %>%
  dplyr::summarise(
    n_calls = length(sound.files)
  ) %>%
  # For now, drop birds with 5 calls or less, these will be excluded from random sampling and added back later
  group_by(Bird_ID) %>%
  filter(n_calls <= 5) %>%
  pull(Bird_ID)

set.seed(seed)

freq_mod_indiv_df <- nat_inv_est %>%
  filter(social_scale == "Individual") %>%
  # For now, drop birds with 5 calls or less, these will be excluded from random sampling and added back later
  filter(!Bird_ID %in% ids_few_calls) %>%
  group_by(Bird_ID) %>%
  nest() %>%
  ungroup() %>%
  # Randomly sample 5 calls 
  dplyr::mutate(
    rsamp_calls = purrr::map2(data, 5, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_calls) %>%
  # Add back birds with 5 calls or less
  bind_rows(
    nat_inv_est %>%
    filter(social_scale == "Individual") %>%
    filter(Bird_ID %in% ids_few_calls)
  ) %>%
  droplevels()

glimpse(freq_mod_indiv_df) 

# Checking, 17 individuals total (8 native, 9 invasive), 5 calls each or the total if less than 5, looks good
freq_mod_indiv_df %>%
  group_by(Bird_ID) %>%
  dplyr::summarise(
  n_calls = length(sound.files)
  )

# Includes calls from individual sampled at site BART in 2011 (INV-UM1)
freq_mod_indiv_df %>%
  group_by(site, Bird_ID) %>%
  dplyr::summarise(
  n_calls = length(sound.files)
  )

Dataset of broader geographic spread

Comparison between ranges

Performed random sampling of calls from the dataset of broader geographic spread for the comparison between ranges and the temporal comparison in the invasive range. Here we relied on all calls in the full dataset (did not remove calls with the “site_scale” suffix).

# Random sampling for the spatial question (40 native, 40 invasive calls over 10 sites in each range)
set.seed(seed)

freq_mod_spatial_df <- nat_inv_est %>%
  filter(social_scale == "Site") %>%
  # Drop site-years with less than 4 calls
  group_by(site_year) %>%
  dplyr::summarise(
    n_calls = length(sound.files)
  ) %>%
  filter(n_calls >= 4) %>%
  ungroup() %>%
  dplyr::select(-c(n_calls)) %>%
  # Join back with the EST to sample site-years
  inner_join(
    nat_inv_est %>%
    filter(social_scale == "Site"),
    by = "site_year"
  ) %>%
  # Make a data frame in which each row is a site-year, for random sampling below
  group_by(range, site_year) %>%
  dplyr::summarise(
    n_calls = length(sound.files)
  ) %>%
  ungroup() %>%
  # Group by range for random sampling 
  group_by(range) %>%
  nest() %>%
  ungroup() %>%
  # Randomly sample 10 native and 10 invasive range site-years
  dplyr::mutate(
    rsamp_sites = purrr::map2(data, 10, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_sites) %>%
  # Join back with the EST to sample calls per site-year
  inner_join(
    nat_inv_est %>%
    filter(social_scale == "Site"),
    by = c("range", "site_year")
  ) %>% 
  group_by(site_year) %>%
  nest() %>%
  ungroup() %>%
  # Randomly sample 4 calls per site-year
  dplyr::mutate(
    rsamp_calls = purrr::map2(data, 4, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_calls)

glimpse(freq_mod_spatial_df) 

# Checking, 4 calls per site and 20 sites total, looks good
# Does not include BART 2011, good
freq_mod_spatial_df %>%
  group_by(range, site_year) %>%
  dplyr::summarise(
  n_calls = length(sound.files)
  )

Comparison over time (invasive range)

# Random sampling for the temporal question: Austin sites sampled over time, and New Orleans sites, 5 calls per site-year
set.seed(seed)

freq_mod_temporal_df <- nat_inv_est %>%
  filter(social_scale == "Site") %>%
  # Filter by the Austin sites sampled over time, and New Orleans sites
  filter(site %in% temporal_inv_sites | invasive_city == "New Orleans") %>%
  group_by(site_year) %>%
  dplyr::summarise(
    n_calls = length(sound.files)
  ) %>%
  ungroup() %>%
  dplyr::select(-c(n_calls)) %>%
  # Join back with the EST to sample calls
  inner_join(
    nat_inv_est %>%
    filter(social_scale == "Site") %>%
    # Filter by the Austin sites sampled over time, and New Orleans sites
    filter(site %in% temporal_inv_sites | invasive_city == "New Orleans"),
    by = "site_year"
  ) %>%
  # Group by site_year for random sampling 
  group_by(site_year) %>%
  nest() %>%
  ungroup() %>%
  # Randomly sample 5 calls per site-year
  dplyr::mutate(
    rsamp_calls = purrr::map2(data, 5, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_calls)

glimpse(freq_mod_temporal_df) 

# Checking, 15 sites total, 5 calls each, looks good
# Does not include BART 2011, good
freq_mod_temporal_df %>%
  group_by(invasive_city, site_year) %>%
  dplyr::summarise(
    n_calls = length(sound.files)
  )

155 calls, 149 total unique calls for the frequency modulation analysis of calls from the dataset of broader geographic sprea. Then 84 calls for repeatedly sampled individuals, for 233 unique calls total. Valeria and I ended up dividing the frequency tracing workload.

Combine these calls into a single EST and write out to share with Valeria.

freq_mod_df <- freq_mod_spatial_df %>%
  mutate(
    question = "spatial"
  ) %>%
  bind_rows(freq_mod_temporal_df %>%
    mutate(
      question = "temporal"
    )
  ) %>%
  bind_rows(freq_mod_indiv_df %>%
    mutate(
      question = "indiv_scale"
    )
  )
glimpse(freq_mod_df)

# Save this data frame as a record of the calls to be used for each question/comparison (has duplicates)
saveRDS(freq_mod_df, file.path(path, "freq_mod_df.RDS"))

freq_mod_df <- readRDS(file.path(path, "freq_mod_df.RDS"))
glimpse(freq_mod_df)

# Save as a .csv as well for sharing data
# write.csv(freq_mod_df %>%
# dplyr::select(-c(n_calls)),
# file.path(path, "freq_mod_df.csv"), row.names = FALSE)

# Get calls in this data frame
calls <- freq_mod_df %>%
  pull(sound.files) %>%
  as.character()
head(calls)

# Actually 233 unique calls
length(calls)
length(unique(calls))

# Which calls are duplicated? Duplicates shared between the spatial and temporal comparisons
freq_mod_df %>%
  filter(duplicated(sound.files)) %>%
  View()

# Remove duplicates to make an EST for frequency tracing
freq_mod_df2 <- freq_mod_df %>%
  filter(!duplicated(sound.files)) %>%
  droplevels()
glimpse(freq_mod_df2)

# Randomly sample half of each set of calls per question, then VP will do tailoring for half and I will do the other half
set.seed(seed)

freq_mod_df_VP <- freq_mod_df2 %>%
  group_by(question) %>%
  nest() %>%
  ungroup() %>%
  # Create a vector with half of calls per comparison, to be used for random sampling
  # group_by(question) %>%
  inner_join(
    freq_mod_df2 %>%
    group_by(question) %>%
    dplyr::summarise(
      total_calls = length(sound.files)
    ) %>%
    dplyr::mutate(
      n = round(total_calls/2)
    ) %>% 
      ungroup() %>%
      dplyr::select(-total_calls),
    by = "question"
  ) %>%
  # Randomly half of the calls used per question/comparison
  dplyr::mutate(
    rsamp_calls = purrr::map2(data, n, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_calls) %>%
  droplevels() %>%
  dplyr::select(-c(n)) %>%
  dplyr::select(all_of(names(freq_mod_df2)[-grep("question", names(freq_mod_df2))]), "question")
  
freq_mod_df_GSV <- freq_mod_df2 %>%
  filter(!sound.files %in% freq_mod_df_VP$sound.files) %>%
  droplevels()

glimpse(freq_mod_df_VP)
glimpse(freq_mod_df_GSV)

# Filter the overall EST by the calls in each data frame
sub_est_VP <- nat_inv_est[grep(paste(paste("^", freq_mod_df_VP$sound.files, "$", sep = ""), collapse = "|"), nat_inv_est$sound.files), ]

sub_est_GSV <- nat_inv_est[grep(paste(paste("^", freq_mod_df_GSV$sound.files, "$", sep = ""), collapse = "|"), nat_inv_est$sound.files), ]

# Restore the EST class of each data frame above
freq_mod_est_VP <- fix_extended_selection_table(freq_mod_df_VP, sub_est_VP)
glimpse(freq_mod_est_VP)
class(freq_mod_est_VP)
length(attr(freq_mod_est_VP, "wave.objects"))

freq_mod_est_GSV <- fix_extended_selection_table(freq_mod_df_GSV, sub_est_GSV)
glimpse(freq_mod_est_GSV)
class(freq_mod_est_GSV)
length(attr(freq_mod_est_GSV, "wave.objects"))

# Write out the EST
saveRDS(freq_mod_est_VP, file.path(path, "freq_mod_est_VP.RDS"))
saveRDS(freq_mod_est_GSV, file.path(path, "freq_mod_est_GSV.RDS"))

Second harmonic frequency tracing

VP and GSV performed manual tailoring of frequency measurements to trace the 2nd harmonic visible in each call. First, we measured the fundamental frequency over 100 timepoints per call, which yielded fundamental frequency estimates that jumped around among harmonics. These frequency contours were then manually tailored below.

# VP call set
freq_mod_est_VP <- readRDS(file.path(path, "freq_mod_est_VP.RDS"))
glimpse(freq_mod_est_VP)

ff_trace_VP <- ffts(freq_mod_est_VP, wl = 378, length.out = 100, wn = "hanning", ovlp = 90, bp = c(0.5, 9), threshold = 15, img = FALSE, parallel = cores, path = path, img.suffix = "ffts", pb = TRUE, clip.edges = TRUE, leglab = "ffts", ff.method = "tuneR", flim = c(0.5, 9))
glimpse(ff_trace_VP)

write.csv(ff_trace_VP, file.path(path, "ff_trace_VP.csv"))

# GSV call set
freq_mod_est_GSV <- readRDS(file.path(path, "freq_mod_est_GSV.RDS"))
glimpse(freq_mod_est_GSV)

ff_trace_GSV <- ffts(freq_mod_est_GSV, wl = 378, length.out = 100, wn = "hanning", ovlp = 90, bp = c(0.5, 9), threshold = 15, img = FALSE, parallel = cores, path = path, img.suffix = "ffts", pb = TRUE, clip.edges = TRUE, leglab = "ffts", ff.method = "tuneR", flim = c(0.5, 9))
glimpse(ff_trace_GSV)

write.csv(ff_trace_GSV, file.path(path, "ff_trace_GSV.csv"))

We used the following code to manually tailor fundamental frequency curves to trace the second harmonic per call. VP and GSV performed this on separate machines, here showing just the code for GSV call set.

ff_trace <- read.csv(file.path(path, "ff_trace_GSV.csv"))
glimpse(ff_trace)

seltailor(X = freq_mod_est_GSV, ts.df = ff_trace, wl = 378, flim = c(0.5, 9), mar = 0.05, osci = TRUE, ovlp = 90, auto.contour = TRUE, width = 15, height = 8, path = path)

When done with manual tailoring, we combined the .csvs generated by seltailor with the ESTs to be used for the frequency modulation questions. One observer (GSV) performed a final check of the tailored frequency traces.

freq_mod_est_VP <- readRDS(file.path(path, "freq_mod_est_VP.RDS"))
glimpse(freq_mod_est_VP)

freq_mod_est_GSV <- readRDS(file.path(path, "freq_mod_est_GSV.RDS"))
glimpse(freq_mod_est_GSV)

freq_trace_VP <- read.csv(file.path(path, "seltailor_output_freq_VP.csv"))
glimpse(freq_trace_VP)

freq_trace_GSV <- read.csv(file.path(path, "seltailor_output_freq_GSV.csv"))
glimpse(freq_trace_GSV)

# Join the frequency traces with the respective ESTs
freq_mod_est_VP2 <- freq_mod_est_VP %>%
  inner_join(
    freq_trace_VP %>%
      dplyr::select(c(names(freq_trace_VP)[grep("sound.files|ffreq", names(freq_trace_VP))])),
    by = "sound.files"
  ) 
glimpse(freq_mod_est_VP2)

freq_mod_est_GSV2 <- freq_mod_est_GSV %>%
  inner_join(
    freq_trace_GSV %>%
      dplyr::select(c(names(freq_trace_GSV)[grep("sound.files|ffreq", names(freq_trace_GSV))])),
    by = "sound.files"
  ) 
glimpse(freq_mod_est_GSV2)

# Restore the EST class to each EST
freq_mod_est_VP2 <- fix_extended_selection_table(freq_mod_est_VP2, freq_mod_est_VP)
class(freq_mod_est_VP2)
length(attr(freq_mod_est_VP2, "wave.objects"))

freq_mod_est_GSV2 <- fix_extended_selection_table(freq_mod_est_GSV2, freq_mod_est_GSV)
class(freq_mod_est_GSV2)
length(attr(freq_mod_est_GSV2, "wave.objects"))

# Join ESTs
freq_mod_est <- rbind(freq_mod_est_VP2, freq_mod_est_GSV2)
class(freq_mod_est)
length(attr(freq_mod_est, "wave.objects"))
glimpse(freq_mod_est)

ts.df <- freq_mod_est %>%
  dplyr::select(names(freq_mod_est)[grep("sound.files|selec|ffreq", names(freq_mod_est))])
# glimpse(ts.df)

# Run seltailor one more time to check traces and modify any as needed
seltailor(X = freq_mod_est[, -grep("ffreq", names(freq_mod_est))], ts.df = ts.df, wl = 378, flim = c(0.5, 9), mar = 0.05, osci = TRUE, ovlp = 90, auto.contour = TRUE, width = 15, height = 8, path = path)

Fixed a good number of traces. Read these back in and added back to the combined EST for the frequency modulation analysis.

st <- read.csv(file.path(path, "seltailor_output_2ndharm_final_check.csv"))
glimpse(st)

freq_mod_est2 <- freq_mod_est %>%
  dplyr::select(-c(names(freq_mod_est)[grep("ffreq", names(freq_mod_est))])) %>%
  inner_join(
    st %>%
      dplyr::select(names(st)[grep("sound.files|ffreq", names(st))]),
    by = "sound.files"
  )

# Restore EST class
freq_mod_est2 <- fix_extended_selection_table(freq_mod_est2, freq_mod_est)
class(freq_mod_est2)
length(attr(freq_mod_est2, "wave.objects"))

Visual validation

Make spectrograms that contain the manually traced 2nd harmonic.

ts.df <- freq_mod_est2 %>%
  dplyr::select(names(freq_mod_est2)[grep("sound.files|selec|ffreq", names(freq_mod_est2))]) %>%
  as.data.frame()
# glimpse(ts.df)
class(ts.df)
  
trackfreqs(X = freq_mod_est2, wn = "hanning", wl = 378, collev = seq(-53, 0, 1), flim = c(0.5, 9), ovlp = 90, line = FALSE, mar = 0.005, res = 200, parallel = cores, path = gpath, it = "jpeg", custom.contour = ts.df, pch = 19, col = "firebrick")

The frequency traces look great, no more tailoring needed. Note that while GSV and VP split the manual tracing work, GSV did the final round of checking to make ensure consistency in frequency tracing between observers. See the example image file below.

AM3.1

Second harmonic contour

Saved the EST with the final tailored 2nd harmonic traces.

freq_mod_est3 <- freq_mod_est2 %>%
  dplyr::select(-c(n_calls)) 

saveRDS(freq_mod_est3, file.path(path, "freq_mod_est_m2h.RDS"))

# Save as a .csv as well for sharing data
write.csv(freq_mod_est3, file.path(path, "freq_mod_est_m2h.csv"), row.names = FALSE)

Frequency modulation measurements

Read in the EST (a unique sound file per row) and the data frame with calls allocated for different analyses (sound files per row, with 6 shared across intended spatial and temporal comparisons as described above).

# The EST with the manually tailored 2nd harmonic traces
freq_mod_est <- readRDS(file.path(path, "freq_mod_est_m2h.RDS"))
# freq_mod_est <- read.csv(file.path(path, "freq_mod_est_m2h.csv"))
# glimpse(freq_mod_est)

# Read in the data frame with used for each question/comparison (some calls used for both spatial and temporal question, only 6 total)
freq_modQ_df <- readRDS(file.path(path, "freq_mod_df.RDS"))
# freq_modQ_df <- read.csv(file.path(path, "freq_mod_df.csv"))
# glimpse(freq_modQ_df)

freq_modQ_df %>%
  group_by(question) %>%
  dplyr::summarise(n = length(sound.files))
## # A tibble: 3 x 2
##   question        n
##   <chr>       <int>
## 1 indiv_scale    84
## 2 spatial        80
## 3 temporal       75
# 4 calls each from native range urban sites GOLF and FAGR 
freq_modQ_df %>%
  filter(range == "Native") %>%
  group_by(site) %>%
  dplyr::summarise(n_calls = length(sound.files))
## # A tibble: 13 x 2
##    site    n_calls
##    <chr>     <int>
##  1 1145         25
##  2 CHAC          5
##  3 ELTE          4
##  4 EMBR          9
##  5 FAGR          4
##  6 GOLF          4
##  7 INBR          4
##  8 INES-04       4
##  9 INES-08       4
## 10 PAVO          4
## 11 PIED          4
## 12 PLVE          4
## 13 QUEB          4

Initial peak searching routine

Randomly select 5 calls per range to groundtruth a peak searching method. First, make visuals of each call, then manually count peaks.

freq_mod_est2 <- freq_mod_est %>%
  as_tibble()

# Randomly select 5 site-scale calls per range
set.seed(seed)
calls <- freq_mod_est2 %>%
  filter(social_scale == "Site") %>%
  group_by(range) %>%
  nest() %>%
  ungroup() %>%
  dplyr::mutate(
    rsamp_calls = purrr::map2(data, 5, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_calls) %>% 
  pull(sound.files) %>%
  as.character()
calls

tmp_df <- freq_mod_est2 %>%
  filter(sound.files %in% calls) %>%
  droplevels()

n_tot <- 100 # the number of timepoints per frequency trace
n_rem <- 5 # remove 5 points at the start and end of each call, for 90 frequency measurements total across each call
# the number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
n

# Make visuals of the frequency traces of these calls to count peaks
dev.off()
invisible(pblapply(1:nrow(tmp_df), function(i){
  
  # Get the frequency trace for the given call
  tmp <- tmp_df[i, ]
  # glimpse(tmp)
    
  # Drop 5 points on either end
  tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
  # glimpse(tmp)
    
  # Get the frequency trace as a vector
  ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))

   jpeg(file.path(gpath, paste(tmp_df$sound.files[i], "count_peaks.jpeg", sep = "-")), units = "in", width = 5, height = 4, res = 200)
  plot(ftrace)
  lines(ftrace, col = "red")
  dev.off()
  
}))

tmp_df$sound.files

# Sound file                              # "Large" peaks  # Troughs following peaks
# [1] "2011_02_15_PLEA_WRIG1012_3.WAV"    # 4             # 3 
#  [2] "2011_02_18_LAKE_WRIG1039_10.WAV"    # 5           # 5
#  [3] "SBD_Austin.1002.0180_resamp.WAV"    # 5           # 3
#  [4] "SBD_NewOrleans.1006.0493_resamp.WAV" # 5          # 4
#  [5] "2017_05_21_PLVE_1005_6.WAV"       # 8             # 7
#  [6] "2017_10_25_PIED_1561_2.WAV"       # 7             # 6
#  [7] "2017_09_13_ELTE_1405_3.WAV"     # 5               # 5
#  [8] "2017_09_03_INBR_1269_3.WAV"       # 6             # 5
#  [9] "2017_11_20_GOLF_1644_1.WAV"       #  6            # 5
# [10] "2011_02_15_ELEM_WRIG1013_4.WAV" # 4               # 3

# Add these vaues to the temporary data frame
tmp_df$mnl_pks <- c(4, 5, 5, 5, 8, 7, 5, 6, 6, 4)
tmp_df$mnl_trghs <- c(3, 5, 3, 4, 7, 6, 5, 5, 5, 3)

Find peaks across these calls using a general peak search method. These peaks will be used as a rough estimate of peak size, to impose a threshold on minimum peak size later on the magnitude of peaks with respect to local points. Use the peak searching code to also identify troughs, by inverting the frequency trace.

n_tot <- 100 # the number of timepoints per frequency trace
n_rem <- 5 # remove 5 points at the start and end of each call, for 90 frequency measurements total across each call
# the number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
n

nups <- 2
ndowns <- 0
minpeakheight <- 1 # 1 kHz as min peak height
zero <- "0"

pk_hght_df <- rbindlist(pblapply(1:nrow(tmp_df), function(i){
  
  # Get the frequency trace for the given call
  tmp <- tmp_df[i, ]
  # glimpse(tmp)
    
  # Drop 5 points on either end
  tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
  # glimpse(tmp)
    
  # Get the frequency trace as a vector
  ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))

  pks <- findpeaks(ftrace, nups = nups, ndowns = ndowns, zero = zero, minpeakheight = minpeakheight)
  
  return(data.frame(sound.files = tmp_df$sound.files[i], range = tmp_df$range[i], peak_number = seq(1, nrow(pks), 1), peak_height = pks[, 1], peak_max_index = pks[, 2]))
  
}))

glimpse(pk_hght_df)
max(pk_hght_df$peak_height)

Next, I customized this routine again to locate peaks that picks up large peaks but not gradual increases. Used the dataset of 10 randomly selected calls above for ground truthing, with spline-smoothing, and determine the degree of smoothing that yields results closest to visual inspection of peaks, as well as good detection of troughs (inverted peaks). Then moved on to identifying peaks across all calls with frequency traces for the spatial and temporal questions.

n_tot <- 100 # the number of timepoints per frequency trace
n_rem <- 5 # remove 5 points at the start and end of each call, for 90 frequency measurements total across each call
# the number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
n

nups <- 2
ndowns <- 0
minpeakheight <- 1 # 1 kHz as min peak height
zero <- "0"

# Degrees of freedom for smoothing
dfs <- seq(25, 125, 25)
dfs

# Impose a threshold for minimum peak rise with respect to local neigboring points before it
# This is 1/100th of the maximum peak height determined above with this subset of calls
thresh <- round(max(pk_hght_df$peak_height)/100, 1)
thresh # 0.1

# Iterate over calls and degrees of freedom for smoothing
# i <- 1
# j <- 1
pks_gt <- rbindlist(pblapply(1:nrow(tmp_df), function(i){
  
  tmp2 <- rbindlist(lapply(1:length(dfs), function(j){

    # Get the frequency trace for the given call
    tmp <- tmp_df[i, ]
    # glimpse(tmp)
    
    # Drop 5 points on either end
    tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
    # glimpse(tmp)
    
    # Get the frequency trace as a vector
    ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))
    # ftrace
  
    # Fit a spline and smooth it, to remove small local maxima arising from manual tracing
    # Interpolate a total of n equally spaced values (here 5 times the length of the trace), take the mean of tied x-values
    splf <- spline(ftrace, n = 5*length(ftrace), method = "fmm")
    # str(splf)
    # plot(splf)
    # plot(ftrace)
  
    # Perform spline smoothing without weighting. Here, df represents the degree of smoothing. Higher values of df lead to less smoothing
    splf_smooth <- smooth.spline(splf, df = dfs[j])
    # str(splf_smooth)
    # plot(splf_smooth)
    # plot(splf)
  
    # Output is a matrix, each row is a peak found, first column is the height, second column is the index of the vector where the maximum is reached, then the third and fourth are the indices where the peak starts and ends
    pks <- findpeaks(splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero, minpeakheight = minpeakheight)
    # pks
    # str(pks)
    
    # Impose additional filters on the peaks: 
    # 1) Remove "peaks" identified within the last 2 points of the trace
    # 2) Remove peaks with maximum values that are less than 0.1kHz than the frequency value 5 points before (if the given peak 5 points or less from the beginning, remove peaks that are less than the threshold than the point before)
    # 3) Remove peaks identified close together (less than 5 points apart) that are probably points on the same wide peak
  
    rem1 <- which(pks[, 2] > (length(splf_smooth$x) - 2)) 
    # rem1
    
    if(length(rem1) > 0){
      pks <- pks[-rem1, ]
    }
  
    # z <- 1
    rem2 <- unlist(lapply(1:nrow(pks), function(z){

      # cat(paste("z = ", z, "\n"))

      indx <- pks[z, 2]
      idiff <- indx - 10

      if(!idiff < 0 & idiff != 0){
        rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
        if(rel_ht < thresh){
          return(z)
        }
      } else {
        idiff <- indx - 1
        rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
        if(rel_ht < thresh){
          return(z)
        }
      }

    }))
    # rem2
    # pks
    
    # Remove these indices from the peaks
    if(length(rem2) > 0){
      pks <- pks[-rem2, ]
    }
  
    # If more than one peak remains, find those that are too close together (5 points or less)
    if(is.matrix(pks) & nrow(pks) > 0){
      rem3 <- which(diff(pks[, 2]) < 5)
      if(length(rem3) > 0){
        pks <- pks[-rem3, ]
      }
    }
  
    # Return the number of peaks
    if(is.matrix(pks)){
      num_peaks = nrow(pks)
    } else {
      num_peaks = length(pks)
    }

    # Identify troughs by applying the findpeaks code to the inverted frequency trace
    # Removed the minpeakheight limitation here
    trghs <- findpeaks(-splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero)
    # trghs
    # str(trghs)
    
    # Filter the troughs to retain those following the peaks identified above
    # In other words, assign each trough to a peak, and drop troughs without matches
    # Do the peak-trough assignments by finding the closest trough following each peak
    
    # If more than one peak was identified for the given call
    if(is.matrix(pks) & nrow(pks) > 0){
      
      trough_df <- rbindlist(lapply(1:nrow(pks), function(z){
        # Get the difference in indices between the given peak and all troughs
        diff_inds <- pks[z, 2] - trghs[, 2] 
        # Find the negative differences (trough following a peak), and the max of these
        wh <- which(diff_inds < 0 & diff_inds == max(diff_inds[diff_inds < 0]))

        return(data.frame(peak_indx = pks[z, 2], trgh_indx = trghs[wh, 2]))
      }))
    
      # If only a single peak was identified per call  
      } else if(!is.matrix(pks) & nrow(pks) == 0){

        # Get the difference in indices between the given peak and all troughs
        diff_inds <- pks[2] - trghs[, 2] 
        # Find the negative differences (trough following a peak), and the max of these
        wh <- which(diff_inds < 0 & diff_inds == max(diff_inds[diff_inds < 0]))
        
        trough_df <- data.frame(peak_indx = pks[2], trgh_indx = trghs[wh, 2])
        
      }
    
    # Return the degree of smoothing and number of peaks
    return(data.frame(sound.files = tmp_df$sound.files[i], range = tmp_df$range[i], num_peaks = nrow(pks), num_trghs = nrow(trough_df), mnl_num_peaks = tmp_df$mnl_pks[i], mnl_num_trghs = tmp_df$mnl_trghs[i], smooth_param = dfs[j]))
  
    }))
  
  return(tmp2)

}))

glimpse(pks_gt)
# View(pks_gt)

# Calculate the difference in peaks per call and smoothing parameter compared to the visually identified peaks and troughs, then find the sum of differences per smoothing parameter
pks_gt %>%
  dplyr::mutate(
    numdiffp = abs(num_peaks - mnl_num_peaks),
    numdifft = abs(num_trghs - mnl_num_trghs)
  ) %>%
  group_by(smooth_param) %>%
  dplyr::summarise(
    sumdiffp = sum(numdiffp),
    sumdifft = sum(numdifft)
  )

# smooth_param (df in smooth.spline) of 50 had the least number of differences in peaks identified compared to visual inspection, and more troughs. More of the measurements below depend on better peak estimates, so decided to proceed with smoothing parameter of 50

Final customized peak and trough routine

Make the code above into a function, then checked out the peaks identified across the groundtruthing calls with this smoothing threshold. I improved the peak and trough searching in this function compared to the code above.

peak_locator <- function(X, n_tot, n_rem, degf, nups, ndowns, minpeakheight, zero, thresh, img, neighb, pkdist){
  
  tmp_df <- rbindlist(pblapply(1:nrow(X), function(i){
    
    # The number of timepoints per frequency trace, minus 5 on either end
    n <- n_tot - (n_rem*2)
 
    # Get the frequency trace for the given call
    tmp <- X[i, ]
    
    # Drop 5 points on either end
    tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
    
    # Get the frequency trace as a vector
    ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))
  
    # Fit a spline and smooth it, to remove small local maxima arising from manual tracing
    # Interpolate a total of n equally spaced values (here 5 times the length of the trace), take the mean of tied x-values
    splf <- spline(ftrace, n = 5*length(ftrace), method = "fmm")
  
    # Perform spline smoothing without weighting. Here, df represents the degree of smoothing. Higher values of df lead to less smoothing
    splf_smooth <- smooth.spline(splf, df = degf)
  
    # Output is a matrix, each row is a peak found, first column is the height, second column is the index of the vector where the maximum is reached, then the third and fourth are the indices where the peak starts and ends
    pks <- findpeaks(splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero, minpeakheight = minpeakheight)
    # pks
    # str(pks)
  
    # Impose additional filters on the peaks: 
    # 1) Remove "peaks" identified within the last 2 points of the trace
    # 2) Remove peaks with maximum values that when compared to the frequency value of neighoring points before have a difference of less than the given threshold (if the given peak is the number of neighboring points or less from the beginning, compare to neighbors within half the distance of the index from the start point)
    # 3) Remove peaks identified close together that are probably points on the same wide peak
  
    rem1 <- which(pks[, 2] > (length(splf_smooth$x) - 2)) 
    
    if(length(rem1) > 0){
      pks <- pks[-rem1, ]
    }
  
    rem2 <- unlist(lapply(1:nrow(pks), function(z){

      indx <- pks[z, 2]
      idiff <- indx - neighb

      if(!idiff < 0 & idiff != 0){
        rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
        if(rel_ht < thresh){
          return(z)
        }
      } else {
        idiff <- indx - (indx/2)
        rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
        if(rel_ht < thresh){
          return(z)
        }
      }

    }))
    
    # Remove these indices from the peaks
    if(length(rem2) > 0){
      pks <- pks[-rem2, ]
    }
  
    # If more than one peak remains, find those that are too close together
    if(is.matrix(pks) & nrow(pks) > 0){
      rem3 <- which(diff(pks[, 2]) < pkdist)
      if(length(rem3) > 0){
        pks <- pks[-rem3, ]
      }
    }
    
    # Identify troughs by applying the findpeaks code to the inverted frequency trace
    # Removed the minpeakheight limitation here
    trghs <- findpeaks(-splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero)
    
    # Filter the troughs to retain those following the peaks identified above
    # In other words, assign each trough to a peak, and drop troughs without matches
    # Do the peak-trough assignments by finding the closest trough following each peak
    
    # If more than one peak was identified for the given call
    if(is.matrix(pks) & nrow(pks) > 0){

      trough_df <- rbindlist(lapply(1:nrow(pks), function(z){
        
        # Get the difference in indices between the given peak and all troughs
        diff_inds <- pks[z, 2] - trghs[, 2] 
        
        # Find the negative differences (trough following a peak)
        negs <- diff_inds[diff_inds < 0]
        
        # Check whether there are 2 troughs between this peak and the next
        # If so, return the deepest trough, to account for tall peaks with shoulders that drop lower after the shoulder
        if(z != nrow(pks) & z > 1){
          
          neigh_trghs <- which(trghs[, 2] >= pks[z, 2] & trghs[, 2] <= pks[z + 1, 2])
          # If there are neighboring troughs prior to the next peak, and the first trough is not as deep as the following troughs, find which of the following troughs is the deepest
          
          if(length(neigh_trghs) > 1 & all(trghs[neigh_trghs[1], 1] < trghs[neigh_trghs[-1], 1]) & length(negs) > 0){
            
            wh <- which(diff_inds < 0 & seq(1, nrow(trghs), 1) %in% neigh_trghs & diff_inds == min(diff_inds[seq(1, nrow(trghs), 1) %in% neigh_trghs])) 
          
            # If there are neighboring troughs prior to the next peak, but the first trough is deeper than the following troughs, get this first trough
          } else if(length(neigh_trghs) > 1 & all(trghs[neigh_trghs[1], 1] > trghs[neigh_trghs[-1], 1]) & length(negs) > 0){
            
            wh <- neigh_trghs[1]
            
          # Else if there's just a single trough identified between peaks 
          } else if(length(neigh_trghs) == 1 & length(negs) > 0){
            
            wh <- which(diff_inds < 0 & diff_inds == max(negs, na.rm = TRUE)) 
          
          # Else if no neighboring trough is identified for the given peak
          } else if(length(neigh_trghs) == 0){
            
            wh <- NULL
          }
          
        # Else if on the first or last peak, just a single trough or none should be identified
        } else if(z == nrow(pks) | z == 1 & length(negs) > 0){
          
          if(length(negs) > 0){
            wh <- which(diff_inds < 0 & diff_inds == max(negs, na.rm = TRUE))
          } else {
            wh <- NULL
          }
           
        }
        
        # Return a data frame with the slope of the frequency curve between the peak and its matching trough, unless the peak wasn't matched to a trough (e.g. a peak close to the end of a call)
        if(length(wh) > 0){
          
          pt_slope <- (splf_smooth$y[trghs[wh, 2]] - splf_smooth$y[pks[z, 2]])/(splf_smooth$x[trghs[wh, 2]] - splf_smooth$x[pks[z, 2]])
          
          # Make sure to convert the trough heights back to positive values here
          return(data.frame(peak_indx = pks[z, 2], trgh_indx = trghs[wh, 2], trgh_hght = -trghs[wh, 1], pt_slope = pt_slope))
          
        } else{
          
          return(data.frame(peak_indx = pks[z, 2], trgh_indx = NA, trgh_hght = NA, pt_slope = NA))
        }
        
      }))
    
      # If only a single peak was identified per call  
      } else if(!is.matrix(pks) & nrow(pks) == 0){

        # Get the difference in indices between the given peak and all troughs
        diff_inds <- pks[2] - trghs[, 2] 
        
        # Find the negative differences (trough following a peak), and the max of these if troughs were found
        negs <- diff_inds[diff_inds < 0]
        
        if(length(negs) > 0){
          wh <- which(diff_inds < 0 & diff_inds == max(negs, na.rm = TRUE)) 
        } else {
          wh <- NULL
        }
        # Return a data frame with the difference in height between the peak and its matching trough
        if(length(wh) > 0){
          
          (splf_smooth$y[trghs[wh, 2]] - splf_smooth$y[pks[2]])/(splf_smooth$x[trghs[wh, 2]] - splf_smooth$x[pks[2]])
          
          return(data.frame(peak_indx = pks[2], trgh_indx = trghs[wh, 2], trgh_hght = -trghs[wh, 1], pt_slope = pt_slope))
          
        } else{
          return(data.frame(peak_indx = pks[z, 2], trgh_indx = NA, trgh_hght = NA, pt_slope = NA))
        }
        
      }
    
    # If img = TRUE, generate an image file with the peaks and troughs
    if(img){
      jpeg(file.path(gpath, paste(X$sound.files[i], "freq_peaks.jpeg", sep = "-")), units = "in", width = 5, height = 4, res = 200)
      plot(splf_smooth$y)
      points(x = pks[, 2], y = pks[, 1], pch = 4, cex = 2, col = "red")
      points(x = trough_df$trgh_indx, y = trough_df$trgh_hght, pch = 4, cex = 2, col = "blue")
      dev.off()
    }
  
    # Return the peaks found per call, the rate of change, and the change in frequency between matched peaks and troughs
    return(data.frame(
      sound.files = X$sound.files[i], 
      range = X$range[i], 
      site = X$site[i], 
      year = X$year[i], 
      site_year = X$site_year[i], 
      region = X$region[i],
      dept_state = X$dept_state[i],
      invasive_city = X$invasive_city[i], 
      peak_number = seq(1, nrow(pks), 1), 
      peak_height = pks[, 1], 
      peak_max_index = pks[, 2], 
      peak_start_index = pks[, 3], 
      peak_end_index = pks[, 4], 
      peak_trgh_slope = trough_df$pt_slope,
      trgh_index = trough_df$trgh_indx
      )
    )

  }))
  
  return(tmp_df)
}

# Used the resulting image files to visually assess how well this customized routine picks up large peaks and neighboring troughs
peak_res <- peak_locator(X = tmp_df, n_tot = 100, n_rem = 5, nups = 2, ndowns = 0, minpeakheight = 1, zero = "0", degf = 50, thresh = 0.05, img = TRUE, neighb = 25, pkdist = 20)
# glimpse(peak_res)

This peak locator function performed well in locating large frequency peaks, and some smaller ones for the reduced call dataset. Each peak was matched pretty well with a following trough. For some wider peaks, the trough comes somewhat late, but overall still looks great for final analysis.

Next, checked peaks identified for all calls for which we performed frequency tracing. Here, I checked that all large visible peaks had been identified using the same parameters as for the subset of calls above. A handful of calls had one peak not identified, but these peaks were medium-sized and typically nested right in between larger peaks (more difficult to identify). Other calls had smaller peaks identified. Overall, the function picked up large modulation peaks per call, and results looked good to continue (see example image file below).

freq_mod_df <- freq_modQ_df %>%
  inner_join(
    freq_mod_est %>%
      as.data.frame() %>%
      dplyr::select(names(freq_mod_est)[grep("sound.files|ffreq", names(freq_mod_est))]),
    by = "sound.files"
  ) %>%
  droplevels() %>%
  # Remove duplicated sound files chosen for both spatial and temporal comparisons
  filter(!duplicated(sound.files))
# glimpse(freq_mod_df)
nrow(freq_mod_df)
## [1] 233

Frequency modulation measurements

peaks_troughs_df <- peak_locator(X = freq_mod_df, n_tot = 100, n_rem = 5, nups = 2, ndowns = 0, minpeakheight = 1, zero = "0", degf = 50, thresh = 0.05, img = TRUE, neighb = 25, pkdist = 20)
glimpse(peaks_troughs_df)

# Save this data for use later as needed
saveRDS(peaks_troughs_df, file.path(path, "peaks_troughs_df.RDS"))

# Save as a .csv as well for sharing data
write.csv(peaks_troughs_df, file.path(path, "peaks_troughs_df.csv"), row.names = FALSE)

Visual validation

The image files look great. Nearly every peak was assigned a trough. Some intermediate peaks close to larger peaks were picked up, and some more gradual increases were picked up as peaks, but overall, the function did a good job with identifying large, visible peaks and troughs across all calls for which we performed frequency tracing. In the image file below, peaks are shown in red and troughs in dark blue.

AM3.2

Spline-smoothed second harmonic frequency contour with peaks and troughs for 2017_09_13_QUEB_1432_2.WAV

I performed a final found of filtering using the frequency modulation measurements for the full dataset prior to parsing out calls chosen above for the spatial comparison between ranges. Here I removed very small peaks that would lead to overestimation of frequency modulation.

# The function returns one row per peak, with trough information per row
# Dropping peaks below will therefore drop associated trough information
peaks_trghs_df <- readRDS(file.path(path, "peaks_troughs_df.RDS"))
# peaks_trghs_df <- read.csv(file.path(path, "peaks_troughs_df.csv"))
glimpse(peaks_trghs_df)
## Rows: 1,309
## Columns: 15
## $ sound.files      <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES-04…
## $ range            <fct> Native, Native, Native, Native, Native, Native, Nativ…
## $ site             <fct> INES-04, INES-04, INES-04, INES-04, INES-04, INES-04,…
## $ year             <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,…
## $ site_year        <fct> INES-04_2017, INES-04_2017, INES-04_2017, INES-04_201…
## $ region           <fct> West, West, West, West, West, West, West, West, West,…
## $ dept_state       <fct> Colonia, Colonia, Colonia, Colonia, Colonia, Colonia,…
## $ invasive_city    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ peak_number      <dbl> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4,…
## $ peak_height      <dbl> 2.677283, 4.936604, 4.949482, 4.875870, 5.060024, 4.8…
## $ peak_max_index   <dbl> 46, 108, 150, 203, 256, 304, 364, 53, 87, 144, 203, 2…
## $ peak_start_index <dbl> 1, 54, 125, 176, 229, 280, 352, 1, 59, 116, 167, 226,…
## $ peak_end_index   <dbl> 54, 125, 176, 229, 280, 339, 443, 59, 116, 167, 226, …
## $ peak_trgh_slope  <dbl> -0.016857731, -0.395336964, -0.336700987, -0.35746471…
## $ trgh_index       <dbl> 54, 125, 176, 229, 280, 339, 443, 59, 116, 167, 226, …
# How many peaks in this dataset weren't matched to troughs? 2
length(which(is.na(peaks_trghs_df$peak_trgh_slope)))
## [1] 2
# Remove these for distribution calculations below
peaks_trghs_df2 <- peaks_trghs_df %>%
  filter(!is.na(peak_trgh_slope)) %>%
  droplevels()

# What is the distribution of peak - trough ranges? Can this be used to exclude smaller peaks?
range(peaks_trghs_df2$peak_trgh_slope, na.rm = TRUE)
## [1] -0.90275383  0.01155562
# hist(peaks_trghs_df2$peak_trgh_slope, breaks = 50)

# If we split these distances into 50 bins (yields similar results to 100 intervals), then discard the last two bins of slopes, which represent the smallest peaks
levels(cut(peaks_trghs_df2$peak_trgh_slope, 50))
##  [1] "(-0.904,-0.884]"   "(-0.884,-0.866]"   "(-0.866,-0.848]"  
##  [4] "(-0.848,-0.83]"    "(-0.83,-0.811]"    "(-0.811,-0.793]"  
##  [7] "(-0.793,-0.775]"   "(-0.775,-0.756]"   "(-0.756,-0.738]"  
## [10] "(-0.738,-0.72]"    "(-0.72,-0.702]"    "(-0.702,-0.683]"  
## [13] "(-0.683,-0.665]"   "(-0.665,-0.647]"   "(-0.647,-0.628]"  
## [16] "(-0.628,-0.61]"    "(-0.61,-0.592]"    "(-0.592,-0.574]"  
## [19] "(-0.574,-0.555]"   "(-0.555,-0.537]"   "(-0.537,-0.519]"  
## [22] "(-0.519,-0.5]"     "(-0.5,-0.482]"     "(-0.482,-0.464]"  
## [25] "(-0.464,-0.446]"   "(-0.446,-0.427]"   "(-0.427,-0.409]"  
## [28] "(-0.409,-0.391]"   "(-0.391,-0.372]"   "(-0.372,-0.354]"  
## [31] "(-0.354,-0.336]"   "(-0.336,-0.318]"   "(-0.318,-0.299]"  
## [34] "(-0.299,-0.281]"   "(-0.281,-0.263]"   "(-0.263,-0.244]"  
## [37] "(-0.244,-0.226]"   "(-0.226,-0.208]"   "(-0.208,-0.19]"   
## [40] "(-0.19,-0.171]"    "(-0.171,-0.153]"   "(-0.153,-0.135]"  
## [43] "(-0.135,-0.116]"   "(-0.116,-0.0982]"  "(-0.0982,-0.0799]"
## [46] "(-0.0799,-0.0616]" "(-0.0616,-0.0433]" "(-0.0433,-0.025]" 
## [49] "(-0.025,-0.00673]" "(-0.00673,0.0125]"
# Add these bins back to the data frame, then drop peaks that fell into the first interval
peaks_trghs_df2$peak_size_class <- cut(peaks_trghs_df2$peak_trgh_slope, 50, labels = FALSE)

# Numbers of peaks by interval
# 160 peaks will be dropped after excluding peaks in the last two bins
table(peaks_trghs_df2$peak_size_class)
## 
##  1 11 12 13 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 
##  1  1  1  2  1  1  1  1  3  3  6  9 12  9  8 17 28 25 25 33 46 52 61 50 76 69 
## 38 39 40 41 42 43 44 45 46 47 48 49 50 
## 94 81 80 58 55 42 54 28 31 31 52 95 65
# Exclude peaks in the first 2 bins
peaks_trghs_df3 <- peaks_trghs_df2 %>%
  dplyr::mutate(
    peak_size_class = as.numeric(peak_size_class)
  ) %>%
  filter(peak_size_class < 49) %>%
  droplevels() %>%
  # Remove the peak_size_class column
  dplyr::select(-c(peak_size_class)) %>%
  # Add back the peaks that were not assigned to troughs
  bind_rows(
    peaks_trghs_df %>%
    filter(is.na(peak_trgh_slope)) %>%
    droplevels()
  )

# 1149 peaks remain, 160 were removed, looks great
nrow(peaks_trghs_df3)
## [1] 1149
nrow(peaks_trghs_df) - nrow(peaks_trghs_df3)
## [1] 160

Comparison between ranges

Obtained frequency modulation measurements and a standard set of spectral acoustic measurements for the calls set aside for the spatial comparison between ranges.

calls <- freq_modQ_df %>%
  filter(question == "spatial") %>%
  pull(sound.files) %>%
  as.character()

length(calls)
## [1] 80
peaks_spat_df <- peaks_trghs_df3 %>%
  filter(sound.files %in% calls) %>%
  droplevels()
# glimpse(peaks_spat_df)

# 40 native range calls, 40 invasive range calls, each call represents a "unique" individual
# 4 randomly sampled calls from 10 randomly sites per range
peaks_spat_df %>%
  group_by(range) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 2 x 2
##   range    n_calls
##   <fct>      <int>
## 1 Native        40
## 2 Invasive      40
# How many years are represented in the invasive range? More calls from 2004, but some from each 2011 and 2019
peaks_spat_df %>%
  filter(range == "Invasive") %>%
  group_by(year) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 3 x 2
##   year  n_calls
##   <fct>   <int>
## 1 2004       24
## 2 2011        8
## 3 2019        8
# Native spatial areas represented?
peaks_spat_df %>%
  filter(range == "Native") %>%
  group_by(region, dept_state) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 5 x 3
## # Groups:   region [3]
##   region  dept_state n_calls
##   <fct>   <fct>        <int>
## 1 West    Colonia         12
## 2 East    Maldonado       12
## 3 East    San Jose         4
## 4 Central Montevideo       8
## 5 Central Canelones        4
# Which native range sites? None of these had known repeatedly sampled individuals
peaks_spat_df %>%
  filter(range == "Native") %>%
  group_by(site) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 10 x 2
##    site    n_calls
##    <fct>     <int>
##  1 INES-04       4
##  2 PLVE          4
##  3 QUEB          4
##  4 PAVO          4
##  5 PIED          4
##  6 FAGR          4
##  7 ELTE          4
##  8 INBR          4
##  9 INES-08       4
## 10 GOLF          4
# Invasive spatial areas represented?
# More representation from Texas, as expected
peaks_spat_df %>%
  filter(range == "Invasive") %>%
  group_by(region, dept_state) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 4 x 3
## # Groups:   region [3]
##   region    dept_state  n_calls
##   <fct>     <fct>         <int>
## 1 Southwest Texas            24
## 2 Southeast Louisiana         4
## 3 Southeast Florida           4
## 4 Northeast Connecticut       8
# Which inasvive range site-years? 
# We recorded repeatedly sampled individuals at some of these sites
# But calls from repeatedly sampled individuals used in the final dataset are not included here (see below)
peaks_spat_df %>%
  filter(range == "Invasive") %>%
  group_by(site_year) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 10 x 2
##    site_year n_calls
##    <fct>       <int>
##  1 SOCC_2004       4
##  2 BALL_2004       4
##  3 INTR_2011       4
##  4 VALL_2004       4
##  5 AUDU_2004       4
##  6 AIRP_2019       4
##  7 BAPT_2004       4
##  8 MANO_2019       4
##  9 SHAK_2004       4
## 10 SOFT_2011       4
# Were any calls from known repeatedly sampled individuals used in frequency tracing?
# Yes, but these 2 calls are from birds that were not retained as repeatedly sampled individuals in the final dataset (not enough calls remained after pre-processing)
unique(peaks_spat_df$sound.files[grep("site_scale", peaks_spat_df$sound.files)])
## [1] 2019_08_07_AIRP_GSV1040_8_site_scale.WAV
## [2] 2019_08_07_AIRP_GSV1043_2_site_scale.WAV
## 80 Levels: 2017_07_11_INES-04_MZ000149_1.WAV ...

Combine acoustic measurements

Calculated frequency modulation measurements per call.

# Get the number of peaks per call
num_peaks_df <- peaks_spat_df %>%
  # Find the number of peaks per call
  group_by(range, sound.files) %>%
  dplyr::summarise(
    num_peaks = length(peak_number)
  ) %>%
  ungroup() %>%
  # Ensure levels of the range column are in the expected order
  dplyr::mutate(
    range = as.character(range),
    range = factor(range, levels = c("Native", "Invasive"))
  ) %>%
  dplyr::select(c(sound.files, num_peaks))
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
glimpse(num_peaks_df)
## Rows: 80
## Columns: 2
## $ sound.files <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES-04_MZ00…
## $ num_peaks   <int> 6, 6, 5, 7, 7, 5, 7, 5, 7, 7, 6, 5, 6, 5, 5, 5, 6, 8, 6, 7…
# Modulation rate per call
modRate_df <- peaks_spat_df %>%
  # Calculate duration per call
  inner_join(
    freq_mod_df %>%
    dplyr::mutate(
      duration = end - start
    ) %>%
      dplyr::select(sound.files, duration),
    by = "sound.files"
  ) %>%
  # Calculate modulation rate value per call
  group_by(range, sound.files, duration) %>%
  dplyr::summarise(
    num_peaks = length(peak_number)
  ) %>%
  ungroup() %>%
  dplyr::mutate(
    modulation_rate = (num_peaks/duration) # units for rate of change = peaks/second
  ) %>%
  # Ensure levels of the range column are in the expected order
  dplyr::mutate(
    range = as.character(range),
    range = factor(range, levels = c("Native", "Invasive"))
  ) %>%
  dplyr::select(c(sound.files, modulation_rate))
## `summarise()` has grouped output by 'range', 'sound.files'. You can override using the `.groups` argument.
glimpse(modRate_df)
## Rows: 80
## Columns: 2
## $ sound.files     <chr> "2017_05_21_PLVE_1005_2.WAV", "2017_05_21_PLVE_1005_3.…
## $ modulation_rate <dbl> 32.02348, 42.26115, 38.95694, 32.40491, 33.43053, 32.5…
# Maximum peak - trough slope per call
peakTroughSlope_df <- peaks_spat_df %>%
  # Drop peaks that were not matched to troughs
  filter(!is.na(peak_trgh_slope)) %>%
  # Ensure levels of the range column are in the expected order
  dplyr::mutate(
    range = as.character(range),
    range = factor(range, levels = c("Native", "Invasive"))
  ) %>%
  # Get maximum (absolute) peak-trough range per call
  group_by(range, sound.files) %>%
  dplyr::summarise(
    max_peak_trgh_slope = min(peak_trgh_slope)
  ) %>%
  ungroup() %>%
  dplyr::select(sound.files, max_peak_trgh_slope)
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
glimpse(peakTroughSlope_df)
## Rows: 80
## Columns: 2
## $ sound.files         <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES…
## $ max_peak_trgh_slope <dbl> -0.4120342, -0.4661997, -0.2942515, -0.4826950, -0…

Combined these frequency modulation measurements with the 27 standard spectral acoustic measurements used in machine learning (see previous script, prior to filtering for collinearity).

acoustic_measuremnts <- read.csv(file.path(path, "acoustic_parameters.csv")) %>%
      dplyr::select(-c(selec))
glimpse(acoustic_measuremnts)
## Rows: 1,596
## Columns: 28
## $ sound.files <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017_07_17_EM…
## $ duration    <dbl> 0.2146363, 0.1810324, 0.2009257, 0.1878781, 0.1709179, 0.1…
## $ meanfreq    <dbl> 4.261767, 4.207837, 4.300952, 4.449192, 4.737341, 4.649069…
## $ sd          <dbl> 1.745325, 1.934379, 1.751721, 1.747474, 1.874080, 1.903127…
## $ freq.median <dbl> 4.106908, 3.744148, 4.157787, 4.316218, 4.539256, 4.525018…
## $ freq.Q25    <dbl> 3.114309, 2.799090, 2.933548, 3.283657, 3.263085, 3.113488…
## $ freq.Q75    <dbl> 4.992325, 5.065020, 5.018735, 5.242329, 6.113981, 5.967368…
## $ freq.IQR    <dbl> 1.878015, 2.265930, 2.085187, 1.958673, 2.850895, 2.853880…
## $ time.median <dbl> 0.09390873, 0.09818989, 0.09419643, 0.08540507, 0.07958632…
## $ time.Q25    <dbl> 0.04829592, 0.05314866, 0.04844388, 0.04225304, 0.04250633…
## $ time.Q75    <dbl> 0.1466765, 0.1396278, 0.1462287, 0.1420421, 0.1175707, 0.1…
## $ time.IQR    <dbl> 0.09838057, 0.08647917, 0.09778486, 0.09978908, 0.07506437…
## $ skew        <dbl> 2.312270, 2.415746, 1.805732, 2.205839, 1.796657, 1.766370…
## $ kurt        <dbl> 9.950427, 11.207737, 6.606223, 9.014162, 7.522156, 7.14385…
## $ sp.ent      <dbl> 0.9416864, 0.9436562, 0.9395492, 0.9406724, 0.9496999, 0.9…
## $ time.ent    <dbl> 0.8827616, 0.8870927, 0.8842926, 0.8860298, 0.8891064, 0.8…
## $ entropy     <dbl> 0.8312847, 0.8371105, 0.8308364, 0.8334637, 0.8443843, 0.8…
## $ sfm         <dbl> 0.6051624, 0.6396942, 0.5738982, 0.5991476, 0.6141410, 0.6…
## $ meandom     <dbl> 3.728250, 3.458416, 3.490407, 3.845556, 4.110351, 3.755370…
## $ mindom      <dbl> 2.275000, 0.875000, 0.875000, 2.275000, 2.391667, 0.525000…
## $ maxdom      <dbl> 4.841667, 5.075000, 5.425000, 5.075000, 7.291667, 7.291667…
## $ dfrange     <dbl> 2.566667, 4.200000, 4.550000, 2.800000, 4.900000, 6.766667…
## $ modindx     <dbl> 16.863636, 7.194444, 11.307692, 18.333333, 12.928571, 9.91…
## $ startdom    <dbl> 4.608333, 1.225000, 0.875000, 3.791667, 5.191667, 5.541667…
## $ enddom      <dbl> 3.3250000, 3.2083333, 3.5583333, 3.5583333, 4.6083333, 1.2…
## $ dfslope     <dbl> -5.9791080, 10.9556803, 13.3548542, -1.2419399, -3.4129451…
## $ meanpeakf   <dbl> 4.141667, 3.325000, 4.258333, 4.375000, 4.608333, 2.508333…
## $ peakf       <dbl> 4.14631312, 3.24191609, 4.05568720, 4.30036206, 3.12994163…
ameas_df <- acoustic_measuremnts %>%
  # Add back the range column
  inner_join(
    nat_inv_est %>%
      as_tibble() %>%
      dplyr::select(range, sound.files),
    by = "sound.files"
  ) %>%
  # Filter by calls used in the frequency modulation analysis
  filter(sound.files %in% peaks_spat_df$sound.files) %>%
  # Join with the frequency modulation measurements
  full_join(
    num_peaks_df,
    by = "sound.files"
  ) %>%
  full_join(
    modRate_df,
    by = "sound.files"
  ) %>%
  full_join(
    peakTroughSlope_df,
    by = "sound.files"
  ) 
  
glimpse(ameas_df)
## Rows: 80
## Columns: 32
## $ sound.files         <chr> "2017_05_21_PLVE_1005_2.WAV", "2017_05_21_PLVE_100…
## $ duration            <dbl> 0.1561354, 0.1656367, 0.1796856, 0.1542976, 0.1794…
## $ meanfreq            <dbl> 4.151223, 3.808729, 4.141827, 3.912671, 3.872825, …
## $ sd                  <dbl> 1.877118, 1.537948, 1.599546, 1.543469, 1.466119, …
## $ freq.median         <dbl> 4.080633, 3.790128, 4.012443, 3.942792, 3.593443, …
## $ freq.Q25            <dbl> 2.709872, 2.914773, 2.932547, 2.762777, 2.935738, …
## $ freq.Q75            <dbl> 5.086285, 4.375710, 4.769483, 4.746758, 4.351475, …
## $ freq.IQR            <dbl> 2.376413, 1.460938, 1.836935, 1.983982, 1.415738, …
## $ time.median         <dbl> 0.06718847, 0.07332887, 0.08759500, 0.07807336, 0.…
## $ time.Q25            <dbl> 0.03359424, 0.04254885, 0.05327944, 0.03812885, 0.…
## $ time.Q75            <dbl> 0.1153099, 0.1086354, 0.1182984, 0.1152944, 0.1253…
## $ time.IQR            <dbl> 0.08171571, 0.06608651, 0.06501897, 0.07716553, 0.…
## $ skew                <dbl> 2.470123, 2.159739, 2.303599, 1.797583, 2.263534, …
## $ kurt                <dbl> 11.029776, 7.635150, 8.583492, 6.169945, 8.442140,…
## $ sp.ent              <dbl> 0.9507935, 0.9196218, 0.9271719, 0.9265415, 0.9097…
## $ time.ent            <dbl> 0.8926133, 0.8900954, 0.8870764, 0.8930942, 0.8872…
## $ entropy             <dbl> 0.8486909, 0.8185512, 0.8224723, 0.8274888, 0.8072…
## $ sfm                 <dbl> 0.6920527, 0.5146865, 0.5469261, 0.5295827, 0.4444…
## $ meandom             <dbl> 3.647351, 3.381410, 3.461013, 3.669542, 3.383333, …
## $ mindom              <dbl> 2.2750000, 0.5250000, 0.8750000, 1.9250000, 0.5250…
## $ maxdom              <dbl> 4.958333, 4.491667, 5.075000, 5.191667, 4.375000, …
## $ dfrange             <dbl> 2.683333, 3.966667, 4.200000, 3.266667, 3.850000, …
## $ modindx             <dbl> 14.347826, 5.911765, 10.055556, 11.321429, 8.36363…
## $ startdom            <dbl> 2.3916667, 2.7416667, 4.3750000, 2.2750000, 0.5250…
## $ enddom              <dbl> 3.3250000, 0.5250000, 3.2083333, 4.7250000, 3.0916…
## $ dfslope             <dbl> 5.9777166, -13.3826988, -6.4928236, 15.8784059, 14…
## $ meanpeakf           <dbl> 4.258333, 4.025000, 4.025000, 4.608333, 3.208333, …
## $ peakf               <dbl> 4.245433, 2.879236, 4.101161, 4.354276, 3.080760, …
## $ range               <fct> Native, Native, Native, Native, Native, Native, Na…
## $ num_peaks           <int> 5, 7, 7, 5, 6, 6, 5, 7, 6, 7, 6, 6, 6, 6, 6, 4, 5,…
## $ modulation_rate     <dbl> 32.02348, 42.26115, 38.95694, 32.40491, 33.43053, …
## $ max_peak_trgh_slope <dbl> -0.2642049, -0.5134121, -0.4015570, -0.4108509, -0…

Assess normality per acoustic measurement per range.

vars <- names(ameas_df)[-grep("^range$|^sound.files$", names(ameas_df))]

# Shapiro-Wilks test to assess normality
shap_df <- ameas_df %>%
  dplyr::select(-c(sound.files)) %>%
  dplyr::mutate(
    range = factor(range, levels = c("Native", "Invasive"))
  ) %>%
  dplyr::group_by(range) %>%
  rstatix::shapiro_test(vars = vars)

# More than 10 acoustic measurements violate the assumption of normality for at least one range
shap_df %>%
  filter(p <= 0.05) %>%
  pull(variable) %>%
  unique()
##  [1] "dfrange"             "enddom"              "kurt"               
##  [4] "max_peak_trgh_slope" "maxdom"              "meanpeakf"          
##  [7] "mindom"              "num_peaks"           "peakf"              
## [10] "startdom"            "time.ent"            "dfslope"

Assess homogeneity of variance per acoustic measurement between ranges.

vars <- names(ameas_df)[-grep("^range$|^sound.files$", names(ameas_df))]

# Levene's test to assess homogeneity of variance
lev_df <- ameas_df %>%
  dplyr::select(-c(sound.files)) %>%
  dplyr::mutate(
    range = factor(range, levels = c("Native", "Invasive"))
  ) %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^range$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  group_by(measurement) %>%
  rstatix::levene_test(values ~ range)

lev_df
## # A tibble: 30 x 5
##    measurement   df1   df2 statistic      p
##    <chr>       <int> <int>     <dbl>  <dbl>
##  1 dfrange         1    78    1.30   0.257 
##  2 dfslope         1    78    4.33   0.0408
##  3 duration        1    78    2.10   0.152 
##  4 enddom          1    78    0.0367 0.849 
##  5 entropy         1    78    1.69   0.198 
##  6 freq.IQR        1    78    5.78   0.0186
##  7 freq.median     1    78    1.12   0.294 
##  8 freq.Q25        1    78    0.432  0.513 
##  9 freq.Q75        1    78    0.375  0.542 
## 10 kurt            1    78    0.649  0.423 
## # … with 20 more rows
# Five acoustic measurements violate the assumption of equal variance between ranges
lev_df %>%
  filter(p <= 0.05) %>%
  pull(measurement)
## [1] "dfslope"   "freq.IQR"  "meanpeakf" "mindom"    "time.ent"

Calculated Cliff’s delta as the effect size of range per acoustic measurement.

var_nms <- names(ameas_df)[-grep("^range$|^sound.files$", names(ameas_df))]
var_nms

# i <- 1
eff_res_df <- rbindlist(pblapply(1:length(var_nms), function(i){
  
  tmp_df <- ameas_df %>%
    dplyr::select(range, var_nms[i])
  
  nat <- tmp_df %>%
    filter(range == "Native") %>%
    pull(var_nms[i])
  
  inv <- tmp_df %>%
    filter(range == "Invasive") %>%
    pull(var_nms[i])
  
  # Get Cliff's delta for independent groups (all values of y compared to all values of x)
  # orddom package v 3.1
  cliff_delta <- dmes(nat, inv)$dc
  
  # Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study (Hess & Kromey 2004)
  # Using 10000 bootstrap iterations
  ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
  ci_res
  
  lower_95_CI <- ci_res$theta.bci.lo
  upper_95_CI <- ci_res$theta.bci.up
  
  return(data.frame(
    measurement = var_nms[i],
    cliff_delta = cliff_delta,
    lower_95_CI = lower_95_CI,
    upper_95_CI = upper_95_CI
  ))
  
}))

eff_res_df

write.csv(eff_res_df, file.path(path, "cliffs_delta_bootstrappedCIs.csv"), row.names = FALSE)

Table A7

Effect sizes ordered by absolute magnitude. Add median and interquartile range per acoustic measurement per range.

eff_res_df <- read.csv(file.path(path, "cliffs_delta_bootstrappedCIs.csv"))
glimpse(eff_res_df)
## Rows: 30
## Columns: 4
## $ measurement <chr> "duration", "meanfreq", "sd", "freq.median", "freq.Q25", "…
## $ cliff_delta <dbl> -0.291250, -0.056250, 0.781250, -0.188750, -0.255000, 0.19…
## $ lower_95_CI <dbl> -0.522500, -0.316250, 0.606250, -0.433750, -0.495000, -0.0…
## $ upper_95_CI <dbl> -0.027500, 0.198750, 0.891250, 0.075000, 0.006250, 0.43125…
# Make a table for appendix in order of largest absolute effect size
eff_res_df %>%
  dplyr::mutate(
    effect_size = round(cliff_delta, 2),
    lower_95_CI = round(lower_95_CI, 2),
    upper_95_CI = round(upper_95_CI, 2),
    `95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
  ) %>%
  arrange(desc(abs(effect_size))) %>%
  dplyr::select(measurement, effect_size, `95_CI`) %>%
  left_join(
    ameas_df %>%
      dplyr::select(-c(sound.files)) %>%
      dplyr::mutate(
        range = factor(range, levels = c("Native", "Invasive"))
      ) %>%
      # Convert to long format
      pivot_longer(
        cols = names(.)[-grep("^range$|^sound.files$", names(.))],
        names_to = "measurement",
        values_to = "values"
      ) %>%
      group_by(measurement, range) %>%
      dplyr::summarise(
        median = round(median(values), 3),
        IQR = round(IQR(values), 3)
      ) %>%
      # Convert to long format again
      pivot_longer(
        cols = c(median, IQR),
        names_to = "statistic",
        values_to = "values"
      ) %>%
      # Make wider to split out the mean and se columns
      pivot_wider(
        names_from = c(statistic, range),
        values_from = values
      ) %>%
      ungroup() %>%
      dplyr::mutate(
        median_IQR_NAT = paste(median_Native, IQR_Native, sep = "; "),
        median_IQR_INV = paste(median_Invasive, IQR_Invasive, sep = "; ")
      ),
    by = "measurement"
  ) %>% 
  rowid_to_column() %>%
  dplyr::select(rowid, measurement, median_IQR_NAT, median_IQR_INV, effect_size, `95_CI`) %>%
  kable()
rowid measurement median_IQR_NAT median_IQR_INV effect_size 95_CI
1 sd 1.692; 0.161 1.864; 0.191 0.78 (0.61, 0.89)
2 num_peaks 6; 1 4.5; 1 -0.69 (-0.82, -0.49)
3 max_peak_trgh_slope -0.411; 0.136 -0.309; 0.063 0.69 (0.48, 0.83)
4 modulation_rate 32.833; 4.891 26.334; 8.826 -0.63 (-0.79, -0.41)
5 sfm 0.583; 0.077 0.631; 0.082 0.58 (0.35, 0.75)
6 entropy 0.834; 0.009 0.841; 0.015 0.51 (0.27, 0.69)
7 sp.ent 0.938; 0.016 0.943; 0.015 0.45 (0.2, 0.64)
8 freq.IQR 2.026; 0.292 2.377; 0.638 0.44 (0.19, 0.64)
9 time.Q75 0.121; 0.018 0.112; 0.013 -0.34 (-0.56, -0.08)
10 dfrange 4.025; 1.75 4.492; 0.846 0.32 (0.06, 0.55)
11 meanpeakf 4.083; 0.467 3.383; 1.225 -0.32 (-0.56, -0.07)
12 duration 0.171; 0.013 0.165; 0.024 -0.29 (-0.52, -0.03)
13 time.ent 0.889; 0.002 0.89; 0.005 0.29 (0.02, 0.52)
14 time.IQR 0.075; 0.013 0.07; 0.011 -0.28 (-0.52, -0.02)
15 modindx 10.339; 4.803 9.039; 3.485 -0.27 (-0.5, -0.01)
16 freq.Q25 2.914; 0.483 2.754; 0.5 -0.26 (-0.5, 0.01)
17 maxdom 4.958; 0.379 5.133; 1.079 0.26 (0, 0.49)
18 enddom 3.033; 2.625 1.225; 2.158 -0.26 (-0.5, 0)
19 time.median 0.082; 0.012 0.078; 0.011 -0.25 (-0.49, 0)
20 time.Q25 0.046; 0.01 0.044; 0.006 -0.24 (-0.47, 0.02)
21 peakf 3.122; 1.37 2.871; 0.859 -0.24 (-0.48, 0.02)
22 meandom 3.479; 0.361 3.352; 0.448 -0.22 (-0.46, 0.04)
23 freq.median 3.961; 0.408 3.859; 0.469 -0.19 (-0.43, 0.07)
24 freq.Q75 4.962; 0.643 5.059; 0.665 0.19 (-0.07, 0.43)
25 startdom 2.217; 2.275 0.992; 1.575 -0.17 (-0.41, 0.09)
26 kurt 8.009; 2.276 8.364; 3.673 0.16 (-0.1, 0.41)
27 mindom 0.758; 1.429 0.642; 0.35 -0.16 (-0.39, 0.1)
28 skew 2.071; 0.342 2.054; 0.49 0.07 (-0.19, 0.32)
29 meanfreq 4.048; 0.413 4.044; 0.482 -0.06 (-0.32, 0.2)
30 dfslope 2.003; 19.876 1.356; 5.679 -0.04 (-0.29, 0.23)
# The 6 acoustic measurements with the largest effect sizes? In either direction
# sd is standard deviation of frequency, similar to freq.IQR
# sfm is spectral flatness, similar to spectral entropy
eff_res_df %>%
  arrange(desc(abs(cliff_delta))) %>%
  slice(1:6)
##           measurement cliff_delta lower_95_CI upper_95_CI
## 1                  sd    0.781250     0.60625     0.89125
## 2           num_peaks   -0.693125    -0.82500    -0.49375
## 3 max_peak_trgh_slope    0.691250     0.47875     0.83375
## 4     modulation_rate   -0.633750    -0.78750    -0.41250
## 5                 sfm    0.576250     0.34750     0.74750
## 6             entropy    0.508750     0.26625     0.69375
topms <- eff_res_df %>%
  arrange(desc(abs(cliff_delta))) %>%
  slice(1:6) %>%
  pull(measurement) %>%
  as.character()
topms
## [1] "sd"                  "num_peaks"           "max_peak_trgh_slope"
## [4] "modulation_rate"     "sfm"                 "entropy"

Figure 3B

A raincloud plot for acoustic measurements that displayed the largest effect sizes of range.

# Filter the measurements data frame by these variables, and rename these
ameas_df_tmp <- ameas_df %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^range$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  filter(measurement %in% topms) %>%
  dplyr::mutate(
    measurement_nm = measurement,
    # Add new line symbols to all measurement names
    measurement_nm = recode(
      measurement_nm,
      `sd` = "Standard\n deviation\n of frequency",
      `num_peaks` = "Number of\n peaks",
      `max_peak_trgh_slope` = "Peak - trough\n slope",
      `modulation_rate` = "Modulation\n rate",
      `sfm` = "Spectral\n flatness",
      `entropy` = "Entropy\n"
    ),
    measurement_nm = factor(measurement_nm)
  ) 

ameas_df_tmp

cols <- scales::alpha(c("navy", "orange"), 0.75)

# Make a list of plots in order of effect sizes
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()

i <- 1
invisible(pblapply(1:length(topms), function(i){
  
  tmp_df <- ameas_df_tmp %>%
    filter(measurement == topms[i]) %>%
    dplyr::mutate(
      measurement_nm = as.character(measurement_nm),
      measurement_nm = factor(measurement_nm, levels = unique(measurement_nm))
    ) 
  
  # Get ymin and ymax values, add a buffer of 2 evenly space values before and after 
  ymin <- min(tmp_df$values)
  ymax <- max(tmp_df$values)
  
  # Get the difference between max and min, use this as a buffer on the y-axis scale
  buf <- (ymax - ymin)/5
  
  # If on the first plot, add the y-axis label
  if(i == 1){
    yal <- "Values"
  } else{
    yal <- ""
  }
  
  # Initialize plot margins (top, right, bottom, left)
  tm <- 1
  rm <- 0
  bm <- 1
  lm <- 0
  
  # If on the first plot, make the left margin larger
  if(i == 1){
    lm <- 0.5
  # If on last plot, make the right margin larger
  } else if(i == length(topms)){
    rm <- 0.5
  }
  
  # Raincloud plot: Kernel smoothed with boxplot
  gg <- ggplot(tmp_df, aes(x = range, y = values, fill = range, color = range)) + 
    geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.25)+
    geom_point(position = position_jitter(width = 0.05), size = 1, stroke = 0.5, alpha = 0.45) +
    geom_boxplot(aes(x = as.numeric(range) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.2, colour = "black", size = 0.25) +
    facet_wrap(~ measurement_nm) +
    scale_fill_manual(values = cols) +
    scale_color_manual(values = cols) +
    guides(fill = FALSE, color = FALSE) +
    theme_bw() +
    xlab("") + ylab(yal) 
  
  # gg
  
  # If on entropy, make the breaks differently to avoid a gap between 0.86 and 0.88
  if(topms[i] != "entropy"){
    gg <- gg +
      scale_y_continuous(limits = round(c(ymin - buf, ymax + buf), 2), breaks = round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2))
  } else {
    brks <- round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2)
    brks <- c(brks[-length(brks)], 0.87)
    gg <- gg +
      scale_y_continuous(limits = round(c(ymin - buf, 0.87), 2), breaks = brks)
  }
  
  # Add final aesthetics
  gg <- gg +
    theme(
      axis.title = element_text(size = 14),
      axis.text.y = element_text(size = 12),
      axis.text.x = element_text(size = 10),
      strip.text = element_text(size = 13, margin = margin(0.5, 2, 0.5, 2, "lines")),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      axis.ticks = element_line(size = 0.25),
      plot.margin = unit(c(tm, rm, bm, lm), "lines")
    )
  
  # gg
  
  # Return the given plot
  gg_list[[i]] <<- gg
  
}))

# Get the legend
cols <- scales::alpha(c("navy", "orange"), 0.85)

gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
  ameas_df_tmp %>%
    ggplot(aes(x = range, y = values)) +
    geom_errorbar(aes(ymin = values, ymax = values, color = range), size = 2, width = 0.25) +
    scale_color_manual(values = cols) +
    guides(color = guide_legend(title = "", override.aes = list(size = 4))) +
    theme_bw() +
    theme(
      legend.text = element_text(size = 20),
      legend.position = "top",
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-10, -10, -10, -10),
      legend.key.width = unit(3, "lines")
    )
)), "guide-box")

dev.off()

# Write out the legend separately
jpeg(file.path(gpath, "Figure3_AcousticMeasrmnts_leg.jpeg"), units = "in", width = 4, height = 1, res = 300)
grid.draw(gg_leg)
dev.off()

# Arrange plots into a single image file
jpeg(file.path(gpath, "Figure3B_AcousticMeasrmnts.jpeg"), units = "in", width = 11, height = 3.85, res = 300)

ggarrange(
  as.ggplot(gg_list[[1]]),
  as.ggplot(gg_list[[2]]),
  as.ggplot(gg_list[[3]]),
  as.ggplot(gg_list[[4]]),
  as.ggplot(gg_list[[5]]),
  as.ggplot(gg_list[[6]]),
  nrow = 1,
  widths = rep(2.2, 6)
)

dev.off()

See the main body of the article for this figure.

Frequency quartiles indicate the distribution of energy across the spectrum. Here quartiles are the 3 frequencies that divide the spectrum per call into 4 even intervals. The 25% quartile represents the frequency at which 25% of total energy in the signal is located below the given frequency. Likewise, the 75% quartile represents the frequency that splits off 75% of the total energy of the call.

The interquartile range represents the difference in frequency between these quartiles, a measurement of the frequency spread of the total energy of the signal. Higher IQR in the invasive range indicates that energy is more spread out over the range of frequency than native range calls.

For spectral flatness and spectral entropy, values closer to 1 are noisy signals. Lower spectral flatness and entropy values represent more ordered and informative signals, which ties lower flatness and spectral entropy in native range calls to higher frequency modulation levels.

High background noise can affect frequency quantile measurements and spectral entropy[1], but we pre-processed calls using a strict SNR threshold prior to these analyses, so we did not expect differences in these parameters to arise from differences in background noise between ranges.

Temporal comparison

Get the invasive range calls set aside for a temporal comparison, as well as native range calls.

calls <- freq_modQ_df %>%
  filter(question == "temporal" | range == "Native") %>%
  filter(question != "indiv_scale") %>% # exclude repeatedly sampled individual calls
  pull(sound.files) %>%
  as.character()

length(calls) 
## [1] 115
peaks_temp_df <- peaks_trghs_df3 %>%
  filter(sound.files %in% calls) %>%
  droplevels()
# glimpse(peaks_temp_df)

# 40 native range calls, 75 invasive range calls
peaks_temp_df %>%
  group_by(range) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 2 x 2
##   range    n_calls
##   <fct>      <int>
## 1 Native        40
## 2 Invasive      75
# How many years are represented in the invasive range? 25 calls from 2004, 30 calls for 2011, and 20 2019 calls
peaks_temp_df %>%
  filter(range == "Invasive") %>%
  group_by(year) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 3 x 2
##   year  n_calls
##   <fct>   <int>
## 1 2004       25
## 2 2011       30
## 3 2019       20
# Invasive spatial areas represented?
# Louisiana and Texas, as expected (New Orleans, Austin)
peaks_temp_df %>%
  filter(range == "Invasive") %>%
  group_by(region, dept_state) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 2 x 3
## # Groups:   region [2]
##   region    dept_state n_calls
##   <fct>     <fct>        <int>
## 1 Southwest Texas           50
## 2 Southeast Louisiana       25
# Which invasive range site-years? 
# We recorded repeatedly sampled individuals at some of these sites
# But calls from repeatedly sampled individuals used in the final dataset are not included here (see below)
peaks_temp_df %>%
  filter(range == "Invasive") %>%
  group_by(site_year) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## # A tibble: 15 x 2
##    site_year n_calls
##    <fct>       <int>
##  1 SOCC_2004       5
##  2 BALL_2004       5
##  3 INTR_2011       5
##  4 VALL_2004       5
##  5 CANA_2004       5
##  6 ELEM_2011       5
##  7 ELEM_2019       5
##  8 FOLS_2004       5
##  9 INTR_2019       5
## 10 LAKE_2011       5
## 11 MART_2011       5
## 12 MART_2019       5
## 13 ROBE_2011       5
## 14 SOCC_2019       5
## 15 VALL_2011       5
# City-years
peaks_temp_df %>%
  filter(range == "Invasive") %>%
  group_by(year, invasive_city) %>%
  dplyr::summarise(
    n_calls = n_distinct(sound.files)
  )
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
## # A tibble: 5 x 3
## # Groups:   year [3]
##   year  invasive_city n_calls
##   <fct> <fct>           <int>
## 1 2004  Austin             10
## 2 2004  New Orleans        15
## 3 2011  Austin             20
## 4 2011  New Orleans        10
## 5 2019  Austin             20
# Were any calls from known repeatedly sampled individuals used in frequency tracing?
# Yes, but these 2 calls are also from birds that were not retained as repeatedly sampled individuals in the final dataset (not enough calls remained after pre-processing)
unique(peaks_temp_df$sound.files[grep("site_scale", peaks_temp_df$sound.files)])
## [1] 2011_02_15_ELEM_WRIG1013_17_site_scale.WAV
## [2] 2011_02_15_ELEM_WRIG1013_10_site_scale.WAV
## 115 Levels: 2017_07_11_INES-04_MZ000149_1.WAV ...
# These two calls were from INV-UM3 and INV-UM4, not present in calls for the repeatedly sampled individuals
# nat_inv_est[grep(paste(peaks_temp_df$sound.files[grep("site_scale", peaks_temp_df$sound.files)], collapse= "|"), nat_inv_est$sound.files), ]

# nat_inv_est %>%
# filter(social_scale == "Individual") %>%
# pull(Bird_ID) %>%
# unique()

Combining acoustic measurements

Calculated frequency modulation measurements per call.

# Get the number of peaks per call
num_peaks_tdf <- peaks_temp_df %>%
  # Create a range-year column
  dplyr::mutate(
    range_year = paste(range, year, sep = "_")
  ) %>%
  # Find the number of peaks per call
  group_by(range_year, sound.files) %>%
  dplyr::summarise(
    num_peaks = length(peak_number)
  ) %>%
  ungroup() %>%
  dplyr::select(c(sound.files, num_peaks))
## `summarise()` has grouped output by 'range_year'. You can override using the `.groups` argument.
glimpse(num_peaks_tdf)
## Rows: 115
## Columns: 2
## $ sound.files <fct> SBD_NewOrleans.1013.0143_resamp.WAV, SBD_Austin.1001.0192_…
## $ num_peaks   <int> 5, 4, 4, 3, 3, 4, 5, 4, 5, 3, 6, 5, 5, 5, 5, 5, 5, 5, 4, 4…
# Modulation rate per call
modRate_tdf <- peaks_temp_df %>%
  # Create a range-year column
  dplyr::mutate(
    range_year = paste(range, year, sep = "_")
  ) %>%
  # Calculate duration per call
  inner_join(
    freq_mod_df %>%
    dplyr::mutate(
      duration = end - start
    ) %>%
      dplyr::select(sound.files, duration),
    by = "sound.files"
  ) %>%
  # Calculate modulation rate value per call
  group_by(range_year, sound.files, duration) %>%
  dplyr::summarise(
    num_peaks = length(peak_number)
  ) %>%
  ungroup() %>%
  dplyr::mutate(
    modulation_rate = (num_peaks/duration) # units for rate of change = peaks/second
  ) %>%
  dplyr::select(c(sound.files, modulation_rate))
## `summarise()` has grouped output by 'range_year', 'sound.files'. You can override using the `.groups` argument.
glimpse(modRate_tdf)
## Rows: 115
## Columns: 2
## $ sound.files     <chr> "SBD_Austin.1001.0192_resamp.WAV", "SBD_Austin.1001.02…
## $ modulation_rate <dbl> 27.81052, 25.32145, 19.28747, 16.71678, 21.47669, 27.2…
# Maximum peak - trough slope per call
peakTroughSlope_tdf <- peaks_temp_df %>%
  # Create a range-year column
  dplyr::mutate(
    range_year = paste(range, year, sep = "_")
  ) %>%
  # Drop peaks that were not matched to troughs
  filter(!is.na(peak_trgh_slope)) %>%
  # Get maximum (absolute) peak-trough range per call
  group_by(range, sound.files) %>%
  dplyr::summarise(
    max_peak_trgh_slope = min(peak_trgh_slope)
  ) %>%
  ungroup() %>%
  dplyr::select(sound.files, max_peak_trgh_slope)
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
glimpse(peakTroughSlope_tdf)
## Rows: 115
## Columns: 2
## $ sound.files         <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES…
## $ max_peak_trgh_slope <dbl> -0.4120342, -0.4661997, -0.2942515, -0.4826950, -0…

Combined these frequency modulation measurements with the 27 standard acoustic measurements used in machine learning (see previous script, prior to filtering for collinearity). 115 calls total across years.

acoustic_measuremnts <- read.csv(file.path(path, "acoustic_parameters.csv")) %>%
      dplyr::select(-c(selec))
glimpse(acoustic_measuremnts)
## Rows: 1,596
## Columns: 28
## $ sound.files <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017_07_17_EM…
## $ duration    <dbl> 0.2146363, 0.1810324, 0.2009257, 0.1878781, 0.1709179, 0.1…
## $ meanfreq    <dbl> 4.261767, 4.207837, 4.300952, 4.449192, 4.737341, 4.649069…
## $ sd          <dbl> 1.745325, 1.934379, 1.751721, 1.747474, 1.874080, 1.903127…
## $ freq.median <dbl> 4.106908, 3.744148, 4.157787, 4.316218, 4.539256, 4.525018…
## $ freq.Q25    <dbl> 3.114309, 2.799090, 2.933548, 3.283657, 3.263085, 3.113488…
## $ freq.Q75    <dbl> 4.992325, 5.065020, 5.018735, 5.242329, 6.113981, 5.967368…
## $ freq.IQR    <dbl> 1.878015, 2.265930, 2.085187, 1.958673, 2.850895, 2.853880…
## $ time.median <dbl> 0.09390873, 0.09818989, 0.09419643, 0.08540507, 0.07958632…
## $ time.Q25    <dbl> 0.04829592, 0.05314866, 0.04844388, 0.04225304, 0.04250633…
## $ time.Q75    <dbl> 0.1466765, 0.1396278, 0.1462287, 0.1420421, 0.1175707, 0.1…
## $ time.IQR    <dbl> 0.09838057, 0.08647917, 0.09778486, 0.09978908, 0.07506437…
## $ skew        <dbl> 2.312270, 2.415746, 1.805732, 2.205839, 1.796657, 1.766370…
## $ kurt        <dbl> 9.950427, 11.207737, 6.606223, 9.014162, 7.522156, 7.14385…
## $ sp.ent      <dbl> 0.9416864, 0.9436562, 0.9395492, 0.9406724, 0.9496999, 0.9…
## $ time.ent    <dbl> 0.8827616, 0.8870927, 0.8842926, 0.8860298, 0.8891064, 0.8…
## $ entropy     <dbl> 0.8312847, 0.8371105, 0.8308364, 0.8334637, 0.8443843, 0.8…
## $ sfm         <dbl> 0.6051624, 0.6396942, 0.5738982, 0.5991476, 0.6141410, 0.6…
## $ meandom     <dbl> 3.728250, 3.458416, 3.490407, 3.845556, 4.110351, 3.755370…
## $ mindom      <dbl> 2.275000, 0.875000, 0.875000, 2.275000, 2.391667, 0.525000…
## $ maxdom      <dbl> 4.841667, 5.075000, 5.425000, 5.075000, 7.291667, 7.291667…
## $ dfrange     <dbl> 2.566667, 4.200000, 4.550000, 2.800000, 4.900000, 6.766667…
## $ modindx     <dbl> 16.863636, 7.194444, 11.307692, 18.333333, 12.928571, 9.91…
## $ startdom    <dbl> 4.608333, 1.225000, 0.875000, 3.791667, 5.191667, 5.541667…
## $ enddom      <dbl> 3.3250000, 3.2083333, 3.5583333, 3.5583333, 4.6083333, 1.2…
## $ dfslope     <dbl> -5.9791080, 10.9556803, 13.3548542, -1.2419399, -3.4129451…
## $ meanpeakf   <dbl> 4.141667, 3.325000, 4.258333, 4.375000, 4.608333, 2.508333…
## $ peakf       <dbl> 4.14631312, 3.24191609, 4.05568720, 4.30036206, 3.12994163…
ameast_df <- acoustic_measuremnts %>%
  # Add back the range column
  inner_join(
    nat_inv_est %>%
      as_tibble() %>%
      dplyr::select(range, year, sound.files),
    by = "sound.files"
  ) %>%
  # Create a range-year column
  dplyr::mutate(
    range_year = paste(range, year, sep = "_")
  ) %>%
  dplyr::select(-c(year)) %>%
  # Filter by calls used in the frequency modulation analysis
  filter(sound.files %in% peaks_temp_df$sound.files) %>%
  # Join with the frequency modulation measurements
  full_join(
    num_peaks_tdf,
    by = "sound.files"
  ) %>%
  full_join(
    modRate_tdf,
    by = "sound.files"
  ) %>%
  full_join(
    peakTroughSlope_tdf,
    by = "sound.files"
  ) 
  
glimpse(ameast_df)
## Rows: 115
## Columns: 33
## $ sound.files         <chr> "2017_05_21_PLVE_1005_2.WAV", "2017_05_21_PLVE_100…
## $ duration            <dbl> 0.1561354, 0.1656367, 0.1796856, 0.1542976, 0.1794…
## $ meanfreq            <dbl> 4.151223, 3.808729, 4.141827, 3.912671, 3.872825, …
## $ sd                  <dbl> 1.877118, 1.537948, 1.599546, 1.543469, 1.466119, …
## $ freq.median         <dbl> 4.080633, 3.790128, 4.012443, 3.942792, 3.593443, …
## $ freq.Q25            <dbl> 2.709872, 2.914773, 2.932547, 2.762777, 2.935738, …
## $ freq.Q75            <dbl> 5.086285, 4.375710, 4.769483, 4.746758, 4.351475, …
## $ freq.IQR            <dbl> 2.376413, 1.460938, 1.836935, 1.983982, 1.415738, …
## $ time.median         <dbl> 0.06718847, 0.07332887, 0.08759500, 0.07807336, 0.…
## $ time.Q25            <dbl> 0.03359424, 0.04254885, 0.05327944, 0.03812885, 0.…
## $ time.Q75            <dbl> 0.1153099, 0.1086354, 0.1182984, 0.1152944, 0.1253…
## $ time.IQR            <dbl> 0.08171571, 0.06608651, 0.06501897, 0.07716553, 0.…
## $ skew                <dbl> 2.470123, 2.159739, 2.303599, 1.797583, 2.263534, …
## $ kurt                <dbl> 11.029776, 7.635150, 8.583492, 6.169945, 8.442140,…
## $ sp.ent              <dbl> 0.9507935, 0.9196218, 0.9271719, 0.9265415, 0.9097…
## $ time.ent            <dbl> 0.8926133, 0.8900954, 0.8870764, 0.8930942, 0.8872…
## $ entropy             <dbl> 0.8486909, 0.8185512, 0.8224723, 0.8274888, 0.8072…
## $ sfm                 <dbl> 0.6920527, 0.5146865, 0.5469261, 0.5295827, 0.4444…
## $ meandom             <dbl> 3.647351, 3.381410, 3.461013, 3.669542, 3.383333, …
## $ mindom              <dbl> 2.2750000, 0.5250000, 0.8750000, 1.9250000, 0.5250…
## $ maxdom              <dbl> 4.958333, 4.491667, 5.075000, 5.191667, 4.375000, …
## $ dfrange             <dbl> 2.683333, 3.966667, 4.200000, 3.266667, 3.850000, …
## $ modindx             <dbl> 14.347826, 5.911765, 10.055556, 11.321429, 8.36363…
## $ startdom            <dbl> 2.3916667, 2.7416667, 4.3750000, 2.2750000, 0.5250…
## $ enddom              <dbl> 3.3250000, 0.5250000, 3.2083333, 4.7250000, 3.0916…
## $ dfslope             <dbl> 5.9777166, -13.3826988, -6.4928236, 15.8784059, 14…
## $ meanpeakf           <dbl> 4.258333, 4.025000, 4.025000, 4.608333, 3.208333, …
## $ peakf               <dbl> 4.245433, 2.879236, 4.101161, 4.354276, 3.080760, …
## $ range               <fct> Native, Native, Native, Native, Native, Native, Na…
## $ range_year          <chr> "Native_2017", "Native_2017", "Native_2017", "Nati…
## $ num_peaks           <int> 5, 7, 7, 5, 6, 6, 5, 7, 6, 7, 6, 6, 6, 6, 6, 4, 5,…
## $ modulation_rate     <dbl> 32.02348, 42.26115, 38.95694, 32.40491, 33.43053, …
## $ max_peak_trgh_slope <dbl> -0.2642049, -0.5134121, -0.4015570, -0.4108509, -0…

I did not assess normality per acoustic measurement per range-year, since I proceeded with just summary statistics for the temporal analysis.

Calculate median and interquartile range per acoustic measurement per range-year.

ameast_ss <- ameast_df %>%
  dplyr::select(-c(sound.files)) %>%
  dplyr::mutate(
    range = factor(range, levels = c("Native", "Invasive"))
  ) %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^range$|^range_year$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  group_by(measurement, range_year) %>%
  dplyr::summarise(
    median = round(median(values), 2),
    IQR = round(IQR(values), 2)
  ) %>%
  # Convert to long format again
  pivot_longer(
    cols = c(median, IQR),
    names_to = "statistic",
    values_to = "values"
  ) %>%
  # Make wider to split out the mean and se columns
  pivot_wider(
    names_from = c(statistic, range_year),
    values_from = values
  ) %>%
  ungroup()
## `summarise()` has grouped output by 'measurement'. You can override using the `.groups` argument.
ameast_ss %>%
  kable()
measurement median_Invasive_2004 IQR_Invasive_2004 median_Invasive_2011 IQR_Invasive_2011 median_Invasive_2019 IQR_Invasive_2019 median_Native_2017 IQR_Native_2017
dfrange 4.08 0.58 4.90 1.22 4.49 0.55 4.03 1.75
dfslope 8.47 16.86 0.00 15.03 -1.50 6.07 2.00 19.88
duration 0.16 0.02 0.17 0.02 0.16 0.01 0.17 0.01
enddom 2.51 2.92 0.99 1.98 0.82 0.47 3.03 2.62
entropy 0.85 0.01 0.84 0.01 0.84 0.01 0.83 0.01
freq.IQR 2.26 0.48 2.52 0.60 2.10 0.53 2.03 0.29
freq.median 3.71 0.22 4.00 0.47 3.99 0.31 3.96 0.41
freq.Q25 2.67 0.31 2.85 0.46 2.79 0.51 2.91 0.48
freq.Q75 4.95 0.50 5.28 0.70 4.99 0.44 4.96 0.64
kurt 6.89 5.86 7.10 4.13 7.87 2.64 8.01 2.28
max_peak_trgh_slope -0.32 0.09 -0.30 0.15 -0.31 0.07 -0.41 0.14
maxdom 4.72 0.58 5.60 1.46 5.13 0.35 4.96 0.38
meandom 3.28 0.34 3.44 0.66 3.50 0.29 3.48 0.36
meanfreq 3.96 0.35 4.15 0.45 3.98 0.34 4.05 0.41
meanpeakf 3.21 1.28 3.67 1.25 4.38 0.96 4.08 0.47
mindom 0.64 0.35 0.52 0.35 0.52 0.23 0.76 1.43
modindx 8.94 3.56 9.18 3.93 8.76 2.97 10.34 4.80
modulation_rate 27.52 6.10 24.87 6.33 24.90 7.89 32.83 4.89
num_peaks 5.00 1.00 4.00 1.00 4.00 1.25 6.00 1.00
peakf 2.86 0.84 2.69 1.62 3.35 2.42 3.12 1.37
sd 1.87 0.13 1.88 0.20 1.83 0.19 1.69 0.16
sfm 0.66 0.05 0.64 0.08 0.65 0.08 0.58 0.08
skew 1.79 0.70 1.89 0.59 2.00 0.46 2.07 0.34
sp.ent 0.95 0.01 0.95 0.01 0.95 0.01 0.94 0.02
startdom 0.99 1.52 0.99 2.86 1.11 0.85 2.22 2.28
time.ent 0.89 0.01 0.89 0.00 0.89 0.00 0.89 0.00
time.IQR 0.07 0.01 0.08 0.01 0.07 0.01 0.08 0.01
time.median 0.07 0.01 0.08 0.02 0.07 0.01 0.08 0.01
time.Q25 0.04 0.01 0.04 0.01 0.04 0.01 0.05 0.01
time.Q75 0.11 0.01 0.12 0.02 0.11 0.01 0.12 0.02

Appendix Figure 1

A raincloud plot for acoustic measurements that displayed the largest effect sizes of range, shown here for range-years in the temporal analysis.

# Filter the measurements data frame by these variables, and rename these
ameast_df_tmp <- ameast_df %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^range$|^range_year$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  filter(measurement %in% topms) %>%
  dplyr::mutate(
    measurement_nm = measurement,
    # Add new line symbols to all measurement names
    measurement_nm = recode(
      measurement_nm,
      `sd` = "Standard\n deviation\n of frequency",
      `num_peaks` = "Number of\n peaks",
      `max_peak_trgh_slope` = "Peak - trough\n slope",
      `modulation_rate` = "Modulation\n rate",
      `sfm` = "Spectral\n flatness",
      `entropy` = "Entropy"
    ),
    measurement_nm = factor(measurement_nm),
    range_year = factor(range_year, levels = c("Native_2017", "Invasive_2004", "Invasive_2011", "Invasive_2019"))
  ) 

ameast_df_tmp

cols <- scales::alpha(c("navy", rep("orange", 3)), 0.75)

# Make a list of plots in order of effect sizes
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()

# i <- 2
invisible(pblapply(1:length(topms), function(i){
  
  tmp_df <- ameast_df_tmp %>%
    filter(measurement == topms[i]) %>%
    dplyr::mutate(
      measurement_nm = as.character(measurement_nm),
      measurement_nm = factor(measurement_nm, levels = unique(measurement_nm))
    ) 
  
  # Get ymin and ymax values, add a buffer of 2 evenly space values before and after 
  ymin <- min(tmp_df$values)
  ymax <- max(tmp_df$values)
  
  # Get the difference between max and min, use this as a buffer on the y-axis scale
  buf <- (ymax - ymin)/5
  
  # If on the first plot, add the y-axis label
  if(i == 1){
    yal <- "Values"
  } else{
    yal <- ""
  }
  
  # Initialize plot margins (top, right, bottom, left)
  tm <- 1
  rm <- 0
  bm <- 1
  lm <- 0
  
  # If on the first plot, make the left margin larger
  if(i == 1){
    lm <- 0.5
  # If on last plot, make the right margin larger
  } else if(i == length(topms)){
    rm <- 0.5
  }
  
  # Raincloud plot: Kernel smoothed with boxplot
  gg <- ggplot(tmp_df, aes(x = range_year, y = values, fill = range_year, color = range_year)) + 
    geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.15) +
    geom_point(position = position_jitter(width = 0.05), size = 0.75, stroke = 0.5, alpha = 0.45) +
    geom_boxplot(aes(x = as.numeric(range_year) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.15, colour = "black", size = 0.25) +
    facet_wrap(~ measurement_nm) +
    scale_fill_manual(values = cols) +
    scale_color_manual(values = cols) +
    guides(fill = FALSE, color = FALSE) +
    theme_bw() +
    xlab("") + ylab(yal) +
    scale_y_continuous(limits = round(c(ymin - buf, ymax + buf), 2), breaks = round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2)) +
    theme(
      axis.title = element_text(size = 14),
      axis.text.y = element_text(size = 14),
      axis.text.x = element_text(size = 14, angle = 35, hjust = 1),
      strip.text = element_text(size = 16, margin = margin(0.5, 2, 0.5, 2, "lines")),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      axis.ticks = element_line(size = 0.25),
      plot.margin = unit(c(tm, rm, bm, lm), "lines")
    )
  
  # gg
  
  # Return the given plot
  gg_list[[i]] <<- gg
  
}))

# Get the legend
cols <- scales::alpha(c("navy", "orange"), 0.85)

gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
  ameast_df_tmp %>%
    ggplot(aes(x = range, y = values)) +
    geom_errorbar(aes(ymin = values, ymax = values, color = range), size = 2, width = 0.25) +
    scale_color_manual(values = cols) +
    guides(color = guide_legend(title = "", override.aes = list(size = 2.5))) +
    theme_bw() +
    theme(
      legend.text = element_text(size = 16),
      legend.position = "top",
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-10, -10, -10, -10),
      legend.key.width = unit(3, "lines")
    )
)), "guide-box")

dev.off()

# Arrange plots into a single image file
jpeg(file.path(gpath, "SupplementaryFigure2_TemporalAcousticMeasrmnts.jpeg"), units = "in", width = 11, height = 11, res = 300)

ggarrange(
  as.ggplot(gg_leg),
  as.ggplot(ggarrange(
    as.ggplot(gg_list[[1]]),
    as.ggplot(gg_list[[2]]),
    as.ggplot(gg_list[[3]]),
    as.ggplot(gg_list[[4]]),
    as.ggplot(gg_list[[5]]),
    as.ggplot(gg_list[[6]]),
    nrow = 2,
    heights = rep(4, 2)
  )),
  nrow = 2,
  heights = c(1, 10)
)

dev.off()

See the appendix for of the associated article this figure, which shows that structural differentiation among years in the invasive range was not comparable to the level of structural differentiation we reported between ranges for these same acoustic measurements.

Validation of habitat differences

We performed analyses to address whether habitat differences between ranges influenced the structure of contact calls. Native range birds were largely recorded in agricultural settings, although we did record birds in the city of Montevideo (living in large green spaces). In the invasive range, birds were largely recorded in cities (although often near green spaces as well).

Random forests misclassification by native range habitat

Four sites were recorded in the city of Montevideo: Bodegas Carrau (BCAR, a vineyard on the edge of Montevideo next to a busy road), la Facultad de Agronomia (FAGR, a department of la Universidad de la Republica with a lot of trees and gardens), CEME (el Cementerio Central, a green space in the middle of Montevideo, far from agricultural areas), and GOLF (el Club de Golf, a green space in the middle of Montevideo, also far from agricultural areas).

24 native range calls were misclassified as invasive range calls during random forests prediction back to range. The only native range “urban” sites represented in the misclassified calls were BCAR and FAGR, with 1 out of 5 and 1 out of 2 calls misclassified during random forests prediction. 10 other agro-interface sites had anywhere from 1 to 6 calls misclassified during this analysis.

CEME and GOLF had 100% correct classification back to the native range (2 and 7 calls, respectively) during random forests prediction.

This indicates that at least in the native range, there was no clear pattern of urban native range calls being misclassified as invasive range calls (which were mostly recorded in urban areas). If there were a significant difference in call structure between habitats, we would expect to see more native range urban calls misclassified to the invasive range.

See SimplerSignatures_AdditionalMaterials_02_AcousticStructure_SupervisedML.Rmd and the associated RMarkdown output for more information.

Visualizing habitat differences

For random forests analysis, a subset of calls per site were employed for the prediction step. Here I decided to look more closely at structural differences among native and invasive range habitats using all calls per site.

I picked 4 invasive range urban sites from the city of Austin recorded in 2019 (SOCC, MART, INTR, ELEM) and 4 native range non-urban sites recorded in Colonia, Uruguay in 2017 (PIED, INES-08, LENA, INES-03), to match site sample sizes for native range urban sites.

Given the number of calls recorded at invasive range sites, there are more invasive urban calls than in either habitat category for the native range.

nu_sites <- c("BCAR", "FAGR", "GOLF", "CEME")
iu_sites <- c("SOCC", "MART", "INTR", "ELEM")
nnu_sites <- c("PIED", "INES-08", "LENA", "INES-03")

habitat_df <- nat_inv_est %>%
  as_tibble() %>%
  filter(social_scale == "Site") %>%
  filter(year == "2019" | year == "2017") %>%
  filter(site %in% c(nu_sites, iu_sites, nnu_sites)) %>%
  dplyr::mutate(
    habitat = site,
    habitat = gsub("BCAR|FAGR|GOLF|CEME", "Native urban", habitat),
    habitat = gsub("SOCC|MART|INTR|ELEM", "Invasive urban", habitat),
    habitat = gsub("PIED|INES-08|LENA|INES-03", "Native agro-interface", habitat)
  )
  
glimpse(habitat_df)
## Rows: 416
## Columns: 27
## $ sound.files          <chr> "2017_07_11_INES-03_1068_1.WAV", "2017_07_11_INES…
## $ selec                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ start                <dbl> 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0…
## $ end                  <dbl> 0.2476383, 0.2166655, 0.2379962, 0.2158367, 0.242…
## $ date                 <date> 2017-07-11, 2017-07-11, 2017-07-11, 2017-07-11, …
## $ range                <fct> Native, Native, Native, Native, Native, Native, N…
## $ social_scale         <chr> "Site", "Site", "Site", "Site", "Site", "Site", "…
## $ site                 <chr> "INES-03", "INES-03", "INES-03", "INES-03", "INES…
## $ Bird_ID              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lat                  <dbl> -34.33610, -34.33610, -34.33610, -34.33610, -34.3…
## $ lon                  <dbl> -57.66821, -57.66821, -57.66821, -57.66821, -57.6…
## $ Visual_quality_score <fct> H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H…
## $ Overlapping_signal   <fct> N, N, N, N, N, N, Y, Y, N, N, N, N, N, Y, N, N, N…
## $ Truncated            <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",…
## $ tailored             <fct> y, y, y, y, y, y, y, y, y, y, y, y, y, y, y, y, y…
## $ prev_SNR             <dbl> 22.77419, 30.64710, 28.94328, 21.34350, 30.88052,…
## $ SNR_margin_works     <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
## $ SNR_before_only      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ old.sound.file.name  <chr> "2017_07_11_INES-03_1068_1.WAV", "2017_07_11_INES…
## $ SNR                  <dbl> 27.42401, 28.77102, 34.18083, 21.63641, 28.61547,…
## $ region               <chr> "West", "West", "West", "West", "West", "West", "…
## $ country              <chr> "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Urug…
## $ site_year            <chr> "INES-03_2017", "INES-03_2017", "INES-03_2017", "…
## $ year                 <chr> "2017", "2017", "2017", "2017", "2017", "2017", "…
## $ dept_state           <chr> "Colonia", "Colonia", "Colonia", "Colonia", "Colo…
## $ invasive_city        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ habitat              <chr> "Native agro-interface", "Native agro-interface",…
# Sample sizes by habitat
habitat_df %>%
  group_by(habitat) %>%
  dplyr::summarise(
    n_calls = n()
  )
## # A tibble: 3 x 2
##   habitat               n_calls
##   <chr>                   <int>
## 1 Invasive urban            286
## 2 Native agro-interface      82
## 3 Native urban               48

What was the maximum number of calls recorded at any of these native range sites? 27 calls was the maximum recorded.

habitat_df %>%
  filter(range == "Native") %>%
  group_by(habitat, site) %>%
  dplyr::summarise(
    n_calls = n()
  )
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## # A tibble: 8 x 3
## # Groups:   habitat [2]
##   habitat               site    n_calls
##   <chr>                 <chr>     <int>
## 1 Native agro-interface INES-03      15
## 2 Native agro-interface INES-08      27
## 3 Native agro-interface LENA         19
## 4 Native agro-interface PIED         21
## 5 Native urban          BCAR         13
## 6 Native urban          CEME          6
## 7 Native urban          FAGR          7
## 8 Native urban          GOLF         22

Which invasive range sites here had more than 27 calls? All sites.

habitat_df %>%
  filter(range == "Invasive") %>%
  group_by(habitat, site) %>%
  dplyr::summarise(
    n_calls = n()
  )
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## # A tibble: 4 x 3
## # Groups:   habitat [1]
##   habitat        site  n_calls
##   <chr>          <chr>   <int>
## 1 Invasive urban ELEM       61
## 2 Invasive urban INTR       82
## 3 Invasive urban MART       50
## 4 Invasive urban SOCC       93

20 calls were randomly sampled from each of these invasive range sites, so that sample sizes would be more balanced for effect size calculations across habitat types.

set.seed(seed)

habitat_dfr <- habitat_df %>%
  filter(site %in% iu_sites) %>%
  droplevels() %>%
  # Random sampling by site
  group_by(site) %>%
  nest() %>%
  ungroup() %>%
  dplyr::mutate(
    rsamp_calls = purrr::map2(data, 20, sample_n, replace = FALSE)
  ) %>%
  dplyr::select(-data) %>%
  unnest(rsamp_calls) %>%
  # Add back the other two habitat categories
  bind_rows(
    habitat_df %>%
      filter(!site %in% iu_sites) %>%
      droplevels()
  )

# Check sample sizes by site, looks good
habitat_dfr %>%
  group_by(habitat, site) %>%
  dplyr::summarise(
    n_calls = n()
  )
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## # A tibble: 12 x 3
## # Groups:   habitat [3]
##    habitat               site    n_calls
##    <chr>                 <chr>     <int>
##  1 Invasive urban        ELEM         20
##  2 Invasive urban        INTR         20
##  3 Invasive urban        MART         20
##  4 Invasive urban        SOCC         20
##  5 Native agro-interface INES-03      15
##  6 Native agro-interface INES-08      27
##  7 Native agro-interface LENA         19
##  8 Native agro-interface PIED         21
##  9 Native urban          BCAR         13
## 10 Native urban          CEME          6
## 11 Native urban          FAGR          7
## 12 Native urban          GOLF         22
# Median and range of calls per site 
habitat_dfr %>%
  group_by(site) %>%
  dplyr::summarise(
    n_calls = n()
  ) %>%
  dplyr::summarise(
    median_calls = median(n_calls),
    range_calls = paste(range(n_calls), collapse = " - ")
  )
## # A tibble: 1 x 2
##   median_calls range_calls
##          <dbl> <chr>      
## 1           20 6 - 27
# Check sample sizes by habitat, looks good
habitat_dfr %>%
  group_by(habitat) %>%
  dplyr::summarise(
    n_calls = n()
  )
## # A tibble: 3 x 2
##   habitat               n_calls
##   <chr>                   <int>
## 1 Invasive urban             80
## 2 Native agro-interface      82
## 3 Native urban               48

Principal components analysis was performed to summarize acoustic parameters into linear principal components during feature extraction for supervised machine learning. Using previous PCA results so as to evaluate habitat differences in the same acoustic space as the full dataset of calls between ranges.

# Image features were obtained for a complementary analysis, not used here
# Obtained just principal components representing the standard set of acoustic parameters above
pca_feats <- read.csv(file.path(path, "supervised_RF_datasets_acousticSimilarity.csv")) %>%
  dplyr::select(-c(names(.)[grep("rf_set|img_params|SPCC|DTW|cep", names(.))]))
dim(pca_feats)
## [1] 1596   39
glimpse(pca_feats)
## Rows: 1,596
## Columns: 39
## $ sound.files       <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017_07…
## $ date              <chr> "2017-07-17", "2017-07-17", "2017-07-17", "2017-07-1…
## $ range             <chr> "Native", "Native", "Native", "Native", "Native", "N…
## $ social_scale      <chr> "Individual", "Individual", "Individual", "Individua…
## $ site              <chr> "EMBR", "EMBR", "EMBR", "EMBR", "EMBR", "EMBR", "EMB…
## $ Bird_ID           <chr> "NAT-RAW", "NAT-RAW", "NAT-RAW", "NAT-RAW", "NAT-ZW8…
## $ region            <chr> "West", "West", "West", "West", "West", "West", "Wes…
## $ country           <chr> "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Uruguay…
## $ site_year         <chr> "EMBR_2017", "EMBR_2017", "EMBR_2017", "EMBR_2017", …
## $ year              <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
## $ dept_state        <chr> "Colonia", "Colonia", "Colonia", "Colonia", "Colonia…
## $ invasive_city     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ acs_params_PCA_1  <dbl> 5.9885332, 2.0683667, 3.9787413, 4.3710869, -0.43713…
## $ acs_params_PCA_2  <dbl> -1.63804315, 0.03891284, -1.00282905, -2.07614285, -…
## $ acs_params_PCA_3  <dbl> 2.03156634, 2.47540887, 2.43952602, 0.31692257, -0.4…
## $ acs_params_PCA_4  <dbl> -0.16409658, 1.95677637, -1.51760292, -0.67394465, -…
## $ acs_params_PCA_5  <dbl> 0.68805443, -1.23680222, -1.62094013, -0.01851360, -…
## $ acs_params_PCA_6  <dbl> 1.01881183, -1.39264059, -1.52794591, 1.71705068, 1.…
## $ acs_params_PCA_7  <dbl> 0.96469499, -0.57166081, 0.02975126, 1.09039057, -1.…
## $ acs_params_PCA_8  <dbl> -0.67114660, 0.81857321, -0.45827139, -0.81428595, -…
## $ acs_params_PCA_9  <dbl> 0.33815298, -0.61494962, 0.26448579, 0.52574510, -0.…
## $ acs_params_PCA_10 <dbl> -0.07929202, 0.80063497, 0.06309654, -0.24800356, 0.…
## $ acs_params_PCA_11 <dbl> 1.1576928, 1.4581693, 0.6292717, 0.3026599, -1.27259…
## $ acs_params_PCA_12 <dbl> -0.0001338046, 0.5145824323, 0.7239512929, 0.2639849…
## $ acs_params_PCA_13 <dbl> 0.240945995, -0.460497838, -0.733393218, -0.88481842…
## $ acs_params_PCA_14 <dbl> -0.189424566, 0.503279858, 0.125130993, 0.550569621,…
## $ acs_params_PCA_15 <dbl> 0.61256426, 0.42286280, 0.65426949, 1.00034595, -0.2…
## $ acs_params_PCA_16 <dbl> -0.06222089, -0.27997128, -0.20473404, 0.07761578, -…
## $ acs_params_PCA_17 <dbl> 0.11750926, 0.38298396, -0.40083812, -0.28566724, 0.…
## $ acs_params_PCA_18 <dbl> 0.132192321, 0.072859437, 0.322546799, -0.120827960,…
## $ acs_params_PCA_19 <dbl> 0.180281107, -0.151500881, -0.038306170, 0.230016806…
## $ acs_params_PCA_20 <dbl> 0.174729821, -0.070735230, -0.461659616, -0.05690037…
## $ acs_params_PCA_21 <dbl> -0.096562879, -0.043733452, -0.035746983, -0.1414701…
## $ acs_params_PCA_22 <dbl> 2.217414e-01, 2.627062e-01, 2.135996e-01, 6.690039e-…
## $ acs_params_PCA_23 <dbl> -0.601176618, 0.220473691, -0.246015159, 0.053751392…
## $ acs_params_PCA_24 <dbl> 0.02887520, -0.13341300, 0.06532388, 0.05138502, 0.0…
## $ acs_params_PCA_25 <dbl> 0.023057755, 0.020047637, 0.075901896, 0.067463045, …
## $ acs_params_PCA_26 <dbl> -1.049769e-03, 1.049868e-03, -1.696452e-05, -9.01268…
## $ acs_params_PCA_27 <dbl> 1.776357e-15, 2.131628e-14, 1.154632e-14, 1.021405e-…
# Write this filtered .csv out for data sharing. Use the following .csv to reproduce most results below
# write.csv(pca_feats, file.path(path, "PCA_spectral_acoustic_measurements.csv"), row.names = FALSE)

# Loadings on the first principal component (PC)?
pp_pca <- readRDS(file.path(path, "PCA_results_Site_acs_params.RDS"))
str(pp_pca)
## List of 5
##  $ sdev    : num [1:27] 2.49 2.4 2 1.62 1.35 ...
##  $ rotation: num [1:27, 1:27] 0.2036 -0.1723 -0.3044 -0.1423 -0.0518 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:27] "duration" "meanfreq" "sd" "freq.median" ...
##   .. ..$ : chr [1:27] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : logi FALSE
##  $ scale   : logi FALSE
##  $ x       : num [1:1367, 1:27] 5.347 0.991 6.38 5.825 2.353 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:1367] "98" "99" "100" "101" ...
##   .. ..$ : chr [1:27] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"
# Acoustic parameters that loaded onto the first 2 PCs
pp_pca$rotation[, "PC1"][order(abs(pp_pca$rotation[, "PC1"]), decreasing = TRUE)]
##      entropy          sfm       sp.ent           sd     freq.IQR     freq.Q75 
## -0.365155312 -0.337332119 -0.324184933 -0.304423904 -0.267466233 -0.262524879 
##     time.Q75     time.IQR  time.median     duration      dfrange     time.ent 
##  0.224571825  0.209819382  0.204904018  0.203649858 -0.192297488 -0.190185831 
##     meanfreq     time.Q25       maxdom  freq.median      modindx       mindom 
## -0.172324538  0.161829425 -0.150381998 -0.142325545  0.136822931  0.115786377 
##         skew       enddom         kurt      meandom     startdom     freq.Q25 
##  0.102608743  0.087996940  0.067925934 -0.054672043  0.053189174 -0.051805943 
##        peakf      dfslope    meanpeakf 
##  0.027830773  0.022599338 -0.004556612
pp_pca$rotation[, "PC2"][order(abs(pp_pca$rotation[, "PC2"]), decreasing = TRUE)]
##     meanfreq  freq.median      meandom     freq.Q25     freq.Q75     time.Q75 
## -0.336649960 -0.329303473 -0.305010812 -0.298926477 -0.275884364 -0.252896629 
##     time.ent     duration  time.median     time.IQR    meanpeakf     time.Q25 
##  0.252335920 -0.248283339 -0.234116340 -0.222477619 -0.216000675 -0.204539705 
##       maxdom        peakf       enddom     startdom       mindom       sp.ent 
## -0.194459229 -0.166982562 -0.126757674 -0.113831591 -0.107940427 -0.107756233 
##          sfm      modindx      dfrange     freq.IQR      entropy         kurt 
## -0.091160526 -0.085606731 -0.048086839 -0.038564164 -0.009743798 -0.007963251 
##           sd         skew      dfslope 
## -0.005504976 -0.005296572  0.003268564
# PCs 1 and 2 explained 44.3% of variation
summary(pp_pca)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.4940 2.3950 2.0029 1.61618 1.34666 1.29938 1.07556
## Proportion of Variance 0.2304 0.2124 0.1486 0.09674 0.06717 0.06253 0.04285
## Cumulative Proportion  0.2304 0.4428 0.5914 0.68812 0.75529 0.81782 0.86067
##                            PC8     PC9    PC10   PC11   PC12    PC13   PC14
## Standard deviation     0.86061 0.82710 0.71317 0.6795 0.5693 0.51315 0.4618
## Proportion of Variance 0.02743 0.02534 0.01884 0.0171 0.0120 0.00975 0.0079
## Cumulative Proportion  0.88810 0.91344 0.93227 0.9494 0.9614 0.97113 0.9790
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.40283 0.31672 0.31417 0.20423 0.19647 0.18151 0.17154
## Proportion of Variance 0.00601 0.00372 0.00366 0.00154 0.00143 0.00122 0.00109
## Cumulative Proportion  0.98504 0.98876 0.99241 0.99396 0.99539 0.99661 0.99770
##                          PC22    PC23    PC24    PC25     PC26      PC27
## Standard deviation     0.1642 0.13648 0.09868 0.08307 0.002677 1.532e-14
## Proportion of Variance 0.0010 0.00069 0.00036 0.00026 0.000000 0.000e+00
## Cumulative Proportion  0.9987 0.99938 0.99974 1.00000 1.000000 1.000e+00
# Join with the habitat data frame
habitat_pca <- habitat_dfr %>%
  dplyr::select(habitat, sound.files, site, year, range) %>%
  inner_join(
    pca_feats %>%
      dplyr::select(sound.files, names(.)[grep("acs_params", names(.))]),
    by = "sound.files"
    )
# glimpse(habitat_pca)

Figure A3

Acoustic space was visualized with the first two principal components to assess separation of calls in acoustic space by habitat type.

I parameterized density kernels similarly to the random forests main visual, in which contours delineated bins of 1/10th of the total density.

gg_df <- habitat_pca %>%
  dplyr::mutate(
    habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
  ) %>%
  dplyr::rename(
    `X` = "acs_params_PCA_1",
    `Y` = "acs_params_PCA_2"
  ) %>%
  dplyr::select(-c(names(.)[grep("acs_params_PCA_", names(.))]))

h <- 5 # bandwidth (kernel density estimator or KDE width)

# Get density values and bin them to check out bins used by geom_density2d

# Native agro-interface
nat_nu_dens <- kde2d(
  x = gg_df %>%
      filter(habitat == "Native agro-interface") %>%
      pull(X),
  y = gg_df %>%
      filter(habitat == "Native agro-interface") %>%
      pull(Y),
  h = h
)

str(nat_nu_dens)
## List of 3
##  $ x: num [1:25] -2.475 -2.074 -1.673 -1.272 -0.872 ...
##  $ y: num [1:25] -3.58 -3.31 -3.04 -2.77 -2.5 ...
##  $ z: num [1:25, 1:25] 0.00388 0.00528 0.00672 0.0081 0.00934 ...
# Native urban
nat_u_dens <- kde2d(
  x = gg_df %>%
      filter(habitat == "Native urban") %>%
      pull(X),
  y = gg_df %>%
      filter(habitat == "Native urban") %>%
      pull(Y),
  h = h
)

str(nat_u_dens)
## List of 3
##  $ x: num [1:25] -5.11 -4.69 -4.27 -3.85 -3.43 ...
##  $ y: num [1:25] -3.08 -2.77 -2.46 -2.15 -1.84 ...
##  $ z: num [1:25, 1:25] 0.000426 0.000694 0.001104 0.001698 0.002491 ...
# Inasive urban
inv_u_dens <- kde2d(
  x = gg_df %>%
      filter(habitat == "Invasive urban") %>%
      pull(X),
  y = gg_df %>%
      filter(habitat == "Invasive urban") %>%
      pull(Y),
  h = h
)

str(inv_u_dens)
## List of 3
##  $ x: num [1:25] -4.78 -4.41 -4.04 -3.67 -3.29 ...
##  $ y: num [1:25] -3.18 -2.83 -2.49 -2.14 -1.8 ...
##  $ z: num [1:25, 1:25] 0.00471 0.00561 0.0064 0.00706 0.00755 ...
range(nat_nu_dens)
## [1] -3.580786  7.143172
range(nat_u_dens)
## [1] -5.111697  4.964518
range(inv_u_dens)
## [1] -4.782957  5.109649
bw_nat_nu <- max(nat_nu_dens$z)/10
bw_nat_u <- max(nat_u_dens$z)/10
bw_inv_u <- max(inv_u_dens$z)/10

bw_nat_nu
## [1] 0.003090135
bw_nat_u
## [1] 0.003293136
bw_inv_u
## [1] 0.003497231
brks_nat_nu <- seq(min(nat_nu_dens$z), max(nat_nu_dens$z), bw_nat_nu)
brks_nat_u <- seq(min(nat_u_dens$z), max(nat_u_dens$z), bw_nat_u)
brks_inv_u <- seq(min(inv_u_dens$z), max(inv_u_dens$z), bw_inv_u)

# 10 bins each, looks good
length(brks_nat_nu)
## [1] 10
length(brks_nat_u)
## [1] 10
length(brks_inv_u)
## [1] 10
levels(gg_df$habitat)
## [1] "Native agro-interface" "Native urban"          "Invasive urban"
cols <- scales::alpha(c("navy", "dodgerblue", "orange"), 0.65)
shps <- c(21, 24, 25)

gg <- ggplot(data = gg_df, aes(x = X, y = Y, fill = habitat, color = habitat)) +
  geom_density2d(
    data = gg_df %>%
      filter(habitat == "Native agro-interface"),
    size = 0.65,
    binwidth = bw_nat_nu,
    h = h,
    color = cols[1]
    ) +
  geom_density2d(
    data = gg_df %>%
      filter(habitat == "Native urban"),
    size = 0.65,
    binwidth = bw_nat_u,
    h = h,
    color = cols[2]
    ) +
  geom_density2d(
    data = gg_df %>%
      filter(habitat == "Invasive urban"),
    size = 0.65,
    binwidth = bw_inv_u,
    h = h,
    color = cols[3]
    ) +
  xlab("Principal Component 1") + 
  ylab("Principal Component 2") + 
  theme_bw() +
  theme(
    legend.position = "top",
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.ticks = element_line(size = 0.35),
    legend.margin = margin(0, 0, 0, 0),
    legend.box.margin = margin(-5, -5, -5, -5),
    legend.key.width = unit(3, "lines")
  )

# Get the legend
gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
  gg_df %>%
    ggplot(aes(x = X, y = Y, color = habitat)) +
    geom_line() +
    scale_color_manual(values = cols) +
    guides(color = guide_legend(title = "", override.aes = list(size = 2))) +
    theme_bw() +
    theme(
      legend.text = element_text(size = 12),
      legend.position = "top",
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-10, -10, -10, -10),
      legend.key.width = unit(2, "lines")
    )
)), "guide-box")

dev.off()
## null device 
##           1
# Arrange the legend and plots into a single image file
# 1.5 column, 140mm or 5.51 in
# jpeg(file.path(gpath, "SimplerSigs_HabitatPCA.jpeg"), units = "in", width = 5.51, height = 4, res = 600)
# ggarrange(
#   as.ggplot(gg_leg),
#   as.ggplot(gg),
#   nrow = 2,
#   heights = c(1, 9)
# )
# dev.off()

See the appendix for this figure. The centers of the density contours for the native and invasive range urban calls were closer together compared to the native range agro-interface calls, indicating structural similarities by urban habitat.

Assessing the effect of habitat on standard acoustic parameters

Which acoustic measurements were the most different among calls in each habitat? Using the same calls selected for validation as above.

Get the standard set of 27 spectral acoustic measurements measured for supervised machine learning, merge with these calls. 210 calls total.

acs_params_habitat <- habitat_dfr %>%
  dplyr::select(habitat, sound.files, site, year, range) %>%
  inner_join(
    read.csv(file.path(path, "acoustic_parameters.csv")) %>%
      dplyr::select(-c(selec)),
    by = "sound.files"
    )
glimpse(acs_params_habitat)
## Rows: 210
## Columns: 32
## $ habitat     <chr> "Invasive urban", "Invasive urban", "Invasive urban", "Inv…
## $ sound.files <chr> "2019_08_06_ELEM_GSV1036_3.WAV", "2019_08_06_ELEM_1039_11.…
## $ site        <chr> "ELEM", "ELEM", "ELEM", "ELEM", "ELEM", "ELEM", "ELEM", "E…
## $ year        <chr> "2019", "2019", "2019", "2019", "2019", "2019", "2019", "2…
## $ range       <fct> Invasive, Invasive, Invasive, Invasive, Invasive, Invasive…
## $ duration    <dbl> 0.1464288, 0.1722485, 0.1812131, 0.2025061, 0.1666182, 0.1…
## $ meanfreq    <dbl> 4.156489, 3.439495, 4.333800, 3.553249, 4.178555, 4.105074…
## $ sd          <dbl> 1.951559, 1.603174, 2.008342, 1.815870, 1.939180, 1.823958…
## $ freq.median <dbl> 4.148714, 3.350751, 4.004870, 3.334980, 3.909605, 4.013333…
## $ freq.Q25    <dbl> 2.870981, 2.299863, 2.829221, 2.248402, 2.769068, 2.931515…
## $ freq.Q75    <dbl> 5.173633, 4.366803, 5.771104, 4.727775, 5.452331, 5.007576…
## $ freq.IQR    <dbl> 2.302653, 2.066940, 2.941883, 2.479372, 2.683263, 2.076061…
## $ time.median <dbl> 0.07048512, 0.07344039, 0.07754854, 0.08603456, 0.07426526…
## $ time.Q25    <dbl> 0.04302338, 0.04261356, 0.04057772, 0.04929063, 0.04618937…
## $ time.Q75    <dbl> 0.10069303, 0.11242726, 0.11812626, 0.13711757, 0.10596384…
## $ time.IQR    <dbl> 0.05766964, 0.06981370, 0.07754854, 0.08782694, 0.05977448…
## $ skew        <dbl> 2.229691, 1.618626, 1.970044, 1.869156, 2.315089, 2.137946…
## $ kurt        <dbl> 11.094742, 6.036978, 9.638473, 9.229402, 9.537073, 8.52949…
## $ sp.ent      <dbl> 0.9574609, 0.9344177, 0.9538707, 0.9484680, 0.9403040, 0.9…
## $ time.ent    <dbl> 0.8951390, 0.8886264, 0.8870844, 0.8838866, 0.8893543, 0.8…
## $ entropy     <dbl> 0.8570606, 0.8303483, 0.8461638, 0.8383381, 0.8362634, 0.8…
## $ sfm         <dbl> 0.7158050, 0.5089952, 0.6766197, 0.6068520, 0.6179022, 0.6…
## $ meandom     <dbl> 3.556159, 3.034860, 3.395232, 2.669714, 3.100140, 3.478226…
## $ mindom      <dbl> 0.5250000, 0.5250000, 0.5250000, 0.5250000, 0.5250000, 0.5…
## $ maxdom      <dbl> 5.191667, 4.608333, 6.358333, 4.258333, 6.125000, 5.191667…
## $ dfrange     <dbl> 4.666667, 4.083333, 5.833333, 3.733333, 5.600000, 4.666667…
## $ modindx     <dbl> 5.375000, 9.914286, 7.600000, 8.562500, 9.166667, 9.700000…
## $ startdom    <dbl> 0.5250000, 0.5250000, 2.3916667, 0.9916667, 0.5250000, 0.5…
## $ enddom      <dbl> 0.6416667, 1.5750000, 0.5250000, 0.7583333, 0.5250000, 0.7…
## $ dfslope     <dbl> 0.7967466, 6.0958446, -10.3009461, -1.1522285, 0.0000000, …
## $ meanpeakf   <dbl> 4.4916667, 3.6750000, 4.0250000, 2.5083333, 4.0250000, 4.1…
## $ peakf       <dbl> 4.43116582, 2.84441227, 3.98902790, 0.33079937, 2.75437474…

I did not assess normality or homogeneity of variance per acoustic measurement per habitat because I decided to use Cliff’s delta (not Cohen’s d) for effect sizes for more direct comparison to effect sizes calculated between ranges above.

Calculate median and interquartile range per acoustic measurement per habitat.

acs_params_habitat_ss <- acs_params_habitat %>%
  dplyr::select(-c(range, site, year)) %>%
  dplyr::mutate(
    habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
  ) %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^habitat$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  group_by(measurement, habitat) %>%
  dplyr::summarise(
    median = round(median(values), 2),
    IQR = round(IQR(values), 2)
  ) %>%
  # Convert to long format again
  pivot_longer(
    cols = c(median, IQR),
    names_to = "statistic",
    values_to = "values"
  ) %>%
  # Make wider to split out the mean and se columns
  pivot_wider(
    names_from = c(statistic, habitat),
    values_from = values
  ) %>%
  ungroup()
## `summarise()` has grouped output by 'measurement'. You can override using the `.groups` argument.
acs_params_habitat_ss %>%
  kable()
measurement median_Native agro-interface IQR_Native agro-interface median_Native urban IQR_Native urban median_Invasive urban IQR_Invasive urban
dfrange 3.97 1.75 3.97 0.73 4.55 0.38
dfslope 1.91 16.01 2.06 16.76 0.00 3.19
duration 0.18 0.02 0.16 0.02 0.17 0.01
enddom 3.15 2.19 3.03 2.89 0.76 0.70
entropy 0.83 0.01 0.84 0.02 0.84 0.01
freq.IQR 1.97 0.39 2.13 0.37 2.22 0.30
freq.median 3.96 0.35 3.91 0.38 3.94 0.30
freq.Q25 2.91 0.33 2.81 0.29 2.75 0.39
freq.Q75 4.95 0.47 4.88 0.43 4.99 0.48
kurt 7.56 2.42 7.95 4.04 7.98 2.90
maxdom 4.96 0.47 4.84 0.47 5.08 0.35
meandom 3.51 0.31 3.47 0.37 3.48 0.42
meanfreq 4.10 0.35 4.02 0.34 4.04 0.32
meanpeakf 4.03 0.67 4.03 1.52 4.14 1.28
mindom 0.88 1.75 0.76 1.08 0.52 0.03
modindx 10.73 5.35 9.19 2.19 8.98 2.10
peakf 3.30 1.22 2.81 1.30 3.46 1.53
sd 1.66 0.13 1.68 0.16 1.83 0.16
sfm 0.57 0.06 0.60 0.09 0.65 0.07
skew 1.97 0.39 2.02 0.63 2.00 0.42
sp.ent 0.94 0.01 0.94 0.02 0.95 0.01
startdom 2.51 2.89 1.40 2.48 0.88 0.82
time.ent 0.89 0.00 0.89 0.00 0.89 0.00
time.IQR 0.08 0.01 0.07 0.01 0.07 0.01
time.median 0.08 0.01 0.08 0.01 0.08 0.01
time.Q25 0.05 0.01 0.04 0.01 0.04 0.01
time.Q75 0.13 0.01 0.11 0.01 0.11 0.01

Calculate Cliff’s delta as the effect size of range per acoustic measurement. Filtered by native urban and invasive urban calls to assess effect of range given the same habitat, then by native agro-interface and native urban to assess the effect of habitat.

var_nms <- names(acs_params_habitat)[-grep("^habitat$|^range$|^sound.files$|^site$|^year$", names(acs_params_habitat))]
var_nms

# i <- 1
eff_resh_df <- rbindlist(pblapply(1:length(var_nms), function(i){
  
  # Calculate effect of range
  
  tmp_df <- acs_params_habitat %>%
    dplyr::select(habitat, var_nms[i])
  
  nat <- tmp_df %>%
    filter(habitat == "Native urban") %>%
    pull(var_nms[i])
  
  inv <- tmp_df %>%
    filter(habitat == "Invasive urban") %>%
    pull(var_nms[i])
  
  # Get Cliff's delta for independent groups (all values of y compared to all values of x)
  # orddom package v 3.1
  cliff_delta <- dmes(nat, inv)$dc
  
  # Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
  # Using 10000 bootstrap iterations
  ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
  ci_res
  
  lower_95_CI <- ci_res$theta.bci.lo
  upper_95_CI <- ci_res$theta.bci.up
  
  range_df <- data.frame(
    measurement = var_nms[i],
    type = "Range",
    cliff_delta = cliff_delta,
    lower_95_CI = lower_95_CI,
    upper_95_CI = upper_95_CI
  )
  
  # Calculate effect of habitat
  nat_agr <- tmp_df %>%
    filter(habitat == "Native agro-interface") %>%
    pull(var_nms[i])
  
  nat_urb <- tmp_df %>%
    filter(habitat == "Native urban") %>%
    pull(var_nms[i])
  
  # Get Cliff's delta for independent groups (all values of y compared to all values of x)
  # orddom package v 3.1
  cliff_delta <- dmes(nat_agr, nat_urb)$dc
  
  # Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
  # Using 10000 bootstrap iterations
  ci_res <- dmes.boot(nat_agr, nat_urb, theta.es = "dc", ci.meth = "BCA", B = 10000)
  ci_res
  
  lower_95_CI <- ci_res$theta.bci.lo
  upper_95_CI <- ci_res$theta.bci.up
  
  habitat_df <- data.frame(
    measurement = var_nms[i],
    type = "Habitat",
    cliff_delta = cliff_delta,
    lower_95_CI = lower_95_CI,
    upper_95_CI = upper_95_CI
  )
  
  return(
    range_df %>%
           bind_rows(habitat_df)
  )
  
}))

eff_resh_df

write.csv(eff_resh_df, file.path(path, "cliffs_delta_bootstrappedCIs_habitat.csv"), row.names = FALSE)

Effect sizes ordered by absolute magnitude.

eff_resh_df <- read.csv(file.path(path, "cliffs_delta_bootstrappedCIs_habitat.csv"))
glimpse(eff_resh_df)
## Rows: 54
## Columns: 5
## $ measurement <chr> "duration", "duration", "meanfreq", "meanfreq", "sd", "sd"…
## $ type        <chr> "Range", "Habitat", "Range", "Habitat", "Range", "Habitat"…
## $ cliff_delta <dbl> 0.2505208333, -0.6570121951, 0.0682291667, -0.2012195122, …
## $ lower_95_CI <dbl> 0.031770833, -0.792682927, -0.135416667, -0.397865854, 0.4…
## $ upper_95_CI <dbl> 0.43906250, -0.46697154, 0.26510417, 0.01270325, 0.7322916…
# Make a table for appendix, in order of decreasing largest absolute effect size within comparisons
eff_resh_df %>%
  dplyr::mutate(
    effect_size = round(cliff_delta, 2),
    lower_95_CI = round(lower_95_CI, 2),
    upper_95_CI = round(upper_95_CI, 2),
    `95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
  ) %>%
  # Arrange by decreasing effect size within habitat type
  arrange(type, desc(abs(effect_size))) %>%
  dplyr::select(type, measurement, effect_size, `95_CI`) %>%
  kable()
type measurement effect_size 95_CI
Habitat duration -0.66 (-0.79, -0.47)
Habitat time.ent 0.66 (0.47, 0.8)
Habitat time.Q75 -0.64 (-0.78, -0.45)
Habitat time.IQR -0.60 (-0.75, -0.41)
Habitat time.median -0.48 (-0.64, -0.28)
Habitat time.Q25 -0.44 (-0.61, -0.25)
Habitat modindx -0.31 (-0.49, -0.12)
Habitat entropy 0.30 (0.09, 0.49)
Habitat freq.Q25 -0.29 (-0.48, -0.09)
Habitat sfm 0.23 (0, 0.42)
Habitat freq.IQR 0.21 (0.01, 0.41)
Habitat meanfreq -0.20 (-0.4, 0.01)
Habitat sd 0.15 (-0.06, 0.35)
Habitat meandom -0.15 (-0.35, 0.06)
Habitat peakf -0.15 (-0.35, 0.06)
Habitat maxdom -0.12 (-0.32, 0.09)
Habitat freq.median -0.10 (-0.3, 0.11)
Habitat sp.ent 0.09 (-0.13, 0.3)
Habitat freq.Q75 -0.07 (-0.27, 0.15)
Habitat dfrange 0.05 (-0.15, 0.25)
Habitat dfslope 0.05 (-0.17, 0.26)
Habitat mindom -0.04 (-0.24, 0.16)
Habitat startdom 0.02 (-0.19, 0.21)
Habitat enddom 0.02 (-0.2, 0.23)
Habitat skew -0.01 (-0.24, 0.21)
Habitat kurt -0.01 (-0.23, 0.21)
Habitat meanpeakf -0.01 (-0.23, 0.2)
Range sd 0.60 (0.43, 0.73)
Range dfrange 0.57 (0.37, 0.72)
Range mindom -0.55 (-0.7, -0.38)
Range enddom -0.51 (-0.68, -0.29)
Range sp.ent 0.48 (0.28, 0.63)
Range sfm 0.45 (0.26, 0.61)
Range entropy 0.40 (0.2, 0.57)
Range startdom -0.34 (-0.52, -0.13)
Range freq.IQR 0.32 (0.12, 0.5)
Range maxdom 0.31 (0.1, 0.48)
Range time.ent -0.26 (-0.45, -0.04)
Range duration 0.25 (0.03, 0.44)
Range freq.Q75 0.20 (0, 0.39)
Range dfslope -0.16 (-0.39, 0.07)
Range skew -0.14 (-0.34, 0.07)
Range meanfreq 0.07 (-0.14, 0.27)
Range kurt -0.07 (-0.28, 0.15)
Range freq.Q25 -0.06 (-0.26, 0.13)
Range time.Q25 0.06 (-0.16, 0.26)
Range peakf 0.06 (-0.14, 0.27)
Range modindx -0.05 (-0.25, 0.16)
Range freq.median 0.04 (-0.16, 0.24)
Range time.Q75 0.04 (-0.18, 0.24)
Range meanpeakf 0.04 (-0.16, 0.24)
Range meandom -0.03 (-0.22, 0.17)
Range time.IQR 0.01 (-0.2, 0.22)
Range time.median 0.00 (-0.22, 0.21)
# Note that spectral flatness has a 95% CI that includes 0 

# Acoustic measurements with large effect sizes? In either direction
eff_reshL_df <- eff_resh_df %>%
  # Keep the top largest effect sizes per type
  # Using a rule-of-thumb for Cliff's delta reported in the literature
  filter(abs(cliff_delta) >= 0.474) %>%
  dplyr::mutate(
    effect_size = round(cliff_delta, 2),
    lower_95_CI = round(lower_95_CI, 2),
    upper_95_CI = round(upper_95_CI, 2),
    `95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
  ) %>%
  arrange(type, desc(abs(effect_size))) %>%
  dplyr::select(type, measurement, effect_size, `95_CI`)

eff_reshL_df
##       type measurement effect_size          95_CI
## 1  Habitat    duration       -0.66 (-0.79, -0.47)
## 2  Habitat    time.ent        0.66    (0.47, 0.8)
## 3  Habitat    time.Q75       -0.64 (-0.78, -0.45)
## 4  Habitat    time.IQR       -0.60 (-0.75, -0.41)
## 5  Habitat time.median       -0.48 (-0.64, -0.28)
## 6    Range          sd        0.60   (0.43, 0.73)
## 7    Range     dfrange        0.57   (0.37, 0.72)
## 8    Range      mindom       -0.55  (-0.7, -0.38)
## 9    Range      enddom       -0.51 (-0.68, -0.29)
## 10   Range      sp.ent        0.48   (0.28, 0.63)
topms <- eff_reshL_df %>%
  pull(measurement) %>%
  as.character()
topms
##  [1] "duration"    "time.ent"    "time.Q75"    "time.IQR"    "time.median"
##  [6] "sd"          "dfrange"     "mindom"      "enddom"      "sp.ent"

The largest effect sizes of habitat (native agro-interface versus native urban) were all temporal parameters. These effect sizes were statistcically significant (95% CIs that did not cross zero).

The largest effect sizes of range (native urban versus invasive urban) were related to frequency. These effect sizes also had 95% CIs that did not cross zero.

Effect of habitat on frequency modulation measurements

And are frequency modulation patterns actually simpler (e.g. more similar to invasive range) for urban native range calls in the frequency modulation dataset? 2 sites, only 8 calls, 1/5 of the native range dataset used for frequency modulation. I retained 2 native range agro-interface and 2 invasive urban sites for these comparisons. This analysis was done separately from other acoustic measurements because a different set of calls was employed.

Using the frequency modulation measurements obtained above.

# peaks_spat_df %>%
  # filter(year == "2019") %>%
  # distinct(site)

nat_sites <- c("PIED", "INES-08", "FAGR", "GOLF")
# inv_sites <- c("SOCC", "INTR") # for FM used 2004, 2011 calls
inv_sites <- c("AIRP", "MANO") # 2019

# Get calls for these sites and add a habitat column
peaks_hab_df <- peaks_spat_df %>%
  filter(site %in% c(nat_sites, inv_sites)) %>%
  # Add a habitat column
  dplyr::mutate(
    habitat = ifelse(range == "Invasive", "Invasive urban", "Native agro-interface"),
    habitat = ifelse(range == "Native" & site %in% c("FAGR", "GOLF"), "Native urban", habitat)
  ) %>%
  # distinct(site, habitat)
  # Ensure levels of the range and habitat columns are in the expected order
  dplyr::mutate(
    range = factor(range, levels = c("Native", "Invasive")),
    habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
  )

glimpse(peaks_hab_df)
## Rows: 125
## Columns: 16
## $ sound.files      <fct> 2017_10_25_PIED_1563_2.WAV, 2017_10_25_PIED_1563_2.WA…
## $ range            <fct> Native, Native, Native, Native, Native, Native, Nativ…
## $ site             <fct> PIED, PIED, PIED, PIED, PIED, PIED, PIED, PIED, PIED,…
## $ year             <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,…
## $ site_year        <fct> PIED_2017, PIED_2017, PIED_2017, PIED_2017, PIED_2017…
## $ region           <fct> West, West, West, West, West, West, West, West, West,…
## $ dept_state       <fct> Colonia, Colonia, Colonia, Colonia, Colonia, Colonia,…
## $ invasive_city    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ peak_number      <dbl> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 1, 3, 4, 5,…
## $ peak_height      <dbl> 2.801363, 4.615108, 4.977125, 5.180768, 4.928268, 4.4…
## $ peak_max_index   <dbl> 28, 87, 161, 240, 308, 369, 24, 71, 118, 186, 236, 29…
## $ peak_start_index <dbl> 1, 39, 110, 199, 282, 348, 1, 40, 92, 161, 208, 277, …
## $ peak_end_index   <dbl> 39, 110, 199, 282, 348, 398, 40, 92, 137, 208, 277, 3…
## $ peak_trgh_slope  <dbl> -0.05169762, -0.41990629, -0.32406322, -0.24725622, -…
## $ trgh_index       <dbl> 39, 110, 199, 282, 348, 398, 40, 92, 161, 208, 277, 3…
## $ habitat          <fct> Native agro-interface, Native agro-interface, Native …
peaks_hab_df %>%
  distinct(site, habitat, year)
##       site year               habitat
## 1:    PIED 2017 Native agro-interface
## 2:    FAGR 2017          Native urban
## 3: INES-08 2017 Native agro-interface
## 4:    GOLF 2017          Native urban
## 5:    AIRP 2019        Invasive urban
## 6:    MANO 2019        Invasive urban
# Get number of peaks, modulation rate, peak-trough slope
FM_hab_df <- peaks_hab_df %>%
  # Find the number of peaks per call
  group_by(habitat, sound.files) %>%
  dplyr::summarise(
     num_peaks = length(peak_number)
  ) %>%
  ungroup() %>%
  # Add modulation rate
  inner_join(
    peaks_hab_df %>%
      # Calculate duration per call
      inner_join(
        freq_mod_df %>%
          dplyr::mutate(
            duration = end - start
          ) %>%
          dplyr::select(sound.files, duration),
        by = "sound.files"
      ) %>%
      # Calculate modulation rate per call
      group_by(habitat, sound.files, duration) %>%
      dplyr::summarise(
        num_peaks = length(peak_number)
      ) %>%
      ungroup() %>%
      dplyr::mutate(
        modulation_rate = (num_peaks/duration) # units for rate of change = peaks/second
      ) %>%
      dplyr::select(sound.files, modulation_rate),
    by = "sound.files"
  ) %>%
  # Add peak-trough slope
  inner_join(
    peaks_hab_df %>%
    # Drop peaks that were not matched to troughs
    filter(!is.na(peak_trgh_slope)) %>%
    # Get max peak-trough range per call
    group_by(habitat, sound.files) %>%
    dplyr::summarise(
      max_peak_trgh_slope = min(peak_trgh_slope)
    ) %>%
    ungroup() %>%
      dplyr::select(sound.files, max_peak_trgh_slope),
    by = "sound.files"
  ) 
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'habitat', 'sound.files'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
# 24 calls
FM_hab_df
## # A tibble: 24 x 5
##    habitat       sound.files         num_peaks modulation_rate max_peak_trgh_sl…
##    <fct>         <chr>                   <int>           <dbl>             <dbl>
##  1 Native agro-… 2017_10_25_PIED_15…         6            33.6            -0.420
##  2 Native agro-… 2017_10_25_PIED_15…         8            47.8            -0.465
##  3 Native agro-… 2017_10_25_PIED_15…         6            31.0            -0.442
##  4 Native agro-… 2017_10_25_PIED_15…         7            38.7            -0.362
##  5 Native agro-… 2017_07_13_INES-08…         6            34.8            -0.247
##  6 Native agro-… 2017_07_13_INES-08…         6            35.6            -0.347
##  7 Native agro-… 2017_07_13_INES-08…         7            38.8            -0.392
##  8 Native agro-… 2017_07_13_INES-08…         6            33.0            -0.336
##  9 Native urban  2017_09_05_FAGR_13…         6            31.4            -0.467
## 10 Native urban  2017_09_05_FAGR_12…         4            23.8            -0.350
## # … with 14 more rows

Figure 4

A raincloud plot for the distributions of the 3 frequency modulation measurements per range-habitat.

FM_hab_df_tmp <- FM_hab_df %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^habitat$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  dplyr::mutate(
    measurement_nm = measurement,
    # Add new line symbols to all measurement names
    measurement_nm = recode(
      measurement_nm,
      `num_peaks` = "Number of\n peaks",
      `max_peak_trgh_slope` = "Peak - trough\n slope",
      `modulation_rate` = "Modulation\n rate",
    ),
    measurement_nm = factor(measurement_nm),
  ) 

FM_hab_df_tmp

topms <- c("num_peaks", "modulation_rate", "max_peak_trgh_slope")
cols <- scales::alpha(c("navy", "dodgerblue", "orange"), 0.85)

# Make a list of plots in order of effect sizes
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()

# i <- 1
invisible(pblapply(1:length(topms), function(i){
  
  tmp_df <- FM_hab_df_tmp %>%
    filter(measurement == topms[i]) %>%
    dplyr::mutate(
      measurement_nm = as.character(measurement_nm),
      measurement_nm = factor(measurement_nm, levels = unique(measurement_nm)),
      habitat = recode(
        habitat,
        `Native agro-interface` = "Native\n agro-interface",
        `Native urban` = "Native\n urban",
        `Invasive urban` = "Invasive\n urban"
      )
    ) 
  
  # Get ymin and ymax values, add a buffer of 2 evenly space values before and after 
  ymin <- min(tmp_df$values)
  ymax <- max(tmp_df$values)
  
  # Get the difference between max and min, use this as a buffer on the y-axis scale
  buf <- (ymax - ymin)/5
  
  # If on the first plot, add the y-axis label
  if(i == 1){
    yal <- "Values"
  } else{
    yal <- ""
  }
  
  # Initialize plot margins (top, right, bottom, left)
  tm <- 1
  rm <- 0
  bm <- 1
  lm <- 0
  
  # If on the first plot, make the left margin larger
  if(i == 1){
    lm <- 0.5
  # If on last plot, make the right margin larger
  } else if(i == length(topms)){
    rm <- 0.5
  }
  
  # Raincloud plot: Kernel smoothed with boxplot
  gg <- ggplot(tmp_df, aes(x = habitat, y = values, fill = habitat, color = habitat)) + 
    geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.25)+
    geom_point(position = position_jitter(width = 0.05), size = 2, stroke = 0.5, alpha = 0.45) +
    geom_boxplot(aes(x = as.numeric(habitat) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.2, colour = "black", size = 0.25) +
    facet_wrap(~ measurement_nm) +
    scale_fill_manual(values = cols) +
    scale_color_manual(values = cols) +
    guides(fill = FALSE, color = FALSE) +
    theme_bw() +
    xlab("") + ylab(yal) +
    scale_y_continuous(limits = round(c(ymin - buf, ymax + buf), 2), breaks = round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2)) +
    theme(
      axis.title = element_text(size = 11),
      axis.text.y = element_text(size = 11),
      # , angle = 35, hjust = 1
      axis.text.x = element_text(size = 9),
      strip.text = element_text(size = 12, margin = margin(0.5, 1, 0.5, 1, "lines")),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      axis.ticks = element_line(size = 0.25),
      plot.margin = unit(c(tm, rm, bm, lm), "lines")
    )
  
  # gg
  
  # Return the given plot
  gg_list[[i]] <<- gg
  
}))

# Get the legend
cols <- scales::alpha(c("navy", "dodgerblue", "orange"), 0.85)

gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
  FM_hab_df_tmp %>%
    ggplot(aes(x = habitat, y = values, color = habitat)) +
    geom_errorbar(aes(ymin = values, ymax = values), size = 2, width = 0.25) +
    scale_color_manual(values = cols) +
    guides(color = guide_legend(title = "", override.aes = list(size = 2))) +
    theme_bw() +
    theme(
      legend.text = element_text(size = 12),
      legend.position = "top",
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-10, -10, -10, -10),
      legend.key.width = unit(2, "lines")
    )
)), "guide-box")

dev.off()

# Arrange plots into a single image file
# Animal Behavior double column figure = 190 mm or 7.48 in
jpeg(file.path(gpath, "Figure4_FmMeasrmntsHabitats_final.jpeg"), units = "in", width = 7.48, height = 4, res = 1000)

ggarrange(
  as.ggplot(gg_leg),
  as.ggplot(ggarrange(
    as.ggplot(gg_list[[1]]),
    as.ggplot(gg_list[[2]]),
    as.ggplot(gg_list[[3]]),
    nrow = 1,
    widths = rep(3, 3)
  )),
  nrow = 2,
  heights = c(1, 10)
)

dev.off()

See the main text of the associated publication for this figure.

Calculated median and interquartile range per acoustic measurement per habitat.

FM_hab_df_ss <- FM_hab_df %>%
  dplyr::mutate(
    habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
  ) %>%
  # Convert to long format
  pivot_longer(
    cols = names(.)[-grep("^habitat$|^sound.files$", names(.))],
    names_to = "measurement",
    values_to = "values"
  ) %>%
  group_by(measurement, habitat) %>%
  dplyr::summarise(
    median = round(median(values), 2),
    IQR = round(IQR(values), 2)
  ) %>%
  # Convert to long format again
  pivot_longer(
    cols = c(median, IQR),
    names_to = "statistic",
    values_to = "values"
  ) %>%
  # Make wider to split out the mean and se columns
  pivot_wider(
    names_from = c(statistic, habitat),
    values_from = values
  ) %>%
  ungroup()
## `summarise()` has grouped output by 'measurement'. You can override using the `.groups` argument.
glimpse(FM_hab_df_ss)
## Rows: 3
## Columns: 7
## $ measurement                    <chr> "max_peak_trgh_slope", "modulation_rate…
## $ `median_Native agro-interface` <dbl> -0.38, 35.18, 6.00
## $ `IQR_Native agro-interface`    <dbl> 0.08, 5.27, 1.00
## $ `median_Native urban`          <dbl> -0.4, 30.3, 5.0
## $ `IQR_Native urban`             <dbl> 0.09, 4.71, 1.25
## $ `median_Invasive urban`        <dbl> -0.26, 25.11, 4.00
## $ `IQR_Invasive urban`           <dbl> 0.05, 6.48, 0.50
# FM_hab_df_ss %>%
  # View()

Calculate Cliff’s delta as the effect size of range and habitat per acoustic measurement. Filtered by native urban and invasive urban calls to assess effect of range given the same habitat, then by native agro-interface and native urban to assess the effect of habitat.

var_nms <- names(FM_hab_df)[-grep("^habitat$|^sound.files$", names(FM_hab_df))]
var_nms

# Got a t-test error when running i = 1 for num_peaks
eff_resh_df <- rbindlist(pblapply(2:length(var_nms), function(i){
  
  # cat(paste("i = ", i, sep = ""))
  
  # Calculate effect of range
  tmp_df <- FM_hab_df %>%
    dplyr::select(habitat, var_nms[i])
  
  nat <- tmp_df %>%
    filter(habitat == "Native urban") %>%
    pull(var_nms[i])
  
  inv <- tmp_df %>%
    filter(habitat == "Invasive urban") %>%
    pull(var_nms[i])
  
  # Get Cliff's delta for independent groups (all values of y compared to all values of x)
  # orddom package v 3.1
  cliff_delta <- dmes(nat, inv)$dc
  
  # Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
  # Using 10000 bootstrap iterations
  ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
  ci_res
  
  lower_95_CI <- ci_res$theta.bci.lo
  upper_95_CI <- ci_res$theta.bci.up
  
  range_df <- data.frame(
    measurement = var_nms[i],
    type = "Range",
    cliff_delta = cliff_delta,
    lower_95_CI = lower_95_CI,
    upper_95_CI = upper_95_CI
  )
  
  # Calculate effect of habitat
  nat_agr <- tmp_df %>%
    filter(habitat == "Native agro-interface") %>%
    pull(var_nms[i])
  
  nat_urb <- tmp_df %>%
    filter(habitat == "Native urban") %>%
    pull(var_nms[i])
  
  # Get Cliff's delta for independent groups (all values of y compared to all values of x)
  # orddom package v 3.1
  cliff_delta <- dmes(nat_agr, nat_urb)$dc
  
  # Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
  # Using 10000 bootstrap iterations
  ci_res <- dmes.boot(nat_agr, nat_urb, theta.es = "dc", ci.meth = "BCA", B = 10000)
  ci_res
  
  lower_95_CI <- ci_res$theta.bci.lo
  upper_95_CI <- ci_res$theta.bci.up
  
  habitat_df <- data.frame(
    measurement = var_nms[i],
    type = "Habitat",
    cliff_delta = cliff_delta,
    lower_95_CI = lower_95_CI,
    upper_95_CI = upper_95_CI
  )

  return(
    range_df %>%
           bind_rows(habitat_df)
  )
  
}))

eff_resh_df

# Ran this code for num_peaks outside of the loop, which avoided the t.test error 
# Calculate effect of range
tmp_df <- FM_hab_df %>%
  dplyr::select(habitat, "num_peaks")

nat <- tmp_df %>%
  filter(habitat == "Native urban") %>%
  pull(num_peaks)

inv <- tmp_df %>%
  filter(habitat == "Invasive urban") %>%
  pull(num_peaks)

# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat, inv)$dc

# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res

lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up

range_df <- data.frame(
  measurement = "num_peaks",
  type = "Range",
  cliff_delta = cliff_delta,
  lower_95_CI = lower_95_CI,
  upper_95_CI = upper_95_CI
)

# Calculate effect of habitat
nat_agr <- tmp_df %>%
  filter(habitat == "Native agro-interface") %>%
  pull(num_peaks)

nat_urb <- tmp_df %>%
  filter(habitat == "Native urban") %>%
  pull(num_peaks)

# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat_agr, nat_urb)$dc

# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat_agr, nat_urb, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res

lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up

habitat_df <- data.frame(
  measurement = "num_peaks",
  type = "Habitat",
  cliff_delta = cliff_delta,
  lower_95_CI = lower_95_CI,
  upper_95_CI = upper_95_CI
)

eff_resh_df2 <- range_df %>% 
  bind_rows(habitat_df)

eff_resh_df3 <- eff_resh_df %>%
  bind_rows(eff_resh_df2)

eff_resh_df3 

write.csv(eff_resh_df3, file.path(path, "cliffs_delta_bootstrappedCIs_habitatFm.csv"), row.names = FALSE)

Effect sizes ordered by absolute magnitude.

eff_resh_df <- read.csv(file.path(path, "cliffs_delta_bootstrappedCIs_habitatFm.csv"))
glimpse(eff_resh_df)
## Rows: 6
## Columns: 5
## $ measurement <chr> "modulation_rate", "modulation_rate", "max_peak_trgh_slope…
## $ type        <chr> "Range", "Habitat", "Range", "Habitat", "Range", "Habitat"
## $ cliff_delta <dbl> -0.562500, -0.625000, 0.906250, -0.250000, -0.656250, -0.7…
## $ lower_95_CI <dbl> -0.906250, -1.000000, 0.250000, -0.750000, -0.921875, -0.9…
## $ upper_95_CI <dbl> 0.00000, -0.06250, 1.00000, 0.34375, -0.15625, -0.15625
# Make a table for appendix, in order of largest absolute effect size per comparison
eff_resh <- eff_resh_df %>%
  dplyr::mutate(
    effect_size = round(cliff_delta, 2),
    lower_95_CI = round(lower_95_CI, 2),
    upper_95_CI = round(upper_95_CI, 2),
    `95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
  ) %>%
  arrange(type, desc(abs(effect_size))) %>%
  ungroup() %>%
  dplyr::select(type, measurement, effect_size, `95_CI`)

eff_resh %>%
  kable(align = c(rep("c", 4)))
type measurement effect_size 95_CI
Habitat num_peaks -0.77 (-0.92, -0.16)
Habitat modulation_rate -0.62 (-1, -0.06)
Habitat max_peak_trgh_slope -0.25 (-0.75, 0.34)
Range max_peak_trgh_slope 0.91 (0.25, 1)
Range num_peaks -0.66 (-0.92, -0.16)
Range modulation_rate -0.56 (-0.91, 0)

References

1. Araya-Salas, M., Smith-Vidaurre, G., and M. Webster. 2017. Assessing the effect of sound file compression and background noise on measures of acoustic signal structure. Bioacoustics 28(1), 57-73.

2. Hess, M.R., & Kromrey, J.D. 2004. Robust confidence intervals for effect sizes: A comparative study of Cohen’s d and Cliff’s delta under non-normality and heterogeneous variances. Paper presented at the annual meeting of the American Educational Research Association, San Diego, 12–16 April 2004.

The session info printed here represents the environment used for the final RMarkdown knitting. The software and package versions employed for main results are reported in the appendix of the associated article.

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] orddom_3.1             psych_2.0.12           rstatix_0.7.0         
##  [4] MASS_7.3-54            gtable_0.3.0           ggplotify_0.0.7       
##  [7] facetscales_0.1.0.9000 egg_0.4.5              gridExtra_2.3         
## [10] pracma_2.3.3           data.table_1.14.0      pbapply_1.4-3         
## [13] forcats_0.5.1          stringr_1.4.0          dplyr_1.0.6           
## [16] purrr_0.3.4            readr_1.4.0            tidyr_1.1.3           
## [19] tibble_3.1.2           tidyverse_1.3.1        ggplot2_3.3.3         
## [22] warbleR_1.1.27         NatureSounds_1.0.4     knitr_1.33            
## [25] seewave_2.1.6          tuneR_1.3.3            plyr_1.8.6            
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-152        bitops_1.0-7        fs_1.5.0           
##  [4] lubridate_1.7.10    httr_1.4.2          tools_4.1.0        
##  [7] backports_1.2.1     utf8_1.2.1          R6_2.5.0           
## [10] DBI_1.1.1           colorspace_2.0-1    withr_2.4.2        
## [13] mnormt_2.0.2        tidyselect_1.1.1    curl_4.3.1         
## [16] compiler_4.1.0      cli_2.5.0           rvest_1.0.0        
## [19] xml2_1.3.2          labeling_0.4.2      scales_1.1.1       
## [22] proxy_0.4-25        dtw_1.22-3          digest_0.6.27      
## [25] foreign_0.8-81      rmarkdown_2.8       rio_0.5.26         
## [28] pkgconfig_2.0.3     htmltools_0.5.1.1   highr_0.9          
## [31] dbplyr_2.1.1        rlang_0.4.11        readxl_1.3.1       
## [34] rstudioapi_0.13     farver_2.1.0        gridGraphics_0.5-1 
## [37] generics_0.1.0      jsonlite_1.7.2      zip_2.1.1          
## [40] car_3.0-10          RCurl_1.98-1.3      magrittr_2.0.1     
## [43] Rcpp_1.0.6          munsell_0.5.0       fansi_0.5.0        
## [46] abind_1.4-5         lifecycle_1.0.0     stringi_1.6.2      
## [49] yaml_2.2.1          carData_3.0-4       parallel_4.1.0     
## [52] crayon_1.4.1        lattice_0.20-44     haven_2.4.1        
## [55] hms_1.1.0           tmvnsim_1.0-2       pillar_1.6.1       
## [58] rjson_0.2.20        fftw_1.0-6          reprex_2.0.0       
## [61] glue_1.4.2          evaluate_0.14       BiocManager_1.30.15
## [64] modelr_0.1.8        vctrs_0.3.8         cellranger_1.1.0   
## [67] assertthat_0.2.1    xfun_0.23           openxlsx_4.2.3     
## [70] broom_0.7.6         signal_0.7-7        rvcheck_0.1.8      
## [73] ellipsis_0.3.2