This script was the second part of our analysis of structural differences between ranges. Here we asked whether frequency modulation patterns, a structural component of learned signals that can partially encode individual identity, were significantly simpler in invasive range calls. We traced second harmonic frequency contours for a subset of native and invasive range calls, then obtained 3 frequency modulation measurements from these contours. We also obtained a standard set of spectral acoustic measurements from the full signals, and evaluated the effect sizes of range across all acoustic measurements, as well as the independent effects of habitat and range on these measurements. Some of the code in this script may need to be updated in order to reproduce results with more recent versions of the packages used.
rm(list = ls())
# load plyr before dplyr to avoid masking of certain functions
library(plyr)
X <- c("warbleR", "ggplot2", "tidyverse", "pbapply", "dplyr", "data.table", "pracma", "egg", "gridExtra", "facetscales", "grDevices", "ggplotify", "grid", "gtable", "MASS", "rstatix", "orddom")
invisible(lapply(X, library, character.only = TRUE))
path <- "/media/gsvidaurre/MYIOPSITTA/R/VocalLearning_PostInvasion/Data"
gpath <- "/home/gsvidaurre/Desktop/MANUSCRIPTS/SimplerSignatures_PostInvasion/FIGURES"
seed <- 401
cores <- parallel::detectCores() - 2
# Call R code to make raincloud plots, downloaded from https://github.com/RainCloudPlots/RainCloudPlots
# Loading these scripts will install packages if these are not already on your machine
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/R_rainclouds.R")
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/summarySE.R")
source("/home/gsvidaurre/Desktop/Software/RainCloudPlots-master/tutorial_R/simulateData.R")
Read in extended selection table (EST). Throughout this script, you can reproduce analyses by reading in the file “nat_inv_indiv_site_seltbl.csv”, which is a selection table made available on figshare that contains metadata needed here. Measurements needed to reproduce analyses here have also been provided on figshare.
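For readers reproducing from the figshare files rather than the RDS below, a minimal sketch (the file name follows the text above; `nat_inv_df` is a hypothetical object name):

```r
# Hedged alternative: read the figshare selection table rather than the EST
# (metadata only; the EST additionally carries the wave objects)
nat_inv_df <- read.csv(file.path(path, "nat_inv_indiv_site_seltbl.csv"))
# glimpse(nat_inv_df)
```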
nat_inv_est <- readRDS(file.path(path, "nat_inv_indiv_site_EST.RDS"))
# glimpse(nat_inv_est)
We had a decent number of samples over time. Change over time can be assessed at two scales: the city, or sites within cities sampled over time. New Orleans can be assessed only at the city scale, while Austin can be assessed at specific sites over time.
# Which sites per city were sampled in each year (larger dataset of more geographic spread only)
nat_inv_est %>%
filter(social_scale == "Site") %>%
filter(!is.na(invasive_city)) %>%
filter(invasive_city %in% c("Austin", "New Orleans")) %>%
group_by(invasive_city, year) %>%
dplyr::summarise(
uniq_sites = paste(unique(site), collapse = "; ")
)
## `summarise()` has grouped output by 'invasive_city'. You can override using the `.groups` argument.
## # A tibble: 5 x 3
## # Groups: invasive_city [2]
## invasive_city year uniq_sites
## <chr> <chr> <chr>
## 1 Austin 2004 SOCC; COMM; VALL
## 2 Austin 2011 ELEM; INTR; MART; VALL; SOFT
## 3 Austin 2019 ELEM; AIRP; INTR; SOCC; MANO; MART
## 4 New Orleans 2004 CANA; BALL; FOLS
## 5 New Orleans 2011 LAKE; ROBE
Which invasive range site-years were sampled over time?
# Get site-years in Austin sampled over time, and New Orleans sites over years (larger dataset of more geographic spread only)
austin <- nat_inv_est %>%
filter(social_scale == "Site") %>%
filter(invasive_city == "Austin") %>%
group_by(invasive_city, site) %>%
dplyr::summarise(
n_years = n_distinct(year)
) %>%
filter(n_years > 1) %>%
pull(site)
## `summarise()` has grouped output by 'invasive_city'. You can override using the `.groups` argument.
new_orleans <- nat_inv_est %>%
filter(social_scale == "Site") %>%
filter(invasive_city == "New Orleans") %>%
pull(site) %>%
unique()
temporal_inv_sites <- c(austin, new_orleans)
temporal_inv_sites
## [1] "ELEM" "INTR" "MART" "SOCC" "VALL" "LAKE" "ROBE" "CANA" "BALL" "FOLS"
We included calls of repeatedly sampled individuals per range in frequency tracing, as this allowed us to ask whether frequency modulation patterns account for the high individual information content in calls (next script with Beecher’s statistic).
Randomly sampled 5 calls per known repeatedly sampled individual per range, and took all calls for any bird with 5 or fewer calls total.
# Random sampling for the repeatedly sampled individuals (5 calls, or the total if less, per bird)
# Get birds with 5 calls or fewer
ids_few_calls <- nat_inv_est %>%
filter(social_scale == "Individual") %>%
# Count calls per bird
group_by(Bird_ID) %>%
dplyr::summarise(
n_calls = length(sound.files)
) %>%
# For now, drop birds with 5 calls or less, these will be excluded from random sampling and added back later
group_by(Bird_ID) %>%
filter(n_calls <= 5) %>%
pull(Bird_ID)
set.seed(seed)
freq_mod_indiv_df <- nat_inv_est %>%
filter(social_scale == "Individual") %>%
# For now, drop birds with 5 calls or less, these will be excluded from random sampling and added back later
filter(!Bird_ID %in% ids_few_calls) %>%
group_by(Bird_ID) %>%
nest() %>%
ungroup() %>%
# Randomly sample 5 calls
dplyr::mutate(
rsamp_calls = purrr::map2(data, 5, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_calls) %>%
# Add back birds with 5 calls or less
bind_rows(
nat_inv_est %>%
filter(social_scale == "Individual") %>%
filter(Bird_ID %in% ids_few_calls)
) %>%
droplevels()
glimpse(freq_mod_indiv_df)
# Checking, 17 individuals total (8 native, 9 invasive), 5 calls each or all calls for birds with 5 or fewer, looks good
freq_mod_indiv_df %>%
group_by(Bird_ID) %>%
dplyr::summarise(
n_calls = length(sound.files)
)
# Includes calls from individual sampled at site BART in 2011 (INV-UM1)
freq_mod_indiv_df %>%
group_by(site, Bird_ID) %>%
dplyr::summarise(
n_calls = length(sound.files)
)
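As noted at the top of the script, some idioms here predate newer package versions. A hedged sketch of the same per-bird sampling with dplyr >= 1.0, where `slice_sample()` silently returns all rows of a group smaller than `n`, removing the need to set aside birds with few calls (toy data with hypothetical IDs, not the EST):

```r
library(dplyr)
set.seed(401)
# Toy data standing in for the individual-scale calls
toy <- tibble(
  Bird_ID = rep(c("NAT-01", "INV-UM1"), times = c(8, 3)),
  sound.files = paste0("call_", 1:11)
)
# Up to 5 calls per bird; groups with fewer than 5 rows are kept whole
toy %>%
  group_by(Bird_ID) %>%
  slice_sample(n = 5) %>%
  ungroup()
```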
Performed random sampling of calls from the dataset of broader geographic spread for the comparison between ranges and the temporal comparison in the invasive range. Here we relied on all calls in the full dataset (did not remove calls with the “site_scale” suffix).
# Random sampling for the spatial question (40 native, 40 invasive calls over 10 sites in each range)
set.seed(seed)
freq_mod_spatial_df <- nat_inv_est %>%
filter(social_scale == "Site") %>%
# Drop site-years with fewer than 4 calls
group_by(site_year) %>%
dplyr::summarise(
n_calls = length(sound.files)
) %>%
filter(n_calls >= 4) %>%
ungroup() %>%
dplyr::select(-c(n_calls)) %>%
# Join back with the EST to sample site-years
inner_join(
nat_inv_est %>%
filter(social_scale == "Site"),
by = "site_year"
) %>%
# Make a data frame in which each row is a site-year, for random sampling below
group_by(range, site_year) %>%
dplyr::summarise(
n_calls = length(sound.files)
) %>%
ungroup() %>%
# Group by range for random sampling
group_by(range) %>%
nest() %>%
ungroup() %>%
# Randomly sample 10 native and 10 invasive range site-years
dplyr::mutate(
rsamp_sites = purrr::map2(data, 10, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_sites) %>%
# Join back with the EST to sample calls per site-year
inner_join(
nat_inv_est %>%
filter(social_scale == "Site"),
by = c("range", "site_year")
) %>%
group_by(site_year) %>%
nest() %>%
ungroup() %>%
# Randomly sample 4 calls per site-year
dplyr::mutate(
rsamp_calls = purrr::map2(data, 4, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_calls)
glimpse(freq_mod_spatial_df)
# Checking, 4 calls per site and 20 sites total, looks good
# Does not include BART 2011, good
freq_mod_spatial_df %>%
group_by(range, site_year) %>%
dplyr::summarise(
n_calls = length(sound.files)
)
# Random sampling for the temporal question: Austin sites sampled over time, and New Orleans sites, 5 calls per site-year
set.seed(seed)
freq_mod_temporal_df <- nat_inv_est %>%
filter(social_scale == "Site") %>%
# Filter by the Austin sites sampled over time, and New Orleans sites
filter(site %in% temporal_inv_sites | invasive_city == "New Orleans") %>%
group_by(site_year) %>%
dplyr::summarise(
n_calls = length(sound.files)
) %>%
ungroup() %>%
dplyr::select(-c(n_calls)) %>%
# Join back with the EST to sample calls
inner_join(
nat_inv_est %>%
filter(social_scale == "Site") %>%
# Filter by the Austin sites sampled over time, and New Orleans sites
filter(site %in% temporal_inv_sites | invasive_city == "New Orleans"),
by = "site_year"
) %>%
# Group by site_year for random sampling
group_by(site_year) %>%
nest() %>%
ungroup() %>%
# Randomly sample 5 calls per site-year
dplyr::mutate(
rsamp_calls = purrr::map2(data, 5, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_calls)
glimpse(freq_mod_temporal_df)
# Checking, 15 sites total, 5 calls each, looks good
# Does not include BART 2011, good
freq_mod_temporal_df %>%
group_by(invasive_city, site_year) %>%
dplyr::summarise(
n_calls = length(sound.files)
)
The spatial and temporal comparisons yielded 155 calls (149 unique) from the dataset of broader geographic spread, plus 84 calls for repeatedly sampled individuals, making 233 unique calls total for the frequency modulation analysis. Valeria and I ended up dividing the frequency tracing workload.
Combine these calls into a single EST and write out to share with Valeria.
freq_mod_df <- freq_mod_spatial_df %>%
mutate(
question = "spatial"
) %>%
bind_rows(freq_mod_temporal_df %>%
mutate(
question = "temporal"
)
) %>%
bind_rows(freq_mod_indiv_df %>%
mutate(
question = "indiv_scale"
)
)
glimpse(freq_mod_df)
# Save this data frame as a record of the calls to be used for each question/comparison (has duplicates)
saveRDS(freq_mod_df, file.path(path, "freq_mod_df.RDS"))
freq_mod_df <- readRDS(file.path(path, "freq_mod_df.RDS"))
glimpse(freq_mod_df)
# Save as a .csv as well for sharing data
# write.csv(freq_mod_df %>%
# dplyr::select(-c(n_calls)),
# file.path(path, "freq_mod_df.csv"), row.names = FALSE)
# Get calls in this data frame
calls <- freq_mod_df %>%
pull(sound.files) %>%
as.character()
head(calls)
# Actually 233 unique calls
length(calls)
length(unique(calls))
# Which calls are duplicated? Duplicates shared between the spatial and temporal comparisons
freq_mod_df %>%
filter(duplicated(sound.files)) %>%
View()
# Remove duplicates to make an EST for frequency tracing
freq_mod_df2 <- freq_mod_df %>%
filter(!duplicated(sound.files)) %>%
droplevels()
glimpse(freq_mod_df2)
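An equivalent, arguably clearer way to drop the duplicated sound files uses dplyr's `distinct()`, which keeps the first row per value of the given column (a sketch producing the same result as `filter(!duplicated(sound.files))`):

```r
# Keep the first row per sound file, retaining all other columns
freq_mod_df2 <- freq_mod_df %>%
  distinct(sound.files, .keep_all = TRUE) %>%
  droplevels()
```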
# Randomly sample half of each set of calls per question, then VP will do tailoring for half and I will do the other half
set.seed(seed)
freq_mod_df_VP <- freq_mod_df2 %>%
group_by(question) %>%
nest() %>%
ungroup() %>%
# Create a vector with half of calls per comparison, to be used for random sampling
# group_by(question) %>%
inner_join(
freq_mod_df2 %>%
group_by(question) %>%
dplyr::summarise(
total_calls = length(sound.files)
) %>%
dplyr::mutate(
n = round(total_calls/2)
) %>%
ungroup() %>%
dplyr::select(-total_calls),
by = "question"
) %>%
# Randomly sample half of the calls used per question/comparison
dplyr::mutate(
rsamp_calls = purrr::map2(data, n, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_calls) %>%
droplevels() %>%
dplyr::select(-c(n)) %>%
dplyr::select(all_of(names(freq_mod_df2)[-grep("question", names(freq_mod_df2))]), "question")
freq_mod_df_GSV <- freq_mod_df2 %>%
filter(!sound.files %in% freq_mod_df_VP$sound.files) %>%
droplevels()
glimpse(freq_mod_df_VP)
glimpse(freq_mod_df_GSV)
# Filter the overall EST by the calls in each data frame
sub_est_VP <- nat_inv_est[grep(paste(paste("^", freq_mod_df_VP$sound.files, "$", sep = ""), collapse = "|"), nat_inv_est$sound.files), ]
sub_est_GSV <- nat_inv_est[grep(paste(paste("^", freq_mod_df_GSV$sound.files, "$", sep = ""), collapse = "|"), nat_inv_est$sound.files), ]
# Restore the EST class of each data frame above
freq_mod_est_VP <- fix_extended_selection_table(freq_mod_df_VP, sub_est_VP)
glimpse(freq_mod_est_VP)
class(freq_mod_est_VP)
length(attr(freq_mod_est_VP, "wave.objects"))
freq_mod_est_GSV <- fix_extended_selection_table(freq_mod_df_GSV, sub_est_GSV)
glimpse(freq_mod_est_GSV)
class(freq_mod_est_GSV)
length(attr(freq_mod_est_GSV, "wave.objects"))
# Write out the EST
saveRDS(freq_mod_est_VP, file.path(path, "freq_mod_est_VP.RDS"))
saveRDS(freq_mod_est_GSV, file.path(path, "freq_mod_est_GSV.RDS"))
VP and GSV performed manual tailoring of frequency measurements to trace the 2nd harmonic visible in each call. First, we measured the fundamental frequency over 100 timepoints per call, which yielded fundamental frequency estimates that jumped around among harmonics. These frequency contours were then manually tailored below.
# VP call set
freq_mod_est_VP <- readRDS(file.path(path, "freq_mod_est_VP.RDS"))
glimpse(freq_mod_est_VP)
ff_trace_VP <- ffts(freq_mod_est_VP, wl = 378, length.out = 100, wn = "hanning", ovlp = 90, bp = c(0.5, 9), threshold = 15, img = FALSE, parallel = cores, path = path, img.suffix = "ffts", pb = TRUE, clip.edges = TRUE, leglab = "ffts", ff.method = "tuneR", flim = c(0.5, 9))
glimpse(ff_trace_VP)
write.csv(ff_trace_VP, file.path(path, "ff_trace_VP.csv"))
# GSV call set
freq_mod_est_GSV <- readRDS(file.path(path, "freq_mod_est_GSV.RDS"))
glimpse(freq_mod_est_GSV)
ff_trace_GSV <- ffts(freq_mod_est_GSV, wl = 378, length.out = 100, wn = "hanning", ovlp = 90, bp = c(0.5, 9), threshold = 15, img = FALSE, parallel = cores, path = path, img.suffix = "ffts", pb = TRUE, clip.edges = TRUE, leglab = "ffts", ff.method = "tuneR", flim = c(0.5, 9))
glimpse(ff_trace_GSV)
write.csv(ff_trace_GSV, file.path(path, "ff_trace_GSV.csv"))
We used the following code to manually tailor fundamental frequency contours to trace the second harmonic per call. VP and GSV performed this on separate machines; here we show just the code for the GSV call set.
ff_trace <- read.csv(file.path(path, "ff_trace_GSV.csv"))
glimpse(ff_trace)
seltailor(X = freq_mod_est_GSV, ts.df = ff_trace, wl = 378, flim = c(0.5, 9), mar = 0.05, osci = TRUE, ovlp = 90, auto.contour = TRUE, width = 15, height = 8, path = path)
When done with manual tailoring, we combined the .csvs generated by seltailor with the ESTs to be used for the frequency modulation questions. One observer (GSV) performed a final check of the tailored frequency traces.
freq_mod_est_VP <- readRDS(file.path(path, "freq_mod_est_VP.RDS"))
glimpse(freq_mod_est_VP)
freq_mod_est_GSV <- readRDS(file.path(path, "freq_mod_est_GSV.RDS"))
glimpse(freq_mod_est_GSV)
freq_trace_VP <- read.csv(file.path(path, "seltailor_output_freq_VP.csv"))
glimpse(freq_trace_VP)
freq_trace_GSV <- read.csv(file.path(path, "seltailor_output_freq_GSV.csv"))
glimpse(freq_trace_GSV)
# Join the frequency traces with the respective ESTs
freq_mod_est_VP2 <- freq_mod_est_VP %>%
inner_join(
freq_trace_VP %>%
dplyr::select(c(names(freq_trace_VP)[grep("sound.files|ffreq", names(freq_trace_VP))])),
by = "sound.files"
)
glimpse(freq_mod_est_VP2)
freq_mod_est_GSV2 <- freq_mod_est_GSV %>%
inner_join(
freq_trace_GSV %>%
dplyr::select(c(names(freq_trace_GSV)[grep("sound.files|ffreq", names(freq_trace_GSV))])),
by = "sound.files"
)
glimpse(freq_mod_est_GSV2)
# Restore the EST class to each EST
freq_mod_est_VP2 <- fix_extended_selection_table(freq_mod_est_VP2, freq_mod_est_VP)
class(freq_mod_est_VP2)
length(attr(freq_mod_est_VP2, "wave.objects"))
freq_mod_est_GSV2 <- fix_extended_selection_table(freq_mod_est_GSV2, freq_mod_est_GSV)
class(freq_mod_est_GSV2)
length(attr(freq_mod_est_GSV2, "wave.objects"))
# Join ESTs
freq_mod_est <- rbind(freq_mod_est_VP2, freq_mod_est_GSV2)
class(freq_mod_est)
length(attr(freq_mod_est, "wave.objects"))
glimpse(freq_mod_est)
ts.df <- freq_mod_est %>%
dplyr::select(names(freq_mod_est)[grep("sound.files|selec|ffreq", names(freq_mod_est))])
# glimpse(ts.df)
# Run seltailor one more time to check traces and modify any as needed
seltailor(X = freq_mod_est[, -grep("ffreq", names(freq_mod_est))], ts.df = ts.df, wl = 378, flim = c(0.5, 9), mar = 0.05, osci = TRUE, ovlp = 90, auto.contour = TRUE, width = 15, height = 8, path = path)
Fixed a good number of traces. Read these back in and added back to the combined EST for the frequency modulation analysis.
st <- read.csv(file.path(path, "seltailor_output_2ndharm_final_check.csv"))
glimpse(st)
freq_mod_est2 <- freq_mod_est %>%
dplyr::select(-c(names(freq_mod_est)[grep("ffreq", names(freq_mod_est))])) %>%
inner_join(
st %>%
dplyr::select(names(st)[grep("sound.files|ffreq", names(st))]),
by = "sound.files"
)
# Restore EST class
freq_mod_est2 <- fix_extended_selection_table(freq_mod_est2, freq_mod_est)
class(freq_mod_est2)
length(attr(freq_mod_est2, "wave.objects"))
Make spectrograms that contain the manually traced 2nd harmonic.
ts.df <- freq_mod_est2 %>%
dplyr::select(names(freq_mod_est2)[grep("sound.files|selec|ffreq", names(freq_mod_est2))]) %>%
as.data.frame()
# glimpse(ts.df)
class(ts.df)
trackfreqs(X = freq_mod_est2, wn = "hanning", wl = 378, collev = seq(-53, 0, 1), flim = c(0.5, 9), ovlp = 90, line = FALSE, mar = 0.005, res = 200, parallel = cores, path = gpath, it = "jpeg", custom.contour = ts.df, pch = 19, col = "firebrick")
The frequency traces look great, no more tailoring needed. Note that while GSV and VP split the manual tracing work, GSV did the final round of checking to ensure consistency in frequency tracing between observers. See the example image file below.
Second harmonic contour
Saved the EST with the final tailored 2nd harmonic traces.
freq_mod_est3 <- freq_mod_est2 %>%
dplyr::select(-c(n_calls))
saveRDS(freq_mod_est3, file.path(path, "freq_mod_est_m2h.RDS"))
# Save as a .csv as well for sharing data
write.csv(freq_mod_est3, file.path(path, "freq_mod_est_m2h.csv"), row.names = FALSE)
Read in the EST (a unique sound file per row) and the data frame with calls allocated for different analyses (sound files per row, with 6 shared across intended spatial and temporal comparisons as described above).
# The EST with the manually tailored 2nd harmonic traces
freq_mod_est <- readRDS(file.path(path, "freq_mod_est_m2h.RDS"))
# freq_mod_est <- read.csv(file.path(path, "freq_mod_est_m2h.csv"))
# glimpse(freq_mod_est)
# Read in the data frame of calls used for each question/comparison (some calls used for both the spatial and temporal questions, only 6 total)
freq_modQ_df <- readRDS(file.path(path, "freq_mod_df.RDS"))
# freq_modQ_df <- read.csv(file.path(path, "freq_mod_df.csv"))
# glimpse(freq_modQ_df)
freq_modQ_df %>%
group_by(question) %>%
dplyr::summarise(n = length(sound.files))
## # A tibble: 3 x 2
## question n
## <chr> <int>
## 1 indiv_scale 84
## 2 spatial 80
## 3 temporal 75
# 4 calls each from native range urban sites GOLF and FAGR
freq_modQ_df %>%
filter(range == "Native") %>%
group_by(site) %>%
dplyr::summarise(n_calls = length(sound.files))
## # A tibble: 13 x 2
## site n_calls
## <chr> <int>
## 1 1145 25
## 2 CHAC 5
## 3 ELTE 4
## 4 EMBR 9
## 5 FAGR 4
## 6 GOLF 4
## 7 INBR 4
## 8 INES-04 4
## 9 INES-08 4
## 10 PAVO 4
## 11 PIED 4
## 12 PLVE 4
## 13 QUEB 4
Randomly select 5 calls per range to groundtruth a peak searching method. First, make visuals of each call, then manually count peaks.
freq_mod_est2 <- freq_mod_est %>%
as_tibble()
# Randomly select 5 site-scale calls per range
set.seed(seed)
calls <- freq_mod_est2 %>%
filter(social_scale == "Site") %>%
group_by(range) %>%
nest() %>%
ungroup() %>%
dplyr::mutate(
rsamp_calls = purrr::map2(data, 5, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_calls) %>%
pull(sound.files) %>%
as.character()
calls
tmp_df <- freq_mod_est2 %>%
filter(sound.files %in% calls) %>%
droplevels()
n_tot <- 100 # the number of timepoints per frequency trace
n_rem <- 5 # remove 5 points at the start and end of each call, for 90 frequency measurements total across each call
# the number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
n
# Make visuals of the frequency traces of these calls to count peaks
# Close any open graphics device before plotting to files
if(dev.cur() > 1) dev.off()
invisible(pblapply(1:nrow(tmp_df), function(i){
# Get the frequency trace for the given call
tmp <- tmp_df[i, ]
# glimpse(tmp)
# Drop 5 points on either end
tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
# glimpse(tmp)
# Get the frequency trace as a vector
ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))
jpeg(file.path(gpath, paste(tmp_df$sound.files[i], "count_peaks.jpeg", sep = "-")), units = "in", width = 5, height = 4, res = 200)
plot(ftrace)
lines(ftrace, col = "red")
dev.off()
}))
tmp_df$sound.files
# Sound file # "Large" peaks # Troughs following peaks
# [1] "2011_02_15_PLEA_WRIG1012_3.WAV" # 4 # 3
# [2] "2011_02_18_LAKE_WRIG1039_10.WAV" # 5 # 5
# [3] "SBD_Austin.1002.0180_resamp.WAV" # 5 # 3
# [4] "SBD_NewOrleans.1006.0493_resamp.WAV" # 5 # 4
# [5] "2017_05_21_PLVE_1005_6.WAV" # 8 # 7
# [6] "2017_10_25_PIED_1561_2.WAV" # 7 # 6
# [7] "2017_09_13_ELTE_1405_3.WAV" # 5 # 5
# [8] "2017_09_03_INBR_1269_3.WAV" # 6 # 5
# [9] "2017_11_20_GOLF_1644_1.WAV" # 6 # 5
# [10] "2011_02_15_ELEM_WRIG1013_4.WAV" # 4 # 3
# Add these values to the temporary data frame
tmp_df$mnl_pks <- c(4, 5, 5, 5, 8, 7, 5, 6, 6, 4)
tmp_df$mnl_trghs <- c(3, 5, 3, 4, 7, 6, 5, 5, 5, 3)
Find peaks across these calls using a general peak search method. These peaks served as a rough estimate of peak size, used later to impose a threshold on the minimum magnitude of peaks relative to local points. We used the same peak searching code to identify troughs by inverting the frequency trace.
n_tot <- 100 # the number of timepoints per frequency trace
n_rem <- 5 # remove 5 points at the start and end of each call, for 90 frequency measurements total across each call
# the number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
n
nups <- 2
ndowns <- 0
minpeakheight <- 1 # 1 kHz as min peak height
zero <- "0"
pk_hght_df <- rbindlist(pblapply(1:nrow(tmp_df), function(i){
# Get the frequency trace for the given call
tmp <- tmp_df[i, ]
# glimpse(tmp)
# Drop 5 points on either end
tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
# glimpse(tmp)
# Get the frequency trace as a vector
ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))
pks <- findpeaks(ftrace, nups = nups, ndowns = ndowns, zero = zero, minpeakheight = minpeakheight)
return(data.frame(sound.files = tmp_df$sound.files[i], range = tmp_df$range[i], peak_number = seq(1, nrow(pks), 1), peak_height = pks[, 1], peak_max_index = pks[, 2]))
}))
glimpse(pk_hght_df)
max(pk_hght_df$peak_height)
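As a standalone illustration of the `pracma::findpeaks()` output used throughout this section (one row per peak: height, index of the maximum, then the start and end indices), with troughs found by negating the trace — a toy vector, not call data:

```r
library(pracma)

x <- c(1, 2, 5, 2, 1, 3, 7, 3, 1)
# One row per peak: height, index of maximum, start index, end index
findpeaks(x, nups = 2, ndowns = 0, zero = "0", minpeakheight = 1)
# Troughs: run the same search on the inverted trace
findpeaks(-x, nups = 2, ndowns = 0, zero = "0")
```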
Next, I customized this routine to pick up large peaks but not gradual increases. I used the dataset of 10 randomly selected calls above for groundtruthing with spline-smoothing, and determined the degree of smoothing that yielded results closest to visual inspection of peaks, as well as good detection of troughs (inverted peaks). I then moved on to identifying peaks across all calls with frequency traces for the spatial and temporal questions.
n_tot <- 100 # the number of timepoints per frequency trace
n_rem <- 5 # remove 5 points at the start and end of each call, for 90 frequency measurements total across each call
# the number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
n
nups <- 2
ndowns <- 0
minpeakheight <- 1 # 1 kHz as min peak height
zero <- "0"
# Degrees of freedom for smoothing
dfs <- seq(25, 125, 25)
dfs
# Impose a threshold on the minimum rise of a peak relative to neighboring points before it
# This is 1/100th of the maximum peak height determined above with this subset of calls
thresh <- round(max(pk_hght_df$peak_height)/100, 1)
thresh # 0.1
# Iterate over calls and degrees of freedom for smoothing
# i <- 1
# j <- 1
pks_gt <- rbindlist(pblapply(1:nrow(tmp_df), function(i){
tmp2 <- rbindlist(lapply(1:length(dfs), function(j){
# Get the frequency trace for the given call
tmp <- tmp_df[i, ]
# glimpse(tmp)
# Drop 5 points on either end
tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
# glimpse(tmp)
# Get the frequency trace as a vector
ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))
# ftrace
# Fit a spline and smooth it, to remove small local maxima arising from manual tracing
# Interpolate a total of n equally spaced values (here 5 times the length of the trace), take the mean of tied x-values
splf <- spline(ftrace, n = 5*length(ftrace), method = "fmm")
# str(splf)
# plot(splf)
# plot(ftrace)
# Perform spline smoothing without weighting. Here, df represents the degree of smoothing. Higher values of df lead to less smoothing
splf_smooth <- smooth.spline(splf, df = dfs[j])
# str(splf_smooth)
# plot(splf_smooth)
# plot(splf)
# Output is a matrix, each row is a peak found, first column is the height, second column is the index of the vector where the maximum is reached, then the third and fourth are the indices where the peak starts and ends
pks <- findpeaks(splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero, minpeakheight = minpeakheight)
# pks
# str(pks)
# Impose additional filters on the peaks:
# 1) Remove "peaks" identified within the last 2 points of the trace
# 2) Remove peaks that rise less than the threshold (0.1 kHz) above the frequency value 10 points before the maximum (if the peak is within 10 points of the start, compare against the point immediately before it)
# 3) Remove peaks identified close together (less than 5 points apart) that are probably points on the same wide peak
rem1 <- which(pks[, 2] > (length(splf_smooth$x) - 2))
# rem1
if(length(rem1) > 0){
pks <- pks[-rem1, ]
}
# z <- 1
rem2 <- unlist(lapply(1:nrow(pks), function(z){
# cat(paste("z = ", z, "\n"))
indx <- pks[z, 2]
idiff <- indx - 10
if(!idiff < 0 & idiff != 0){
rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
if(rel_ht < thresh){
return(z)
}
} else {
idiff <- indx - 1
rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
if(rel_ht < thresh){
return(z)
}
}
}))
# rem2
# pks
# Remove these indices from the peaks
if(length(rem2) > 0){
pks <- pks[-rem2, ]
}
# If more than one peak remains, find those that are too close together (5 points or less)
if(is.matrix(pks) && nrow(pks) > 0){
rem3 <- which(diff(pks[, 2]) < 5)
if(length(rem3) > 0){
pks <- pks[-rem3, ]
}
}
# Record the number of peaks (a single remaining peak is a vector of length 4, not a matrix)
if(is.matrix(pks)){
num_peaks = nrow(pks)
} else if(length(pks) > 0){
num_peaks = 1
} else {
num_peaks = 0
}
# Identify troughs by applying the findpeaks code to the inverted frequency trace
# Removed the minpeakheight limitation here
trghs <- findpeaks(-splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero)
# trghs
# str(trghs)
# Filter the troughs to retain those following the peaks identified above
# In other words, assign each trough to a peak, and drop troughs without matches
# Do the peak-trough assignments by finding the closest trough following each peak
# If more than one peak was identified for the given call
if(is.matrix(pks) && nrow(pks) > 0){
trough_df <- rbindlist(lapply(1:nrow(pks), function(z){
# Get the difference in indices between the given peak and all troughs
diff_inds <- pks[z, 2] - trghs[, 2]
# Find the negative differences (trough following a peak), and the max of these
wh <- which(diff_inds < 0 & diff_inds == max(diff_inds[diff_inds < 0]))
return(data.frame(peak_indx = pks[z, 2], trgh_indx = trghs[wh, 2]))
}))
# If only a single peak remains, pks is a vector of length 4 rather than a matrix
} else if(!is.matrix(pks) & length(pks) > 0){
# Get the difference in indices between the given peak and all troughs
diff_inds <- pks[2] - trghs[, 2]
# Find the negative differences (trough following a peak), and the max of these
wh <- which(diff_inds < 0 & diff_inds == max(diff_inds[diff_inds < 0]))
trough_df <- data.frame(peak_indx = pks[2], trgh_indx = trghs[wh, 2])
# If no peaks remain, return an empty set of troughs
} else {
trough_df <- data.frame(peak_indx = integer(0), trgh_indx = integer(0))
}
# Return the degree of smoothing and the numbers of peaks and troughs
return(data.frame(sound.files = tmp_df$sound.files[i], range = tmp_df$range[i], num_peaks = num_peaks, num_trghs = nrow(trough_df), mnl_num_peaks = tmp_df$mnl_pks[i], mnl_num_trghs = tmp_df$mnl_trghs[i], smooth_param = dfs[j]))
}))
return(tmp2)
}))
glimpse(pks_gt)
# View(pks_gt)
# Calculate the difference in peaks per call and smoothing parameter compared to the visually identified peaks and troughs, then find the sum of differences per smoothing parameter
pks_gt %>%
dplyr::mutate(
numdiffp = abs(num_peaks - mnl_num_peaks),
numdifft = abs(num_trghs - mnl_num_trghs)
) %>%
group_by(smooth_param) %>%
dplyr::summarise(
sumdiffp = sum(numdiffp),
sumdifft = sum(numdifft)
)
# A smooth_param (df in smooth.spline) of 50 yielded the smallest summed difference in peaks relative to visual inspection, though more differences in troughs. More of the measurements below depend on good peak estimates, so we proceeded with a smoothing parameter of 50
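The interpolate-then-smooth step can be sketched in isolation on a synthetic trace (illustrative values; the df of 50 follows the choice above):

```r
# Synthetic 90-point trace with small jitter standing in for manual-tracing noise
set.seed(401)
ftrace <- 4 + sin(seq(0, 6 * pi, length.out = 90)) + rnorm(90, sd = 0.05)
# Interpolate to 5x the trace length, then spline-smooth at the chosen df
splf <- spline(ftrace, n = 5 * length(ftrace), method = "fmm")
splf_smooth <- smooth.spline(splf, df = 50)
# Visual check: smoothed curve over the interpolated points
plot(splf, col = "gray", pch = 16, cex = 0.4)
lines(splf_smooth, col = "firebrick", lwd = 2)
```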
We then wrapped the code above in a function and checked the peaks identified across the groundtruthing calls with this smoothing parameter. The peak and trough searching in this function is improved relative to the code above.
peak_locator <- function(X, n_tot, n_rem, degf, nups, ndowns, minpeakheight, zero, thresh, img, neighb, pkdist){
tmp_df <- rbindlist(pblapply(1:nrow(X), function(i){
# The number of timepoints per frequency trace, minus 5 on either end
n <- n_tot - (n_rem*2)
# Get the frequency trace for the given call
tmp <- X[i, ]
# Drop 5 points on either end
tmp <- tmp[, -grep(paste(paste("^", paste("ffreq", c(seq(1, n_rem, 1), seq(n_tot - n_rem + 1, n_tot, 1)), sep = "."), "$", sep = ""), collapse = "|"), names(tmp))]
# Get the frequency trace as a vector
ftrace <- as.vector(t(tmp[, grep("ffreq", names(tmp))]))
# Fit a spline and smooth it, to remove small local maxima arising from manual tracing
# Interpolate a total of n equally spaced values (here 5 times the length of the trace), take the mean of tied x-values
splf <- spline(ftrace, n = 5*length(ftrace), method = "fmm")
# Perform spline smoothing without weighting. Here, df represents the degree of smoothing. Higher values of df lead to less smoothing
splf_smooth <- smooth.spline(splf, df = degf)
# Output is a matrix, each row is a peak found, first column is the height, second column is the index of the vector where the maximum is reached, then the third and fourth are the indices where the peak starts and ends
pks <- findpeaks(splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero, minpeakheight = minpeakheight)
# pks
# str(pks)
# Impose additional filters on the peaks:
# 1) Remove "peaks" identified within the last 2 points of the trace
# 2) Remove peaks whose maximum rises less than the given threshold above the frequency value neighb points before (if the peak is within neighb points of the start, compare against the point halfway between the start and the peak maximum)
# 3) Remove peaks identified close together that are probably points on the same wide peak
rem1 <- which(pks[, 2] > (length(splf_smooth$x) - 2))
if(length(rem1) > 0){
pks <- pks[-rem1, ]
}
rem2 <- unlist(lapply(1:nrow(pks), function(z){
indx <- pks[z, 2]
idiff <- indx - neighb
if(!idiff < 0 & idiff != 0){
rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
if(rel_ht < thresh){
return(z)
}
} else {
idiff <- indx - (indx/2)
rel_ht <- pks[z, 1] - splf_smooth$y[idiff]
if(rel_ht < thresh){
return(z)
}
}
}))
# Remove these indices from the peaks
if(length(rem2) > 0){
pks <- pks[-rem2, ]
}
# If more than one peak remains, find those that are too close together
if(is.matrix(pks) && nrow(pks) > 0){
rem3 <- which(diff(pks[, 2]) < pkdist)
if(length(rem3) > 0){
pks <- pks[-rem3, ]
}
}
# Identify troughs by applying the findpeaks code to the inverted frequency trace
# Removed the minpeakheight limitation here
trghs <- findpeaks(-splf_smooth$y, nups = nups, ndowns = ndowns, zero = zero)
# Filter the troughs to retain those following the peaks identified above
# In other words, assign each trough to a peak, and drop troughs without matches
# Do the peak-trough assignments by finding the closest trough following each peak
# If more than one peak was identified for the given call
if(is.matrix(pks) && nrow(pks) > 0){
trough_df <- rbindlist(lapply(1:nrow(pks), function(z){
# Get the difference in indices between the given peak and all troughs
diff_inds <- pks[z, 2] - trghs[, 2]
# Find the negative differences (trough following a peak)
negs <- diff_inds[diff_inds < 0]
# Check whether there are 2 troughs between this peak and the next
# If so, return the deepest trough, to account for tall peaks with shoulders that drop lower after the shoulder
if(z != nrow(pks) & z > 1){
neigh_trghs <- which(trghs[, 2] >= pks[z, 2] & trghs[, 2] <= pks[z + 1, 2])
# If there are neighboring troughs prior to the next peak, and the first trough is not as deep as the following troughs, find which of the following troughs is the deepest
if(length(neigh_trghs) > 1 & all(trghs[neigh_trghs[1], 1] < trghs[neigh_trghs[-1], 1]) & length(negs) > 0){
wh <- which(diff_inds < 0 & seq(1, nrow(trghs), 1) %in% neigh_trghs & diff_inds == min(diff_inds[seq(1, nrow(trghs), 1) %in% neigh_trghs]))
# If there are neighboring troughs prior to the next peak, but the first trough is deeper than the following troughs, get this first trough
} else if(length(neigh_trghs) > 1 & all(trghs[neigh_trghs[1], 1] > trghs[neigh_trghs[-1], 1]) & length(negs) > 0){
wh <- neigh_trghs[1]
# Else if there's just a single trough identified between peaks
} else if(length(neigh_trghs) == 1 & length(negs) > 0){
wh <- which(diff_inds < 0 & diff_inds == max(negs, na.rm = TRUE))
# Else if no neighboring trough is identified for the given peak, or none of the rules above applied (e.g. tied trough depths)
} else {
wh <- NULL
}
# Else if on the first or last peak, just a single trough or none should be identified
} else {
if(length(negs) > 0){
wh <- which(diff_inds < 0 & diff_inds == max(negs, na.rm = TRUE))
} else {
wh <- NULL
}
}
# Return a data frame with the slope of the frequency curve between the peak and its matching trough, unless the peak wasn't matched to a trough (e.g. a peak close to the end of a call)
if(length(wh) > 0){
pt_slope <- (splf_smooth$y[trghs[wh, 2]] - splf_smooth$y[pks[z, 2]])/(splf_smooth$x[trghs[wh, 2]] - splf_smooth$x[pks[z, 2]])
# Make sure to convert the trough heights back to positive values here
return(data.frame(peak_indx = pks[z, 2], trgh_indx = trghs[wh, 2], trgh_hght = -trghs[wh, 1], pt_slope = pt_slope))
} else{
return(data.frame(peak_indx = pks[z, 2], trgh_indx = NA, trgh_hght = NA, pt_slope = NA))
}
}))
# If only a single peak was identified per call, pks dropped to a vector
} else if(!is.matrix(pks)){
# Get the difference in indices between the given peak and all troughs
diff_inds <- pks[2] - trghs[, 2]
# Find the negative differences (trough following a peak), and the max of these if troughs were found
negs <- diff_inds[diff_inds < 0]
if(length(negs) > 0){
wh <- which(diff_inds < 0 & diff_inds == max(negs, na.rm = TRUE))
} else {
wh <- NULL
}
# Build the peak-trough data frame with the slope of the frequency curve between the peak and its matching trough
if(length(wh) > 0){
pt_slope <- (splf_smooth$y[trghs[wh, 2]] - splf_smooth$y[pks[2]])/(splf_smooth$x[trghs[wh, 2]] - splf_smooth$x[pks[2]])
trough_df <- data.frame(peak_indx = pks[2], trgh_indx = trghs[wh, 2], trgh_hght = -trghs[wh, 1], pt_slope = pt_slope)
} else{
trough_df <- data.frame(peak_indx = pks[2], trgh_indx = NA, trgh_hght = NA, pt_slope = NA)
}
# Coerce pks back to a one-row matrix so the plotting and return code below work
pks <- matrix(pks, nrow = 1)
}
# If img = TRUE, generate an image file with the peaks and troughs
if(img){
jpeg(file.path(gpath, paste(X$sound.files[i], "freq_peaks.jpeg", sep = "-")), units = "in", width = 5, height = 4, res = 200)
plot(splf_smooth$y)
points(x = pks[, 2], y = pks[, 1], pch = 4, cex = 2, col = "red")
points(x = trough_df$trgh_indx, y = trough_df$trgh_hght, pch = 4, cex = 2, col = "blue")
dev.off()
}
# Return the peaks found per call, the rate of change, and the change in frequency between matched peaks and troughs
return(data.frame(
sound.files = X$sound.files[i],
range = X$range[i],
site = X$site[i],
year = X$year[i],
site_year = X$site_year[i],
region = X$region[i],
dept_state = X$dept_state[i],
invasive_city = X$invasive_city[i],
peak_number = seq(1, nrow(pks), 1),
peak_height = pks[, 1],
peak_max_index = pks[, 2],
peak_start_index = pks[, 3],
peak_end_index = pks[, 4],
peak_trgh_slope = trough_df$pt_slope,
trgh_index = trough_df$trgh_indx
)
)
}))
return(tmp_df)
}
# Used the resulting image files to visually assess how well this customized routine picks up large peaks and neighboring troughs
peak_res <- peak_locator(X = tmp_df, n_tot = 100, n_rem = 5, nups = 2, ndowns = 0, minpeakheight = 1, zero = "0", degf = 50, thresh = 0.05, img = TRUE, neighb = 25, pkdist = 20)
# glimpse(peak_res)
This peak locator function performed well on the reduced call dataset, identifying the large frequency peaks and some smaller ones. Each peak was matched well with a following trough. For some wider peaks the matched trough arrives somewhat late, but overall the results look good for the final analysis.
Next, I checked the peaks identified for all calls for which we performed frequency tracing, using the same parameters as for the subset of calls above, to confirm that all large visible peaks had been found. A handful of calls each had one unidentified peak, but these peaks were medium-sized and typically nested right in between larger peaks (more difficult to identify). Other calls had smaller peaks identified as well. Overall, the function picked up the large modulation peaks per call, and results looked good enough to continue (see example image file below).
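The core of the routine above can be sketched in a few lines of base R: spline-smooth a noisy modulated contour, locate local maxima, and find troughs by running the same search on the negated trace. The `find_local_max` rule below is a simplified, hypothetical stand-in for `pracma::findpeaks` (none of the `nups`/`ndowns`/`minpeakheight` filters), and the synthetic contour is invented for illustration.

```r
# Sketch of the peak/trough search on a synthetic frequency contour:
# a 3-cycle sinusoid plus noise, spline-smoothed as in peak_locator
x <- seq(0, 1, length.out = 400)
set.seed(401)
y <- sin(2 * pi * 3 * x) + rnorm(length(x), sd = 0.05)
sm <- smooth.spline(x, y, df = 12)

# Simplified stand-in for pracma::findpeaks: a local maximum is a point
# where the first difference changes sign from positive to negative
find_local_max <- function(v) which(diff(sign(diff(v))) == -2) + 1

pks <- find_local_max(sm$y)     # indices of peaks
trghs <- find_local_max(-sm$y)  # troughs found by inverting the trace
c(n_peaks = length(pks), n_troughs = length(trghs))
```

On this contour the search should recover the three peaks and three troughs of the underlying sinusoid; `peak_locator` then layers the edge, relative-height, and spacing filters on top of this idea.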
freq_mod_df <- freq_modQ_df %>%
inner_join(
freq_mod_est %>%
as.data.frame() %>%
dplyr::select(names(freq_mod_est)[grep("sound.files|ffreq", names(freq_mod_est))]),
by = "sound.files"
) %>%
droplevels() %>%
# Remove duplicated sound files chosen for both spatial and temporal comparisons
filter(!duplicated(sound.files))
# glimpse(freq_mod_df)
nrow(freq_mod_df)
## [1] 233
peaks_troughs_df <- peak_locator(X = freq_mod_df, n_tot = 100, n_rem = 5, nups = 2, ndowns = 0, minpeakheight = 1, zero = "0", degf = 50, thresh = 0.05, img = TRUE, neighb = 25, pkdist = 20)
glimpse(peaks_troughs_df)
# Save this data for use later as needed
saveRDS(peaks_troughs_df, file.path(path, "peaks_troughs_df.RDS"))
# Save as a .csv as well for sharing data
write.csv(peaks_troughs_df, file.path(path, "peaks_troughs_df.csv"), row.names = FALSE)
The image files look great. Nearly every peak was assigned a trough. Some intermediate peaks close to larger peaks were picked up, and some more gradual increases were picked up as peaks, but overall, the function did a good job with identifying large, visible peaks and troughs across all calls for which we performed frequency tracing. In the image file below, peaks are shown in red and troughs in dark blue.
Spline-smoothed second harmonic frequency contour with peaks and troughs for 2017_09_13_QUEB_1432_2.WAV
I performed a final round of filtering using the frequency modulation measurements for the full dataset, prior to parsing out the calls chosen above for the spatial comparison between ranges. Here I removed very small peaks that would lead to overestimation of frequency modulation.
# The function returns one row per peak, with trough information per row
# Dropping peaks below will therefore drop associated trough information
peaks_trghs_df <- readRDS(file.path(path, "peaks_troughs_df.RDS"))
# peaks_trghs_df <- read.csv(file.path(path, "peaks_troughs_df.csv"))
glimpse(peaks_trghs_df)
## Rows: 1,309
## Columns: 15
## $ sound.files <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES-04…
## $ range <fct> Native, Native, Native, Native, Native, Native, Nativ…
## $ site <fct> INES-04, INES-04, INES-04, INES-04, INES-04, INES-04,…
## $ year <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,…
## $ site_year <fct> INES-04_2017, INES-04_2017, INES-04_2017, INES-04_201…
## $ region <fct> West, West, West, West, West, West, West, West, West,…
## $ dept_state <fct> Colonia, Colonia, Colonia, Colonia, Colonia, Colonia,…
## $ invasive_city <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ peak_number <dbl> 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4,…
## $ peak_height <dbl> 2.677283, 4.936604, 4.949482, 4.875870, 5.060024, 4.8…
## $ peak_max_index <dbl> 46, 108, 150, 203, 256, 304, 364, 53, 87, 144, 203, 2…
## $ peak_start_index <dbl> 1, 54, 125, 176, 229, 280, 352, 1, 59, 116, 167, 226,…
## $ peak_end_index <dbl> 54, 125, 176, 229, 280, 339, 443, 59, 116, 167, 226, …
## $ peak_trgh_slope <dbl> -0.016857731, -0.395336964, -0.336700987, -0.35746471…
## $ trgh_index <dbl> 54, 125, 176, 229, 280, 339, 443, 59, 116, 167, 226, …
# How many peaks in this dataset weren't matched to troughs? 2
length(which(is.na(peaks_trghs_df$peak_trgh_slope)))
## [1] 2
# Remove these for distribution calculations below
peaks_trghs_df2 <- peaks_trghs_df %>%
filter(!is.na(peak_trgh_slope)) %>%
droplevels()
# What is the distribution of peak - trough slopes? Can this be used to exclude smaller peaks?
range(peaks_trghs_df2$peak_trgh_slope, na.rm = TRUE)
## [1] -0.90275383 0.01155562
# hist(peaks_trghs_df2$peak_trgh_slope, breaks = 50)
# Split these slopes into 50 bins (this yields similar results to 100 bins), then discard the last two bins, which contain the shallowest slopes and thus the smallest peaks
levels(cut(peaks_trghs_df2$peak_trgh_slope, 50))
## [1] "(-0.904,-0.884]" "(-0.884,-0.866]" "(-0.866,-0.848]"
## [4] "(-0.848,-0.83]" "(-0.83,-0.811]" "(-0.811,-0.793]"
## [7] "(-0.793,-0.775]" "(-0.775,-0.756]" "(-0.756,-0.738]"
## [10] "(-0.738,-0.72]" "(-0.72,-0.702]" "(-0.702,-0.683]"
## [13] "(-0.683,-0.665]" "(-0.665,-0.647]" "(-0.647,-0.628]"
## [16] "(-0.628,-0.61]" "(-0.61,-0.592]" "(-0.592,-0.574]"
## [19] "(-0.574,-0.555]" "(-0.555,-0.537]" "(-0.537,-0.519]"
## [22] "(-0.519,-0.5]" "(-0.5,-0.482]" "(-0.482,-0.464]"
## [25] "(-0.464,-0.446]" "(-0.446,-0.427]" "(-0.427,-0.409]"
## [28] "(-0.409,-0.391]" "(-0.391,-0.372]" "(-0.372,-0.354]"
## [31] "(-0.354,-0.336]" "(-0.336,-0.318]" "(-0.318,-0.299]"
## [34] "(-0.299,-0.281]" "(-0.281,-0.263]" "(-0.263,-0.244]"
## [37] "(-0.244,-0.226]" "(-0.226,-0.208]" "(-0.208,-0.19]"
## [40] "(-0.19,-0.171]" "(-0.171,-0.153]" "(-0.153,-0.135]"
## [43] "(-0.135,-0.116]" "(-0.116,-0.0982]" "(-0.0982,-0.0799]"
## [46] "(-0.0799,-0.0616]" "(-0.0616,-0.0433]" "(-0.0433,-0.025]"
## [49] "(-0.025,-0.00673]" "(-0.00673,0.0125]"
# Add these bins back to the data frame, then drop peaks that fell into the last two intervals
peaks_trghs_df2$peak_size_class <- cut(peaks_trghs_df2$peak_trgh_slope, 50, labels = FALSE)
# Numbers of peaks by interval
# 160 peaks will be dropped after excluding peaks in the last two bins
table(peaks_trghs_df2$peak_size_class)
##
## 1 11 12 13 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
## 1 1 1 2 1 1 1 1 3 3 6 9 12 9 8 17 28 25 25 33 46 52 61 50 76 69
## 38 39 40 41 42 43 44 45 46 47 48 49 50
## 94 81 80 58 55 42 54 28 31 31 52 95 65
# Exclude peaks in the last 2 bins
peaks_trghs_df3 <- peaks_trghs_df2 %>%
dplyr::mutate(
peak_size_class = as.numeric(peak_size_class)
) %>%
filter(peak_size_class < 49) %>%
droplevels() %>%
# Remove the peak_size_class column
dplyr::select(-c(peak_size_class)) %>%
# Add back the peaks that were not assigned to troughs
bind_rows(
peaks_trghs_df %>%
filter(is.na(peak_trgh_slope)) %>%
droplevels()
)
# 1149 peaks remain, 160 were removed, looks great
nrow(peaks_trghs_df3)
## [1] 1149
nrow(peaks_trghs_df) - nrow(peaks_trghs_df3)
## [1] 160
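The size-class filter above can be illustrated on a handful of made-up slope values: `cut()` bins the slopes into equal-width classes, and dropping the rightmost (shallowest) classes removes the smallest peaks. The numbers here are hypothetical, with 5 bins standing in for the 50 used above.

```r
# Hypothetical peak-trough slopes (steep negative values = large peaks)
slopes <- c(-0.9, -0.5, -0.3, -0.1, -0.02, -0.005)
# Bin into 5 equal-width size classes
size_class <- cut(slopes, breaks = 5, labels = FALSE)
size_class
# Keep peaks outside the last (shallowest) bin
kept <- slopes[size_class < 5]
kept
```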
Obtained frequency modulation measurements and a standard set of spectral acoustic measurements for the calls set aside for the spatial comparison between ranges.
calls <- freq_modQ_df %>%
filter(question == "spatial") %>%
pull(sound.files) %>%
as.character()
length(calls)
## [1] 80
peaks_spat_df <- peaks_trghs_df3 %>%
filter(sound.files %in% calls) %>%
droplevels()
# glimpse(peaks_spat_df)
# 40 native range calls, 40 invasive range calls, each call represents a "unique" individual
# 4 randomly sampled calls from 10 randomly sampled sites per range
peaks_spat_df %>%
group_by(range) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 2 x 2
## range n_calls
## <fct> <int>
## 1 Native 40
## 2 Invasive 40
# How many years are represented in the invasive range? More calls from 2004, but some from each of 2011 and 2019
peaks_spat_df %>%
filter(range == "Invasive") %>%
group_by(year) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 3 x 2
## year n_calls
## <fct> <int>
## 1 2004 24
## 2 2011 8
## 3 2019 8
# Native spatial areas represented?
peaks_spat_df %>%
filter(range == "Native") %>%
group_by(region, dept_state) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 5 x 3
## # Groups: region [3]
## region dept_state n_calls
## <fct> <fct> <int>
## 1 West Colonia 12
## 2 East Maldonado 12
## 3 East San Jose 4
## 4 Central Montevideo 8
## 5 Central Canelones 4
# Which native range sites? None of these had known repeatedly sampled individuals
peaks_spat_df %>%
filter(range == "Native") %>%
group_by(site) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 10 x 2
## site n_calls
## <fct> <int>
## 1 INES-04 4
## 2 PLVE 4
## 3 QUEB 4
## 4 PAVO 4
## 5 PIED 4
## 6 FAGR 4
## 7 ELTE 4
## 8 INBR 4
## 9 INES-08 4
## 10 GOLF 4
# Invasive spatial areas represented?
# More representation from Texas, as expected
peaks_spat_df %>%
filter(range == "Invasive") %>%
group_by(region, dept_state) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 4 x 3
## # Groups: region [3]
## region dept_state n_calls
## <fct> <fct> <int>
## 1 Southwest Texas 24
## 2 Southeast Louisiana 4
## 3 Southeast Florida 4
## 4 Northeast Connecticut 8
# Which invasive range site-years?
# We recorded repeatedly sampled individuals at some of these sites
# But calls from repeatedly sampled individuals used in the final dataset are not included here (see below)
peaks_spat_df %>%
filter(range == "Invasive") %>%
group_by(site_year) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 10 x 2
## site_year n_calls
## <fct> <int>
## 1 SOCC_2004 4
## 2 BALL_2004 4
## 3 INTR_2011 4
## 4 VALL_2004 4
## 5 AUDU_2004 4
## 6 AIRP_2019 4
## 7 BAPT_2004 4
## 8 MANO_2019 4
## 9 SHAK_2004 4
## 10 SOFT_2011 4
# Were any calls from known repeatedly sampled individuals used in frequency tracing?
# Yes, but these 2 calls are from birds that were not retained as repeatedly sampled individuals in the final dataset (not enough calls remained after pre-processing)
unique(peaks_spat_df$sound.files[grep("site_scale", peaks_spat_df$sound.files)])
## [1] 2019_08_07_AIRP_GSV1040_8_site_scale.WAV
## [2] 2019_08_07_AIRP_GSV1043_2_site_scale.WAV
## 80 Levels: 2017_07_11_INES-04_MZ000149_1.WAV ...
Calculated frequency modulation measurements per call.
# Get the number of peaks per call
num_peaks_df <- peaks_spat_df %>%
# Find the number of peaks per call
group_by(range, sound.files) %>%
dplyr::summarise(
num_peaks = length(peak_number)
) %>%
ungroup() %>%
# Ensure levels of the range column are in the expected order
dplyr::mutate(
range = as.character(range),
range = factor(range, levels = c("Native", "Invasive"))
) %>%
dplyr::select(c(sound.files, num_peaks))
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
glimpse(num_peaks_df)
## Rows: 80
## Columns: 2
## $ sound.files <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES-04_MZ00…
## $ num_peaks <int> 6, 6, 5, 7, 7, 5, 7, 5, 7, 7, 6, 5, 6, 5, 5, 5, 6, 8, 6, 7…
# Modulation rate per call
modRate_df <- peaks_spat_df %>%
# Calculate duration per call
inner_join(
freq_mod_df %>%
dplyr::mutate(
duration = end - start
) %>%
dplyr::select(sound.files, duration),
by = "sound.files"
) %>%
# Calculate modulation rate value per call
group_by(range, sound.files, duration) %>%
dplyr::summarise(
num_peaks = length(peak_number)
) %>%
ungroup() %>%
dplyr::mutate(
modulation_rate = (num_peaks/duration) # units for rate of change = peaks/second
) %>%
# Ensure levels of the range column are in the expected order
dplyr::mutate(
range = as.character(range),
range = factor(range, levels = c("Native", "Invasive"))
) %>%
dplyr::select(c(sound.files, modulation_rate))
## `summarise()` has grouped output by 'range', 'sound.files'. You can override using the `.groups` argument.
glimpse(modRate_df)
## Rows: 80
## Columns: 2
## $ sound.files <chr> "2017_05_21_PLVE_1005_2.WAV", "2017_05_21_PLVE_1005_3.…
## $ modulation_rate <dbl> 32.02348, 42.26115, 38.95694, 32.40491, 33.43053, 32.5…
# Maximum peak - trough slope per call
peakTroughSlope_df <- peaks_spat_df %>%
# Drop peaks that were not matched to troughs
filter(!is.na(peak_trgh_slope)) %>%
# Ensure levels of the range column are in the expected order
dplyr::mutate(
range = as.character(range),
range = factor(range, levels = c("Native", "Invasive"))
) %>%
# Get the maximum (absolute) peak-trough slope per call, i.e. the most negative slope
group_by(range, sound.files) %>%
dplyr::summarise(
max_peak_trgh_slope = min(peak_trgh_slope)
) %>%
ungroup() %>%
dplyr::select(sound.files, max_peak_trgh_slope)
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
glimpse(peakTroughSlope_df)
## Rows: 80
## Columns: 2
## $ sound.files <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES…
## $ max_peak_trgh_slope <dbl> -0.4120342, -0.4661997, -0.2942515, -0.4826950, -0…
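The three frequency modulation measurements computed above reduce to simple per-call aggregations. A minimal base-R sketch on toy peak rows for two hypothetical calls "a" and "b" (all values invented for illustration):

```r
# One row per detected peak, as returned by peak_locator
toy <- data.frame(
  sound.files = c("a", "a", "a", "b", "b"),
  peak_trgh_slope = c(-0.4, -0.2, -0.5, -0.3, -0.1),
  duration = c(0.2, 0.2, 0.2, 0.25, 0.25)  # call duration in seconds
)
# Number of peaks per call
num_peaks <- tapply(toy$peak_trgh_slope, toy$sound.files, length)
# Modulation rate in peaks per second
mod_rate <- num_peaks / tapply(toy$duration, toy$sound.files, unique)
# Maximum (absolute) peak-trough slope, i.e. the most negative slope per call
max_slope <- tapply(toy$peak_trgh_slope, toy$sound.files, min)
```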
Combined these frequency modulation measurements with the 27 standard spectral acoustic measurements used in machine learning (see previous script, prior to filtering for collinearity).
acoustic_measuremnts <- read.csv(file.path(path, "acoustic_parameters.csv")) %>%
dplyr::select(-c(selec))
glimpse(acoustic_measuremnts)
## Rows: 1,596
## Columns: 28
## $ sound.files <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017_07_17_EM…
## $ duration <dbl> 0.2146363, 0.1810324, 0.2009257, 0.1878781, 0.1709179, 0.1…
## $ meanfreq <dbl> 4.261767, 4.207837, 4.300952, 4.449192, 4.737341, 4.649069…
## $ sd <dbl> 1.745325, 1.934379, 1.751721, 1.747474, 1.874080, 1.903127…
## $ freq.median <dbl> 4.106908, 3.744148, 4.157787, 4.316218, 4.539256, 4.525018…
## $ freq.Q25 <dbl> 3.114309, 2.799090, 2.933548, 3.283657, 3.263085, 3.113488…
## $ freq.Q75 <dbl> 4.992325, 5.065020, 5.018735, 5.242329, 6.113981, 5.967368…
## $ freq.IQR <dbl> 1.878015, 2.265930, 2.085187, 1.958673, 2.850895, 2.853880…
## $ time.median <dbl> 0.09390873, 0.09818989, 0.09419643, 0.08540507, 0.07958632…
## $ time.Q25 <dbl> 0.04829592, 0.05314866, 0.04844388, 0.04225304, 0.04250633…
## $ time.Q75 <dbl> 0.1466765, 0.1396278, 0.1462287, 0.1420421, 0.1175707, 0.1…
## $ time.IQR <dbl> 0.09838057, 0.08647917, 0.09778486, 0.09978908, 0.07506437…
## $ skew <dbl> 2.312270, 2.415746, 1.805732, 2.205839, 1.796657, 1.766370…
## $ kurt <dbl> 9.950427, 11.207737, 6.606223, 9.014162, 7.522156, 7.14385…
## $ sp.ent <dbl> 0.9416864, 0.9436562, 0.9395492, 0.9406724, 0.9496999, 0.9…
## $ time.ent <dbl> 0.8827616, 0.8870927, 0.8842926, 0.8860298, 0.8891064, 0.8…
## $ entropy <dbl> 0.8312847, 0.8371105, 0.8308364, 0.8334637, 0.8443843, 0.8…
## $ sfm <dbl> 0.6051624, 0.6396942, 0.5738982, 0.5991476, 0.6141410, 0.6…
## $ meandom <dbl> 3.728250, 3.458416, 3.490407, 3.845556, 4.110351, 3.755370…
## $ mindom <dbl> 2.275000, 0.875000, 0.875000, 2.275000, 2.391667, 0.525000…
## $ maxdom <dbl> 4.841667, 5.075000, 5.425000, 5.075000, 7.291667, 7.291667…
## $ dfrange <dbl> 2.566667, 4.200000, 4.550000, 2.800000, 4.900000, 6.766667…
## $ modindx <dbl> 16.863636, 7.194444, 11.307692, 18.333333, 12.928571, 9.91…
## $ startdom <dbl> 4.608333, 1.225000, 0.875000, 3.791667, 5.191667, 5.541667…
## $ enddom <dbl> 3.3250000, 3.2083333, 3.5583333, 3.5583333, 4.6083333, 1.2…
## $ dfslope <dbl> -5.9791080, 10.9556803, 13.3548542, -1.2419399, -3.4129451…
## $ meanpeakf <dbl> 4.141667, 3.325000, 4.258333, 4.375000, 4.608333, 2.508333…
## $ peakf <dbl> 4.14631312, 3.24191609, 4.05568720, 4.30036206, 3.12994163…
ameas_df <- acoustic_measuremnts %>%
# Add back the range column
inner_join(
nat_inv_est %>%
as_tibble() %>%
dplyr::select(range, sound.files),
by = "sound.files"
) %>%
# Filter by calls used in the frequency modulation analysis
filter(sound.files %in% peaks_spat_df$sound.files) %>%
# Join with the frequency modulation measurements
full_join(
num_peaks_df,
by = "sound.files"
) %>%
full_join(
modRate_df,
by = "sound.files"
) %>%
full_join(
peakTroughSlope_df,
by = "sound.files"
)
glimpse(ameas_df)
## Rows: 80
## Columns: 32
## $ sound.files <chr> "2017_05_21_PLVE_1005_2.WAV", "2017_05_21_PLVE_100…
## $ duration <dbl> 0.1561354, 0.1656367, 0.1796856, 0.1542976, 0.1794…
## $ meanfreq <dbl> 4.151223, 3.808729, 4.141827, 3.912671, 3.872825, …
## $ sd <dbl> 1.877118, 1.537948, 1.599546, 1.543469, 1.466119, …
## $ freq.median <dbl> 4.080633, 3.790128, 4.012443, 3.942792, 3.593443, …
## $ freq.Q25 <dbl> 2.709872, 2.914773, 2.932547, 2.762777, 2.935738, …
## $ freq.Q75 <dbl> 5.086285, 4.375710, 4.769483, 4.746758, 4.351475, …
## $ freq.IQR <dbl> 2.376413, 1.460938, 1.836935, 1.983982, 1.415738, …
## $ time.median <dbl> 0.06718847, 0.07332887, 0.08759500, 0.07807336, 0.…
## $ time.Q25 <dbl> 0.03359424, 0.04254885, 0.05327944, 0.03812885, 0.…
## $ time.Q75 <dbl> 0.1153099, 0.1086354, 0.1182984, 0.1152944, 0.1253…
## $ time.IQR <dbl> 0.08171571, 0.06608651, 0.06501897, 0.07716553, 0.…
## $ skew <dbl> 2.470123, 2.159739, 2.303599, 1.797583, 2.263534, …
## $ kurt <dbl> 11.029776, 7.635150, 8.583492, 6.169945, 8.442140,…
## $ sp.ent <dbl> 0.9507935, 0.9196218, 0.9271719, 0.9265415, 0.9097…
## $ time.ent <dbl> 0.8926133, 0.8900954, 0.8870764, 0.8930942, 0.8872…
## $ entropy <dbl> 0.8486909, 0.8185512, 0.8224723, 0.8274888, 0.8072…
## $ sfm <dbl> 0.6920527, 0.5146865, 0.5469261, 0.5295827, 0.4444…
## $ meandom <dbl> 3.647351, 3.381410, 3.461013, 3.669542, 3.383333, …
## $ mindom <dbl> 2.2750000, 0.5250000, 0.8750000, 1.9250000, 0.5250…
## $ maxdom <dbl> 4.958333, 4.491667, 5.075000, 5.191667, 4.375000, …
## $ dfrange <dbl> 2.683333, 3.966667, 4.200000, 3.266667, 3.850000, …
## $ modindx <dbl> 14.347826, 5.911765, 10.055556, 11.321429, 8.36363…
## $ startdom <dbl> 2.3916667, 2.7416667, 4.3750000, 2.2750000, 0.5250…
## $ enddom <dbl> 3.3250000, 0.5250000, 3.2083333, 4.7250000, 3.0916…
## $ dfslope <dbl> 5.9777166, -13.3826988, -6.4928236, 15.8784059, 14…
## $ meanpeakf <dbl> 4.258333, 4.025000, 4.025000, 4.608333, 3.208333, …
## $ peakf <dbl> 4.245433, 2.879236, 4.101161, 4.354276, 3.080760, …
## $ range <fct> Native, Native, Native, Native, Native, Native, Na…
## $ num_peaks <int> 5, 7, 7, 5, 6, 6, 5, 7, 6, 7, 6, 6, 6, 6, 6, 4, 5,…
## $ modulation_rate <dbl> 32.02348, 42.26115, 38.95694, 32.40491, 33.43053, …
## $ max_peak_trgh_slope <dbl> -0.2642049, -0.5134121, -0.4015570, -0.4108509, -0…
Assess normality per acoustic measurement per range.
vars <- names(ameas_df)[-grep("^range$|^sound.files$", names(ameas_df))]
# Shapiro-Wilk test to assess normality
shap_df <- ameas_df %>%
dplyr::select(-c(sound.files)) %>%
dplyr::mutate(
range = factor(range, levels = c("Native", "Invasive"))
) %>%
dplyr::group_by(range) %>%
rstatix::shapiro_test(vars = vars)
# Twelve acoustic measurements violate the assumption of normality for at least one range
shap_df %>%
filter(p <= 0.05) %>%
pull(variable) %>%
unique()
## [1] "dfrange" "enddom" "kurt"
## [4] "max_peak_trgh_slope" "maxdom" "meanpeakf"
## [7] "mindom" "num_peaks" "peakf"
## [10] "startdom" "time.ent" "dfslope"
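The same per-range check can be reproduced with base `stats::shapiro.test` by splitting one measurement's values by range; `rstatix::shapiro_test` wraps this base test. The toy data below (one roughly normal group, one right-skewed group) stand in for a real measurement:

```r
# Split toy values by a two-level grouping vector and test each group
set.seed(401)
toy_vals <- c(rnorm(200), rexp(200))  # "Native" normal, "Invasive" skewed
toy_range <- rep(c("Native", "Invasive"), each = 200)
p_by_range <- sapply(split(toy_vals, toy_range), function(v) shapiro.test(v)$p.value)
round(p_by_range, 4)
```

The skewed group should fail the test badly, while the normal group's p-value is some value well above zero.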
Assess homogeneity of variance per acoustic measurement between ranges.
vars <- names(ameas_df)[-grep("^range$|^sound.files$", names(ameas_df))]
# Levene's test to assess homogeneity of variance
lev_df <- ameas_df %>%
dplyr::select(-c(sound.files)) %>%
dplyr::mutate(
range = factor(range, levels = c("Native", "Invasive"))
) %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^range$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
group_by(measurement) %>%
rstatix::levene_test(values ~ range)
lev_df
## # A tibble: 30 x 5
## measurement df1 df2 statistic p
## <chr> <int> <int> <dbl> <dbl>
## 1 dfrange 1 78 1.30 0.257
## 2 dfslope 1 78 4.33 0.0408
## 3 duration 1 78 2.10 0.152
## 4 enddom 1 78 0.0367 0.849
## 5 entropy 1 78 1.69 0.198
## 6 freq.IQR 1 78 5.78 0.0186
## 7 freq.median 1 78 1.12 0.294
## 8 freq.Q25 1 78 0.432 0.513
## 9 freq.Q75 1 78 0.375 0.542
## 10 kurt 1 78 0.649 0.423
## # … with 20 more rows
# Five acoustic measurements violate the assumption of equal variance between ranges
lev_df %>%
filter(p <= 0.05) %>%
pull(measurement)
## [1] "dfslope" "freq.IQR" "meanpeakf" "mindom" "time.ent"
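`rstatix::levene_test` wraps `car::leveneTest`, which by default centers on the group medians (the Brown-Forsythe variant). As a cross-check, that statistic is just a one-way ANOVA on absolute deviations from the group medians; a minimal base-R sketch with simulated groups of clearly unequal spread (data invented for illustration):

```r
# Brown-Forsythe / Levene's test by hand: ANOVA on |value - group median|
levene_p <- function(values, group) {
  group <- factor(group)
  dev <- abs(values - ave(values, group, FUN = median))
  anova(lm(dev ~ group))[["Pr(>F)"]][1]
}
set.seed(401)
grp <- rep(c("Native", "Invasive"), each = 50)
uneq <- c(rnorm(50, sd = 1), rnorm(50, sd = 10))  # 10-fold difference in spread
levene_p(uneq, grp)
```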
Calculated Cliff’s delta as the effect size of range per acoustic measurement.
var_nms <- names(ameas_df)[-grep("^range$|^sound.files$", names(ameas_df))]
var_nms
# i <- 1
eff_res_df <- rbindlist(pblapply(1:length(var_nms), function(i){
tmp_df <- ameas_df %>%
dplyr::select(range, var_nms[i])
nat <- tmp_df %>%
filter(range == "Native") %>%
pull(var_nms[i])
inv <- tmp_df %>%
filter(range == "Invasive") %>%
pull(var_nms[i])
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat, inv)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study (Hess & Kromrey 2004)
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
return(data.frame(
measurement = var_nms[i],
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
))
}))
eff_res_df
write.csv(eff_res_df, file.path(path, "cliffs_delta_bootstrappedCIs.csv"), row.names = FALSE)
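Cliff's delta has a simple direct definition — the probability that a value from one group exceeds a value from the other, minus the reverse — which makes for a handy cross-check on `dmes(nat, inv)$dc` without the orddom package (assuming, as the comment above states, that `dc` compares all values of the second argument against all values of the first):

```r
# Cliff's delta from first principles: the mean sign of all pairwise
# differences y - x between two independent groups
cliffs_delta <- function(x, y) {
  mean(sign(outer(y, x, "-")))
}
cliffs_delta(c(1, 2, 3), c(2, 3, 4))  # 5/9: y tends to be larger than x
cliffs_delta(1:5, 1:5)                # 0: identical distributions
```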
Effect sizes ordered by absolute magnitude. Add median and interquartile range per acoustic measurement per range.
eff_res_df <- read.csv(file.path(path, "cliffs_delta_bootstrappedCIs.csv"))
glimpse(eff_res_df)
## Rows: 30
## Columns: 4
## $ measurement <chr> "duration", "meanfreq", "sd", "freq.median", "freq.Q25", "…
## $ cliff_delta <dbl> -0.291250, -0.056250, 0.781250, -0.188750, -0.255000, 0.19…
## $ lower_95_CI <dbl> -0.522500, -0.316250, 0.606250, -0.433750, -0.495000, -0.0…
## $ upper_95_CI <dbl> -0.027500, 0.198750, 0.891250, 0.075000, 0.006250, 0.43125…
# Make a table for appendix in order of largest absolute effect size
eff_res_df %>%
dplyr::mutate(
effect_size = round(cliff_delta, 2),
lower_95_CI = round(lower_95_CI, 2),
upper_95_CI = round(upper_95_CI, 2),
`95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
) %>%
arrange(desc(abs(effect_size))) %>%
dplyr::select(measurement, effect_size, `95_CI`) %>%
left_join(
ameas_df %>%
dplyr::select(-c(sound.files)) %>%
dplyr::mutate(
range = factor(range, levels = c("Native", "Invasive"))
) %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^range$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
group_by(measurement, range) %>%
dplyr::summarise(
median = round(median(values), 3),
IQR = round(IQR(values), 3)
) %>%
# Convert to long format again
pivot_longer(
cols = c(median, IQR),
names_to = "statistic",
values_to = "values"
) %>%
# Make wider to split out the mean and se columns
pivot_wider(
names_from = c(statistic, range),
values_from = values
) %>%
ungroup() %>%
dplyr::mutate(
median_IQR_NAT = paste(median_Native, IQR_Native, sep = "; "),
median_IQR_INV = paste(median_Invasive, IQR_Invasive, sep = "; ")
),
by = "measurement"
) %>%
rowid_to_column() %>%
dplyr::select(rowid, measurement, median_IQR_NAT, median_IQR_INV, effect_size, `95_CI`) %>%
kable()
| rowid | measurement | median_IQR_NAT | median_IQR_INV | effect_size | 95_CI |
|---|---|---|---|---|---|
| 1 | sd | 1.692; 0.161 | 1.864; 0.191 | 0.78 | (0.61, 0.89) |
| 2 | num_peaks | 6; 1 | 4.5; 1 | -0.69 | (-0.82, -0.49) |
| 3 | max_peak_trgh_slope | -0.411; 0.136 | -0.309; 0.063 | 0.69 | (0.48, 0.83) |
| 4 | modulation_rate | 32.833; 4.891 | 26.334; 8.826 | -0.63 | (-0.79, -0.41) |
| 5 | sfm | 0.583; 0.077 | 0.631; 0.082 | 0.58 | (0.35, 0.75) |
| 6 | entropy | 0.834; 0.009 | 0.841; 0.015 | 0.51 | (0.27, 0.69) |
| 7 | sp.ent | 0.938; 0.016 | 0.943; 0.015 | 0.45 | (0.2, 0.64) |
| 8 | freq.IQR | 2.026; 0.292 | 2.377; 0.638 | 0.44 | (0.19, 0.64) |
| 9 | time.Q75 | 0.121; 0.018 | 0.112; 0.013 | -0.34 | (-0.56, -0.08) |
| 10 | dfrange | 4.025; 1.75 | 4.492; 0.846 | 0.32 | (0.06, 0.55) |
| 11 | meanpeakf | 4.083; 0.467 | 3.383; 1.225 | -0.32 | (-0.56, -0.07) |
| 12 | duration | 0.171; 0.013 | 0.165; 0.024 | -0.29 | (-0.52, -0.03) |
| 13 | time.ent | 0.889; 0.002 | 0.89; 0.005 | 0.29 | (0.02, 0.52) |
| 14 | time.IQR | 0.075; 0.013 | 0.07; 0.011 | -0.28 | (-0.52, -0.02) |
| 15 | modindx | 10.339; 4.803 | 9.039; 3.485 | -0.27 | (-0.5, -0.01) |
| 16 | freq.Q25 | 2.914; 0.483 | 2.754; 0.5 | -0.26 | (-0.5, 0.01) |
| 17 | maxdom | 4.958; 0.379 | 5.133; 1.079 | 0.26 | (0, 0.49) |
| 18 | enddom | 3.033; 2.625 | 1.225; 2.158 | -0.26 | (-0.5, 0) |
| 19 | time.median | 0.082; 0.012 | 0.078; 0.011 | -0.25 | (-0.49, 0) |
| 20 | time.Q25 | 0.046; 0.01 | 0.044; 0.006 | -0.24 | (-0.47, 0.02) |
| 21 | peakf | 3.122; 1.37 | 2.871; 0.859 | -0.24 | (-0.48, 0.02) |
| 22 | meandom | 3.479; 0.361 | 3.352; 0.448 | -0.22 | (-0.46, 0.04) |
| 23 | freq.median | 3.961; 0.408 | 3.859; 0.469 | -0.19 | (-0.43, 0.07) |
| 24 | freq.Q75 | 4.962; 0.643 | 5.059; 0.665 | 0.19 | (-0.07, 0.43) |
| 25 | startdom | 2.217; 2.275 | 0.992; 1.575 | -0.17 | (-0.41, 0.09) |
| 26 | kurt | 8.009; 2.276 | 8.364; 3.673 | 0.16 | (-0.1, 0.41) |
| 27 | mindom | 0.758; 1.429 | 0.642; 0.35 | -0.16 | (-0.39, 0.1) |
| 28 | skew | 2.071; 0.342 | 2.054; 0.49 | 0.07 | (-0.19, 0.32) |
| 29 | meanfreq | 4.048; 0.413 | 4.044; 0.482 | -0.06 | (-0.32, 0.2) |
| 30 | dfslope | 2.003; 19.876 | 1.356; 5.679 | -0.04 | (-0.29, 0.23) |
# The 6 acoustic measurements with the largest effect sizes, in either direction
# sd is standard deviation of frequency, similar to freq.IQR
# sfm is spectral flatness, similar to spectral entropy
eff_res_df %>%
arrange(desc(abs(cliff_delta))) %>%
slice(1:6)
## measurement cliff_delta lower_95_CI upper_95_CI
## 1 sd 0.781250 0.60625 0.89125
## 2 num_peaks -0.693125 -0.82500 -0.49375
## 3 max_peak_trgh_slope 0.691250 0.47875 0.83375
## 4 modulation_rate -0.633750 -0.78750 -0.41250
## 5 sfm 0.576250 0.34750 0.74750
## 6 entropy 0.508750 0.26625 0.69375
topms <- eff_res_df %>%
arrange(desc(abs(cliff_delta))) %>%
slice(1:6) %>%
pull(measurement) %>%
as.character()
topms
## [1] "sd" "num_peaks" "max_peak_trgh_slope"
## [4] "modulation_rate" "sfm" "entropy"
A raincloud plot for acoustic measurements that displayed the largest effect sizes of range.
# Filter the measurements data frame by these variables, and rename these
ameas_df_tmp <- ameas_df %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^range$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
filter(measurement %in% topms) %>%
dplyr::mutate(
measurement_nm = measurement,
# Add new line symbols to all measurement names
measurement_nm = recode(
measurement_nm,
`sd` = "Standard\n deviation\n of frequency",
`num_peaks` = "Number of\n peaks",
`max_peak_trgh_slope` = "Peak - trough\n slope",
`modulation_rate` = "Modulation\n rate",
`sfm` = "Spectral\n flatness",
`entropy` = "Entropy\n"
),
measurement_nm = factor(measurement_nm)
)
ameas_df_tmp
cols <- scales::alpha(c("navy", "orange"), 0.75)
# Make a list of plots in order of effect sizes
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()
# i <- 1
invisible(pblapply(1:length(topms), function(i){
tmp_df <- ameas_df_tmp %>%
filter(measurement == topms[i]) %>%
dplyr::mutate(
measurement_nm = as.character(measurement_nm),
measurement_nm = factor(measurement_nm, levels = unique(measurement_nm))
)
# Get ymin and ymax values for setting the y-axis scale
ymin <- min(tmp_df$values)
ymax <- max(tmp_df$values)
# Use a fifth of the difference between max and min as a y-axis buffer
buf <- (ymax - ymin)/5
# If on the first plot, add the y-axis label
if(i == 1){
yal <- "Values"
} else{
yal <- ""
}
# Initialize plot margins (top, right, bottom, left)
tm <- 1
rm <- 0
bm <- 1
lm <- 0
# If on the first plot, make the left margin larger
if(i == 1){
lm <- 0.5
# If on last plot, make the right margin larger
} else if(i == length(topms)){
rm <- 0.5
}
# Raincloud plot: Kernel smoothed with boxplot
gg <- ggplot(tmp_df, aes(x = range, y = values, fill = range, color = range)) +
geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.25)+
geom_point(position = position_jitter(width = 0.05), size = 1, stroke = 0.5, alpha = 0.45) +
geom_boxplot(aes(x = as.numeric(range) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.2, colour = "black", size = 0.25) +
facet_wrap(~ measurement_nm) +
scale_fill_manual(values = cols) +
scale_color_manual(values = cols) +
guides(fill = FALSE, color = FALSE) +
theme_bw() +
xlab("") + ylab(yal)
# gg
# For entropy, set the breaks differently to avoid a gap between 0.86 and 0.88
if(topms[i] != "entropy"){
gg <- gg +
scale_y_continuous(limits = round(c(ymin - buf, ymax + buf), 2), breaks = round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2))
} else {
brks <- round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2)
brks <- c(brks[-length(brks)], 0.87)
gg <- gg +
scale_y_continuous(limits = round(c(ymin - buf, 0.87), 2), breaks = brks)
}
# Add final aesthetics
gg <- gg +
theme(
axis.title = element_text(size = 14),
axis.text.y = element_text(size = 12),
axis.text.x = element_text(size = 10),
strip.text = element_text(size = 13, margin = margin(0.5, 2, 0.5, 2, "lines")),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks = element_line(size = 0.25),
plot.margin = unit(c(tm, rm, bm, lm), "lines")
)
# gg
# Return the given plot
gg_list[[i]] <<- gg
}))
# Get the legend
cols <- scales::alpha(c("navy", "orange"), 0.85)
gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
ameas_df_tmp %>%
ggplot(aes(x = range, y = values)) +
geom_errorbar(aes(ymin = values, ymax = values, color = range), size = 2, width = 0.25) +
scale_color_manual(values = cols) +
guides(color = guide_legend(title = "", override.aes = list(size = 4))) +
theme_bw() +
theme(
legend.text = element_text(size = 20),
legend.position = "top",
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-10, -10, -10, -10),
legend.key.width = unit(3, "lines")
)
)), "guide-box")
dev.off()
# Write out the legend separately
jpeg(file.path(gpath, "Figure3_AcousticMeasrmnts_leg.jpeg"), units = "in", width = 4, height = 1, res = 300)
grid.draw(gg_leg)
dev.off()
# Arrange plots into a single image file
jpeg(file.path(gpath, "Figure3B_AcousticMeasrmnts.jpeg"), units = "in", width = 11, height = 3.85, res = 300)
ggarrange(
as.ggplot(gg_list[[1]]),
as.ggplot(gg_list[[2]]),
as.ggplot(gg_list[[3]]),
as.ggplot(gg_list[[4]]),
as.ggplot(gg_list[[5]]),
as.ggplot(gg_list[[6]]),
nrow = 1,
widths = rep(2.2, 6)
)
dev.off()
See the main body of the article for this figure.
Frequency quartiles indicate the distribution of energy across the spectrum. Here the quartiles are the 3 frequencies that divide each call’s spectrum into 4 intervals of equal energy. The 25% quartile is the frequency below which 25% of the signal’s total energy lies; likewise, the 75% quartile is the frequency below which 75% of the total energy lies.
The interquartile range (IQR) is the difference between these quartiles, a measure of how widely the total energy of the signal is spread across frequency. The higher IQR in the invasive range indicates that energy is more spread out across frequencies than in native range calls.
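These definitions can be illustrated with a toy power spectrum (simulated here, not the study's data):

```r
# Toy spectrum: locate frequency quartiles by cumulative energy
freqs <- seq(0, 10, length.out = 1000)    # frequency (kHz)
power <- dnorm(freqs, mean = 4, sd = 1)   # hypothetical power spectrum
cum_energy <- cumsum(power) / sum(power)
q25 <- freqs[which(cum_energy >= 0.25)[1]] # 25% of energy lies below this frequency
q75 <- freqs[which(cum_energy >= 0.75)[1]]
freq_IQR <- q75 - q25                      # wider IQR = energy more spread out
```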
For spectral flatness and spectral entropy, values closer to 1 indicate noisier signals. Lower spectral flatness and entropy values represent more ordered and informative signals, which ties the lower flatness and spectral entropy of native range calls to their higher levels of frequency modulation.
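Spectral flatness can be illustrated as the ratio of the geometric to the arithmetic mean of the power spectrum (a toy sketch for intuition, not the measurement code used in this analysis):

```r
# Toy sketch: spectral flatness (SFM) as geometric/arithmetic mean ratio
sfm <- function(spec) exp(mean(log(spec))) / mean(spec)
tone_like  <- c(rep(1e-4, 99), 1) # energy concentrated at one frequency
noise_like <- rep(0.5, 100)       # energy spread evenly across frequencies
sfm(tone_like)  # near 0: ordered, tonal signal
sfm(noise_like) # exactly 1: flat, noise-like signal
```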
High background noise can affect frequency quantile measurements and spectral entropy[1], but we pre-processed calls using a strict SNR threshold prior to these analyses, so we did not expect differences in these parameters to arise from differences in background noise between ranges.
Get the invasive range calls set aside for a temporal comparison, as well as native range calls.
calls <- freq_modQ_df %>%
filter(question == "temporal" | range == "Native") %>%
filter(question != "indiv_scale") %>% # exclude repeatedly sampled individual calls
pull(sound.files) %>%
as.character()
length(calls)
## [1] 115
peaks_temp_df <- peaks_trghs_df3 %>%
filter(sound.files %in% calls) %>%
droplevels()
# glimpse(peaks_temp_df)
# 40 native range calls, 75 invasive range calls
peaks_temp_df %>%
group_by(range) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 2 x 2
## range n_calls
## <fct> <int>
## 1 Native 40
## 2 Invasive 75
# How many years are represented in the invasive range? 25 calls from 2004, 30 from 2011, and 20 from 2019
peaks_temp_df %>%
filter(range == "Invasive") %>%
group_by(year) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 3 x 2
## year n_calls
## <fct> <int>
## 1 2004 25
## 2 2011 30
## 3 2019 20
# Invasive spatial areas represented?
# Louisiana and Texas, as expected (New Orleans, Austin)
peaks_temp_df %>%
filter(range == "Invasive") %>%
group_by(region, dept_state) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 2 x 3
## # Groups: region [2]
## region dept_state n_calls
## <fct> <fct> <int>
## 1 Southwest Texas 50
## 2 Southeast Louisiana 25
# Which invasive range site-years?
# We recorded repeatedly sampled individuals at some of these sites
# But calls from repeatedly sampled individuals used in the final dataset are not included here (see below)
peaks_temp_df %>%
filter(range == "Invasive") %>%
group_by(site_year) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## # A tibble: 15 x 2
## site_year n_calls
## <fct> <int>
## 1 SOCC_2004 5
## 2 BALL_2004 5
## 3 INTR_2011 5
## 4 VALL_2004 5
## 5 CANA_2004 5
## 6 ELEM_2011 5
## 7 ELEM_2019 5
## 8 FOLS_2004 5
## 9 INTR_2019 5
## 10 LAKE_2011 5
## 11 MART_2011 5
## 12 MART_2019 5
## 13 ROBE_2011 5
## 14 SOCC_2019 5
## 15 VALL_2011 5
# City-years
peaks_temp_df %>%
filter(range == "Invasive") %>%
group_by(year, invasive_city) %>%
dplyr::summarise(
n_calls = n_distinct(sound.files)
)
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
## # A tibble: 5 x 3
## # Groups: year [3]
## year invasive_city n_calls
## <fct> <fct> <int>
## 1 2004 Austin 10
## 2 2004 New Orleans 15
## 3 2011 Austin 20
## 4 2011 New Orleans 10
## 5 2019 Austin 20
# Were any calls from known repeatedly sampled individuals used in frequency tracing?
# Yes, but these 2 calls are also from birds that were not retained as repeatedly sampled individuals in the final dataset (not enough calls remained after pre-processing)
unique(peaks_temp_df$sound.files[grep("site_scale", peaks_temp_df$sound.files)])
## [1] 2011_02_15_ELEM_WRIG1013_17_site_scale.WAV
## [2] 2011_02_15_ELEM_WRIG1013_10_site_scale.WAV
## 115 Levels: 2017_07_11_INES-04_MZ000149_1.WAV ...
# These two calls were from INV-UM3 and INV-UM4, not present in calls for the repeatedly sampled individuals
# nat_inv_est[grep(paste(peaks_temp_df$sound.files[grep("site_scale", peaks_temp_df$sound.files)], collapse= "|"), nat_inv_est$sound.files), ]
# nat_inv_est %>%
# filter(social_scale == "Individual") %>%
# pull(Bird_ID) %>%
# unique()
Calculate frequency modulation measurements per call.
# Get the number of peaks per call
num_peaks_tdf <- peaks_temp_df %>%
# Create a range-year column
dplyr::mutate(
range_year = paste(range, year, sep = "_")
) %>%
# Find the number of peaks per call
group_by(range_year, sound.files) %>%
dplyr::summarise(
num_peaks = length(peak_number)
) %>%
ungroup() %>%
dplyr::select(c(sound.files, num_peaks))
## `summarise()` has grouped output by 'range_year'. You can override using the `.groups` argument.
glimpse(num_peaks_tdf)
## Rows: 115
## Columns: 2
## $ sound.files <fct> SBD_NewOrleans.1013.0143_resamp.WAV, SBD_Austin.1001.0192_…
## $ num_peaks <int> 5, 4, 4, 3, 3, 4, 5, 4, 5, 3, 6, 5, 5, 5, 5, 5, 5, 5, 4, 4…
# Modulation rate per call
modRate_tdf <- peaks_temp_df %>%
# Create a range-year column
dplyr::mutate(
range_year = paste(range, year, sep = "_")
) %>%
# Calculate duration per call
inner_join(
freq_mod_df %>%
dplyr::mutate(
duration = end - start
) %>%
dplyr::select(sound.files, duration),
by = "sound.files"
) %>%
# Calculate modulation rate value per call
group_by(range_year, sound.files, duration) %>%
dplyr::summarise(
num_peaks = length(peak_number)
) %>%
ungroup() %>%
dplyr::mutate(
modulation_rate = (num_peaks/duration) # units for rate of change = peaks/second
) %>%
dplyr::select(c(sound.files, modulation_rate))
## `summarise()` has grouped output by 'range_year', 'sound.files'. You can override using the `.groups` argument.
glimpse(modRate_tdf)
## Rows: 115
## Columns: 2
## $ sound.files <chr> "SBD_Austin.1001.0192_resamp.WAV", "SBD_Austin.1001.02…
## $ modulation_rate <dbl> 27.81052, 25.32145, 19.28747, 16.71678, 21.47669, 27.2…
# Maximum peak - trough slope per call
peakTroughSlope_tdf <- peaks_temp_df %>%
# Create a range-year column
dplyr::mutate(
range_year = paste(range, year, sep = "_")
) %>%
# Drop peaks that were not matched to troughs
filter(!is.na(peak_trgh_slope)) %>%
# Get the maximum (absolute) peak-trough slope per call; slopes are negative, so the minimum is the steepest
group_by(range, sound.files) %>%
dplyr::summarise(
max_peak_trgh_slope = min(peak_trgh_slope)
) %>%
ungroup() %>%
dplyr::select(sound.files, max_peak_trgh_slope)
## `summarise()` has grouped output by 'range'. You can override using the `.groups` argument.
glimpse(peakTroughSlope_tdf)
## Rows: 115
## Columns: 2
## $ sound.files <fct> 2017_07_11_INES-04_MZ000149_1.WAV, 2017_07_11_INES…
## $ max_peak_trgh_slope <dbl> -0.4120342, -0.4661997, -0.2942515, -0.4826950, -0…
Combine these frequency modulation measurements with the 27 standard acoustic measurements used in machine learning (see the previous script, prior to filtering for collinearity): 115 calls total across years.
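Column names like `meanfreq`, `freq.Q25`, `sp.ent`, `sfm`, and `modindx` are the type produced by warbleR's spectral analysis. A hedged sketch of how such a table could be generated from an extended selection table (the argument values shown are illustrative placeholders, not the study's actual settings):

```r
# Hedged sketch only, not the original measurement code:
# warbleR::specan computes standard spectral parameters per selection.
# bp (bandpass, kHz) and wl (window length) below are placeholders.
# sp <- specan(X = nat_inv_est, bp = c(0.5, 9), wl = 512, parallel = cores)
```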
acoustic_measuremnts <- read.csv(file.path(path, "acoustic_parameters.csv")) %>%
dplyr::select(-c(selec))
glimpse(acoustic_measuremnts)
## Rows: 1,596
## Columns: 28
## $ sound.files <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017_07_17_EM…
## $ duration <dbl> 0.2146363, 0.1810324, 0.2009257, 0.1878781, 0.1709179, 0.1…
## $ meanfreq <dbl> 4.261767, 4.207837, 4.300952, 4.449192, 4.737341, 4.649069…
## $ sd <dbl> 1.745325, 1.934379, 1.751721, 1.747474, 1.874080, 1.903127…
## $ freq.median <dbl> 4.106908, 3.744148, 4.157787, 4.316218, 4.539256, 4.525018…
## $ freq.Q25 <dbl> 3.114309, 2.799090, 2.933548, 3.283657, 3.263085, 3.113488…
## $ freq.Q75 <dbl> 4.992325, 5.065020, 5.018735, 5.242329, 6.113981, 5.967368…
## $ freq.IQR <dbl> 1.878015, 2.265930, 2.085187, 1.958673, 2.850895, 2.853880…
## $ time.median <dbl> 0.09390873, 0.09818989, 0.09419643, 0.08540507, 0.07958632…
## $ time.Q25 <dbl> 0.04829592, 0.05314866, 0.04844388, 0.04225304, 0.04250633…
## $ time.Q75 <dbl> 0.1466765, 0.1396278, 0.1462287, 0.1420421, 0.1175707, 0.1…
## $ time.IQR <dbl> 0.09838057, 0.08647917, 0.09778486, 0.09978908, 0.07506437…
## $ skew <dbl> 2.312270, 2.415746, 1.805732, 2.205839, 1.796657, 1.766370…
## $ kurt <dbl> 9.950427, 11.207737, 6.606223, 9.014162, 7.522156, 7.14385…
## $ sp.ent <dbl> 0.9416864, 0.9436562, 0.9395492, 0.9406724, 0.9496999, 0.9…
## $ time.ent <dbl> 0.8827616, 0.8870927, 0.8842926, 0.8860298, 0.8891064, 0.8…
## $ entropy <dbl> 0.8312847, 0.8371105, 0.8308364, 0.8334637, 0.8443843, 0.8…
## $ sfm <dbl> 0.6051624, 0.6396942, 0.5738982, 0.5991476, 0.6141410, 0.6…
## $ meandom <dbl> 3.728250, 3.458416, 3.490407, 3.845556, 4.110351, 3.755370…
## $ mindom <dbl> 2.275000, 0.875000, 0.875000, 2.275000, 2.391667, 0.525000…
## $ maxdom <dbl> 4.841667, 5.075000, 5.425000, 5.075000, 7.291667, 7.291667…
## $ dfrange <dbl> 2.566667, 4.200000, 4.550000, 2.800000, 4.900000, 6.766667…
## $ modindx <dbl> 16.863636, 7.194444, 11.307692, 18.333333, 12.928571, 9.91…
## $ startdom <dbl> 4.608333, 1.225000, 0.875000, 3.791667, 5.191667, 5.541667…
## $ enddom <dbl> 3.3250000, 3.2083333, 3.5583333, 3.5583333, 4.6083333, 1.2…
## $ dfslope <dbl> -5.9791080, 10.9556803, 13.3548542, -1.2419399, -3.4129451…
## $ meanpeakf <dbl> 4.141667, 3.325000, 4.258333, 4.375000, 4.608333, 2.508333…
## $ peakf <dbl> 4.14631312, 3.24191609, 4.05568720, 4.30036206, 3.12994163…
ameast_df <- acoustic_measuremnts %>%
# Add back the range column
inner_join(
nat_inv_est %>%
as_tibble() %>%
dplyr::select(range, year, sound.files),
by = "sound.files"
) %>%
# Create a range-year column
dplyr::mutate(
range_year = paste(range, year, sep = "_")
) %>%
dplyr::select(-c(year)) %>%
# Filter by calls used in the frequency modulation analysis
filter(sound.files %in% peaks_temp_df$sound.files) %>%
# Join with the frequency modulation measurements
full_join(
num_peaks_tdf,
by = "sound.files"
) %>%
full_join(
modRate_tdf,
by = "sound.files"
) %>%
full_join(
peakTroughSlope_tdf,
by = "sound.files"
)
glimpse(ameast_df)
## Rows: 115
## Columns: 33
## $ sound.files <chr> "2017_05_21_PLVE_1005_2.WAV", "2017_05_21_PLVE_100…
## $ duration <dbl> 0.1561354, 0.1656367, 0.1796856, 0.1542976, 0.1794…
## $ meanfreq <dbl> 4.151223, 3.808729, 4.141827, 3.912671, 3.872825, …
## $ sd <dbl> 1.877118, 1.537948, 1.599546, 1.543469, 1.466119, …
## $ freq.median <dbl> 4.080633, 3.790128, 4.012443, 3.942792, 3.593443, …
## $ freq.Q25 <dbl> 2.709872, 2.914773, 2.932547, 2.762777, 2.935738, …
## $ freq.Q75 <dbl> 5.086285, 4.375710, 4.769483, 4.746758, 4.351475, …
## $ freq.IQR <dbl> 2.376413, 1.460938, 1.836935, 1.983982, 1.415738, …
## $ time.median <dbl> 0.06718847, 0.07332887, 0.08759500, 0.07807336, 0.…
## $ time.Q25 <dbl> 0.03359424, 0.04254885, 0.05327944, 0.03812885, 0.…
## $ time.Q75 <dbl> 0.1153099, 0.1086354, 0.1182984, 0.1152944, 0.1253…
## $ time.IQR <dbl> 0.08171571, 0.06608651, 0.06501897, 0.07716553, 0.…
## $ skew <dbl> 2.470123, 2.159739, 2.303599, 1.797583, 2.263534, …
## $ kurt <dbl> 11.029776, 7.635150, 8.583492, 6.169945, 8.442140,…
## $ sp.ent <dbl> 0.9507935, 0.9196218, 0.9271719, 0.9265415, 0.9097…
## $ time.ent <dbl> 0.8926133, 0.8900954, 0.8870764, 0.8930942, 0.8872…
## $ entropy <dbl> 0.8486909, 0.8185512, 0.8224723, 0.8274888, 0.8072…
## $ sfm <dbl> 0.6920527, 0.5146865, 0.5469261, 0.5295827, 0.4444…
## $ meandom <dbl> 3.647351, 3.381410, 3.461013, 3.669542, 3.383333, …
## $ mindom <dbl> 2.2750000, 0.5250000, 0.8750000, 1.9250000, 0.5250…
## $ maxdom <dbl> 4.958333, 4.491667, 5.075000, 5.191667, 4.375000, …
## $ dfrange <dbl> 2.683333, 3.966667, 4.200000, 3.266667, 3.850000, …
## $ modindx <dbl> 14.347826, 5.911765, 10.055556, 11.321429, 8.36363…
## $ startdom <dbl> 2.3916667, 2.7416667, 4.3750000, 2.2750000, 0.5250…
## $ enddom <dbl> 3.3250000, 0.5250000, 3.2083333, 4.7250000, 3.0916…
## $ dfslope <dbl> 5.9777166, -13.3826988, -6.4928236, 15.8784059, 14…
## $ meanpeakf <dbl> 4.258333, 4.025000, 4.025000, 4.608333, 3.208333, …
## $ peakf <dbl> 4.245433, 2.879236, 4.101161, 4.354276, 3.080760, …
## $ range <fct> Native, Native, Native, Native, Native, Native, Na…
## $ range_year <chr> "Native_2017", "Native_2017", "Native_2017", "Nati…
## $ num_peaks <int> 5, 7, 7, 5, 6, 6, 5, 7, 6, 7, 6, 6, 6, 6, 6, 4, 5,…
## $ modulation_rate <dbl> 32.02348, 42.26115, 38.95694, 32.40491, 33.43053, …
## $ max_peak_trgh_slope <dbl> -0.2642049, -0.5134121, -0.4015570, -0.4108509, -0…
I did not assess normality per acoustic measurement per range-year, since I proceeded with just summary statistics for the temporal analysis.
Calculate median and interquartile range per acoustic measurement per range-year.
ameast_ss <- ameast_df %>%
dplyr::select(-c(sound.files)) %>%
dplyr::mutate(
range = factor(range, levels = c("Native", "Invasive"))
) %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^range$|^range_year$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
group_by(measurement, range_year) %>%
dplyr::summarise(
median = round(median(values), 2),
IQR = round(IQR(values), 2)
) %>%
# Convert to long format again
pivot_longer(
cols = c(median, IQR),
names_to = "statistic",
values_to = "values"
) %>%
# Make wider to split out the median and IQR columns
pivot_wider(
names_from = c(statistic, range_year),
values_from = values
) %>%
ungroup()
## `summarise()` has grouped output by 'measurement'. You can override using the `.groups` argument.
ameast_ss %>%
kable()
| measurement | median_Invasive_2004 | IQR_Invasive_2004 | median_Invasive_2011 | IQR_Invasive_2011 | median_Invasive_2019 | IQR_Invasive_2019 | median_Native_2017 | IQR_Native_2017 |
|---|---|---|---|---|---|---|---|---|
| dfrange | 4.08 | 0.58 | 4.90 | 1.22 | 4.49 | 0.55 | 4.03 | 1.75 |
| dfslope | 8.47 | 16.86 | 0.00 | 15.03 | -1.50 | 6.07 | 2.00 | 19.88 |
| duration | 0.16 | 0.02 | 0.17 | 0.02 | 0.16 | 0.01 | 0.17 | 0.01 |
| enddom | 2.51 | 2.92 | 0.99 | 1.98 | 0.82 | 0.47 | 3.03 | 2.62 |
| entropy | 0.85 | 0.01 | 0.84 | 0.01 | 0.84 | 0.01 | 0.83 | 0.01 |
| freq.IQR | 2.26 | 0.48 | 2.52 | 0.60 | 2.10 | 0.53 | 2.03 | 0.29 |
| freq.median | 3.71 | 0.22 | 4.00 | 0.47 | 3.99 | 0.31 | 3.96 | 0.41 |
| freq.Q25 | 2.67 | 0.31 | 2.85 | 0.46 | 2.79 | 0.51 | 2.91 | 0.48 |
| freq.Q75 | 4.95 | 0.50 | 5.28 | 0.70 | 4.99 | 0.44 | 4.96 | 0.64 |
| kurt | 6.89 | 5.86 | 7.10 | 4.13 | 7.87 | 2.64 | 8.01 | 2.28 |
| max_peak_trgh_slope | -0.32 | 0.09 | -0.30 | 0.15 | -0.31 | 0.07 | -0.41 | 0.14 |
| maxdom | 4.72 | 0.58 | 5.60 | 1.46 | 5.13 | 0.35 | 4.96 | 0.38 |
| meandom | 3.28 | 0.34 | 3.44 | 0.66 | 3.50 | 0.29 | 3.48 | 0.36 |
| meanfreq | 3.96 | 0.35 | 4.15 | 0.45 | 3.98 | 0.34 | 4.05 | 0.41 |
| meanpeakf | 3.21 | 1.28 | 3.67 | 1.25 | 4.38 | 0.96 | 4.08 | 0.47 |
| mindom | 0.64 | 0.35 | 0.52 | 0.35 | 0.52 | 0.23 | 0.76 | 1.43 |
| modindx | 8.94 | 3.56 | 9.18 | 3.93 | 8.76 | 2.97 | 10.34 | 4.80 |
| modulation_rate | 27.52 | 6.10 | 24.87 | 6.33 | 24.90 | 7.89 | 32.83 | 4.89 |
| num_peaks | 5.00 | 1.00 | 4.00 | 1.00 | 4.00 | 1.25 | 6.00 | 1.00 |
| peakf | 2.86 | 0.84 | 2.69 | 1.62 | 3.35 | 2.42 | 3.12 | 1.37 |
| sd | 1.87 | 0.13 | 1.88 | 0.20 | 1.83 | 0.19 | 1.69 | 0.16 |
| sfm | 0.66 | 0.05 | 0.64 | 0.08 | 0.65 | 0.08 | 0.58 | 0.08 |
| skew | 1.79 | 0.70 | 1.89 | 0.59 | 2.00 | 0.46 | 2.07 | 0.34 |
| sp.ent | 0.95 | 0.01 | 0.95 | 0.01 | 0.95 | 0.01 | 0.94 | 0.02 |
| startdom | 0.99 | 1.52 | 0.99 | 2.86 | 1.11 | 0.85 | 2.22 | 2.28 |
| time.ent | 0.89 | 0.01 | 0.89 | 0.00 | 0.89 | 0.00 | 0.89 | 0.00 |
| time.IQR | 0.07 | 0.01 | 0.08 | 0.01 | 0.07 | 0.01 | 0.08 | 0.01 |
| time.median | 0.07 | 0.01 | 0.08 | 0.02 | 0.07 | 0.01 | 0.08 | 0.01 |
| time.Q25 | 0.04 | 0.01 | 0.04 | 0.01 | 0.04 | 0.01 | 0.05 | 0.01 |
| time.Q75 | 0.11 | 0.01 | 0.12 | 0.02 | 0.11 | 0.01 | 0.12 | 0.02 |
A raincloud plot for acoustic measurements that displayed the largest effect sizes of range, shown here for range-years in the temporal analysis.
# Filter the measurements data frame by these variables, and rename these
ameast_df_tmp <- ameast_df %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^range$|^range_year$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
filter(measurement %in% topms) %>%
dplyr::mutate(
measurement_nm = measurement,
# Add new line symbols to all measurement names
measurement_nm = recode(
measurement_nm,
`sd` = "Standard\n deviation\n of frequency",
`num_peaks` = "Number of\n peaks",
`max_peak_trgh_slope` = "Peak - trough\n slope",
`modulation_rate` = "Modulation\n rate",
`sfm` = "Spectral\n flatness",
`entropy` = "Entropy"
),
measurement_nm = factor(measurement_nm),
range_year = factor(range_year, levels = c("Native_2017", "Invasive_2004", "Invasive_2011", "Invasive_2019"))
)
ameast_df_tmp
cols <- scales::alpha(c("navy", rep("orange", 3)), 0.75)
# Make a list of plots in order of effect sizes
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()
# i <- 2
invisible(pblapply(1:length(topms), function(i){
tmp_df <- ameast_df_tmp %>%
filter(measurement == topms[i]) %>%
dplyr::mutate(
measurement_nm = as.character(measurement_nm),
measurement_nm = factor(measurement_nm, levels = unique(measurement_nm))
)
# Get ymin and ymax values for setting the y-axis scale
ymin <- min(tmp_df$values)
ymax <- max(tmp_df$values)
# Use a fifth of the difference between max and min as a y-axis buffer
buf <- (ymax - ymin)/5
# If on the first plot, add the y-axis label
if(i == 1){
yal <- "Values"
} else{
yal <- ""
}
# Initialize plot margins (top, right, bottom, left)
tm <- 1
rm <- 0
bm <- 1
lm <- 0
# If on the first plot, make the left margin larger
if(i == 1){
lm <- 0.5
# If on last plot, make the right margin larger
} else if(i == length(topms)){
rm <- 0.5
}
# Raincloud plot: Kernel smoothed with boxplot
gg <- ggplot(tmp_df, aes(x = range_year, y = values, fill = range_year, color = range_year)) +
geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.15) +
geom_point(position = position_jitter(width = 0.05), size = 0.75, stroke = 0.5, alpha = 0.45) +
geom_boxplot(aes(x = as.numeric(range_year) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.15, colour = "black", size = 0.25) +
facet_wrap(~ measurement_nm) +
scale_fill_manual(values = cols) +
scale_color_manual(values = cols) +
guides(fill = FALSE, color = FALSE) +
theme_bw() +
xlab("") + ylab(yal) +
scale_y_continuous(limits = round(c(ymin - buf, ymax + buf), 2), breaks = round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2)) +
theme(
axis.title = element_text(size = 14),
axis.text.y = element_text(size = 14),
axis.text.x = element_text(size = 14, angle = 35, hjust = 1),
strip.text = element_text(size = 16, margin = margin(0.5, 2, 0.5, 2, "lines")),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks = element_line(size = 0.25),
plot.margin = unit(c(tm, rm, bm, lm), "lines")
)
# gg
# Return the given plot
gg_list[[i]] <<- gg
}))
# Get the legend
cols <- scales::alpha(c("navy", "orange"), 0.85)
gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
ameast_df_tmp %>%
ggplot(aes(x = range, y = values)) +
geom_errorbar(aes(ymin = values, ymax = values, color = range), size = 2, width = 0.25) +
scale_color_manual(values = cols) +
guides(color = guide_legend(title = "", override.aes = list(size = 2.5))) +
theme_bw() +
theme(
legend.text = element_text(size = 16),
legend.position = "top",
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-10, -10, -10, -10),
legend.key.width = unit(3, "lines")
)
)), "guide-box")
dev.off()
# Arrange plots into a single image file
jpeg(file.path(gpath, "SupplementaryFigure2_TemporalAcousticMeasrmnts.jpeg"), units = "in", width = 11, height = 11, res = 300)
ggarrange(
as.ggplot(gg_leg),
as.ggplot(ggarrange(
as.ggplot(gg_list[[1]]),
as.ggplot(gg_list[[2]]),
as.ggplot(gg_list[[3]]),
as.ggplot(gg_list[[4]]),
as.ggplot(gg_list[[5]]),
as.ggplot(gg_list[[6]]),
nrow = 2,
heights = rep(4, 2)
)),
nrow = 2,
heights = c(1, 10)
)
dev.off()
See the appendix of the associated article for this figure, which shows that structural differentiation among years in the invasive range was not comparable to the level of structural differentiation we reported between ranges for these same acoustic measurements.
We performed analyses to address whether habitat differences between ranges influenced the structure of contact calls. Native range birds were largely recorded in agricultural settings, although we did record birds in the city of Montevideo (living in large green spaces). In the invasive range, birds were largely recorded in cities (although often near green spaces as well).
Four sites were recorded in the city of Montevideo: Bodegas Carrau (BCAR, a vineyard on the edge of Montevideo next to a busy road), la Facultad de Agronomia (FAGR, a department of la Universidad de la Republica with many trees and gardens), CEME (el Cementerio Central, a green space in the middle of Montevideo, far from agricultural areas), and GOLF (el Club de Golf, also a green space in the middle of Montevideo, far from agricultural areas).
24 native range calls were misclassified as invasive range calls during random forests prediction back to range. The only native range “urban” sites represented in the misclassified calls were BCAR and FAGR, with 1 out of 5 and 1 out of 2 calls misclassified, respectively. 10 other agro-interface sites had anywhere from 1 to 6 calls misclassified in this analysis.
CEME and GOLF had 100% correct classification back to the native range (2 and 7 calls, respectively) during random forests prediction.
This indicates that at least in the native range, there was no clear pattern of urban native range calls being misclassified as invasive range calls (which were mostly recorded in urban areas). If there were a significant difference in call structure between habitats, we would expect to see more native range urban calls misclassified to the invasive range.
See SimplerSignatures_AdditionalMaterials_02_AcousticStructure_SupervisedML.Rmd and the associated RMarkdown output for more information.
For the random forests analysis, a subset of calls per site was used in the prediction step. Here I decided to look more closely at structural differences among native and invasive range habitats using all calls per site.
I picked 4 invasive range urban sites from the city of Austin recorded in 2019 (SOCC, MART, INTR, ELEM) and 4 native range non-urban sites recorded in Colonia, Uruguay in 2017 (PIED, INES-08, LENA, INES-03), to match site sample sizes for native range urban sites.
Given the number of calls recorded at invasive range sites, there are more invasive urban calls than in either habitat category for the native range.
nu_sites <- c("BCAR", "FAGR", "GOLF", "CEME")
iu_sites <- c("SOCC", "MART", "INTR", "ELEM")
nnu_sites <- c("PIED", "INES-08", "LENA", "INES-03")
habitat_df <- nat_inv_est %>%
as_tibble() %>%
filter(social_scale == "Site") %>%
filter(year == "2019" | year == "2017") %>%
filter(site %in% c(nu_sites, iu_sites, nnu_sites)) %>%
dplyr::mutate(
habitat = site,
habitat = gsub("BCAR|FAGR|GOLF|CEME", "Native urban", habitat),
habitat = gsub("SOCC|MART|INTR|ELEM", "Invasive urban", habitat),
habitat = gsub("PIED|INES-08|LENA|INES-03", "Native agro-interface", habitat)
)
glimpse(habitat_df)
## Rows: 416
## Columns: 27
## $ sound.files <chr> "2017_07_11_INES-03_1068_1.WAV", "2017_07_11_INES…
## $ selec <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ start <dbl> 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0…
## $ end <dbl> 0.2476383, 0.2166655, 0.2379962, 0.2158367, 0.242…
## $ date <date> 2017-07-11, 2017-07-11, 2017-07-11, 2017-07-11, …
## $ range <fct> Native, Native, Native, Native, Native, Native, N…
## $ social_scale <chr> "Site", "Site", "Site", "Site", "Site", "Site", "…
## $ site <chr> "INES-03", "INES-03", "INES-03", "INES-03", "INES…
## $ Bird_ID <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lat <dbl> -34.33610, -34.33610, -34.33610, -34.33610, -34.3…
## $ lon <dbl> -57.66821, -57.66821, -57.66821, -57.66821, -57.6…
## $ Visual_quality_score <fct> H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H, H…
## $ Overlapping_signal <fct> N, N, N, N, N, N, Y, Y, N, N, N, N, N, Y, N, N, N…
## $ Truncated <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N",…
## $ tailored <fct> y, y, y, y, y, y, y, y, y, y, y, y, y, y, y, y, y…
## $ prev_SNR <dbl> 22.77419, 30.64710, 28.94328, 21.34350, 30.88052,…
## $ SNR_margin_works <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
## $ SNR_before_only <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ old.sound.file.name <chr> "2017_07_11_INES-03_1068_1.WAV", "2017_07_11_INES…
## $ SNR <dbl> 27.42401, 28.77102, 34.18083, 21.63641, 28.61547,…
## $ region <chr> "West", "West", "West", "West", "West", "West", "…
## $ country <chr> "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Urug…
## $ site_year <chr> "INES-03_2017", "INES-03_2017", "INES-03_2017", "…
## $ year <chr> "2017", "2017", "2017", "2017", "2017", "2017", "…
## $ dept_state <chr> "Colonia", "Colonia", "Colonia", "Colonia", "Colo…
## $ invasive_city <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ habitat <chr> "Native agro-interface", "Native agro-interface",…
# Sample sizes by habitat
habitat_df %>%
group_by(habitat) %>%
dplyr::summarise(
n_calls = n()
)
## # A tibble: 3 x 2
## habitat n_calls
## <chr> <int>
## 1 Invasive urban 286
## 2 Native agro-interface 82
## 3 Native urban 48
What was the maximum number of calls recorded at any of these native range sites? The maximum was 27 calls.
habitat_df %>%
filter(range == "Native") %>%
group_by(habitat, site) %>%
dplyr::summarise(
n_calls = n()
)
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## # A tibble: 8 x 3
## # Groups: habitat [2]
## habitat site n_calls
## <chr> <chr> <int>
## 1 Native agro-interface INES-03 15
## 2 Native agro-interface INES-08 27
## 3 Native agro-interface LENA 19
## 4 Native agro-interface PIED 21
## 5 Native urban BCAR 13
## 6 Native urban CEME 6
## 7 Native urban FAGR 7
## 8 Native urban GOLF 22
Which invasive range sites here had more than 27 calls? All sites.
habitat_df %>%
filter(range == "Invasive") %>%
group_by(habitat, site) %>%
dplyr::summarise(
n_calls = n()
)
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## # A tibble: 4 x 3
## # Groups: habitat [1]
## habitat site n_calls
## <chr> <chr> <int>
## 1 Invasive urban ELEM 61
## 2 Invasive urban INTR 82
## 3 Invasive urban MART 50
## 4 Invasive urban SOCC 93
20 calls were randomly sampled from each of these invasive range sites, so that sample sizes would be more balanced for effect size calculations across habitat types.
set.seed(seed)
habitat_dfr <- habitat_df %>%
filter(site %in% iu_sites) %>%
droplevels() %>%
# Random sampling by site
group_by(site) %>%
nest() %>%
ungroup() %>%
dplyr::mutate(
rsamp_calls = purrr::map2(data, 20, sample_n, replace = FALSE)
) %>%
dplyr::select(-data) %>%
unnest(rsamp_calls) %>%
# Add back the other two habitat categories
bind_rows(
habitat_df %>%
filter(!site %in% iu_sites) %>%
droplevels()
)
# Check sample sizes by site, looks good
habitat_dfr %>%
group_by(habitat, site) %>%
dplyr::summarise(
n_calls = n()
)
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## # A tibble: 12 x 3
## # Groups: habitat [3]
## habitat site n_calls
## <chr> <chr> <int>
## 1 Invasive urban ELEM 20
## 2 Invasive urban INTR 20
## 3 Invasive urban MART 20
## 4 Invasive urban SOCC 20
## 5 Native agro-interface INES-03 15
## 6 Native agro-interface INES-08 27
## 7 Native agro-interface LENA 19
## 8 Native agro-interface PIED 21
## 9 Native urban BCAR 13
## 10 Native urban CEME 6
## 11 Native urban FAGR 7
## 12 Native urban GOLF 22
# Median and range of calls per site
habitat_dfr %>%
group_by(site) %>%
dplyr::summarise(
n_calls = n()
) %>%
dplyr::summarise(
median_calls = median(n_calls),
range_calls = paste(range(n_calls), collapse = " - ")
)
## # A tibble: 1 x 2
## median_calls range_calls
## <dbl> <chr>
## 1 20 6 - 27
# Check sample sizes by habitat, looks good
habitat_dfr %>%
group_by(habitat) %>%
dplyr::summarise(
n_calls = n()
)
## # A tibble: 3 x 2
## habitat n_calls
## <chr> <int>
## 1 Invasive urban 80
## 2 Native agro-interface 82
## 3 Native urban 48
Principal components analysis (PCA) was performed during feature extraction for supervised machine learning to summarize acoustic parameters into linear principal components. The previous PCA results were reused here so that habitat differences could be evaluated in the same acoustic space as the full dataset of calls between ranges.
# Image features were obtained for a complementary analysis and are not used here
# Keep just the principal components representing the standard set of acoustic parameters above
pca_feats <- read.csv(file.path(path, "supervised_RF_datasets_acousticSimilarity.csv")) %>%
dplyr::select(-c(names(.)[grep("rf_set|img_params|SPCC|DTW|cep", names(.))]))
dim(pca_feats)
## [1] 1596 39
glimpse(pca_feats)
## Rows: 1,596
## Columns: 39
## $ sound.files <chr> "2017_07_17_EMBR_INDIV-RAW_MZ000199_1.WAV", "2017_07…
## $ date <chr> "2017-07-17", "2017-07-17", "2017-07-17", "2017-07-1…
## $ range <chr> "Native", "Native", "Native", "Native", "Native", "N…
## $ social_scale <chr> "Individual", "Individual", "Individual", "Individua…
## $ site <chr> "EMBR", "EMBR", "EMBR", "EMBR", "EMBR", "EMBR", "EMB…
## $ Bird_ID <chr> "NAT-RAW", "NAT-RAW", "NAT-RAW", "NAT-RAW", "NAT-ZW8…
## $ region <chr> "West", "West", "West", "West", "West", "West", "Wes…
## $ country <chr> "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Uruguay…
## $ site_year <chr> "EMBR_2017", "EMBR_2017", "EMBR_2017", "EMBR_2017", …
## $ year <int> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
## $ dept_state <chr> "Colonia", "Colonia", "Colonia", "Colonia", "Colonia…
## $ invasive_city <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ acs_params_PCA_1 <dbl> 5.9885332, 2.0683667, 3.9787413, 4.3710869, -0.43713…
## $ acs_params_PCA_2 <dbl> -1.63804315, 0.03891284, -1.00282905, -2.07614285, -…
## $ acs_params_PCA_3 <dbl> 2.03156634, 2.47540887, 2.43952602, 0.31692257, -0.4…
## $ acs_params_PCA_4 <dbl> -0.16409658, 1.95677637, -1.51760292, -0.67394465, -…
## $ acs_params_PCA_5 <dbl> 0.68805443, -1.23680222, -1.62094013, -0.01851360, -…
## $ acs_params_PCA_6 <dbl> 1.01881183, -1.39264059, -1.52794591, 1.71705068, 1.…
## $ acs_params_PCA_7 <dbl> 0.96469499, -0.57166081, 0.02975126, 1.09039057, -1.…
## $ acs_params_PCA_8 <dbl> -0.67114660, 0.81857321, -0.45827139, -0.81428595, -…
## $ acs_params_PCA_9 <dbl> 0.33815298, -0.61494962, 0.26448579, 0.52574510, -0.…
## $ acs_params_PCA_10 <dbl> -0.07929202, 0.80063497, 0.06309654, -0.24800356, 0.…
## $ acs_params_PCA_11 <dbl> 1.1576928, 1.4581693, 0.6292717, 0.3026599, -1.27259…
## $ acs_params_PCA_12 <dbl> -0.0001338046, 0.5145824323, 0.7239512929, 0.2639849…
## $ acs_params_PCA_13 <dbl> 0.240945995, -0.460497838, -0.733393218, -0.88481842…
## $ acs_params_PCA_14 <dbl> -0.189424566, 0.503279858, 0.125130993, 0.550569621,…
## $ acs_params_PCA_15 <dbl> 0.61256426, 0.42286280, 0.65426949, 1.00034595, -0.2…
## $ acs_params_PCA_16 <dbl> -0.06222089, -0.27997128, -0.20473404, 0.07761578, -…
## $ acs_params_PCA_17 <dbl> 0.11750926, 0.38298396, -0.40083812, -0.28566724, 0.…
## $ acs_params_PCA_18 <dbl> 0.132192321, 0.072859437, 0.322546799, -0.120827960,…
## $ acs_params_PCA_19 <dbl> 0.180281107, -0.151500881, -0.038306170, 0.230016806…
## $ acs_params_PCA_20 <dbl> 0.174729821, -0.070735230, -0.461659616, -0.05690037…
## $ acs_params_PCA_21 <dbl> -0.096562879, -0.043733452, -0.035746983, -0.1414701…
## $ acs_params_PCA_22 <dbl> 2.217414e-01, 2.627062e-01, 2.135996e-01, 6.690039e-…
## $ acs_params_PCA_23 <dbl> -0.601176618, 0.220473691, -0.246015159, 0.053751392…
## $ acs_params_PCA_24 <dbl> 0.02887520, -0.13341300, 0.06532388, 0.05138502, 0.0…
## $ acs_params_PCA_25 <dbl> 0.023057755, 0.020047637, 0.075901896, 0.067463045, …
## $ acs_params_PCA_26 <dbl> -1.049769e-03, 1.049868e-03, -1.696452e-05, -9.01268…
## $ acs_params_PCA_27 <dbl> 1.776357e-15, 2.131628e-14, 1.154632e-14, 1.021405e-…
# Write this filtered .csv out for data sharing. Use the following .csv to reproduce most results below
# write.csv(pca_feats, file.path(path, "PCA_spectral_acoustic_measurements.csv"), row.names = FALSE)
# Loadings on the first two principal components (PCs)?
pp_pca <- readRDS(file.path(path, "PCA_results_Site_acs_params.RDS"))
str(pp_pca)
## List of 5
## $ sdev : num [1:27] 2.49 2.4 2 1.62 1.35 ...
## $ rotation: num [1:27, 1:27] 0.2036 -0.1723 -0.3044 -0.1423 -0.0518 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:27] "duration" "meanfreq" "sd" "freq.median" ...
## .. ..$ : chr [1:27] "PC1" "PC2" "PC3" "PC4" ...
## $ center : logi FALSE
## $ scale : logi FALSE
## $ x : num [1:1367, 1:27] 5.347 0.991 6.38 5.825 2.353 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:1367] "98" "99" "100" "101" ...
## .. ..$ : chr [1:27] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
# Acoustic parameters that loaded onto the first 2 PCs
pp_pca$rotation[, "PC1"][order(abs(pp_pca$rotation[, "PC1"]), decreasing = TRUE)]
## entropy sfm sp.ent sd freq.IQR freq.Q75
## -0.365155312 -0.337332119 -0.324184933 -0.304423904 -0.267466233 -0.262524879
## time.Q75 time.IQR time.median duration dfrange time.ent
## 0.224571825 0.209819382 0.204904018 0.203649858 -0.192297488 -0.190185831
## meanfreq time.Q25 maxdom freq.median modindx mindom
## -0.172324538 0.161829425 -0.150381998 -0.142325545 0.136822931 0.115786377
## skew enddom kurt meandom startdom freq.Q25
## 0.102608743 0.087996940 0.067925934 -0.054672043 0.053189174 -0.051805943
## peakf dfslope meanpeakf
## 0.027830773 0.022599338 -0.004556612
pp_pca$rotation[, "PC2"][order(abs(pp_pca$rotation[, "PC2"]), decreasing = TRUE)]
## meanfreq freq.median meandom freq.Q25 freq.Q75 time.Q75
## -0.336649960 -0.329303473 -0.305010812 -0.298926477 -0.275884364 -0.252896629
## time.ent duration time.median time.IQR meanpeakf time.Q25
## 0.252335920 -0.248283339 -0.234116340 -0.222477619 -0.216000675 -0.204539705
## maxdom peakf enddom startdom mindom sp.ent
## -0.194459229 -0.166982562 -0.126757674 -0.113831591 -0.107940427 -0.107756233
## sfm modindx dfrange freq.IQR entropy kurt
## -0.091160526 -0.085606731 -0.048086839 -0.038564164 -0.009743798 -0.007963251
## sd skew dfslope
## -0.005504976 -0.005296572 0.003268564
# PCs 1 and 2 explained 44.3% of variation
summary(pp_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.4940 2.3950 2.0029 1.61618 1.34666 1.29938 1.07556
## Proportion of Variance 0.2304 0.2124 0.1486 0.09674 0.06717 0.06253 0.04285
## Cumulative Proportion 0.2304 0.4428 0.5914 0.68812 0.75529 0.81782 0.86067
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.86061 0.82710 0.71317 0.6795 0.5693 0.51315 0.4618
## Proportion of Variance 0.02743 0.02534 0.01884 0.0171 0.0120 0.00975 0.0079
## Cumulative Proportion 0.88810 0.91344 0.93227 0.9494 0.9614 0.97113 0.9790
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.40283 0.31672 0.31417 0.20423 0.19647 0.18151 0.17154
## Proportion of Variance 0.00601 0.00372 0.00366 0.00154 0.00143 0.00122 0.00109
## Cumulative Proportion 0.98504 0.98876 0.99241 0.99396 0.99539 0.99661 0.99770
## PC22 PC23 PC24 PC25 PC26 PC27
## Standard deviation 0.1642 0.13648 0.09868 0.08307 0.002677 1.532e-14
## Proportion of Variance 0.0010 0.00069 0.00036 0.00026 0.000000 0.000e+00
## Cumulative Proportion 0.9987 0.99938 0.99974 1.00000 1.000000 1.000e+00
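The cumulative proportions reported by summary() can be recovered directly from the component standard deviations. A minimal sketch on R's built-in USArrests data (not the call measurements; object names are illustrative):

```r
# Sketch on built-in data, not the call dataset: the proportion of
# variance explained by each PC is sdev^2 over the total variance
pca <- prcomp(USArrests, scale. = TRUE)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
# Cumulative proportion for PCs 1-2, the same quantity reported
# as 44.3% for the call dataset above
cumsum(prop_var)[2]
stopifnot(isTRUE(all.equal(sum(prop_var), 1)))
```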
# Join with the habitat data frame
habitat_pca <- habitat_dfr %>%
dplyr::select(habitat, sound.files, site, year, range) %>%
inner_join(
pca_feats %>%
dplyr::select(sound.files, names(.)[grep("acs_params", names(.))]),
by = "sound.files"
)
# glimpse(habitat_pca)
Acoustic space was visualized with the first two principal components to assess separation of calls by habitat type.
Density kernels were parameterized as in the main random forests figure, with contours delineating bins of one-tenth of the maximum density.
gg_df <- habitat_pca %>%
dplyr::mutate(
habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
) %>%
dplyr::rename(
`X` = "acs_params_PCA_1",
`Y` = "acs_params_PCA_2"
) %>%
dplyr::select(-c(names(.)[grep("acs_params_PCA_", names(.))]))
h <- 5 # bandwidth (kernel density estimator or KDE width)
# Get density values and bin them to check out bins used by geom_density2d
# Native agro-interface
nat_nu_dens <- kde2d(
x = gg_df %>%
filter(habitat == "Native agro-interface") %>%
pull(X),
y = gg_df %>%
filter(habitat == "Native agro-interface") %>%
pull(Y),
h = h
)
str(nat_nu_dens)
## List of 3
## $ x: num [1:25] -2.475 -2.074 -1.673 -1.272 -0.872 ...
## $ y: num [1:25] -3.58 -3.31 -3.04 -2.77 -2.5 ...
## $ z: num [1:25, 1:25] 0.00388 0.00528 0.00672 0.0081 0.00934 ...
# Native urban
nat_u_dens <- kde2d(
x = gg_df %>%
filter(habitat == "Native urban") %>%
pull(X),
y = gg_df %>%
filter(habitat == "Native urban") %>%
pull(Y),
h = h
)
str(nat_u_dens)
## List of 3
## $ x: num [1:25] -5.11 -4.69 -4.27 -3.85 -3.43 ...
## $ y: num [1:25] -3.08 -2.77 -2.46 -2.15 -1.84 ...
## $ z: num [1:25, 1:25] 0.000426 0.000694 0.001104 0.001698 0.002491 ...
# Invasive urban
inv_u_dens <- kde2d(
x = gg_df %>%
filter(habitat == "Invasive urban") %>%
pull(X),
y = gg_df %>%
filter(habitat == "Invasive urban") %>%
pull(Y),
h = h
)
str(inv_u_dens)
## List of 3
## $ x: num [1:25] -4.78 -4.41 -4.04 -3.67 -3.29 ...
## $ y: num [1:25] -3.18 -2.83 -2.49 -2.14 -1.8 ...
## $ z: num [1:25, 1:25] 0.00471 0.00561 0.0064 0.00706 0.00755 ...
range(nat_nu_dens)
## [1] -3.580786 7.143172
range(nat_u_dens)
## [1] -5.111697 4.964518
range(inv_u_dens)
## [1] -4.782957 5.109649
bw_nat_nu <- max(nat_nu_dens$z)/10
bw_nat_u <- max(nat_u_dens$z)/10
bw_inv_u <- max(inv_u_dens$z)/10
bw_nat_nu
## [1] 0.003090135
bw_nat_u
## [1] 0.003293136
bw_inv_u
## [1] 0.003497231
brks_nat_nu <- seq(min(nat_nu_dens$z), max(nat_nu_dens$z), bw_nat_nu)
brks_nat_u <- seq(min(nat_u_dens$z), max(nat_u_dens$z), bw_nat_u)
brks_inv_u <- seq(min(inv_u_dens$z), max(inv_u_dens$z), bw_inv_u)
# 10 bins each, looks good
length(brks_nat_nu)
## [1] 10
length(brks_nat_u)
## [1] 10
length(brks_inv_u)
## [1] 10
levels(gg_df$habitat)
## [1] "Native agro-interface" "Native urban" "Invasive urban"
cols <- scales::alpha(c("navy", "dodgerblue", "orange"), 0.65)
shps <- c(21, 24, 25)
gg <- ggplot(data = gg_df, aes(x = X, y = Y, fill = habitat, color = habitat)) +
geom_density2d(
data = gg_df %>%
filter(habitat == "Native agro-interface"),
size = 0.65,
binwidth = bw_nat_nu,
h = h,
color = cols[1]
) +
geom_density2d(
data = gg_df %>%
filter(habitat == "Native urban"),
size = 0.65,
binwidth = bw_nat_u,
h = h,
color = cols[2]
) +
geom_density2d(
data = gg_df %>%
filter(habitat == "Invasive urban"),
size = 0.65,
binwidth = bw_inv_u,
h = h,
color = cols[3]
) +
xlab("Principal Component 1") +
ylab("Principal Component 2") +
theme_bw() +
theme(
legend.position = "top",
axis.title = element_text(size = 14),
axis.text = element_text(size = 12),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks = element_line(size = 0.35),
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-5, -5, -5, -5),
legend.key.width = unit(3, "lines")
)
# Get the legend
gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
gg_df %>%
ggplot(aes(x = X, y = Y, color = habitat)) +
geom_line() +
scale_color_manual(values = cols) +
guides(color = guide_legend(title = "", override.aes = list(size = 2))) +
theme_bw() +
theme(
legend.text = element_text(size = 12),
legend.position = "top",
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-10, -10, -10, -10),
legend.key.width = unit(2, "lines")
)
)), "guide-box")
dev.off()
## null device
## 1
# Arrange the legend and plots into a single image file
# 1.5 column, 140mm or 5.51 in
# jpeg(file.path(gpath, "SimplerSigs_HabitatPCA.jpeg"), units = "in", width = 5.51, height = 4, res = 600)
# ggarrange(
# as.ggplot(gg_leg),
# as.ggplot(gg),
# nrow = 2,
# heights = c(1, 9)
# )
# dev.off()
See the appendix for this figure. The centers of the density contours for the native and invasive range urban calls were closer together compared to the native range agro-interface calls, indicating structural similarities by urban habitat.
Which acoustic measurements were the most different among calls in each habitat? Using the same calls selected for validation as above.
Get the standard set of 27 spectral acoustic measurements obtained for supervised machine learning and merge with these calls. 210 calls total.
acs_params_habitat <- habitat_dfr %>%
dplyr::select(habitat, sound.files, site, year, range) %>%
inner_join(
read.csv(file.path(path, "acoustic_parameters.csv")) %>%
dplyr::select(-c(selec)),
by = "sound.files"
)
glimpse(acs_params_habitat)
## Rows: 210
## Columns: 32
## $ habitat <chr> "Invasive urban", "Invasive urban", "Invasive urban", "Inv…
## $ sound.files <chr> "2019_08_06_ELEM_GSV1036_3.WAV", "2019_08_06_ELEM_1039_11.…
## $ site <chr> "ELEM", "ELEM", "ELEM", "ELEM", "ELEM", "ELEM", "ELEM", "E…
## $ year <chr> "2019", "2019", "2019", "2019", "2019", "2019", "2019", "2…
## $ range <fct> Invasive, Invasive, Invasive, Invasive, Invasive, Invasive…
## $ duration <dbl> 0.1464288, 0.1722485, 0.1812131, 0.2025061, 0.1666182, 0.1…
## $ meanfreq <dbl> 4.156489, 3.439495, 4.333800, 3.553249, 4.178555, 4.105074…
## $ sd <dbl> 1.951559, 1.603174, 2.008342, 1.815870, 1.939180, 1.823958…
## $ freq.median <dbl> 4.148714, 3.350751, 4.004870, 3.334980, 3.909605, 4.013333…
## $ freq.Q25 <dbl> 2.870981, 2.299863, 2.829221, 2.248402, 2.769068, 2.931515…
## $ freq.Q75 <dbl> 5.173633, 4.366803, 5.771104, 4.727775, 5.452331, 5.007576…
## $ freq.IQR <dbl> 2.302653, 2.066940, 2.941883, 2.479372, 2.683263, 2.076061…
## $ time.median <dbl> 0.07048512, 0.07344039, 0.07754854, 0.08603456, 0.07426526…
## $ time.Q25 <dbl> 0.04302338, 0.04261356, 0.04057772, 0.04929063, 0.04618937…
## $ time.Q75 <dbl> 0.10069303, 0.11242726, 0.11812626, 0.13711757, 0.10596384…
## $ time.IQR <dbl> 0.05766964, 0.06981370, 0.07754854, 0.08782694, 0.05977448…
## $ skew <dbl> 2.229691, 1.618626, 1.970044, 1.869156, 2.315089, 2.137946…
## $ kurt <dbl> 11.094742, 6.036978, 9.638473, 9.229402, 9.537073, 8.52949…
## $ sp.ent <dbl> 0.9574609, 0.9344177, 0.9538707, 0.9484680, 0.9403040, 0.9…
## $ time.ent <dbl> 0.8951390, 0.8886264, 0.8870844, 0.8838866, 0.8893543, 0.8…
## $ entropy <dbl> 0.8570606, 0.8303483, 0.8461638, 0.8383381, 0.8362634, 0.8…
## $ sfm <dbl> 0.7158050, 0.5089952, 0.6766197, 0.6068520, 0.6179022, 0.6…
## $ meandom <dbl> 3.556159, 3.034860, 3.395232, 2.669714, 3.100140, 3.478226…
## $ mindom <dbl> 0.5250000, 0.5250000, 0.5250000, 0.5250000, 0.5250000, 0.5…
## $ maxdom <dbl> 5.191667, 4.608333, 6.358333, 4.258333, 6.125000, 5.191667…
## $ dfrange <dbl> 4.666667, 4.083333, 5.833333, 3.733333, 5.600000, 4.666667…
## $ modindx <dbl> 5.375000, 9.914286, 7.600000, 8.562500, 9.166667, 9.700000…
## $ startdom <dbl> 0.5250000, 0.5250000, 2.3916667, 0.9916667, 0.5250000, 0.5…
## $ enddom <dbl> 0.6416667, 1.5750000, 0.5250000, 0.7583333, 0.5250000, 0.7…
## $ dfslope <dbl> 0.7967466, 6.0958446, -10.3009461, -1.1522285, 0.0000000, …
## $ meanpeakf <dbl> 4.4916667, 3.6750000, 4.0250000, 2.5083333, 4.0250000, 4.1…
## $ peakf <dbl> 4.43116582, 2.84441227, 3.98902790, 0.33079937, 2.75437474…
I did not assess normality or homogeneity of variance per acoustic measurement per habitat, because I used Cliff’s delta (rather than Cohen’s d) for effect sizes, allowing more direct comparison to the effect sizes calculated between ranges above.
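For reference, Cliff's delta has a simple pairwise definition; the sketch below assumes it matches the quantity returned by orddom's dmes()$dc used next:

```r
# Minimal sketch of Cliff's delta (assumed equivalent to orddom's dmes()$dc):
# the proportion of pairs with y > x minus the proportion with y < x,
# over all pairwise comparisons between the two groups
cliffs_delta <- function(x, y) {
  diffs <- outer(y, x, "-") # all pairwise differences y_j - x_i
  (sum(diffs > 0) - sum(diffs < 0)) / length(diffs)
}
cliffs_delta(x = c(1, 2, 3), y = c(2, 3, 4)) # (6 - 1) / 9, about 0.56
```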
Calculate median and interquartile range per acoustic measurement per habitat.
acs_params_habitat_ss <- acs_params_habitat %>%
dplyr::select(-c(range, site, year)) %>%
dplyr::mutate(
habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
) %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^habitat$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
group_by(measurement, habitat) %>%
dplyr::summarise(
median = round(median(values), 2),
IQR = round(IQR(values), 2)
) %>%
# Convert to long format again
pivot_longer(
cols = c(median, IQR),
names_to = "statistic",
values_to = "values"
) %>%
# Make wider to split out the median and IQR columns
pivot_wider(
names_from = c(statistic, habitat),
values_from = values
) %>%
ungroup()
## `summarise()` has grouped output by 'measurement'. You can override using the `.groups` argument.
acs_params_habitat_ss %>%
kable()
| measurement | median_Native agro-interface | IQR_Native agro-interface | median_Native urban | IQR_Native urban | median_Invasive urban | IQR_Invasive urban |
|---|---|---|---|---|---|---|
| dfrange | 3.97 | 1.75 | 3.97 | 0.73 | 4.55 | 0.38 |
| dfslope | 1.91 | 16.01 | 2.06 | 16.76 | 0.00 | 3.19 |
| duration | 0.18 | 0.02 | 0.16 | 0.02 | 0.17 | 0.01 |
| enddom | 3.15 | 2.19 | 3.03 | 2.89 | 0.76 | 0.70 |
| entropy | 0.83 | 0.01 | 0.84 | 0.02 | 0.84 | 0.01 |
| freq.IQR | 1.97 | 0.39 | 2.13 | 0.37 | 2.22 | 0.30 |
| freq.median | 3.96 | 0.35 | 3.91 | 0.38 | 3.94 | 0.30 |
| freq.Q25 | 2.91 | 0.33 | 2.81 | 0.29 | 2.75 | 0.39 |
| freq.Q75 | 4.95 | 0.47 | 4.88 | 0.43 | 4.99 | 0.48 |
| kurt | 7.56 | 2.42 | 7.95 | 4.04 | 7.98 | 2.90 |
| maxdom | 4.96 | 0.47 | 4.84 | 0.47 | 5.08 | 0.35 |
| meandom | 3.51 | 0.31 | 3.47 | 0.37 | 3.48 | 0.42 |
| meanfreq | 4.10 | 0.35 | 4.02 | 0.34 | 4.04 | 0.32 |
| meanpeakf | 4.03 | 0.67 | 4.03 | 1.52 | 4.14 | 1.28 |
| mindom | 0.88 | 1.75 | 0.76 | 1.08 | 0.52 | 0.03 |
| modindx | 10.73 | 5.35 | 9.19 | 2.19 | 8.98 | 2.10 |
| peakf | 3.30 | 1.22 | 2.81 | 1.30 | 3.46 | 1.53 |
| sd | 1.66 | 0.13 | 1.68 | 0.16 | 1.83 | 0.16 |
| sfm | 0.57 | 0.06 | 0.60 | 0.09 | 0.65 | 0.07 |
| skew | 1.97 | 0.39 | 2.02 | 0.63 | 2.00 | 0.42 |
| sp.ent | 0.94 | 0.01 | 0.94 | 0.02 | 0.95 | 0.01 |
| startdom | 2.51 | 2.89 | 1.40 | 2.48 | 0.88 | 0.82 |
| time.ent | 0.89 | 0.00 | 0.89 | 0.00 | 0.89 | 0.00 |
| time.IQR | 0.08 | 0.01 | 0.07 | 0.01 | 0.07 | 0.01 |
| time.median | 0.08 | 0.01 | 0.08 | 0.01 | 0.08 | 0.01 |
| time.Q25 | 0.05 | 0.01 | 0.04 | 0.01 | 0.04 | 0.01 |
| time.Q75 | 0.13 | 0.01 | 0.11 | 0.01 | 0.11 | 0.01 |
Calculate Cliff’s delta as the effect size per acoustic measurement. Calls were filtered to native urban versus invasive urban to assess the effect of range within the same habitat, then to native agro-interface versus native urban to assess the effect of habitat within the native range.
var_nms <- names(acs_params_habitat)[-grep("^habitat$|^range$|^sound.files$|^site$|^year$", names(acs_params_habitat))]
var_nms
# i <- 1
eff_resh_df <- rbindlist(pblapply(1:length(var_nms), function(i){
# Calculate effect of range
tmp_df <- acs_params_habitat %>%
dplyr::select(habitat, var_nms[i])
nat <- tmp_df %>%
filter(habitat == "Native urban") %>%
pull(var_nms[i])
inv <- tmp_df %>%
filter(habitat == "Invasive urban") %>%
pull(var_nms[i])
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat, inv)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
range_df <- data.frame(
measurement = var_nms[i],
type = "Range",
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
)
# Calculate effect of habitat
nat_agr <- tmp_df %>%
filter(habitat == "Native agro-interface") %>%
pull(var_nms[i])
nat_urb <- tmp_df %>%
filter(habitat == "Native urban") %>%
pull(var_nms[i])
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat_agr, nat_urb)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat_agr, nat_urb, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
habitat_df <- data.frame(
measurement = var_nms[i],
type = "Habitat",
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
)
return(
range_df %>%
bind_rows(habitat_df)
)
}))
eff_resh_df
write.csv(eff_resh_df, file.path(path, "cliffs_delta_bootstrappedCIs_habitat.csv"), row.names = FALSE)
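The BCa interval used above can also be sketched with the boot package on simulated data. This is an illustration of the method only, not the orddom implementation that produced the results in this script:

```r
# Sketch of a BCa bootstrap CI for Cliff's delta using the boot package,
# on simulated data (an assumption: not the orddom implementation above)
library(boot)
set.seed(401)
dat <- data.frame(
  value = c(rnorm(30), rnorm(30, mean = 0.5)),
  grp = rep(c("x", "y"), each = 30)
)
delta_stat <- function(d, idx) {
  d <- d[idx, ]
  dx <- d$value[d$grp == "x"]
  dy <- d$value[d$grp == "y"]
  diffs <- outer(dy, dx, "-")
  (sum(diffs > 0) - sum(diffs < 0)) / length(diffs)
}
# Stratified resampling preserves the two group sizes
b <- boot(dat, delta_stat, R = 2000, strata = dat$grp)
boot.ci(b, type = "bca")
```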
Effect sizes ordered by absolute magnitude.
eff_resh_df <- read.csv(file.path(path, "cliffs_delta_bootstrappedCIs_habitat.csv"))
glimpse(eff_resh_df)
## Rows: 54
## Columns: 5
## $ measurement <chr> "duration", "duration", "meanfreq", "meanfreq", "sd", "sd"…
## $ type <chr> "Range", "Habitat", "Range", "Habitat", "Range", "Habitat"…
## $ cliff_delta <dbl> 0.2505208333, -0.6570121951, 0.0682291667, -0.2012195122, …
## $ lower_95_CI <dbl> 0.031770833, -0.792682927, -0.135416667, -0.397865854, 0.4…
## $ upper_95_CI <dbl> 0.43906250, -0.46697154, 0.26510417, 0.01270325, 0.7322916…
# Make a table for appendix, in order of decreasing largest absolute effect size within comparisons
eff_resh_df %>%
dplyr::mutate(
effect_size = round(cliff_delta, 2),
lower_95_CI = round(lower_95_CI, 2),
upper_95_CI = round(upper_95_CI, 2),
`95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
) %>%
# Arrange by decreasing effect size within habitat type
arrange(type, desc(abs(effect_size))) %>%
dplyr::select(type, measurement, effect_size, `95_CI`) %>%
kable()
| type | measurement | effect_size | 95_CI |
|---|---|---|---|
| Habitat | duration | -0.66 | (-0.79, -0.47) |
| Habitat | time.ent | 0.66 | (0.47, 0.8) |
| Habitat | time.Q75 | -0.64 | (-0.78, -0.45) |
| Habitat | time.IQR | -0.60 | (-0.75, -0.41) |
| Habitat | time.median | -0.48 | (-0.64, -0.28) |
| Habitat | time.Q25 | -0.44 | (-0.61, -0.25) |
| Habitat | modindx | -0.31 | (-0.49, -0.12) |
| Habitat | entropy | 0.30 | (0.09, 0.49) |
| Habitat | freq.Q25 | -0.29 | (-0.48, -0.09) |
| Habitat | sfm | 0.23 | (0, 0.42) |
| Habitat | freq.IQR | 0.21 | (0.01, 0.41) |
| Habitat | meanfreq | -0.20 | (-0.4, 0.01) |
| Habitat | sd | 0.15 | (-0.06, 0.35) |
| Habitat | meandom | -0.15 | (-0.35, 0.06) |
| Habitat | peakf | -0.15 | (-0.35, 0.06) |
| Habitat | maxdom | -0.12 | (-0.32, 0.09) |
| Habitat | freq.median | -0.10 | (-0.3, 0.11) |
| Habitat | sp.ent | 0.09 | (-0.13, 0.3) |
| Habitat | freq.Q75 | -0.07 | (-0.27, 0.15) |
| Habitat | dfrange | 0.05 | (-0.15, 0.25) |
| Habitat | dfslope | 0.05 | (-0.17, 0.26) |
| Habitat | mindom | -0.04 | (-0.24, 0.16) |
| Habitat | startdom | 0.02 | (-0.19, 0.21) |
| Habitat | enddom | 0.02 | (-0.2, 0.23) |
| Habitat | skew | -0.01 | (-0.24, 0.21) |
| Habitat | kurt | -0.01 | (-0.23, 0.21) |
| Habitat | meanpeakf | -0.01 | (-0.23, 0.2) |
| Range | sd | 0.60 | (0.43, 0.73) |
| Range | dfrange | 0.57 | (0.37, 0.72) |
| Range | mindom | -0.55 | (-0.7, -0.38) |
| Range | enddom | -0.51 | (-0.68, -0.29) |
| Range | sp.ent | 0.48 | (0.28, 0.63) |
| Range | sfm | 0.45 | (0.26, 0.61) |
| Range | entropy | 0.40 | (0.2, 0.57) |
| Range | startdom | -0.34 | (-0.52, -0.13) |
| Range | freq.IQR | 0.32 | (0.12, 0.5) |
| Range | maxdom | 0.31 | (0.1, 0.48) |
| Range | time.ent | -0.26 | (-0.45, -0.04) |
| Range | duration | 0.25 | (0.03, 0.44) |
| Range | freq.Q75 | 0.20 | (0, 0.39) |
| Range | dfslope | -0.16 | (-0.39, 0.07) |
| Range | skew | -0.14 | (-0.34, 0.07) |
| Range | meanfreq | 0.07 | (-0.14, 0.27) |
| Range | kurt | -0.07 | (-0.28, 0.15) |
| Range | freq.Q25 | -0.06 | (-0.26, 0.13) |
| Range | time.Q25 | 0.06 | (-0.16, 0.26) |
| Range | peakf | 0.06 | (-0.14, 0.27) |
| Range | modindx | -0.05 | (-0.25, 0.16) |
| Range | freq.median | 0.04 | (-0.16, 0.24) |
| Range | time.Q75 | 0.04 | (-0.18, 0.24) |
| Range | meanpeakf | 0.04 | (-0.16, 0.24) |
| Range | meandom | -0.03 | (-0.22, 0.17) |
| Range | time.IQR | 0.01 | (-0.2, 0.22) |
| Range | time.median | 0.00 | (-0.22, 0.21) |
# Note that spectral flatness has a 95% CI that includes 0
# Acoustic measurements with large effect sizes? In either direction
eff_reshL_df <- eff_resh_df %>%
# Keep the largest effect sizes per comparison type
# |delta| >= 0.474 is a rule-of-thumb threshold for a large Cliff's delta reported in the literature
filter(abs(cliff_delta) >= 0.474) %>%
dplyr::mutate(
effect_size = round(cliff_delta, 2),
lower_95_CI = round(lower_95_CI, 2),
upper_95_CI = round(upper_95_CI, 2),
`95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
) %>%
arrange(type, desc(abs(effect_size))) %>%
dplyr::select(type, measurement, effect_size, `95_CI`)
eff_reshL_df
## type measurement effect_size 95_CI
## 1 Habitat duration -0.66 (-0.79, -0.47)
## 2 Habitat time.ent 0.66 (0.47, 0.8)
## 3 Habitat time.Q75 -0.64 (-0.78, -0.45)
## 4 Habitat time.IQR -0.60 (-0.75, -0.41)
## 5 Habitat time.median -0.48 (-0.64, -0.28)
## 6 Range sd 0.60 (0.43, 0.73)
## 7 Range dfrange 0.57 (0.37, 0.72)
## 8 Range mindom -0.55 (-0.7, -0.38)
## 9 Range enddom -0.51 (-0.68, -0.29)
## 10 Range sp.ent 0.48 (0.28, 0.63)
topms <- eff_reshL_df %>%
pull(measurement) %>%
as.character()
topms
## [1] "duration" "time.ent" "time.Q75" "time.IQR" "time.median"
## [6] "sd" "dfrange" "mindom" "enddom" "sp.ent"
The largest effect sizes of habitat (native agro-interface versus native urban) were all temporal parameters. These effect sizes were statistically significant (95% CIs did not cross zero).
The largest effect sizes of range (native urban versus invasive urban) were related to frequency. These effect sizes also had 95% CIs that did not cross zero.
Are frequency modulation patterns actually simpler (i.e. more similar to the invasive range) for urban native range calls in the frequency modulation dataset? Only 2 native urban sites with 8 calls were available, 1/5 of the native range dataset used for frequency modulation. I retained 2 native range agro-interface and 2 invasive urban sites for these comparisons. This analysis was performed separately from the other acoustic measurements because a different set of calls was employed.
Using the frequency modulation measurements obtained above.
# peaks_spat_df %>%
# filter(year == "2019") %>%
# distinct(site)
nat_sites <- c("PIED", "INES-08", "FAGR", "GOLF")
# inv_sites <- c("SOCC", "INTR") # for FM used 2004, 2011 calls
inv_sites <- c("AIRP", "MANO") # 2019
# Get calls for these sites and add a habitat column
peaks_hab_df <- peaks_spat_df %>%
filter(site %in% c(nat_sites, inv_sites)) %>%
# Add a habitat column
dplyr::mutate(
habitat = ifelse(range == "Invasive", "Invasive urban", "Native agro-interface"),
habitat = ifelse(range == "Native" & site %in% c("FAGR", "GOLF"), "Native urban", habitat)
) %>%
# distinct(site, habitat)
# Ensure levels of the range and habitat columns are in the expected order
dplyr::mutate(
range = factor(range, levels = c("Native", "Invasive")),
habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
)
glimpse(peaks_hab_df)
## Rows: 125
## Columns: 16
## $ sound.files <fct> 2017_10_25_PIED_1563_2.WAV, 2017_10_25_PIED_1563_2.WA…
## $ range <fct> Native, Native, Native, Native, Native, Native, Nativ…
## $ site <fct> PIED, PIED, PIED, PIED, PIED, PIED, PIED, PIED, PIED,…
## $ year <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,…
## $ site_year <fct> PIED_2017, PIED_2017, PIED_2017, PIED_2017, PIED_2017…
## $ region <fct> West, West, West, West, West, West, West, West, West,…
## $ dept_state <fct> Colonia, Colonia, Colonia, Colonia, Colonia, Colonia,…
## $ invasive_city <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ peak_number <dbl> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 1, 3, 4, 5,…
## $ peak_height <dbl> 2.801363, 4.615108, 4.977125, 5.180768, 4.928268, 4.4…
## $ peak_max_index <dbl> 28, 87, 161, 240, 308, 369, 24, 71, 118, 186, 236, 29…
## $ peak_start_index <dbl> 1, 39, 110, 199, 282, 348, 1, 40, 92, 161, 208, 277, …
## $ peak_end_index <dbl> 39, 110, 199, 282, 348, 398, 40, 92, 137, 208, 277, 3…
## $ peak_trgh_slope <dbl> -0.05169762, -0.41990629, -0.32406322, -0.24725622, -…
## $ trgh_index <dbl> 39, 110, 199, 282, 348, 398, 40, 92, 161, 208, 277, 3…
## $ habitat <fct> Native agro-interface, Native agro-interface, Native …
peaks_hab_df %>%
distinct(site, habitat, year)
## site year habitat
## 1: PIED 2017 Native agro-interface
## 2: FAGR 2017 Native urban
## 3: INES-08 2017 Native agro-interface
## 4: GOLF 2017 Native urban
## 5: AIRP 2019 Invasive urban
## 6: MANO 2019 Invasive urban
# Get number of peaks, modulation rate, peak-trough slope
FM_hab_df <- peaks_hab_df %>%
# Find the number of peaks per call
group_by(habitat, sound.files) %>%
dplyr::summarise(
num_peaks = length(peak_number)
) %>%
ungroup() %>%
# Add modulation rate
inner_join(
peaks_hab_df %>%
# Calculate duration per call
inner_join(
freq_mod_df %>%
dplyr::mutate(
duration = end - start
) %>%
dplyr::select(sound.files, duration),
by = "sound.files"
) %>%
# Calculate modulation rate per call
group_by(habitat, sound.files, duration) %>%
dplyr::summarise(
num_peaks = length(peak_number)
) %>%
ungroup() %>%
dplyr::mutate(
modulation_rate = (num_peaks/duration) # units for rate of change = peaks/second
) %>%
dplyr::select(sound.files, modulation_rate),
by = "sound.files"
) %>%
# Add peak-trough slope
inner_join(
peaks_hab_df %>%
# Drop peaks that were not matched to troughs
filter(!is.na(peak_trgh_slope)) %>%
# Get the steepest peak-trough slope per call; slopes are negative, so min() returns the largest magnitude
group_by(habitat, sound.files) %>%
dplyr::summarise(
max_peak_trgh_slope = min(peak_trgh_slope)
) %>%
ungroup() %>%
dplyr::select(sound.files, max_peak_trgh_slope),
by = "sound.files"
)
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'habitat', 'sound.files'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'habitat'. You can override using the `.groups` argument.
# 24 calls
FM_hab_df
## # A tibble: 24 x 5
## habitat sound.files num_peaks modulation_rate max_peak_trgh_sl…
## <fct> <chr> <int> <dbl> <dbl>
## 1 Native agro-… 2017_10_25_PIED_15… 6 33.6 -0.420
## 2 Native agro-… 2017_10_25_PIED_15… 8 47.8 -0.465
## 3 Native agro-… 2017_10_25_PIED_15… 6 31.0 -0.442
## 4 Native agro-… 2017_10_25_PIED_15… 7 38.7 -0.362
## 5 Native agro-… 2017_07_13_INES-08… 6 34.8 -0.247
## 6 Native agro-… 2017_07_13_INES-08… 6 35.6 -0.347
## 7 Native agro-… 2017_07_13_INES-08… 7 38.8 -0.392
## 8 Native agro-… 2017_07_13_INES-08… 6 33.0 -0.336
## 9 Native urban 2017_09_05_FAGR_13… 6 31.4 -0.467
## 10 Native urban 2017_09_05_FAGR_12… 4 23.8 -0.350
## # … with 14 more rows
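The modulation rate above is simply the peak count scaled by call duration; a toy check of the units (values are illustrative, not from the dataset):

```r
# Toy check of modulation rate units, with illustrative values
num_peaks <- 6
duration_s <- 0.18 # call duration in seconds
num_peaks / duration_s # about 33.3 peaks per second
```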
A raincloud plot for the distributions of the 3 frequency modulation measurements per range-habitat.
FM_hab_df_tmp <- FM_hab_df %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^habitat$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
dplyr::mutate(
measurement_nm = measurement,
# Add new line symbols to all measurement names
measurement_nm = recode(
measurement_nm,
`num_peaks` = "Number of\n peaks",
`max_peak_trgh_slope` = "Peak - trough\n slope",
`modulation_rate` = "Modulation\n rate"
),
measurement_nm = factor(measurement_nm)
)
FM_hab_df_tmp
topms <- c("num_peaks", "modulation_rate", "max_peak_trgh_slope")
cols <- scales::alpha(c("navy", "dodgerblue", "orange"), 0.85)
# Make a list of plots in order of effect sizes
# Doing this to customize y-axes of each plot, which isn't possible for plots in a single row with facetscales
gg_list <- list()
# i <- 1
invisible(pblapply(1:length(topms), function(i){
tmp_df <- FM_hab_df_tmp %>%
filter(measurement == topms[i]) %>%
dplyr::mutate(
measurement_nm = as.character(measurement_nm),
measurement_nm = factor(measurement_nm, levels = unique(measurement_nm)),
habitat = recode(
habitat,
`Native agro-interface` = "Native\n agro-interface",
`Native urban` = "Native\n urban",
`Invasive urban` = "Invasive\n urban"
)
)
# Get ymin and ymax values for the y-axis scale, then add a buffer before and after
ymin <- min(tmp_df$values)
ymax <- max(tmp_df$values)
# Use one fifth of the difference between max and min as the y-axis buffer
buf <- (ymax - ymin)/5
# If on the first plot, add the y-axis label
if(i == 1){
yal <- "Values"
} else{
yal <- ""
}
# Initialize plot margins (top, right, bottom, left)
tm <- 1
rm <- 0
bm <- 1
lm <- 0
# If on the first plot, make the left margin larger
if(i == 1){
lm <- 0.5
# If on last plot, make the right margin larger
} else if(i == length(topms)){
rm <- 0.5
}
# Raincloud plot: Kernel smoothed with boxplot
gg <- ggplot(tmp_df, aes(x = habitat, y = values, fill = habitat, color = habitat)) +
geom_flat_violin(position = position_nudge(x = 0.1, y = 0), adjust = 1, size = 0.25)+
geom_point(position = position_jitter(width = 0.05), size = 2, stroke = 0.5, alpha = 0.45) +
geom_boxplot(aes(x = as.numeric(habitat) - 0.22, y = values), outlier.shape = NA, alpha = 0.55, width = 0.2, colour = "black", size = 0.25) +
facet_wrap(~ measurement_nm) +
scale_fill_manual(values = cols) +
scale_color_manual(values = cols) +
guides(fill = FALSE, color = FALSE) +
theme_bw() +
xlab("") + ylab(yal) +
scale_y_continuous(limits = round(c(ymin - buf, ymax + buf), 2), breaks = round(seq(ymin - buf, (ymax + buf), (ymax - ymin)/5), 2)) +
theme(
axis.title = element_text(size = 11),
axis.text.y = element_text(size = 11),
# , angle = 35, hjust = 1
axis.text.x = element_text(size = 9),
strip.text = element_text(size = 12, margin = margin(0.5, 1, 0.5, 1, "lines")),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.ticks = element_line(size = 0.25),
plot.margin = unit(c(tm, rm, bm, lm), "lines")
)
# Assign the plot to gg_list in the global environment (pblapply runs in a local function scope)
gg_list[[i]] <<- gg
}))
# Get the legend
cols <- scales::alpha(c("navy", "dodgerblue", "orange"), 0.85)
gg_leg <- gtable::gtable_filter(ggplot_gtable(ggplot_build(
FM_hab_df_tmp %>%
ggplot(aes(x = habitat, y = values, color = habitat)) +
geom_errorbar(aes(ymin = values, ymax = values), size = 2, width = 0.25) +
scale_color_manual(values = cols) +
guides(color = guide_legend(title = "", override.aes = list(size = 2))) +
theme_bw() +
theme(
legend.text = element_text(size = 12),
legend.position = "top",
legend.margin = margin(0, 0, 0, 0),
legend.box.margin = margin(-10, -10, -10, -10),
legend.key.width = unit(2, "lines")
)
)), "guide-box")
# Close the graphics device opened while building the legend
dev.off()
# Arrange plots into a single image file
# Animal Behavior double column figure = 190 mm or 7.48 in
jpeg(file.path(gpath, "Figure4_FmMeasrmntsHabitats_final.jpeg"), units = "in", width = 7.48, height = 4, res = 1000)
ggarrange(
as.ggplot(gg_leg),
as.ggplot(ggarrange(
as.ggplot(gg_list[[1]]),
as.ggplot(gg_list[[2]]),
as.ggplot(gg_list[[3]]),
nrow = 1,
widths = rep(3, 3)
)),
nrow = 2,
heights = c(1, 10)
)
dev.off()
See the main text of the associated publication for this figure.
Calculate the median and interquartile range per acoustic measurement per habitat.
FM_hab_df_ss <- FM_hab_df %>%
dplyr::mutate(
habitat = factor(habitat, levels = c("Native agro-interface", "Native urban", "Invasive urban"))
) %>%
# Convert to long format
pivot_longer(
cols = names(.)[-grep("^habitat$|^sound.files$", names(.))],
names_to = "measurement",
values_to = "values"
) %>%
group_by(measurement, habitat) %>%
dplyr::summarise(
median = round(median(values), 2),
IQR = round(IQR(values), 2)
) %>%
# Convert to long format again
pivot_longer(
cols = c(median, IQR),
names_to = "statistic",
values_to = "values"
) %>%
# Make wider to split out the median and IQR columns
pivot_wider(
names_from = c(statistic, habitat),
values_from = values
) %>%
ungroup()
## `summarise()` has grouped output by 'measurement'. You can override using the `.groups` argument.
glimpse(FM_hab_df_ss)
## Rows: 3
## Columns: 7
## $ measurement <chr> "max_peak_trgh_slope", "modulation_rate…
## $ `median_Native agro-interface` <dbl> -0.38, 35.18, 6.00
## $ `IQR_Native agro-interface` <dbl> 0.08, 5.27, 1.00
## $ `median_Native urban` <dbl> -0.4, 30.3, 5.0
## $ `IQR_Native urban` <dbl> 0.09, 4.71, 1.25
## $ `median_Invasive urban` <dbl> -0.26, 25.11, 4.00
## $ `IQR_Invasive urban` <dbl> 0.05, 6.48, 0.50
Calculate Cliff’s delta as the effect size of range and habitat per acoustic measurement. We filtered to native urban and invasive urban calls to assess the effect of range given the same habitat, then to native agro-interface and native urban calls to assess the effect of habitat.
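As a quick cross-check on the statistic itself (not part of the original analysis), Cliff’s delta is the proportion of between-group pairs where the first value is larger, minus the proportion where it is smaller. A minimal language-agnostic sketch (shown in Python for brevity) with hypothetical values:

```python
# Illustrative sketch only: Cliff's delta from first principles, as a
# cross-check on the value returned by orddom::dmes()$dc in the analysis.
# The toy values below are hypothetical, not measurements from the study.

def cliffs_delta(x, y):
    """Proportion of (x, y) pairs with x > y minus pairs with x < y."""
    gt = sum(xi > yi for xi in x for yi in y)
    lt = sum(xi < yi for xi in x for yi in y)
    return (gt - lt) / (len(x) * len(y))

nat = [6, 8, 6, 7]  # hypothetical num_peaks, native urban
inv = [4, 5, 4, 4]  # hypothetical num_peaks, invasive urban
print(cliffs_delta(nat, inv))  # 1.0: every native value exceeds every invasive one
```

A delta of 1 or -1 indicates complete separation between the two groups; 0 indicates complete overlap.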
var_nms <- names(FM_hab_df)[-grep("^habitat$|^sound.files$", names(FM_hab_df))]
var_nms
# A t-test error occurred when running i = 1 (num_peaks), so the loop starts at i = 2 and num_peaks is handled separately below
eff_resh_df <- rbindlist(pblapply(2:length(var_nms), function(i){
# cat(paste("i = ", i, sep = ""))
# Calculate effect of range
tmp_df <- FM_hab_df %>%
dplyr::select(habitat, var_nms[i])
nat <- tmp_df %>%
filter(habitat == "Native urban") %>%
pull(var_nms[i])
inv <- tmp_df %>%
filter(habitat == "Invasive urban") %>%
pull(var_nms[i])
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat, inv)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromrey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
range_df <- data.frame(
measurement = var_nms[i],
type = "Range",
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
)
# Calculate effect of habitat
nat_agr <- tmp_df %>%
filter(habitat == "Native agro-interface") %>%
pull(var_nms[i])
nat_urb <- tmp_df %>%
filter(habitat == "Native urban") %>%
pull(var_nms[i])
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat_agr, nat_urb)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromrey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat_agr, nat_urb, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
habitat_df <- data.frame(
measurement = var_nms[i],
type = "Habitat",
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
)
return(
range_df %>%
bind_rows(habitat_df)
)
}))
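The confidence intervals above come from orddom::dmes.boot() with the BCa method. The underlying resampling logic can be sketched with a plain percentile bootstrap (a deliberate simplification: BCa additionally applies bias and acceleration corrections). Shown in Python with hypothetical data; the function and data names are illustrative only:

```python
# Simplified percentile bootstrap CI for Cliff's delta (illustration only;
# the analysis itself uses the BCa method via orddom::dmes.boot).
# Data below are hypothetical, not measurements from the study.
import random

def cliffs_delta(x, y):
    gt = sum(xi > yi for xi in x for yi in y)
    lt = sum(xi < yi for xi in x for yi in y)
    return (gt - lt) / (len(x) * len(y))

def percentile_boot_ci(x, y, B=10000, alpha=0.05, seed=401):
    rng = random.Random(seed)
    # Resample each group with replacement and recompute delta B times
    stats = sorted(
        cliffs_delta(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(B)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles
    return stats[int(B * alpha / 2)], stats[int(B * (1 - alpha / 2)) - 1]

nat = [6, 8, 6, 7, 5, 6, 7, 4]  # hypothetical num_peaks, native urban
inv = [4, 5, 4, 6, 5, 4, 3, 4]  # hypothetical num_peaks, invasive urban
lo, hi = percentile_boot_ci(nat, inv, B=2000)
print(lo, hi)
```

With small samples like these, BCa intervals can differ noticeably from percentile intervals, which is why the script relies on orddom rather than a hand-rolled bootstrap.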
eff_resh_df
# Ran this code for num_peaks outside of the loop, which avoided the t.test error
# Calculate effect of range
tmp_df <- FM_hab_df %>%
dplyr::select(habitat, "num_peaks")
nat <- tmp_df %>%
filter(habitat == "Native urban") %>%
pull(num_peaks)
inv <- tmp_df %>%
filter(habitat == "Invasive urban") %>%
pull(num_peaks)
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat, inv)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromrey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat, inv, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
range_df <- data.frame(
measurement = "num_peaks",
type = "Range",
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
)
# Calculate effect of habitat
nat_agr <- tmp_df %>%
filter(habitat == "Native agro-interface") %>%
pull(num_peaks)
nat_urb <- tmp_df %>%
filter(habitat == "Native urban") %>%
pull(num_peaks)
# Get Cliff's delta for independent groups (all values of y compared to all values of x)
# orddom package v 3.1
cliff_delta <- dmes(nat_agr, nat_urb)$dc
# Using the bias-corrected and accelerated bootstrap method, found to be robust to non-normality and heterogeneous variance in a simulation study by Hess & Kromrey 2004
# Using 10000 bootstrap iterations
ci_res <- dmes.boot(nat_agr, nat_urb, theta.es = "dc", ci.meth = "BCA", B = 10000)
ci_res
lower_95_CI <- ci_res$theta.bci.lo
upper_95_CI <- ci_res$theta.bci.up
habitat_df <- data.frame(
measurement = "num_peaks",
type = "Habitat",
cliff_delta = cliff_delta,
lower_95_CI = lower_95_CI,
upper_95_CI = upper_95_CI
)
eff_resh_df2 <- range_df %>%
bind_rows(habitat_df)
eff_resh_df3 <- eff_resh_df %>%
bind_rows(eff_resh_df2)
eff_resh_df3
write.csv(eff_resh_df3, file.path(path, "cliffs_delta_bootstrappedCIs_habitatFm.csv"), row.names = FALSE)
Effect sizes ordered by absolute magnitude.
eff_resh_df <- read.csv(file.path(path, "cliffs_delta_bootstrappedCIs_habitatFm.csv"))
glimpse(eff_resh_df)
## Rows: 6
## Columns: 5
## $ measurement <chr> "modulation_rate", "modulation_rate", "max_peak_trgh_slope…
## $ type <chr> "Range", "Habitat", "Range", "Habitat", "Range", "Habitat"
## $ cliff_delta <dbl> -0.562500, -0.625000, 0.906250, -0.250000, -0.656250, -0.7…
## $ lower_95_CI <dbl> -0.906250, -1.000000, 0.250000, -0.750000, -0.921875, -0.9…
## $ upper_95_CI <dbl> 0.00000, -0.06250, 1.00000, 0.34375, -0.15625, -0.15625
# Make a table for appendix, in order of largest absolute effect size per comparison
eff_resh <- eff_resh_df %>%
dplyr::mutate(
effect_size = round(cliff_delta, 2),
lower_95_CI = round(lower_95_CI, 2),
upper_95_CI = round(upper_95_CI, 2),
`95_CI` = paste("(", lower_95_CI, ", ", upper_95_CI, ")", sep = "")
) %>%
arrange(type, desc(abs(effect_size))) %>%
ungroup() %>%
dplyr::select(type, measurement, effect_size, `95_CI`)
eff_resh %>%
kable(align = c(rep("c", 4)))
| type | measurement | effect_size | 95_CI |
|---|---|---|---|
| Habitat | num_peaks | -0.77 | (-0.92, -0.16) |
| Habitat | modulation_rate | -0.62 | (-1, -0.06) |
| Habitat | max_peak_trgh_slope | -0.25 | (-0.75, 0.34) |
| Range | max_peak_trgh_slope | 0.91 | (0.25, 1) |
| Range | num_peaks | -0.66 | (-0.92, -0.16) |
| Range | modulation_rate | -0.56 | (-0.91, 0) |
1. Araya-Salas, M., Smith-Vidaurre, G., and M. Webster. 2017. Assessing the effect of sound file compression and background noise on measures of acoustic signal structure. Bioacoustics 28(1), 57-73.
2. Hess, M.R., & Kromrey, J.D. 2004. Robust confidence intervals for effect sizes: A comparative study of Cohen’s d and Cliff’s delta under non-normality and heterogeneous variances. Paper presented at the annual meeting of the American Educational Research Association, San Diego, 12–16 April 2004.
The session info printed here represents the environment used for the final RMarkdown knitting. The software and package versions employed for main results are reported in the appendix of the associated article.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] orddom_3.1 psych_2.0.12 rstatix_0.7.0
## [4] MASS_7.3-54 gtable_0.3.0 ggplotify_0.0.7
## [7] facetscales_0.1.0.9000 egg_0.4.5 gridExtra_2.3
## [10] pracma_2.3.3 data.table_1.14.0 pbapply_1.4-3
## [13] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6
## [16] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
## [19] tibble_3.1.2 tidyverse_1.3.1 ggplot2_3.3.3
## [22] warbleR_1.1.27 NatureSounds_1.0.4 knitr_1.33
## [25] seewave_2.1.6 tuneR_1.3.3 plyr_1.8.6
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-152 bitops_1.0-7 fs_1.5.0
## [4] lubridate_1.7.10 httr_1.4.2 tools_4.1.0
## [7] backports_1.2.1 utf8_1.2.1 R6_2.5.0
## [10] DBI_1.1.1 colorspace_2.0-1 withr_2.4.2
## [13] mnormt_2.0.2 tidyselect_1.1.1 curl_4.3.1
## [16] compiler_4.1.0 cli_2.5.0 rvest_1.0.0
## [19] xml2_1.3.2 labeling_0.4.2 scales_1.1.1
## [22] proxy_0.4-25 dtw_1.22-3 digest_0.6.27
## [25] foreign_0.8-81 rmarkdown_2.8 rio_0.5.26
## [28] pkgconfig_2.0.3 htmltools_0.5.1.1 highr_0.9
## [31] dbplyr_2.1.1 rlang_0.4.11 readxl_1.3.1
## [34] rstudioapi_0.13 farver_2.1.0 gridGraphics_0.5-1
## [37] generics_0.1.0 jsonlite_1.7.2 zip_2.1.1
## [40] car_3.0-10 RCurl_1.98-1.3 magrittr_2.0.1
## [43] Rcpp_1.0.6 munsell_0.5.0 fansi_0.5.0
## [46] abind_1.4-5 lifecycle_1.0.0 stringi_1.6.2
## [49] yaml_2.2.1 carData_3.0-4 parallel_4.1.0
## [52] crayon_1.4.1 lattice_0.20-44 haven_2.4.1
## [55] hms_1.1.0 tmvnsim_1.0-2 pillar_1.6.1
## [58] rjson_0.2.20 fftw_1.0-6 reprex_2.0.0
## [61] glue_1.4.2 evaluate_0.14 BiocManager_1.30.15
## [64] modelr_0.1.8 vctrs_0.3.8 cellranger_1.1.0
## [67] assertthat_0.2.1 xfun_0.23 openxlsx_4.2.3
## [70] broom_0.7.6 signal_0.7-7 rvcheck_0.1.8
## [73] ellipsis_0.3.2