We selected calls in Raven prior to running this code. Here we preprocessed calls from the repeated individual and site-level recordings prior to sound analysis and matrix regression. In this code and other supplementary methods, “focal individual(s)” is sometimes used to refer to repeatedly sampled individuals.
The data provided with this publication was pre-processed using the processing pipeline and code documented below.
Clean environment and install and/or load packages, including the package warbleR
[1].
# clean global environment
rm(list = ls())
###################################
# install packages from Github
# install devtools if not installed
# if (!"devtools" %in% installed.packages()[,"Package"])
# install.packages("devtools")
# library(devtools)
# devtools::install_github("maRce10/warbleR")
###################################
# install packages from CRAN in a loop
# X <- c("pbapply", "ggplot2", "data.table", "parallel")
# is_installed <- function(p) is.element(p, installed.packages()[,1])
#
# lapply(1:length(X), function(x){
# if(!is_installed(X[x])){
# install.packages(X[x], repos = "http://lib.stat.cmu.edu/R/CRAN")
# }
# })
###################################
# load all packages
X <- c("warbleR", "Rraven", "tuneR", "pbapply", "ggplot2", "data.table", "parallel")
invisible(lapply(X, library, character.only = TRUE))
path <- "/media/owner/MYIOPSITTA/R/Uruguay2017_MonkParakeet_CallAnalysis/Data/Focal_Individuals"
We imported Raven selection tables for focal individual calls and wrote out the selection table.
fsels <- Rraven::imp_raven(path, name.from.file = TRUE, ext.case = "upper", freq.cols = FALSE)
str(fsels)
# are all wav files in selection tables also in the working directory?
wavs <- list.files(pattern = ".wav$", ignore.case = TRUE)
unique(fsels$sound.files[!fsels$sound.files %in% wavs])
write.csv(fsels, file.path(path, "focal_individual_sels.csv"), row.names = FALSE)
Here we optimized spectrogram parameters. We visually evaluated the resulting spectrogram mosaic to choose the Fourier Transform window, window length and the partitioning of signal amplitude.
# pick a random call to generate a spectrogram mosaic with varying parameters
set.seed(89)
r <- sample(x = nrow(fsels), size = 1)
spec_param(fsels[r, ], length.out = 10, ovlp = 90, wl = c(200, 600), wn = c("hanning", "bartlett", "hamming"), collev.min = c(-40, -100), pal = "reverse.gray.colors.2", rm.axes = TRUE, nrow = 3, ncol = 3, img.prefix = "spec_param", flim = c(0.5, 9), mar = 0.03, cex = 0.45, path = path)
We used the parameters chosen above to generate spectrograms for visual quality control. We added information to selection table spreadsheet about visual call quality, overlapping signals and any other pertintent information during visual inspection.
specreator(fsels, wn = "hanning", wl = 378, collev = seq(-53, 0, 1), flim = c(0, 10), ovlp = 90, mar = 0.01, res = 300, cex = 0.5, line = FALSE, path = path)
We manually added columns (Visual_quality_score, Overlapping_signal Overlap_section, Bandpass_overlaps_signal_freq_range) to the metadata during visual inspection of spectrograms. We read back in the metadata and added site and individual identity to the focal individual call metadata.
# read back in the csv with manually added visual quality control columns
fsels <- read.csv(file.path(path, "focal_individual_sels.csv"))
str(fsels)
# add a site variable
fsels$site <- sapply(1:nrow(fsels), function(x){
strsplit(strsplit(as.character(fsels$sound.files[x]), split = "_")[[1]][4], split = "-")[[1]][1]
})
unique(fsels$site)
# make a column of individual IDs
fsels$indiv <- sapply(1:nrow(fsels), function(x){
strsplit(strsplit(as.character(fsels$sound.files[x]), split = "_")[[1]][5], split = "DIV-")[[1]][2]
})
# replace long IDs for a 3 character ID per individual
fsels$indiv <- gsub("UNMARKED-BIRD", "UM", fsels$indiv)
unique(fsels$indiv)
How many calls per individual, and which individuals have < 4 calls?
table(fsels$indiv)
table(fsels$indiv)[table(fsels$indiv) < 4]
We filtered out any calls flagged for overlapping signals and individuals with < 4 calls.
fsels <- fsels[fsels$Overlapping_signal != "Y", ]
fsels <- fsels[-grep(paste(names(table(fsels$indiv)[table(fsels$indiv) < 4]), collapse = "|"), fsels$indiv), ]
str(fsels)
We tailored temporal coordinates of selections to standardize how calls were selected (a single observer, GSV, performed tailoring).
seltailor(fsels, flim = c(0, 9), mar = 0.15, osci = TRUE, auto.next = TRUE, path = path)
# read the seltailor output back in for more preprocessing
fselst <- read.csv(file.path(path, "seltailor_output.csv"), header = TRUE)
str(fselst)
We filtered out calls with SNR < 7, and decided on a frequency bandpass filter of 0.5 - 9kHz after visual inspection of spectrograms.
# select a good margin for SNR calculations
# a margin of 0.05 seconds looks good - does not overlap neighboring calls for any of the selected calls
snrspecs(fselst, osci = TRUE, snrmar = 0.05, wn = "hanning", wl = 378, flim = c(0, 10), ovlp = 90, path = path)
# calculate SNR, using bandpass determined during visual quality control above
fselst_snr <- sig2noise(X = fselst, mar = 0.05, type = 1, bp = c(0.5, 9), wl = 378, path = path)
# str(fselst_snr)
# remove calls with SNR < 7
length(which(fselst_snr$SNR < 7)) # only 2 calls will be removed
fselst_snr <- fselst_snr[-which(fselst_snr$SNR < 7), ]
# str(fselst_snr)
# nrow(fselst_snr)
# remove any calls that were flagged as low quality in visual inspections
fselst_snr <- fselst_snr[fselst_snr$Visual_quality_score != "L", ]
# str(fselst_snr)
# nrow(fselst_snr)
We saved the preprocessed metadata and calls.
# save preprocessed selection table as a csv file
write.csv(fselst_snr, file.path(path, "indiv_scale_calls_preprocessed.csv"), row.names = FALSE)
Make an extended selection table of repeatedly sampled individual calls.
fselst_snr <- read.csv(file.path(path, "indiv_scale_calls_preprocessed.csv"), header = TRUE)
cores <- parallel::detectCores() - 2
indiv_sels_tbl <- selection_table(X = fselst_snr, whole.recs = FALSE, extended = TRUE, confirm.extended = TRUE, mar = 0.1, parallel = cores, path = path)
# save the extended selection table as an RDS object
# this object will be provided in Supplementary Material as a means to share the focal individual calls and associated selection table with metadata
saveRDS(indiv_sels_tbl, file.path(path, "indiv_scale_extended_seltable.RDS"))
How to access different components of an extended selection table:
indiv_sels_tbl <- readRDS(file.path(path, "indiv_scale_extended_seltable.RDS"))
str(indiv_sels_tbl)
# access the selection table (with metadata) data frame
dim(indiv_sels_tbl) # 97 calls, 14 columns
names(indiv_sels_tbl) # column names
# access the wave objects
length(attr(indiv_sels_tbl, "wave.objects"))
attr(indiv_sels_tbl, "wave.objects")[1]
# you can write these out as wave objects in your working directory to facilitate reproducing analyses:
wavs <- attr(indiv_sels_tbl, "wave.objects")
wav_nms <- names(attr(indiv_sels_tbl, "wave.objects"))
# x <- 1 testing
invisible(lapply(1:length(wavs), function(x){
writeWave(wavs[[x]], filename = wav_nms[x], extensible = FALSE)
}))
# access the check results data frame
attr(indiv_sels_tbl, "check.results")
We used the pre-processed focal individual calls to make call lexicons per individual.
i <- unique(fselst_snr$indiv)
invisible(lapply(1:length(i), function(x){
catalog(fselst_snr[fselst_snr$indiv == i[x], ], wn = "hanning", wl = 378, collev = seq(-53, 0, 1), flim = c(0, 10), ovlp = 90, mar = 0.01, labels = c("sound.files", "selec"), tags = c("selec"), tag.pal = list(topo.colors, heat.colors), res = 300, img.suffix = paste("INDIV", i[x], sep = "-"), cex = 0.5, path = path)
}))
catalog2pdf(by.img.suffix = TRUE, keep.img = FALSE, path = path)
Visual inspection of catalogs revealed that monk parakeets produce consistent and distinctive calls (e.g. high acoustic similarity within individuals), such that contact call structure appeared to be characterized by individual signatures.
How many total repeatedly sampled individual calls remained after pre-processing?
nrow(fselst_snr)
## [1] 97
Calls per repeatedly sampled individual.
table(fselst_snr$indiv)
##
## AAT RAW UM1 UM2 UM3 UM4 UM5 ZW8
## 12 4 25 23 5 13 7 8
Mean number of calls per individual.
mean(table(fselst_snr$indiv))
## [1] 12.125
Range of calls per individual.
range(table(fselst_snr$indiv))
## [1] 4 25
Calls by repeatedly sampled individuals and sites.
table(fselst_snr$indiv, fselst_snr$site)
##
## 1145 CHAC EMBR
## AAT 12 0 0
## RAW 0 0 4
## UM1 25 0 0
## UM2 23 0 0
## UM3 5 0 0
## UM4 13 0 0
## UM5 0 7 0
## ZW8 0 0 8
Mean and range of SNR across calls.
mean(fselst_snr$SNR)
## [1] 16.11411
range(fselst_snr$SNR)
## [1] 7.751148 26.992037
We read back in final preprocessed set of calls for higher social scales.
Dates for recording sessions by site?
sites <- as.character(unique(ccs$General_site))
dates <- sapply(1:length(sites), function(x){
tmp <- ccs[grep(sites[x], ccs$General_site), ]
unique(paste(tmp$Day, tmp$Month, sep = "-"))
})
data.frame(dates, sites)
We defined pairs as groups of 2 birds for which we had calls for both birds in the pair. The pair social scale was nested within the site scale, and was represented by a subset of calls in the site-level data set.
We extracted unique overall group ID from the site-level metadata.
gid <- unique(ccs$Overall_group_ID)[!is.na(unique(ccs$Overall_group_ID))]
length(gid)
## [1] 109
We collected information about all groups of birds flying together (pairs and flocks) in the data set for summary statistics.
# filter the site-level data by all groups and generate a data frame that can be easily parsed for summary stats
gid_size <- lapply(1:length(gid), function(x){
pat <- paste("^", gid[x], "$", sep = "")
st <- unique(ccs$General_site[grep(pat, ccs$Overall_group_ID)])
gi <- unique(ccs$Flock_ID[grep(pat, ccs$Overall_group_ID)])
pid <- unique(ccs$Pair_ID[grep(pat, ccs$Overall_group_ID)])
gs <- unique(ccs$Group_size[grep(pat, ccs$Overall_group_ID)])
cn <- length(ccs$Group_size[grep(pat, ccs$Overall_group_ID)])
return(data.frame(site = st, overall_group_id = gid[x], flock_ID = gi, pair_ID = pid, group_size = gs, call_num = cn))
})
gid_size <- data.table::rbindlist(gid_size)
# filter by groups for which we had > 1 call (e.g. remove groups for which only a single call remained after preprocessing)
gidf <- droplevels(gid_size[gid_size$call_num > 1, ])
# View(gidf)
pairs <- droplevels(gidf[!is.na(gidf$pair_ID), ])
# save the pair IDs for Supplementary Methods 3
saveRDS(pairs$pair_ID, file.path(path, "pair_IDs.RDS"))
Mean and range of pairs across sites?
range(table(pairs$site))
## [1] 1 5
mean(table(pairs$site))
## [1] 2.095238
How many pairs?
nrow(pairs)
## [1] 44
How many calls in the site-level data set represent this social scale?
sum(pairs$call_num)
## [1] 88
sum(pairs$call_num)/nrow(ccs)
## [1] 0.1454545
How many sites had pairs?
length(table(pairs$site))
## [1] 21
How many pairs by site?
table(pairs$site)
##
## 1145 ARAP ARAZ CISN ECIL ELTE EMBR INBR INES-01
## 1 1 2 4 1 5 1 3 2
## INES-03 INES-04 INES-08 LENA OJOS PAVO PEIX PFER-01 PIED
## 3 2 4 1 1 1 1 2 1
## RIAC-01 ROSA SAUC
## 3 3 2
Mean and range of pairs across sites?
range(table(pairs$site))
## [1] 1 5
mean(table(pairs$site))
## [1] 2.095238
How many and which sites have > 2 pairs?
length(table(pairs$site)[which(table(pairs$site) > 1)])
## [1] 12
table(pairs$site)[which(table(pairs$site) > 1)]
##
## ARAZ CISN ELTE INBR INES-01 INES-03 INES-04 INES-08 PFER-01
## 2 4 5 3 2 3 2 4 2
## RIAC-01 ROSA SAUC
## 3 3 2
We defined flocks as groups of 3 or more birds flying together for which we had 2 or more calls. This social scale was nested within the site scale, and represented by a subset of calls in the site-level data set.
How many flocks total?
nrow(gid_size[gid_size$group_size > 2, ])
## [1] 36
How many flocks remain (e.g. flocks for which we had more than 1 call)?
flocks <- droplevels(gidf[gidf$group_size > 2, ])
nrow(flocks)
## [1] 29
# save the flock IDs for Supplementary Methods 3
saveRDS(flocks$flock_ID, file.path(path, "flock_IDs.RDS"))
Mean and range of flock size across the data set?
mean(flocks$group_size)
## [1] 5.275862
range(flocks$group_size)
## [1] 3 20
How many calls from the site-level data set correspond to these flocks?
sum(flocks$call_num)
## [1] 77
sum(flocks$call_num)/nrow(ccs)
## [1] 0.1272727
How many sites had flocks?
length(table(flocks$site))
## [1] 22
How many flocks by site?
table(flocks$site)
##
## ARAP ARAZ CHAC CISN ECIL ELTE FAGR HIPE INBR
## 2 2 1 1 2 2 1 1 1
## INES-03 INES-05 INES-06 INES-07 INES-08 KIYU OJOS PAVO PEIX
## 1 1 1 1 1 1 1 2 1
## PIED QUEB RIAC-02 ROSA
## 2 1 2 1
Mean and range of flocks across sites?
range(table(flocks$site))
## [1] 1 2
mean(table(flocks$site))
## [1] 1.318182
How many and which sites have > 2 flocks?
length(table(flocks$site)[which(table(flocks$site) > 1)])
## [1] 7
table(flocks$site)[which(table(flocks$site) > 1)]
##
## ARAP ARAZ ECIL ELTE PAVO PIED RIAC-02
## 2 2 2 2 2 2 2
How many total calls were retained for subsequent sound analysis at this social scale?
nrow(ccs)
## [1] 605
How many sites represented?
length(table(ccs$General_site))
## [1] 39
How many calls by site?
table(ccs$General_site)
##
## 1145 ARAP ARAZ BAGU BCAR CEME CHAC CISN ECIL
## 13 12 15 20 13 6 12 28 17
## ELTE EMBR FAGR GOLF HIPE INBR INES-01 INES-03 INES-04
## 23 22 7 22 5 19 12 15 9
## INES-05 INES-06 INES-07 INES-08 KIYU LENA OJOS PAVO PEIX
## 6 6 9 27 8 19 23 25 19
## PFER-01 PFER-03 PIED PLVE PROO QUEB RIAC-01 RIAC-02 ROSA
## 34 19 21 11 12 16 17 8 15
## SAUC SEMI VALI
## 6 11 23
Mean and range of calls by site?
mean(table(ccs$General_site))
## [1] 15.51282
range(table(ccs$General_site))
## [1] 5 34
How many calls by department? Uruguay is sub-divided into 19 departments, and we sampled across 7 departments (representaive of a regional scale for matrix regression with geographic distance, see Supplementary Methods 3).
table(ccs$dept)
##
## Canelones Colonia Maldonado Montevideo Rocha Salto
## 19 275 89 99 46 12
## San Jose
## 65
Mean and range SNR across calls?
mean(ccs$SNR)
## [1] 19.56147
range(ccs$SNR)
## [1] 7.795863 40.149548
Colonia was the westernmost department of our geographic transect, where we sampled most intensively.
How many total calls were retained for subsequent sound analysis?
Colonia <- droplevels(ccs[grep("Colonia", ccs$dept), ])
nrow(Colonia)
## [1] 275
nrow(Colonia)/nrow(ccs) # proportion of all calls
## [1] 0.4545455
How many sites are represented?
length(table(Colonia$General_site))
## [1] 18
How many calls by site?
table(Colonia$General_site)
##
## 1145 CHAC EMBR INES-01 INES-03 INES-04 INES-05 INES-06 INES-07
## 13 12 22 12 15 9 6 6 9
## INES-08 LENA PFER-01 PFER-03 PIED RIAC-01 RIAC-02 ROSA SEMI
## 27 19 34 19 21 17 8 15 11
Mean and range of calls by site?
mean(table(Colonia$General_site))
## [1] 15.27778
range(table(Colonia$General_site))
## [1] 6 34
Mean and range SNR across calls?
mean(Colonia$SNR)
## [1] 19.54524
range(Colonia$SNR)
## [1] 9.222555 35.116192
Check out Supplementary Methods 2 for sound analysis. The extended selection tables for the individual and higher social scales contain the selected contact calls used for analyses in Supplementary Methods 2.
1. Araya‐Salas, M., & Smith‐Vidaurre, G. 2017. warbleR: An R package to streamline analysis of animal acoustic signals. Methods in Ecology and Evolution 8(2), 184-191.