The assessment of biodiversity is critical in addressing questions of ecosystem health and resilience. Traditional methods of ecological survey, however, are often time-consuming, invasive and limited by accessibility to the environment. As many species emit sounds while moving, foraging and interacting with one another, passive acoustic monitoring provides an alternative method to constrain the diversity and behavior of animals (Rountree et al., 2006; Farina, 2014; Farina and Gage, 2017). This approach is relatively low-cost, minimally disruptive, and can be deployed semi-continuously over long time periods. Acoustic recording may be particularly useful within underwater habitats where visibility and access are generally more limited than in terrestrial systems (Willis, 2001; Freeman et al., 2014; Tricas and Boyle, 2014).
In the terrestrial realm, a suite of soundscape metrics has been developed to gain information about the acoustic environment (Sueur et al., 2008a, 2014; Pieretti et al., 2011; Depraetere et al., 2012). These parameters represent the complexity or evenness of the acoustic environment using a single value to characterize a recorded audio segment (Sueur et al., 2008a, 2014; Pieretti et al., 2011) and have been correlated with important ecological variables such as biodiversity and habitat complexity (Sueur et al., 2008a; Farina et al., 2011; Depraetere et al., 2012; Gasc et al., 2013; Fuller et al., 2015).
Marine ecologists have begun to apply these metrics to underwater sound recordings; however, it remains unclear if this approach provides ecological information beyond what can be extracted from more traditional sound pressure level and spectral analyses (Kaplan et al., 2017; Staaterman et al., 2017). Differences between marine and terrestrial soundscapes have been largely ignored in adopting these metrics (Blondel and Hatta, 2017). For example, marine soundscapes are often dominated by the extremely broadband (∼0.2 to several tens of kHz) signals associated with aggregations of snapping shrimp (Au and Banks, 1998; Lammers et al., 2008; Bohnenstiehl et al., 2016; Butler et al., 2017), for which there is no equivalent terrestrial source. In addition, while terrestrial species often partition their use of the soundscape (Krause, 1993; Medeiros et al., 2017), calls from different fish species tend to overlap within a relatively narrow (1–3 kHz wide) band, which may at times be dominated by intense chorusing (Luczkovich et al., 2008; Wall et al., 2013; Montie et al., 2015; Ricci et al., 2016, 2017; Rice et al., 2017; Staaterman et al., 2017).
Here we review two soundscape indices frequently used to assess biodiversity in marine habitats, the Acoustic Complexity Index (ACI) and Acoustic Entropy (H), with the aim of assessing their effectiveness and potential shortcomings. Both field and synthetic soundscape recordings are used to investigate the sensitivity of ACI and H to changes in the rate and composition of biological sound production. Based on our review of the marine soundscape literature and the results of this analysis, we conclude with recommendations for the use of these acoustic metrics to assess marine ecosystems.
Background and review
Acoustic Complexity Index (ACI)
The Acoustic Complexity Index (ACI) estimates the complexity of an acoustic recording in space and time (Farina and Morri, 2008; Pieretti et al., 2011). In its development for terrestrial environments, the ACI was designed to be minimally affected by sustained sounds (i.e., background noise) that have small amplitude variation over time, and to generate higher values when more variable, transient biological sounds are present in a recording.
Consider a spectrogram generated from a recording, with n non-overlapping time steps and m frequency bins. The ACI is first calculated for each frequency bin (i) by summing the absolute difference in intensity between adjacent time steps (k) and normalizing this total by the sum of intensity in that frequency bin. These values are then summed across all frequency bins:
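Written out from this description, with I_{k,i} denoting the spectrogram intensity at time step k in frequency bin i (the symbol I is notation assumed here, not taken from the text), the index can be expressed as:

```latex
\mathrm{ACI} \;=\; \sum_{i=1}^{m}
\frac{\displaystyle\sum_{k=1}^{n-1} \left| I_{k,i} - I_{k+1,i} \right|}
     {\displaystyle\sum_{k=1}^{n} I_{k,i}}
\tag{1}
```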
In practice, the length of the recording segment is selected by the end-user, with values commonly ranging from 1 to 60 s, and the frequency range is often restricted to isolate biological sounds of interest (see Table 1 references). The number of points used in calculating the Fast Fourier Transform (NFFT) and sampling rate of the data (fs) determine both the frequency (Δf = fs/NFFT) and temporal resolution (ΔT = NFFT/fs) of the analysis. Frequency resolutions of 25–100 Hz are commonly applied (Table 1); however, this choice is rarely justified in the literature and often appears to be a consequence of selecting a default NFFT size.
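As a hedged illustration of the calculation described above, the following Python sketch computes ACI from a waveform. It assumes NumPy, a Hann-windowed amplitude spectrogram built from non-overlapping segments, and a simple band selection; the function names (`spectrogram`, `aci`) and the toy signals are hypothetical and are not taken from any of the cited studies.

```python
import numpy as np

def spectrogram(x, fs, nfft):
    """Amplitude spectrogram from non-overlapping, Hann-windowed segments.

    Returns an array of shape (n time steps, nfft // 2 + 1) and the bin
    frequencies in Hz. The Hann window is an assumption of this sketch.
    """
    n_steps = len(x) // nfft
    frames = np.reshape(x[:n_steps * nfft], (n_steps, nfft)) * np.hanning(nfft)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return spec, freqs

def aci(x, fs, nfft, fmin, fmax):
    """Sum over in-band frequency bins of (summed |change| / summed intensity)."""
    spec, freqs = spectrogram(x, fs, nfft)
    band = (freqs >= fmin) & (freqs <= fmax)
    intensity = spec[:, band]
    # Absolute difference between adjacent time steps, per frequency bin.
    change = np.abs(np.diff(intensity, axis=0)).sum(axis=0)
    total = intensity.sum(axis=0)
    ok = total > 0  # skip empty bins to avoid division by zero
    return float(np.sum(change[ok] / total[ok]))

# Toy comparison: steady noise versus the same noise with sparse impulses.
rng = np.random.default_rng(0)
fs, nfft = 48_000, 512
noise = rng.normal(0.0, 1.0, 2 * fs)
snappy = noise.copy()
snappy[rng.choice(noise.size, size=40, replace=False)] += 200.0

aci_noise = aci(noise, fs, nfft, 4_000, 20_000)
aci_snappy = aci(snappy, fs, nfft, 4_000, 20_000)
print(f"steady noise: {aci_noise:.1f}, with impulses: {aci_snappy:.1f}")
```

Because the value is a sum over bins, changing `nfft` or the band limits changes the absolute number, which is why only relative variation is comparable between analyses.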
|Study (system)||Δf||Freq. band||Max. variation||Observation & correlations|
|McWilliam and Hawkins, 2013 (temperate inlet)||187.5 Hz||2.0–4.0 kHz||∼80%||Positively correlated with snap count; Not correlated with species assemblages or habitat characteristics; Sensitive to anthropogenic noise|
|Staaterman et al., 2014 (sub-tropical coral reef)||25.0 Hz||0.001–10.0 kHz||∼12%||Diel patterns; Tightly correlated with snapping activity (HB)|
|Desjonquères et al., 2015 (temperate pond)||86.0 Hz||0.1–22.0 kHz||---||Positively correlated with richness and abundance of sound types|
|Kaplan et al., 2015 (tropical coral reef)||50 Hz||0.1–20.0 kHz||---||No correlation with species assemblage or trends in low frequency SPL|
|Harris et al., 2016* (temperate rocky reef)||281.3 Hz||0.1–24.0 kHz||∼37%||Positive correlation with Pielou’s Evenness (J′) and Shannon’s Index (H′)|
|Bertucci et al., 2016 (tropical coral reef)||39.1 Hz||0.02–2.0 kHz (LB+); 2.0–20.0 kHz (HB+)||∼6.6% (LB); ∼16.7% (HB)||Diel patterns; Positive correlation with number of species (LB) and Shannon-Wiener fish diversity index (LB and HB); Higher in MPA reef than non-MPA reef|
|Butler et al., 2016 (subtropical coast)||46.9 Hz||0.01–24.0 kHz||∼13%||Diel patterns driven by snaps; Slightly higher in non-degraded habitats, but no significant difference with habitat type.|
|Buscaino et al., 2016 (Mediterranean rocky-reef)||22.2 Hz||0.125–0.5 kHz (LB); 4.0–64.0 kHz (HB)||∼25% (LB); ∼70% (HB)||Positively correlated with fish sound count (LB) and snap count (HB)|
|Picciulin et al., 2016 (shallow, temperate coast)||86.1 Hz||0.05–1.082 kHz||∼125%||Positively correlated with fish chorusing; differed over time but not between sites|
|Bolgan et al., 2017 (temperate lakes/rivers)||39.1 Hz||0.46–6.0 kHz||∼16%||Positively correlated with gravel noise during spawning and fish air passage noise; Diel patterns associated with insect calls|
|Pieretti et al., 2017 (Mediterranean Sea)||39.1 Hz (LB); 312.5 Hz (HB)||0–0.62 kHz (LB); 0.62–40.0 kHz (HB)||∼23% (LB); ∼45% (HB)||Tracks fish chorusing, but delayed increase following onset of chorusing (LB); Diel patterns associated with snaps (HB).|
|Rice et al., 2017 (temperate offshore habitat)||3.9 Hz||0–1.0 kHz||∼10%||Diel patterns; Increased during periods of intense chorusing by black drum; Sensitive to anthropogenic noise|
|Staaterman et al., 2017 (tropical coast)||50 Hz||0.025–1.0 kHz (LB); 3.0–10.0 kHz (HB)||∼40% (LB); ∼50% (HB)||No significant differences between habitats (LB and HB); Drops during periods of intense chorusing by Bocon Toadfish (LB); Indicator of snap production (HB).|
Table 1 summarizes marine soundscape studies using ACI. Because the ACI is a simple summation (Eq. 1), its upper limit is unbounded and its value will depend on the length of the data segment analyzed and the chosen time-frequency resolution (i.e., the number of bins within the length of the recording segment). The maximum percent range of variation in ACI, as opposed to its absolute value, is therefore reported (Table 1) in order to make meaningful comparisons.
Two marine soundscape studies have shown a correlation between ACI and fish survey data. Within temperate reef systems in New Zealand, Harris et al. (2016) reported that ACI was positively correlated with Pielou’s Evenness and Shannon’s Index. For a south-Pacific reef system, Bertucci et al. (2016) showed that ACI increased with the number of species and the Shannon-Wiener fish diversity index. Similarly, the richness of sound types was correlated with ACI in temperate ponds in France (Desjonquères et al., 2015). While these results suggest that ACI may respond positively to an increase in acoustic diversity, these patterns are not universal (e.g., Kaplan et al., 2015; Butler et al., 2016) and other work has suggested that ACI can be influenced strongly by the rate of sound production associated with a single species or a small number of species. Several papers, for example, have noted changes in ACI modulated by the broadband signals associated with snapping shrimp (e.g., McWilliam and Hawkins, 2013; Staaterman et al., 2014; Kaplan et al., 2015; Butler et al., 2016; Buscaino et al., 2016). Others have shown that ACI varies temporally in response to fish chorusing (i.e., the number of calls) (Desjonquères et al., 2015; Buscaino et al., 2016; Rice et al., 2017) and that the abundance of sounds produced by a single fish species can have a direct effect on ACI (Bolgan et al., 2017; Staaterman et al., 2017).
Acoustic Entropy (H)
Acoustic Entropy H, developed by Sueur et al. (2008a), attempts to quantify the average diversity within an acoustic community. H is calculated following the Shannon diversity index in ecology, which increases with species richness and evenness. Similarly, acoustic entropy is purported to increase with the number of vocalizing species and evenness of the acoustic environment (Sueur et al., 2008a).
Acoustic entropy comprises both temporal and spectral entropy. Temporal entropy calculates the evenness of a signal’s amplitude over time. Thus, for any given time series x(t) of length n, temporal entropy is calculated using:
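Assuming the normalized Shannon form of Sueur et al. (2008a), and writing temporal entropy as H_t (the notation Ht used later in this paper), this can be expressed as:

```latex
H_t \;=\; -\,\frac{\displaystyle\sum_{t=1}^{n} A(t)\,\log_2 A(t)}{\log_2(n)}
```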
where A(t) is the probability mass function of the amplitude envelope in the time domain. In practice, the signal is band-pass filtered to isolate biological signals prior to being enveloped. The time series segment length must be defined by the end-users, with values between 5 and 60 s commonly applied (Table 2 references). Spectral entropy calculates the evenness of a signal’s frequency-amplitude spectrum:
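Assuming the normalized Shannon form of Sueur et al. (2008a), and writing spectral entropy as H_f (the notation Hf used later in this paper), this can be expressed as:

```latex
H_f \;=\; -\,\frac{\displaystyle\sum_{f=1}^{N} S(f)\,\log_2 S(f)}{\log_2(N)}
```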
where S(f) is the probability mass function calculated from the mean spectrum having N frequency bins. The spectral resolution of the analysis is controlled by the user’s choice of NFFT length (Δf = fs/NFFT). The total acoustic entropy index (H) is derived from the product of spectral and temporal entropy, H = Ht × Hf.
|Study||Δf||Freq. band||Max. range||Observation and correlations|
|Denes et al., 2014 (Bering Sea)||---||0.1–50 kHz (high sample rate recorders); 0.1–4.1 kHz (low sample rate recorders)||0.22–0.58 (high sample rate); 0.70–0.95 (low sample rate)||No significant relationship with counts of marine mammals; Ht positively related to species richness; Sensitive to anthropogenic noise|
|Lillis et al., 2014 (temperate estuary)||93.8 Hz||0.1–24 kHz||0.50–0.9||Higher at oyster reef sites than soft bottom sites, driven by snapping activity (HB)|
|Parks et al., 2014 (South Atlantic, North Pacific, and Indian Oceans)||0.25 Hz||1–125 Hz||0.88–0.94||Does not correspond to biological patterns inferred from marine mammal call detections and classification; Sensitive to anthropogenic noise|
|Kaplan et al., 2015 (tropical coral reef)||50.0 Hz||0.1–1.0 kHz (LB); 2.0–20.0 kHz (HB); 0.1–20.0 kHz (FB)||0.70–0.98 (LB); 0.85–0.95 (HB); 0.85–0.95 (FB)||Differences between habitats, but results not correlated with visual fish and habitat survey data (LB); Trend driven by snapping shrimp (HB and FB).|
|Harris et al., 2016 (temperate rocky reef)||281.3 Hz||0.1–24.0 kHz||0.71–0.75||Positive correlation with number of species & Shannon’s Index (H′) for Δf ≤ 140.6 Hz; Robust to anthropogenic noise|
|Rice et al., 2017 (temperate offshore habitat)||3.9 Hz||0–1.0 kHz||0.75–0.95||Increased at onset of fish chorusing (LB); Diel patterns|
|Staaterman et al., 2017 (tropical coast)||50.0 Hz||0.025–1.0 kHz (LB); 3.0–10.0 kHz (HB)||0.2–0.6 (LB); 0.75–0.9 (HB)||Significantly higher in reefs and sand than in mangroves (LB); Drops during periods of intense chorusing by Bocon Toadfish (LB). Modulated by snapping, but with no significant differences based on habitat (HB).|
H will approach 0 for a single pure tone, increase with the number of amplitude modulations and frequency bands in a time series, and approach 1 for completely random noise (Sueur et al., 2008a).
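The limiting behavior described here can be explored with a short Python sketch. It assumes NumPy, a brick-wall band limit applied in the frequency domain (a stand-in for the band-pass filter and envelope step described in the text), and a Hann-windowed mean amplitude spectrum; all function names and test signals are hypothetical.

```python
import numpy as np

def band_envelope(x, fs, fmin, fmax):
    """Envelope of the band-limited signal via a one-sided (analytic) spectrum.

    Keeping only positive frequencies inside [fmin, fmax] and doubling them
    yields the analytic signal of a brick-wall band-passed waveform. This is
    a simplification and assumes 0 < fmin < fmax < fs / 2.
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return np.abs(np.fft.ifft(np.where(keep, 2.0 * spectrum, 0.0)))

def evenness(weights):
    """Shannon entropy of the weights (scaled to a pmf), normalized to [0, 1]."""
    p = weights / weights.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum() / np.log2(p.size))

def acoustic_entropy(x, fs, nfft, fmin, fmax):
    """Return (H, Ht, Hf) for one recording segment."""
    ht = evenness(band_envelope(x, fs, fmin, fmax))
    n_steps = len(x) // nfft
    frames = np.reshape(x[:n_steps * nfft], (n_steps, nfft)) * np.hanning(nfft)
    mean_spec = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    hf = evenness(mean_spec[(freqs >= fmin) & (freqs <= fmax)])
    return ht * hf, ht, hf

# Toy check of the two limiting cases: a pure tone and random noise.
fs, nfft = 8_000, 512
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 1_000 * t)
noise = np.random.default_rng(0).normal(0.0, 1.0, t.size)
h_tone, ht_tone, hf_tone = acoustic_entropy(tone, fs, nfft, 100, 3_900)
h_noise, ht_noise, hf_noise = acoustic_entropy(noise, fs, nfft, 100, 3_900)
print(f"tone H = {h_tone:.2f}, noise H = {h_noise:.2f}")
```

In this toy case the steady tone has an even envelope, so its low H comes from the spectral term alone.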
Table 2 summarizes marine soundscape studies using H. For temperate reef systems in New Zealand, Harris et al. (2016) showed that the acoustic entropy was correlated with number of species and Shannon’s Index (H′) when a spectral resolution finer than 140.6 Hz was used in the analysis. Other studies have shown significant differences in H between habitat types; however, as with ACI, most authors have attributed these differences to variable rates of sound production by snapping shrimp (Lillis et al., 2014; Kaplan et al., 2015; Staaterman et al., 2017) or a single species of fish (Staaterman et al., 2017; Rice et al., 2017).
Field recordings of single call type soundscapes
There is increasing evidence that variations in the rate of biological sound production may play a key role in modulating ACI and H in marine soundscapes (e.g., McWilliam and Hawkins, 2013; Kaplan et al., 2015; Butler et al., 2016; Buscaino et al., 2016; Staaterman et al., 2017). To evaluate this further, H and ACI were calculated within two single-call-type-dominated marine soundscapes: a tropical back-reef ecosystem where the high-frequency soundscape is modulated by the broadband impulsive sounds of invertebrate snapping shrimp, and a mid-Atlantic estuary where the low-frequency soundscape is modulated by harmonic boatwhistle calls of oyster toadfish (Opsanus tau). Based on our review of the literature (Tables 1 and 2), we hypothesize that ACI and H may be sensitive to (1) call rate, (2) call type and (3) the time-frequency resolution specified in the analysis.
Snap dominated soundscapes
As part of a study examining tropical back-reef nursery habitats, the soundscapes of seven concrete block experimental patch reefs within the Bight of Old Robinson (Abaco Island, The Bahamas) were recorded between March and July 2016. Underwater sound was recorded (fs = 96 kHz) concurrently at each patch reef for 2 min every 20 min using a set of SoundTrap 300 recorders (Ocean Instruments NZ). The hydrophones were positioned ∼1 m away from each patch reef, ∼0.5 m above the seabed and approximately 1.4–3.8 m below mean lower low water (MLLW) level at all sites.
These patch reefs were inhabited principally by juvenile and sub-adult fish, so fish vocalizations were rare and chorusing absent within these back-reef habitats (Lyon, 2018). The soundscape, particularly at higher frequencies, was instead dominated by the short-duration impulse signals produced by resident snapping shrimp (Figure 1). To estimate the rate of snapping in each recording, these signals were detected using an envelope correlation and amplitude threshold method developed by Bohnenstiehl et al. (2016). The snap detection procedure operated in the 4–20 kHz frequency band, where these acoustic arrivals exhibited the highest signal levels relative to ambient background noise (Figure 1). We set a correlation coefficient cutoff of 0.70 and a 102 dB re 1 µPa (peak-to-peak) amplitude threshold, which corresponds to the 90% quantile of the background sound levels observed throughout the recording period. The detection kernel was derived from the local recordings and left-padded to suppress the possible detection of sea surface reflected arrivals at short time delays.
Because the rate of snapping varies temporally in response to changes in temperature, light and other environmental variables (e.g., Watanabe et al., 2002; Jung et al., 2012; Bohnenstiehl et al., 2016; Lillis et al., 2017), these data can be used to investigate the response of the acoustic metrics to variations in the rate of broadband snapping activity. Both ACI and H were calculated in the 4–20 kHz frequency band using a 30 s analysis time window. ACI and Hf were estimated using variable NFFT sizes of 1024, 2048 and 4096 points; Ht was estimated from the envelope of the band-passed (5th order Butterworth) waveform. This procedure was applied in four non-overlapping time windows and the results were averaged for each 2-min duration recording.
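The windowing and averaging step can be sketched as follows. This is a minimal illustration in plain Python; `mean_over_windows` and the toy `recording` are hypothetical names, and any per-window metric (for example an ACI or H function) could be passed in place of the simple mean used here.

```python
def mean_over_windows(samples, fs, metric, window_s=30.0):
    """Apply `metric` to consecutive non-overlapping windows and average.

    Returns the average and the list of per-window values. Samples left
    over after the last full window are ignored.
    """
    n = int(round(window_s * fs))
    n_windows = len(samples) // n
    if n_windows == 0:
        raise ValueError("recording is shorter than one analysis window")
    values = [metric(samples[i * n:(i + 1) * n]) for i in range(n_windows)]
    return sum(values) / n_windows, values

# Toy 2-min "recording" at 10 samples/s, so four 30 s windows.
fs = 10
recording = [0.0] * 300 + [1.0] * 300 + [2.0] * 300 + [3.0] * 300
mean_level = lambda w: sum(w) / len(w)
average, per_window = mean_over_windows(recording, fs, mean_level)
print(per_window, average)
```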
The sensitivity of H and ACI to variation in snap rate is illustrated in Figure 2 using data from experimental reef #7 (26.339°N, 77.018°W) within the Bight of Old Robinson (Lyon, 2018). As expected, the absolute value of ACI varied based on the spectral (Δf = fs/NFFT) and temporal (ΔT = NFFT/fs) resolution used in the analysis; however, the range of variation over the observed range of snap rates was nearly 40% for all three cases. ACI responded non-linearly to an increased rate of snapping, rising initially but then leveling off at rates > ∼1000 snaps/min. H values showed little-to-no sensitivity to spectral resolution. H decreased systematically with increasing snap rate, and although the effect was small (∼0.03 units) it was not negligible compared to variation reported in the literature (Table 2). Since the snaps are broadband, the number of snaps had no influence on spectral entropy, and the decrease in H was driven instead by a drop in temporal entropy.
Boatwhistle dominated soundscapes
The Harris Creek Oyster Sanctuary is a large-scale oyster restoration project in a tributary of the Chesapeake Bay, Maryland (Paynter et al., 2012). In May of 2015, SoundTrap 300 passive acoustic recorders were deployed at eight sites (Ricci et al., 2017) that were part of a larger study evaluating ecosystem services provided by restored oyster reefs (M.L. Kellogg, VIMS, unpublished data). Acoustic data were collected for 2 min every 30 min at a sample rate of 96 kHz. The instruments were positioned ∼0.5 m above the seabed and approximately 1.0–3.5 m below mean lower low water (MLLW) at all sites.
Analysis of the acoustic data revealed that the low-frequency (0.1–1.2 kHz), late-spring soundscape of these sites was dominated by a single call type: the boatwhistle sounds of the oyster toadfish, Opsanus tau (Figure 3). Snapping shrimp are not typically present within the central and northern Chesapeake Bay estuary, and although a small number of toadfish grunt sounds were identified in our recordings, the sounds produced by other fish species are largely absent during this time of the year (Ricci et al., 2017). Unlike the broadband snaps found in The Bahamas, the low-frequency boatwhistle calls are harmonic in nature, with up to four overtones commonly observed, and of much longer (>250 ms) duration (Figure 3). Boatwhistle call frequencies are a function of water temperature; they are nearly constant within a 2-min recording but vary on longer time scales (Tavolga, 1958; Fine, 1978; Ricci et al., 2017; Ladich, 2018).
Previously, Ricci et al. (2017) used a spectrogram correlation technique to identify boatwhistle calls within these recordings. The correlation threshold was set empirically to maintain a false positive rate of ∼1%. The rate of boatwhistle calling increased dramatically over the first few days in May as male toadfish began calling as an advertisement to attract females to their nest site. These data can therefore be used to assess the response of the acoustic metrics to variations in the rate of harmonic fish sounds. Both ACI and H were calculated in the 0.1–1.2 kHz frequency band, where the boatwhistle overtones are dominant. Following the snapping shrimp analysis, ACI and Hf were estimated using variable NFFT sizes; Ht was calculated using the envelope of the band-passed waveform. The results from four 30-second-duration non-overlapping time windows were averaged for each 2-min recording.
The sensitivity of H and ACI to variation in boatwhistle call rate is illustrated in Figure 4 using data from the restored oyster reef Little Neck (38.768°N, 76.296°W). The absolute value of ACI varied based on the spectral and temporal resolution used in the analysis. Here, the range of variation was between 35% and 65%. ACI decreased non-linearly with increasing call rate for NFFT sizes of 1024 and 2048; however, the trend was reversed when the analysis was performed with an NFFT of 4096 points. The calculated H value again showed little-to-no sensitivity to spectral resolution, but decreased notably (∼0.20 units) with increasing call rate (cf. Table 2). This drop was driven by a decrease in the spectral component of entropy, as acoustic power became increasingly concentrated in the harmonic bands.
Synthesis of single call type soundscapes
To further understand the behavior of these ecoacoustic metrics, ensembles of synthetic soundscapes (e.g., Gasc et al., 2015) with known rates of calling were constructed and analyzed. Synthetic soundscapes were simulated from a single snap, or boatwhistle, that was replicated in time and mixed with constant variance background noise. Calling was assumed to be a Poisson (random) process with the distribution of received call amplitudes derived empirically from field observations.
Simulation experiments with broadband snaps
A single snap (Figure 5) was isolated from a recording made in July 2016 within the Bight of Old Robinson, The Bahamas. This signal is representative of the snaps recorded within the Bight, being impulsive in nature with rapid onset and of very short (<1.5 ms) duration. A 5 s segment lacking discernible snaps also was identified to populate the background noise field. Each simulation produced a 30 s duration synthetic recording, for which background noise was prescribed by looping this recording. Snaps were superimposed within this background. The time of each snap was selected randomly assuming a Poisson process (i.e., exponentially distributed inter-snap times), with snaps overlapping when small inter-snap times were drawn. Snap rates between 100 and 3500 snaps/min were considered, with 1000 simulations conducted at each snap rate. The amplitude of each snap was randomly drawn from the population of snap amplitudes recorded over the course of two months of monitoring within the Bight of Old Robinson (Figure 6). An example of the resulting simulated soundscape is shown in Figure 7.
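The simulation recipe above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: it assumes NumPy, draws exponential inter-event times (the waiting times of a Poisson process), and scales a unit-peak call template by a randomly drawn amplitude. The function name `synthesize` and the toy call, noise and amplitude arrays are hypothetical placeholders for the field-derived inputs.

```python
import numpy as np

def synthesize(call, noise, amplitudes, rate_per_min, fs, duration_s=30.0, rng=None):
    """Superimpose randomly timed copies of `call` on looped background noise."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration_s * fs))
    # Loop the background segment until it fills the synthetic recording.
    reps = -(-n // len(noise))  # ceiling division
    out = np.tile(noise, reps)[:n].astype(float)
    # Poisson process: exponential waiting times with mean 60 / rate seconds.
    mean_gap = 60.0 / rate_per_min
    onsets = []
    t = rng.exponential(mean_gap)
    while t < duration_s:
        onsets.append(t)
        t += rng.exponential(mean_gap)
    unit = call / np.max(np.abs(call))  # unit-peak template (assumed scaling)
    for t0 in onsets:
        i0 = int(t0 * fs)
        seg = unit[: n - i0]  # truncate calls that run past the end
        out[i0:i0 + len(seg)] += rng.choice(amplitudes) * seg
    return out, np.array(onsets)

# Toy inputs standing in for the isolated snap, noise segment and amplitudes.
rng = np.random.default_rng(1)
fs = 48_000
k = np.arange(48)  # 1 ms template
call = np.exp(-k / 8.0) * np.sin(2 * np.pi * 8_000 * k / fs)
noise = rng.normal(0.0, 1.0, 5 * fs)
amplitudes = rng.uniform(20.0, 100.0, 500)
soundscape, onsets = synthesize(call, noise, amplitudes, 1_200, fs, rng=rng)
print(soundscape.size / fs, "s,", onsets.size, "events")
```

Overlapping events arise naturally here whenever a drawn waiting time is shorter than the template.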
Each 30 s simulation was analyzed in the 4–20 kHz range following the procedure used to assess the field recordings, with NFFT window lengths of 1024 (Δf = 93.75 Hz, ΔT = 0.011 s), 2048 (Δf = 46.88 Hz, ΔT = 0.021 s) and 4096 (Δf = 23.44 Hz, ΔT = 0.043 s) points. As expected, the absolute value of ACI varied with changing NFFT window length, but the results show a similar pattern for each set of simulations (Figure 8a). ACI initially increased with increasing snap rate, exhibiting a ∼50% range of variability, but then leveled off at higher snap rates. The acoustic entropy shows a small decrease with increasing snap rate (Figure 8b). The patterns produced by these simple simulations mirror those observed in the field recordings.
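The resolution values follow directly from Δf = fs/NFFT and ΔT = NFFT/fs. A few lines of Python reproduce the arithmetic for the 96 kHz sample rate reported for the recordings; the helper name `resolution` is hypothetical.

```python
def resolution(fs, nfft):
    """Return (frequency resolution in Hz, time step in s) for one NFFT size."""
    return fs / nfft, nfft / fs

fs = 96_000  # sample rate of the field recordings
table = {nfft: resolution(fs, nfft) for nfft in (1024, 2048, 4096)}
for nfft, (delta_f, delta_t) in table.items():
    print(f"NFFT={nfft}: delta_f={delta_f:.2f} Hz, delta_t={1e3 * delta_t:.1f} ms")
```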
Simulation experiments with boatwhistles
A single boatwhistle (Figure 9) was isolated from a recording made during May 2015 within Harris Creek, Maryland. The selected signal is 350 ms in length. A 5 s segment lacking discernible transient calls and with relatively low noise (87.3 dB-rms re 1 µPa) also was identified to seed the background sound field. As was the case in the snap simulations, each boatwhistle simulation produced a 30 s synthetic recording, for which background noise was prescribed by looping this recording. Boatwhistle arrivals were superimposed randomly within this background. Vocalization scenarios with rates between 6 and 120 calls/min were considered, with 1000 simulations conducted at each call rate. The amplitude of each call was randomly assigned based on the distribution of received call amplitudes observed in the field data (Figure 10). An example of a simulated boatwhistle-dominated soundscape is shown in Figure 11.
Each 30-s duration boatwhistle simulation was analyzed in the 0.1–1.2 kHz range using NFFT sizes of 1024, 2048 and 4096 (Figure 12). Consistent with the field observations, these simulations indicated that the response of ACI to boatwhistle call rate was dependent on the NFFT size chosen for the analysis. ACI decreased with increasing call rate when a coarse (NFFT 1024 and 2048) spectral resolution was used, but it increased with increasing call rate when assessed using a finer spectral resolution (NFFT 4096). Driven by a decrease in spectral entropy, H decreased by ∼0.2 units as boatwhistle rate increased. This pattern showed no dependence on spectral resolution. At a given call rate, the variability in ACI and H predicted by these synthesis experiments was less than that observed in the Harris Creek field recordings (Figure 12).
In this study, both H and ACI were found to be sensitive to call rate and call type. For soundscapes dominated by impulsive broadband snapping, ACI initially increased with increased snapping, but the response saturated at high snap rates (Figures 2a and 8a). When snap rates varied between ∼100 and 3500 snaps/min, a ∼40% and ∼50% range of variation in ACI was observed within the field and simulated datasets, respectively. The spectral resolution influenced the absolute value of ACI, but not the observed trends or percent range of variation in the snap dominated soundscape (Figure 13a). For soundscapes dominated by longer-duration harmonic boatwhistle calls, ACI decreased with increased calling rates when the spectral resolution was coarse (Δf = 93.75 and 46.88 Hz), but increased when a finer spectral resolution (Δf = 23.44 Hz) was used (Figures 4a, 12a, and 13a).
The range of variation in ACI driven solely by changes in the rate of biological sound production by a single species exceeds that reported in most field studies – several of which explain or correlate these changes with trends in biodiversity (Table 1). Although changes in call diversity may also drive changes in ACI (Pieretti et al., 2011; Gasc et al., 2015) these results argue that a causal relationship between the diversity of calls in a marine habitat (i.e., biodiversity) and the complexity of the underwater soundscape should not be assumed.
Our results confirm the sensitivity of ACI to variation in snap and call rate, as suggested by several authors (Table 1), and further elucidate the non-linear nature of this response. Within the low-frequency spectrum, Staaterman et al. (2017) found that ACI dropped during times of chorusing by Bocon toadfish (Amphichthys cryptocentrus), a species that produces tonal boatwhistle calls similar to the oyster toadfish. They noted a drop in ACI coincident with the times of intense calling when the calls overlapped one another and the soundscape became more monotonous. Their analysis was performed using a frequency resolution of 50 Hz, for which a drop in ACI is predicted based on our analysis (Figures 4a and 12a).
For a soundscape dominated by broadband snaps, trends in ACI were relatively insensitive to time-frequency resolution. However, ACI displayed resolution-dependence when the soundscape was dominated by harmonic boatwhistle sounds (Figure 13a). This trend can be understood by considering the time-frequency resolution used in the analysis relative to the time-frequency characteristics of the signals. To illustrate this, Figure 14a shows spectrograms of a soundscape dominated by snaps displayed with NFFT equal to 1024 (top) and 4096 (bottom). The broadband nature of the signals is evident in either presentation of the data, but when a longer time window (NFFT = 4096) is used, an increasing number of short duration (<1.5 ms) snaps are incorporated into each time step (ΔT = 42.7 ms) in the spectrogram. Figure 14b shows the equivalent spectral representations for a boatwhistle-dominated soundscape. Here, both window length choices are short relative to the call duration (350 ms); however, the harmonic character of the soundscape is not evident when the coarser frequency resolution (NFFT = 1024; Δf = 93.75 Hz) is applied, and the complexity of the spectrogram is visually reduced. ACI may therefore respond differently to call rate variation depending on the time-frequency resolution selected in the analysis and the composition (spectral bandwidth and harmonic spacing) of calls within a recording (Figure 13a).
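The effect of window length on snap-dominated spectrograms can be made concrete with a small calculation. Assuming randomly timed snaps at a given mean rate and the 96 kHz sample rate of the recordings, the expected number of snaps falling within one spectrogram time step is the rate multiplied by ΔT; the helper name `events_per_time_step` is hypothetical.

```python
def events_per_time_step(rate_per_min, fs, nfft):
    """Expected number of events per spectrogram time step of nfft / fs seconds."""
    return (rate_per_min / 60.0) * (nfft / fs)

fs = 96_000
per_step = {nfft: events_per_time_step(3_500, fs, nfft) for nfft in (1024, 4096)}
for nfft, count in per_step.items():
    print(f"NFFT={nfft}: about {count:.2f} snaps per time step at 3500 snaps/min")
```

At the highest simulated rate, most time steps contain no snap with the shorter window, whereas the longer window averages more than two per step.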
For snap dominated soundscapes, as rates increased from ∼100 to 3500 snaps/min, H values decreased ∼0.04 units in both the field and simulated recordings (Figures 8b, 12b and 13b). As spectral entropy is not sensitive to the number of broadband signals in the recording, this decrease in H was driven by changes in temporal entropy. Although the magnitude of this drop was small, variations of this order have been reported (Table 2) and correlated with ecological parameters in some field studies (Harris et al., 2016).
H was more sensitive to changes in the rate of harmonic boatwhistle calls than it was to variations in snap rate (Figure 13). Varying the rate of boatwhistle calling between 10 and 120 calls/min produced a negative ∼0.2 unit change in the total entropy. This decrease was driven by a drop in spectral entropy as energy became increasingly concentrated in the harmonic bands while call rate increased. H, which is calculated from the signal’s probability mass functions, did not show the same dependence on spectral resolution observed for ACI. The magnitude of this response suggests that changes in the rate of harmonic calling could explain much of the variability in H reported in marine field investigations (Table 2). Moreover, these findings are consistent with Staaterman et al.’s (2017) observation of a short-term drop in H during the most intense periods of calling by the Bocon toadfish.
The simulated soundscape experiments in the present study were, by design, very simple, using a replicated single call, constant variance background noise, and a random (Poisson) call time model parameterized solely by a mean rate. Nonetheless, the ACI and H derived from snap dominated soundscape simulations closely mirror those calculated for field data from The Bahamas (Figure 8). Boatwhistle calls are inherently more variable than snaps, and the low frequency noise spectrum (e.g., 0.1–1.2 kHz) in most ocean environments tends to capture a greater diversity of background sources (wind, waves, boats, and other biological sounds) than the higher frequency (e.g., 4–20 kHz) spectrum (Wenz, 1962). Therefore, it is not surprising that field data from Harris Creek show more variability than the simulated soundscapes (Figure 12). Nonetheless, the boatwhistle simulations do reproduce the overall trends observed in the field data, including the resolution-dependent response of ACI to an increased rate of harmonic calling.
Our results suggest that the amount of variation in ACI and H reported in the marine literature can be explained largely by variations in the rate of calling by a single sound producer. This does not rule out the possibility that increasing call diversity can also drive changes in H and ACI in some environments; however, it is unclear how the response of these metrics to an increase in call diversity could be readily distinguished (i.e., without counting and classifying calls) from the rate dependence identified here. Nonetheless, the sensitivity of ACI and H to a wider range of marine call types and soundscape compositions (e.g., Bolgan et al., 2018) may be the target of future synthesis experiments.
Implications for and future directions in marine ecoacoustics
Our field and computer-based simulation experiments show that (1) the widely used ecoacoustic metrics ACI and H respond to changes in the rate of calling of a single species, and (2) that these changes are sensitive to the call composition and the resolution employed in the analysis. These dependencies present a challenge for interpreting variations in ACI and H when applied to acoustically complex field recordings. While biodiversity and habitat characteristics may display correlations with ACI and H (Tables 1 and 2), a causal relationship between these metrics and the diversity of biological sounds, or the diversity of soniferous species, should not be assumed.
ACI and H are the most widely used metrics in marine soundscape studies; however, there are dozens of terrestrial soundscape metrics that might be considered in the marine realm (Sueur et al., 2014; McPherson et al., 2016), including some emerging assessment tools that may better represent the time and/or frequency dynamics within a recording (e.g., Eldridge et al., 2016; Lossent et al., 2017). Soundscape synthesis experiments similar to those conducted in this study (and by Gasc et al., 2015 in terrestrial settings) can provide a tool for understanding these metrics. An assessment of a metric’s limitations is necessary before its connection to more traditional ecological indicators (structural habitat complexity, relative abundance or diversity of species) can be meaningfully investigated.
For rapid ecoacoustic assessment, more traditional acoustic measurements continue to have utility in marine soundscape studies (e.g., Freeman and Freeman, 2016; Ricci et al., 2016; Blondel and Hatta, 2017). Unlike ACI and H, traditionally used sound pressure levels (SPLs) respond linearly to an increased rate of sound production. For example, high-frequency SPLs have been correlated with snap rate in several studies (Bohnenstiehl et al., 2016; Ricci et al., 2016; Lyon, 2018) and used to estimate snapping shrimp abundance and density (Butler et al., 2017). Similarly, low-frequency SPL has been used to estimate fish abundance during peak spawning periods (e.g., Rowell et al., 2017). SPL, however, like many acoustic metrics, may be sensitive to variation in natural (e.g., Matsumoto et al., 2014) or anthropogenic (e.g., Kaplan and Mooney, 2015) background noise.
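For reference, a root-mean-square SPL can be computed in a few lines. This sketch assumes the waveform has already been calibrated to pressure in µPa, so that the result is in dB re 1 µPa; the function name `spl_rms_db` and the toy signals are hypothetical.

```python
import math

def spl_rms_db(pressure_upa):
    """RMS sound pressure level in dB re 1 µPa for calibrated samples in µPa."""
    mean_square = sum(p * p for p in pressure_upa) / len(pressure_upa)
    # 10*log10 of the mean square equals 20*log10 of the rms pressure.
    return 10.0 * math.log10(mean_square)

quiet = [1_000.0, -1_000.0] * 50  # 1000 µPa rms
loud = [2_000.0, -2_000.0] * 50   # doubled pressure amplitude
print(f"{spl_rms_db(quiet):.1f} dB, {spl_rms_db(loud):.1f} dB re 1 µPa")
```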
A more direct accounting of the bioacoustic signals comprising a soundscape would arguably expand the ecological utility of marine passive acoustic datasets. Call detection and classification have long been embraced by the marine mammal community (Mellinger and Clark, 2000; Mellinger and Clark, 2006; Kandia and Stylianou, 2006; Jarvis et al., 2008; Roch et al., 2011; Klink and Mellinger, 2011), and recent efforts have shown that similar approaches can be applied to identify the sounds produced by invertebrates (Bohnenstiehl et al., 2016) and fish (Urazghildiiev and Van Parijs, 2016; Ricci et al., 2017). These methods are data-rich in that they provide quantitative information on the timing, source and character (amplitude, duration, frequency) of transient biological sounds, as opposed to metrics such as ACI and H, which reflect the statistical properties of an entire recording. As advances in statistical (e.g., Noda et al., 2016; Ibrahim et al., 2018) and deep learning methods make it possible to concurrently track the noises produced by a much broader suite of marine animals, call cataloging efforts may eventually supersede the need for proxy metrics that do not directly capture species-specific contributions to the soundscape.