AI Detects Whale Calls in Seismic Data

Seismological Society of America

Researchers repurposed an AI model designed for visual identification tasks to detect Bryde's whale calls contained within seismic data collected in the South China Sea.

A whale call can be visualized as a spectrogram, a "snapshot of sound" that shows how the frequencies of the sound signal change over time.
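As a rough illustration of what a spectrogram is, the sketch below slices a signal into short overlapping frames and measures the strength of each frequency in every frame. The function name, frame length, hop size and the synthetic 8 Hz tone are all assumptions chosen for the example; none of them come from the study.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of frequency-bin magnitudes per time frame."""
    # A Hann window tapers each frame to reduce spectral leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Direct discrete Fourier transform over the non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic test signal (hypothetical): an 8 Hz tone sampled at 64 Hz.
fs = 64
frame_len = 64
signal = [math.sin(2 * math.pi * 8 * t / fs) for t in range(256)]

spec = spectrogram(signal, frame_len=frame_len, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / frame_len  # bin index converted to hertz
print(len(spec), "frames; strongest frequency in first frame:", peak_hz, "Hz")
```

Plotting the rows of `spec` side by side, with time on one axis and frequency on the other, gives the image that a segmentation model can then work on.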

Without being trained previously on any whale call spectrograms, the foundation Segment Anything Model (SAM) used by Zhuo Xiao of Guangxi Minzu University and colleagues was able to detect and separate whale calls from data recorded at a single seismic station on Xieyang Island in the Beibu Gulf.

The detection system precisely identified calls more than 96% of the time, and even identified some call events that were missed when the data were examined by hand, according to the study published in Seismological Research Letters.

Accurate detection of baleen whale calls can help scientists monitor seasonal activity, behavioral changes and population sizes of these important marine mammals. Researchers use passive acoustic monitoring of calls by seismometers and hydrophones as a noninvasive and scalable way to monitor whale ecology.

The method still has significant challenges, however, including the problem of sorting whale calls from other environmental noise and the lack of a sufficient library of whale call spectrograms to guide identification.

Whale calls manifest as clear, repeating patterns on spectrograms, Xiao noted. "That turns call detection into an image segmentation task, which SAM excels at," he said. Moreover, "the calls' repetitiveness and fixed frequency range offer straightforward domain priors," further reinforcing SAM as a natural fit for this task.
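One way such domain priors could be applied is as a filter over candidate segments returned by a segmentation model. The sketch below is an assumption for illustration only, not the researchers' pipeline: the `Segment` type, the `apply_priors` helper, the 10 to 50 Hz band and the 5-second gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    t_start: float  # seconds
    t_end: float    # seconds
    f_low: float    # Hz
    f_high: float   # Hz


def apply_priors(segments, band=(10.0, 50.0), max_gap=5.0):
    """Keep segments inside the expected band that have a close neighbor in time."""
    # Prior 1: fixed frequency range (hypothetical band).
    in_band = sorted(
        (s for s in segments if s.f_low >= band[0] and s.f_high <= band[1]),
        key=lambda s: s.t_start,
    )
    kept = []
    for i, seg in enumerate(in_band):
        # Prior 2: repetitiveness, so an isolated segment is treated as noise.
        prev_close = i > 0 and seg.t_start - in_band[i - 1].t_end <= max_gap
        next_close = (i + 1 < len(in_band)
                      and in_band[i + 1].t_start - seg.t_end <= max_gap)
        if prev_close or next_close:
            kept.append(seg)
    return kept


# Made-up candidates: three repeating in-band calls, one out-of-band, one isolated.
candidates = [
    Segment(0.0, 1.0, 20.0, 40.0),
    Segment(3.0, 4.0, 20.0, 40.0),
    Segment(5.0, 6.0, 60.0, 90.0),
    Segment(6.0, 7.0, 22.0, 38.0),
    Segment(100.0, 101.0, 20.0, 40.0),
]
kept = apply_priors(candidates)
print([s.t_start for s in kept])
```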

To test the capabilities of SAM, Xiao and colleagues analyzed seismic data recorded by the Xieyang station on 26 January and 11 July 2021, dates chosen to capture the different seasonal vocalization modes used by Bryde's whales. The Beibu Gulf area is a key shallow-water foraging area for the whales.

The researchers also included a multi-stage denoising workflow that raises the ratio of call signal to environmental noise in the data and so improves detection precision.
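The article does not describe the individual denoising stages. As a hedged example of one stage such a workflow might include, the sketch below subtracts each frequency bin's median level from a spectrogram, a common way to suppress steady background noise. The `subtract_background` name and the toy values are assumptions, not the study's method or data.

```python
from statistics import median


def subtract_background(spec):
    """spec: list of time frames, each a list of magnitudes per frequency bin."""
    n_bins = len(spec[0])
    # The median over time estimates the steady noise level in each bin.
    background = [median(frame[k] for frame in spec) for k in range(n_bins)]
    # Remove that level and clip at zero so only transient energy remains.
    return [[max(frame[k] - background[k], 0.0) for k in range(n_bins)]
            for frame in spec]


# Toy spectrogram: bin 1 carries constant noise, bin 0 has one brief event.
toy_spec = [
    [1.0, 5.0, 1.0],
    [1.0, 5.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 5.0, 1.0],
    [1.0, 5.0, 1.0],
]
cleaned = subtract_background(toy_spec)
print(cleaned[2])
```

The constant band disappears while the brief event in the third frame survives, which is the effect a higher call-to-noise ratio refers to.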

Their data show clear seasonal variation in the Bryde's whales' inter-pulse interval, the time between the first and second acoustic pulses in a click vocalization. The interval was shorter in the winter and longer in the summer. The researchers suggest these differences reflect more intense vocal coordination between individuals in the winter, compared to more solitary calling in the summer.
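To make the inter-pulse interval concrete, the sketch below computes it for each call from a list of pulse onset times and compares the seasonal medians. The onset times are invented for illustration and are not measurements from the study; the function name is likewise hypothetical.

```python
from statistics import median


def inter_pulse_interval(pulse_times):
    """Seconds between the first and second pulse, or None if there are not two."""
    if len(pulse_times) < 2:
        return None
    first, second = sorted(pulse_times)[:2]
    return second - first


# Invented pulse onset times in seconds, grouped by season.
calls = {
    "winter": [[0.0, 1.0], [10.0, 11.5, 13.0], [20.0, 21.25]],
    "summer": [[0.0, 2.0], [10.0, 12.5], [20.0]],
}

median_ipi = {}
for season, season_calls in calls.items():
    intervals = [inter_pulse_interval(c) for c in season_calls]
    # Calls with a single detected pulse have no interval and are skipped.
    intervals = [i for i in intervals if i is not None]
    median_ipi[season] = median(intervals)

print(median_ipi)
```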

Xiao and colleagues also validated their model on fin whale recordings from Ireland and blue whale recordings from Canada, following a reviewer's concern about the generalization of the model.

"We were pleasantly surprised by the strong performance. I think this reflects the power of foundation models" that are pre-trained on massive generic image datasets, Xiao said. "They generalize remarkably well to new domains without major workflow changes."

The researchers say their technique could be used to supplement human identification of calls.

"Of course, there are still false positives and missed calls," said Xiao. "In the future, we plan to incorporate multi‑modal sensing data such as acoustic plus seismic and other oceanographic measurements, and fine‑tune a foundation model that is specifically adapted to cetacean calls, further improving performance."

Xiao and colleagues are also testing whether combined data from two island stations, one ocean bottom seismometer and a fiber optic distributed acoustic sensing array will enhance detection of whale calls.

Their study was published as part of SRL's upcoming Focus Section on environmental seismology.
