Multi-Scale Facial Videos Pulse Extraction Network

Beijing Zhongke Journal Publishing Co. Ltd.

Heart rate (HR) reflects a patient's cardiac function. In recent years, remote photoplethysmography (rPPG), a non-contact HR detection method, has become an active research topic that attracts a growing number of researchers. rPPG can be used in scenarios where contact-based methods are unsuitable, such as monitoring emotional states, monitoring drivers[2], detecting biometric liveness, and detecting fake videos. rPPG is essentially a signal separation problem: the target pulse signal must be separated from the observed signal. Conventional rPPG methods fall into two groups: blind-source-separation-based methods and model-based methods.

To address the limitations of these conventional methods, Prof. Yuanjing Feng designed an rPPG extraction network with two main components: separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). Without damaging the input feature map, SSTC moves the convolution kernel in three directions (H-T, W-T, and H-W) to extract spatiotemporal features while placing greater emphasis on temporal information. DSAT extracts features in different directions and then fuses the features in each direction to obtain one-dimensional (1D) features along the three dimensions (H, W, and T). A spatiotemporal attention matrix is then derived from the feature distribution in each dimension, which enables interaction between spatial information and long-span temporal information.
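To make the two components more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It works on a single-channel feature map of shape T×H×W, reads "without damaging the input feature map" as preserving that shape through zero padding, and assumes the outputs of the three directional SSTC branches are summed. For DSAT it uses parameter-free mean pooling followed by a sigmoid to obtain the 1D features along T, H, and W, and forms the attention matrix as their outer product. The names `plane_conv`, `sstc`, and `dsat` are hypothetical; the real network learns its kernels and attention weights.

```python
import numpy as np


def plane_conv(x, k, axes):
    """Zero-padded 'same' 2D cross-correlation of a 3D array x with kernel k
    over the two given axes; the remaining axis is processed independently."""
    ka, kb = k.shape  # odd kernel sizes assumed so the output keeps x's shape
    pad = [(0, 0)] * 3
    pad[axes[0]] = (ka // 2, ka // 2)
    pad[axes[1]] = (kb // 2, kb // 2)
    xp = np.pad(x, pad)  # zero padding keeps T x H x W unchanged
    out = np.zeros(x.shape, dtype=float)
    for i in range(ka):
        for j in range(kb):
            sl = [slice(None)] * 3
            sl[axes[0]] = slice(i, i + x.shape[axes[0]])
            sl[axes[1]] = slice(j, j + x.shape[axes[1]])
            out += k[i, j] * xp[tuple(sl)]
    return out


def sstc(x, k_ht, k_wt, k_hw):
    """Hypothetical SSTC sketch on x of shape (T, H, W): one 2D kernel slides
    over each plane (H-T, W-T, H-W); summing the branches is an assumption."""
    return (plane_conv(x, k_ht, (0, 1))     # H-T plane, one pass per W column
            + plane_conv(x, k_wt, (0, 2))   # W-T plane, one pass per H row
            + plane_conv(x, k_hw, (1, 2)))  # H-W plane, one pass per frame


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dsat(x):
    """Hypothetical parameter-free DSAT sketch on x of shape (T, H, W).
    Mean pooling stands in for the learned per-direction feature extraction."""
    a_t = sigmoid(x.mean(axis=(1, 2)))  # 1D feature along T
    a_h = sigmoid(x.mean(axis=(0, 2)))  # 1D feature along H
    a_w = sigmoid(x.mean(axis=(0, 1)))  # 1D feature along W
    # Outer product of the three 1D features -> T x H x W attention matrix
    attn = a_t[:, None, None] * a_h[None, :, None] * a_w[None, None, :]
    return x * attn, attn


x = np.random.default_rng(0).standard_normal((8, 4, 4))
k = np.full((3, 3), 1 / 9)  # toy smoothing kernel, not a learned one
y, attn = dsat(sstc(x, k, k, k))
print(y.shape, attn.shape)  # (8, 4, 4) (8, 4, 4)
```

Because each kernel spans only two of the three axes, every branch mixes time with one spatial axis (or the two spatial axes with each other) at the cost of a 2D kernel rather than a full 3D one.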

The author proposed a multi-scale facial video pulse extraction network based on SSTC and DSAT. In this network, the original region of interest (ROI) was projected onto multiple scale spaces for initial signal separation. SSTC and DSAT were designed to model spatiotemporal correlation effectively: they jointly learn information across the spatial and temporal dimensions over long time spans and adaptively strengthen temporal characteristics. Experiments on public datasets compared the network with state-of-the-art (SOTA) rPPG algorithms. The results showed that fusing multi-scale signals outperformed methods based on single-scale signals alone, and that SSTC and the dimension separable attention mechanism contributed to more accurate pulse signal extraction. Future research will focus on the multidimensional extraction of pulse signals and on extracting pulse signals under unconstrained conditions.
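The multi-scale idea can be illustrated with a small NumPy sketch. It is a naive stand-in for the paper's learned pipeline, built on several assumptions: each frame of an ROI video (T×H×W×3) is average-pooled onto 1×1, 2×2, and 4×4 grids (the scale set is arbitrary), the green-channel trace of every grid cell is z-score normalized, and the traces are averaged within each scale and then across scales. The helper names `grid_pool`, `multi_scale_traces`, and `fuse` are hypothetical, and the synthetic video is for illustration only.

```python
import numpy as np


def grid_pool(video, s):
    """Average-pool every frame of video (T, H, W, C) onto an s x s grid.
    Assumes H and W are divisible by s; returns shape (T, s, s, C)."""
    t, h, w, c = video.shape
    return video.reshape(t, s, h // s, s, w // s, c).mean(axis=(2, 4))


def multi_scale_traces(video, scales=(1, 2, 4)):
    """Project the ROI onto several grid scales (the scale set is an assumption)."""
    return {s: grid_pool(video, s) for s in scales}


def zscore(sig):
    """Normalize each column of a (T, N) array to zero mean and unit variance."""
    return (sig - sig.mean(axis=0)) / (sig.std(axis=0) + 1e-8)


def fuse(traces, channel=1):
    """Naive fusion: z-score the chosen channel (green by default) of every
    grid cell, average cells within a scale, then average across scales."""
    per_scale = []
    for tr in traces.values():
        cells = tr[..., channel].reshape(tr.shape[0], -1)  # (T, s*s)
        per_scale.append(zscore(cells).mean(axis=1))       # (T,)
    return np.mean(per_scale, axis=0)                       # (T,)


# Synthetic check: a 1.2 Hz pulse in the green channel plus pixel noise, 30 fps.
rng = np.random.default_rng(0)
frames = np.arange(64)
pulse = np.sin(2 * np.pi * 1.2 * frames / 30)
video = 0.5 + 0.01 * rng.standard_normal((64, 8, 8, 3))
video[..., 1] += 0.05 * pulse[:, None, None]
fused = fuse(multi_scale_traces(video))
print(fused.shape)  # (64,)
```

Uniform averaging is only the simplest possible fusion rule; it is used here to keep the example self-contained, whereas coarse grids suppress pixel noise and fine grids retain local spatial variation that a learned fusion could exploit.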

Public Release. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).