RealFuVSR Unveils Enhanced Real-World Video Super-Resolution

Beijing Zhongke Journal Publishing Co. Ltd.

Video super-resolution (VSR), a popular research topic in recent years, is regarded as a challenging task because supplementary information must be gathered from multiple video frames to perform the recovery. VSR aims to recover a more realistic high-quality video from a low-quality video that has undergone unknown degradation (e.g., compression, downsampling, blurring, or noise).
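To make the idea of a "degraded low-quality video" concrete: training inputs are often synthesized from clean frames with the classical degradation model LR = (HR convolved with a blur kernel k), downsampled by a factor s, plus noise n, sometimes followed by compression. The NumPy sketch below applies Gaussian blur, strided downsampling, additive Gaussian noise, and 8-bit quantization to one grayscale frame. It is a generic illustration only, not RealFuVSR's degradation pipeline: the helper names (gaussian_kernel, blur, degrade), the kernel size, sigma, noise level, and the use of plain quantization as a crude stand-in for real video compression are all assumptions.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Normalized 2D isotropic Gaussian blur kernel (size and sigma are illustrative)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, kernel):
    """2D correlation with reflect padding; equals convolution for a symmetric kernel."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def degrade(hr, scale=4, sigma=1.5, noise_std=0.02, levels=256, rng=None):
    """Hypothetical pipeline: blur -> downsample -> add noise -> quantize.

    Quantizing to `levels` gray levels is only a crude stand-in for compression.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    lr = blur(hr, gaussian_kernel(sigma=sigma))[::scale, ::scale]  # keep every scale-th pixel
    lr = lr + rng.normal(0.0, noise_std, lr.shape)                 # additive Gaussian noise
    lr = np.clip(lr, 0.0, 1.0)
    return np.round(lr * (levels - 1)) / (levels - 1)              # snap to a fixed value grid
```

For example, a 64x64 frame with values in [0, 1] degraded with scale=4 becomes a 16x16 frame whose values lie on a 256-level grid. Real-world pipelines typically randomize each stage and use genuine codecs, which this sketch deliberately omits.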

Spatial alignment, a vital component of VSR, is responsible for aligning highly relevant but unaligned features for subsequent recovery. Many methods have been proposed to solve the alignment problem in VSR. Earlier methods typically used optical flow to predict the motion field between a reference (neighboring) frame and the target frame, and then used that motion field to warp the reference frame to the target frame. Subsequent approaches adopted more complex implicit alignment methods. For example, TDAN used deformable convolution to align frames at the feature level; this proved viable, but the training process was unstable. EDVR uses multiscale deformable convolution for alignment, and RBPN aggregates multiple frames through multiple projection modules, which improves its performance but also adds complexity to the model.
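The flow-based alignment step described above comes down to backward warping: for every pixel of the target frame, the estimated flow says where to sample the neighboring frame, and bilinear interpolation handles fractional positions. The NumPy sketch below assumes a single-channel frame, a dense flow array of shape (H, W, 2) holding (dx, dy), and border clamping; the function name flow_warp and these conventions are assumptions of this sketch, not the code of any method named above, and in practice the flow itself would come from a learned estimator.

```python
import numpy as np

def flow_warp(img, flow):
    """Backward-warp img (H, W) with a dense flow field (H, W, 2) of (dx, dy).

    Output pixel (y, x) samples img at (y + dy, x + dx) by bilinear
    interpolation; sample positions outside the image are clamped to the border.
    """
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)   # sampling x-coordinates
    sy = np.clip(ys + flow[..., 1], 0, h - 1)   # sampling y-coordinates
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0                                 # fractional parts = interpolation weights
    wy = sy - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom
```

As a usage example, if the neighboring frame is the target shifted one pixel to the right (np.roll(target, 1, axis=1)), a constant flow of dx = +1 recovers the target everywhere except the clamped right-most column. Any error in the estimated flow is copied directly into the warped result, which is why flow accuracy bounds this family of methods.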

BasicVSR is a robust backbone. However, its effectiveness is constrained by the precision of optical flow estimation, and incorrectly aligned features also degrade the alignment of the next frame. In real-world VSR, errors accumulate during propagation, which can amplify noise and hinder video restoration.
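The error-accumulation argument can be seen in a toy recurrent loop. In the NumPy sketch below, a hidden state is warped by an estimated flow at each step and blended with the current frame, h_t = keep * warp(h_{t-1}, flow_t) + (1 - keep) * x_t. This is a deliberately simplified, hypothetical stand-in: BasicVSR's actual propagation is bidirectional and uses learned residual blocks, whereas here the fixed blend weight keep, the 1D periodic signal, and the constant 0.3-sample flow error are all assumptions chosen only to expose the effect.

```python
import numpy as np

def warp_1d(signal, shift):
    """Backward-warp a periodic 1D signal by a fractional shift with linear interpolation."""
    n = signal.shape[0]
    src = (np.arange(n) - shift) % n        # where each output sample is read from
    base = np.floor(src)
    lo = base.astype(int) % n               # % n guards against src rounding up to n
    hi = (lo + 1) % n
    w = src - base
    return (1 - w) * signal[lo] + w * signal[hi]

def propagate(frames, flows, keep=0.8):
    """Toy forward propagation: warp the hidden state, then blend in the new frame."""
    h = frames[0]
    states = [h]
    for x, f in zip(frames[1:], flows):
        h = keep * warp_1d(h, f) + (1 - keep) * x
        states.append(h)
    return states

def misalignment(frames, states):
    """Mean absolute difference between each hidden state and its current frame."""
    return [float(np.mean(np.abs(h - x))) for h, x in zip(states, frames)]

# Toy video: a sinusoid that moves exactly 1 sample per frame.
n, T = 64, 12
base_signal = np.sin(2 * np.pi * np.arange(n) / n)
frames = [np.roll(base_signal, t) for t in range(T)]

exact = misalignment(frames, propagate(frames, [1.0] * (T - 1)))   # perfect flow
biased = misalignment(frames, propagate(frames, [1.3] * (T - 1)))  # 0.3-sample flow error
```

With perfect flow the hidden state stays aligned with each frame up to floating-point rounding. With a small constant flow error, the misalignment after the last frame is several times larger than after the first step, because each warped state already carries the previous steps' shift errors.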

In this work, we redesigned BasicVSR by means of deformable convolution, a multi-scale feature extraction module (MSF), a cascade residual upsampling module, and a simulation of real-world degradation. Using these methods, hidden state information can be propagated and aggregated more effectively.
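Deformable convolution, used for alignment by TDAN and EDVR and adopted here in place of pure flow warping, adds a learned offset to every kernel tap: y(p) = sum over k of w_k * x(p + p_k + offset_k(p)), where p_k are the regular 3x3 grid positions and x is read with bilinear interpolation at fractional positions. The NumPy sketch below shows only that sampling arithmetic for one channel with given offsets; the function names, the (H, W, 9, 2) offset layout in (dy, dx) order, and zero padding are assumptions, and in a real network the offsets are predicted from features of both frames (library operators such as torchvision.ops.deform_conv2d provide optimized multi-channel versions).

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Read img at real-valued (y, x); positions outside the image count as zero."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))  # bilinear weight
                val += weight * img[yy, xx]
    return val

def deform_conv3x3(img, weight, offsets):
    """Single-channel 3x3 deformable convolution, stride 1 (hypothetical helper).

    weight:  (3, 3) kernel.
    offsets: (H, W, 9, 2) offsets (dy, dx) per output pixel and kernel tap.
    """
    h, w = img.shape
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # regular grid p_k
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                acc += weight[dy + 1, dx + 1] * bilinear_sample(img, i + dy + oy, j + dx + ox)
            out[i, j] = acc
    return out
```

With all offsets set to zero this reduces to an ordinary zero-padded 3x3 convolution; a uniform offset of +1 in x reproduces that convolution evaluated one pixel to the right. Because each output pixel can choose its own sampling positions, the layer can compensate for motion that a single flow vector per pixel describes poorly.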

The contributions of this study are as follows:

• We propose a new video restoration model, RealFuVSR, which can extract and fuse features from multiple scales and eliminate confusing artifacts during propagation.

• RealFuVSR uses advanced alignment and upsampling methods to restore high-quality frames while keeping the number of parameters in check.

Qualitative and quantitative evaluations of our model showed that RealFuVSR can recover high-quality videos with richer textures and details. Our RealFuVSR model outperforms the most recent Real-BasicVSR and Real-ESRGAN models.
