SIAT VIDEO TEAM
Journal Papers
Generative Adversarial Network Based Intra Prediction for Video Coding      [PDF]

Linwei Zhu, Sam Kwong, Yun Zhang, Shiqi Wang, and Xu Wang

IEEE Transactions on Multimedia (IEEE T-MM), vol.22, no.1, pp. 45-58, Feb. 2020.

In this paper, a novel intra prediction method is proposed to improve video coding performance, in which a generative adversarial network (GAN) is adopted to intelligently remove spatial redundancy through the inference process. The proposed GAN-based method improves the prediction by exploiting more information and generating more flexible prediction patterns. In particular, intra prediction is modeled as an inpainting task, which is accomplished with the GAN model by filling in the missing part conditioned on the available reconstructed pixels. The learned GAN model is incorporated into both the video encoder and decoder, and rate-distortion optimization is performed for the competition between GAN-based intra prediction and traditional angular-based intra prediction to achieve better coding performance. The proposed scheme is implemented in the high-efficiency video coding test model (HM 16.17) and the versatile video coding test model (VTM 1.1)...
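
The competition between the two predictors can be illustrated with a minimal sketch. It assumes the commonly used Lagrangian cost J = D + λR; the cost actually used in the paper is not stated in this abstract, and the mode names, distortion values, bit counts, and λ values below are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def select_mode(candidates, lam):
    """Return the name of the candidate with the lowest RD cost.

    `candidates` maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


# Hypothetical measurements for one block: (distortion, bits).
candidates = {"angular": (120.0, 40), "gan": (90.0, 46)}

best_low_lambda = select_mode(candidates, lam=4.0)    # distortion dominates
best_high_lambda = select_mode(candidates, lam=10.0)  # rate dominates
```

With these made-up numbers, the lower-distortion candidate wins at the smaller λ and the cheaper-to-signal candidate wins at the larger λ, which is the trade-off the optimization arbitrates per block.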

Sparse Representation based Video Quality Assessment for Synthesized 3D Videos      [PDF]

Yun Zhang*, Huan Zhang, Mei Yu, Sam Kwong, and Yo-Sung Ho

IEEE Transactions on Image Processing (IEEE T-IP), vol.29, pp.509-524, Dec. 2020

Temporal flicker distortion is one of the most annoying artifacts in synthesized virtual view videos rendered from compressed multi-view video plus depth in a three-dimensional (3D) video system. To assess the quality of synthesized view videos and further optimize compression techniques in 3D video systems, an objective video quality assessment method that accurately measures flicker distortion is highly needed. In this paper, we propose a full-reference sparse representation-based video quality assessment method for synthesized 3D videos. First, a synthesized video, treated as a 3D...

Machine Learning Based Video Coding Optimizations: A Survey      [PDF]

Yun Zhang, Sam Kwong*, Shiqi Wang

Information Sciences (INS), Elsevier, vol.506, pp.395-423, 2020.

Video data has become the largest source of data consumed globally. However, the development of video coding has become saturated to some extent, even though the compression ratio has grown continuously over the last three decades. Machine learning algorithms, especially those employing deep learning, are capable of discovering knowledge from massive unstructured data and providing data-driven predictions, and thus offer new opportunities for further upgrading video coding technologies. In this article, we present a review of machine learning based video encoding optimization, aiming to provide researchers...

Circular Intra Prediction for 360 Degree Video Coding      [PDF]

Linwei Zhu, Yun Zhang*, Na Li, Jinyong Pi, Shiqi Wang

Journal of Visual Communication and Image Representation (JVCI), 2020.12, Accepted

Unlike traditional 2D video, the content of 360 degree video is deformed by the projection from the 3D sphere to the 2D plane. As a result, the traditional Angular Intra Prediction (AIP) with a linear pattern may not always be efficient. To further improve the coding performance of 360 degree video, a novel intra prediction method is presented in this paper, i.e., Circular Intra Prediction (CIP), which takes into consideration the spherical characteristics of 360 degree video. Specifically, the proposed CIP is performed in a circular pattern, where the center of the circle is located around the to-be-predicted...

Deep Learning-Based Chroma Prediction for Intra Versatile Video Coding      [PDF]

Linwei Zhu, Yun Zhang* (*Corresponding Author), Shiqi Wang, Sam Kwong, Xin Jin, and Yu Qiao

IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2020.10, Accepted.

DOI:10.1109/TCSVT.2020.3035356

Color images always exhibit a high correlation between luma and chroma components. The cross component linear model (CCLM) has been introduced to exploit this correlation and remove redundancy in the ongoing video coding standard, i.e., versatile video coding (VVC). To further improve the coding performance, this paper presents a deep learning based intra chroma prediction method, termed convolutional neural network based chroma prediction (CNNCP). More specifically, the process of chroma prediction is formulated to produce the colorful version from available information input...
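
For context, the linear-model idea behind CCLM can be sketched as predicting chroma from co-located reconstructed luma via pred_C = α · rec_L + β. The sketch below is a simplified floating-point illustration that fits the line through the neighbouring samples with the smallest and largest luma; it is not VVC's normative integer derivation and not the CNNCP method. All function names and sample values are hypothetical.

```python
def fit_linear_model(neigh_luma, neigh_chroma):
    """Fit chroma = alpha * luma + beta through the min-luma and max-luma neighbours."""
    lo = min(range(len(neigh_luma)), key=neigh_luma.__getitem__)
    hi = max(range(len(neigh_luma)), key=neigh_luma.__getitem__)
    if neigh_luma[hi] == neigh_luma[lo]:
        # Flat luma neighbourhood: fall back to the mean chroma value.
        return 0.0, sum(neigh_chroma) / len(neigh_chroma)
    alpha = (neigh_chroma[hi] - neigh_chroma[lo]) / (neigh_luma[hi] - neigh_luma[lo])
    beta = neigh_chroma[lo] - alpha * neigh_luma[lo]
    return alpha, beta


def predict_chroma(luma_block, alpha, beta):
    """Apply the linear model to every reconstructed luma sample in the block."""
    return [[alpha * l + beta for l in row] for row in luma_block]


# Hypothetical neighbouring reconstructed samples (already at chroma resolution).
alpha, beta = fit_linear_model([100, 120, 140, 160], [60, 70, 80, 90])
prediction = predict_chroma([[110, 130]], alpha, beta)
```

A learned predictor such as the one described above replaces this single straight line with a more expressive mapping from the same kind of available information.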

A Novel Deep Neural Network Based Approach for Sparse Code Multiple Access      [PDF]

Jinzhi Lin, Shengzhong Feng, Yun Zhang* (*Corresponding Author), Zhile Yang, Yong Zhang

Neurocomputing, vol. 382, pp. 52-63, 2020

Sparse code multiple access (SCMA) is one of the non-orthogonal multiple access (NOMA) schemes aiming to support the high spectral efficiency and ubiquitous access requirements of 5G communication networks. Conventional SCMA approaches face challenges in designing low-complexity, high-accuracy decoding algorithms and in constructing optimum codebooks. Fortunately, deep learning technologies, which have recently come into the spotlight, show significant potential for solving many communication engineering problems. Inspired by this, we propose and train a deep neural network (DNN) called DL-SCMA...

SUR-FeatNet: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Feature Learning      [PDF]

Hanhe Lin, Vlad Hosu, Chunling Fan, Yun Zhang, Yuchen Mu, Raouf Hamzaoui, and Dietmar Saupe

Quality and User Experience, vol. 5, article no. 5, May 2020

The satisfied user ratio (SUR) curve for a lossy image compression scheme, e.g., JPEG, characterizes the complementary cumulative distribution function of the just noticeable difference (JND), the smallest distortion level that can be perceived by a subject when a reference image is compared to a distorted one. A sequence of JNDs can be defined with a suitable successive choice of reference images. We propose the first deep learning approach to predict SUR curves. We show how to apply maximum likelihood estimation and the Anderson-Darling test to select a suitable parametric model for the distribution function...
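
The definition above can be made concrete with a small empirical sketch: since the SUR curve is the complementary cumulative distribution function of the JND, SUR(x) is the fraction of subjects whose JND lies above distortion level x. The function name and the JND samples below are hypothetical, and this computes only the empirical curve from subjective data, not the deep learning predictor proposed in the paper.

```python
def empirical_sur(jnd_samples, level):
    """Empirical complementary CDF: fraction of subjects with JND > level."""
    return sum(1 for jnd in jnd_samples if jnd > level) / len(jnd_samples)


# Hypothetical per-subject JND positions, expressed as distortion levels.
jnd_samples = [10, 20, 20, 30, 40]

# Evaluate the curve at a few distortion levels.
sur_curve = [(level, empirical_sur(jnd_samples, level)) for level in (5, 20, 40)]
```

Subjects whose JND exceeds the chosen distortion level cannot perceive the difference from the reference, so they count as satisfied; the curve therefore decreases from 1 to 0 as distortion grows.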

Efficient In-loop Filtering Based on Enhanced Deep Convolutional Neural Networks for HEVC      [PDF]

Zhaoqing Pan, Xiaokai Yi, Yun Zhang, Byeungwoo Jeon, Sam Kwong

IEEE Transactions on Image Processing (IEEE T-IP), vol.29, pp. 5352-5366, March 2020

Raw video data can be compressed substantially by the latest video coding standard, high efficiency video coding (HEVC). However, the block-based hybrid coding used in HEVC introduces many artifacts into compressed videos, which severely degrades video quality. To address this problem, in-loop filtering is used in HEVC to eliminate artifacts. Inspired by the success of deep learning, we propose an efficient in-loop filtering algorithm based on enhanced deep convolutional neural networks (EDCNN) to significantly improve the performance of in-loop filtering in HEVC. Firstly, the problems of traditional convolutional neural network models, including the normalization method, network learning ability, and loss function, are analyzed...

Viewport Perception Based Blind Stereoscopic Omnidirectional Image Quality Assessment      [PDF]

Yubin Qi, Gangyi Jiang, Mei Yu, Yun Zhang, and Yo-Sung Ho

IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2020.12, Accepted.

DOI:10.1109/TCSVT.2020.3043349

Compared with traditional 2D images, stereoscopic omnidirectional images (SOIs) usually involve more complex perceptual factors due to the particularities of imaging and display, making the objective quality assessment of SOIs challenging. In this paper, we construct a large and diverse subjective SOI database, named NBU-SOID, to support further research. We then propose a viewport perception based blind SOI quality assessment (VP-BSOIQA) method that considers the impacts of viewport, user behavior, and stereoscopic perception on the human visual system, which is mainly composed of binocular perception...

Deep Learning Based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression      [PDF]

Huanhua Liu, Yun Zhang* (*Corresponding Author), Huan Zhang, Chunling Fan, Sam Kwong, C.C. Jay Kuo, and Xiaoping Fan

IEEE Transactions on Image Processing (IEEE T-IP), vol.29, no.1, pp. 641-656, Dec. 2020.

Picture Wise Just Noticeable Difference (PW-JND), which accounts for the minimum difference in a picture that the human visual system can perceive, can be widely used in perception-oriented image and video processing. However, conventional Just Noticeable Difference (JND) models calculate the JND threshold for each pixel or sub-band separately, which may not accurately reflect the total masking effect of a picture...

Conference Papers
Satisfied user ratio prediction with support vector regression for compressed stereo images.

Chunling Fan, Yun Zhang, Raouf Hamzaoui, Djemel Ziou, Qingshan Jiang.

IEEE International Conference on Multimedia and Expo (ICME) Workshop, London, United Kingdom, July 2020.

We propose the first method to predict the Satisfied User Ratio (SUR) for compressed stereo images. The method consists of two main steps. First, considering binocular vision properties, we extract three types of features from stereo images: image quality features, monocular visual features, and binocular visual features. Then, we train a Support Vector Regression (SVR) model to learn a mapping function from the feature space to the SUR values...

Sparse Representation-Based Intra Prediction for Lossless/Near Lossless Video Coding

Linwei Zhu, Yun Zhang, Na Li, Jinyong Pi, Xinju Wu

IEEE Conference on Visual Communications and Image Processing (VCIP'2020), Lecture Notes in Computer Sciences, Macau, Dec. 2020.

In this paper, a novel intra prediction method is presented for lossless/near lossless High Efficiency Video Coding (HEVC), termed Sparse Representation based Intra Prediction (SRIP). Specifically, the existing Angular Intra Prediction (AIP) modes in HEVC are organized as a mode dictionary, which is used to sparsely represent the visual signal by minimizing the difference with respect to the ground truth. To keep the encoder and decoder matched, the sparse coefficients must also be encoded and transmitted to the decoder side...

Content-aware Hybrid Equi-angular Cubemap Projection for Omnidirectional Video Coding

Jinyong Pi, Yun Zhang, Linwei Zhu, Xinju Wu, Xuemei Zhou

IEEE Conference on Visual Communications and Image Processing (VCIP'2020), Lecture Notes in Computer Sciences, Macau, Dec. 2020.

Because of its spherical characteristics, omnidirectional video must be projected from the Three-Dimensional (3D) sphere to a Two-Dimensional (2D) plane before compression. Therefore, various projection formats have been proposed in recent years. However, these existing projection methods suffer from either oversampling or discontinuous boundaries, which penalize the coding performance. Among them, Hybrid Equi-angular Cubemap (HEC) projection has achieved significant coding gains over Equi-Angular Cubemap (EAC) projection by keeping boundary continuity...