Welcome to the SIAT Video Team (SVT), composed of members of the High Performance Computing Center (HPCC), Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS). We have long worked on multimedia communications and visual signal processing for 2D/3D and VR/AR videos, including video coding, visual signal pre-/post-processing, and computational visual perception. We are also pursuing challenging problems in emerging areas such as VR/AR and AI.
Deep Learning-based Perceptual Video Quality Enhancement for 3D Synthesized View
IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2022. Huan Zhang, Yun Zhang (*Corresponding Author), Linwei Zhu, Weisi Lin Full-Text
High Efficiency Intra Video Coding Based on Data-driven Transform
IEEE Transactions on Broadcasting (IEEE T-BC), 2021. Na Li, Yun Zhang (*Corresponding Author), C.C. Jay Kuo Full-Text
Joint Source-Channel Decoding of Polar Codes for HEVC-based Video Streaming
ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM), 2021. Jinzhi Lin, Yun Zhang (*Corresponding Author), Na Li, and Hongling Jiang Full-Text
Subjective Quality Database and Objective Study of Compressed Point Clouds with 6DoF Head-mounted Display
IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2021. Xinju Wu, Yun Zhang, Chunling Fan, Junhui Hou, Sam Kwong Full-Text Database Convideo
Deep Learning Based Just Noticeable Difference and Perceptual Quality Prediction Models for Compressed Video
IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2021. Yun Zhang, Huanhua Liu, You Yang, Xiaoping Fan, Sam Kwong, C. C. Jay Kuo Full-Text
Highly Efficient Multiview Depth Coding Based on Histogram Projection and Allowable Depth Distortion
IEEE Transactions on Image Processing (IEEE T-IP), 2021. Yun Zhang*, Linwei Zhu, Raouf Hamzaoui, Sam Kwong, Yo-Sung Ho Full-Text
Projection Invariant Feature and Visual Saliency-Based Stereoscopic Omnidirectional Image Quality Assessment
IEEE Transactions on Broadcasting (IEEE T-BC), 2021. Xuemei Zhou, Yun Zhang*, Na Li, Xu Wang, Yang Zhou, and Yo-Sung Ho Full-Text
Learning-based Satisfied User Ratio Prediction for Symmetrically and Asymmetrically Compressed Stereoscopic Images
IEEE MultiMedia (IEEE MM), 2021. Chunling Fan, Yun Zhang*, Raouf Hamzaoui, Qingshan Jiang, Djemel Ziou Full-Text
Deep Learning-Based Chroma Prediction for Intra Versatile Video Coding
IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2020. Linwei Zhu, Yun Zhang*, Shiqi Wang, Sam Kwong, Xin Jin, and Yu Qiao Full-Text
Machine Learning Based Video Coding Optimizations: A Survey
Information Sciences (INS), Elsevier, 2020. Yun Zhang, Sam Kwong*, Shiqi Wang Full-Text
Deep Learning Based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression
IEEE Transactions on Image Processing (IEEE T-IP), 2020. Huanhua Liu, Yun Zhang*, Huan Zhang, Chunling Fan, Sam Kwong, C.C. Jay Kuo, and Xiaoping Fan Full-Text
Sparse Representation based Video Quality Assessment for Synthesized 3D Videos
IEEE Transactions on Image Processing (IEEE T-IP), 2020. Yun Zhang*, Huan Zhang, Mei Yu, Sam Kwong, and Yo-Sung Ho Full-Text
Generative Adversarial Network Based Intra Prediction for Video Coding
IEEE Transactions on Multimedia (IEEE T-MM), 2020. Linwei Zhu, Sam Kwong, Yun Zhang, Shiqi Wang, and Xu Wang Full-Text
Video-based Crowd Counting Dataset in Compression Scenario Download
We built the Video-based Crowd Counting Dataset in Compression Scenario (VCCD-CS) to evaluate video crowd counting methods on crowd videos with different levels of compression distortion, controlled by the quantization parameter (QP). The reference videos are taken from the testing set of the Fudan-ShanghaiTech dataset (FDST), a video crowd counting dataset that contains 150K frames with about 394K annotated heads captured from 13 different scenes. Its training set consists of 60 videos (9,000 frames), and its testing set contains the remaining 40 videos (6,000 frames). We encoded the frames of all 40 testing sequences with the HEVC test model version 16.20 (HM16.20) under the Low Delay P configuration, using QP ∈ {0, 2, 4, …, 50}, i.e., all 26 even values from 0 to 50. The distorted output videos are provided as “rec.yuv” files, which can be split into individual frames with “yuvtobmp.exe” for further analysis. Extraction code: z50m
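For readers who want to reproduce the QP sweep or read the “rec.yuv” files without “yuvtobmp.exe”, the sketch below may help. It assumes 8-bit planar YUV 4:2:0 output (HM's default when the input is 8-bit) and uses HM's standard `TAppEncoder` options with the stock `encoder_lowdelay_P_main.cfg` file; the file names, resolution, frame count, and frame rate are hypothetical placeholders, so please check them against the actual sequences and the HM16.20 manual before use.

```python
from pathlib import Path

import numpy as np

# Every even QP from 0 to 50, as listed for VCCD-CS (26 values).
QPS = list(range(0, 51, 2))


def hm_encode_cmd(src_yuv, recon_yuv, bitstream, width, height, frames, fps, qp,
                  cfg="encoder_lowdelay_P_main.cfg", encoder="TAppEncoder"):
    """Build (but do not run) an HM encoder command line for one sequence and QP.

    File names, resolution and frame rate are hypothetical placeholders;
    -c/-i/-o/-b/-wdt/-hgt/-fr/-f/-q are HM TAppEncoder options.
    """
    return [encoder, "-c", cfg,
            "-i", str(src_yuv), "-o", str(recon_yuv), "-b", str(bitstream),
            "-wdt", str(width), "-hgt", str(height),
            "-fr", str(fps), "-f", str(frames), "-q", str(qp)]


def split_yuv420(buf, width, height):
    """Split raw 8-bit planar YUV 4:2:0 bytes into a list of (Y, U, V) arrays."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    frame_size = y_size + 2 * c_size
    data = np.frombuffer(buf, dtype=np.uint8)
    if data.size % frame_size:
        raise ValueError("buffer is not a whole number of frames")
    frames = []
    for start in range(0, data.size, frame_size):
        f = data[start:start + frame_size]
        y = f[:y_size].reshape(height, width)
        u = f[y_size:y_size + c_size].reshape(height // 2, width // 2)
        v = f[y_size + c_size:].reshape(height // 2, width // 2)
        frames.append((y, u, v))
    return frames


def read_rec_yuv(path, width, height):
    """Load every frame of a reconstructed file such as rec.yuv into memory."""
    return split_yuv420(Path(path).read_bytes(), width, height)
```

For example, `read_rec_yuv("rec.yuv", width, height)` returns one `(Y, U, V)` tuple per frame. If a reconstruction was written at a bit depth above 8, the samples are 16-bit and `np.uint8` would need to become a little-endian 16-bit type.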
Subjective Point Cloud Quality Database With 6DoF Head-Mounted Display Project Page
We focus on subjective and objective Point Cloud Quality Assessment (PCQA) in an immersive environment and study how compression distortion of the geometry and texture attributes affects perceived quality. Using a Head-Mounted Display (HMD) with six degrees of freedom (6DoF), we established a subjective PCQA database named the SIAT Point Cloud Quality Database (SIAT-PCQD). The database consists of 340 distorted point clouds, obtained by compressing 20 sequences with the MPEG point cloud encoder under 17 pairs of geometry and texture quantization parameters (20 × 17 = 340).
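The 340 test conditions come from crossing every source sequence with every geometry/texture QP pair. The sketch below shows that design and a plain mean opinion score (MOS) per condition; the sequence labels and QP labels are hypothetical placeholders, since the actual names and values are not listed here, and simple averaging is an assumption rather than the database's documented processing.

```python
from itertools import product
from statistics import mean


def build_conditions(sequences, qp_pairs):
    """Cross every source sequence with every (geometry QP, texture QP) pair."""
    return [(seq, geo_qp, tex_qp)
            for seq, (geo_qp, tex_qp) in product(sequences, qp_pairs)]


def mean_opinion_scores(ratings):
    """Average the subjects' scores for each test condition.

    ratings maps a condition tuple to the list of scores it received;
    conditions with no scores are skipped.
    """
    return {cond: mean(scores) for cond, scores in ratings.items() if scores}


# Hypothetical placeholder labels: real sequence names and QP values differ.
sequences = [f"seq{i:02d}" for i in range(20)]
qp_pairs = [(f"G{k}", f"T{k}") for k in range(17)]
conditions = build_conditions(sequences, qp_pairs)
```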
SIAT Synthesized Video Quality Database Project Page
We developed a synthesized video quality database that includes ten multiview video plus depth (MVD) sequences and 140 synthesized videos at resolutions of 1024×768 and 1920×1088. For each sequence, 14 texture/depth quantization parameter combinations were used to generate compressed texture/depth view pairs (10 × 14 = 140 synthesized videos). Fifty-six subjects participated in the experiment, and each synthesized video was rated by 40 of them using a single-stimulus paradigm with continuous scores. Difference Mean Opinion Scores (DMOS) are provided.
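A DMOS is typically the average, over subjects, of the difference between the score given to a reference and the score given to the test video. The project page does not say how the reference score was obtained, so the sketch below assumes each subject also rated a reference version of every sequence and uses raw differences without per-subject z-score normalization or outlier screening; all identifiers are hypothetical.

```python
from statistics import mean


def dmos(scores, reference_of):
    """Difference mean opinion score per synthesized video (simplified sketch).

    scores:       {subject_id: {video_id: raw score}}, higher = better quality.
    reference_of: {synthesized video_id: id of its (assumed) reference video}.
    For each subject who rated both the reference and the test video, the
    difference (reference - test) is taken; DMOS is the mean over subjects.
    Videos that no subject rated together with their reference are omitted.
    """
    result = {}
    for video, ref in reference_of.items():
        diffs = [s[ref] - s[video]
                 for s in scores.values() if ref in s and video in s]
        if diffs:
            result[video] = mean(diffs)
    return result
```

A larger DMOS means the synthesized video was judged further below its reference.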
SIAT Depth Quality Database Project Page
We developed a stereoscopic video depth quality database that includes ten stereoscopic sequences and 160 distorted stereo videos at a resolution of 1920×1080. The ten sequences come from the Nantes-Madrid-3D-Stereoscopic-V1 (NAMA3DS1) database, which contains four categories of impairments: H.264 coding, JPEG2000 coding, down-sampling, and sharpening. However, NAMA3DS1 considers only symmetric distortions. Because both symmetric and asymmetric distortions need to be studied, we generated additional stereoscopic videos with asymmetric distortions. In total, there are 90 symmetrically distorted and 70 asymmetrically distorted video pairs. Thirty subjects (24 male, 6 female) participated in the symmetric distortion experiment, and 24 subjects (19 male, 5 female) participated in the asymmetric distortion experiment.
Picture-level JND Database (Symmetric & Asymmetric) Project Page
We study the Picture-level Just Noticeable Difference (PJND) of symmetrically and asymmetrically compressed stereoscopic images, where the impairments are JPEG2000 and H.265 intra coding. We conducted interactive subjective quality assessment tests to determine the PJND point, using a pristine image and, separately, a distorted image as the reference. We generated two PJND-based stereo image datasets: the Shenzhen Institute of Advanced Technology picture-level Just noticeable difference-based Symmetric Stereo Image dataset (SIAT-JSSI) and the Shenzhen Institute of Advanced Technology picture-level Just noticeable difference-based Asymmetric Stereo Image dataset (SIAT-JASI). Each dataset includes ten source images. Both PJND_PRI and PJND_DRI are provided: PJND_PRI is the minimum noticeable distortion relative to a pristine reference image, and PJND_DRI is the minimum noticeable distortion relative to a distorted reference image.
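One common way to locate a JND point interactively is a binary search over ordered distortion levels, asking at each step whether the test image is distinguishable from the reference. The sketch below assumes that approach and that noticeability is monotonic in the distortion level; the page does not state the exact test protocol, so this is an illustration rather than the procedure used for SIAT-JSSI or SIAT-JASI. `is_noticeable` is a hypothetical callback standing in for a subject's judgement against either the pristine or the distorted reference.

```python
def find_pjnd(levels, is_noticeable):
    """Return the first distortion level whose difference from the reference is noticeable.

    levels must be ordered from mildest to strongest distortion, and
    is_noticeable(level) -> bool stands in for one subject's judgement of
    that level against the reference (pristine or distorted). Noticeability
    is assumed monotonic in the level, so the search needs about
    log2(len(levels)) judgements. Returns None if no level is noticeable.
    """
    lo, hi = 0, len(levels)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_noticeable(levels[mid]):
            hi = mid          # the PJND is at mid or at a milder level
        else:
            lo = mid + 1      # still imperceptible; search stronger distortions
    return levels[lo] if lo < len(levels) else None
```

Real viewers are not perfectly consistent, so practical protocols usually repeat or cross-check judgements near the boundary.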
Ultra-High Definition 3D Video Live System Project Page
The 3D video live and on-demand system addresses challenges in 3D video processing, storage, transmission, and quality evaluation, providing a realistic and immersive viewing experience. It can be applied in film and television production, video games, remote control, cultural relic protection, military simulation, and other fields.
VR/360° Video Projection Conversion Software Project Page
Projection is one of the essential steps in virtual reality (VR) and panoramic video systems. The projection format affects the compression efficiency of panoramic video, so choosing an appropriate projection format for each application scenario can reduce transmission bandwidth while still providing users with a high-quality VR experience.
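Most projection conversions pass through the sphere: each target pixel is mapped to a viewing direction, and that direction is looked up in the source format. As an illustration (not the software's own implementation), the sketch below gives the standard equirectangular projection (ERP) mapping between sphere coordinates and pixel coordinates; the axis convention in `direction_to_lonlat` (y up, z forward) is an assumption, and other tools use different conventions.

```python
import math


def sphere_to_erp(lon, lat, width, height):
    """Map longitude in [-pi, pi) and latitude in [-pi/2, pi/2] to ERP pixel coordinates."""
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v


def erp_to_sphere(u, v, width, height):
    """Inverse of sphere_to_erp: ERP pixel coordinates back to (lon, lat) in radians."""
    lon = (u / width - 0.5) * 2 * math.pi
    lat = (0.5 - v / height) * math.pi
    return lon, lat


def direction_to_lonlat(x, y, z):
    """Unit viewing direction to (lon, lat), assuming y points up and z points forward."""
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))  # clamp guards against rounding just above 1
    return lon, lat
```

To convert ERP to another format such as a cube map, one would compute the direction of each target pixel, map it with `direction_to_lonlat` and `sphere_to_erp`, and interpolate (for example bilinearly) in the ERP frame.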
VR Video Live System Project Page
The immersive VR video live system lets users watch 4K ultra-high-definition (UHD) videos on demand or via live broadcast, and provides a high-quality, realistic, and interactive visual experience with 360-degree viewing.
JND Prediction Software Project Page
The distortion perception software is built on a deep-learning-based PW-JND (Picture-Wise Just Noticeable Difference) prediction model. It can be applied in VR image/video compression to maximize compression efficiency without perceptible quality degradation.
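A picture-wise JND prediction can steer an encoder toward the coarsest setting that should still be visually lossless. The sketch below assumes the model outputs the QP at which distortion first becomes noticeable; `predict_jnd_qp` is a hypothetical stand-in for such a model, not the software's actual interface.

```python
from typing import Callable, Iterable, Optional


def coarsest_imperceptible_qp(image: object,
                              predict_jnd_qp: Callable[[object], float],
                              candidate_qps: Iterable[int]) -> Optional[int]:
    """Choose the largest QP that stays strictly below the predicted picture-wise JND.

    predict_jnd_qp is a hypothetical PW-JND model: it returns the QP at which
    distortion first becomes noticeable for this image. A larger QP means
    coarser quantization and fewer bits, so the largest QP below the JND is
    the most compact setting that, if the prediction is accurate, should
    remain visually indistinguishable from the source. Returns None if no
    candidate lies below the JND.
    """
    jnd_qp = predict_jnd_qp(image)
    below = [qp for qp in candidate_qps if qp < jnd_qp]
    return max(below) if below else None
```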
Note: None of these resources may be used for commercial purposes. If you have any questions, please contact us at yun.zhang@siat.ac.cn.