|文章来源：||发布时间：2016-09-13||【字号： 小 中 大 】|
报告题目: Overview of Research in TuSimple
报告人: 王乃岩 (现任图森科技首席科学家)
I am currently a principal scientist in TuSimple <http://www.tusimple.com/> . I got my PhD degree from CSE department, HongKong University of Science and Technology in 2015. My supervisor is Prof. Dit-Yan Yeung. Before that, I got my BS degree from Zhejiang University, 2011 under the supervision of Prof. Zhihua Zhang.
My research interest focuses on applying statistical computational model to real problems in computer vision and data mining. Currently, I mainly work on matrix factorization and deep learning. Especially I am interested in the area of visual tracking, object detection, image classification and recommender system.
In this talk, I will briefly introduce the research in TuSimple, which mainly focus on computer vision and machine learning techniques for autonomous driving. Currently, we have several research and engineering related jobs open. In this talk, I will introduce three works recently done.
The first one is to enhance the representation power of computational layers (fully connected and convolution layers) in deep neural network. These layers essentially combine the features from previous layer linearly despite their different forms. The nonlinearities in modern deep neural networks are mainly from the activation function. In this work, we bring up the nonlinearities into the computational layers by introducing a simple yet effective operation called factorized bilinear. This operation can capture the pairwise interaction between features, while maintaining the efficiency. Experiments on various dataset demonstrate the effectiveness of the proposed method.
The second work is for road segmentation. Though CNN has been proven to be the most effective method of segmentation, it comes with high demand in data and expensive annotation labor. On the other hand, free space estimation aims to estimate the ground plane (a.k.a non-obstacle space) of an outdoor scene. It can be performed in an unsupervised manner from depth information. Though sharing a lot of similarity, we cannot directly use the labels of free space estimation due to the noise of labels and the semantic gap between these two tasks. To solver these issues, we propose a self-paced pre-training method for road segmentation. By leveraging this method, we can first learn the affordance of the outdoor scene, then finetune the CNN for the semantic meaning of road. The proposed method can dramatically decrease the number of annotations needed while obtaining high accuracy in recent benchmarks.
The last work is on video detection. Thanks to the rapid development of deep learning, the performance of static image detection significantly improves over the last years. However, an equally important task, video detection is largely less explored, primarily due to the lack of a comprehensive evaluation protocol. In this work, we first propose an evaluation metric considering accuracy and spatial and temporal smoothness. We also provide several baselines of video detection task using de facto datasets under our metric, demonstrating the effectiveness of our proposed protocol. We hope to see more active research efforts into this field by the guidance of the new evaluation metric.