Research Area

4.2  Image analysis technology

Technology for automatic metadata assignment to video

  Raw video footage stored in video archives and broadcast stations is a valuable resource for program producers. To make active use of such footage, we are researching automatic assignment of metadata to video.
  Since program producers often retrieve video using personal names as keywords, face recognition technology for identifying persons appearing in footage is very important. Face detection technology, which locates face positions in footage, is a prerequisite for face recognition, and we worked to improve its accuracy by introducing deep learning. Using cascaded convolutional neural networks (NNs), we developed a face detection method that achieves both low computation cost and high detection accuracy, reducing detection omissions by about 40% compared with conventional methods(1). Using this technology, we developed a video editing support system with face recognition and used it for producing the documentary program "Peeping Through 100 Cameras," which was aired in September. We also developed an automatic face blurring system using face recognition in cooperation with relevant departments and exhibited it at the 48th NHK Program Technology Exhibition.
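The core idea of a cascaded detector is that cheap early stages reject most candidate windows, so the expensive, accurate stage only runs on a few survivors. A minimal sketch of that control flow follows; the stage classifiers and thresholds here are hypothetical stand-ins for the trained CNNs, and each "window" is reduced to a dict of precomputed scores for illustration:

```python
# Sketch of a detection cascade: each stage filters the surviving
# candidate windows before the next (more costly) stage runs.

def cascade_detect(windows, stages):
    """Run candidate windows through an ordered list of (classifier, threshold) stages."""
    survivors = list(windows)
    for classify, threshold in stages:
        survivors = [w for w in survivors if classify(w) >= threshold]
        if not survivors:
            break  # nothing left; skip the remaining (expensive) stages
    return survivors

# Toy example: real stages would be small-to-large CNNs scoring image patches.
windows = [{"p1": 0.9, "p2": 0.8}, {"p1": 0.9, "p2": 0.2}, {"p1": 0.1, "p2": 0.9}]
stages = [
    (lambda w: w["p1"], 0.5),  # fast, coarse stage
    (lambda w: w["p2"], 0.5),  # slower, more accurate stage
]
detections = cascade_detect(windows, stages)
```

Only windows that pass every stage are reported as faces, which is how the method keeps computation low without sacrificing accuracy.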
  At news production sites, the use of tweets and other SNS posts in news reporting is growing. To support this, we studied introducing technology that improves the accuracy of extracting and classifying newsworthy tweets by analyzing posted images, for our social media analysis system under development (see 4.1). Because a distinctive characteristic of Twitter is that a wide range of non-newsworthy images are posted, we prepared unique training data reflecting this and developed a method that uses convolutional NNs to classify images into five categories related to accidents and disasters(2). We applied this technology to the social media analysis system and exhibited it at the NHK STRL Open House 2018.
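A typical way to combine multi-class CNN output with a non-newsworthy filter is to reject low-confidence predictions. The sketch below assumes the network's raw class scores (logits) are already available; the category labels and the rejection threshold are hypothetical, chosen only for illustration:

```python
import math

# Hypothetical disaster/accident categories; the real system's labels are not
# published in this report.
CATEGORIES = ["fire", "flood", "traffic accident", "building collapse", "other"]

def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, reject_threshold=0.5):
    """Return the best category, or None if confidence is too low
    (treating the image as non-newsworthy)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < reject_threshold:
        return None
    return CATEGORIES[best]
```

Rejecting uncertain images keeps the wide variety of irrelevant SNS posts from being forced into one of the disaster categories.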
  We also worked toward the practical use of systems using technologies that we have developed. We continued the experimental use, begun in FY 2017, of a video material management system equipped with object recognition and similar-image retrieval technologies, built on the intranet at program production sites that handle CG synthesis and video effects, and modified the system on the basis of findings from this effort.
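Similar-image retrieval is commonly implemented as nearest-neighbor search over feature vectors extracted from each image. A minimal sketch, assuming features have already been extracted (here as plain lists) and using cosine similarity for ranking:

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query, index, k=2):
    """Return the names of the k archive images most similar to the query.

    `index` maps an image name to its feature vector.
    """
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

In a production system the features would come from a trained network and the linear scan would be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.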


Automatic colorization technology for monochrome video

  Monochrome video stored in broadcast stations is a valuable video resource. In response to the growing need to colorize such video for use in programs, we are researching the automatic conversion of 4K-resolution-equivalent monochrome film video to color video.
  For the automatic colorization system using three types of NNs (color estimation, color correction, and color propagation over adjacent frames) that we developed in FY 2017, we evaluated its effect on reducing working hours and suppressing color variations(3). The system was used for producing the NHK special program "Nomonhan: An Irresponsible Battle," aired in August, and the program "That Day, That Time, That Program," aired in November, reducing working hours to about 1/60 of the conventional process. We exhibited the system at the NHK STRL Open House 2018, IBC 2018 and Inter BEE 2018.
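The three-stage structure can be sketched as a per-frame pipeline in which the propagation stage blends each frame's estimated colors with those of the preceding frame to suppress flicker. The stage functions below are hypothetical stand-ins for the trained networks, and "color" is reduced to one scalar per frame for illustration:

```python
# Sketch of the estimate -> correct -> propagate pipeline for a frame sequence.
# `estimate` and `correct` stand in for the color-estimation and
# color-correction networks; propagation is modeled as exponential blending.

def colorize_sequence(frames, estimate, correct, alpha=0.7):
    """Colorize frames, weighting the current estimate by `alpha` and the
    previous frame's colors by (1 - alpha) for temporal consistency."""
    colored = []
    prev = None
    for frame in frames:
        c = correct(estimate(frame))
        if prev is not None:
            c = alpha * c + (1 - alpha) * prev  # propagate colors across frames
        colored.append(c)
        prev = c
    return colored
```

The blending step is what keeps colors from varying abruptly between adjacent frames, which is the "suppression of color variations" evaluated above.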
  Additionally, we began training NNs using video obtained from production sites with the aim of applying this technology to HDR (High Dynamic Range)-to-SDR (Standard Dynamic Range) conversion, which was proposed by a relevant department.



Figure 4-3. Automatic colorization technology for monochrome video

Video summarization technology

  Distributing digest videos of programs via websites and social media is becoming more common at program production sites. To assist with the production of these videos, we are researching a technology to automatically summarize program video.
  In FY 2017, we developed a demonstration system that summarizes a program on the basis of various information such as viewer responses obtained through tweet analysis and image analysis results such as the faces of cast members, the size of open captions (telops) and the amount of camera work. In FY 2018, we increased the number of program genres the system supports and exhibited it at the NHK STRL Open House 2018 and Inter BEE 2018. To improve convenience at production sites, we also developed an automatic video summarization tool that runs on hardware equivalent to a laptop PC and asked relevant departments to verify its operation. Moreover, we introduced NN technology to improve summarization quality and constructed an automatic summarization model with a unique network structure, using manually edited summary videos as training data.
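Summarization from heterogeneous signals like these is often cast as shot selection: score each shot by a weighted combination of its features, then greedily pick high-scoring shots until a target duration is reached. A minimal sketch under that assumption (the feature names, weights, and greedy strategy are illustrative, not the system's actual design):

```python
def summarize(shots, weights, budget):
    """Greedily select the highest-scoring shots within a duration budget.

    Each shot is a dict with "id", "dur" (seconds), and a "features" dict
    (e.g. tweet response, cast-face presence, telop size, camera work).
    """
    def score(shot):
        return sum(weights[k] * shot["features"].get(k, 0.0) for k in weights)

    selected, total = [], 0.0
    for shot in sorted(shots, key=score, reverse=True):
        if total + shot["dur"] <= budget:
            selected.append(shot["id"])
            total += shot["dur"]
    return sorted(selected)  # restore original program order

# Toy example with hypothetical feature values.
shots = [
    {"id": 1, "dur": 10, "features": {"tweets": 2.0}},
    {"id": 2, "dur": 10, "features": {"faces": 2.0}},
    {"id": 3, "dur": 10, "features": {"tweets": 0.5}},
]
weights = {"tweets": 1.0, "faces": 0.5}
digest = summarize(shots, weights, budget=20)
```

An NN-based model, as introduced in FY 2018, would replace the hand-set weights with scores learned from manually edited summaries.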
  As an application of the video summarization technology, we began developing a technology to automatically generate materials for program websites. As an initial study, aiming at the automatic generation of route maps for publication on the websites of on-location TV programs, we developed a technology that identifies the shooting location by comparing frame images of program video with images displayed on map websites(4).
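One simple form of such frame-to-reference matching is to compare an image descriptor of the video frame against descriptors of candidate location images and pick the best match. The sketch below uses histogram intersection over precomputed (normalized) color histograms; the descriptor choice and candidate names are assumptions for illustration:

```python
def hist_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def locate_frame(frame_hist, candidates):
    """Return the name of the candidate location whose reference image
    best matches the frame.

    `candidates` maps a location name to its reference-image histogram.
    """
    return max(candidates, key=lambda name: hist_intersection(frame_hist, candidates[name]))
```

Applying this to frames sampled along a program yields a sequence of matched locations, from which a route map for the program's website could be drawn.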


 

[References]
(1) Y. Kawai, R. Endo, N. Fujimori, T. Mochizuki: "Prototype system for supporting TV program editing based on face recognition," Proc. of ITE Annual Convention, 22B-1 (2018) (in Japanese)
(2) N. Fujimori, T. Mochizuki, Y. Kawai, M. Sano: "Investigation of Dataset Construction Method and Decision Criterion for Classification of Images Posted on SNS," Proc. of ITE Annual Convention, 21B-2 (2018) (in Japanese)
(3) R. Endo, Y. Kawai, T. Mochizuki: "Monochrome Video Colorization System Taking Account of Color Consistency," Proc. of ITE Annual Convention, 22B-3 (2018) (in Japanese)
(4) A. Matsui, T. Mochizuki, N. Fujimori: "Multimodal Location Estimation of Videos based on Named Entity Extraction and Location Search using Image Matching," Proc. of ITE Annual Convention, 12B-1 (2018) (in Japanese)