Research Area

User-friendly broadcasting technologies


  We are conducting R&D on user-friendly broadcasting that conveys information quickly and accurately to all viewers, including people with hearing or visual impairments and non-native speakers, by automatically converting broadcast content data.
  In our research on information presentation, we continued our development of sign language CG characters with facial expressions that act as presenters of weather information. We developed an automatic modification function to improve the manual gestures of the CG characters. We also added a function for rendering facial expressions and mouthing to our sign language CG translation system and conducted subjective evaluations. We put our system to practical use to automatically generate sign language CG from weather forecast data distributed by the Japan Meteorological Agency and launched an evaluation website on NHK Online.
  In our study on technologies for the kinesthetic presentation of the contour and hardness of a 3D object, we developed a device that can instantly convey the size and shape of a virtual CG object by letting the user “hold” it with their thumb, forefinger and middle finger and gently explore its surface by “active touch.”
  In our research on speech recognition technology for closed captioning, we developed an end-to-end speech recognition technique that does not use a pronunciation dictionary, targeting information programs that contain background noise and inarticulate speech. For programs in which topics change frequently, we also developed a method that estimates the topic of a program together with the probabilities of word sequences for each topic. This method can maintain the accuracy of word sequence estimation even when the accuracy of topic estimation is low, improving the overall recognition accuracy.
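  As a rough illustration of why topic-dependent word probabilities can stay usable when topic estimation is unreliable, the sketch below weights per-topic word probabilities by the estimated topic probabilities instead of committing to the single most likely topic. This is a generic linear interpolation of toy unigram models, not the method developed in this research; the function names, the floor value and all probabilities are hypothetical.

```python
import math

# Hypothetical toy data: per-topic unigram probabilities (assumed values).
TOPIC_LMS = {
    "weather": {"rain": 0.5, "typhoon": 0.4, "goal": 0.1},
    "sports": {"goal": 0.6, "match": 0.3, "rain": 0.1},
}

FLOOR = 1e-6  # assumed probability for words unseen under a topic


def soft_log_prob(words, topic_posterior, topic_lms=TOPIC_LMS, floor=FLOOR):
    """Log-probability of a word sequence under a posterior-weighted
    mixture of the topic models (every topic contributes)."""
    total = 0.0
    for word in words:
        p = sum(
            weight * topic_lms[topic].get(word, floor)
            for topic, weight in topic_posterior.items()
        )
        total += math.log(p)
    return total


def hard_log_prob(words, topic_posterior, topic_lms=TOPIC_LMS, floor=FLOOR):
    """Log-probability when only the single most likely topic is used."""
    best = max(topic_posterior, key=topic_posterior.get)
    return sum(math.log(topic_lms[best].get(w, floor)) for w in words)


# The topic estimate is uncertain and its top choice ("weather") is wrong
# for this sports utterance.
posterior = {"weather": 0.55, "sports": 0.45}
utterance = ["goal", "match"]

print("soft:", soft_log_prob(utterance, posterior))
print("hard:", hard_log_prob(utterance, posterior))
```

  In this toy setting the hard decision scores the utterance with the wrong topic model alone, whereas the weighted mixture still draws on the sports model, so a mistaken topic estimate degrades the word sequence score far less.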
  We began research on automatic audio descriptions with the aim of providing a new commentary service that can be used for live programs. For use in the Rio 2016 Olympic and Paralympic Games, we developed a system for automatically generating audio descriptions of athlete names, scores and game progress based on the analysis of competition data provided by Olympic Broadcasting Services. The system provided automatic audio descriptions for 1,929 games as of September 2016 and for 2,496 games as of January 2017. We conducted experiments on the subjective evaluation of automatically generated audio descriptions in cooperation with people with visual impairment and sighted people and confirmed their effectiveness for both. In our research on speech synthesis technology for automatic audio descriptions, we built a versatile speech synthesis system using a deep neural network (DNN)-based acoustic model.
  In our research on language processing technology, we studied the feasibility of providing news scripts with reading assistance information for non-native speakers in Japan. Reading assistance information, such as explanations in easy Japanese, kana readings for Chinese characters, dictionary information for difficult words, coloring of proper nouns and translations into foreign languages, can make news content easier to understand. We developed a system that automatically produces reading assistance information by using machine translation technology and allows the operator to correct errors manually.
  In our research on image cognition analysis, we researched image features suitable for a wide field-of-view environment such as 8K Super Hi-Vision. We investigated the relationship between the preferred image size, image features and the viewing distance by using 100 different types of images. The results indicated that changing the viewing distance does not significantly affect the relationship between the preferred image size and image content. We also identified the influence of the shaking duration and viewing angle on the degree of unpleasantness caused by shaking images.