Content Production Technologies Bringing New Viewing Experiences (1 of 3)

Improving Immersive Realism by Reproducing 3D Radiation Characteristics of Sound Sources

Research Engineer, Advanced Television Systems Research Division
KINOSHITA Kotaro

NHK STRL is conducting research and development on future broadcast media technologies such as AR/VR, 3D television, and haptic presentation. As part of this work, we are researching production technologies that improve the expressiveness of the video and sound produced by current televisions. In this series, we introduce content production technologies that will bring new viewing experiences, relentlessly pursuing higher realism in video and sound and enabling production not previously possible.

We are researching advanced audio formats with greater realism, ones that reproduce the sound corresponding to the video more faithfully as the viewer moves around and watches AR/VR content from any viewpoint.

In conventional sound production for broadcasts, a single sound source is generally assigned to each person or musical instrument, and that sound is recorded with a single microphone. As a result, viewers always hear the sound as captured at the recording position, and what they hear does not change even if the orientation of the sound source in the video changes.

However, the sound radiated from a sound source has different frequency characteristics in each direction, so it sounds different depending on which way the source is facing. These characteristics are called 3D radiation characteristics. With AR/VR viewing systems, the relative positions of sound sources and the viewer can change freely: the orientation of the sound source may change, or the viewer may move around the sound source. We aim to improve the sense of realism by reproducing the 3D radiation characteristics of sound sources in such situations.

At NHK STRL, our method for reproducing 3D radiation characteristics changes the frequency characteristics of the audio according to direction, so we call it “steering” (Fig. 1). Most of the program audio we are currently studying is Japanese speech. Implementing steering for Japanese speech required a database of the 3D radiation characteristics of each mora*1 in the Japanese language. For a given mora, we refer to its 3D radiation characteristics in the database and select the frequency characteristics corresponding to the recording position and to the viewer’s position. Steering is implemented by generating a filter that transforms the sound based on the difference in frequency characteristics between these two positions, and applying that filter to the reproduced audio.
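The filtering step described above can be sketched as follows. This is a minimal illustration, not NHK STRL's actual implementation: it assumes the database lookup has already produced two frequency responses, `h_src` for the recording direction and `h_dst` for the viewer's direction, and it applies the compensation filter to a whole signal in the frequency domain.

```python
import numpy as np

def steering_filter(audio, h_src, h_dst, eps=1e-8):
    """Transform audio recorded from one direction so it sounds as if
    heard from another direction.

    audio : 1-D signal of length n
    h_src : frequency response at the recording direction (n//2 + 1 bins)
    h_dst : frequency response at the viewer's direction (n//2 + 1 bins)

    Both responses are assumed to come from a (hypothetical) mora
    database; their layout here is illustrative only.
    """
    n = len(audio)
    # Compensation filter: divide out the source direction's coloration,
    # then impose the target direction's coloration.
    compensation = np.asarray(h_dst) / (np.asarray(h_src) + eps)
    # Filter in the frequency domain (a real implementation would use
    # block-wise overlap-add processing instead of one long FFT).
    spectrum = np.fft.rfft(audio, n)
    return np.fft.irfft(spectrum * compensation, n)
```

With identical source and target responses the filter is (nearly) an identity, which is a quick sanity check that the compensation logic is wired correctly.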

No prior research had produced a database of the 3D radiation characteristics of Japanese speech suitable for steering, so this database had to be built by taking measurements. We began by building equipment to measure the 3D radiation characteristics (Fig. 2). A person sits at the center of the equipment, and their utterances are recorded by 124 microphones, each placed in a different direction from the speaker. This data was then used to build the database of 3D radiation characteristics for each mora in the Japanese language. Going forward, we will study filter processing using this database further, working toward more realistic audio reproduction.
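One simple way to query such a measurement database is nearest-direction lookup: given the viewer's direction relative to the speaker, pick the response measured by the microphone whose direction is closest on the unit sphere. The sketch below assumes a hypothetical database layout (124 unit direction vectors, one frequency response per microphone); the article does not describe the actual data structure.

```python
import numpy as np

def nearest_direction_response(directions, responses, query):
    """Select the measured frequency response whose microphone direction
    is closest to the query direction.

    directions : (124, 3) unit vectors, one per microphone (illustrative)
    responses  : (124, F) measured frequency responses (illustrative)
    query      : (3,) unit vector from the speaker toward the viewer
    """
    directions = np.asarray(directions, dtype=float)
    query = np.asarray(query, dtype=float)
    # The largest dot product corresponds to the smallest angle
    # between the query and a microphone direction.
    idx = int(np.argmax(directions @ query))
    return responses[idx]
```

A production system would more likely interpolate between neighboring microphone directions rather than snap to the nearest one, but the nearest-neighbor form shows the lookup in its simplest terms.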

Figure 1: 3D radiation characteristic steering
Figure 2: 3D radiation characteristic measuring equipment
  1. mora: A unit of articulated sound. In Japanese, these generally correspond to a single kana character.