Takahiro Mochizuki

Series: AI-Based Program Production Support Technology

Automatic Video Summarization

COVID-19 has forced us to change our working habits and reassess our work-life balance, highlighting the importance of reducing long working hours and promoting telework. This series introduces program production support technology that uses artificial intelligence (AI) to streamline or otherwise improve program production.

What Is Automatic Video Summarization?

To encourage greater viewership of broadcast programs, there is growing demand for services that automatically extract the important scenes from programs and turn them into short digest videos for distribution via social media and other outlets. Because producing such digest videos requires editing by a specialist, there is also demand for technology that uses artificial intelligence (AI) to generate them automatically and make this work more efficient.

AI-Based Automatic Video Summarization

To assist program production studios in producing these digest videos, NHK Science & Technology Research Laboratories (STRL) is researching AI-based automatic video summarization technology. The learning process for the automatic video summarization AI (Fig. 1) and the digest video generation process are described below.

Figure 1: Learning process for the automatic video summarization AI

The Learning Process

(1) Prepare Learning Data
Large volumes of human-edited digest video are prepared along with the unedited source video as AI training data.

(2) Extract Feature Data from the Video
The unedited video is cut into segments of a few seconds each, and feature data in three categories (subject features, facial features of people being shown, and camera motion features) are extracted for each video segment.

(3) Input Feature Data and Train the AI
The three types of extracted feature data are input into the AI for training. The AI is trained to assign each video segment an output value indicating how likely that segment is to be used in the digest video: segments that were used in the human-edited digest video are assigned higher output values, and unused segments are assigned lower values.
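
The three learning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the STRL implementation: the article does not say what model is used, so a simple logistic-regression scorer stands in for the AI, the segment length of 3 seconds is an arbitrary choice for "a few seconds", and all function names and the toy feature values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def split_into_segments(duration: float, segment_length: float = 3.0) -> List[Segment]:
    """Step (2): cut the unedited video into segments of a few seconds each."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    segments: List[Segment] = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        segments.append((start, end))
        start = end
    return segments


def label_segments(segments: Sequence[Segment], digest_ranges: Sequence[Segment]) -> List[int]:
    """Step (1): label a segment 1 if it overlaps footage used in the human-edited digest."""
    labels = []
    for seg_start, seg_end in segments:
        used = any(seg_start < r_end and r_start < seg_end for r_start, r_end in digest_ranges)
        labels.append(1 if used else 0)
    return labels


def concat_features(subject: Sequence[float], face: Sequence[float],
                    camera_motion: Sequence[float]) -> List[float]:
    """Step (2): join the three feature categories into one vector per segment."""
    return [*subject, *face, *camera_motion]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Output value in (0, 1); higher means more likely to belong in the digest."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_scorer(features: Sequence[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Step (3): fit the stand-in scorer with stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = predict_score(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias
```

In this sketch the training target is binary (used or unused), which is one straightforward way to make the scorer output higher values for segments that appear in the human-edited digest.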

The Digest Video Generation Process

Given the feature data of a new video as input, the trained automatic video summarization AI assigns an output value to each video segment. A digest video is then generated automatically by stitching together the segments with high output values, that is, the segments that the AI has determined should be used in the digest video.
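
A minimal sketch of this selection step follows. It assumes each segment is a (start, end) pair in seconds and that a "high" output value means one at or above a fixed threshold; the threshold of 0.5, the function name, and the merging of adjacent segments into longer clips are illustrative assumptions, not details given for the STRL system.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def select_digest_clips(segments: Sequence[Segment], scores: Sequence[float],
                        threshold: float = 0.5) -> List[Segment]:
    """Keep segments scoring at or above the threshold and merge touching ones.

    The result is a chronological cut list that a video editor or
    encoder could use to stitch the digest together.
    """
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    chosen = sorted(seg for seg, score in zip(segments, scores) if score >= threshold)
    clips: List[Segment] = []
    for start, end in chosen:
        if clips and start <= clips[-1][1]:
            # Segment touches or overlaps the previous clip: extend that clip.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips
```

A fixed threshold is the simplest choice; a system that must hit a target digest length could instead pick the top-scoring segments until that length is reached.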

Using this technology, we created a prototype digest video production support system for news programs (Fig. 2). In this prototype system, the whole video of each news story is input as data, from which the AI automatically generates a digest video for each story. After a producer makes the necessary corrections, the system outputs a news digest video that combines the digest videos of each story.
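
The flow of the prototype can be pictured roughly as follows. This is a hypothetical sketch of the workflow only: the data layout, the function name, and the way a producer's corrections are represented (as a replacement cut list per story) are assumptions made for illustration and are not taken from the actual system.

```python
from typing import Dict, List, Optional, Tuple

Clip = Tuple[float, float]              # (start_sec, end_sec) within one story's video
TimelineEntry = Tuple[str, float, float]  # (story_id, start_sec, end_sec)


def build_news_digest(story_digests: Dict[str, List[Clip]],
                      story_order: List[str],
                      corrections: Optional[Dict[str, List[Clip]]] = None) -> List[TimelineEntry]:
    """Combine per-story digests into one news digest timeline.

    story_digests: cut list generated automatically for each story.
    story_order:   order in which the stories should appear.
    corrections:   cut lists edited by a producer; when present for a story,
                   they replace the automatically generated cut list.
    """
    corrections = corrections or {}
    timeline: List[TimelineEntry] = []
    for story_id in story_order:
        clips = corrections.get(story_id, story_digests[story_id])
        for start, end in clips:
            timeline.append((story_id, start, end))
    return timeline
```

Keeping the producer's corrections separate from the automatically generated cut lists means the automatic output is never overwritten, which makes it easy to compare the two afterward.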

STRL will continue to research and develop automatic video summarization technology to meet the diverse needs of program production sites.

Figure 2: Concept for news digest video production support system