Photo: Yusuke Morita, Research Engineer

Translation Technology to Generate Sign-Language CG Animation

Providing more information in sign language

Yusuke Morita, Research Engineer,
Smart Production Research Division

Mr. Morita joined NHK in 2017, handling lighting for drama productions at the Osaka Station. Since moving to NHK STRL in 2019, he has been conducting research on sign-language translation and the generation of CG animation. His hobby is saltwater big-game fishing on remote islands in Japan and internationally.

Why is sign-language CG animation needed?

Many broadcast programs include subtitles, and at first glance they seem to provide enough information for viewers who are deaf or hard of hearing. However, the native language of people who were born without hearing, or who lost their hearing at a young age, is sign language, not Japanese. For many of these “sign-language natives,” it is as difficult to get information from Japanese subtitles as it is for many Japanese people to get information from English subtitles. For this reason, NHK STRL is researching technology to generate sign-language CG and provide information in a form that is more accessible to sign-language natives.

Translation from Japanese to a sign-language word list

As with spoken languages, sign languages form sentences by combining movements that represent words. However, sign language differs greatly from Japanese in word order and grammar, so the two are completely different languages. To generate sign-language CG, a sequence of sign-language words must first be created from the original Japanese. We are developing artificial intelligence (AI) able to translate Japanese into such sign-language word sequences. The sign-language word motions are then joined together in the order given by the translation to generate the final sign-language CG.
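To make the two-stage flow above concrete, here is a minimal sketch in Python. The gloss labels, the motion library, and the fixed translation output are illustrative assumptions for this example only, not NHK STRL's actual models or data.

```python
# Minimal sketch of the two-stage pipeline: Japanese text -> sign-language
# word sequence (glosses) -> concatenated CG motion. All names and data here
# are illustrative stand-ins, not the actual research system.

from dataclasses import dataclass
from typing import List

@dataclass
class MotionClip:
    """A pre-captured motion for one sign-language word (hypothetical format)."""
    gloss: str       # sign-language word label, e.g. "TOMORROW"
    keyframes: list  # skeleton poses for the CG character (placeholder)

# Hypothetical lookup table: sign-language word -> captured motion.
MOTION_LIBRARY = {
    "TOMORROW": MotionClip("TOMORROW", keyframes=["..."]),
    "RAIN":     MotionClip("RAIN", keyframes=["..."]),
    "MAYBE":    MotionClip("MAYBE", keyframes=["..."]),
}

def translate_to_glosses(japanese_text: str) -> List[str]:
    """Stage 1: Japanese sentence -> sign-language word sequence.
    In the research this is done by a trained translation model;
    a fixed example stands in for the model output here."""
    return ["TOMORROW", "RAIN", "MAYBE"]

def generate_animation(glosses: List[str]) -> List[MotionClip]:
    """Stage 2: join per-word motions in gloss order to form the CG animation."""
    return [MOTION_LIBRARY[g] for g in glosses if g in MOTION_LIBRARY]

if __name__ == "__main__":
    glosses = translate_to_glosses("明日は雨が降るかもしれません")
    animation = generate_animation(glosses)
    print(glosses)                            # ['TOMORROW', 'RAIN', 'MAYBE']
    print([clip.gloss for clip in animation])
```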

Difficulty of visual languages

It is actually very difficult to describe sign-language motions in text. For example, the sign-language word for “say” uses opposite motions depending on whether a person on the right is speaking to a person on the left or the other way around, even though it is the same word. As the number of people in a sentence increases, many locations (right, left, above, below, in front, behind) are used to express the positions of people and the directions of actions, so a single word can produce countless motion patterns. How to express this information in a sign-language word sequence that can be animated is a major difficulty in sign-language translation, and we are currently researching how to reproduce such spatial information.

Basic form for “Say”
Closed hand is held facing forward, and then suddenly opened while extending outward.
“Say” directed from the person on the right to the person on the left
Closed hand faces the other person, then opens while extending toward them (the thumb represents the other person).
“Say” directed from the person on the left to the person on the right
Closed hand faces the other person, then opens while extending toward them (the thumb represents the other person).
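To illustrate why directional words like “say” are hard to write down as plain word labels, here is a small, hypothetical sketch of how a gloss could be annotated with positions in signing space. The class name, position labels, and encoding are assumptions for this example, not the representation used in the actual research.

```python
# Sketch: one possible way to attach spatial information to a directional sign.
# This is an illustrative encoding only, not the one used at NHK STRL.

from dataclasses import dataclass

# Discrete positions in signing space where people or referents can be placed.
POSITIONS = {"left", "right", "front", "above", "below", "behind", "neutral"}

@dataclass(frozen=True)
class DirectionalSign:
    """A sign-language word whose motion depends on who acts on whom."""
    gloss: str    # the dictionary word, e.g. "SAY"
    source: str   # where the actor is placed in signing space
    target: str   # where the addressee is placed

    def __post_init__(self):
        assert self.source in POSITIONS and self.target in POSITIONS

# The same dictionary word "SAY" yields different (mirrored) motions
# depending on which positions are assigned:
say_right_to_left = DirectionalSign("SAY", source="right", target="left")
say_left_to_right = DirectionalSign("SAY", source="left", target="right")

print(say_right_to_left)  # DirectionalSign(gloss='SAY', source='right', target='left')
print(say_left_to_right)  # DirectionalSign(gloss='SAY', source='left', target='right')
```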

Making it practical

A technology able to convert any sentence into sign-language animation has a broad range of applications, with potential for use in many daily-life scenarios beyond broadcasting. Many issues must still be resolved to make the technology practical, but we are continuing our research to resolve them one at a time and to develop it into a service.