R&D Special Series: "Learning from Human Possibilities"

Research on Language Processing

Hideki TANAKA
Language Processing Group Leader, Human & Information Science

We are researching natural language processing systems, including an automatic translation system that translates Japanese news items into a foreign language and a document summarization system that shortens news items by extracting their key points.

Natural language processing system

While natural language processing systems have become relatively common, they have not reached the level of human capabilities. One reason is that the main processing methods rely on the words themselves, without involving their meaning or content. Here is an example of this problem. A natural language processing system often has to determine the similarity between expressions. It may need to decide whether expressions such as "Yen-wa Kyusoku-ni Urareta" (The Yen was being sold at a high pace.) and "Yen-wo Uru Ikioi-ga Tsuyoi" (There is a strong selling current for the Yen.) are similar or dissimilar. One way to decide with superficial processing is to count the words the sentences have in common. In the example sentences, the two common words "yen" and "sell" would enable the computer to determine that the two sentences are similar. In contrast, the sentences "Dollar-wo Kau Ugoki-ga Tsuyoi" (There is a strong buying move for the dollar.) and "Yen-wa Kyusoku-ni Urareta" (The Yen was being sold at a high pace.) have no words in common, so the computer would judge them to be "dissimilar." Of course, a person can easily determine that they carry similar meanings, because a human can process information using common sense and knowledge of the actual economy that lies behind the superficial words of the sentence. Highly accurate translation and summarization systems will require such high-level human capabilities.
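The word-counting approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in any actual system: the particle list, the small lemma table that maps "Uru" and "Urareta" to "sell", and the `min_common` threshold are all hypothetical choices made for this example.

```python
# Tokens treated as carrying no content in this toy example (case particles).
# This list is an assumption made for the sketch, not taken from the article.
PARTICLES = {"wa", "wo", "ga", "ni"}

# Hypothetical lemma table: maps romanized surface forms to a base form so
# that "Uru" and "Urareta" both count as the word "sell".
LEMMAS = {"uru": "sell", "urareta": "sell", "kau": "buy"}


def content_words(sentence: str) -> set[str]:
    """Split a romanized sentence on spaces and hyphens, drop particles,
    and normalize the remaining tokens with the lemma table."""
    tokens = sentence.lower().replace("-", " ").split()
    return {LEMMAS.get(t, t) for t in tokens if t not in PARTICLES}


def common_words(a: str, b: str) -> set[str]:
    """Return the content words that appear in both sentences."""
    return content_words(a) & content_words(b)


def is_similar(a: str, b: str, min_common: int = 1) -> bool:
    """Superficial judgment: similar if enough words are shared.
    The threshold is an assumed parameter."""
    return len(common_words(a, b)) >= min_common


if __name__ == "__main__":
    sold = "Yen-wa Kyusoku-ni Urareta"
    selling = "Yen-wo Uru Ikioi-ga Tsuyoi"
    buying = "Dollar-wo Kau Ugoki-ga Tsuyoi"
    print(sorted(common_words(sold, selling)))  # ['sell', 'yen']
    print(is_similar(sold, buying))             # False
```

The sketch reproduces both outcomes from the example: the first pair shares "yen" and "sell", while the dollar sentence shares nothing with the first yen sentence and is judged dissimilar, even though a human reader would see the connection.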

Natural language processing that "learns from human possibilities"

It will be difficult to realize a natural language processing system with a human level of ability using current computer systems alone. We therefore decided to exploit human abilities to compensate for the shortcomings of computers.

Based on this idea, we are now developing a collaborative translation system that works with human translators to translate Japanese news items into English. The system autonomously searches the Web and past human translation examples for Japanese expressions that are typically problematic, such as personal names and idiomatic expressions. It then automatically finds their English translations and proposes them to the user as translation components, which the user employs to complete the translation. The collaboration is realized by dividing the work: the system takes on the task of finding and offering possible translation components, and the human translator takes on the rest of the high-level work. The system will be of practical value, but its value will not be limited to that. If we successfully analyze the operations of the human translators, we may learn about the high-level abilities they use to complete the translation.
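The division of labor described above can be pictured with a very small Python sketch. It is not the system under development: the Web search is left out entirely, the function name `propose_components` is hypothetical, and the store of past translations is a toy table whose entries are read off the glosses of the example sentences in this article.

```python
# Toy store of expression-level translations taken from past human
# translations. Hypothetical entries, aligned by hand from the glosses
# of the example sentences in this article.
PAST_TRANSLATIONS = {
    "Kyusoku-ni": "at a high pace",
    "Urareta": "was being sold",
    "Tsuyoi": "strong",
}


def propose_components(sentence: str, memory: dict[str, str]) -> list[tuple[str, str]]:
    """Return (Japanese expression, English translation) pairs for every
    stored expression found in the sentence, in order of appearance.
    Matching is a plain case-insensitive substring search, a simplification
    chosen for this sketch."""
    lowered = sentence.lower()
    found = [(expr, eng) for expr, eng in memory.items() if expr.lower() in lowered]
    return sorted(found, key=lambda pair: lowered.index(pair[0].lower()))


if __name__ == "__main__":
    # The system's share of the work: offer candidate components.
    for expr, eng in propose_components("Yen-wa Kyusoku-ni Urareta", PAST_TRANSLATIONS):
        print(f"{expr} -> {eng}")
    # The human translator's share: assemble the final English sentence.
```

In this picture the program only offers candidates, and the choice among them and the composition of the final sentence remain with the translator.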

Other notable areas of development may exploit the fact that massive textual databases likely contain deep hidden knowledge and common sense that are not yet utilized. Developing technologies based on such previously hidden expertise in databases will make a computer smarter, bringing it closer to the language processing capability of humans. We will continue our research to learn from human capabilities, with the aim of constructing systems with a language processing ability closer to that of our own.

Figure: Differences in human and machine translations