NHK Laboratories Note No. 447


Kazuo Fukui, Masaki Hayashi, Yuko Yamanouchi
(Multimedia Services Research Division)


Computer imaging techniques are playing an important role in TV program production. We have been studying the virtual studio system (VSS), a new program production environment that enhances the freedom of image creation by composing computer-generated images with real images taken by video cameras. Whereas general composing methods can only be used when there is no practical camera motion, the VSS can efficiently incorporate such camera actions as panning, zooming, tilting, and traveling with computer-generated images so that the final image looks as though it had been filmed with a single camera. Two types of VSS have been developed and test-produced: a VSS driven by actual camera motion (VSS-AC) and a VSS driven by virtual camera motion (VSS-VC).

1. Introduction
Computer-generated images are playing an important role in commercial films, titles, and other television program components. For example, the dramatic increase in computer processing speed has brought about hardware that can generate three-dimensional graphic images in real time. Technological advances such as this one are helping the field of computer graphics move into new areas of application. Predominant among these new applications is virtual reality.
Despite these technological accomplishments, however, it is still difficult to generate images of real living things with personalities, such as human beings. Producers have long cherished the desire to make video programs in which real actors and actresses appear in studio sets made up of computer graphics, free from various restrictions.
With this in mind, we are trying to develop a virtual studio system (VSS) capable of creating video images by combining shots of real subjects and virtual images made up of computer graphics into single images. In other words, the VSS would synthesize the images of real performers shot with real video cameras and three-dimensional computer graphic images without any discrimination between them, giving the impression that the entire image had been shot with a single camera.
Such systems could also contribute to reducing the cost of video program production for broadcasting and to improving the quality of studio sets. The VSS being proposed will constitute part of a DTPP (desktop program production) system (4) under development at NHK Laboratories. Aided by a single workstation, this system aims to create a personal program production environment that integrates the entire video production process, from filming and editing to video effects processing.
In the following sections, we will analyze the virtual space created using actual camera shots and computer graphics. We will also demonstrate that image processing prior to synthesis enables us to position or relocate real performer images within the composite virtual space. Finally, the two types of virtual studio system that have been test-produced will be discussed.

2. Virtual space of composed images
Television program producers have utilized a variety of image synthesizing methods to create the illusion of a real scene. The most often used is the chroma-key method, which cuts out an object from a foreground image shot against chroma-key blue and composes it against a separately prepared background. Although the foreground and background images are made up of shots of different objects, the composed image looks as if all the objects were in the same three-dimensional space and shot by a single camera.
The chroma-key method is illustrated in Fig. 1. A person is filmed against chroma-key blue to form a foreground image; the background image, a building, is shot separately. The result of composing the two images with this method is a single picture of a person standing in front of the building.
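As an illustrative sketch (not the actual broadcast keyer), the chroma-key cut-out can be modeled as a per-pixel distance from the key colour; the key colour, threshold, and function name here are assumptions for illustration only:

```python
import numpy as np

def chroma_key_composite(foreground, background, key_color=(0, 0, 255), threshold=80):
    """Compose a foreground shot against chroma-key blue onto a background.

    foreground, background: HxWx3 uint8 RGB images of equal size.
    Pixels close to the key colour are treated as transparent and
    replaced by the background; all values are illustrative.
    """
    fg = foreground.astype(np.int32)
    # Per-pixel distance from the key colour
    dist = np.linalg.norm(fg - np.array(key_color), axis=-1)
    mask = (dist > threshold)[..., None]   # True where the performer is
    return np.where(mask, foreground, background).astype(np.uint8)
```

A real keyer works on video signals with soft edges and spill suppression; this hard-threshold mask only shows the principle of cutting out the foreground object.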
Fig. 2 shows how the distance (Za) between the camera and the object it is filming is altered in a composite image. A foreground image is filmed with a camera whose focal distance is "fa" and combined with a background image shot with a camera whose focal distance is "fb". Since the background image occupies the greater part of the composite image, an observer perceives the whole as if it had been shot with the background camera whose focal distance is "fb", and therefore feels that the absolute size of object A in the composed image is unchanged. Assuming that the focal distance is adequately shorter than the camera-object distance, the perceived depth from the camera (Zba) may be approximated as follows:

Zba = Za * fb / fa

Thus, if both the person in the foreground and the building in the background are filmed with cameras having the same focal distance, the result will be a composite image whose virtual space preserves the geometry of the objects at the time of filming. To maintain the original geometry of the objects in a composite image made up of several components shot at different focal distances, each component image has to be enlarged or reduced in proportion to its focal distance.
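A short numerical sketch of the approximation above: an object 3 m from a 50 mm foreground camera, keyed into a background shot at 25 mm, appears at half the depth unless the foreground is rescaled. The function names are illustrative, not from the original system:

```python
def perceived_depth(z_a, f_a, f_b):
    """Perceived camera-object distance Zba = Za * fb / fa when a
    foreground shot at focal distance fa is composed into a background
    shot at focal distance fb (valid while the focal distance is much
    shorter than the camera-object distance)."""
    return z_a * f_b / f_a

def rescale_factor(f_a, f_b):
    """Scale to apply to the foreground image so that its perceived
    depth in the composite equals the original distance Za: scaling the
    image by fb/fa makes it look as if it had been shot at fb."""
    return f_b / f_a
```

For example, perceived_depth(3.0, 50.0, 25.0) gives 1.5 m; reducing the foreground image by rescale_factor(50.0, 25.0) = 0.5 restores the original 3 m geometry.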
With the simple chroma-key method, the relative geometry between objects is altered in the composed image whenever the camera filming the foreground or background image changes its focal distance, shooting angle, or position. This means that the chroma-key method can only be used to create efficiently composed images when no practical camera motion is necessary. This is a problem, because television program production depends on such camera motions as zooming, panning, and tilting for effective presentation.
NHK has developed a chroma-key synthesizer interlocked with camera motion, called "Synthevision" (1). Synthevision, now used in everyday broadcasting services, cuts out background images of the necessary size from prepared large-screen background images, corresponding to the orientation and view angle of the camera shooting the foreground image. The system is interlocked with camera motion but not with camera relocation.
Suitable backgrounds interlocked with camera motion (including shots from traveling cameras) can be generated by employing three-dimensional computer graphic images instead of actually filmed images as the background.
There are two methods of adding camera motion effects to an image composed from shots taken with two different cameras. One, VSS-AC (2), generates computer graphic images by reproducing the motion of a camera that has the same focal distance as the camera filming the real performers. The other, VSS-VC (3), controls the motion of both real cameras and virtual cameras; the virtual camera can be positioned at will to generate computer graphic images independently of the actual cameras. Hereinafter, we refer to the system using the former method as the VSS driven by actual camera motion (VSS-AC) and that using the latter method as the VSS driven by virtual camera motion (VSS-VC). Both types of system have been test-produced.

3. VSS driven by actual camera motion
The VSS-AC is interlocked with a camera filming an actual image. Background computer graphics are generated and synthesized so that the composed image provides a perception of depth. The positioning of the actual performers must conform with the three-dimensional configuration data of the studio sets. For this, the focal distances, locations, and orientations of the camera parameters for the computer graphics must be adjusted to conform with those of the real cameras.
The optical characteristics of the real cameras must be accurately understood so that the operations of the camera models conform to those of the real cameras. For this purpose, we have attached sensors to the camera lens to measure zooming and focusing. The system computes the focal distance of the lens and the location of the principal point from these measured values using the characteristics tables of the lens.
Camera location and orientation must be measured in terms of position coordinates based on a coordinate system arranged for the space in which the performers are to act. However, an instrument that measures absolute camera location with sufficient accuracy is cost prohibitive. To resolve this problem, a camera was mounted on a crane as shown in Fig. 3, and all the joint angles of the crane were measured. The camera location relative to a reference point on the crane can be computed from these angle measurements together with the crane's arm lengths. The absolute location of the reference point, fixed on the crane pedestal, is measured through careful calibration, using a method that compares superimposed computer graphic images and actual images.
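Computing the camera location from joint angles and arm lengths is a forward-kinematics problem. The planar two-link simplification below is hypothetical (the actual crane has more degrees of freedom), but it shows how the measured angles accumulate along the arm:

```python
import math

def crane_camera_position(base, link_lengths, joint_angles):
    """Planar forward kinematics for a crane arm: starting from the
    reference point `base`, accumulate each joint angle and walk along
    each link of known length to obtain the camera position and
    orientation. A 2-D sketch of the measurement described in the text."""
    x, y = base
    theta = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        theta += angle                 # joint angles are relative
        x += length * math.cos(theta)  # advance along the current link
        y += length * math.sin(theta)
    return x, y, theta                 # camera position and heading
```

The same accumulation extends to three dimensions with rotation matrices; the absolute pose then follows by adding the calibrated location of the reference point on the pedestal.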
Using these parameters, the camera's orientation, principal-point location, and angle of view can be computed to obtain a camera model. This model can generate studio set images using three-dimensional configuration data representing virtual studio sets.
In actually shot images, performers may be hidden by parts of the studio set, depending on their distances from the camera. To reproduce this effect in virtual studio sets, the VSS-AC supplements conventional simple overwriting synthesis with a back-to-front overwriting method that composes studio sets from the rear of the background to the front of the foreground. An example of such a composed image is shown in Fig. 4. A virtual studio system of this type has been used in the production of the NHK television programs "Nano-space", "Universe Within", and others.
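The back-to-front overwriting amounts to a painter's algorithm over depth-sorted layers; this numpy sketch (the layer format is an assumption for illustration) shows why a performer standing behind a virtual set piece is correctly occluded by it:

```python
import numpy as np

def compose_back_to_front(layers):
    """Painter's-algorithm composition: sort layers by depth, farthest
    first, then overwrite each nearer layer onto the result. Each layer
    is a tuple (depth, HxW boolean mask, HxWx3 uint8 image); nearer
    layers hide farther ones wherever their masks overlap."""
    ordered = sorted(layers, key=lambda layer: -layer[0])  # farthest first
    h, w, _ = ordered[0][2].shape
    out = np.zeros((h, w, 3), np.uint8)
    for depth, mask, image in ordered:
        out = np.where(mask[..., None], image, out)        # overwrite
    return out
```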

4. VSS driven by virtual camera motion
Using video material made up of actual camera images of performers, the VSS-VC generates on a workstation a virtual studio in which the performers and virtual cameras are arranged.
The system's operator manipulates the virtual cameras in the virtual studio through a specially developed man-machine interface (Fig. 5). The system then outputs images shot with these virtual cameras. Unlike the VSS-AC discussed in the foregoing section, this system requires the operator to control the virtual cameras shooting the virtual space in order to generate images. The system is designed so that camera motion controlled by the operator does not affect the relative geometry between the actually shot performers and the computer-generated studio sets.

(1) Performers' images
Images of performers are filmed and recorded against a chroma-key blue background. If the performers are shot with a fixed camera, they may move out of the frame. While the camera could be set at a wide angle of view to avoid this, the material images would then have to be excessively enlarged and picture quality would deteriorate as a result. This system therefore requires camera motion to film the performers at roughly full size and within the limits of the frame. In doing so, the system records the camera's operating data in every field. Simultaneously recording the images of the performers and the camera operation data is equivalent to recording the images projected on a spherical screen (Fig. 6). Only a limited part of the image on this spherical screen is actually recorded by the system; however, the unrecorded areas of the spherical screen contain no performers and are therefore unnecessary.
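One way to picture the spherical screen: each field's recorded pan/tilt data maps every pixel of that field to a fixed direction in studio space, so the recording covers a patch of the sphere that follows the performer. The pinhole model and sign conventions below are assumptions, not the system's actual geometry:

```python
import math

def pixel_to_spherical(u, v, pan, tilt, focal_px, cx, cy):
    """Map an image pixel (u, v) to a direction (azimuth, elevation) on
    the notional spherical screen, given the camera's pan/tilt recorded
    for that field. focal_px is the focal length in pixels; (cx, cy) is
    the image centre. Simplified pinhole model."""
    # Ray direction in the camera frame
    x, y, z = (u - cx), (v - cy), focal_px
    # Rotate by tilt (about the x-axis) ...
    y, z = (y * math.cos(tilt) - z * math.sin(tilt),
            y * math.sin(tilt) + z * math.cos(tilt))
    # ... then by pan (about the y-axis)
    x, z = (x * math.cos(pan) + z * math.sin(pan),
            -x * math.sin(pan) + z * math.cos(pan))
    az = math.atan2(x, z)                       # azimuth on the sphere
    el = math.atan2(-y, math.hypot(x, z))       # elevation (image v grows downward)
    return az, el
```

Under this model, the centre pixel of a field shot at pan 0.3 rad lands at azimuth 0.3 rad on the sphere, which is why panning the real camera simply slides the recorded patch across the spherical screen.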

(2) Studio sets in computer graphic
The VSS-VC generates studio set images with three-dimensional computer graphics in real time. Parameters such as the sizes and locations of the walls, ceilings, floors, and furniture necessary for generation must be registered in the system as numerical data beforehand. In addition to ordinary data arrangements, the system has a special format of 3-D configuration data that can be easily changed in interactive operation. The camera position and orientation as well as the angle of view, which are included in the studio set generation parameters, are measured in real time from a virtual camera stand controlled by the operator.

(3) Performer and camera arrangements
Prior to filming, the operator freely sets the positions of the performers and the virtual camera on the workstation to determine the relative geometry between the virtual camera and the performers. Prerecorded performers' images are deemed to have been filmed at this newly determined virtual camera position. The relative geometry then yields both the distance from the camera and the offset angle from the virtual camera's orientation. The magnification or reduction of the performer images is computed in accordance with these distance differences. The panning angles recorded at shooting time must be shifted by these offset values, while panning variations must be scaled to conform with the desired reduction rates (Fig. 7).
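Under a simple pinhole assumption, the scale and pan offset for repositioning a prerecorded performer could be computed as follows (the function and parameter names are hypothetical, sketching the relationships in Fig. 7):

```python
def performer_scale_and_offset(recorded_dist, virtual_dist,
                               recorded_pan, virtual_pan):
    """Scale and pan offset for placing a prerecorded performer in the
    virtual studio. The image is enlarged or reduced by the ratio of the
    recording distance to the new virtual-camera distance (moving the
    camera closer makes the performer larger), and the recorded pan
    angles are shifted by the offset between the two orientations."""
    scale = recorded_dist / virtual_dist
    pan_offset = virtual_pan - recorded_pan
    return scale, pan_offset
```

For example, halving the camera-performer distance doubles the image size, and a performer recorded at pan 0.1 rad but placed at 0.25 rad needs a 0.15 rad offset applied to all recorded panning angles.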

(4) Filming by the virtual camera
After these preparations, the operator retrieves the recorded images and films them with a virtual camera. The position at which a performer image is composed with the studio setting must be adjusted to conform with the difference between the optical axis of the reproduced performer image and the optical axis of the virtual camera at work. As Fig. 8 shows, the reduction amounts and composing displacements are computed in real time in accordance with the virtual camera operations.
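The composing displacement can be sketched as the projection of the angular offset between the two optical axes onto the image plane; this small pinhole-model illustration (the focal length in pixels is an assumed parameter) mirrors the real-time computation of Fig. 8:

```python
import math

def composing_displacement(angle_offset, focal_px):
    """Horizontal displacement, in pixels, at which the reproduced
    performer image is composed: the tangent of the angle between the
    recorded optical axis and the virtual camera's current axis, scaled
    by the focal length in pixels (pinhole projection sketch)."""
    return focal_px * math.tan(angle_offset)
```

When the two axes coincide the displacement is zero; as the virtual camera pans away from the recorded axis, the performer image slides across the frame accordingly.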

(5) Hardware configuration
We have developed a prototype VSS-VC that can be broken down into the following hardware components (Fig. 9). The system can simultaneously process a program in which a maximum of two performers appear. To produce material images for composition, the system's image processing section appropriately enlarges or reduces the reproduced performer images and alters their positions so that they conform to their positions in the virtual space. The images of each performer are input into the system's spatiotemporal editing processor together with a key signal used to extract the performer from the full-screen image. The extracted performer images are overwritten based on depth information; as a result, the images reproduce a proper state of occlusion (Fig. 10). The producer is able to choose any of the recorded studio sets or performers and to adjust the virtual camera, performer positions, or studio set configurations using a mouse on the workstation's multi-window display shown on a Hi-Vision monitor. The virtual space is filmed using a camera stand without a camera, as shown in Fig. 5. Images are displayed on the monitor as if they were shot with a virtual camera.

5. Conclusion
We have tried to identify the best way to operate cameras in harmony with each other during the process of composing actual camera images and computer graphic images. To obtain more naturally composed images, it is helpful to consider other factors such as lighting consistency and shadows. In general, actual camera images and computer graphic images are thought to belong to different video domains. However, we can improve program quality and reduce production cost by directly fusing these two types of images or by utilizing computer graphics technology for image processing. The most important attribute of virtual studio systems may be that they stimulate producers' aspirations for better work and possibly expand the capabilities of video presentation.


(1) S. Shimoda, M. Hayashi, and Y. Kanatsugu, "New Chroma-key Imaging Technique with Hi-Vision Background," IEEE Trans. on Broadcasting, Vol.35, No.4, Dec. 1989
(2) T. Yamashita and M. Hayashi, "From Synthevision to an Electronic Set," Proceedings of the Digimedia Conference, Geneva, Apr. 1995
(3) K. Haseba, A. Suzuki, K. Fukui, and M. Hayashi, "Real-Time Compositing System of a Real Camera and a Computer Graphic Image," Proceedings of the International Broadcasting Convention, Sep. 1994
(4) K. Enami, K. Fukui, and N. Yagi, "Program Production in the Age of Multimedia - DTPP: Desktop Program Production -," IEICE Trans. Inf. & Syst., Vol.E79-D, No.6, June 1996

Copyright 1997 NHK (Japan Broadcasting Corporation) All rights reserved. Unauthorized copy of the pages is prohibited.