NHK Laboratories Note No. 465


Yuko Yamanouchi, Hideki Mitsumine, Seiki Inoue,
Shigeru Shimoda
(Multimedia Services Research Division)

    To generate highly realistic scenes of a virtual studio, we are developing the technology for a new type of virtual studio which is based on image components from real videos instead of CG. We call the system an Image-Based Virtual Studio. The technology for two types of image components has been developed for the system. These are an environmental image component that is used to provide long shots and a three-dimensional image component that is used to provide medium shots and close ups.
    We have recently developed a principal-point alignment camera head that can be used to compose a large number of camera images into a single high-resolution omnidirectional image, as the environmental image component. We constructed an omnidirectional image (16,000x16,000 pixels) from about 1000 camera images, and were eventually able to use them to compose a seamless whole. We have developed an experimental high-definition image-based virtual studio based on such image components. The system's utility in TV program production has been confirmed.
    The virtual studio, which composes images from a combination of computer graphics (CG) and real images [1], is becoming an increasingly popular method of program production. This production method not only opens up possibilities for making new kinds of images but can also increase cost efficiency and reduce the space required for studios.
    However, since it is extremely difficult to combine CG and real images so that they are indistinguishable, compositions made from real images and CG in real time still create a feeling of incongruity. To create a realistic virtual studio, the boundary between CG and real images needs to be removed, so that the real images can be handled in the same way as CG. The position and size of the components of a real image, as the elements that make up a scene, could then be changed during program production. Virtual studios in which real images are used in this way would allow greater freedom in program production.
    The objective of this research is to break down real images into components so that they can be handled in the same way as CG. For this purpose, the images are broken down into images of small objects, for example a desk or a vase (three-dimensional image components), and the image of a surrounding ambient scene (environmental image component). As shown in Fig.1, the user chooses three-dimensional image components and places them on the environmental image component. The resulting image and the shot from the real studio are then used to synthesize the final image. We refer to this system as an image-based virtual studio. The image components, which are combined, are all real pictures, so images of a more realistic electronic studio set can be obtained than by using the conventional approach.

Fig.1 Concept of an image-based virtual studio

    We have developed a method of producing high-quality environmental image components for application to an image-based virtual studio. We have generated a high-definition omnidirectional image as the environmental image component by pasting together still images which are shot by a gradually rotating camera. The images cannot be pasted together seamlessly unless the principal-point of the camera's zoom lens is aligned with the center of the axis of rotation. We have therefore developed a principal-point alignment camera head especially for the production of omnidirectional images. The camera platform ensures that the principal-point of the zoom lens is always aligned with the center of the axis of rotation of the camera. In an experiment, this platform was used to shoot about 1000 images. The images were then pasted together to produce a single high-resolution omnidirectional image. We then used this image as part of an image-based virtual studio for HDTV.

    A strong interest in panoramic images has led to a variety of paste-up software[2] products appearing on the market. Such software products are used to match up and paste together images taken by rotating a camera. Since the principal-point of the camera lens is not usually at the center of the axis of rotation, some parts of the images will not match properly and it is thus impossible to produce a truly natural panoramic image.
    Meanwhile, research into systems that are specifically designed to produce panoramic and omnidirectional images is also going on. One such system produces an omnidirectional image from ten or more cameras that are installed radially about a center[3]. It is difficult to align the principal-point of the camera lenses with the radial center, so the images cannot be pasted together accurately. Another uses cameras and curved mirrors[4][5][6]. However, the resolution of the images is limited to a low level, since the number of mirrors and cameras is limited. Neither type of system is applicable to the production of high-definition omnidirectional images for output at the broadcast quality which we aim to achieve.
    Principal-point alignment has been examined in detail[7], but motion of the principal-point in terms of the degree of zoom and the focal length has yet to be examined. The degree of zoom and focal length have to be changed according to the view to be shot. The position of the principal-point then changes accordingly. In an omnidirectional image system the principal-point has to remain aligned regardless of the degree of zoom or focal length. Therefore, we used the relation between principal-point position and angle of view, which changes according to the degree of zoom and focal length, to develop a camera platform which is equipped with a mechanism that automatically aligns the principal-point with the axis of rotation. We then used this camera platform to provide source images for pasting into the high-definition omnidirectional image.

    As explained above, in a zoom lens, the angle of view and the position of the principal-point are both affected by the degree of zoom and focal length. In this context, the principal-point corresponds to the pinhole-camera viewpoint that is used in CG. Fig. 2 shows that if the position of the principal-point is not aligned with the center of rotation, parts of the image will be seen in some frames but not in others, as depicted in images (A) and (B). If these two images are pasted together, a discontinuous join will be visible in the resulting composite image. Therefore, the principal-point must be aligned with the center of rotation to produce seamless omnidirectional images and panoramic images.
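    The effect described above can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole camera seen from above; the function names, the two-point scene, and the distances are hypothetical and are not taken from the experiment. Two points lie on the same line of sight before the pan. After the pan they still coincide in the image only if the principal-point sits on the rotation axis.

```python
import math

def bearing_in_camera(point, pan, offset):
    """Angle (rad) between the optical axis and a world point after a pan.

    point:  (x, z) position in the horizontal plane, rotation axis at (0, 0)
    pan:    pan angle in radians
    offset: distance of the principal-point in front of the rotation axis
    """
    # The principal-point moves on a circle of radius `offset` when panning.
    cx, cz = offset * math.sin(pan), offset * math.cos(pan)
    px, pz = point
    return math.atan2(px - cx, pz - cz) - pan

def parallax(near_z, far_z, pan, offset):
    """Angular separation after panning of two points that were
    superimposed (both straight ahead) before panning."""
    near = bearing_in_camera((0.0, near_z), pan, offset)
    far = bearing_in_camera((0.0, far_z), pan, offset)
    return near - far

aligned = parallax(2.0, 10.0, math.radians(20.0), offset=0.0)
misaligned = parallax(2.0, 10.0, math.radians(20.0), offset=0.1)
```

    With zero offset the separation stays at zero for any pan angle, while a non-zero offset makes the near point shift against the far one, which is what produces the discontinuous join.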

Fig. 2 Image view in the case of positional incompatibility between the principal-point and the center of rotation

    Fig. 3(a) shows the relationship between the amount of change in the angle of view ratio (ratio of the actual angle of view to the angle of view at maximum focal length) and the degrees of zoom and focal length. Fig. 3(b) shows the relationship between the principal-point position of the lens and the degree of zoom and focal length. These figures show that both the angle-of-view ratio and the principal-point position change according to both the degree of zoom and focal length.
    Lens data as shown in Fig. 3 is not usually available, and must be measured. The principal-point position that corresponds to a given degree of zoom and focal length is calculated as follows. Firstly, as a mechanism to measure the degree of zoom and focal length, a rotary encoder is fitted to the ring.

Fig.3 Relation between angle of view, principal-point and zoom, focal length position

    Fig. 4 shows that if the principal-point is not aligned with the center of rotation the two objects in the background will be hidden when the camera is panned (horizontal rotation). When the position of the principal-point is aligned with the center of rotation, the two objects in the background are not hidden but superimposed. We therefore measured the position required for the principal-point so that objects in the background will not become hidden even if the camera moves to the front or rear of the rotation center as in Fig.4(a).
    The principal-point is then calculated for several values of zoom and focal length and a table of the results is drawn up. After cubic spline interpolation, the principal-point position in relation to the degree of zoom and focal length can be expressed by the following pair of equations.

$$\frac{\tan w}{\tan W} = h(z, s) \qquad (2)$$
($t_1$: principal-point position, $z$: zoom position, $s$: focal length, $w$: actual angle of view, $W$: angle of view at maximum focal length)
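    As an illustration of the interpolation step, the sketch below fits a natural cubic spline through a one-dimensional slice of such a table (principal-point position against zoom position at one fixed focus setting). It is a simplified sketch under stated assumptions: the table values are hypothetical, the natural boundary condition is an assumption, and the actual table covers both zoom and focal length.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through
    the points (xs[i], ys[i]); xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i]
                          - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the second-order terms.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution gives the polynomial coefficients per interval.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the interval containing x, clamped to the table range.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + t * (b[j] + t * (c[j] + t * d[j]))

    return evaluate

# Hypothetical measurements: normalized zoom position -> principal-point (mm).
zoom_positions = [0.0, 0.25, 0.5, 0.75, 1.0]
principal_point_mm = [120.0, 95.0, 80.0, 72.0, 70.0]
t1_of_zoom = natural_cubic_spline(zoom_positions, principal_point_mm)
```

    Calling `t1_of_zoom(0.6)` then gives an interpolated principal-point position between the measured zoom settings.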

    After the principal-point position is aligned on the center of rotation, the angle of view that corresponds to each degree of zoom and focal length is calculated with these equations. The actual angle of view is calculated by matching images from two scenes with the camera rotated right and left as in Fig. 4(b) by using the pan angle (angle of horizontal rotation). The calculated angle of view was used as a standard when the images were pasted together.
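    One way such a measurement could be computed is sketched below. It assumes an ideal pinhole model with the principal-point already on the rotation axis, and a single feature matched in two images that differ by a known pan angle; the function names, the bisection solver, and the requirement that the feature lies on opposite sides of the image center in the two shots are assumptions of this sketch, not details given in the text.

```python
import math

def focal_length_px(u1, u2, pan_rad, lo=1.0, hi=1.0e7):
    """Solve atan(u1/f) - atan(u2/f) = pan for f (in pixels) by bisection.

    u1, u2: horizontal offsets (pixels) of the same feature from the image
            center before and after the pan; requires u1 > 0 > u2.
    """
    def residual(f):
        return math.atan(u1 / f) - math.atan(u2 / f) - pan_rad

    # For u1 > 0 > u2 the residual decreases monotonically with f.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def angle_of_view_deg(u1, u2, pan_deg, image_width_px):
    """Horizontal angle of view implied by one matched feature."""
    f = focal_length_px(u1, u2, math.radians(pan_deg))
    return math.degrees(2.0 * math.atan(0.5 * image_width_px / f))
```

    In practice several matched features would be averaged to reduce the effect of matching noise.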

Fig.4 How to measure critical principal-point and angle of view

    Fig. 5 shows our system that automatically inputs images by using a principal-point aligning camera platform. With this system, the operator selects the zoom and focus values, and the camera is then operated automatically in terms of the principal-point position and the angle of view given in the three-dimensional table. The images required are input in sequence. The system is designed so that the image and camera parameters are simultaneously input from the camera.
    The controller enables the remote control of pan (horizontal rotation), tilt (vertical rotation), movement of the principal-point (i.e. movement in the direction of the optical axis), and the zoom, focus, and iris values for the lens system. Fig. 6 shows the specifications of the trial camera platform.

Fig.5 Principal-point alignment camera system

Fig.6 Specifications of the camera system

    Once zoom and focus data have been input into the computer, the principal-point position is calculated according to the three-dimensional table of principal points in relation to zoom and focal length. The camera platform then automatically moves so that the principal-point position remains aligned with the center of rotation. Moreover, since all of the camera parameters shown in Fig. 6 are also under computer control, the system automatically sets the degree of pan and tilt according to the angle of view during shooting and can automatically input images to construct an omnidirectional image.
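    The automatic setting of pan and tilt from the angle of view can be pictured with the following sketch. It assumes a simple uniform grid of shots with a fixed overlap fraction; the function name, the overlap parameter, and the grid layout are hypothetical, since the text does not describe the actual shooting sequence.

```python
import math

def shooting_plan(hfov_deg, vfov_deg, overlap=0.1):
    """Return a list of (pan, tilt) angles in degrees covering the sphere.

    hfov_deg, vfov_deg: horizontal and vertical angle of view of one shot
    overlap:            fraction of each shot shared with its neighbor
    """
    step_pan = hfov_deg * (1.0 - overlap)
    step_tilt = vfov_deg * (1.0 - overlap)
    # Round up so that the real step never exceeds the requested step.
    n_pan = math.ceil(360.0 / step_pan)
    n_tilt = math.ceil(180.0 / step_tilt)
    pans = [i * 360.0 / n_pan for i in range(n_pan)]
    tilts = [-90.0 + (j + 0.5) * 180.0 / n_tilt for j in range(n_tilt)]
    return [(pan, tilt) for tilt in tilts for pan in pans]
```

    A uniform grid shoots more frames near the poles than strictly necessary; a real controller could reduce the number of pan steps at high tilt angles.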

    We experimentally pasted together automatically shot images, using the corresponding camera data (pan, tilt, and angle of view) to make up an omnidirectional image. A high-resolution omnidirectional image (16,000x16,000 pixels) was thus obtained by using 1000 overlapping camera images at a horizontal angle of view of 11.0°.
    Fig. 7 shows the image obtained when the camera images were simply pasted together to form the omnidirectional image. This figure confirms that the images are not perfectly aligned along the joins. Possible causes include lens distortion and the precision with which CCDs are installed in cameras. Lens distortion can be barrel type or pincushion type, depending on the degree of zoom. The amount of distortion can be expressed as follows, in terms of the distance r from a scene's center:
$R = r + a_1 r^3 + a_2 r^5$,     where $r$ is a theoretical value, $R$ is a measured result, and $a_1$ and $a_2$ are coefficients of a fifth-order, odd-power only, polynomial in $r$.
    Ideally, CCDs should be installed in a plane perfectly perpendicular to the optical axis but, in practice, there are small errors. Therefore, taking lens distortion and CCD rotation into account, we tuned the system by matching a set of overlapping upper, lower, right, and left images. The errors were as below.
ROTATION      x: -0.0002    y: -0.0008    z: -0.232
DISTORTION    a1: 17x10^-8    a2: 2.93x10^-17
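    The distortion polynomial given above can be applied and inverted as in the following sketch. The coefficient values are the ones listed above; the units of r are not stated in the text, so treating r as a distance in pixels from the image center is an assumption, as are the function names and the use of Newton's method for the inverse.

```python
A1 = 17e-8      # coefficient a1 as listed above (units of r assumed: pixels)
A2 = 2.93e-17   # coefficient a2 as listed above

def distort(r, a1=A1, a2=A2):
    """Measured radius R for a theoretical radius r."""
    return r + a1 * r**3 + a2 * r**5

def undistort(R, a1=A1, a2=A2, iterations=20):
    """Theoretical radius r for a measured radius R (Newton's method)."""
    r = R  # the measured radius is a reasonable starting guess
    for _ in range(iterations):
        f = distort(r, a1, a2) - R
        df = 1.0 + 3.0 * a1 * r**2 + 5.0 * a2 * r**4
        r -= f / df
    return r
```

    Compensation then amounts to moving each pixel from its measured radius R back to the radius returned by `undistort(R)` along the line through the image center.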

    When the images were again pasted together, but with the errors compensated for, the images matched almost perfectly.
    Differences in the brightness and color of each camera image, such as flare (a kind of fog caused by the reflection of light) and shading (a decrease in the amount of light toward the periphery of the image), can also cause problems. As a measure against shading, a uniform white was shot beforehand and the RGB value of each picture element was measured. The ratios of the RGB values of each picture element to the RGB value of the central picture element were then calculated, and the values for each picture element in each camera image were adjusted according to this data.
    Since it is difficult to eliminate flare from images after they have been shot, the color of each image was adjusted by using the average color of the areas in which adjacent images overlap. As a result of these adjustments for deformation, CCD installation error, color, and brightness, as shown in Fig. 8, the border between images was no longer obvious.
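    The shading measure described above can be sketched as follows. This is a minimal illustration on nested Python lists of (R, G, B) tuples; the function names are hypothetical, and the sketch assumes that the white reference frame contains no zero values.

```python
def shading_ratios(white_frame):
    """Per-pixel, per-channel ratio of a white reference frame to its
    central pixel. white_frame is a 2D list of (r, g, b) tuples."""
    rows, cols = len(white_frame), len(white_frame[0])
    center = white_frame[rows // 2][cols // 2]
    return [[tuple(v / c for v, c in zip(px, center)) for px in row]
            for row in white_frame]

def correct_shading(image, ratios):
    """Divide each pixel by its ratio so that the falloff is undone."""
    return [[tuple(v / k for v, k in zip(px, kpx))
             for px, kpx in zip(row, krow)]
            for row, krow in zip(image, ratios)]
```

    Applying `correct_shading` to the white reference frame itself yields a uniform image at the level of the central pixel, which is a convenient sanity check.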
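    A simple form of this color adjustment is sketched below. The text states only that the average color of the overlapping area was used; expressing the adjustment as a per-channel gain, and the function names, are assumptions of this sketch.

```python
def mean_rgb(pixels):
    """Average (r, g, b) of a list of pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def overlap_gain(reference_overlap, new_overlap):
    """Per-channel gain that maps the average color of the new image's
    overlap region onto that of the already placed reference image."""
    ref = mean_rgb(reference_overlap)
    new = mean_rgb(new_overlap)
    return tuple(r / n for r, n in zip(ref, new))

def apply_gain(pixels, gain):
    """Scale every pixel of the new image by the per-channel gain."""
    return [tuple(v * g for v, g in zip(p, gain)) for p in pixels]
```

    The gain is estimated only from the overlapping pixels but is applied to the whole of the newly added image.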

Fig. 7 Result of pasting images

Fig.8 A revised image

    Fig. 9 shows half of an omnidirectional image which contains about 16,000x16,000 pixels. While mismatches of one or two picture elements can be seen in places, a reasonably good image was generated. The remaining mismatches occur because of flexure in the camera platform. In future research, we plan to improve the matching process and also to reduce these mismatches.

Fig.9 A half-plane of an omnidirectional image (16,000x16,000 pixels)

    To use components from a real picture to construct a studio set, the user chooses images of small objects as three-dimensional image components then places them on the image of a surrounding scene, which is the environmental image component.
    The environmental image component consists of a high-definition omnidirectional image as mentioned above. A part of the image at any angle around the viewpoint can be extracted at high resolution.
    A three-dimensional image component consists of three-dimensional modeling data and images of the object's surface, held as separate data. Keeping these components separate allows the view direction and lighting point to be freely adjusted.
    We constructed an experimental HDTV image-based virtual studio from such image components. Fig. 10 shows the components of this system. Several environmental image components and three-dimensional image components are loaded into a workstation (SGI-ONYX2) as the virtual studio set. A prepared CG animated character or a real actor on a chroma-key set can then be used in the virtual studio.

Fig.10 Experimental image-based virtual studio system

    The user operates a small camera manipulator that moves like a camera (pan, tilt, zoom, and dolly), and the data on this motion is instantly sent to the workstation. The virtual studio's image components and the relative position of the CG character in the virtual studio are thus controlled in real time. As the environmental image component has no three-dimensional data, and the actor is not computer controlled, there are certain problems. The lack of occlusion, for example, can cause problems, so the environmental image component is separated into several layers which represent different depths. The three-dimensional image components and the actor can then be moved forward or back to lie between any pair of layers of the environmental image component. Although the camera cannot be turned into the environmental image component, it can be dollied to some degree.
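    The layering idea can be illustrated by a back-to-front drawing order (painter's algorithm), as in the sketch below. The layer names, the depth values, and the use of a single depth per element are hypothetical; the text does not describe how the compositing is implemented on the workstation.

```python
def draw_order(environment_layers, movable_items):
    """Return element names sorted from farthest to nearest, so that
    drawing them in this order lets nearer elements occlude farther ones.

    Both arguments are lists of (name, depth); larger depth = farther.
    """
    items = list(environment_layers) + list(movable_items)
    return [name for name, depth in
            sorted(items, key=lambda item: item[1], reverse=True)]

# Hypothetical studio: three environment layers, an actor, and a desk.
layers = [("far wall", 30.0), ("pillars", 10.0), ("foreground rail", 2.0)]
movable = [("actor", 5.0), ("desk", 12.0)]
order = draw_order(layers, movable)
```

    Changing the depth assigned to the actor moves the actor in front of or behind a given environment layer without any three-dimensional model of the environment.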
    Highlights and reflections are removed from the textures of the three-dimensional image components, and this permits the free alteration of the lighting condition by using a graphical user interface. At the moment we visually determine the direction and color of lighting in the environmental image component, and then alter the position and color of lighting on the surfaces of the three-dimensional image components accordingly.
    As a result, we have verified that our system can be flexibly applied to produce a realistic studio-set image within the limits of camera-work and lighting (see Fig. 11).

Fig.11 Outputs of image-based virtual studio

    The dependence of the principal-point position of a camera's zoom lens on the degree of zoom and focal length was determined and used to develop a principal-point aligning camera platform. The platform keeps the principal-point aligned with the center of rotation of the camera. As a result, it is possible to automatically shoot frames to make up omnidirectional images, without mismatches between frames.
    Furthermore, we were able to confirm that even when a large omnidirectional image (16,000x16,000 pixels) was produced, camera images can be pasted together so that the borders between images appear natural by compensating for the lens distortion coefficient, CCD installation error, color, and brightness.
    We applied such omnidirectional images as environmental image components in our newly developed image-based virtual studio system. The system is a virtual studio for HDTV and can be used to produce effective TV programs, although there are limits in terms of camera-work and lighting conditions.
    In the future, so that high-resolution three-dimensional images can replace the omnidirectional image as the source of the environmental image component, we will need to shoot images from many points and to construct the 3D models. We also intend to implement an automatic system for finding the position of the light source in the environmental image component.

  1. M. Hayashi, et al., "Desktop Virtual Studio System", IEEE Trans. on Broadcasting, Vol. 42, No. 3, pp. 278-284, Sep. 1996

  2. S. E. Chen, "QuickTime VR - an image-based approach to virtual environment navigation", SIGGRAPH '95, pp. 29-38, 1995

  3. H. Fukumoto, et al., "Construction of Whole View Image by Synthesizing Landscape Images", NIM-97-75, pp. 1-6, 1997 (in Japanese)

  4. Y. Yagi, et al., "Real-time Omnidirectional Image Sensor for Vision Guided Navigation", IEEE Trans. on Robotics and Automation, Vol. 10, No. 1, Feb. 1994

  5. J. Hong, et al., "Image-Based Homing", Proc. 1991 IEEE Int. Conf. on Robotics and Automation, pp. 620-625, Apr. 1991

  6. S. K. Nayar, "Omnidirectional Video Camera", Proc. DARPA Image Understanding Workshop, May 1997

  7. T. Wada, et al., "Fixed Viewpoint Pan-Tilt-Zoom Camera and Its Applications", IEICE, D-II, Vol. J81-D-II, No. 6, pp. 1182-1193, June 1998 (in Japanese)

Copyright 2000 NHK (Japan Broadcasting Corporation) All rights reserved. Unauthorized copy of the pages is prohibited.