NHK Laboratories Note No. 488


TEXTURE ACQUISITION BY A ROBOT-ARM CAMERA
-CREATION OF AN IMAGE-BASED VIRTUAL STUDIO FOR DOLLY SHOTS-


Yuko Yamanouchi(*1), Hideki Mitsumine(*1), Takashi Fukaya(*1),
Masaki Hayashi(*2)
(*1) Visual Information Technologies,
(*2) Intelligent Information Processing


Abstract

We have been studying how to acquire high-quality textures in order to create a three-dimensional background for a virtual studio that uses real images. In this study, using camera positions and directions determined from CAD data, we have designed a system in which a camera mounted on a robot arm shoots a virtual studio. For the shooting, the range in which the camera can move is specified, and the camera moves within this range to shoot images from various positions and directions. By choosing, for each polygon of the CAD data, the camera image in which that polygon appears largest, the system can acquire a high-resolution texture for every polygon. Experimental data show that, although there are some measurement errors in the actual camera positions and directions and some discrepancies between the CAD data and the object being shot, the system as a whole can automatically acquire high-resolution textures for the polygons of the CAD data.
1.  INTRODUCTION



To produce a television program in a studio, a set is constructed, the cast performs in it, and scenes are shot by a camera. In recent years, as computer technology has advanced, TV program producers have increasingly been using virtual studios[1], in which computer graphics (CG) and real images are combined under the same camera manipulations. In this study, we have proposed the Image-based Virtual Studio, which incorporates real images to make the composite image look more natural[2][3][4]. In this studio, we can produce natural-looking composite images for children's programs, weather forecasts, and other programs that do not rely solely on CG techniques, and these images look as if they were shot in real places. This virtual studio can also reduce the cost and space of constructing a studio set each time. Figure 1 shows the concept of the Image-based Virtual Studio. Video components produced from images shot by a camera are combined with virtual images. We have developed an HDTV virtual studio system using super-high-definition panoramic images and used it to demonstrate that composite images produced in this virtual studio look more natural than those produced in conventional virtual studios. The new virtual studio has also been used to produce television programs.

Fig.1 Concept of Image-based Virtual Studio



The problem with conventional virtual studios of this kind is the lack of three-dimensional depth information. In these studios, textures are attached to the surface of a sphere, which allows the camera to pan, tilt, roll, and zoom in and out. Without depth information, however, images produced by moving the camera itself (dolly shots) look unnatural. The challenge in this study is to create a virtual studio that allows more natural camera movements by producing a high-quality background that contains information on both 3D shapes and textures.


In this study we use the design data (CAD data) of a studio set as a 3D shape to obtain information on high-quality textures. This paper outlines some of the problems with conventional 3D composite images, how the camera positions and directions are set for the new virtual studio, and how high-quality textures can be obtained from camera images. It also describes the camera system mounted on a robot arm and the results of experiments.

2.  Challenges for texture acquisition and 3D expression



Many studies have been done on 3D-shape acquisition, and various methods have been proposed. These include a method that applies factorization to images taken from multiple viewpoints to obtain characteristic 3D positions and camera data[5], and another that obtains depth by active laser scanning. Some commercially available software programs semi-automatically provide approximate structural information from photographed images[6]. None of them, however, produce HDTV-quality textures that can be used as the background of a virtual studio. Some researchers have reported on the image quality of a vase and other small objects[7], but not of structures that occupy a much larger space.


Our objective is to create a 3D image containing high-quality texture data that can be used for a virtual studio based on real HDTV-level images (the studio set is about 5 × 10 m in size). To produce television programs in a studio, a studio set is usually designed on a computer, and the actual studio set is then constructed from the CAD data. In this study, we use such CAD data as the 3D information. The CAD data provide information on vertex positions and polygon areas for the CG, enabling high-quality rendering. The textures, for their part, should be good enough for HDTV shooting. To obtain such high-quality textures using the CAD data, we set the range of shooting and determine the position, direction, field angle, and focus of the camera. With these camera data, we can control the movement of the camera mounted on the robot arm and increase the resolution of the texture for each polygon.


3. Texture acquisition from CAD data



3.1. CAD data and camera position


As the CAD data of the studio set are given, we only need to set the range within which the camera moves in order to determine the polygon area visible to the camera and obtain the texture information of each polygon. First, the range of camera movement is set based on the CAD data. Figure 2 shows how the camera and the studio set are positioned in relation to each other. The camera is fixed in the depth direction (Z axis) of the studio set and moves horizontally (X axis) and vertically (Y axis) for shooting. The following adjustments are made:


1) The camera position and direction are adjusted so that the camera's line of sight is closely aligned with the normal vector at the center of gravity of each polygon,
2) The camera's field angle is adjusted so as to maximize the view of the polygon, and
3) The distance from the camera to the polygon is used as the focus value.


From the images taken under these conditions we can theoretically produce high-resolution textures.

Fig.2 CAD data of a studio set and camera position
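
The following sketch illustrates how conditions 1) to 3) above could be computed for one CAD polygon. It is our minimal reconstruction, not the authors' published code: the function name, the NumPy dependency, and the planar-polygon assumption are all ours.

import numpy as np

def camera_for_polygon(vertices, distance):
    """vertices: (N, 3) array of one polygon's corners from the CAD data;
    distance: how far along the normal to place the camera (within the
    allowed movement range)."""
    centroid = vertices.mean(axis=0)
    # Polygon normal from two edge vectors (assumes a planar polygon
    # with vertices listed in order).
    n = np.cross(vertices[1] - vertices[0], vertices[2] - vertices[0])
    n /= np.linalg.norm(n)
    position = centroid + distance * n   # 1) camera on the normal line
    direction = -n                       # looking straight at the polygon
    # 2) Field angle just wide enough for the vertex farthest from the
    # centroid, so the polygon fills the frame.
    radius = np.max(np.linalg.norm(vertices - centroid, axis=1))
    field_angle = 2.0 * np.arctan2(radius, distance)
    focus = distance                     # 3) focus at the polygon's distance
    return position, direction, np.degrees(field_angle), focus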


3.2. Texture acquisition with a camera at specified positions


The method described in section 3.1 specifies the position, direction, field angle, and focus of the camera for each polygon. This means that the camera movements must be controlled polygon by polygon. The number of polygons increases as the shape becomes more complex, which then requires much more camera control and many more images to be shot. To avoid this, we limit the camera positions and directions to several points (see Fig. 3) and change the field angle and focus at each of these positions. From the resulting group of images, we choose, for each polygon, the image with the correct focal distance in which the polygon appears largest. In this way, high-quality textures can be obtained efficiently and easily without shooting an unbounded number of images.

Fig.3 Method of setting up the robot camera position and direction
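
A minimal sketch of this selection rule follows. The data layout (a 3 × 4 projection matrix, camera position, and focus distance per shot), the focus tolerance, and the reading of "appears largest" as largest projected area are our assumptions, not the paper's specification.

import numpy as np

def projected_area(P, vertices):
    """2D area of a polygon projected by a 3x4 camera matrix P (shoelace)."""
    h = (P @ np.hstack([vertices, np.ones((len(vertices), 1))]).T).T
    xy = h[:, :2] / h[:, 2:3]
    x, y = xy[:, 0], xy[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def best_shot(shots, vertices, focus_tol=0.5):
    """shots: list of (P, camera_pos, focus_dist), one entry per captured
    image; returns the index of the in-focus shot in which the polygon
    appears largest, or None."""
    centroid = vertices.mean(axis=0)
    candidates = [
        (projected_area(P, vertices), i)
        for i, (P, pos, focus) in enumerate(shots)
        # keep only shots whose focus distance is near the polygon's
        # actual distance from the camera
        if abs(np.linalg.norm(centroid - pos) - focus) < focus_tol
    ]
    return max(candidates)[1] if candidates else None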


4. Texture acquisition experiment using a robot-arm camera



4.1. System Composition


We constructed a system to shoot a studio set of up to about 6 × 3 × 2 m (W × H × D). Figure 4 shows the camera mounted on the robot arm, and figure 5 shows the specifications of the system. The robot arm (Denso Wave) has seven axes for changing the position and direction of the camera; the whole system is moved horizontally by hand along a rail. The field angle and focus of the camera lens can be adjusted. The method described in section 3.1 was employed in a preliminary experiment using CAD data, and the method described in section 3.2 was used to gather studio set information over a wide range.

Fig.4 Robot-arm camera system    Fig.5 Specification of the robot-arm camera system



4.2. Preliminary experiment


In this experiment, which was carried out to produce textures, we employed the method described in section 3.1: we obtained the normal vector from the CAD data of a rectangular parallelepiped and chose a camera position such that the camera's direction was closely aligned with the vector. The range within which the camera could move was set, and the camera position for each polygon was determined, assuming that six polygons of the parallelepiped (two top, two left, two right) were visible to the camera. The camera position and direction were set so that the camera faced each polygon as directly as possible. The field angle was adjusted so that all of the polygon's texture was inside the camera's viewing area. Figure 6 shows the six polygon images taken by the camera. In the experiment, the actual image of the rectangular parallelepiped and the wire frame of the CAD data did not exactly match, so the RX, RY, RZ, and zoom amounts had to be adjusted manually to correct this discrepancy. This problem was probably caused by errors of the 3D sensor that measured the camera position and by the limited accuracy of the sensor itself (about 2 mm on each axis).


Figure 7 shows the combined textures after changing the viewpoint of the camera. Some patterns on the rectangular parallelepiped, such as the flaws on top of the object, were clearly visible. The results verified that this method, which controls the camera position and direction based on the CAD data, can be used effectively to attach high-resolution textures to a 3D model.

Fig.6 Images captured by the robot-arm camera    Fig.7 Experimental result of combined textures


4.3. Experiment to acquire textures over a wide range


This time, we employed the method described in section 3.2 to obtain textures for an object placed in an area measuring about 3 × 2 m. For the reproduction of the virtual studio, the camera was set so that it did not move into the set by more than 3 m in depth (the depth (Z) position of the robot-arm camera was fixed at 3 m). The camera's horizontal (X axis) and vertical (Y axis) positions were set at four points at 70 cm intervals on the X axis and at two points at 35 cm intervals on the Y axis. At each camera position, the vertical field angle was fixed at 33 degrees and the focus at 2 m. The camera was aimed in 20 directions in all: four directions longitudinally (0°, 7°, ±34°) and five laterally (0°, ±25°, ±50°). Altogether, we took 160 images (8 × 20) and obtained the corresponding camera data.
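
For reference, the following short sketch (our reconstruction; the coordinate origin, variable names, and pose layout are illustrative) enumerates these 160 camera poses:

from itertools import product

xs = [0.0, 0.7, 1.4, 2.1]        # 4 horizontal positions, 70 cm apart (m)
ys = [0.0, 0.35]                 # 2 vertical positions, 35 cm apart (m)
z = 3.0                          # fixed depth of the camera (m)
tilts = [0.0, 7.0, 34.0, -34.0]  # 4 longitudinal directions (deg)
pans = [0.0, 25.0, -25.0, 50.0, -50.0]  # 5 lateral directions (deg)

# Field angle (33 deg vertical) and focus (2 m) stay fixed for every pose.
poses = [((x, y, z), (tilt, pan))
         for (x, y), (tilt, pan) in product(product(xs, ys),
                                            product(tilts, pans))]
assert len(poses) == 160         # 8 positions x 20 directions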


As for the CAD data of the studio set, we formed the polygons from triangles of as nearly equal size as possible, so as not to create unevenness in texture resolution from one polygon to another. Figure 8 shows the CAD data of about 2,500 polygons used in the experiment, together with some sample textures mapped onto the CAD data from the 160 captured images.

Fig.8 CAD data and sample textures from camera images
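
One simple way to obtain such near-uniform triangles, offered as an assumption rather than as the authors' actual tool, is to split any triangle whose area exceeds a threshold along its longest edge:

import numpy as np

def tri_area(t):
    """Area of a 3D triangle given as a (3, 3) array of vertices."""
    return 0.5 * np.linalg.norm(np.cross(t[1] - t[0], t[2] - t[0]))

def subdivide(tri, max_area):
    """Recursively split a triangle until every piece is below max_area."""
    tri = np.asarray(tri, dtype=float)
    if tri_area(tri) <= max_area:
        return [tri]
    # Find the longest edge and split it at its midpoint.
    edges = [(np.linalg.norm(tri[(i + 1) % 3] - tri[i]), i) for i in range(3)]
    _, i = max(edges)
    mid = 0.5 * (tri[i] + tri[(i + 1) % 3])
    a = np.array([tri[i], mid, tri[(i + 2) % 3]])
    b = np.array([mid, tri[(i + 1) % 3], tri[(i + 2) % 3]])
    return subdivide(a, max_area) + subdivide(b, max_area)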



From these 160 images we chose, for each polygon, the image in which the polygon appeared largest among all camera positions and directions, and used it as the polygon's texture. As in the preliminary experiment (section 4.2), the errors in the camera data were corrected manually. Figure 9 shows the reproduced image with the textures attached. The image shows that the texture matching each polygon stays attached even when the viewing position is changed. Because the camera image in which the polygon appears largest is picked as the texture, the texture quality is retained even when the image is reproduced at a narrow field angle.


Fig.9 Reproduced image texture


4.4. Experiment to acquire textures from a real studio set


We also used a real studio set designed to look like a traditional Japanese room. Figure 10 shows the results of the texture acquisition experiment using the CAD data of this Japanese-room set. The two images show that the textures remain in place even when the scene is reproduced from different viewpoints.


Fig.10 Reproduced texture image of the studio set
5. Discussion



Figures 7, 9, and 10 show that images can be output from any viewpoint within the camera tracking area. This means that the robot-arm camera captured nearly all of the textures on the CAD data that are visible from the camera tracking area.


In some areas, however, the textures looked conspicuously out of place. This problem was attributable to errors in the actual camera data and measurements, and to geometrical discrepancies between the studio set and the CAD data. Figure 11 compares the original camera images with the reproduced texture images, enlarged around a sphere and a vase: (a) and (c) show original camera images, and (b) and (d) show reproduced texture images. The wall and the stand of the vase in the dotted-square areas contain some errors. To eliminate these errors, the studio-set data and the camera data must be precise. One way to correct the CAD data and camera positions is to use them as initial values for a factorization-based 3D reconstruction and refine them until precise figures are obtained. In this experiment, moreover, since a single field angle and focus of the camera lens were set for all camera positions and directions, not all the polygons were in correct focus. Our remaining challenge is to devise a texture-acquisition system in which all polygons are in focus.


In the dotted-circle areas of figure 11, the brightness of the sphere and the vase varies conspicuously. This is because the way light is reflected from shiny surfaces changes greatly depending on the camera position and direction. Another challenge, then, is to develop a way to acquire high-quality textures from different materials.




(a) Original camera image (sphere)   (b) Reproduced texture image (sphere)
(c) Original camera image (vase)   (d) Reproduced texture image (vase)
Fig.11 Comparison of original camera images and reproduced texture images

6. Conclusions



We have developed a camera system mounted on a robot arm to obtain high-resolution textures for three-dimensional images in the Image-based Virtual Studio, which uses real images as its components. By manipulating the robot arm, the system can acquire textures for polygons based on CAD data. If the number of polygons is very large, textures for the polygons can instead be picked selectively from photographed images by limiting the positions and directions of the camera. The results of the experiments show that, although the precision of the CAD data and camera data needs to be improved, the system effectively produces a CG background comprising high-resolution textures and a 3D model.


In future studies, we plan to improve texture quality through better focus adjustment of the robot-arm camera, using the distance from the camera to each polygon given by the CAD data. We will also examine how to obtain uniform textures from shiny surfaces and how to eliminate errors in the CAD data and camera data.





References


[1] M. Hayashi, K. Enami, H. Noguchi, K. Fukui, N. Yagi, S. Inoue, M. Shibata, Y. Yamanouchi, Y. Itoh, "Desktop Virtual Studio System", IEEE Trans. on Broadcasting, 42(3), 1996, 278-284.
[2] Y. Yamanouchi, H. Mitsumine, S. Inoue, "Experimental Evaluation of Image-based Virtual Studio System", IEEE-PCM 2000 Conference, 2000, 106-109.
[3] Y. Yamanouchi, H. Mitsumine, T. Fukaya, M. Kawakita, N. Yagi, S. Inoue, "Real Space-based Virtual Studio - Seamless Synthesis of a Real Set Image with a Virtual Studio -", ACM VRST 2002 Conference, 2002, 194-200.
[4] H. Mitsumine, Y. Yamanouchi, S. Inoue, "An Acquisition Method of 3-Dimensional Video Components for Image-based Virtual Studio", ICIP 2001 Conference, 2001, 1101-1104.
[5] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization method", Int. J. Computer Vision, 9(2), 1992, 137-154.
[6] http://www.realviz.com/products/im/index.php
[7] Xiaohua Zhang, Yoshinari Nakanishi, Kiichi Kobayashi, Hideki Mitsumine, Suguru Saito, "Estimation of Surface Reflectance Parameters from Image Sequence", J. for Art and Science (ISSN 1347-0442), 1(1), 2002, 8-14.



Ms. Yuko Yamanouchi
Yuko Yamanouchi received her M.S. degree in electrical and electronic engineering from Tokyo Institute of Technology in 1988. She joined the Broadcast Engineering Department of NHK (Japan Broadcasting Corporation) in 1988. Since moving to NHK Science and Technical Research Laboratories in 1990, she has been engaged in research on video processing, computer graphics applications, and virtual studio systems. She is a senior research engineer in the Visual Information Technologies Division.
Mr. Hideki Mitsumine
Hideki Mitsumine received his M.S. degree in electrical engineering from Shibaura Institute of Technology in 1991. He joined the Nagoya station of NHK (Japan Broadcasting Corporation) in 1991. Since moving to NHK Science and Technical Research Laboratories in 1993, he has been engaged in research on image processing, virtual studio systems, and video components. He is a research engineer in the Visual Information Technologies Division.
Mr. Takashi Fukaya
Takashi Fukaya received his B.S. degree in visual communication design from Kyushu Institute of Design in 1992. He joined the Broadcast Engineering Department of NHK (Japan Broadcasting Corporation) in 1992. Since moving to NHK Science and Technical Research Laboratories in 2000, he has been engaged in research on human interaction in the virtual studio and on video components. He is a research engineer in the Visual Information Technologies Division.
Dr. Masaki Hayashi
Masaki Hayashi received his B.S., M.S., and Dr.Eng. degrees in electronics engineering from Tokyo Institute of Technology in 1980, 1983, and 1999, respectively. Since 1986, he has been with NHK Science and Technical Research Laboratories and is currently a senior research engineer in the Intelligent Information Processing Division. He was also a guest associate professor in the Graduate School of Information Science and Engineering at Tokyo Institute of Technology from 2000 to 2003. He has been engaged in research on image processing, CG, image compositing systems, and virtual studios. He is currently involved in research on automatic TV program generation using the scripting language TVML.
 



Copyright 2004 NHK (Japan Broadcasting Corporation) All rights reserved. Unauthorized copy of the pages is prohibited.
