NHK Laboratories Note No. 474

A Method of Acquiring 3D Video Components without Lighting Effects

by
Hideki MITSUMINE, Yuko YAMANOUCHI, Seiki INOUE
(Multimedia Services Research Division)
Abstract
    A method is described for acquiring textures without highlights or shading for use in an image-based virtual studio for television program production. In an image-based virtual studio, a virtual set is made using video components, i.e., shapes and textures from real objects. If the textures of video components have highlights or shading due to lighting, the composite images are unnatural. The proposed method uses polarized light, a rotary polarized filter in front of a camera, and information on light position to remove the highlights and shading from textures, thereby enabling images to be composed under any lighting condition. The effectiveness of the method was demonstrated experimentally.

1. Introduction
    The virtual studio, which is now being used by broadcast stations especially for informational programs, combines a virtual set of background images created by computer graphics techniques with video taken of performers on a chroma key set. The end result is an image in which the performers look as if they are actually present at the place depicted. As such, the virtual studio has many advantages. It enables virtual images to be taken of a place where images could not, in reality, be taken, allows physically impossible camerawork to be performed, and saves on studio space, for example. The virtual studio is a production technique that should find many more uses in the years to come.
    On the other hand, real images and images generated by computer (CGI) each reflect lighting, texture, and time conditions, and if these should be inconsistent, the composite image will feel unnatural [1]. It is for this reason that the range of programs to which the virtual studio can be applied is currently limited. The construction of virtual sets also requires much labor, and to add to this problem, the expansion of high-definition television program production in the future means that virtual sets will have to be constructed with even more accuracy. The need is therefore felt for a method that can simplify the construction of high-quality virtual sets.
    In response to the above problems, the authors have proposed an "image-based virtual studio" [2]. In this type of studio, shape, texture, and other necessary information (called "video components") are extracted from actual video for each targeted object and used as a virtual set. This approach makes it possible to construct a composite image that feels natural. Here, it is important that lighting effects (such as highlights and shading) present when the video is acquired do not remain in the texture of a video component. Otherwise, highlights and shading inconsistent with the lighting conditions of the actual images will inevitably appear, producing an unnatural composite image.
    Video components with no lighting effects are essential to obtaining high-quality composite images. One technique for reducing the effects of highlights is to determine appropriate object position, lighting position, and camera position at the time of shooting. However, it is impossible to remove highlights uniformly for objects made up of various kinds of materials, and even the suppression of highlights to the extent possible requires ample know-how in lighting techniques.
    Various techniques have been proposed to remove highlights. One technique, for example, places a polarized filter in front of the camera and uses the property that light of specular reflection becomes completely polarized at Brewster's angle [3]. Another technique makes use of depth images and images taken from many directions [4][5].
    In the former technique, the incident angle of light illuminating an object must be set to Brewster's angle to sufficiently remove specular reflection. Brewster's angle, however, differs from material to material, which means that the incident angle of lighting must be considered for each type of object material. In addition, it would not be possible to satisfy all conditions for separation of highlights in the case of an object having a complex three-dimensional shape.
    The latter technique, on the other hand, relies on object-depth information, but such depth cannot easily be measured with high accuracy, and this makes it difficult to achieve sufficient accuracy in highlight separation.
    In contrast to these techniques, our proposed technique can remove highlights without being dependent on lighting position and the accuracy of object-depth measurement [6]. This is accomplished by using polarized light for lighting and the difference in the way that specular reflection and diffuse reflection behave with respect to polarized light.
    In this report, we propose a technique for acquiring video components so that lighting effects like highlights and shading are not included in texture, and demonstrate its effectiveness by experiment.

2. Image-based Virtual Studio
    In an image-based virtual studio, object information for configuring a set is obtained in the form of video components. Ideally, given the desire for a high degree of freedom in representing images, all such video components would have both shape and texture. Shape measurements, however, are not easy to perform for an object that cannot be observed from many viewpoints around its center.
    For this reason, we decided to represent distant objects as omni-directional images (environmental video components [7]) that are shot from an all-encompassing view and give these images layer information corresponding to depth.
    On the other hand, for nearer objects, that is, objects positioned near the viewpoint, we give them shape and texture in the form of 3D video components that can represent different views of an object as the viewpoint moves.
    This development of a video-component-formation technique based on distance from the viewpoint enables us to achieve an image-based virtual studio by practical means and, at the same time, to attain a sufficient degree of freedom in the kind of camerawork required for program production.

3. 3D Video Components
3.1 Acquisition of shape information
    Various techniques have been proposed for acquiring an object's shape [8][9][10]. When targeting 3D video components for use in a virtual studio, however, the following conditions apply.
  • Accuracy takes precedence over convenience
  • Non-contact method is desirable (so that valuable items like fine arts can be targeted)
  • Real-time capability not required

Table 1: Range sensor specifications

    In accordance with these conditions, we decided to adopt a laser range sensor for acquiring shape information.
    Table 1 lists the specifications of the laser range sensor that we used for our experiments. On the basis of these specifications, we took measurements from multiple viewpoints taking the structure of each object into account, and then integrated measured depth information [11][12] to obtain shape information for 3D video components.
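Integrating depth measured from multiple viewpoints presupposes mapping each range image into a common world frame. The sketch below illustrates that step with NumPy under an assumed pinhole model; the parameters (`fx`, `fy`, `cx`, `cy`) and the pose (`R`, `t`) are illustrative placeholders, not the actual sensor calibration.

```python
import numpy as np

def range_to_world(depth, fx, fy, cx, cy, R, t):
    """Back-project a depth image into 3D camera coordinates,
    then map the points into the shared world frame via pose (R, t)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth          # pinhole back-projection
    y = (v - cy) / fy * depth
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts_cam @ R.T + t           # rigid transform to world frame

# Illustrative use: an identity pose leaves points in camera coordinates.
depth = np.full((2, 2), 2.0)
pts = range_to_world(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5,
                     R=np.eye(3), t=np.zeros(3))
```

Point clouds obtained this way from each viewpoint can then be merged and meshed by the integration methods cited above [11][12].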

3.2 Acquisition of texture information
    When attempting to acquire texture information, two obstacles will be encountered due to the lighting used during measurements. The first is the presence of highlights caused by glare on the object, and the second is shading caused by differences in the intensity of incident light on the various parts making up the object surface. The techniques for overcoming these two obstacles are described below.

(1) Highlight removal
    This technique assumes that reflection at the surface of an object conforms to the dichromatic reflection model [13]. Light intensity i as observed by the camera can be expressed by Eq. (1):

i = i_d + i_s + i_e    ---(1)

i   : observed light intensity
i_d : light intensity of diffuse reflection
i_s : light intensity of specular reflection
i_e : light intensity of ambient light

    The i_e term accounts for effects such as flare caused by multiple reflection and by the lens system. In a highlight, the i_s term (the light intensity of specular reflection) dominates the i_d term (the light intensity of diffuse reflection). This means that a highlight is observed in a state closer to the color of the illuminating light than to the color of the object.

Figure 1: Behavior of wave vector in specular reflection

    This technique uses the fact that diffuse reflection and specular reflection behave differently with respect to polarized light. When an object is irradiated with linearly polarized light, diffuse reflection at the object's surface arises from multiple reflection among pigments inside the object, so the direction of polarization of the reflected light becomes dispersed. Specular reflection, however, behaves differently, as shown in Fig. 1. The electric-field component is divided into component P, parallel to the plane of incidence, and component S, perpendicular to it. The sign of component P's amplitude reflection coefficient reverses at the boundary where incident angle θ1 equals Brewster's angle θB, and component P of the reflected light vanishes at this boundary. The sign of component S's amplitude reflection coefficient, however, does not reverse. The amplitude reflection coefficients of both components are thus uniquely determined by the incident angle and the type of object material; in short, specularly reflected light remains linearly polarized when the object is illuminated with linearly polarized light. By placing a polarized filter in front of the lens while illuminating the object with linearly polarized light, we can capture reflected light while varying the filter's direction of polarization and then collect, at each coordinate, the pixel having the smallest value among the multiple pictures obtained. The result is a picture with specular reflection (highlights) removed.
    In actuality, the observed intensity also includes mutual reflection and ambient light (the i_e term), such as flare generated by the camera system, in addition to the diffuse-reflection component. In the experiments described here, however, it is assumed that nothing but the object in question is illuminated, and the i_e term is omitted.
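The minimum-value compositing step described above can be sketched as follows; this is a minimal NumPy sketch, and the function name and synthetic frames are illustrative rather than part of the measuring system.

```python
import numpy as np

def remove_highlights(frames):
    """Given images captured at successive polarizer angles, keep the
    smallest value observed at each pixel.  Specular reflection stays
    linearly polarized, so it is extinguished at some filter angle;
    the per-pixel minimum approximates the diffuse component."""
    stack = np.stack(frames, axis=0)   # shape: (n_angles, H, W)
    return stack.min(axis=0)

# Synthetic example: a flat diffuse base plus a polarization-dependent
# specular term that vanishes when the filter crosses the polarization axis.
angles = np.deg2rad(np.arange(0, 180))           # 1-degree steps, 0..179
diffuse = np.full((4, 4), 50.0)
frames = [diffuse + 100.0 * np.cos(a) ** 2 for a in angles]
result = remove_highlights(frames)               # ~50.0 everywhere
```

The 1-degree step mirrors the angular resolution used in the experiments of Section 4.3; a coarser step would leave more residual specular light in the minimum image.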

(2) Shading removal
    Diffuse-reflection component i_d can be expressed by the following equation (Lambert's cosine law):

i_d = i_in K_d cos θ    ---(2)

i_d : light intensity of diffuse reflection
i_in : intensity of incident light
K_d : diffuse reflection coefficient
θ : incident angle of light

    This technique treats the surface of the object as a uniformly diffusing surface conforming to Lambert's cosine law.
    According to Lambert's cosine law, the intensity of diffuse reflection depends on incident angle θ and light-source intensity i_in but is independent of the direction of observation. Furthermore, if the incident angle of the light illuminating the object and the amount of light are known, the diffuse reflection coefficient can be calculated. Diffuse reflection coefficient K_d is unaffected by shading caused by differences in the intensity of incident light at different parts of the object's surface, and can therefore be calculated from Eq. (2) and used as the texture of the video component.
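Inverting Lambert's cosine law to recover K_d is a one-line computation; the sketch below shows it with illustrative values (the intensities and angle are made up for the example, not measured data).

```python
import numpy as np

def diffuse_coefficient(i_d, i_in, cos_theta):
    """Invert Lambert's cosine law, i_d = i_in * K_d * cos(theta),
    to recover the diffuse reflection coefficient K_d."""
    return i_d / (i_in * cos_theta)

# Example: with i_in = 200 and theta = 60 degrees, an observed diffuse
# intensity of 50 implies K_d = 50 / (200 * 0.5) = 0.5.
kd = diffuse_coefficient(50.0, 200.0, np.cos(np.deg2rad(60.0)))
```

Note that the division becomes ill-conditioned as cos θ approaches zero, which is exactly why the large-incident-angle parts are excluded by thresholding later in this section.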
    The incident angle of lighting and the amount of light, however, present a problem here. While uniformly illuminating the object with completely parallel light beams would be ideal, the size of the object may prevent this, and it is generally difficult to cover a wide area uniformly with parallel light. This technique therefore uses typical compact studio-lighting equipment (an ARRI Compact 200) adjusted for a uniform luminous intensity distribution, so that light-source intensity i_in can be treated as fixed. In addition, if the area of the light source itself is too large, the light incident on the object's surface is integrated over the source, preventing K_d from being reliably calculated from Eq. (2). In response to this problem, we installed a mask as shown in Fig. 2 to create a spot light source that can be treated as a point light source described by a simple lighting model. The incident angle of lighting on the object's surface could then be easily approximated.
    The accuracy of the texture reflection coefficient obtained by this technique depends on the measurement accuracy of the range sensor. Owing to the measurement principle of a laser range sensor, the accuracy of shape information deteriorates as the angle between the line of observation (the laser beam direction) and the actual normal to the object's surface becomes large, and the estimated normal direction also becomes unstable in such a case. These accuracy problems affect the cos θ term in Eq. (2). At the same time, parts of the object's surface for which the incident angle of lighting is large receive insufficient light, so the S/N of the corresponding parts of the captured image is poor; this affects the light intensity of diffuse reflection i_d in Eq. (2). We therefore established an appropriate threshold value to exclude such parts. This exclusion produces gaps on the object's surface, but they can be minimized by establishing a sufficient number of observation points and integrating their results.


Figure 2: Setting of light source


4. Experiments
4.1 System configuration
    Measuring equipment for the experiments was configured as shown in Fig. 3. A turntable and an arm for varying elevation angle were installed to enable observation from multiple viewpoints, and the system was designed to enable the viewpoint to be modified over a range of 0.0° to 359.9° horizontally (0.1° intervals) and -10.0° to 90.0° vertically (0.2° intervals). Views of the prototype measuring equipment and the measuring-head interior are shown in Figs. 4 and 5, respectively.

Figure 3: System configuration

Figure 4: System for measuring 3D video components
Figure 5: Head section interior

4.2 Test object
    To test the accuracy of the proposed technique, we constructed a cylindrical object whose simple shape made it easy to attain relatively high fabrication accuracy (Fig. 6). This test object was hollow with a height of 400 mm, a diameter of 400 mm, and a thickness of 5 mm. It was made of vinyl chloride and had a glossy surface.

Figure 6: Original image of test object
Figure 7: Luminance intensity along X-axis of original image

    Figure 7 shows luminance intensity along the line connecting points a and b in Fig. 6. In Fig. 6, the bright vertically oriented section in the middle of the cylinder is a highlight, in other words, a captured image of the lighting. This section appears as a sharp peak near the center of the X-axis in Fig. 7. In the following experiment, we adjusted the amount of light incident on the camera with an ND filter so that this highlight would fall within the camera's dynamic range.

4.3 Results with test object
(1) Highlight removal
    Specifically, we illuminated the test object with polarized light, rotated the polarized filter set in front of the measuring equipment in 1° steps, and shot images from 0° to 179°. We then determined the pixel of smallest value at each coordinate of the 180 captured images to obtain the processed image shown in Fig. 8.
    It can be seen that the highlight section running vertically in the center of the test object could be sufficiently removed. Figure 9 shows luminance intensity along the line connecting points a and b in Fig. 8. In both of these figures, a slight highlight disturbance can be observed in the peak area. This is due to the degree of separation of the polarized filter used here and to the angular resolution of the rotary polarized filter set in front of the camera.
Figure 8: Image of test object after removal of specular reflection
Figure 9: Luminance intensity along X-axis of processed image after removal of highlight
Figure 10: Image of test object after removal of shading
Figure 11: Luminance intensity along X-axis of processed image after removal of shading

(2) Shading removal
    The processed image (Fig. 8) obtained by the highlight-removal experiment described above still possesses shading due to differences in the intensity of light incident on the various parts of the test object. To remove this shading and determine diffuse reflection coefficient K_d from Eq. (3) below, we used lighting position information and object shape information obtained by previous measurements to normalize the intensity of incident light on each part of the object.

K_d = i_d / ( i_in cos θ )    ---(3)

    We performed shading-removal processing using the image of Fig. 8 obtained by the highlight-removal experiment. The results are shown in Fig. 10. Flat luminance intensity is obtained across the entire object. Vertical bands, however, do appear in these results, and slight but similar vertical bands were also confirmed in the shape information of the object. As mentioned earlier, error in surface-normal information determined from the object's shape may become larger at parts where the angle between the incident direction of lighting and the actual normal direction is large.
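The normalization term in Eq. (3) requires cos θ at each surface point, computed from the measured shape and the known lighting position. A minimal sketch of that geometric step follows; the point, normal, and light position are illustrative values, not measured data.

```python
import numpy as np

def incident_cosine(points, normals, light_pos):
    """cos(theta) between each surface normal and the direction toward
    a point light source, as required by Lambert's cosine law."""
    to_light = light_pos - points                                  # (N, 3)
    to_light /= np.linalg.norm(to_light, axis=-1, keepdims=True)   # normalize
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    return np.sum(n * to_light, axis=-1)                           # dot product

# A surface patch facing straight at the light gives cos(theta) = 1.
pts = np.array([[0.0, 0.0, 0.0]])
nrm = np.array([[0.0, 0.0, 1.0]])
c = incident_cosine(pts, nrm, np.array([0.0, 0.0, 5.0]))
```

Treating the masked lamp as a point source, as in Fig. 2, is what makes this single-direction computation a reasonable approximation.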
    Figure 11 shows luminance intensity along the line connecting points a and b in Fig. 10. It can be seen here that luminance values at the left and right edges of the cylinder are not stable. In addition to error in shape information, the following two reasons are considered for this phenomenon.
  • Good accuracy in depth information cannot be obtained when the angle formed by the line of observation and the normal to the object surface is large.
  • A sufficient amount of incident light cannot be obtained when the incident angle of light is large.
    To deal with these problems, we first note that the lighting position, measuring-equipment position, and object shape are all known. We then extract only stable parts by threshold processing, using the angle between the line of observation and the normal to the object surface for the first problem, and the incident angle of light for the second.
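The two-condition threshold processing just described can be sketched as a simple mask; the 85° default below matches the threshold used in Section 4.4, while the sample angles are illustrative.

```python
import numpy as np

def stable_mask(view_cos, incident_cos, threshold_deg=85.0):
    """Keep only pixels where both the viewing angle and the incident
    angle stay below the threshold (i.e. their cosines stay above
    cos(threshold)); everything else is treated as unstable."""
    c = np.cos(np.deg2rad(threshold_deg))
    return (view_cos > c) & (incident_cos > c)

# Near-normal viewing (10 deg) is kept; grazing viewing (89 deg) is rejected.
view = np.cos(np.deg2rad(np.array([10.0, 89.0])))
inc = np.cos(np.deg2rad(np.array([20.0, 30.0])))
mask = stable_mask(view, inc)
```

Pixels rejected by the mask become the gaps discussed in Section 4.4, to be filled in by observations from other viewpoints.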

4.4 Results with studio prop
    We next performed an experiment using an actual studio prop as a target object. This prop was an earthenware vase.
    Figures 12 and 13 show the original image of the object and its image after highlight removal. This result reveals that highlights could be favorably removed.
    In shading removal, on the other hand, we had to exclude those parts deemed unstable by the experiment with a test object. To this end, we set a threshold value for the incident angle of lighting and the angle between the line of observation and the normal to the object's surface, and excluded unstable parts accordingly.
    The value selected for this threshold (85°) was set so as to exclude the unstable areas on both sides of the object. Figure 14 shows the image after shading removal. Gaps can be seen on the object's handle and in its lower section. These gaps are not the result of exclusion by threshold processing. Rather, they result from the object itself obstructing the range sensor's slit light, which illuminates from the left, or from the amount of reflected slit light falling outside the camera's dynamic range, both of which prevent depth information from being acquired. A white part, moreover, can be seen on the left side of the earthenware vase. This part should have been excluded by threshold processing in the first place, but it remains due to insufficient accuracy of the range sensor. Processing with a threshold value determined from the test-object experiment is therefore insufficient, and appropriate measures with respect to range-sensor accuracy are needed. Apart from these problem areas, it can be seen that shading has been removed.

Figure 12: Original image
Figure 13: Image after highlight removal
Figure 14: Image after shading removal


4.5 Reconstruction and application to an image-based virtual studio
    Similar to the previous experiment, we integrated object shape and texture information from multiple viewpoints (26 points) using shape and texture integration software (Polyworks [14]). Reconstructed images are shown in Figs. 15 to 17.

Figure 15: Reconstructed result (without shading)

    Figure 15 shows a reconstructed image obtained from only the diffuse reflection coefficient, with no shading. This result shows that the highlight and shading in the image of Fig. 12 have been removed from the entire object. Figure 16 shows reconstructed images in which shading has been added, with the light source set directly in front of the earthenware vase in the left image (a) and at 45° to the upper left in the right image (b). Figure 17, on the other hand, shows reconstructed images in which the same specular reflection coefficient as that in Fig. 12 has been set manually, with the light source positioned the same as for the images of Fig. 16. These results reveal that while gaps are generated for excluded parts corresponding to unstable areas and for parts where range data could not be acquired, these gaps result only from fully automated measurement. In this way, the proposed technique can be used to reconstruct images under any lighting conditions.
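For the diffuse part, relighting a reconstructed component reduces to re-applying Eq. (2) with the new light direction. The sketch below illustrates this; the coefficient values, normals, and light intensity are illustrative, and any specular term would be added separately from the manually set coefficient mentioned above.

```python
import numpy as np

def relight(kd, normals, light_dir, i_in):
    """Diffuse relighting by Lambert's cosine law:
    i = i_in * K_d * max(cos theta, 0), with back-facing
    surfaces clamped to zero."""
    l = light_dir / np.linalg.norm(light_dir)
    cos_t = np.clip(normals @ l, 0.0, None)
    return i_in * kd * cos_t

# Surface patches facing the light reproduce i_in * K_d exactly.
kd = np.array([0.5, 0.8])
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
out = relight(kd, normals, np.array([0.0, 0.0, 1.0]), 200.0)
```

Because K_d was recovered free of the original lighting, the same texture can be rendered consistently under the frontal and 45° light sources of Figs. 16 and 17.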
    Figure 18 shows an example of applying several 3D video components acquired by the proposed technique to an image-based virtual studio. Since the environmental video components used here for background in the image-based virtual studio have no depth information, the lighting conditions cannot be modified for them. For this reason, we reconstructed the background by manually adjusting lighting conditions to prevent an unnatural look when reconstructing the 3D video components. In addition, as parts of 3D video components were lost due to automated measuring as described earlier, we also interpolated shape and texture at these parts manually.

Figure 16: Reconstructed results (with shading)
Figure 17: Reconstructed results (with shading and highlight)

Figure 18: Example of an image-based virtual studio

    In the above way, unnatural results can be reduced when adjusting lighting conditions in a composite image by using video components in which lighting effects have been removed beforehand.

5. Conclusions
    We have proposed a technique for acquiring 3D video components for use in image-based virtual studios and have performed experiments using a prototype system. These experiments demonstrated that highlight and shading effects caused by the lighting used at the time of measurement could be removed from captured images.
    It was also found that unstable parts can occur when the angle between the line of observation and the normal to the object's surface, or the incident angle of lighting, is large. These parts must be excluded taking the accuracy of range data into account, but the gaps that result from this exclusion should be correctable by performing repeated measurements while varying the viewpoint.
    For the future, we plan to research methods for determining lighting conditions that include measurement and lighting positions with the aim of achieving simpler and more robust measurements; the problem of self-shading that occurs when acquiring texture; integration techniques that consider unstable parts generated by the proposed technique; and evaluation techniques in relation to unnatural effects generated by composite images.


References

  1. Imari Sato, Yoichi Sato, Katsushi Ikeuchi: "Seamless integration of a real scene and computer generated objects based on a real illumination distribution", 3rd Symposium on Intelligent Information Media, pp. 23-32, 1997 (in Japanese)
  2. Yuko Yamanouchi, Hideki Mitsumine, Seiki Inoue: "Experimental Evaluation of Image-Based Virtual Studio System", Conference Proceedings of IEEE-PCM 2000, pp. 106-109, 2000
  3. Kenji Kumaki, Tsuyoshi Yamamura, Toshimitsu Tanaka, Noboru Ohnishi: "A Method for Separating Real and Virtual Objects from Their Overlapping Images", Technical Report of IPSJ, Vol. 95, No. 108, CV-97, pp. 31-37, 1995 (in Japanese)
  4. Masaki Ohtsuki, Yukio Sato: "Detection of Specular Reflection Using Multiple Intensity and Range Images", Technical Report of IEICE, PRU95-229, pp. 101-108, 1996 (in Japanese)
  5. Y. Sato, Mark D. Wheeler, K. Ikeuchi: "Object shape and reflectance modeling from observation", Proc. SIGGRAPH '97, pp. 379-387, 1997
  6. Hideki Mitsumine, Yuko Yamanouchi, Seiki Inoue: "An Acquisition Method of 3-dimensional Video Components for Image-based Virtual Studio", Journal of ITE, Vol. 54, No. 3, pp. 440-443, 2000 (in Japanese)
  7. Yuko Yamanouchi, Masaki Hayashi, Seiki Inoue, Shigeru Shimoda: "Construction of Omnidirectional Images for Image-Based Virtual Studio", Technical Report of IEICE, IE99-75, pp. 9-16, 1998 (in Japanese)
  8. Yoshinari Nakanishi, Kiichi Kobayashi, Makoto Tadenuma, Hideki Mitsumine, Suguru Saito, Masayuki Nakajima: "Shape measurement of 3D object by block matching method using multiple viewpoint images", Technical Report of IPSJ, GCAD99-1, pp. 1-6, 2000 (in Japanese)
  9. Yukinori Matsumoto, Dieter Ritter, Kazuhide Sugimoto, Tsutomu Arakawa: "SFS2 Algorithm for Three-dimensional Scanner Based on Monoscopic Camera", National Convention of IPSJ, pp. 2-289-290, 1998 (in Japanese)
  10. Hideki Mitsumine, Seiki Inoue: "Acquisition of 3-dimensional component image data for DTPP system", ITE Technical Report, Vol. 19, No. 7, pp. 13-18, 1995 (in Japanese)
  11. Brian Curless, Marc Levoy: "A Volumetric Method for Building Complex Models from Range Images", Proc. SIGGRAPH, pp. 204-212, 1996
  12. Marc Soucy: "Innovmetric's Multiresolution Modeling Algorithms", Course 25, Course Notes of SIGGRAPH, 1997
  13. G. J. Klinker, S. A. Shafer, and T. Kanade: "Using a color reflection model to separate highlights from object color", Proc. ICCV, pp. 145-150, 1987
Mr. Hideki Mitsumine
Hideki Mitsumine received his M.S. degree in electrical engineering from Shibaura Institute of Technology in 1991. He joined the Nagoya station of NHK (Japan Broadcasting Corporation) in 1991. Since moving to NHK Science and Technical Research Laboratories in 1993, he has been engaged in research on image processing, virtual studio systems, and video components. He is a research engineer in the Multimedia Services Research Division.
Ms. Yuko Yamanouchi
Yuko Yamanouchi received her M.S. degree in electrical and electronic engineering from Tokyo Institute of Technology in 1988. She joined the Broadcast Engineering Department of NHK (Japan Broadcasting Corporation) in 1988. Since moving to NHK Science and Technical Research Laboratories in 1990, she has been engaged in research on video processing, computer graphics applications, and virtual studio systems. She is a research engineer in the Multimedia Services Research Division.
Mr. Seiki Inoue
Seiki Inoue received his B.S. degree in electrical engineering and his M.S. and Ph.D. degrees in electronics engineering, all from the University of Tokyo, in 1978, 1980, and 1992, respectively. Since 1980 he has been working for NHK (Japan Broadcasting Corporation). He was seconded to the ATR Media Integration and Communications Research Laboratories from 1995 to 1998. He has been engaged in research on video processing and virtual studio systems. He is a senior research engineer in the Multimedia Services Research Division at NHK Science and Technical Research Laboratories.


Copyright 2001 NHK (Japan Broadcasting Corporation). All rights reserved. Unauthorized copying of these pages is prohibited.
