NHK Laboratories Note No. 479


HIGH-DEFINITION THREE-DIMENSION CAMERA:
HDTV VERSION OF AN AXI-VISION CAMERA


Masahiro Kawakita, Taiichro Kurita, Hiroshi Kikuchi, Yuko Yamanouchi*,
Seiki Inoue**, and Keigo Iizuka***


Display & Optical Devices
*Multimedia Services
**Broadcasting Engineering Department
***Department of Electrical and Computer Engineering, University of Toronto




Abstract

  We have developed a novel high-definition version of a three-dimensional camera (an HDTV version of an Axi-vision camera) that can simultaneously capture both the HDTV colour image and the depth image of a scene at a video frame rate. The depth image is obtained by using intensity-modulated illuminators with a near-infrared spectrum combined with a high-resolution camera with an ultra-fast shutter using an image intensifier. A high signal-to-noise ratio of the depth image, which is necessary to realize this HDTV version of the Axi-vision camera, has been achieved by (1) a new highly sensitive image intensifier, (2) novel optics, and (3) high power light emitting diode array illuminators. As a result, the camera can capture a depth image with more than 920,000 pixels at a frame rate of 29.97 Hz or one with 410,000 pixels at a frame rate of 59.94 Hz. Such high performance makes this camera suitable for practical applications such as post-production in a virtual studio where images of objects at a specific distance can be selectively extracted and synthesized with other images in real time.
1. Introduction


  A three-dimensional (3D) camera can not only create highly sophisticated virtual images that make use of detected 3D information, but also has the potential to play an important role in the future development of a 3D TV broadcasting system. Efforts are being made to develop a 3D camera that can simultaneously capture the depth and colour images of a scene.
  Present depth-mapping techniques, such as time-of-flight [1], triangulation [2-4], or moiré patterns [5], generally need either two-dimensional scanning of a laser beam or sophisticated data processing. As a result, either the mapping speed or the number of detectable points is limited, so it is difficult to complete the operation fast enough for the video frame rate of TV images.
  In response to the above-mentioned issues, we have already proposed a novel high-speed depth-mapping method and have developed a standard-definition (SD) TV version of a 3D camera, named the Axi-vision camera [6-8]. This camera incorporates intensity-modulated light and an ultra-fast shutter. The SDTV version of the Axi-vision camera can map a depth image of 768 × 493 pixels at a frame rate of 15 Hz.
  To apply the proposed method to an HDTV post-production system, a new HDTV version of the Axi-vision camera, which can detect the depth of a scene for a high-definition TV image, was developed by improving the signal-to-noise ratio (SNR) of the depth detection. This paper reports on the basic principle and characteristics of this newly developed HDTV camera, and also demonstrates its potential application to new image synthesis that makes use of depth information in a virtual studio.

2. PRINCIPLE OF DEPTH MAPPING


  The depth mapping camera consists of intensity-modulated illuminators operating in the near-infrared (IR) and a CCD camera with an ultra-fast shutter [Fig. 1(a)]. The principle of depth mapping is based on the fact that the intensity of the light reflected from the object is a function of the distance from the camera to the object if the intensity of the illuminating light is changed at a speed comparable to the speed of light.
  The principle of detecting the depth from ultra-fast snapshots is explained with reference to Fig. 1(b). The intensity of the light used for illuminating the object is triangularly modulated. The ascending light intensity I+ and the descending light intensity I- are written as



$$I_+(t) = s\,t, \qquad 0 \le t < \frac{1}{2f} \tag{1}$$

$$I_-(t) = I_0 - s\left(t - \frac{1}{2f}\right), \qquad \frac{1}{2f} \le t < \frac{1}{f} \tag{2}$$

$$s = 2 I_0 f \tag{3}$$



where s is the rate of increase or decrease of the intensity, I0 is the maximum intensity of the light, and f is the modulation frequency.

Figure 1: Principle of depth mapping

At depth d, the intensity E+ of the light reflected from an object with reflectivity ρ is


$$E_+ = \alpha \rho\, I_+\!\left(t_+ - \frac{2d}{v}\right) = \alpha \rho\, s\left(t_+ - \frac{2d}{v}\right) \tag{4}$$


where t+ is the instant that the shutter is opened during the ascending illumination, v is the velocity of light, and α is a factor representing the divergence of the illuminating light. Likewise, the intensity E- of the light reflected from the object as the illumination decreases is


$$E_- = \alpha \rho\, I_-\!\left(t_- - \frac{2d}{v}\right) = \alpha \rho \left[ I_0 - s\left(t_- - \frac{2d}{v} - \frac{1}{2f}\right)\right] \tag{5}$$


where t- is the instant that the shutter is opened during the descending illumination.
If t- is set as


$$t_- = t_+ + \frac{1}{2f} \tag{6}$$


the reflectivity ρ and the divergence factor α can be removed from the expressions by dividing E+ by E-. Finally, the depth d of the object is obtained as


$$R = \frac{E_+}{E_-} = \frac{s\left(t_+ - \dfrac{2d}{v}\right)}{I_0 - s\left(t_+ - \dfrac{2d}{v}\right)} \tag{7}$$

$$d = \frac{v}{2}\left(t_+ - \frac{1}{2f}\cdot\frac{R}{1+R}\right) \tag{8}$$


The depth d can thus be obtained from the ratio R of the reflected light intensity detected when the illumination intensity is ascending and descending.
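Since the calculation of Eqs. (7) and (8) is independent for every pixel, it maps directly onto an array operation. The following is a minimal Python/NumPy sketch of that per-pixel computation; the modulation frequency of 45 MHz is the value quoted later in Section 3.4, while the shutter instant t+ and the two synthetic frames are purely illustrative values chosen so that the result comes out near 2 m.

```python
import numpy as np

V_LIGHT = 3.0e8  # velocity of light v [m/s]

def depth_map(e_plus, e_minus, f_mod=45e6, t_plus=20e-9, eps=1e-12):
    """Per-pixel depth from the ratio of the two intensity images, Eqs. (7)-(8)."""
    r = e_plus / (e_minus + eps)                                         # Eq. (7): R = E+/E-
    return 0.5 * V_LIGHT * (t_plus - r / ((1.0 + r) * 2.0 * f_mod))      # Eq. (8)

# Illustrative check: R = 0.6/0.4 = 1.5 with the assumed timing gives d of about 2 m.
e_plus = np.full((720, 1280), 0.6)
e_minus = np.full((720, 1280), 0.4)
print(depth_map(e_plus, e_minus).mean())
```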





Figure 2: Configuration of the HDTV Axi-vision camera







Figure 3: Photograph of HDTV Axi-vision camera



3. HDTV VERSION OF THE AXI-VISION CAMERA SYSTEM


 
3.1 Illumination and formation of the near-infrared image

  Figures 2 and 3 show the configuration and a photograph of the HDTV version of the Axi-vision camera, respectively, and the specifications of the camera are listed in Table 1. Near-IR LED array units are used as the intensity-modulated light source because they allow fast direct modulation at speeds of up to 50 MHz. The near-IR light, at a wavelength of 850 nm, lies outside the visible spectrum and thus does not interfere with the colour image. The near-IR sources are clustered into four LED array units arranged around the camera lens to illuminate the object uniformly.



Table 1: Specifications of the HDTV Axi-vision camera

  LED array
    Wavelength:                 850 nm
    Average light power:        1 W
    Modulation frequency:       10 to 50 MHz
  Image intensifier
    Gate time:                  1 to 20 ns
    Repetition rate:            10 to 50 MHz
  Depth-detection CCD camera
    Number of effective pixels: 1280 (H) × 720 (V) at 29.97 Hz
                                853 (H) × 480 (V) at 59.94 Hz
  Depth image
    Depth resolution
      (at 2 m distance):        1.7 cm
    Output signal:              HD-SDI



  The object is also illuminated by visible light, such as fluorescent light, which has only weak components in the near-IR region of the spectrum. The visible light reflected from the object passes through the camera lens, the dichroic prism, and finally a relay lens to form a colour image on an ordinary HDTV colour camera. The near-IR light reflected from the object is separated by the dichroic prism and enters a near-IR imaging system that includes an image intensifier. The near-IR image of the object is formed on the photocathode of the image intensifier, where the input optical image is converted into a photoelectron image. After electron multiplication by three to four orders of magnitude in a microchannel plate (MCP), the electric-charge image strikes a phosphor plate, which converts the charge back into an optical image. Nanosecond-order shutter speeds are achieved by applying ultra-short pulsed bias voltages between the photocathode and the MCP. To increase the SNR, the opening of the shutter is repeated at the same rate as the light-modulation frequency. The optical image appearing on the phosphor plate is then focused onto a high-resolution progressive CCD camera by way of a relay lens to create the depth image.
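Some simple arithmetic shows why repeating the gate at the modulation frequency matters. The sketch below assumes the parameters of Table 1 (a 45 MHz repetition rate and a 2 ns gate) and one field of the 59.94 Hz mode; it is illustrative arithmetic only, not a model of the intensifier itself.

```python
# Rough arithmetic on the repeated shutter gating (illustrative values from Table 1).
rep_rate = 45e6         # gate repetition rate = light-modulation frequency [Hz]
gate_width = 2e-9       # gate (shutter) width [s]
field_time = 1 / 59.94  # one field of the 59.94 Hz mode [s]

gates_per_field = rep_rate * field_time    # gate openings accumulated in one field
open_time = gates_per_field * gate_width   # total light-collection time per field

print(f"{gates_per_field:,.0f} openings, {open_time * 1e6:.0f} us of accumulated exposure")
# Roughly 750,000 openings (about 1.5 ms of exposure): this accumulation is what makes
# a nanosecond shutter usable at video frame rates.
```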



3.2 Synchronization of the system

  The signal generator provides both the up- and down-ramp signals for modulating the output of the LED arrays and the trigger signals for the gating pulses that open the image intensifier shutter. Only the image under ascending illumination is captured during one frame. The ramping direction of the modulation is then reversed, and the image under descending illumination is captured during the next frame. Switching triggers taken from the vertical synchronizing video signal alternate the two modes of illumination.
  The signal processor sorts the images captured under ascending and descending illumination into frame memories. The intensities in these two images are used to evaluate Eqs. (7) and (8) and thereby determine the depth of the object. The acquisition time for a depth image with 1280 × 720 pixels is 1/30 second, and that for one with 853 × 480 pixels is 1/60 second. The depth image is converted to an HDTV signal and output as an HD-SDI (Serial Digital Interface) signal.
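A minimal sketch of this sorting step is given below. It assumes the depth-detection camera delivers frames alternately under ascending and descending illumination (even and odd indices here) and repeats the per-pixel calculation of Eqs. (7) and (8); the buffering is a simplification of the signal processor described above, and the timing values are the same illustrative ones used earlier.

```python
import numpy as np

def depth_stream(frames, f_mod=45e6, t_plus=20e-9, eps=1e-12):
    """Pair alternating frames (ascending, descending) and yield one depth image per pair."""
    ascending = None                      # frame memory for the ascending-illumination image
    for i, frame in enumerate(frames):
        if i % 2 == 0:                    # frame captured under ascending illumination
            ascending = frame
        else:                             # descending frame completes the pair
            r = ascending / (frame + eps)                                   # Eq. (7)
            yield 0.5 * 3.0e8 * (t_plus - r / ((1.0 + r) * 2.0 * f_mod))    # Eq. (8)

# Each ascending/descending pair yields one depth image
# (1/30 s acquisition in the 1280 x 720 mode described above).
```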


3.3 Enhancement of the SNR of the system

  Increasing the number of pixels of the depth-mapping CCD camera decreases the signal current detected by each pixel. The noise level of the image therefore increases, and the depth resolution decreases accordingly. The SNR of the depth detection is related to the camera parameters as follows.


$$\mathrm{SNR} \propto \sqrt{\eta\, I_0\, T_0\, A\, \tau} \tag{9}$$


where η is the quantum efficiency of the photocathode of the image intensifier, I0 is the power of the illumination, T0 is the transmittance of the optics, A is the area of one pixel, and τ is the imaging time. The pixel size of the HDTV camera is less than one fifth of that of the SDTV camera, and the SNR is reduced accordingly. To maintain a good SNR in the HDTV depth image, we carried out the following improvements:
(1) Improvement of the quantum efficiency of the image intensifier
(2) Redesign of the optical devices
(3) Increase in the power of the LED array


3.3.1 Improvement of the quantum efficiency of the image intensifier

  A new, highly sensitive image intensifier was developed for the HDTV version of the Axi-vision camera. As shown in Fig. 4, the photocathode of the image intensifier is made of a GaAs target with a quantum efficiency of 11.7% at a wavelength of 850 nm, which is eight times higher than that of the multi-alkali targets used in the SDTV version. The new image intensifier has a double-layered MCP structure to give sufficient amplification for imaging by the high-resolution CCD camera and to reduce the damage to the photocathode caused by the ion-feedback effect. Furthermore, to enable high-definition images to be captured, the spatial resolution of the image intensifier was increased by using an MCP with a small channel diameter (6 μm), as well as by developing the proximity structure.




Figure 4: Spectral-response characteristics of photocathode



3.3.2 Redesign of the optical devices

  In the SDTV version of the Axi-vision camera, the dichroic mirror (which separates the visible light from the near-IR light) was positioned in front of the camera lens [8]. As a result, the LED arrays had to be installed behind the dichroic mirror, and the camera system was bulky. With this configuration, the LED light power was not used economically, and two separate camera lenses were required: one for the colour camera and one for the depth-mapping camera.
  To overcome these shortcomings, in the HDTV version of the camera a small dichroic prism was placed between the camera lens and the CCD camera. To prevent visible light from leaking into the depth-mapping camera, an optical filter (with an optical density of more than 5 at visible wavelengths) was fitted in front of the image intensifier. To ensure depth mapping with a high SNR, the dichroic prism and the optical filter were designed to maintain about 90% transmittance at a wavelength of 850 nm (Fig. 5). This transmittance is more than twice that of the prototype optics. Furthermore, the camera system is compact and can accommodate a zoom lens.





Fig. 5 Spectra of visible light and near-infrared light overlaid with the system transmittance



3.3.3 Increase in the power of the LED array

  Because of the limitation of its optical system, which used a large dichroic mirror, the prototype camera had only two LED array units, one on each side of the camera lens [8]. The HDTV camera has compact optics, so four clusters of LED array units can be arranged around the camera lens, as shown in Fig. 3. The maximum power of the LED array is one watt, which is twice that of the prototype.



3.4 Camera performance

  By increasing the quantum efficiency η of the image intensifier, the transmittance T0 of the optics, and the power I0 of the LED illumination, the total SNR given by Eq. (9) was increased by about five times compared to that of the SDTV version of the camera. Even though the area A of each CCD pixel has been reduced by a factor of five by using a high-resolution camera, and the imaging time τ is reduced to half of that of the prototype camera, the camera system is sufficiently sensitive to capture depth images without reducing the SNR.
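As a rough check of these numbers, the sketch below inserts the improvement factors stated in Section 3.3 (quantum efficiency ×8, optics transmittance ×2, illumination power ×2) and the penalties of the HDTV format (pixel area ÷5, imaging time ÷2) into the square-root dependence of Eq. (9). The factors are taken from the text; the code is only illustrative arithmetic.

```python
from math import sqrt

# Gains over the SDTV prototype (Section 3.3).
eta_gain = 8.0   # quantum efficiency of the image intensifier photocathode
t0_gain = 2.0    # transmittance of the optics
i0_gain = 2.0    # power of the LED illumination

# Penalties of the HDTV format (Section 3.4).
area_factor = 1.0 / 5.0   # pixel area A
time_factor = 1.0 / 2.0   # imaging time tau

snr_gain = sqrt(eta_gain * t0_gain * i0_gain)                             # ~5.7x: "about five times"
snr_net = sqrt(eta_gain * t0_gain * i0_gain * area_factor * time_factor)  # ~1.8x: still above 1

print(f"SNR gain from the three improvements: {snr_gain:.1f}x, net after HDTV penalties: {snr_net:.1f}x")
```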
  With these improvements, the depth resolution of the new camera was evaluated from the measured noise level of the output video signal. When the modulation frequency of the near-IR light was 45 MHz, the gate width of the image intensifier shutter was 2 ns, and the distance from the camera to the object was 2 m, the resolution was 1.7 cm. Figure 6 shows the depth resolution against the distance to the objects. At 10 m, the resolution was about 5 cm. An image from the camera and the corresponding depth image are shown in Figs. 7(a) and 7(b), respectively. The camera can capture a depth image with more than 920,000 pixels at a frame rate of 29.97 Hz, or one with 410,000 pixels at a frame rate of 59.94 Hz, in the output format of an HDTV video signal.






Fig. 6 Relationship between the depth resolution and
the distance to the objects at the center of the dynamic range




Figure 7: Output images of HDTV Axi-vision camera.
(a) ordinary camera image (b) depth image


4. APPLICATION

  Yamanouchi et al. [9] developed an image-synthesizing technique that combines a camera image with an environmental real-image component in an HDTV virtual studio. The newly developed camera reported on here enables us to set up a new virtual studio system that uses depth-keying.
  One corner of a virtual studio with a Japanese-style-room set is shown in Fig. 8(a). As can be seen, there is no blue screen of the kind normally used in a virtual studio for chroma-keying the virtual scene. The colour and depth images of a person in front of the set were captured by the camera, as shown in Fig. 8(b). Ultra-high-definition omni-directional images, such as the image components of the sliding door and the ceiling [Fig. 8(c)], were captured beforehand by an ordinary HDTV camera and stored in a computer. The camera image was combined with these image components, synchronized by the camera data, such as the zoom and focus of the camera lens and the pan and tilt data of the tripod. Figure 8(d) shows the final image synthesized by the depth-keying method. The wall on the right-hand side of the studio is removed, and the image component of the sliding door is inserted between the person and the wall of the studio in real time. This demonstrates that the depth-keying method can create a virtual effect without a blue screen. A more realistic, three-dimensional synthesized image can thus be produced in real time by using the depth image obtained by this camera. Needless to say, this kind of camera also supports tasks in post-production processing.
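At its core, depth-keying is a per-pixel comparison of the depth image against a keying distance. The NumPy sketch below shows one way such a composite could be formed: pixels closer than the keying distance keep the camera image (the person), and everything behind is replaced by the stored image component. The threshold of 3 m and the array names are illustrative assumptions; the actual studio system additionally uses the lens and tripod data mentioned above, which is omitted here.

```python
import numpy as np

def depth_key(camera_rgb, depth, component_rgb, key_distance=3.0):
    """Insert a stored image component behind everything nearer than key_distance.

    camera_rgb    : (H, W, 3) colour image from the Axi-vision camera
    depth         : (H, W)    depth image in metres, pixel-aligned with camera_rgb
    component_rgb : (H, W, 3) pre-captured background component
    """
    foreground = depth < key_distance    # per-pixel key taken from the depth image
    return np.where(foreground[..., np.newaxis], camera_rgb, component_rgb)

# Example: keep a person standing about 2 m away and place the sliding-door
# component behind everything farther than 3 m.
```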





Figure 8: Image synthesis using depth information in an HDTV image-based virtual studio.
(a) virtual studio, (b) output image of HDTV Axi-vision camera,
(c) image component, (d) synthesized image



5. CONCLUSIONS

  An HDTV version of the Axi-vision camera that can simultaneously capture both a high-definition colour image and the corresponding depth image of a scene at a video frame rate was developed. Good performance was achieved by improving the SNR of the depth mapping. The camera is suitable for presenting new synthesized images that use depth information in a virtual studio system. A remaining issue for the practical application of the camera is the reduced SNR of the depth detection for objects with weak near-IR reflection. Further improvement of the sensitivity and dynamic range of the camera, together with optimization of the image processing to reduce the noise in the depth image, would improve the quality of the synthesized image. This camera has the potential to create more attractive and expressive images. Accordingly, it is planned to further improve the performance of the camera and to expand its practical use in future program production.




ACKNOWLEDGMENTS

  The authors would like to give special thanks to Fumio Sato of NHK for continued encouragement and helpful comments. Thanks are also due to Haruhito Nakamura and Itaru Mizuno of Hamamatsu Photonics Corporation for their technical support in developing the image intensifier and to Hideki Mitsumine and Takashi Fukaya of NHK for their technical support in setting up the virtual studio.





References


[1] R. A. Jarvis, "A laser time-of-flight range scanner for robotic vision," IEEE Trans. Pattern Anal. Mach. Intell., Vol. PAMI-5, No. 5, pp. 505-512, 1983.
[2] M. Rioux, "Laser range finder based on synchronized scanners," Applied Optics, Vol. 23, pp. 3837-3844, 1984.
[3] S. Inokuchi, Y. Morita, and Y. Sakurai, "Optical pattern processing utilizing nematic liquid crystals," Applied Optics, Vol. 11, pp. 2223-2227, 1972.
[4] S. Kimura, H. Kano, T. Kanade, A. Yoshida, E. Kawamura, and K. Oda, "CMU video-rate stereo machine," in Proceedings of the 1995 Mobile Mapping Symposium (American Society for Photogrammetry and Remote Sensing, Columbus, Ohio), pp. 9-18, 1995.
[5] H. Takasaki, "Moire topography," Applied Optics, Vol. 9, pp. 1467-1472, 1970.
[6] M. Kawakita, K. Iizuka, H. Kikuchi, H. Fujikake, J. Yonai, and T. Aida, "A 3D camera system using a high-speed shutter and intensity modulated illuminator," (in Japanese) Institute of Image Information and Television Engineers (ITE) Tech. Rep., Vol. 22, No. 57, pp. 19-24, 1998.
[7] M. Kawakita, K. Iizuka, T. Aida, H. Kikuchi, H. Fujikake, J. Yonai, and K. Takizawa, "Axi-vision camera (real-time depth-mapping camera)," Applied Optics, Vol. 39, pp. 3931-3939, 2000.
[8] M. Kawakita, K. Iizuka, T. Aida, H. Kikuchi, H. Fujikake, J. Yonai, and K. Takizawa, "Axi-vision camera: a three-dimension camera," Proc. SPIE, Vol. 3958, pp. 61-70, 2000.
[9] Y. Yamanouchi, H. Mitsumine, and S. Inoue, "Image-based virtual studio using ultra high-definition omnidirectional images," (in Japanese) The Journal of the Institute of Image Information and Television Engineers, Vol. 55, pp. 159-166, 2001.


Masahiro Kawakita
Masahiro Kawakita received his B.S. and M.S. degrees in physics from Kyushu University in 1988 and 1990, respectively. He joined the Japan Broadcasting Corporation (NHK), Tokyo, Japan, in 1990. Since 1993, he has been with NHK Science & Technical Research Laboratories, and engaged in research on liquid crystal devices, optically addressed spatial modulators, and 3D cameras.
Mr. Kawakita is a member of the Optical Society of America, the Japan Society of Applied Physics, and The Institute of Image Information and Television Engineers of Japan.
Taiichro Kurita
Taiichro Kurita completed his M.E. and Ph.D. degrees at Keio Gijuku University in 1980 and 1991, respectively. He joined NHK (Japan Broadcasting Corporation) in 1980. In 1982, he began work at NHK Science and Technical Research Laboratories on research relating to television systems and signal processing of moving pictures (including HDTV, EDTV, etc.), and on display methods and picture quality of PDPs and LCDs. From 1993 to 2000, he also held a visiting associate professorship at the University of Electro-Communications. He is a senior research engineer in the Display and Optical Devices Division of the NHK laboratories.
Hiroshi Kikuchi
Hiroshi Kikuchi received his B.E. and M.E. degrees from Toyohashi University of Technology, Toyohashi, Japan, in 1982 and 1984, respectively.
In 1984, he joined NHK (Japan Broadcasting Corporation), Tokyo, and worked as a broadcasting engineer at the Kobe Broadcasting Station. Since 1987, he has been at the Science & Technical Research Laboratories of NHK, where he has been engaged in research on optoelectronic devices, such as spatial light modulators using liquid crystals, optical bistable devices using liquid crystals and semiconductor lasers, and optically addressed projection displays. Mr. Kikuchi is a member of the Japan Society of Applied Physics.
Yuko Yamanouchi
Yuko Yamanouchi received the B.Eng. degree in electronic engineering from Sophia University in 1986, and the M.Eng. degree in electronic engineering from the Tokyo Institute of Technology in 1988. She joined NHK in 1988. Since 1990 she has been with NHK Science and Technical Research Laboratories, where she has been engaged in research on image processing and computer graphics techniques for TV program production and virtual studio systems.
Seiki Inoue
Seiki Inoue received his B.S. degree in electrical engineering and his M.S. and Ph.D. degrees in electronics engineering, all from the University of Tokyo, in 1978, 1980, and 1992, respectively. He joined NHK in 1980. He worked at the ATR Media Integration and Communications Research Laboratories from 1995 to 1998. His interests include image and video processing, computer graphics, and virtual studios.

Keigo Iizuka
Keigo Iizuka received his Bachelor's degree in Electrical Engineering from Kyoto University in 1955, and his Master's and Ph.D. degrees in Applied Physics, both from Harvard University, in 1958 and 1961, respectively. He was a Lecturer at Harvard University from 1964 to 1968, and has been a Professor at the University of Toronto, Canada, since 1968.
He is a Fellow of Optical Society of America.
He has authored the following books:
Fundamentals in Engineering Optics (Ohmsha).
Engineering Optics (Kyoritsu Shuppan).
Engineering Optics (Springer-Verlag).
Elements of Photonics, Volume 1 (John Wiley and Sons).
Elements of Photonics, Volume 2 (John Wiley and Sons).


Copyright 2002 NHK (Japan Broadcasting Corporation) All rights reserved. Unauthorized copy of the pages is prohibited.
