Abstract
To identify the conditions that make stereoscopic images easier to view, we analyzed their psychological effects using a stereoscopic HDTV system and examined how these effects relate to parallax distribution patterns. First, we evaluated the impressions of several stereoscopic images taken from standard 3-D HDTV test charts and past 3-D HDTV programs, using a set of evaluation terms. Two factors were extracted: the first related to "sense of presence" and the second to "ease of viewing". Second, we applied principal component analysis to the parallax distribution of the stereoscopic images used in the subjective evaluation tests to extract the features of the distribution, and then examined the relationship between the factors and these features. The results indicate that the features of the parallax distribution are strongly related to "ease of viewing": for 3-D images to be easy to view, the upper part of the screen should be located farther from the viewer, parallax irregularity should be small, and the entire image should be positioned behind the screen.
- 1. Introduction
Dramatic progress in recent years in hardware such as cameras, recorders, and displays, together with the rapid improvement of digital technology, has heightened expectations that 3-D television broadcasting will be achieved in the near future. The binocular method based on parallax, in particular, is technically close to practical application. Stereoscopic images produced by this method provide a greater sense of presence and immersion than ordinary two-dimensional images. Stereoscopic HDTV in particular will provide extremely realistic and dynamic 3-D pictures by taking full advantage of the high resolution and wide screen of HDTV [1]. On the other hand, these stereoscopic images pose some problems that require further research. For instance, too much binocular parallax may have a negative influence on viewers, and viewers may begin to feel tired (visual fatigue) after watching stereoscopic images for an extended period of time. These are problems for broadcasting, because programs are watched by a large number of people for many hours. Visual fatigue is caused by a number of factors, including the conflict between convergence and accommodation [2], differences in the characteristics of the right and left images (size [3], brightness [4, 5], contrast), crosstalk [6, 7], and excessive parallax. Parallax in particular tends to be excessive, often exceeding the fusion range, because producers focus on the entertainment value of stereoscopic images.
This study examines the distribution of parallax in stereoscopic pictures that have been subjectively deemed easy to watch, and tries to identify parallax conditions favorable for stereoscopic images. This paper reports on the following:
- Results of a subjective evaluation test using stereoscopic pictures
- Analysis of parallax distribution on the screen
- Relationship between the results of subjective evaluation and the parallax distribution
- 2. Subjective evaluation test (factor analysis)
The visual impression of stereoscopic images and the psychological effects of these pictures were studied [8] through a subjective evaluation test [9] using factor analysis.
- 2.1 Evaluated images
For this subjective evaluation test, we chose ten scenes from among the pictures used in the previous evaluation test. Those pictures had been chosen from the standard 3-D HDTV test charts [10], for which shooting conditions and object positions were known, from pictures used in the verification test on the MPEG-2 Multi-View Profile [11], and from pictures shot by a compact 3-D HDTV camera with a zoom function [12]. Each scene is a 15-second sequence in which neither the object nor the camera moves rapidly. Table 1 lists the name of each scene, the shooting conditions, the main shooting range of the scene, and the objects in the scene.
Table 1 Evaluated images used in the experiment
| No. | Scene | Focal length | Intersection* | | Camera movement | Shooting range | Images in the scene |
|---|---|---|---|---|---|---|---|
| 1 | Street organ | 10mm | 2.9m | 70mm | Dollying, Panning | Near - middle distance | A girl and street organ |
| 2 | An aquarium | 10mm | 3.5m | 70mm | Fixed | Near | Fish in an aquarium |
| 3 | Flower pot | 10mm | 6.1m | 70mm | Dollying | Near - middle distance | A girl and flower pots |
| 4 | A meal | 12mm | Parallel | 65mm | Fixed | Near | A meal at the dinner table |
| 5 | Market | 12mm | Parallel | 65mm | Fixed | Near - middle distance | A girl shopping in the market |
| 6 | Lion | 40mm | Parallel | 65mm | Almost fixed | Far | A lion on the prowl |
| 7 | Cheetah | 40mm | Parallel | 65mm | Almost fixed | Far | A cheetah roaming the savanna |
| 8 | A vocalist | 12mm | Parallel | 65mm | Fixed | Near | A singer singing in a studio |
| 9 | Bus stop | 12mm | Parallel | 65mm | Fixed | Near - far | People walking near a bus stop |
| 10 | Festival | 12mm | Parallel | 65mm | Panning | Near - middle distance | A portable shrine amid dancing confetti |
*Intersection: The distance from the cameras to the intersection point of the lens axis or parallel lens axis.
- 2.2 Evaluation test
The 2-D versions of these 10 scenes were added, making 20 images in total for the evaluation test. The images were shown to the observers in random order on a 120-inch screen and a 70-inch screen by the 3-D HDTV system. The observers wore polarizing glasses. Table 2 shows the conditions of the experiment. Each of these images was shown twice on the 120-inch screen and the 70-inch screen. For the 2-D images, the left-eye image was shown to both eyes. During the experiment we made sure that the polarizing glasses stayed in place and that the test subjects did not know whether the image they were watching was 2-D or 3-D. Groups of two to four subjects viewed each screen at a time. The viewing distance was set at about 3H (H: screen height). The test subjects were 99 males and females, mostly in their 20s, whose stereoscopic vision was checked before the experiment. Based on the findings of a preliminary test, 13 evaluation terms were rated on a scale of one to five, as shown in Table 3.
Table 2 Conditions of experiment
| Item | Condition |
|---|---|
| No. of images | 20 (10 scenes of 3-D pictures and corresponding 2-D images) |
| No. of test subjects | 99 (males and females, from teenagers to those in their 60s) |
| No. of repetitions | Twice |
| Display system | 3-D HDTV with polarization glasses |
| Screen size | 120-inch, 70-inch |
| Peak brightness | 21.4 cd/m² (120-inch), 106 cd/m² (70-inch) |
| Viewing distance | About 3H (H: screen height): 4.5 m for 120-inch, 2.6 m for 70-inch |
Table 3 Evaluation terms and grading scales (Original in Japanese)
Evaluation terms:
1. Distracted by the screen frames
2. Ease of viewing
3. Looks like a miniature
4. Spreading out back from the screen
5. Sense of presence
6. Tiring
7. Sticking out from the screen
8. Straining to eyes
9. Natural feeling of depth
10. Large
11. Comfortable
12. Powerful
13. Flat

Grading scales: 5: Agree, 4: Slightly agree, 3: Neutral, 2: Slightly disagree, 1: Disagree
- 2.3 Results
The results of the subjective evaluation were subjected to factor analysis, from which we extracted the two factors whose eigenvalues were greater than 1. The contributions of these factors were 32% and 26% respectively, giving a cumulative contribution of 58%. Figure 1 shows the factor loading of each evaluation term on these two factors. Of the 13 evaluation terms, "sense of presence" had the largest factor loading on factor 1, and "ease of viewing" had the largest on factor 2. We therefore let "sense of presence" represent factor 1 and "ease of viewing" represent factor 2.
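The extraction rule described above (retain factors whose eigenvalue exceeds 1, then report their contributions) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: it uses the eigen-decomposition of the correlation matrix (principal-component extraction, without rotation), which may differ from the factor analysis procedure actually used, and the function name `kaiser_factors` and the rating matrix are hypothetical synthetic stand-ins, not the experimental data.

```python
import numpy as np

def kaiser_factors(ratings):
    """Return loadings and contribution ratios of factors with eigenvalue > 1.

    ratings: (n_observations, n_terms) array of scores per evaluation term.
    """
    corr = np.corrcoef(ratings, rowvar=False)      # term-by-term correlations
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalue-greater-than-1 rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    contribution = eigvals[keep] / eigvals.sum()   # share of total variance
    return loadings, contribution

# Synthetic stand-in: 13 terms driven by two hypothetical latent factors.
rng = np.random.default_rng(0)
n = 400
f1, f2 = rng.normal(size=(2, n))
latent = np.column_stack([f1] * 7 + [f2] * 6)
ratings = 0.8 * latent + 0.6 * rng.normal(size=(n, 13))

loadings, contribution = kaiser_factors(ratings)
print(loadings.shape, contribution, contribution.sum())
```

The sum of the returned contributions corresponds to the cumulative contribution quoted in the text; the sign of each loading column is arbitrary in this sketch.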
Fig. 1 Relationship between two factors and evaluation terms
Figure 2 shows the distribution of factor scores of the images on the 120-inch screen in relation to these two factors.
Fig. 2 Factor scores of images in relation to factor 1 and factor 2
- 2.4 Consideration
- 2.4.1 Factor loading
In Fig. 1, the evaluation term "sense of presence" represents the factor 1 axis, and "ease of viewing" the factor 2 axis. This arrangement indicates that "ease of viewing" is scarcely affected by factor 1, and that "sense of presence" is scarcely affected by factor 2. The terms "tiring" and "straining to eyes," on the other hand, both lie away from these two axes, meaning that they are affected by both factor 1 and factor 2.
The evaluation terms "spreading out back from screen" and "sticking out from screen" lie toward the right, on the positive side of the lateral axis. This means that an image with strong depth perception creates a strong sense of presence. On the negative side of the lateral axis is the evaluation term "flat." This is because half the pictures used in the experiment were 2-D and these 2-D images scored low on "sense of presence." As for "miniature," which refers to a phenomenon that reduces the sense of presence of 3-D images [13], the absolute value of the factor loading is the smallest. This is probably because none of the 3-D pictures used in the experiment showed strong miniaturization effects.
Along the vertical axis on the positive side is a term "spreading out back from screen," indicating that a stereoscopic image with natural depth or natural stereoscopic effects is easy to view. "Distracted by frames," on the other hand, is designed to assess the degree of frame canceling that obstructs the binocular fusion when viewing a stereoscopic image. It is located in the negative region down the vertical axis as the occurrence of this frame canceling makes a 3-D image hard to view.
- 2.4.2 Factor 1 "sense of presence"
Figure 2 clearly shows that for factor 1, "sense of presence," stereoscopic images have greater factor scores than 2-D images. To understand this in greater detail, we performed an analysis of variance on the evaluation term "sense of presence," which has the largest factor loading for factor 1, with three factors: A (2-D or 3-D), B (screen size), and C (scene). The results show that, at the 1% significance level, the main effects of factors A and C and their interaction (A x C) are significant. The results of this analysis of variance are shown in Table 4. Comparing the F values in the table, 3-D images provide a much stronger sense of presence than 2-D images, though the degree depends on the scene. Screen size, on the other hand, has no significant effect.
- 2.4.3 Factor 2 "ease of viewing"
Figure 2 shows that the scores of the 2-D images for factor 2 are nearly the same, but that those of 3-D images are widely distributed depending on the scene. For instance, the score of one 3-D image, scene No. 1, is very low, but that of scene No. 10 is higher than that of any 2-D image. This means that "ease of viewing" is largely dependent on the scene. To understand this point better, we analyzed the evaluation term "ease of viewing," which has the largest factor loading for factor 2, in terms of three factors: A (2-D or 3-D), B (screen size), and C (scene). Table 5 shows the results.
Table 4 Analysis of variance for "sense of presence"
| Factor | Sum of squares | Degrees of freedom | Mean square | F value |
|---|---|---|---|---|
| A (2-D or 3-D) | 2445.59 | 1 | 2445.59 | 3349.54** |
| B (screen size) | 0.21 | 1 | 0.21 | 0.29 |
| C (scene) | 110.29 | 9 | 12.25 | 16.78** |
| A x B | 0.16 | 1 | 0.16 | 0.22 |
| A x C | 46.76 | 9 | 5.20 | 7.12** |
| B x C | 5.40 | 9 | 0.60 | 0.82 |
| A x B x C | 2.98 | 9 | 0.33 | 0.45 |
| Error | 2862.10 | 3920 | 0.73 | |
| Total | 5473.49 | 3959 | | |
Table 5 Analysis of variance for "ease of viewing"
| Factor | Sum of squares | Degrees of freedom | Mean square | F value |
|---|---|---|---|---|
| A (2-D or 3-D) | 337.75 | 1 | 337.75 | 579.79** |
| B (screen size) | 0.43 | 1 | 0.43 | 0.75 |
| C (scene) | 350.70 | 9 | 38.97 | 66.89** |
| A x B | 0.52 | 1 | 0.52 | 0.90 |
| A x C | 234.57 | 9 | 26.06 | 44.74** |
| B x C | 10.41 | 9 | 1.16 | 1.99* |
| A x B x C | 6.31 | 9 | 0.70 | 1.20 |
| Error | 2283.54 | 3920 | 0.58 | |
| Total | 3224.24 | 3959 | | |
**1% significance *5% significance
The F values in Table 5 show that, in addition to factor A (2-D or 3-D), factor C (scene) and the A x C interaction have significant effects. An interaction is also detected between factor B (screen size) and factor C (scene). For scenes No. 4-10 (which were shot with parallel lens axes), the horizontal position of the right and left images is adjusted according to the screen size so that a point at infinity during shooting is reproduced at infinite distance on the screen. The same scene may therefore create a different depth perception depending on screen size. However, as seen in Table 5, the main effect of screen size on ease of viewing was not significant.
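The F values in Tables 4 and 5 follow from the tabulated sums of squares and degrees of freedom: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the effect mean square to the error mean square. A short Python check, using only numbers taken from the tables (the helper name `f_value` is introduced here for illustration):

```python
def f_value(ss_effect, df_effect, ss_error, df_error):
    """F ratio: effect mean square divided by error mean square."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# Table 4 ("sense of presence"): error SS = 2862.10 with 3920 degrees of freedom
f_a_presence = f_value(2445.59, 1, 2862.10, 3920)   # factor A (2-D or 3-D)
f_c_presence = f_value(110.29, 9, 2862.10, 3920)    # factor C (scene)

# Table 5 ("ease of viewing"): error SS = 2283.54 with 3920 degrees of freedom
f_bc_viewing = f_value(10.41, 9, 2283.54, 3920)     # B x C interaction

print(round(f_a_presence, 2), round(f_c_presence, 2), round(f_bc_viewing, 2))
```

The recomputed ratios agree with the tabulated F values to within rounding, which also confirms that the flattened table columns were read in the right order.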
- 3. ANALYSIS OF PARALLAX DISTRIBUTION (PRINCIPAL COMPONENT ANALYSIS)
In the previous section, we used factor analysis to examine subjectively the sense of presence that stereoscopic images create and the ease (or difficulty) of viewing them. In this section, we examine scene characteristics, focusing on the distribution of parallax. As no scene contains rapid camera or object movement, the analysis uses the first frame of each scene.
- 3.1 Detection of parallax
To detect parallax, we applied block matching to the luminance signal using blocks of 16 x 16 pixels. For 3-D HDTV pictures, this yields parallax distribution data of 64 blocks (vertical) by 120 blocks (horizontal). Of the ten scenes used in this experiment, those shot with parallel optical axes (No. 4 - No. 10) required horizontal position adjustment [13] of the right and left images in accordance with the size of the projection screen. Parallax was therefore detected for these scenes separately for the 70-inch screen and the 120-inch screen. The accuracy of parallax detection by block matching depends largely on the type of image: detection precision drops where there is little brightness variation. For instance, detection errors were conspicuous in the background of scene No. 8 (the on-stage area behind the singer and band, where the brightness level is flat and low), so scene No. 8 was excluded from the subsequent analysis.
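A minimal sketch of block matching in Python with NumPy follows. It assumes a sum-of-absolute-differences cost and a purely horizontal search; the function name `block_parallax`, the search range, and the sign convention (parallax = x_right - x_left, so that uncrossed parallax is positive) are assumptions made for illustration, not details of the detector used in the experiment.

```python
import numpy as np

def block_parallax(left, right, block=16, search=32):
    """Horizontal parallax per block, in pixels (x_right - x_left).

    left, right: 2-D luminance arrays of equal shape.
    """
    h, w = left.shape
    rows, cols = h // block, w // block
    out = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            ref = left[y:y + block, x:x + block].astype(float)
            best_cost, best_d = np.inf, 0
            for d in range(-search, search + 1):
                xs = x + d
                if xs < 0 or xs + block > w:
                    continue                      # candidate falls outside the image
                cand = right[y:y + block, xs:xs + block].astype(float)
                cost = np.abs(ref - cand).sum()   # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            out[r, c] = best_d
    return out

# Synthetic check: the right image is the left image shifted 5 pixels to the right.
rng = np.random.default_rng(0)
left = rng.random((64, 128))
right = np.roll(left, 5, axis=1)
parallax = block_parallax(left, right, block=16, search=8)
print(parallax)
```

In this synthetic case every block except those in the last column (where the shifted block would leave the image) is matched at a shift of 5 pixels. In flat, low-brightness regions many candidate shifts give nearly the same cost, which is the failure mode described for the background of scene No. 8.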
For the remaining 9 scenes and 2 screen sizes, we obtained 15 sets of parallax distribution data (one each for scenes No. 1-3, and one per screen size for each of the six remaining parallel-axis scenes). These data were then divided into 9 domains to facilitate analysis. As the main object usually occupies a large area in the screen center, the domains were sized in accordance with their positions (see Fig. 3 and Fig. 4). The nine domains are the center (domain 5, taking up 4/16 of the total screen area), the four edge domains adjoining it (domains 2, 4, 6, 8, each taking up 2/16 of the screen area), and the four corners (domains 1, 3, 7, 9, each taking up 1/16 of the screen area). The parallax distribution data within each domain were then averaged. Here, negative values mean crossed parallax, which places the image in front of the screen, while positive values mean uncrossed parallax, which places it behind the screen. The parallax data in the nine domains were subjected to principal component analysis. The results show that the data can be summarized by two principal components with a cumulative contribution of 92.5% (83.2% for the primary principal component and 9.3% for the secondary principal component). Figures 3 and 4 show the loadings of these two principal components in each domain.
Fig. 3 Primary principal component loading of domain
Fig. 4 Secondary principal component loading of domain
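The nine-domain averaging can be sketched as follows. The sketch assumes the screen is split 1:2:1 both vertically and horizontally, which reproduces the stated area fractions (1/16 for corners, 2/16 for edges, 4/16 for the center), and that domains are numbered row by row from the top left; the function names and the demonstration map are hypothetical.

```python
import numpy as np

def split_edges(n):
    """Boundaries of a 1:2:1 split of n blocks."""
    return [0, n // 4, 3 * n // 4, n]

def domain_means(parallax):
    """Average parallax and area fraction for domains 1..9 (row-major order)."""
    r = split_edges(parallax.shape[0])
    c = split_edges(parallax.shape[1])
    means, fractions = [], []
    for i in range(3):
        for j in range(3):
            dom = parallax[r[i]:r[i + 1], c[j]:c[j + 1]]
            means.append(dom.mean())
            fractions.append(dom.size / parallax.size)
    return np.array(means), np.array(fractions)

# Demonstration map of 64 x 120 blocks: top of the screen behind the screen
# (positive parallax), bottom in front of it (negative parallax).
rows = np.arange(64.0)[:, None]
demo = np.tile(31.5 - rows, (1, 120))

means, fractions = domain_means(demo)
print(means.reshape(3, 3))
print(fractions.reshape(3, 3) * 16)   # area of each domain in sixteenths
```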
- 3.2 Results of principal component analysis
Primary and secondary principal component scores can be calculated by using the results of the principal component analysis discussed in section 3.1.
- 3.2.1 Primary principal component
The primary principal component score in each image is expressed with the following equation.
The coefficients of these domains are all positive, falling in a narrow range of 0.85 - 0.97. This means that, in stereoscopic images, the greater the parallax in the positive direction (farther toward the back), the higher the scores for the primary principal component.
- 3.2.2 Secondary principal component
The secondary principal component score in each image is expressed with the following equation.
The coefficients of the domains are as follows: positive for the top three domains (1, 2, 3) and negative for the bottom three domains (7, 8, 9). This means that, in stereoscopic images, the farther toward the back the top part is and farther toward the front the bottom part is, the higher the scores for the secondary principal component.
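In both cases the score is a linear combination of the nine domain averages. The sketch below shows one common way to obtain loadings and scores, assuming the analysis is carried out on the correlation matrix of the domain averages; that choice, the function name `pca_loadings_scores`, and the synthetic 15 x 9 matrix (an overall depth offset plus a top-to-bottom tilt) are assumptions for illustration and do not reproduce the measured data or coefficients.

```python
import numpy as np

def pca_loadings_scores(domains, n_components=2):
    """PCA on the correlation matrix of a (n_images, 9) domain-average matrix."""
    z = (domains - domains.mean(axis=0)) / domains.std(axis=0, ddof=1)
    corr = np.corrcoef(domains, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]     # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    signs = np.sign(eigvecs.sum(axis=0))                 # resolve sign ambiguity
    signs[signs == 0] = 1.0
    eigvecs = eigvecs * signs
    loadings = eigvecs * np.sqrt(eigvals)                # correlation with each domain
    scores = z @ eigvecs                                 # component score per image
    contribution = eigvals / corr.shape[0]               # share of total variance
    return loadings, scores, contribution

# Synthetic stand-in for 15 parallax distributions over 9 domains.
rng = np.random.default_rng(0)
offset = rng.normal(0.0, 10.0, size=(15, 1))             # whole image nearer or farther
tilt = rng.normal(0.0, 2.0, size=(15, 1))                # top vs bottom difference
row_sign = np.repeat([1.0, 0.0, -1.0], 3)                # domains 1-3, 4-6, 7-9
domains = offset + tilt * row_sign + rng.normal(0.0, 0.5, size=(15, 9))

loadings, scores, contribution = pca_loadings_scores(domains)
print(loadings.round(2))
print(contribution.round(3))
```

With data of this shape, the first component has loadings of the same sign in every domain (overall depth position) and the second contrasts the top and bottom rows, which mirrors the interpretation given in sections 3.2.1 and 3.2.2. The sign of the second component is arbitrary in this sketch.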
- 3.3 Distribution of parallax between domains (dispersion)
Some stereoscopic images have an extremely large depth range, from near the viewer to infinite distance, while others are very shallow, with a very narrow range of depth reproduction. We express this property as the inter-domain dispersion of the parallax data and include it in the analysis in the next section. The dispersion is given by the following equation.
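One plausible form, assuming the dispersion is the unweighted variance of the nine domain averages, is shown below. The symbols $p_i$ (average parallax in domain $i$), $\bar{p}$, and $D$ are introduced here for illustration; whether the domains are weighted by area is not stated in the text.

```latex
\bar{p} = \frac{1}{9}\sum_{i=1}^{9} p_i, \qquad
D = \frac{1}{9}\sum_{i=1}^{9} \left(p_i - \bar{p}\right)^{2}
```

Under this form, an image whose nine domains all lie at the same depth has $D = 0$, and $D$ grows as the domains spread from near the viewer to far behind the screen.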
- 4. RELATIONSHIP BETWEEN SUBJECTIVE VALUES AND PARALLAX DISTRIBUTION
Using multiple regression analysis, we examined how the factors obtained in the subjective evaluation experiment relate to three quantities obtained from the parallax distribution of each image: the primary principal component score, the secondary principal component score, and the inter-domain dispersion.
- 4.1 Factor 1 "sense of presence"
No clear relationship was found between the factor "sense of presence" and the primary principal component, secondary principal component, or parallax dispersion. The coefficient of determination of the multiple regression is 0.4, and the regression is not significant. This means that no relationship is recognized between the "sense of presence" of a stereoscopic image and the distribution of parallax.
- 4.2 Factor 2 "ease of viewing"
Clear relationships were found between the factor "ease of viewing" and the primary principal component, secondary principal component, and parallax dispersion. The coefficient of determination of the multiple regression is 0.85, and the regression is significant. This means that there is a significant relationship between the "ease of viewing" of a stereoscopic image and the distribution of parallax. The factor score for "ease of viewing" is obtained by the following regression formula.
Of the coefficients of this equation, those of the primary principal component score and secondary principal component score are positive, and that of parallax dispersion is negative. This means that an easy-to-view stereoscopic image scores high on both principal components and has a small dispersion of parallax. In terms of the absolute values of these coefficients, the secondary principal component and the parallax dispersion exert a greater influence than the primary principal component.
From these results, a stereoscopic image is easier to view if the parallax is distributed such that the bottom of the image projects forward from the screen and the top recedes toward the back. Ease of viewing also improves when there is less parallax irregularity between the domains. Stereoscopic images that are located toward the back are also easier to view than those in the front.
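The regression described in this section has three explanatory variables (primary score, secondary score, dispersion) and one response (the factor score). A minimal least-squares sketch in Python with NumPy is given below. The data are synthetic, and the generating coefficients (0.3, 0.6, -0.5) are hypothetical values chosen only to have the signs and relative magnitudes described in the text; they are not the coefficients of the regression formula obtained in the study.

```python
import numpy as np

def fit_regression(X, y):
    """Ordinary least squares with intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])        # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_res = ((y - fitted) ** 2).sum()               # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()             # total sum of squares
    return coef, 1.0 - ss_res / ss_tot

# Synthetic stand-in: 15 images, columns = primary score, secondary score, dispersion.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 3))
hypothetical_coef = np.array([0.3, 0.6, -0.5])       # illustrative only
y = X @ hypothetical_coef + rng.normal(scale=0.1, size=15)

coef, r2 = fit_regression(X, y)
print(coef.round(2), round(r2, 3))
```

Here `coef[0]` is the intercept and `coef[1:]` are the coefficients of the three explanatory variables; `r2` is the coefficient of determination that the text reports as 0.4 for "sense of presence" and 0.85 for "ease of viewing".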
- 5. CONCLUSION
- We examined 10 scenes including a standard chart with known shooting conditions and object positions, comparing them with corresponding two-dimensional images in terms of such factors as sense of presence and ease of viewing in a subjective evaluation experiment. The results show that stereoscopic images provide a greater sense of presence than 2-D images, but that their ease of viewing is largely dependent on the scene; in fact, some stereoscopic images were easier to view than their 2-D counterparts. Next, in order to extract characteristics from the scenes, we performed principal component analysis for the parallax distribution of each image. Lastly, we conducted a multiple regression analysis to clarify the correlation between the characteristics of parallax distribution and the sense of presence and the ease of viewing. The results show no clear relationship between the sense of presence and the parallax distribution, but a strong relationship was found between the distribution and the subjective ease of viewing. The obtained multiple regression formula can be used to create stereoscopic images that are easier to view.
However, the characteristics need to be extracted from the parallax distribution in greater detail, which requires a larger number of domains and studies of images with more complex distribution patterns of parallax. As for the range of parallax distribution that contributes to the ease of viewing, we need to understand the limits and tolerance range of binocular fusion. These studies will further clarify the conditions for creating stereoscopic images that are easy to view.
We would like to express special thanks to Y. Nojiri, S. Yano, I. Yuyama, and many colleagues for their assistance and many useful discussions in this study, as well as the people who cooperated on the evaluation tests.
1. I. Yuyama, M. Okui, "Stereoscopic HDTV," SPIE Three-Dimensional Video and Display: Devices and Systems, Vol. CR76, 2000.
2. N. Hiruma and T. Fukuda, "Accommodation Response to Binocular Stereoscopic TV Images and Their Viewing Conditions," SMPTE J., Vol. 102, pp. 1137-1144, 1993.
3. H. Yamanoue, M. Nagayama, M. Bitou, J. Tanada, T. Motoki, T. Mitsuhashi, M. Hatori, "Tolerance for Geometrical Distortion Between L/R images in 3D-HDTV," Systems and Computers in Japan, Vol. 29, No. 5, 1998.
4. B. Choquet, F. Chassaing, J. Fournier, D. Pele, A. Poussier, H. Sanson, "3D TV Studies at CCETT," Proc. TAO 1st Int. Symp., 1991.
5. Beldie, Kost, "Luminance asymmetry in stereo TV images," Proc. SPIE Stereoscopic Displays and Applications II, Vol. 1457, 1991.
6. S. Pastoor, "Human Factors of 3D Imaging: Results of Recent Research at Heinrich-Hertz-Institute Berlin," Proc. IDW'95, Vol. 3, pp. 69-72, 1995.
7. A. Hanazato, M. Okui, H. Yamanoue, I. Yuyama, "Evaluation of Cross Talk in Stereoscopic Display," Proc. 3D Image Conference '99, 10-3, pp. 258-263, 1999.
8. "Subjective assessment of stereoscopic television pictures," ITU-R Rec. BT.1438.
9. S. Ide, H. Yamanoue, M. Okui, I. Yuyama, M. Bitou, N. Terashima, "Subjective evaluation tests for sense of presence and ease of viewing in 3D-HDTV system," Proc. ITE Annual Convention, pp. 97-98, 1999.
10. H. Yamanoue, M. Emoto, M. Okui, S. Yano, and T. Yoshida, "Stereoscopic Test Materials," Proc. IDW'98, 3D3-2, pp. 815-818, 1998.
11. "Report on the verification test on multiview profile," ISO/IEC JTC1 SC29/WG11 N1373, 1996.
12. H. Yamanoue, M. Okui, F. Okano, and I. Yuyama, "Development of a Compact 3D HDTV Camera with Zoom Lens and Psychological Effect of the Images," Proc. IDW'99, 3D2-3, pp. 1071-1074, 1999.
13. H. Yamanoue, M. Nagayama, M. Bitou, J. Tanada, "Orthostereoscopic conditions for 3D HDTV," Proc. SPIE Stereoscopic Displays and Virtual Reality Systems V, Vol. 3295, pp. 111-120, 1998.
Hirokazu Yamanoue received his M.S. degree in 1987 and his Ph.D. degree in 2000 in electrical engineering from Waseda University in Japan. Since joining NHK Science and Technical Research Laboratories in 1989, he has engaged in research and development of three-dimensional television systems, including 3-D HDTV systems and program production.
Dr. Yamanoue is a member of the Institute of Image Information and Television Engineers of Japan (ITEJ), the Institute of Electronics, Information and Communication Engineers (IEICE), the Virtual Reality Society of Japan (VRSJ), and the Japanese Society for Medical and Biological Engineering (JSMBE).
Mr. Shinji Ide
Shinji Ide received the M.E. degree in electrical engineering in 1992 from Shibaura Institute of Technology. He joined Japan Broadcasting Corporation (NHK) in 1992. Since 1995, he has been with the Science and Technical Research Laboratories of Japan Broadcasting Corporation, where he is engaged in three dimensional audio-visual systems research.
Makoto Okui graduated from Tokyo Institute of Technology in 1980 with a M.E. degree in the area of digital signal processing. He has been working as a research engineer at NHK Science and Technical Research Laboratories, Tokyo since 1983.
From 1988 through 1995, he was involved in the development of various Enhanced Television systems, including an HDTV system for terrestrial broadcasting and the EDTV-II system.
He is now Senior Research Engineer of the Three Dimensional Audio-Visual Systems Research Division of the Laboratories, and his current research interests are stereoscopic HDTV systems, 3D display technologies, and related studies of human factors.
Fumio Okano received the B.S., M.S., and Ph.D. degrees in electrical engineering from Tohoku University, Sendai, Japan, in 1976, 1978, and 1996, respectively.
He joined the Japan Broadcasting Corporation (NHK), Tokyo, Japan in 1978. Since 1981, he has been with NHK Science and Technical Research Laboratories, and engaged in research on HDTV cameras, HDTV systems, television standards converters, and 3D television.
Dr. Okano is a member of the Optical Society of America (OSA), the Society of Photo-Optical Instrumentation Engineers (SPIE), and the Institute of Image Information and Television Engineers of Japan (ITE).
Mineo Bitou was born in 1934. He graduated from the School of Science & Engineering, Waseda University, in 1956. He joined Sony Corporation, where he worked mainly in the VTR marketing department after Sony developed its first VTR in 1963. He was seconded to Sony PCL Corporation in 1989 and was senior general manager of the HDTV programming technology division. In 1996 he engaged in research and development on 3D-HDTV programming technology at the Telecommunications Advancement Organization of Japan (TAO). He has been with the Global International Telecommunications Institute since 1998 and with Mediaglue Corporation since 2000, where he has engaged in the development of digital image programming. He is a member of ITEJ (the Institute of Image Information and Television Engineers of Japan).
Dr. Nobuyoshi Terashima is currently the Dean of the Graduate School of Global Information and Telecommunication Studies, Waseda University, Tokyo, Japan.
He is engaged in research on HyperReality (HR) and its applications.
HR is a paradigm for the 21st century and provides an environment where the real and the virtual can communicate with each other.
He is now collaborating with Victoria University of Wellington and the Queensland Open Learning Network, Australia, on HyperClass, which will be one of the potential applications of HR.
He has written a book on Intelligent Communication Systems, which was published by Academic Press, USA, in 2001.
It is a direct result of over a decade of his research and education.
He also edited a book on HyperReality: Paradigm for the Third Millennium, which was published by Routledge, UK, in 2001.
He is a senior member of IEEE and a member of many academic societies.