A Technique For Determining the Optimum Rating Scale of Opinion Measures

H. Earl Pemberton
Fellow in Sociology, University of Southern California


THE OBJECT of this study was to determine the optimum rating scale for a measure of group opinion. The opinion test for which a scale was desired is a series of twenty statements of possible legislative policy, for example: "The United States should adopt a general two per cent sales tax," and "A legal dismissal wage in industry is desirable in the United States." A rating scale was to be used by which those taking the test were to indicate the degree to which they favored or opposed each proposal. To clarify the problem we present a sample scale:

[Figure 1: sample rating scale, ranging from +4 to -4]

Our problem was to determine how coarse or how fine a scale would give the highest reliability to the measure; that is, should the scale range from +4 to -4 as does the sample above, or should it be finer, ranging from +5 to -5, or coarser, ranging from +3 to -3 or from +2 to -2?

Previous studies in the measurement of opinion by the use of a scale have used scales ranging from two points to as high as five points on each side of a neutral position.[1] None of these studies appears to consider the relative desirability of scales of different sizes.

A coarse scale is generally more readily used than a fine scale. Hence, the coarse scale was regarded as more desirable if its use did not result in too great a loss of reliability. Our problem was then to determine how coarse a scale we could use without lowering reliability beyond an arbitrary limit. The limit which we adopted was as follows: the loss in reliability permitted in order to make rating easier is the loss equivalent to a drop from .91 to .90.[2]
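Note 2 ties this limit to the coefficient of alienation. The following is a worked check of that equivalence, assuming the usual definition $k = \sqrt{1 - r^2}$, which the text does not spell out. The symbols $k$, $\Delta k$, and $r_{\min}$ are introduced here for illustration only.

```latex
\begin{align*}
k(r) &= \sqrt{1 - r^{2}} \\
\Delta k &= k(.90) - k(.91)
          = \sqrt{1 - .8100} - \sqrt{1 - .8281}
          \approx .4359 - .4146 = .0213 \\
r_{\min}(r) &= \sqrt{1 - \bigl(k(r) + \Delta k\bigr)^{2}}
\end{align*}
```

Under this reading, a drop from $r$ to $r_{\min}(r)$ costs the same change in the coefficient of alienation as a drop from .91 to .90. For example, $r = .816$ gives $r_{\min} \approx .80$, which agrees with the pairs quoted in note 2.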

Our next problem was to devise a means for determining the reliability of the test. Four sets of the test of twenty items were made, each with a different scale. The scales were from +2 to -2, +3 to -3, +4 to -4, and +5 to -5. These tests were given to 450 students in classes at the University of Southern California. Each class was divided into four equal parts and each fourth was given one of the scales. Two days after he had first taken the test, each student was given the test again with the same scale which he had used previously. When the tests were given for the second time, the purpose of the experiment was explained to the students. Instructions were given that these second tests were not to be answered by attempted recall of what was answered the previous time. While there was, no doubt, considerable retention of what had been answered the first time, this factor was regarded as constant for each of the four scales.

We then correlated the first answers of each student to items 1, 5, 10, 15, and 20 with that student's answers the second time he took the test. All answers on the same scale were used in one correlation. Four correlations for reliability were thus obtained. The results were as follows:
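The pooled test-retest correlation described above can be sketched as follows. This is a minimal illustration, not the author's computation: the paper does not name the correlation formula, so the Pearson product-moment coefficient and the classical probable-error formula, 0.6745(1 - r^2)/sqrt(N), are assumptions, and the student answers below are invented purely to make the example run.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probable_error(r, n_pairs):
    """Classical probable error of r (assumed formula, not stated in the paper)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n_pairs)

# Items whose answers were correlated, as listed in the text.
ITEMS = (1, 5, 10, 15, 20)

# HYPOTHETICAL answers on a +3 to -3 scale: one dict per student, {item: rating}.
first_sitting = [
    {1: 3, 5: -2, 10: 0, 15: 1, 20: -3},
    {1: 2, 5: -1, 10: 1, 15: 0, 20: -2},
    {1: -1, 5: 3, 10: 2, 15: -3, 20: 0},
]
second_sitting = [
    {1: 2, 5: -2, 10: 1, 15: 1, 20: -3},
    {1: 3, 5: 0, 10: 1, 15: -1, 20: -2},
    {1: -1, 5: 2, 10: 3, 15: -2, 20: 1},
]

def pool(sittings):
    """Flatten all students' answers on the selected items into one list."""
    return [student[item] for student in sittings for item in ITEMS]

x = pool(first_sitting)   # first answers, all students on this scale
y = pool(second_sitting)  # second answers, in the same student and item order
r = pearson_r(x, y)
print(f"r = {r:.3f}, probable error = {probable_error(r, len(x)):.3f}")
```

If each scale was taken by roughly 450/4, or about 112, students (an inference; the paper does not give per-scale counts) and five items per student were pooled, N would be near 560, and `probable_error(0.82, 560)` comes to roughly .009, which is in line with the .01 attached to each coefficient in the results.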

Scale        Reliability
+2 to -2     .75  ± .01
+3 to -3     .82  ± .01
+4 to -4     .791 ± .01
+5 to -5     .796 ± .01

According to the criteria adopted above, the scale from +3 to -3 is most desirable for this test. This scale has a higher reliability than either of the finer scales. Since coarseness is more desirable than fineness, these finer scales are discarded. The scale from +2 to -2 has a reliability of .75 ± .01. This falls .07 below the reliability of the +3 to -3 scale. This drop is beyond the limit decided on as permissible for the mere purpose of making the rating easier by making the scale coarser. For a test with a reliability of .82, this limit is about .805.
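The selection argument above can be traced numerically. The sketch below is one reading of the rule, not the author's own procedure: it assumes the coefficient of alienation is sqrt(1 - r^2), takes the tolerance from the .91 to .90 drop described in note 2, and then picks the coarsest scale whose reliability stays at or above the resulting floor. The function names are hypothetical.

```python
import math

def alienation(r):
    """Coefficient of alienation, assumed here to be sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)

# Tolerated change in alienation: the change produced by a drop from .91 to .90.
TOLERANCE = alienation(0.90) - alienation(0.91)  # about .0213

def permissible_floor(r):
    """Lowest reliability whose alienation exceeds that of r by TOLERANCE."""
    k = alienation(r) + TOLERANCE
    # Clamp at zero: for very low r the tolerated drop reaches r = 0.
    return math.sqrt(max(0.0, 1.0 - k * k))

# Reliabilities reported in the text, keyed by the scale's half-range
# (2 means +2 to -2, and so on).
reliabilities = {2: 0.75, 3: 0.82, 4: 0.791, 5: 0.796}

def choose_scale(rel):
    """Coarsest scale whose reliability is within the tolerated loss of the best."""
    best = max(rel, key=rel.get)
    floor = permissible_floor(rel[best])
    return min(points for points, r in rel.items() if r >= floor)

floor_for_best = permissible_floor(0.82)
print(f"tolerance in alienation: {TOLERANCE:.4f}")
print(f"floor for r = .82: {floor_for_best:.3f}")
print(f"chosen scale: +{choose_scale(reliabilities)} to -{choose_scale(reliabilities)}")
```

With the reported figures, the floor for a reliability of .82 lands near .805, so the +2 to -2 scale at .75 falls below it while the two finer scales offer no gain over +3 to -3.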

While a scale from +3 to -3 is most desirable for this particular measure of opinion, there is no assurance that it would be so for other tests. It appears probable that each test using such a scale plan should be tested by some such method as the above to determine the most desirable scale.

Notes

  1. See, for example: Floyd Allport, "The Measurement and Motivation of Atypical Opinion in a Certain Group," American Political Science Review (November, 1925); Harold S. Carlson, Information and Certainty in Political Opinions: A Study of University Students During a Campaign, "University of Iowa Studies in Character" (Iowa City, 1931), IV, 1; George Bradford Neumann, "A Study of International Attitudes of High School Students," Teachers College Contributions to Education (New York: Columbia University, 1926), p. 239; G. B. Vetter, "The Measurement of Social and Political Attitudes and the Related Personality Factors," Journal of Abnormal and Social Psychology, XXV (1930), pp. 149-89.
  2. This arbitrary limit was adapted from Percival M. Symonds. See his "On the Loss of Reliability in Rating Due to Coarseness of the Scale," Journal of Experimental Psychology, 7: 456-460 (1924), p. 457: "One will tolerate more of a loss in reliability when the reliability is low to start with than when the reliability is high. It is very much easier to raise a reliability of .30 to .31 than of .90 to .91. Every gain in reliability above .90 is very valuable. The real criterion for this is the coefficient of alienation which indicates improvement in estimate over a random estimate. This improvement is rapid as correlation coefficients go above .90. Let us assume that for the purpose of rating we are willing to lose a reliability of not more than .01 when the true reliability is .91. This arbitrary unit is reasonable, for generally we are satisfied for ordinary purposes with reliabilities over .90. This corresponds to a change of coefficient of alienation of .0213. Hence a drop from .91 to .90 is equivalent to a drop in reliability from .205 to .00, .228 to .10, .285 to .20, .361 to .30, .446 to .40, .535 to .50, .627 to .60, .721 to .70, .816 to .80, .91 to .90, .9567 to .95."
