An Experimental Comparison of Statistical and Case History Methods of Attitude Research

V: Implication of this study for research on the theory of attitudes

Samuel Stouffer

The main purpose of this investigation was to determine to what extent there was agreement between findings as to attitudes by a statistical and a case history method. This purpose has been served by the report of a correlation of +.86 and by the critical consideration, using quantitative checks whenever possible, of a score or more of factors which, if improperly controlled, might have tended to make the correlation spurious. The writer's conclusion is that, as far as the present investigation shows, the test scores or the case history ratings could be used as indices of what they both purport to measure, without any important differences in results if either set of indices was to be studied in its relationship to other variables. The static frequency distributions of test scores and case history ratings differ somewhat, in that the latter tend to have a larger representation at the extremes; but this difference is of minor importance in comparison with the degree of correlation between the two sets of indices or in comparison with the fact that both sets yield about the same correlation when related to other variables.

Even if there should prove to be little or nothing in this particular investigation to fortify directly some logician’s theory of attitudes (or whatever synonym for attitudes he chooses to use), it is thought that some service has been done to attitude theory in showing to what extent statistical

( 50) and case history methods of attitude research agreed in their findings, and, as a by-product, in testing the reliability of each method apart from its validity. Many further experiments will be necessary, of course, to determine under what conditions and how far the values of the validity and reliability coefficients will vary. Particular attention is called, for example, to the fact that the wide dispersion of attitude scores in the present study tended to produce higher coefficients than one would expect ordinarily if the dispersion were less.
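The dependence of a coefficient on dispersion can be illustrated with a standard textbook formula for restriction of range. This is a minimal sketch and not a computation from the study itself: it assumes a linear relationship, equal scatter about the regression line at all score levels, and a narrowing that acts directly on one of the two variables. The function name and the ratio of 0.5 used in the example are hypothetical.

```python
from math import sqrt


def r_under_reduced_dispersion(r_wide: float, sd_ratio: float) -> float:
    """Correlation expected in a narrower group, given the wide-group value.

    r_wide   -- correlation observed where dispersion is wide (e.g. +.86)
    sd_ratio -- narrower standard deviation divided by the wider one
                (a hypothetical input; the study reports no such ratio)

    Assumes linearity, equal scatter about the regression line, and
    narrowing that acts directly on one of the two variables.
    """
    if not -1.0 <= r_wide <= 1.0:
        raise ValueError("r_wide must lie between -1 and +1")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    return (r_wide * sd_ratio) / sqrt(1.0 - r_wide**2 + (r_wide * sd_ratio) ** 2)


if __name__ == "__main__":
    # Halving the spread of scores is an illustrative choice only.
    print(round(r_under_reduced_dispersion(0.86, 0.5), 3))
```

Under these assumptions a coefficient of +.86 found in a widely dispersed group would correspond to a noticeably lower coefficient in a group with half the spread, which is the direction of effect the paragraph above warns about.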

If further investigations tend to confirm the essential parts of the present results, the value for attitude research should be rather far-reaching. We may consider this from two standpoints, both of which already have been referred to here and there in preceding pages:

(1) The fact that a simple test, which can be taken in 10 or 15 minutes and scored rapidly and objectively, is a fairly valid measure of attitudes would make it possible to study cheaply in a single investigation the relationship between the attitudes of several thousand subjects and a number of background factors. Take prohibition, for example. To what extent is this attitude associated with factors in the childhood environment? By dividing and successively sub-dividing our several thousand subjects according to various background factors which presumably could be determined with rough but sufficient accuracy from a direct questionnaire, one could evaluate the relative association between each of these factors

( 51) and the present attitude score. The use of a large number of subjects permits the introduction of experimental controls which may be superior to the mathematical controls of partial and multiple correlation and which, indeed, may be the only possible controls if the background categories are broad and qualitative. Moreover, by giving a retest with a parallel form of the test one could measure the relationship between the various factors and the shift over a period of time, thus getting clues, of importance to the theory of attitudes, as to the conditions under which an attitude changes, or, if one chooses to say it in another way, the conditions under which one attitude becomes replaced by another. The situation could be further controlled, as already has been attempted in some studies, by introducing in the interim a particular stimulus and then measuring its varying effect on the shift of attitudes of people with different backgrounds. However, the value of these studies depends in part on the validity as well as the reliability of the instruments of measurement. To the extent that the present investigation has demonstrated the validity of a testing technique of attitude research, to this extent it will lend additional confidence to the application of such a technique to more extensive and ambitious attitude research than even yet has been attempted.

(2) The fact that, contrary to the expectation of some students of attitudes, the present study has shown that it is possible to get fairly high agreement, even among laymen with extremely diverse viewpoints, in interpretation of attitudes

( 52) from case history materials should lend encouragement to those who see in the case history a useful tool of attitude research. The case history provides a sequence of events in their cultural setting. A classification of these sequences into types, even if the research is based on a smaller number of cases than would be possible through statistical inquiry, should be a fruitful approach to the interpretation of processes of attitude formation. In particular, it should give clues to connecting links which escape the statistical worker who is limited to a consideration of the relationships expressed in abstract mathematical symbols. The present investigation shows that a fairly high agreement was reached by judges, even though the case histories were shorter and scantier of detail than they might be if obtained for a special study of the sequence of events in attitude formation. One particular caution should be added, besides those advanced in the section on reliability of the case histories. The present investigation measured the agreement of judges in making a single inference from a case history. As high an agreement might not be expected if the inference involved more elaborate conceptualization, such as lifting from the data a schematic picture of causal sequences.

We have then, as the fruit of the present study, a validity coefficient of +.86, fortified by an extensive series of checks using quantitative methods wherever practicable. If such validity is confirmed by subsequent investigation, it would tend to lend confidence to attitude research using either

( 53) of the statistical or case study methods examined here.

A further question may properly arise. In addition to the indirect contribution to attitude theory by way of fortifying confidence in two methods of attitude research, has this study any direct contribution to make to attitude theory? If it had, the contribution necessarily would be incidental to the main study as set up, yet might or might not be less important.

It would seem that if the findings of this study had any direct contribution to make to attitude theory, it would be in connection with the relation of verbal expressions to attitudes.

Professor Thurstone, in his discussions of attitude testing, has been explicit in disclaiming that the tests which he and his students have devised will measure every aspect of an attitude. The tests purport to use only one kind of indices, out of several possible kinds. They use the statements which a subject endorses.[1] They do not use, directly at least,

( 54) as indices overt non-verbal acts, for example. Professor Thurstone recognizes that a continuum of attitudes indicated by overt non-verbal acts might differ somewhat from a continuum of attitudes as indicated by endorsements of statements. These in turn might differ somewhat from a continuum of attitudes as indicated by feelings revealed subtly and indirectly by the tone in which statements are endorsed or by the spirit in which overt non-verbal acts are carried on. Professor Thurstone defines attitude as the "sum total of a man's inclinations

( 55) and feelings, prejudice or bias, preconceived notions, ideas, fears, threats, and convictions about any subject," and assumes that that portion of the "sum total" which is tapped by the verbal indices represents a large and important enough portion to justify saying that the test measures attitudes in somewhat the same sense as a yardstick measures a table if it measures the length though ignoring the volume or weight.

The present investigation confirms the findings of all previous investigations using the Thurstone method, as to the reliability of the test. That is, there is no doubt that the central tendency of scaled statements endorsed in a test situation measures something reliably. But, in addition, the present investigation has found that this something is very nearly the same thing as attitudes inferred from the case history by either trained or lay judges. The indirect implications of this finding for attitude research in the future already have been discussed at some length. Would it be possible, on the basis of data thrown up incidentally in this investigation, to make any immediate generalizations about the qualitative nature of the relationship between (a) the central tendency of scaled statements endorsed in a test situation and (b) attitudes? The writer fears not. Nevertheless, some considerations may be set forth which not only may serve to indicate the difficulties in the way of making such immediate generalizations but also may aid some future investigators to make experiments which will clear away some of the difficulties.
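Why the central tendency of many endorsed statements can be reliable while any single statement is not (the point taken up in note 1) can be sketched with a small simulation. This is an illustration under simplifying assumptions, not a reconstruction of the Smith test: every number below is hypothetical, and the errors are treated as independent and symmetric, which note 1 itself says is not strictly true of the actual errors.

```python
import random
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_parallel_forms(n_subjects=500, n_statements=10,
                            true_sd=2.0, error_sd=2.0, seed=0):
    """Return (r between single statements, r between form means).

    All parameters are hypothetical. Each endorsed statement is modeled
    as the subject's true position plus an independent error.
    """
    rng = random.Random(seed)
    single_a, single_b, mean_a, mean_b = [], [], [], []
    for _ in range(n_subjects):
        true_score = rng.gauss(5.0, true_sd)
        form_a = [true_score + rng.gauss(0.0, error_sd) for _ in range(n_statements)]
        form_b = [true_score + rng.gauss(0.0, error_sd) for _ in range(n_statements)]
        # One statement from each form, compared with the mean of each form.
        single_a.append(form_a[0])
        single_b.append(form_b[0])
        mean_a.append(sum(form_a) / n_statements)
        mean_b.append(sum(form_b) / n_statements)
    return pearson(single_a, single_b), pearson(mean_a, mean_b)


if __name__ == "__main__":
    r_single, r_mean = simulate_parallel_forms()
    print(f"single statements: {r_single:.2f}; form means: {r_mean:.2f}")
```

With errors that pull scores both above and below the true position, the agreement between the two form means comes out far higher than the agreement between two individual statements, because averaging cancels much of the error.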

( 56)

A consideration of the relationship between (a) and (b) requires first some definition of (b), namely, of attitudes. The writer's personal definition, if he had one, would be irrelevant here. What we would like to know is the definition employed by the judges, both trained scholars and laymen, whose inferences as to attitudes showed such high agreement. The judges were not handed any formal definition of attitudes for use in interpretation. Nor were the judges asked to write out a definition of their own. In neither case, incidentally, would one have assurance that the definition given was the one actually employed in making the inferences as to attitudes. It is quite conceivable that a judge might have a highly technical conception of attitudes and more or less unwittingly drop it in favor of a crude but simple common-sense conception in making his rating.[2] From what the writer gathered in talking with the judges after they made their ratings, they made little attempt to formulate a rigorous and technical definition. Apparently, they followed more or less the common-sense conception of attitude held by the layman, just as they would in a conversation with a friend with whom they discussed so-and-so's attitude toward prohibition. The concept of attitude, like the concept of love, or of religion, presumably varies even in its common-sense usage. The reliability coefficient

( 57) of +.96 would seem to indicate that, if each judge used a common-sense concept, the concepts used clustered rather closely around the modal group of meanings usually implicit in the word in lay conversation. What that modal group of meanings is the writer cannot attempt to say, for in formulating it he likely would fall into the trap of a logical schematization as abstract from actual usage as schematizations of love or religion. The reader understands roughly what a person means when he remarks, "So-and-so has a friendly attitude," and probably will arrive better at what the judges meant if he uses his intuition than if he translates somebody else's conceptualization.[3] But, obviously, this leaves one on uncertain footing for a comparison of (a) with (b) in the hope of deriving some immediate generalization, useful to attitude theory, with respect to the qualitative nature of the high relationship found between (a) and (b).

Thus crippled, one may nevertheless proceed somewhat farther. In the case histories, the judges had before them not only expressions of opinion but a summary of a life-time of experiences in their cultural setting, suffused with an emotional shading which cast its color over the whole picture and gave to opinions or events a meaning which often would not be discernible if they were interpreted apart from their context. Here were indices of attitude (however the judges may

( 58) have defined attitude) which were not present in the Smith test. Yet the correlation between a measure from the presumably single kind of indices of attitude in the Smith test and a measure from the variety of indices of attitude in the case histories was +.86. What does this indicate as to the qualitative relationship between the two measures? Can one infer that +.86 represents solely the association between verbal expressions of opinion, as found in the two procedures, and that the difference between +.86 and unity represents that aspect of attitude, untapped by the test, which is tapped by the other indices available to the judges of the case histories?[4] If we could say this with any confidence, this would seem to be an important fact for attitude theory, even if such a quantitative formulation left the structure and function of the attitude without further qualitative explanation.
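The arithmetic behind this question, including the refinement suggested in note 4, can be written out briefly. The sketch below only restates the figures given in the text; the function name is hypothetical, and reading the square of the coefficient as a "share of variance in common" is the conventional interpretation, which presupposes a linear relationship.

```python
def shared_variance(r: float) -> float:
    """Square of a correlation coefficient.

    Conventionally read as the proportion of variance the two measures
    have in common, assuming a linear relationship.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * r


if __name__ == "__main__":
    r = 0.86  # validity coefficient reported in the study
    print(f"r           = {r:.2f}")
    print(f"1 - r       = {1 - r:.2f}")
    print(f"r squared   = {shared_variance(r):.4f}")
    print(f"1 - r^2     = {1 - shared_variance(r):.4f}")
```

On this reading the quantity left to be accounted for is the difference between the squared coefficient (about +.74) and unity, which is larger than the difference between +.86 and unity.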

Unfortunately, there are a number of difficulties in the path of an interpretation.

In the first place, a part of the lack of correlation between the test scores and case history ratings can be explained, not on the ground of presence in the case histories of accurate groups of indices not possessed by the test, but on the ground of errors in both test and case histories which were not chance errors or which did not behave as chance errors in such a way that they tended to cancel one another out. The

( 59) subject may have consistently faked on the test, or consistently have had an antagonistic attitude toward taking the test, with the result that his mean score, though reliable in the sense that scores on the two parallel forms of the test agreed, would not reflect the same attitude as would the case history if he was honest and cooperative in writing the latter. Similarly, a subject who was conscientious and cooperative in taking the test may have misled all the judges alike in their interpretation of his case history, either by clumsy language, carelessness, omissions, overemphasis of certain details, or deliberate falsification. In either case the validity coefficient might be lowered. Indeed, it would seem that some margin of error of this type, even in instances where the subject was conscientious and frank, is to be expected.[5]

In the second place, part of the lack of perfect correlation may be explained on the ground that the attitude of the student, in a case where it was at all volatile, may have shifted somewhat between the time when he filled out the test and copied the final draft of his case history. The very process of writing the case history might have produced perceptible changes in attitude which would be reflected in the completed copy of the paper, but not in the test, because the test was given first. This might have been offset, in part, by a strain for consistency, the subject attempting to duplicate on the case history the same relative attitude score which he

( 60) thought he might have made on the test. [6]

In the third place, an explanation of the relationship between the two measures of attitudes which assumes that the +.86 represents only the relationship between two sets of "opinions" and that the difference between +.86 and unity (or +.74 and unity) represents the weight of other indices of attitude which are neglected by the test but considered by the judges of the case histories, involves a possibly erroneous assumption about the test. This assumption is that the test fails to take into account non-verbal experience and behavior in producing its attitude score. It will be sufficient to suggest here that the scale value of an opinion was the median value determined from ratings of 300 judges, who presumably judged by considering, "What would be the attitude of a person who endorsed an opinion like this?"[7] Now one of these 300 judges may have thought that a given opinion would be endorsed, on the average, by acquaintances whose attitudes, inferred from a complex of overt acts and of feelings, might be considered decidedly opposed to prohibition. Another judge might

( 61) have thought that this opinion would be endorsed, on the average, by acquaintances whose attitudes, inferred from a complex of overt acts and of feelings, might be considered friendly to prohibition. If such a difference in inferences was common among the 300 judges, the opinion was thrown out by an objective criterion of ambiguity. (See above, p. 14.) The statements selected were those in which the 300 judges tended to agree quite closely in their inferences as to the attitudes of which the statements were to be indices. To schematise the quick, more or less intuitive acts of the 300 judges in their examination of an individual opinion is, needless to say, a doubtful procedure. It would be easy for the writer to generalize too far from his own personal experience in serving as one of several hundred judges in constructing a Thurstone scale. But, putting it conservatively, it seems rather unlikely that a judge charged with the task of inferring an attitude could avoid (even if he tried to avoid it) imagining some non-verbal as well as verbal acts of himself and acquaintances, in allocating a given opinion to an attitude scale.[8]


In the fourth place, even if the preceding three points all could be waived, it may be that the statements endorsed on the Smith test would not be found to correlate perfectly with the opinions expressed in the life histories. On the one hand, correlation might be reduced by the fact that the statements on the test were endorsed in an artificial situation different from the artificial situation in which the opinions in the case history were expressed.[9] The situations in which a statement is endorsed on a test or expressed in a life history are social situations. The subject's endorsement is an act. When he writes he is talking to somebody, even though the situation is rather highly abstract. The somebody may be himself, other individuals, or a generalized other. His responses may be representative of the kind of responses made in only a part of the social situations in real life, or in none of them.

Some differences in expression of opinion through the two media

( 63) might appear if the imaginary interlocutor were different in each case or played a different role. On the other hand, there may be a qualitative difference between the respective sets of verbal expressions. It has been suggested, for example, that there is a considerable difference between a statement which is an article of faith and a statement which voices a more or less transient whim. If it should be that statements on the Smith test tended to be more or less of the latter sort and that the statements of opinion made in the case histories tended to be of the former sort (or vice versa), and if it should be that these two types did not correlate very highly, one would have to draw all the more heavily on the third point considered above to account for the relationship found between the test scores and judges' ratings.

Finally, one may add that, even if it were possible in the present investigation to control all the factors discussed and evaluate precisely the extent to which the difference between the validity coefficient found and unity could be accounted for in terms of the failure of the test indices to tap that which was tapped by indices which were the peculiar contribution of the case histories, one would need to be hesitant in making use of this in a general theory of the relation of opinions to attitudes. For the fact that opinions as measured by the test and opinions as measured by the case history were the only or main source of correlation might be due to excessive weight given to the opinions, in preference

( 64) to other less tangible indices, by the judges who read the case histories.

To repeat what was emphasized at the beginning of this section, the important contribution of this investigation, if it has the importance which the writer and his advisers hope for it, is primarily in showing that, under the experimental conditions here set up, it probably would make little difference in the results of actual research whether other variables were used in conjunction with the test scores as indices of attitudes or whether they were used in conjunction with the case history ratings as indices of attitudes. This rests on experimental evidence. A serious effort to evaluate the exact extent to which indices of attitude which are neglected by the test but considered in the life history are factors in keeping the validity coefficient from rising above +.86 must wait on the development of tools of greater precision than are now available. The validity coefficient of +.86 stands as one of the highest validity coefficients ever found, and, if confirmed by further investigations, will speak for itself in justification of the use of either the Thurstone method or the case history method[10] as a device for measuring attitudes, if attitudes are to be defined in a way which does not differ far from the modal common-sense usage of the term.


  1. Quite apart from the validity of the Smith scale, it is important to keep in mind the high reliability. If any single statement endorsed by a given person is considered, it is evident at once that it is likely to differ rather widely from the scale values of some other statements endorsed. How does one reconcile the high agreement (r = +.94) of mean scale values on two parallel forms of the test with the unreliability of statements considered individually? An explanation may be found in considering the type of "errors" present with respect to individual statements. (1) There are chance errors. (2) There are errors due to the variability of the scale value itself. (3) There are errors due to carelessness, the subject misreading a statement just as he might express a careless or unintentionally atypical remark in conversation. (4) There are errors due to the fact that each individual statement introduces other objects in addition to prohibition. Take, for example, the statement, "Prohibition is needed to conserve the family." One's response to this statement is influenced not only by his attitude toward prohibition but also by his attitude toward conservation of the family. But this statement has exactly the same scale value (2.5) as the statement, "Prohibition has been the means of eliminating a great economic waste of production and distribution of a useless commodity." One's response to the latter statement is influenced not only by his attitude toward prohibition but also by his attitude toward economic waste, toward the economic utility of alcohol for other than beverage purposes, and so forth. Therefore, it hardly could be expected that every person endorsing the one statement would endorse the other also.
(The extent to which individual "parallel" opinions are endorsed by the same individual is one of the important factors in determining the reliability of the test by the more refined measure of internal consistency reported by Professor Thurstone in the Psychological Review, op. cit., May, 1929.) The explanation which presents itself of the high reliability of the mean score of a person's opinions, each of which, considered individually, is subject to a wide variety of errors, is found in the statistical theory of errors. Most of the errors above suggested are not chance errors. Nevertheless, if they are so distributed that some tend to pull the subject's score below his true score and some tend to pull his score above, the mean may become an accurate and fairly stable index of his verbal expressions on the subject of prohibition, in the test situation.
  2. An investigator interested particularly in this point could set up an experiment using a rather large number of judges and study minutely the variability in ratings with reference to certain criteria in the formal definitions used by the judges.
  3. The above has been read and approved by each of the four graduate students who served as judges.
  4. In this case, it would be more precise, perhaps, to speak of the difference between the square of the correlation coefficient (+.74) and unity.
  5. These factors are discussed in detail in Sections II and III.
  6. These factors also have been discussed in detail in the section on the reliability of the case history ratings.
  7. The judgment may not have been made wittingly in this way by all or even any of the judges, but this is the imaginative process which Professor Thurstone usually asks the judges to go through. Dr. Smith did not suggest this as specifically as has been done in the construction of some subsequent scales. Even if the judge did not deliberately say to himself "Go to, I will now imagine what the attitude of a person would be if he endorsed an opinion like this," it is rather difficult to see how such an imaginative process could have been avoided in making his decisions.
  8. It may be pointed out that a judge's rating of attitude from the case history in the present investigation is also an inference. Given a description of experiences, the judge infers what would be the attitude of a person who would express the opinions, report the feelings, and engage in the overt acts here described. Back of the judge's inference are his own experiences and his imagination of the experiences of acquaintances in whose behavior he has seen the outward symbols of an attitude. Moreover, there are conflicting symbols in almost every document. The judge must weigh them and strike some sort of informal average which seems to his sympathetic introspection accurate. But one judge may give more weight to a given reported event than would another judge. Their informal averages will differ. By taking the mean of four of these informal averages one cancels out, to a certain extent, individual judges' idiosyncrasies, and arrives at a value which has a fairly satisfactory degree of objectivity, as shown by the high reliability coefficients. It may be suggested, then, that one reason for the high validity coefficient found is that both methods are processes of averaging inferences as to attitudes, taking into account to some extent non-verbal as well as verbal acts.
  9. Professor Faris calls particular attention to the importance of considering the situation in which an opinion is expressed. See Faris, op. cit., pp. 279-280.
  10. This is not to say that measurement is the sole, or even the most important, aim of the case history method. As Cooley has said, "The phenomena of life are often better distinguished by pattern than by quantity. Measurement is only one kind of precision. What could be more precise, as a record of human behavior, than a motion picture? Yet it is not quantitative. Its precision is total, not incremental, a matter of patterns rather than of minute differences in space." -- Charles H. Cooley, "Case Study of Small Institutions as a Method of Research," Publications of the American Sociological Society, XXII, 124-25.

