On Mindful Measurement, from a Distanced Perspective
“Mindful measurement” is a great name for a blog. It certainly sounds like a good thing, but the phrase is short enough that it can mean different things to different people. I’ve enjoyed reading the entries that have been posted so far, learned from each one, and resonated with the writers’ unique insights and experiences. Here’s mine.
The title has two meanings. First, I’m looking back across a distance of nearly half a century in educational measurement. A cautionary note is therefore in order. I once saw a poster in a realtor’s office saying “Real estate values have gone up every year for 25 years in a row. That means this is a great time to buy real estate.” Not really. It means 25 years ago was a great time to buy real estate. All I claim for what follows is that it has brought me occasional moments of mindfulness amid the deadlines and crashed drives.
Second, the perspective that has made for mindful moments comes from a handful of general ways of thinking about educational measurement that help me make sense of the myriad details, uniquenesses, and peculiarities of the rapidly increasing variety of projects and problems we encounter. I can share four quotations about what is important to think about in assessment, and three frames (epistemic frames, to be pedantic, or paradigms, if you prefer) that provide tools for thinking about them. They help me see each new project or problem from a broader perspective, as a unique instance of foundational concepts in educational measurement. The exercise affords a bit of mindfulness as well as helping get the job done.
Four Quotations
The first two quotations are from Educational Measurement (3rd ed.) (Linn, 1989). They signaled a growing awareness in the field of new ways of thinking about assessment and measurement. The third and fourth are both from the same article by Sam Messick just a few years later, which took a major step forward in how to think about assessment applications in these terms.
Quotation 1: Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Messick, 1989, p. 13)
Quotation 2: Summary test scores, and factors based on them, have often been thought of as “signs” indicating the presence of underlying, latent traits. . . . An alternative interpretation of test scores as samples of cognitive processes and contents, and of correlations as indicating the similarity or overlap of this sampling, is equally justifiable and could be theoretically more useful. The evidence from cognitive psychology suggests that test performances are comprised of complex assemblies of component information-processing actions that are adapted to task requirements during performance. The implication is that sign-trait interpretations of test scores and their intercorrelations are superficial summaries at best. At worst, they have misled scientists, and the public, into thinking of fundamental, fixed entities, measured in amounts. Whatever their practical value as summaries, for selection, classification, certification, or program evaluation, the cognitive psychological view is that such interpretations no longer suffice as scientific explanations of aptitude and achievement constructs. (Snow & Lohman, 1989, p. 317)
I’ll say more later about what I’ve come to think of this last sentence.
Quotation 3: A construct-centered approach would begin by asking what complex of knowledge, skills, or other attribute should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society. Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors? Thus, the nature of the construct guides the selection or construction of relevant tasks as well as the rational development of construct-based scoring criteria and rubrics. (Messick, 1994, p. 16)
Quotation 4: [V]alidity, reliability, comparability, and fairness are not just measurement issues, but social values that have meaning and force outside of measurement wherever evaluative judgments and decisions are made. (Messick, 1994, p. 13; emphasis original)
Three Frames
All well and good; these quotations are thought-provoking, and they prompt us to think about the principles that underlie familiar practices in educational assessment and measurement. But they are all rather abstract. Like other educational measurement researchers at that time, I was moved to ponder just what those principles might be and what they mean for practice: for designing assessments, for interpreting performances, and for using assessments across the many purposes and contexts in which they appear. Three frames from outside educational measurement per se have been invaluable to me for recognizing the principles that underlie educational assessment and measurement, both to help use them better and to see how they might extend to new developments in psychology, technology, and social contexts. They are evidentiary argument, a sociocognitive psychological perspective, and a subjective Bayesian approach to modeling.
Evidentiary Argument
I encountered the evidentiary argument frame in the writings of David Schum. His decades-long quest was to discover whether there might exist “a science of evidence” underlying reasoning in the presence of uncertainty, across varied forms and sources of evidence, disparate inferences and purposes, and disciplines spanning philosophy, logic, probability, statistics, history, medicine, and psychology. The answer, he concluded, is yes. I first found Evidence and inference for the intelligence analyst (Schum, 1987); it is somewhat less comprehensive than his later The evidential foundations of probabilistic reasoning (Schum, 1994), but a more engaging read. Evidential foundations is a textbook, which students are assigned to read; Evidence and inference is from a course he taught to working analysts at the CIA, who would read it only if they wanted to. Seeing what we do in educational measurement as instances of these same foundations proves most illuminating: for better understanding what has been accomplished over the years, what we do in each application, and how we can move forward as psychology and technology advance. Toulmin diagrams and Wigmore evidence charts, for example, are splendid tools for working through the intricacies of assessment design (Q3) and measurement issues cum social values (Q1 and Q4), as sketched below. There is more going on than meets the eye.
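To make the structure concrete, here is a minimal sketch in Python, entirely my own illustration rather than anything from Schum or Toulmin, of how the elements of a Toulmin argument might be recorded for a single assessment claim. The claim, data, warrant, backing, and rival explanations are all hypothetical.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ToulminArgument:
        # Elements of a Toulmin-style argument for one assessment claim.
        claim: str                # the inference we want to draw
        data: List[str]           # the evidence we observe
        warrant: str              # why the data support the claim
        backing: str              # grounds for trusting the warrant
        alternatives: List[str] = field(default_factory=list)  # rival explanations

    # A hypothetical assessment-design example; the content is mine, not from the essay.
    argument = ToulminArgument(
        claim="This student can solve two-step linear equations.",
        data=["Correct, documented solutions on 7 of 8 two-step tasks."],
        warrant="Students with this capability tend to produce such solutions.",
        backing="Task analyses and pilot data linking performance to the construct.",
        alternatives=["Lucky guessing", "Help from a neighbor"],
    )

    # Much of assessment design is anticipating and weakening the rivals.
    for rival in argument.alternatives:
        print(f"Design evidence to rule out: {rival}")

Laying out the alternatives explicitly is where much of the real design work lies; a Wigmore-style chart elaborates the same elements across chains of evidence.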
Sociocognitive Perspective
I use the term sociocognitive perspective as a shorthand for the emerging confluence of ideas across social, cognitive, and situative psychology, as they apply to how people learn and act. I’m particularly interested in how people learn the kinds of things and act in the kinds of situations we encounter in the educational, work, and professional settings that we are concerned with in educational measurement. The key concepts draw from an across-person social view, a within-person cognitive view, a situative take on the interplay between the two, and a complex adaptive systems view of the variation and stabilities that emerge from the myriad interactions among individuals. Jim Gee’s (1992) readable The social mind was my introduction; I like Dan Sperber’s (1996) Explaining culture a lot too. Nick Ellis and Diane Larsen-Freeman (2009) edited a fascinating special issue of Language Learning on language as a complex adaptive system (CAS); it explains CAS and illustrates how this perspective has revolutionized linguistics.
This perspective provides concepts, tools, and research findings for rethinking assessment and measurement in the ways that Snow and Lohman urged (Q2). We can interpret constructs as regularities in the ways people think and act, built from resources that individuals develop along their unique trajectories of personal experience, and grounded in the linguistic, cultural, and substantive regularities that emerge across myriad socially situated experiences, across time and across people. There’s more going on here, too, than meets the eye. Digging into it can help us understand when and how across-person educational measurement models are useful and when they aren’t, and how to build and use them better.
Bayesian Modeling
A subjective Bayesian approach to modeling has two facets. The Bayesian facet had proved useful earlier in my work with IRT, for a variety of problems such as hard-to-estimate parameters, computerized adaptive testing, missing responses, multilevel models, and the use of collateral information about items and test-takers. The advantage was being able to put all of the structures, variables, and data into a full model and reason coherently in any direction. The subjectivist facet came later, in the form of Bruno de Finetti’s take on model-based reasoning in a Bayesian framework. Models express what we believe in order to understand, predict, and act on what we observe, with Bayesian probability structures to handle uncertainty. The models are inevitably approximations of reality; they are built from theory and experience, they take into account what we know and what we don’t, and they are fitted to the purposes and contexts of inference.
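As a toy illustration of that coherence, here is a minimal sketch in Python, my own and built on invented data, of Bayesian updating in a Rasch IRT model: a prior belief about a test-taker’s proficiency is revised by item responses, and the same posterior then supports reasoning in the other direction, predicting performance on an item not yet administered.

    import numpy as np

    # Grid over proficiency theta, with a standard normal prior belief.
    theta = np.linspace(-4, 4, 201)
    prior = np.exp(-0.5 * theta**2)
    prior /= prior.sum()

    def p_correct(th, b):
        # Rasch model: probability of a correct response given
        # proficiency th and item difficulty b.
        return 1.0 / (1.0 + np.exp(-(th - b)))

    # Hypothetical item difficulties and one test-taker's responses
    # (1 = correct, 0 = incorrect); both invented for illustration.
    difficulties = [-1.0, 0.0, 0.5, 1.5]
    responses = [1, 1, 0, 0]

    # Bayesian updating: multiply the prior by each item's likelihood.
    posterior = prior.copy()
    for b, x in zip(difficulties, responses):
        p = p_correct(theta, b)
        posterior *= p if x == 1 else 1.0 - p
        posterior /= posterior.sum()

    mean = float(np.sum(theta * posterior))
    print(f"Posterior mean proficiency: {mean:.2f}")

    # "Reasoning in any direction": the same posterior yields the
    # predictive probability of success on a not-yet-seen item (b = 1.0).
    pred = float(np.sum(p_correct(theta, 1.0) * posterior))
    print(f"Predictive P(correct) for an item with b = 1.0: {pred:.2f}")

The grid approximation stands in for whatever machinery a real application would use; the point is that prior beliefs, evidence, and predictions all live in one probability structure, so inference runs coherently from responses to proficiency and back out to new observations.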
The Payoff
Together, these three frames take us back to Snow and Lohman and Q2. The widely used across-person measurement models with trait-based constructs can indeed be useful for purposes like selection, classification, certification, or program evaluation, but they are limited for understanding the nature of people’s capabilities and how we develop them. Sociocognitive concepts and methods are better suited to those purposes. What’s more, sociocognitive concepts and methods are useful for understanding across-person models in context: when and how they work, and for whom and with what consequences they can be misleading. They are indispensable too, I think, for applying the concepts and methods of educational measurement to interactive assessment tasks like simulations, games, and collaboration; the elements of the design and interpretation arguments are grounded in a sociocognitive perspective, and evidence and inference are managed through a subjective Bayesian structure. I wrote about how these frames came together for me in a book called Sociocognitive foundations of educational measurement (Mislevy, 2018).
A pursuit of ever-elusive mindful measurement has provided the occasional glimpse of the chalice, and perhaps some work that has benefitted from the attempt. But I should close with another cautionary note. I overheard a visitor to Yellowstone Park ask a ranger if he would see a bear. The ranger said “If you want to see a bear badly enough, you will see one. And what you see might actually be a bear.”
References
Ellis, N., & Larsen-Freeman, D. (Eds.). (2009). Language as a complex adaptive system. Language Learning, 59, Supplement 1.
Gee, J. P. (1992). The social mind: Language, ideology, and social practice. New York: Bergin & Garvey.
Linn, R. L. (Ed.). (1989). Educational measurement (3rd ed.). New York: American Council on Education/Macmillan.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education/Macmillan.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.
Mislevy, R. J. (2018). Sociocognitive foundations of educational measurement. New York/London: Routledge.
Schum, D. A. (1987). Evidence and inference for the intelligence analyst. Lanham, MD: University Press of America.
Schum, D. A. (1994). The evidential foundations of probabilistic reasoning. New York: Wiley.
Snow, R. E., & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 263–331). New York: American Council on Education/Macmillan.
Sperber, D. (1996). Explaining culture: A naturalistic approach. Oxford: Blackwell.