Mindful Measurement and COVID-19

I view mindfulness as being present, or in the moment: there is an intentionality to what you are experiencing, and an intentionality to being thoughtful and reflective about the situation at hand. I find it difficult to be mindful under pressure, such as when deadlines are looming, or when my mental or physical energy is occupied by some other consideration. A sense of urgency, frustration, or the feeling that I need to attend to other matters all run counter to my being mindful.

At the moment, I can think of no better context for discussing mindfulness in measurement than the novel coronavirus and the associated COVID-19 disease. COVID has affected, if not upended, essentially all aspects of society, including educational assessment, generating headline-grabbing controversies such as those surrounding college admissions testing in the U.S. and the U.K.

But rather than the impact of COVID on measurement, I’d like to consider the measurement of COVID. Though this falls more in the domains of medical assessment and public health—and as will be quite clear, I am speaking about things I know quite a bit less about—the general point of relevance remains: to a large degree measurement is measurement, and the principles of measurement transcend applications. The upshot is that the lessons learned in educational measurement can be brought to bear in the measurement of COVID, and vice versa—what we are seeing in the measurement of COVID can inform our understanding of measurement in other domains.

In what follows, I will delve into just a few of the measurement issues involved in COVID testing. I do so through the public-facing side of measuring the virus, as it has appeared in the popular press, rather than by delving into the academic literature.

To date, the primary focus has been on testing individuals for the virus. (I am focusing on the diagnostic (PCR) tests rather than serology tests.) And measurement issues have broken through as part of the discussion of these tests and their results. Measurement terms like sensitivity and specificity, false positives and false negatives, rates of infection, percent positive, and so on have become part of our discourse. We even see tutorials on how to conduct inference, and attempts to explain and resolve confusion about the interpretation of test results.
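To make the interpretation problem concrete, here is a minimal sketch of the kind of calculation those tutorials walk through: the probability that someone who tests positive actually has the virus, via Bayes’ theorem. The sensitivity, specificity, and prevalence values below are illustrative assumptions, not the properties of any actual test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(infected | positive test), via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Illustrative values: a test with 90% sensitivity and 95% specificity.
for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:6.1%} -> P(infected | positive) = {ppv:.1%}")
```

The same positive result supports very different conclusions depending on the base rate of infection, which is precisely where intuition tends to fail and where much of the public confusion has centered.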

Testing an individual for the presence of the virus would seem to be the atomic unit of measurement of COVID. Anything we want to do should be built out of that. Want to know geographical areas where the virus is prevalent? Well, just aggregate the tests of individuals! That should work! And it does, to an extent. However, a weakness here is that, by the nature of the virus (e.g., incubation period, how many people are symptomatic) and how people and society behave (e.g., under what conditions people seek out tests, when a test is even available, the turnaround time to get back results), aggregates of individual tests are at best retrospective accounts. They tell us where the virus has been, or what things were like a few weeks ago. They tell us quite a bit less about what is happening in real time, or where things may be headed in the near future.

Note that there has been a shift in the desired inference: in moving from the question of whether an individual has COVID to where COVID is prevalent in groups of people, we have shifted from a question about medical testing to one about public health. As noted above, just using the tests of individuals is limited. To address the public health question of the scale of the outbreak, and the desire to get ahead of the virus, we need different evidence, which may come in wildly different forms. Examples include testing for the presence of the virus in wastewater, or Google searches for symptoms such as a loss of smell.

The point is that different inferences may call for different forms of evidence, particularly if the results are to be actionable. A tabulation of the number of Google searches in the area where I live is unlikely to be actionable evidence regarding whether I, in particular, have COVID. Test results that lag days or weeks behind when a virus is transmittable are unlikely to be actionable regarding when communities are at risk. Complicating factors come in the form of constraints—ethical, legal, or financial—on the testing at hand. Let us return to the situation of testing whether an individual has COVID. If the resources to conduct the testing are lacking, such that the turnaround times for test results are more than just a few days, it is unlikely that people will behave in ways that make efforts to mitigate the spread of the virus feasible; the tests may be nearly useless for supporting those actions.

In light of these constraints, other approaches have gained attention. One is pooled testing, in which samples from a group, or pool, of people are simultaneously tested for the presence of the virus. A positive result would suggest that at least one person in the pool has the virus, but would not indicate who. To detect who in the pool has the virus, follow-up tests of the individuals’ samples would be needed. (Thus, these strategies often involve splitting the sample from each individual. For each person, half of their sample gets put in the pool to be tested; the other half is held out in case a test of each individual is warranted.) On the surface, this doesn’t seem ideal for the inference about any individual. But it might be, if pooled testing helps clear the backlog of tests, conserves resources, and reduces the actual turnaround time for the individual. In addition, this approach may be better suited to address the public health question as well. Even without conducting the individual follow-ups, the increasing presence of the virus in pools might suggest an outbreak in the area.
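The resource arithmetic behind pooling is easy to sketch. What follows simulates a simple two-stage scheme (test each pool; retest individuals only in pools that come back positive, roughly the classic Dorfman design); the prevalence and pool size are assumed values chosen only to show the shape of the tradeoff.

```python
import random

def tests_used(statuses, pool_size):
    """Two-stage pooled testing: one test per pool, plus one follow-up
    test per individual in any pool that comes back positive."""
    total = 0
    for i in range(0, len(statuses), pool_size):
        pool = statuses[i:i + pool_size]
        total += 1               # test the pooled sample
        if any(pool):            # at least one infected member
            total += len(pool)   # follow up on each individual
    return total

random.seed(0)
population, prevalence, pool_size = 10_000, 0.02, 10  # assumed values
statuses = [random.random() < prevalence for _ in range(population)]
print(f"individual testing: {population} tests")
print(f"pooled testing:     {tests_used(statuses, pool_size)} tests")
```

At low prevalence, most pools come back negative and the savings are substantial; as prevalence rises, more pools require follow-up and the scheme degrades toward individual testing. That dependence on prevalence is itself a measurement lesson: the efficiency of the instrument depends on the state of the very thing being measured.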

Another approach would be to use tests that give results more quickly, and are fairly inexpensive to boot. The tradeoff? They’re less accurate. But under the current constraints they may be the better ones to deploy, the thinking being: sure, they’re less accurate, but they do give some evidence, and in a timely enough fashion to be actionable. On the surface, the prospect of a less-accurate test as opposed to a more-accurate test is unappealing. But such a comparison may be something of a red herring, at least until the constraints change. That is, the operative framing isn’t one of a less-accurate test vs. a more-accurate test; it’s one of a test that is less-accurate-but-still-useful-in-the-current-context vs. one that is more-accurate-but-mostly-useless-in-the-current-context.
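A back-of-the-envelope calculation can make this framing vivid. The figure of merit below is my own crude construction, with every parameter assumed for illustration: it weights a test’s sensitivity by the fraction of the infectious period still remaining when the result arrives.

```python
def actionable_detections(sensitivity, turnaround_days, infectious_days=8):
    """Crude figure of merit: the probability the test catches a case,
    discounted by how much of the infectious period is left when the
    result arrives (zero if it arrives after the period has passed)."""
    remaining = max(infectious_days - turnaround_days, 0) / infectious_days
    return sensitivity * remaining

# Assumed, illustrative parameters.
print("rapid test:", actionable_detections(sensitivity=0.80, turnaround_days=0))
print("lab test:  ", actionable_detections(sensitivity=0.98, turnaround_days=7))
```

Under these made-up numbers, the less accurate test dominates, not because it is a better measure of infection, but because its results arrive while there is still something to act on.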

Related questions have emerged about whether the diagnostic test results in place are even the proper results to consider or report. The yes-no conclusion from a diagnostic test is not actually the most granular or fundamental result of the test. These tests yield an estimate of the viral load, or the amount of the virus, in the sample. They do this through amplification cycles; the fewer cycles needed to detect the virus, the greater the viral load. So what results from the test is more of a quantitative attribute: the count of the number of cycles needed to detect the virus. If the virus is detected within a certain number of cycles (usually around 40), the person is diagnosed as having COVID. This is a fairly high cutoff, such that the test is incredibly sensitive. That may be a good thing for the diagnosis of the individual, but it may be suboptimal for larger questions about the consequences for the individual (should you isolate?) or for public health (is transmission likely?), if those individuals with lower viral loads are less likely to be contagious. Rather than collapsing the results into a yes-no dichotomy, a more differentiated approach may be called for.
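In code, the dichotomization is nothing more than a threshold on a quantitative value. A sketch, where the 40-cycle cutoff follows the description above but the viral-load bands are purely illustrative assumptions, not clinical guidance:

```python
def report(ct_value, cutoff=40):
    """Turn a PCR cycle-threshold (Ct) count into reported results.
    Fewer cycles to detection means more virus in the sample.
    The bands below are illustrative, not clinical guidance."""
    positive = ct_value <= cutoff  # the usual yes-no call
    if not positive:
        load = "not detected"
    elif ct_value <= 25:
        load = "high viral load (more likely contagious)"
    elif ct_value <= 33:
        load = "moderate viral load"
    else:
        load = "low viral load (possibly early or late in infection)"
    return positive, load

for ct in (18, 30, 38, 42):
    print(ct, report(ct))
```

Reporting something like the second element alongside the first would preserve information that the yes-no call discards, information that may matter for the isolation and transmission questions above.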

Based on this short synopsis, what does COVID testing tell us about measurement? The above examples will likely resonate with those familiar with educational measurement in terms of things like validity, test use, standard setting, and the like. But they can be a reminder of several things that the educational measurement community knows, but sometimes overlooks. The desired inference should drive what evidence we wish to collect. The measures that will produce that evidence should be intentionally sought out or constructed. Different inferences call for different forms of evidence. The same evidence that is useful for one inference may not be useful for another. Measurement and inference do not occur in a vacuum. They are always subject to constraints. And they are useful to the extent that they are actionable in ways we want, which is evaluated not based on the test in isolation, but on how it is used by people, in the real world, subject to their purposes, constraints, and so on.

What does COVID testing tell us about mindful measurement? That measurement—or rather, the thinking that needs to go on to be successful in measurement: scoping out the desired inference(s), the nature and sources of evidence, competing demands, limiting constraints, etc.—is hard! And even harder when the pressures of the moment are bearing down! Again: looming deadlines, the sense that time spent is time (or lives!) lost, mental or physical energy occupied by other considerations—these are the enemies of mindfulness.

If there is a lesson here, I think it is this: Because it is difficult to be mindful when going through tumult, it is of even greater importance to be mindful when we can, even if that means seemingly overloading in times of relative serenity. To quote an old saying in emergency management: “Disaster is the wrong time to exchange business cards. And it’s absolutely the wrong time to make up new procedures.” The time for such work is before disaster strikes, deadlines loom, and time slips away as we are otherwise occupied mentally or physically. Of course, with COVID testing, some of the ongoing work could not have been done otherwise: it is not plausible that we could have had the diagnostic tests, or awareness of particular distinguishing symptoms, prior to the emergence of the virus. Nevertheless, it is difficult to overstate the value of more mindful efforts toward thinking through the possibly competing desired inferences, testing strategies, and test uses, all in light of societal or resource constraints.

Roy Levy

Roy Levy is a professor in the T. Denny Sanford School of Social and Family Dynamics, specializing in measurement and statistical analysis. His research and teaching interests include methodological investigations and applications in psychometrics and statistical modeling, focusing on item response theory, structural equation modeling, and Bayesian approaches to inference and modeling. He also works in areas of assessment design, focusing on evidentiary principles and applications in simulation-based assessments. For more on his work, visit https://sites.google.com/a/asu.edu/roylevy/
