The Erosion of High-Stakes Summative Tests: Adding a New Compass

Imagine a doctor who has only two forms of feedback on whether her treatments are helping patients. First, she can take direct measurements (e.g., body temperature), but no one collects this data when she is not with the patient. Also, the doctor has no way to order repeated lab tests (e.g., changes in liver function); as a result, she has little diagnostic information on which to base her interventions.

Second, if patients die, the doctor can get an autopsy report on what killed them; if they get well and leave the hospital, she can get a medical workup about their condition on release. However, both of these assessments come far too late to improve treatment. Without multi-dimensional, frequent, longitudinal diagnostic data that enable formative shifts in her therapies, this doctor is severely handicapped in curing her patients.

Of course, this is the difficult challenge teachers face every day. They lack what medical staff have: frequent diagnostic assessments that include real-time guidance about appropriate individual interventions. In education, well-designed diagnostic measures can provide formative feedback that helps students and teachers improve, as well as generate learning trajectories of performance gains over time that inform parents, school administrators, policy makers, and other stakeholders.

Rather than frequently interrupting learning with benchmarking tests, “stealth assessments” provide guidance just in time, so many students perform better when high-stakes summative tests are eventually administered. These diagnostic assessments are not substitutes for psychometrically reliable and valid summative tests, but they can accomplish useful functions not offered by those high-stakes measures. I believe that Mindful Measurement requires these complementary approaches.
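
To make this concrete, below is a minimal sketch of how a stealth assessment might operate inside a digital learning activity: each practice attempt updates a running mastery estimate, and guidance is triggered only when that estimate drops, with no test ever interrupting the activity. The update rule is standard Bayesian knowledge tracing; the parameter values and the hint threshold are illustrative assumptions, not drawn from any particular system.

```python
# Illustrative sketch: Bayesian knowledge tracing (BKT) as a stealth assessment.
# All parameter values and the feedback threshold below are hypothetical.

def bkt_update(p_mastery, correct, guess=0.2, slip=0.1, transit=0.15):
    """Update the probability that a student has mastered a skill
    after observing one practice attempt."""
    if correct:
        # Posterior probability of mastery given a correct response
        # (mastered and did not slip, versus unmastered but guessed).
        evidence = p_mastery * (1 - slip) + (1 - p_mastery) * guess
        posterior = p_mastery * (1 - slip) / evidence
    else:
        evidence = p_mastery * slip + (1 - p_mastery) * (1 - guess)
        posterior = p_mastery * slip / evidence
    # Allow for learning between attempts.
    return posterior + (1 - posterior) * transit

p = 0.3  # prior probability of mastery
for attempt_correct in [False, False, True, True]:
    p = bkt_update(p, attempt_correct)
    if p < 0.4:  # hypothetical threshold for just-in-time guidance
        print(f"mastery={p:.2f}: offer a targeted hint; no test interruption")
    else:
        print(f"mastery={p:.2f}: let the activity continue")
```

Because the estimate is updated continuously, a dashboard built on such a model could surface which students need intervention today, rather than after the next benchmarking test.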

An evidence-based illustration of this approach is Dr. Joseph Reilly’s LENS enhancements to the EcoXPT curriculum. These automated assessment and formative-feedback tools, designed for virtual-environment-based curricula, helped students learn about epistemology and the nature of science, increased the number of claims student teams made, improved the relevance of the evidence those teams used, and increased the accuracy of classifiers designed to predict student success.
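
The classifier component can be pictured with a generic sketch like the one below. This is not LENS itself: the per-team features, the data, and the choice of logistic regression are all illustrative assumptions about how interaction-log measures might predict which teams are on track.

```python
# Illustrative sketch: a generic success classifier over interaction-log
# features. Feature names, data, and model choice are all hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-team features: claims made, evidence items cited,
# fraction of cited evidence rated relevant, experiments run.
X = np.array([
    [2, 3, 0.40, 1],
    [5, 8, 0.75, 4],
    [1, 1, 0.20, 0],
    [6, 9, 0.80, 5],
    [3, 4, 0.50, 2],
    [7, 10, 0.90, 6],
    [2, 2, 0.30, 1],
    [5, 7, 0.70, 3],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = team met the success criterion

model = LogisticRegression()
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=4).mean())
```

In this framing, gains in classifier accuracy typically come from richer, more diagnostic features logged by the learning environment, which is exactly what embedded formative tools can supply.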

Personalizing learning is important in preparing students for the next half century, as is inculcating knowledge, skills, and dispositions whose learning trajectories are easier to measure with longitudinal diagnostic assessments than with snapshot summative psychometric tests. Creative teachers have developed classroom assessments that accomplish some of these important objectives; the pandemic has underscored the importance of sharing those innovations across the world. But given the many responsibilities teachers have, asking them to develop their own fair and valid diagnostic assessments interwoven with learning is as unrealistic as expecting them to develop their own curriculum from scratch. 
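
One reason longitudinal diagnostic data are so valuable is that repeated measures support estimating a trajectory, not just a point. Below is a minimal sketch, using hypothetical scores from diagnostics embedded across a term, of fitting a simple linear growth trend; real learning-trajectory models are far richer (see Confrey, 2019).

```python
# Illustrative sketch: a linear growth trend from repeated diagnostic scores.
# The data are hypothetical; real trajectory models are much richer.
import numpy as np

weeks = np.array([1, 3, 5, 7, 9, 11])        # when embedded diagnostics ran
scores = np.array([42, 48, 51, 59, 63, 70])  # one student's hypothetical scores

slope, intercept = np.polyfit(weeks, scores, deg=1)
print(f"estimated gain: {slope:.1f} points per week (starting level ~{intercept:.0f})")

# A single snapshot test would report only the final value (70); the
# trajectory also reveals the rate of improvement across the term.
```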

Over the decades, I’ve served on many expert commissions that highlighted the opportunity to improve assessment by complementing high-stakes tests with pervasive diagnostic measures embedded in learning activities. Among others, these included a turn-of-the-millennium National Research Council Committee report, the US Department of Education’s 2010 Educational Technology Plan, a Research Symposium on Technology Enhanced Assessments convened by the Educational Testing Service in 2012, and invitational workshops on Data-intensive Research in Education commissioned by the National Science Foundation in 2015. This list could go on, with assessment specialists repeatedly agreeing that high-stakes tests alone are too limited a method for educational improvement, and that complementary diagnostic measures would be of substantial value.

Despite all this expert support and validation, this valuable innovation in assessment practice has never been adopted at scale. Some of the barriers are financial, others institutional, and the dead hand of past tradition is always an obstacle to any educational improvement. Further, parents, school leaders, policymakers, and admissions officers want a single numerical score that determines who won in education’s competition for valedictorians, National Merit scholars, and admittees to elite colleges. Just as it would be in medicine, this simplistic, one-sided approach carries a high long-term cost in ineffective instruction, suboptimal learning, and wasted human talent.

Suddenly, an unexpected situation has arisen that could break this impasse. Because of the pandemic, the business model for summative high-stakes testing is under pressure: it is not financially practical to provide social distancing at proctored face-to-face sites for the volume of students who want these tests. As a result, the movement in higher education away from using these assessments for college admission has accelerated, and the major testing companies are experiencing hard times. The unavailability of these summative measures is a major problem, but also an opportunity, since transformative change in education occurs primarily when traditional methods become impossible and people are forced to change.

At present, many talented assessment professionals are seeking jobs outside the traditional testing industry. This unprecedented situation offers the chance to aggregate that talent and develop sophisticated models for diagnostic/formative assessment that could pervade the curriculum. Internal groups at major assessment companies (e.g., ACTNext, ETS) have been conducting foundational work on models of assessment complementary to summative tests. I believe a non-profit organization with philanthropic funding could tackle this as a Grand Challenge to create and implement scalable educational practices, programs, and tools that support teachers and students.

The Defense Advanced Research Projects Agency (DARPA) has achieved major technological breakthroughs through this type of strategic investment. A distributed portfolio of grants and contracts, blending top-down funding through research centers with bottom-up innovation by small teams, is a proven approach to transformational breakthroughs. For example, over several decades, substantial investment in intelligent tutoring systems and related forms of artificial intelligence in education has produced many benefits for teaching and learning, leading to a potential National Artificial Intelligence Institute in AI-Augmented Learning.

This could be a similar target of opportunity: creating a suite of models for diagnostic/formative assessment, each generalizable across a particular type of instruction. As an illustration, a model could be developed for learning interpersonal skills through simulation; its assessment measures and methods could then be refined through proof-of-concept research studies. After decades of advocating for stealth diagnostic/formative assessments, let’s seize the opportunity and act! We will reap valuable benefits for improving learning and teaching through mindful measurement.

References

Confrey, J. (2019). A synthesis of research on learning trajectories/progressions in mathematics. Paris, France: OECD Directorate for Education and Skills, Education Policy Committee. https://www.oecd.org/education/2030/A-Synthesis-of-Research-on-Learning-Trajectories-Progressions-in-Mathematics.pdf

Dede, C. (2012). Interweaving assessments into immersive authentic simulations: Design strategies for diagnostic and instructional insights (Commissioned White Paper for the ETS Invitational Research Symposium on Technology Enhanced Assessments). Princeton, NJ: Educational Testing Service. http://www.ets.org/Media/Research/pdf/session4-dede-paper-tea2012.pdf

Dede, C. (Ed.). (2015). Data-intensive research in education: Current work and next steps. Arlington, VA: Computing Research Association. https://cra.org/wp-content/uploads/2015/10/CRAEducationReport2015.pdf

National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: The National Academies Press. https://doi.org/10.17226/10019

Shute, V. J., Rahimi, S., & Emihovich, B. (2017). Assessment for learning in immersive environments. In D. Liu, C. Dede, R. Huang, & J. Richards (Eds.), Virtual, augmented, and mixed realities in education (pp. 71-89). Heidelberg, Germany: Springer-Verlag. http://myweb.fsu.edu/vshute/pdf/IE.pdf

US Department of Education. (2010). Transforming American education: Learning powered by technology. Washington, DC: Office of Educational Technology, US Department of Education. https://www.ed.gov/sites/default/files/netp2010.pdf

Thanks to Dr. Ed Dieterle and Professor Punya Mishra for their assistance!

Chris Dede

Chris Dede is the Timothy E. Wirth Professor in Learning Technologies at Harvard’s Graduate School of Education. His fields of scholarship include emerging technologies, policy, and leadership. In 2011 he was named a Fellow of the American Educational Research Association.

https://www.gse.harvard.edu/faculty/christopher-dede