Mindful Measurement: A Multidisciplinary Team Leader Perspective
Written by Susan Lottridge, Senior Director of Automated Scoring at Cambium Assessment, Inc. (CAI). She leads CAI's machine learning and scoring team on the research, development, and operation of CAI's automated scoring software for automated essay scoring, short answer scoring, speech scoring, and the detection of disturbing content in student responses. Dr. Lottridge has worked in automated scoring for 12 years and has contributed to the design, research, and use of multiple automated scoring engines. In this blog post, she takes the guiding questions I posed to all my blog contributors and discusses her perspective on the impact of mindful measurement on leadership in this kind of role.
What does mindful measurement mean to you?
To me, mindful measurement is the practice of thinking about, communicating, and doing psychometric work that’s important, meaningful, and practical. Most critically, it’s about understanding why a project exists. What problem are we trying to solve and for whom? Why do they care, and what does success look like? What happens if we fail to deliver? How do we best deliver given the constraints and affordances in this organization? These questions sound more like those of business than psychometrics and don’t typically appear in standard technical reports or presentations. However, I think our work greatly benefits from asking and answering these types of questions because they frame how we build, deliver, score, and report measurement information.
Regularly communicating the answers to these questions to staff is critical, as these serve as the basis for decisions around the design of measurement systems and studies. They also inform the day-to-day decisions of staff that may never be explicitly reviewed. On my teams, I have continually discussed the why, have asked staff to tie their decisions to the big picture, and have asked them to articulate why and how their choices align with the goal of the project at hand. Any choice around sampling, data cleaning, analysis, and reporting should be explainable, defensible, and documented relative to the goal. Being mindful – even about the smallest of choices – helps to reduce re-work and creates a culture on a team that is oriented toward the key goals.
Innovation is embedded in mindful measurement. Circumstances change and so will measurement. Expecting to continually improve helps to build not just resilience to change, but also a willingness to view change as a positive driver for better measurement. On the automated scoring front, the use of deep learning, greater computing power, and access to large corpora has already significantly changed how we build and validate models. It’s too early to see how COVID-19 and the re-examination of structural racism will impact measurement, but I can imagine them driving greater use of automated scoring and requirements that we produce evidence-based results around fairness in automated scoring.
How are you creating appropriate mindsets and practices for mindful measurement in your own work when working by yourself, with team members, and with disciplinary colleagues?
Communicating regularly with people outside the team – other staff at my organization, teachers, state-level personnel – helps to ensure that I and the team are on the right track. People in these roles are customers of our services at various levels, from low-level (receiving scores and meta-data for processing), to mid-level (receiving scores via reports for the purposes of instruction), to high-level (receiving aggregate results and responding to queries from other stakeholders around scores). Conversations with each user group help to refine the purpose, identify areas to improve, and ensure our team doesn’t become too insular in its focus.
The team I lead now is multi-disciplinary – we have software engineers, data scientists, linguists, and psychometricians – all of whom have different concerns, priorities, and backgrounds. It takes the full team to create an automated scoring product, and it’s critical that people can communicate in ways that we all understand and at the level of detail needed in order to meaningfully contribute. If we don’t do this, then key issues are missed or discovered too late to correct. Keeping the team focused on the key drivers for the work is helpful in that each person can see their (and others’) role in supporting the final deliverable.
Finally, because innovation is a key component of our work, it’s also critical to value experimentation, competing views, and expect that there will be things we try that don’t work. I’ve come to realize there is no single right way to do something; rather, there are multiple ways and we just need to pick one. Moreover, the best way to resolve competing perspectives is to empirically test them with data and to iterate continually to improve both our work and our team.
What are some success stories that you can share where mindful measurement has made an important difference?
The first example concerns reproducibility of scores. In a summative context, reproducibility is paramount; we must be able to exactly reproduce scores for all examinees within a test window. This means that we need to be sure all processes in scoring are deterministic. If engineers or data scientists are not aware of this requirement, they may encode stochastic behavior into the automated scoring process (e.g., randomization of spell-correction choices) or they may have higher tolerance for score differences than would a psychometrician (e.g., 1% of scores different is acceptable). Stating a non-obvious requirement explicitly and testing that the requirement is satisfied is one area where mindful measurement comes into play; staff would not know to care about something like reproducibility unless they understood why a client would value it.
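To make the determinism point concrete, here is a minimal sketch of the spell-correction example. The function name and candidate-ranking setup are hypothetical, not CAI's actual engine code; the point is that seeding the choice from the token itself, rather than relying on global random state, makes every rescoring run reproduce the same correction and therefore the same score.

```python
import hashlib
import random

def correct_token(token: str, candidates: list[str]) -> str:
    """Pick among equally ranked spell-correction candidates.

    Hypothetical illustration: deriving the seed from a stable hash
    of the token makes the pick deterministic across runs, so
    rescoring a response always yields identical output.
    """
    if not candidates:
        return token
    # A stable digest (unlike Python's randomized built-in hash())
    # gives the same seed for the same token in every process.
    seed = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return rng.choice(candidates)

# Two independent "scoring runs" produce identical corrections.
run1 = [correct_token("teh", ["the", "ten", "tea"]) for _ in range(5)]
run2 = [correct_token("teh", ["the", "ten", "tea"]) for _ in range(5)]
assert run1 == run2
```

Had the function used the module-level `random.choice` instead, each run could break ties differently, which is exactly the kind of stochastic behavior a reproducibility requirement rules out.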
Another example concerns standards and metrics for performance. Engineers and data scientists may not know that there are common metrics and standards for automated scoring performance, and they may not know that there are defined processes and standards around hand-scoring. These metrics and methods are somewhat specific to our field. Making sure staff know these helps them to know the targets for evaluation as they build engines and train models and helps to frame their understanding when interpreting differences in scoring.
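One agreement metric widely used in automated scoring is quadratic weighted kappa (QWK), which compares engine scores against human scores while penalizing larger disagreements more heavily. The sketch below is a minimal reference implementation for illustration only (the example score vectors are made up, and this is not CAI's evaluation code).

```python
import numpy as np

def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic weighted kappa between two integer score vectors.

    Returns 1.0 for perfect agreement and 0.0 for chance-level
    agreement given the two raters' marginal score distributions.
    """
    a, b = np.asarray(a), np.asarray(b)
    # Joint distribution of observed score pairs.
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Expected joint distribution if the raters were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights: 0 on the diagonal, growing
    # with the squared distance between score points.
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical human vs. engine scores on a 0-3 rubric.
human = [0, 1, 2, 3, 2, 1]
engine = [0, 1, 2, 3, 1, 1]
qwk = quadratic_weighted_kappa(human, engine, n_categories=4)
```

Making metrics like this explicit gives engineers and data scientists concrete targets (e.g., engine-human QWK approaching human-human QWK) rather than leaving "good enough" to intuition.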
A final example concerns maintainability of systems. Educational measurement is generally low-risk compared to other technology areas; changes are introduced gradually and tested heavily to examine the impact on scores prior to their use. Again, explaining this culture and the rationale behind it to staff helps to situate the kinds of innovation that are feasible and lays the foundation for iterative improvement as one key source of innovation. We are in the process of releasing a new version of our automated scoring engine, and great effort has been expended on ensuring it has a similar architecture and output to the original version while also embedding in it key improvements that our clients value.
What recommendations do you have for others for how to engage in mindful measurement?
Mindful measurement is a mindset that frames how to do work. I think a big piece of it is setting ego aside while focusing on creating or implementing the best possible work for a client while also adhering to the standards of our profession. Focusing on client needs inherently keeps the work practical, important, and meaningful. Prioritizing client needs and focusing on the ‘why’ of the work as a leader motivates staff to do their best work because they see the impact of their work clearly.
As for how to do the work, I think it comes down to trusting staff to do their best work and being engaged with the work regularly – daily, in my case – to respond to questions, give feedback, and keep a diverse team on track. Trust is essential to give staff the freedom to solve problems given their expertise, and results in better work overall!