Mindful Measurement: The Case for Simplicity

The assessment world is pushing further into the assessment of difficult-to-measure constructs, the use of log file data, the use of machine learning and AI, activities designed to measure multiple constructs simultaneously, and assessment without tests. Each of these individually presents complex problems, and in combination they compound one another's complexity. In addition, when we try to bring these innovations into real-world classrooms, with their already established systems, norms, and practices, we often run into even bigger roadblocks.

In his initial post on Mindful Measurement, André points out that this complexity (along with the speed that seems to accompany it) often creates a conflict with a more mindful (thoughtful, aware, present) approach to measurement. I have been working in many of these areas for more years than I would like to count, particularly on implementing these new ideas at scale, and can confirm that our attempts at innovation are often at odds with principles of mindfulness. Not only that, but our push for new kinds of tasks, new kinds of evidence, and new kinds of models has yet to yield results showing that our complex systems are better than the simpler systems we create mindfully.

A lot of noise without a lot of signal

There is a lot of hype around all the data we will have in digital systems, accompanied by claims about how it will let us make better decisions for learners. However, my experience so far is that there is a lot of noise and very little signal in that data. This would be OK if we truly had huge amounts of data, but so far in education, our production systems do not. So the strongest signals end up being the things we carefully planned and designed into our digital experiences.

This was the case in the work on SimCityEDU. I was lucky enough to be part of a team tasked with developing game-based assessments based on SimCity. We decided to target systems thinking as our construct of interest and designed scenarios to elicit learners’ understanding of univariate and multivariate cause and effect in a system. In a key level, we asked players to fix a city’s air pollution problem. The scenario was designed so that much of the pollution came from coal plants, but those coal plants also helped power the city. If players bulldozed the coal plants before replacing the energy sources, that was evidence of a univariate understanding of the effect of coal plants. SimCity, though, is a complex game, and there were hundreds of other data points we could have used. We engaged in a lot of exploratory data analysis (DiCerbo et al., 2015) to try to identify other indicators of systems thinking. In the end, the strongest (really the only) signal came from the handful of pieces we had designed into the activity.
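To make the contrast concrete, here is a minimal sketch of what that kind of designed, rule-based evidence identification can look like: a single pass over ordered log events that flags whether coal plants were bulldozed before any replacement power source was placed. The event names and fields are hypothetical, not the SimCityEDU telemetry schema.

```python
# Hypothetical log events: ordered dicts with "action" and "object" fields.
# Illustrative sketch only; not the actual SimCityEDU data format.

def bulldozed_coal_before_replacement(events):
    """Return True if the player bulldozed a coal plant before placing any
    replacement power source (evidence consistent with a univariate view
    of cause and effect in the city's systems)."""
    replacement_placed = False
    for event in events:  # events assumed to be in chronological order
        if event["action"] == "place" and event["object"] in {"solar_plant", "wind_turbine"}:
            replacement_placed = True
        if event["action"] == "bulldoze" and event["object"] == "coal_plant":
            if not replacement_placed:
                return True
    return False

# Example with made-up log data: coal removed before any replacement was built
log = [
    {"action": "bulldoze", "object": "coal_plant"},
    {"action": "place", "object": "solar_plant"},
]
print(bulldozed_coal_before_replacement(log))  # True
```

The point is that the rule is only this simple because the evidence was designed into the scenario up front; it did not emerge from mining the rest of the log data.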

Anyone who has looked into the field of game-based assessment at all has likely come across the pioneering work of Val Shute. She has worked with a number of games, but a big part of her work centers on a game called Physics Playground, in which players draw simple objects (ramp, lever, pendulum, and springboard) to move objects. Based on their play, players win trophies: solving a level with fewer than three objects yields a gold trophy, and solving it with three or more yields a silver trophy. The game is designed to informally assess qualitative physics. There has been a lot of work by the team, particularly by the brilliant Russell Almond, to design Bayesian Networks that estimate the probabilities of different levels of understanding of Newton’s Three Laws. But early work on the system shows that the simple measure of winning gold trophies correlates moderately with other measures of qualitative physics understanding. It isn’t obvious to me how much additional accuracy of inference the complexity of looking at other performance measures and building complex models buys us. Given that the game isn’t used in any high-stakes situation, just reporting trophies may give enough information about performance.
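For concreteness, the trophy rule itself fits in a few lines; this is a sketch of the rule as described above, not code from Physics Playground.

```python
def trophy(solved, objects_used):
    """Trophy rule as described above: gold for solving a level with fewer
    than three drawn objects, silver otherwise. Illustrative only."""
    if not solved:
        return None
    return "gold" if objects_used < 3 else "silver"

print(trophy(solved=True, objects_used=2))  # "gold"
```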

The case for simplicity

I took this lesson to heart when starting the design of a math app for Syrian refugee children (called Space Hero). They did not need complex assessment models to give them precise measurement. They just needed feedback that both told them how they were doing and motivated them to keep working. So we created a simple rule-based star system, based on attempts and correctness, that served the need.
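A rule like that can be as small as the sketch below. The thresholds are illustrative assumptions, not the actual Space Hero rules; the point is only how little machinery a useful feedback signal can require.

```python
def stars_for_item(correct, attempts):
    """Award 0-3 stars from correctness and number of attempts.
    Thresholds are illustrative; the real Space Hero rules may differ."""
    if not correct:
        return 0      # not solved yet, no stars
    if attempts == 1:
        return 3      # correct on the first try
    if attempts == 2:
        return 2      # correct on the second try
    return 1          # correct after more attempts

# Example: a learner who got the item right on the second attempt
print(stars_for_item(correct=True, attempts=2))  # 2
```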

Finally, I’m not sure added complexity is what educators and learners want. I don’t have good quantitative data to support this, but anecdotally, I hear from lots of teachers who are skeptical of our artificial-intelligence-produced models. They want to understand how decisions get made about what a student should do next, and many of our algorithms are not describable in ways that make sense to them. In addition, their practical experience is that these algorithms recommend things that don’t make sense to them or their students. Without the ability to explain why the system made these choices, educators and students lose trust in it.

So, what does this mean? Believe it or not, this isn’t a call to stop our research on, and pushes toward, the things I name in the first paragraph. It isn’t to say we don’t need complexity, but rather that 1) we should be clear about what complexity gains us and 2) we should introduce one form of complexity at a time. Rather than introducing task complexity (e.g., game play) AND evidence identification complexity (e.g., log files) AND evidence aggregation complexity (e.g., complex statistical models), I argue we should separate them, experiment with one or two at a time, see what we learn, and build from there. We should mindfully approach our attempts at innovation.

Reference

DiCerbo, K. E., Bertling, M., Stephenson, S., Jia, Y., Mislevy, R. J., Bauer, M., & Jackson, G. T. (2015). An application of exploratory data analysis in the development of game-based assessments. In C. S. Loh, Y. Sheng, & D. Ifenthaler (Eds.), Serious Games Analytics (pp. 319-342). Springer.

Kristen DiCerbo

Kristen is currently Chief Learning Officer (CLO) at Khan Academy. She is an experienced education leader with a demonstrated history of bringing sound learning science and assessment research to the development of digital learning experiences.

https://www.linkedin.com/in/kristen-dicerbo-414138b/