Many rigorous studies suggest math software can help kids improve their math skills, especially programs built around "mastery learning." In these systems, students progress at their own pace, advancing after they've demonstrated understanding, rather than moving on after a fixed amount of time or a set of problems. Despite that evidence, mastery-based programs aren't a mainstream tool in classrooms.
"The really big assessments of math progress have suggested that, particularly since the pandemic, students in fourth grade and eighth grade are doing much worse on math assessments," Emma Brunskill, associate professor of computer science at Stanford. "It robs those individuals of opportunities, and it robs our society of their contributions. And so I think anything we can do to better foster math proficiency is really important."
This brings up many questions: If the evidence is strong, why aren't these programs being used? What can the programs achieve in the real world? And what do they demand of students and teachers?
Brunskill specializes in AI methods for estimating the causal outcomes of single and sequential interventions, and for using those estimates to improve decision policies. In collaboration with Taryn Eames and Philip Oreopoulos at the University of Toronto and Bogdan Yamkovenko and Kodi Weatherholtz at Khan Academy - a well-known non-profit that makes mastery-based math instruction software - Brunskill recently co-authored research that further supports the potential benefits of mastery-based educational technology for math.
Their study, published in the Proceedings of the National Academy of Sciences, analyzed outcomes from more than 200,000 students and found that even modest engagement - on the order of just a few hours a year - was associated with measurable learning gains.
Below, Brunskill explains what the study found and why adoption of mastery-based programs has lagged behind the evidence.
What do we already know about these math programs?
Everyone thinks that right now is a big moment for technology and education. Yet some educational technologies have been around for decades and have research evidence showing they are quite effective, but that still hasn't translated into those tools being used often in the classroom.
I think one of the potential reasons why this software hasn't been adopted as much is because it requires teachers to change how they teach. For example, Jane might be at grade level three in the program and John is at grade level two, and that can be hard for teachers to manage. It's different than making sure everyone has the same worksheet.
That is part of why it's so important to better understand whether this technology is helpful in real settings. Is it worth it for teachers to change how they do classroom instruction because it leads to real benefits for student learning?
Can the average person make an accurate guess about what educational technologies work well?
Personally, I think not. Most of us are not experts, myself included, so our intuitions about what is likely to be effective are not always borne out in the data. And it's important to respect the fact that if we recommend that teachers teach one way, we're replacing another option.
Sometimes these software systems are evaluated through randomized controlled trials, where some classrooms receive the tool and some don't. This method can be incredibly powerful for getting causal evidence, but the trial setting may not be replicable at a large scale. For example, there might be administrators who are especially excited to participate, additional training for teachers, or additional researcher support.
Backing up our recommendations with data is necessary, but super hard because it requires capturing how teachers use the software in many settings.
How did you study this topic and what did you find?
We looked at data from around 200,000 students in the U.S. who had been using Khan Academy. (We don't think our results are specific to Khan Academy, but it's a platform that provides students with a mastery-based way to progress through material.) We looked at data from 2021, 2022, and 2023. The math assessment we used spans grade levels, so we can see whether students are above or below grade level.
We found a roughly linear relationship: in classrooms where students used the platform more, we saw small but notable gains on external test scores, and in classrooms where they used it less, we generally saw decreases. There are lots of caveats, but this is pretty strong evidence that using this sort of mastery-based math software can yield benefits for math learning.
One caveat is that our data covers roughly zero to 30 hours of usage in a year, so it doesn't speak to completely throwing out the curriculum and replacing it with these technologies. Also, keep in mind this isn't a randomized controlled trial, so there could be confounding factors, which we did our best to control for with the help of data that followed students over multiple years.
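To make that analysis concrete, here is a minimal sketch of how a linear dose-response estimate like this might look in code. It is not the study's actual analysis; the file and column names are hypothetical, and controlling for a prior-year score is just one simple stand-in for the multi-year controls described above.

```python
# Minimal sketch (assumed, not the study's actual code) of a linear
# dose-response estimate: regress a later test score on annual hours of
# platform use, controlling for the prior year's score. The file and
# column names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# One row per student: prior score, later score, annual hours on platform.
df = pd.read_csv("student_year_records.csv")

# Including score_t0 absorbs stable differences in prior achievement,
# a rough substitute for following the same students across years.
model = smf.ols("score_t1 ~ hours_on_platform + score_t0", data=df).fit()

# Under a linear fit, this coefficient is the average score change
# associated with one additional hour of use per year.
print("Estimated gain per hour:", model.params["hours_on_platform"])
```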
Another nuance is that skill acquisition, rather than simply hours on the software, is what matters. When we measured outcomes based on skills mastered on the platform, rather than time spent, we saw students benefiting more equally, whether they started off as higher- or lower-performing students. This suggests to us that, if teachers are using this platform, our recommendation is not to focus on time but to focus on progress.
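A sketch of that variant, under the same hypothetical column names: skills mastered replaces hours as the dose measure, and an interaction with the baseline score asks whether the benefit differs by starting level.

```python
# Same hypothetical data, but with skills mastered as the dose measure.
# The '*' adds both main effects and their interaction; a near-zero
# interaction would be consistent with gains that look similar across
# starting levels. Again, an illustrative sketch, not the study's code.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("student_year_records.csv")
model = smf.ols("score_t1 ~ skills_mastered * score_t0", data=df).fit()

print(model.params[["skills_mastered", "skills_mastered:score_t0"]])
```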
Was there anything surprising about what you found?
I was surprised by the results overall. I have long been a supporter of this type of software. But an argument known as "The 5 Percent Problem" suggests that many of the prior promising findings about these platforms were based on an unrealistic usage pattern - a student who uses a platform for 20 hours a year could see great results, but that student is in the top 5% of users and is probably very motivated educationally and has more resources.
So the first big surprise is that students who are using these platforms are using them far less than many people might expect. When I say, "They use it, on average, around six hours," people think, "Oh, per week?" And no, it's about six hours per year.
Another surprise: I personally thought that we wouldn't see a linear relationship. I thought students would need to meet a minimum threshold to have any benefit.
What are you hoping this work can do?
Teachers are so overburdened in America, and they do heroic work. So my hope is that this research can help them make decisions about how to spend their class time and resources. And this research suggests we can express some optimism about some mastery-based math technologies that are already out there.
A question in this area that I'm really invested in right now is: Given that so many students are far behind, what can we as a society and as researchers do to help them catch up? Specifically, I'm curious about the results of stacking or layering these programs with other interventions or across years.
Also, Khan Academy, like many other platforms, is changing enormously with generative AI. That's a very different type of instruction than what we've done before. Math education software previously made little use of language; it involved primarily problem sets and maybe videos. Now, with generative AI, you can have a Socratic dialogue with the program - and we want to know what the effect of that is.