#CambiumImpact: Cambium Assessment Hotline
A Q&A with Balaji Kodeswaran, Senior Vice President of Software Engineering and Technology Innovation at Cambium Assessment
Earlier this year, we released our 2022 Impact Report, highlighting Cambium Learning Group’s collective work to drive improved outcomes for the people, communities, and groups we serve. The report details the current state of Cambium’s reach and our momentum over the last year, including real-world case studies from the Cambium family brands. In this blog series, we will take a deeper dive into those featured case studies. Read on for a Q&A with Balaji Kodeswaran, Senior Vice President of Software Engineering and Technology Innovation at Cambium Assessment, to learn more about the Hotline case study.
Can you tell us more about the Cambium Assessment Hotline?
Let me start with the “what” first. Cambium Assessment’s Hotline system is a predictive AI engine that uses a purpose-built large language model (LLM) to scan text that students enter in Cambium Assessment’s (CAI) online assessments for disturbing content. This can include intent to harm self or others, criminal activity, abuse, and more. Any flagged content is escalated to a human for verification and, once verified, passed on to the appropriate administrators at the school and/or district to act upon. Because of Hotline, in ‘22-’23, 187,500 entries were flagged for immediate human review, and over 22,000 of these were escalated to school/district administrators for intervention.
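The flag → human-verify → escalate flow described above can be sketched in a few lines. This is an illustrative sketch only, not CAI’s actual code: the model’s risk score, the review threshold, and all function names are assumptions made for the example.

```python
# Minimal sketch of a flag -> verify -> escalate pipeline.
# The risk score, threshold, and names are hypothetical, for illustration only.
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.8  # assumed cutoff for sending a fragment to human review


@dataclass
class ScanResult:
    text_id: str
    risk_score: float      # hypothetical model output in [0, 1]
    flagged: bool = False
    escalated: bool = False


def scan_fragment(text_id: str, risk_score: float) -> ScanResult:
    """Flag a fragment for human review when the model's risk score is high."""
    return ScanResult(text_id, risk_score, flagged=risk_score >= REVIEW_THRESHOLD)


def human_verify(result: ScanResult, reviewer_confirms: bool) -> ScanResult:
    """Only flags confirmed by a human reviewer are escalated to administrators."""
    if result.flagged and reviewer_confirms:
        result.escalated = True
    return result


# Toy batch: only the high-score fragment is flagged and, once verified, escalated.
batch = [scan_fragment("resp-1", 0.12), scan_fragment("resp-2", 0.93)]
verified = [human_verify(r, reviewer_confirms=r.flagged) for r in batch]
print([r.escalated for r in verified])  # [False, True]
```

The key design point mirrored here is that the model never escalates on its own: a human verification step always sits between the AI flag and any school or district notification.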
The incredible volume of text that Hotline scans represents a tremendous challenge from both a data science and a software engineering perspective. The data science challenge is detecting disturbing material accurately without raising too many false alerts, so that the human verification process focuses on the most at-risk samples. The engineering challenge is running the AI scanning in a performant manner while efficiently managing the costs of the immense computing infrastructure this product needs. To give a sense of the scale of the workload: for the ’22-’23 academic year, Hotline scanned over 97.5 million text fragments, with the average scan taking about a tenth of a second, all while ensuring that every test was scanned within one hour of student submission – even on our busiest testing days of the year.
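The figures quoted above imply a simple capacity calculation: at ~0.1 seconds per scan, a burst of submissions must be spread across enough parallel workers to clear within the one-hour window. The sketch below works through that arithmetic; the busy-hour burst size is an assumed illustration, not a CAI figure.

```python
# Back-of-the-envelope capacity estimate from the figures in the text:
# ~0.1 s average per scan, every test scanned within one hour of submission.
import math

AVG_SCAN_SECONDS = 0.1
SLA_SECONDS = 3600  # one-hour turnaround


def workers_needed(fragments: int, window_seconds: int = SLA_SECONDS) -> int:
    """Minimum parallel scan workers to clear a burst within the SLA window."""
    total_compute_seconds = fragments * AVG_SCAN_SECONDS
    return max(1, math.ceil(total_compute_seconds / window_seconds))


# Hypothetical busy-hour burst of 1 million fragments:
# 1,000,000 * 0.1 s = 100,000 s of compute -> 28 workers to finish in an hour.
print(workers_needed(1_000_000))  # 28
```

In practice the sizing problem is harder than this, since load is spiky across the testing calendar, which is why the cost-efficiency of the computing infrastructure is called out as the core engineering challenge.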
Now, for the “why.” Historically, student essays were scored by humans, who also flagged any responses that contained disturbing material. However, there is a delay of days or even weeks from the time a student submits a test to when humans score the response. Furthermore, as we transitioned our state assessment clients to AI scoring for essays, humans were reviewing fewer and fewer responses. Hotline was conceived to solve both of these issues: scan responses for disturbing content, and do so expeditiously so that intervention can be applied in a timely manner. As I look back at the instances clients have shared of Hotline alerts leading to a positive intervention for a child in distress, I am humbled by how impactful it has been. It is this real-world impact on the community we serve that drives us to make Hotline better each day.
So, what’s next? Every year, we retrain the Hotline AI, bringing in additional data to improve the models. We will continue to roll out enhancements to the underlying algorithms and approaches to improve the accuracy, performance, and cost efficiency of the system. We are also looking into expanding Hotline beyond text; specifically, running audio responses from English language proficiency assessments through the scanning process. We are always open to new ideas, and I encourage anyone who has suggestions on where we can apply Hotline to reach out to me or anyone from the CAI Machine Learning team; we would love to hear from you.
What is your prediction about how AI will be implemented in the assessment world?
AI is already widely used in assessments for automated scoring, and this trend will continue to improve and become more pervasive. Generative AI systems that support the automated development of test questions measuring different skills and ability levels are already at various stages of development across organizations (including ours). Historically hard-to-measure standards that require multiple participants, such as student discourse or collaborative problem solving, can now be assessed at scale by employing AI-powered virtual participants/chatbots. Accessibility and embedded support for students with accommodations will benefit from improved AI implementations of features like speech-to-text, word prediction, and language translation. With the amount of innovation occurring right now around the use of AI in education, I expect many of these advances to make their way into assessments in the very near future.
Education measurement is a field rooted in established scientific theory, requiring solid evidence to demonstrate the efficacy of proposed changes. At CAI, we have refined our automated scoring processes over many years to ensure that the scores produced are valid, reliable, equitable, and free from bias, and to demonstrate that the models perform as well as or better than human scorers. The scientific rigor required to win stakeholder acceptance is high, especially for high-stakes assessments like state summative or standardized college admission tests. I expect that lower-stakes assessments like formative assessments, diagnostics, or screeners will be the first to see the benefits of AI, and the winning ideas will eventually make their way into higher-stakes assessments.
What are the benefits of AI in the future of assessment?
Recent advances in AI are transformative and will reshape how we think about assessments in the future. Every assessment program strives to minimize the amount of time students spend testing, maximize the amount of information that can be gathered, and provide educators with actionable insights into students’ strengths and weaknesses. Imagine an AI-powered assessment, integrated into a student’s daily instructional routine, that uses the various data points gathered to accurately summarize the student’s mastery of a subject area. Intelligent reporting tools that use the data from these integrated assessments to inform educators on which approaches work best for each student would allow for highly customized interventions and, overall, better outcomes for our students. While these changes will take time and require us to overcome a myriad of challenges (technical, ethical, regulatory, and others), I am optimistic that “responsible AI” in assessments will greatly benefit the teachers and students that we serve.
What's something in your personal or career life that has impacted you in a positive way?
I firmly believe in the philosophy that every day is an opportunity to learn something new. Often, at the end of a day, I ask myself what it is that I know now that I didn’t at the beginning of the day. It does not have to be anything profound – even a collection of trivial but useful tidbits makes for a good day. Working in a technology-forward organization like ours, the opportunity to learn something new is ever-present. This is the greatest motivator for me: being able to work with a great group of folks, pooling our collective learning to build world-class products and services that have a positive impact on society.