Making the Grade

Artificial intelligence has changed the classroom for English teachers. When ChatGPT use became commonplace, Sarah Walker, who has taught the subject in Austin ISD for the last 14 years, had her students go back to paper and pencil.

“What a lot of teachers in English have done this past year is move back to really old-fashioned methods,” she said. “My students just completed a long research paper, and we did the whole thing handwritten until the final step when they typed it up.”

But even as Walker takes measures to prevent her 10th graders at the Ann Richards School from using AI to write essays and summarize books, they must take the English State of Texas Assessments of Academic Readiness exam at the end of the school year. And ironically, Walker now makes sure her students know that their STAAR essays are no longer graded by humans, but by the “AI” she discourages.

“We do talk about it,” Walker said. “And sometimes, some teachers will use AI to score students’ writing – I think I did that for one of their STAAR-style essays – like, ‘Hey, get this feedback, [because] this is who’s going to be your evaluator.’”

That’s not new: The Texas Education Agency has been using “automated scoring engines” (ASEs) since December 2023 to grade the majority of STAAR essays (both long- and short-form responses for STAAR exam subjects like English, U.S. history, and social studies) written by third graders to high schoolers – a big switch from the all-human graders with four-year degrees and rigorous scoring training hired in previous years.

It’s a cost-cutting measure for the state of Texas. In 2023, the STAAR was redesigned, as required by House Bill 3906, to have fewer multiple choice questions and six to seven times more written responses. Since human scoring of the new test would cost up to $20 million more per year, “hybrid” scoring – using both scoring engines and human graders – would allow the state to stay within budget, Commissioner of Education Mike Morath argued to lawmakers as early as 2022.

And indeed, from 2023 to 2024, TEA hired almost 10,000 fewer human graders for STAAR (down from about 15,000), for net savings of approximately $9.8 million, according to a TEA representative.

But now that the state agency has taken over four public school districts since October based on low scores on this exam, public school advocates have questioned if ASEs, which some argue are artificial intelligence, are scoring students differently than a human grader who might award additional points for creativity and human ingenuity.

One reason for concern: School districts asked the TEA to rescore their machine-graded tests using human graders, and some of those tests came back with higher scores. After A-F accountability scores were released last year, Arlington ISD sent its tests back to the state agency. In December 2025, 38 campuses received higher scores, and in some cases, a school’s accountability score rose by an entire letter grade, from an F to a D. Dallas ISD submitted over 5,000 tests for rescoring and about a third were rescored higher by humans, The Dallas Morning News reported in July 2025.

And even as Austin ISD’s own STAAR scores slightly improved from 2024 to 2025, the school district finds itself closer to the edge of state intervention over academic accountability, all amid financial troubles. This school year, AISD has signed a charter partnership to manage three middle schools that have underperformed on the STAAR exam, and proposed deep cuts to campus-level staffing and planning time. This year’s STAAR scores will be released in early to mid-June.

One former grader for the TEA, granted anonymity because of his current employment in the education industry, remembers that 15 years ago, a student could get full credit on a fantasy narrative. Now, in the age of AI, he “really question[s] the ability of the machine to understand and evaluate a student’s essay.”

“If the student is saying something in a particularly flowery way or something … they are going to get a lower score than they deserve,” the former grader said. “And I think if that happens once, it’s one time too many.”

How does the grading work? Per the TEA, all essay responses written in English are first graded by the scoring engine. Then, about a quarter of those essays are double-checked by humans. The machine also flags responses as “low confidence” if they land between scores or contain unfamiliar language, and transfers those to human graders.

Though some teachers and public education advocates like Walker refer to the scoring engines as AI, the TEA claims otherwise. “It’s not ‘AI’ such as ChatGPT. Distinct from generative AI technology, ASEs are technology designed to examine patterns in large samples of written content,” the TEA wrote to the Chronicle.

Walter Stroup, former professor of education at UT-Austin and now U-Mass Dartmouth, said the scoring engines still use machine learning. “The process is the same, right? You’re training these models on existing [essay] examples, and then asking them to rate new examples without a human in the loop,” Stroup told the Chronicle.

David DeMatthews, professor of educational leadership and policy at UT, also labels the scoring engines as AI. But he doesn’t think the major problem is “human versus AI.” Rather, it is whether the ASEs will be sufficiently monitored over time to generate consistent scoring and report problems proactively, not just when school districts challenge the scores.

“The validity of these systems depends heavily on the training data, the consistency of human scoring used to build them, and ongoing monitoring,” DeMatthews wrote to the Chronicle. “With AI, I’m always worried about the latter part, especially over time. People stop checking. … And then, you get in trouble.”

In a statement, TEA clarified that the ASEs are trained on 3,000 human-scored essays, and are compared to human scores regularly throughout the scoring window. “If an ASE does not sufficiently replicate human scoring, the scoring process is paused, the engines are reprogrammed, and all affected responses are rescored,” the representative wrote.

DeMatthews also finds the rescoring results a cause for concern, but doesn’t consider them decisive evidence that automated grading is driving lower STAAR scores. On the other hand, as he emphasized, “Even small scoring differences can have outsized consequences … affecting whether a student passes, whether a campus earns a higher rating, or whether a district faces intervention under the A-F accountability system.”

At the end of the day, DeMatthews pressed, these changes to how the STAAR is being graded are happening alongside changes to the tests themselves, which some educators claim are making them harder to pass. School districts have challenged changes to the TEA’s accountability system, “arguing the state effectively ‘moved the finish line’ in ways that lowered scores,” he continued.

“I think the automated scoring question, at the end of the day, is fundamentally about public trust,” DeMatthews concluded. “High-stakes assessment systems only work if educators, families, and communities believe the results are fair, stable, and accurate. I’m not sure Morath and the state have that trust anymore.”

For Texas English teachers like Walker, the fact that the state is using a machine-learning engine to grade STAAR essays is disappointing, but far from surprising. “With the amount of funding that the state is wanting to put towards public education, why would they pay for human readers?” she asked.

And with full-time positions being cut under Austin ISD’s projected $181 million budget deficit, more systemic issues in Texas public education – like basic funding – push less consequential ones like AI grading to the back burner of Walker’s concerns.

“Honestly, I would probably rather [the state] put the money into the classroom than put it into paying people to read the test,” Walker said.

But Walker pushes her students to think critically and write creatively, and the state’s reliance on machine learning is far from what she aims to teach them about writing. She would rather her students turn in an unpolished essay than an AI-generated one, because “that’s what growth looks like,” she said.

“It does trouble me the role that AI is playing right now, especially when students’ brains are not fully developed,” Walker continued. “The goal of teaching kids is never to make a cog that works somewhere. It’s always to build a human that can keep preserving and improving our world.”

This article appears in May 29 • 2026.

Sammie Seamon