A growing body of research published in late 2024 and 2025 suggests that large language models like ChatGPT are no longer just mimicking human language — they are actively reshaping it. Linguists at the Max Planck Institute for Human Development in Berlin reported in 2025 that distinct vocabulary patterns favored by AI chatbots are bleeding into everyday human speech, particularly in academic lectures, podcasts, and YouTube videos. The finding has reignited a broader debate about whether generative AI is quietly homogenizing global communication, and what that might mean for linguistic diversity, cultural identity, and the future of writing itself.
What the Researchers Found
The Max Planck team, led by researcher Hiromu Yakura, analyzed roughly 280,000 hours of spoken English from academic talks and YouTube broadcasts before and after ChatGPT’s November 2022 release. They found measurable spikes in the use of words like “delve,” “meticulous,” “boast,” and “swift” — terms statistically overrepresented in ChatGPT outputs compared to typical human writing. The study, covered by Scientific American and other outlets, suggests a feedback loop: AI models trained on human text are now influencing the humans who interact with them, who in turn produce new text that future models will absorb.
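The core method behind findings like these is simple to illustrate: count how often a set of "AI-flavored" marker words appears per thousand tokens in transcripts recorded before and after a cutoff date, and compare the rates. The sketch below is a toy version of that idea, using the marker words named in the study but invented one-line "transcripts" in place of the hundreds of thousands of hours of real speech the researchers analyzed; it is illustrative only, not the team's actual pipeline.

```python
import re

# Words the Max Planck study found overrepresented in ChatGPT output.
AI_MARKER_WORDS = {"delve", "meticulous", "boast", "swift"}

def marker_rate(transcripts):
    """Occurrences of marker words per 1,000 tokens across transcripts."""
    tokens = [w for t in transcripts for w in re.findall(r"[a-z']+", t.lower())]
    hits = sum(1 for w in tokens if w in AI_MARKER_WORDS)
    return 1000 * hits / len(tokens)

# Hypothetical toy transcripts, labeled by recording period relative to
# ChatGPT's November 2022 release.
pre_release = ["we looked into the data and found a quick rise in usage"]
post_release = ["we delve into the data and find a swift, meticulous rise"]

print(f"before: {marker_rate(pre_release):.1f} per 1k tokens")
print(f"after:  {marker_rate(post_release):.1f} per 1k tokens")
```

A real analysis would also need controls the study applied, such as checking synonyms that ChatGPT does not favor, to rule out ordinary background drift in vocabulary.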
This phenomenon is sometimes called “linguistic seepage” or “AI lexical contagion.” It builds on earlier work showing that scientific abstracts, peer reviews, and even casual emails have shifted vocabulary distributions since the rise of generative chatbots. A separate analysis by researchers at Stanford and elsewhere documented that up to 17 percent of recent computer-science peer reviews show signs of AI-assisted drafting, with telltale phrasing patterns becoming statistically detectable.
Why This Matters Beyond Vocabulary
Linguists have long observed that languages evolve through contact — between speakers, dialects, and now, apparently, between humans and machines. But the speed and scale of AI-driven change are unprecedented. Where previous lexical shifts unfolded over decades through mass media or migration, ChatGPT-style influence appears to register within months. Critics worry this could accelerate the erosion of regional dialects, idioms, and stylistic idiosyncrasies that give languages their texture.
“Word choices shape how we think, build identity, and bond with one another,” Yakura told reporters, echoing concerns expressed by sociolinguists who argue that homogenized vocabulary may flatten the cultural distinctiveness embedded in everyday speech. Organizations including UNESCO, which tracks endangered languages worldwide, have separately warned that AI tools — overwhelmingly trained on English and a handful of other dominant languages — risk widening the gap between linguistic “haves” and “have-nots.”
The Feedback Loop Problem
Computer scientists have flagged a related concern known as “model collapse,” where AI systems trained increasingly on AI-generated text begin to lose nuance, edge cases, and rare expressions. A 2024 paper published in Nature demonstrated that successive generations of models trained on synthetic data tend to converge on bland, statistically average outputs. If human writing also drifts toward chatbot norms, the distinction between machine-generated and human-generated training data may blur — with consequences for both AI development and human creativity.
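The mechanism is easy to caricature in a few lines of code. In the toy simulation below, each "generation" of a model fits a normal distribution to samples drawn from the previous generation's model. Because each refit sees only a finite sample, rare tail values are gradually lost and the estimated spread tends to shrink toward zero over many generations. This is a deliberately simplified sketch of the collapse dynamic, with made-up parameters, not a reproduction of the Nature experiments, which involved actual language models.

```python
import random
import statistics

def collapse_demo(generations=300, sample_size=20, seed=0):
    """Refit a Gaussian on samples from the previous fit; return spreads."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0            # generation 0: the "true" distribution
    spreads = [sigma]
    for _ in range(generations):
        # Synthetic data drawn from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...becomes the only training data for the next model.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        spreads.append(sigma)
    return spreads

spreads = collapse_demo()
print(f"initial spread: {spreads[0]:.2f}, final spread: {spreads[-1]:.3f}")
```

The loss of spread here stands in for the loss of "nuance, edge cases, and rare expressions" the paper describes: once the tails drop out of the training data, no later generation can recover them.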
Some scholars push back against alarmism. Languages have always borrowed from new technologies: the printing press, telegraph, radio, and internet each introduced vocabulary and stylistic conventions that critics initially deplored. From this view, AI is simply the latest chapter in a long history of technologically mediated language change, and human speakers will adapt, resist, and innovate as they always have.
What to Watch Next
Researchers are now examining whether similar patterns appear in non-English languages, where training data is sparser and AI fluency is more uneven. Educators, meanwhile, are grappling with how to teach writing in a world where students routinely consult chatbots before submitting essays. Expect heightened scrutiny in 2026 of how AI companies disclose training data, how schools evaluate originality, and whether linguistic diversity becomes a formal policy concern alongside data privacy and algorithmic bias. The deeper question — whether humans are training the machines, or the machines are training us — is unlikely to be resolved soon.