A growing body of research is reframing how scientists understand human language acquisition, with large language models (LLMs) increasingly being used as computational laboratories to test long-standing psycholinguistic theories. Recent work published in 2024 and 2025 suggests that the same statistical learning mechanisms that allow neural networks to acquire grammar may mirror — at least in part — the way infants extract patterns from the speech they hear, reigniting a decades-old debate about whether language is innate or learned from exposure.
The discussion gained renewed momentum following a wave of studies comparing the linguistic competence of transformer-based models with that of children. Researchers at New York University, MIT, and the Max Planck Institute have begun training smaller, “developmentally plausible” language models on datasets no larger than what a child hears by early adolescence (at most about 100 million words) to see whether artificial systems can replicate human-like grammatical generalisation under realistic input constraints. The effort is part of the BabyLM Challenge, an annual benchmark that explicitly limits training data to push researchers away from brute-force scaling.
From Chomsky to Transformers
For more than half a century, the field of psycholinguistics has been shaped by Noam Chomsky’s argument that children are born with an innate “universal grammar” — a biologically endowed language faculty that explains how they master complex syntax despite the supposed “poverty of the stimulus.” Critics, including usage-based linguists like Michael Tomasello, have long countered that statistical regularities in everyday speech are far richer than nativists assume, and that general-purpose learning mechanisms could plausibly do the work.
Modern LLMs are now adding empirical weight to that counterargument. A widely cited 2024 paper in Nature Machine Intelligence showed that models trained on child-directed speech corpora — such as those compiled in the long-running CHILDES database — could acquire subtle syntactic phenomena, including subject-verb agreement across long dependencies and constraints on question formation, without any built-in grammatical scaffolding. The findings do not prove that humans learn the same way, but they undercut the strong version of the poverty-of-the-stimulus claim.
What the Models Get Right — and Wrong
Despite the headlines, researchers caution that LLMs remain imperfect cognitive models. While they capture distributional patterns remarkably well, they typically require orders of magnitude more text than a child encounters, and they lack the multimodal grounding — eye contact, gesture, joint attention — that developmental psychologists consider central to early word learning. Studies from teams at Stanford and the University of Edinburgh have shown that models often fail on pragmatic inference tasks, struggling with implicature, irony, and context-dependent meaning in ways that even three-year-olds typically do not.
“The interesting question is no longer whether neural networks can learn grammar, but which aspects of human language they cannot learn from text alone,” said Tal Linzen, a computational linguist at NYU whose lab has published extensively on syntactic evaluation of language models. Benchmarks such as BLiMP, a suite of grammatical minimal pairs developed by researchers at NYU, have become standard tools for probing whether models internalise the kinds of grammatical contrasts that linguists care about.
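For readers who want to see the logic of a minimal-pair test, here is a rough sketch in Python. It is not the BLiMP code or any lab's actual pipeline: the four-sentence corpus, the sentence pairs, and every function name are hypothetical, and a simple add-one smoothed bigram counter stands in for a neural language model. The idea it illustrates is only this: a model passes an item when it assigns higher probability to the grammatical sentence than to its ungrammatical twin.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real evaluations score a trained neural model.
CORPUS = [
    "the dog barks",
    "the dogs bark",
    "the cat sleeps",
    "the cats sleep",
]


def train_bigram(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed log probability of a sentence under the bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def minimal_pair_accuracy(pairs, unigrams, bigrams):
    """Share of pairs where the grammatical sentence gets the higher score."""
    correct = sum(
        log_prob(good, unigrams, bigrams) > log_prob(bad, unigrams, bigrams)
        for good, bad in pairs
    )
    return correct / len(pairs)


UNIGRAMS, BIGRAMS = train_bigram(CORPUS)

# Each pair is (grammatical, ungrammatical); the sentences are illustrative only.
PAIRS = [
    ("the dog barks", "the dog bark"),
    ("the cats sleep", "the cats sleeps"),
]

if __name__ == "__main__":
    print(minimal_pair_accuracy(PAIRS, UNIGRAMS, BIGRAMS))
```

A bigram counter only sees adjacent words, so it can pass pairs like these while having no way to track agreement across a long dependency. That limitation is why such benchmarks are informative when applied to larger models.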
Why It Matters Beyond the Lab
The convergence of psycholinguistics and computational linguistics has practical stakes. Educational technology firms are using insights from child-scale models to design adaptive literacy tools; clinicians studying developmental language disorders are exploring whether deviations from model-predicted learning trajectories could serve as early diagnostic signals; and policy debates over AI regulation increasingly hinge on whether machine “understanding” is comparable to human understanding in any meaningful sense.
There is also a methodological shift underway. For decades, psycholinguistic experiments relied on small samples of university students reading sentences in lab booths. Now, researchers can generate predictions from a model, test them against eye-tracking or EEG data from human participants, and iterate rapidly. The result is a tighter feedback loop between theory and experiment than the field has ever had.
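One common way to turn a model's output into a prediction that can be tested against human data is surprisal, the negative log probability of a word given its context. Higher surprisal is generally expected to go with slower reading or a larger neural response. The sketch below shows only the arithmetic; the word probabilities are made-up placeholders rather than output from any real model, and the names are hypothetical.

```python
import math


def surprisal_bits(probability):
    """Surprisal in bits: -log2 of the probability the model gave the word."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical conditional probabilities P(word | preceding words).
# In practice these would come from a language model, not be typed in by hand.
WORD_PROBS = [("the", 0.5), ("dog", 0.25), ("barks", 0.125)]

SURPRISALS = [(word, surprisal_bits(p)) for word, p in WORD_PROBS]

if __name__ == "__main__":
    for word, bits in SURPRISALS:
        print(f"{word}: {bits:.1f} bits")
```

In an actual study, per-word values like these would be entered as predictors of reading times or EEG amplitudes recorded from participants.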
What to Watch Next
The next frontier is multimodality. Several research groups are training models that combine text with video, audio, and embodied interaction in simulated environments — an attempt to close the grounding gap that pure text models cannot. If those systems begin to show pragmatic competence approaching that of toddlers, the philosophical implications will be considerable. Conversely, if they plateau, the result may vindicate those who argue that something uniquely human — whether innate structure, social cognition, or embodiment — is doing essential work that no amount of data can replace.