A growing body of research in computational linguistics and psycholinguistics is challenging long-held assumptions about how humans, and machines, acquire language. Studies from 2024 and 2025 show that large language models trained on developmentally realistic data can mimic key milestones of child language acquisition, prompting researchers to reconsider Noam Chomsky’s foundational claim that grammar is largely innate. The findings, emerging from collaborations between cognitive scientists and AI researchers at institutions including New York University, Stanford, and the Max Planck Institute, suggest that statistical learning from input may explain far more about linguistic competence than previously believed.
The BabyLM Challenge and a New Era of Cognitive Modeling
At the center of this shift is the BabyLM Challenge, an annual competition launched in 2023 that asks researchers to train language models on no more than about 100 million words, roughly the amount of linguistic input a human child encounters by age 13. That’s a tiny fraction of the trillions of tokens used to train commercial systems like GPT-4 or Gemini. The 2024 edition, whose results were presented at the Conference on Computational Natural Language Learning (CoNLL), showed that smaller, cognitively constrained models could rival much larger systems on certain grammar and semantics benchmarks.
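To make the scale gap concrete, here is a rough back-of-the-envelope sketch in Python. It uses the 100-million-word budget and the age of 13 cited above. The one-trillion-token figure is only an assumed stand-in for "trillions", the function names are illustrative, and words and tokens are not the same unit, so the ratio is an order-of-magnitude guide at best.

```python
# Back-of-the-envelope scale comparison. All figures are rough assumptions.
CHILD_WORDS = 100_000_000        # BabyLM word budget cited in the article
CHILD_YEARS = 13                 # age cited in the article
LLM_TOKENS = 1_000_000_000_000   # assumed: one trillion tokens, a stand-in for "trillions"


def words_per_day(total_words: int, years: int) -> float:
    """Average daily word exposure, assuming 365 days per year."""
    return total_words / (years * 365)


def scale_ratio(llm_tokens: int, child_words: int) -> float:
    """How many times larger the model's training data is than the child's input."""
    return llm_tokens / child_words


if __name__ == "__main__":
    print(f"{words_per_day(CHILD_WORDS, CHILD_YEARS):,.0f} words per day")
    print(f"{scale_ratio(LLM_TOKENS, CHILD_WORDS):,.0f} times more data")
```

Under these assumptions, a child hears on the order of twenty thousand words a day, while each trillion tokens of commercial training data amounts to about ten thousand child-sized datasets.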
The implications run deep. For decades, generative linguistics held that children must possess an innate “universal grammar” because the language they hear is too sparse and noisy to support the rich grammatical knowledge they eventually display — the so-called “poverty of the stimulus” argument. But if a neural network with no built-in grammatical rules can learn syntactic structure from a child-sized dataset, the argument loses some of its force.
What the Latest Studies Found
One widely discussed paper, published in Science by researchers at NYU’s Center for Data Science, demonstrated that a model trained exclusively on first-person video and audio recordings from a single child, captured via a head-mounted camera over 18 months, could learn to associate words with visual referents. The study suggests that even minimal, ecologically valid input is enough to bootstrap meaningful word learning without elaborate innate machinery.
Other 2025 work has focused on morphology and syntax. Models trained on developmentally plausible corpora are now reproducing well-known patterns from child language research: overgeneralization errors (“goed” instead of “went”), gradual mastery of complex relative clauses, and sensitivity to subject-verb agreement across long-distance dependencies. Linguist Tal Linzen, whose lab at NYU has been instrumental in evaluating these systems, has argued that LLMs are becoming “the most powerful tool we have for testing theories of language acquisition,” even if they are not perfect cognitive replicas.
Pushback From Traditional Linguists
Not everyone is convinced. Critics — including Chomsky himself, in a 2023 New York Times essay — have argued that statistical pattern-matching is fundamentally different from genuine linguistic competence. A model can produce grammatical sentences without “knowing” grammar in any meaningful sense, the argument goes, much as a parrot can mimic speech without understanding it. Skeptics also point out that BabyLM models still fall short on pragmatics and discourse-level reasoning, areas where children excel remarkably early.
Sociolinguists have raised additional concerns. Real children learn language embedded in social interaction, with caregivers who respond, correct, and scaffold meaning. Most LLMs, even cognitively scaled ones, learn from disembodied text. Researchers at the Max Planck Institute for Psycholinguistics are now exploring multimodal and interactive training regimes that better reflect how language unfolds in the wild.
Why It Matters Beyond the Lab
The stakes extend well past academic debate. If smaller, more efficient models can match larger ones on linguistic tasks, it could reshape how AI systems are built — favoring data quality and cognitive plausibility over brute-force scale. That has consequences for energy use, accessibility, and the global distribution of AI research, since training a BabyLM-scale system is feasible on a single university GPU. It also opens new avenues for diagnosing language disorders, designing better language-learning tools, and understanding multilingualism.
For applied linguistics, the cross-pollination between AI and traditional fields like phonology, semantics, and discourse analysis could be the most productive collaboration in a generation. Expect the 2025 BabyLM results and follow-up neurolinguistic studies — including fMRI comparisons between human and model processing — to continue blurring the lines between machine learning and the cognitive science of language.

