Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science [A paper review]

22 Mar, 2026

Paper Review: Why AI Systems Don’t Learn — and What to Do About It

This paper by Emmanuel Dupoux, Yann LeCun, and Jitendra Malik argues that despite the success of modern machine learning, current AI systems do not truly “learn” in the way humans or animals do. The central claim is that learning in today’s AI is externalized. Data collection, curation, training, and evaluation are all handled by human experts, while the model itself remains passive. Once deployed, it stops learning altogether, making it brittle in dynamic real-world settings.

The authors highlight domain mismatch as a key issue. Models are trained on static datasets but operate in environments that are constantly changing and unpredictable. Humans, in contrast, adapt continuously through interaction. On the authors' account, this gap is why scaling data and models alone is unlikely to lead to general intelligence.

To address this, the paper introduces the idea of autonomous learning, which requires three capabilities: selecting one’s own data, evaluating one’s own knowledge, and flexibly switching between learning strategies. These are framed as active learning, meta-cognition, and meta-control, respectively. The argument is that these abilities are fundamental across biological systems but largely absent in current AI.

The main contribution is the A–B–M architecture. System A learns from observation, much as self-supervised learning does. System B learns from action, as in reinforcement learning. The novel component is System M, a meta-controller that monitors internal signals such as uncertainty and dynamically orchestrates how and when learning happens. The proposal is that this internal control mechanism replaces the rigid, human-designed training pipelines used today.
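To make the division of labour concrete, here is a minimal toy sketch of the A–B–M loop in Python. It is not the authors' implementation: the class names, the running-mean model in System A, the bandit-style update in System B, the threshold rule in System M, and the choice to trigger action when uncertainty is high are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SystemA:
    """Learns from observation: a running mean acts as a trivial predictive model."""
    estimate: float = 0.0
    count: int = 0

    def update(self, observation: float) -> float:
        # Prediction error measured before the model is updated.
        error = abs(observation - self.estimate)
        self.count += 1
        self.estimate += (observation - self.estimate) / self.count
        return error


@dataclass
class SystemB:
    """Learns from action: keeps one value estimate per action."""
    values: List[float] = field(default_factory=lambda: [0.0, 0.0])
    step_size: float = 0.5  # assumed learning rate

    def act(self) -> int:
        # Greedy choice; ties resolve to the lowest action index.
        return self.values.index(max(self.values))

    def update(self, action: int, reward: float) -> None:
        self.values[action] += self.step_size * (reward - self.values[action])


@dataclass
class SystemM:
    """Hypothetical meta-controller: picks a learning mode from an uncertainty signal."""
    threshold: float = 0.5  # assumed value, not taken from the paper

    def choose_mode(self, uncertainty: float) -> str:
        # Assumption: high uncertainty triggers learning by acting,
        # low uncertainty means passive observation is enough.
        return "act" if uncertainty > self.threshold else "observe"


def run(
    observations: List[float],
    reward_fn: Callable[[int], float],
    threshold: float = 0.5,
) -> Tuple[List[str], SystemA, SystemB]:
    a, b, m = SystemA(), SystemB(), SystemM(threshold)
    uncertainty = 1.0  # start out uncertain (assumed initial value)
    modes: List[str] = []
    for obs in observations:
        mode = m.choose_mode(uncertainty)
        modes.append(mode)
        if mode == "act":
            action = b.act()
            b.update(action, reward_fn(action))
        # System A always watches; its prediction error is the signal M monitors.
        uncertainty = a.update(obs)
    return modes, a, b


modes, system_a, system_b = run(
    [1.0, 1.0, 1.0, 1.0],
    reward_fn=lambda action: 1.0 if action == 0 else 0.0,
)
print(modes)
```

In this toy run the controller chooses to act while System A's prediction error is high and falls back to observation once the error drops below the threshold. A real System M would presumably draw on much richer signals than a single scalar, which is exactly the part the paper leaves open.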

A key insight of the paper is that intelligence emerges from the interaction between observation and action. System A provides structured representations and predictive models, while System B generates data through exploration and grounding. The lack of tight integration between these systems is identified as a major limitation in current approaches.

The paper also proposes an evolutionary-developmental framework to train such systems. Instead of designing everything manually, the idea is to learn initial structures and learning strategies over many simulated lifecycles. This leads to a bi-level optimization problem, where learning itself becomes the object of optimization.
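One generic way to write down a bi-level problem of this kind is sketched below. The notation is an assumption made for this review and is not taken from the paper: $\phi$ stands for the evolved initial structure and learning strategy, $\theta$ for what the agent learns within one simulated lifetime in environment $e$, $\mathcal{L}_{\text{life}}$ for the inner learning objective, and $F$ for an outer fitness measure.

```latex
\begin{aligned}
\phi^{*} &= \arg\max_{\phi} \; \mathbb{E}_{e \sim p(e)}
            \left[ F\big(\theta^{*}_{e}(\phi),\, e\big) \right]
            && \text{(outer loop: evolution across lifecycles)} \\
\text{s.t.} \quad
\theta^{*}_{e}(\phi) &= \arg\min_{\theta} \; \mathcal{L}_{\text{life}}(\theta;\, \phi,\, e)
            && \text{(inner loop: learning within one lifetime)}
\end{aligned}
```

The outer problem never adjusts $\theta$ directly. It only changes $\phi$, the conditions under which inner learning happens, which is the sense in which learning itself becomes the object of optimization.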

While the framework is conceptually strong, it remains largely theoretical. System M, in particular, is described at a high level without concrete implementation details. The computational demands of evolutionary training and the need for rich simulation environments also make the proposal difficult to realize in practice at present.

Overall, the paper is best understood as a roadmap rather than a solution. It shifts the focus from building better models to building systems that can learn autonomously over time. By grounding its arguments in cognitive science, it offers a compelling direction for future AI research, even if many of the proposed ideas will take years to materialize.