Signal & Noise · 2026-05-21

Karpathy joins Anthropic.

18-day window: 2026-05-04 → 2026-05-21
01 / Summary

The window, distilled

The reporting period is dominated by Andrej Karpathy's move to Anthropic, a signal that reinforces the industry's shift toward Agentic Engineering as a formal discipline. While frontier labs compressed a year's worth of developer updates into a single week, a growing World Model Coalition (Marcus, Choi, Mitchell) has begun to present a coordinated front against the scale-only narrative. On the technical side, the open-weight bonanza has shifted focus from raw parameter counts to long-context efficiency.

02 / Headlines

What landed

  1. Andrej Karpathy joins Anthropic
    OpenAI co-founder and ex-Tesla AI Director moves to the frontier of LLMs
  2. The Vietnam metaphor for AI backlash
    Gary Marcus crystallizes growing skepticism; argues foundation-level rethink is required
  3. Open-weight architecture bonanza
    Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5 — efficiency-first design (KV sharing, compressed attention)
  4. Prompt optimization beats RL with 35x fewer rollouts
    Agrawal's GEPA wins ICLR oral; +6-19pp over GRPO with language-medium reflection
  5. Three frontier dev events in 10 days
    Code w/ Claude, Google I/O (Gemini Spark, Antigravity), GPT-5.5 preview
03 / Threads

What moved across voices

The institutionalization of agentic engineering

Andrej Karpathy · Simon Willison · Garry Tan · Jack Clark

Karpathy's move to Anthropic is the endorsement; Claude Code is the reference implementation. Tan's GStack + Skillify formalizes failure-as-skill discipline. Willison ships datasette-llm-accountant + datasette-llm-limits — durable observability infrastructure for LLM workflows.

The world-models coalition vs scale-only

Gary Marcus · Yejin Choi · Melanie Mitchell

A formerly fragmented group has converged into a coordinated alternative. Marcus's May 17 piece explicitly cites Mitchell and Choi. Choi's Toronto lecture (~400 attendees) attributes jagged intelligence to absent world models. What was once a contrarian fringe is now a named paradigm.

Long-context economics + architectural efficiency

Sebastian Raschka · Nathan Lambert

As models keep more tokens around for reasoning, KV-cache size and attention cost become the binding constraint. This window's releases answer with KV sharing (Gemma 4), layer-wise attention budgeting (Laguna XS.2), compressed convolutional attention (ZAYA1), and mHC (DeepSeek V4). The architectural differences are real; scale alone is being supplemented.

The AI backlash crystallized

Gary Marcus · Ethan Mollick · Jack Clark

Marcus published 11 critiques in 18 days. Mollick pushes back on methodology: "AI" is too broad a term for useful polling. Marcus concedes that coding is a domain where AI delivers. Clark approaches from a policy angle, with data sovereignty as a regulatory framework.

04 / How to Do It This Week

The practitioner synthesis

Prompting & inference 01
  • Append 'enumerate your uncertainty' at the end of long queries
    via Andrej Karpathy · x.com/karpathy
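
A minimal sketch of this tip as a prompt helper. The function name, the capitalization of the suffix, and the blank-line separator are assumptions for illustration; the source only gives the phrase itself.

```python
# Hypothetical helper: the source gives only the phrase, not its exact form.
UNCERTAINTY_SUFFIX = "Enumerate your uncertainty."


def with_uncertainty_request(query: str, suffix: str = UNCERTAINTY_SUFFIX) -> str:
    """Return the query with the uncertainty request as its final line."""
    body = query.rstrip()
    if body.lower().endswith(suffix.lower()):
        return body  # already present, so do not append it twice
    return f"{body}\n\n{suffix}"


if __name__ == "__main__":
    print(with_uncertainty_request("Summarize the attached report."))
```

Placing the request last keeps it adjacent to where the model starts generating, which is the point of appending rather than prepending.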
Tools, repos, libraries 04
Architectural & model-selection 03
  • KV-sharing models (Gemma 4)
    If you ship long-context inference, evaluate KV-sharing variants against your default; the long-context cost calculation is shifting
  • Compressed attention (DeepSeek V4 mHC, ZAYA1)
    Inference economics are changing for long-form reasoning workloads; re-baseline before committing to a vendor
  • Fast-context + slow-weight composition (FST)
    Up to 3x more sample-efficient than RL, 70% less KL divergence; consider for continual learning scenarios
    via Tiwari, Sareen, Agrawal et al. (UC Berkeley + Mila) · arxiv.org/html/2605.12484v1
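
To re-baseline long-context cost, a back-of-the-envelope KV-cache estimate is a reasonable first step. The sketch below uses the standard size formula (keys plus values, per caching layer, per KV head). The model dimensions are hypothetical, and modelling KV sharing as "groups of consecutive layers reuse one cache" is an assumption, not a description of how Gemma 4 or any specific model implements it.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2, batch_size: int = 1,
                   kv_share_group: int = 1) -> int:
    """Estimate KV-cache size in bytes for a decoder-only transformer.

    kv_share_group is the number of consecutive layers assumed to reuse
    one set of cached keys/values (1 means no sharing).
    """
    if kv_share_group < 1:
        raise ValueError("kv_share_group must be >= 1")
    # Layers that store their own K and V tensors (ceiling division).
    caching_layers = -(-n_layers // kv_share_group)
    # The factor 2 accounts for keys plus values.
    return (2 * caching_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


if __name__ == "__main__":
    # Hypothetical 32-layer model, 8 KV heads of width 128, fp16, 128K tokens.
    cfg = dict(seq_len=131_072, n_layers=32, n_kv_heads=8, head_dim=128)
    baseline = kv_cache_bytes(**cfg)
    shared = kv_cache_bytes(**cfg, kv_share_group=2)
    print(f"baseline: {baseline / 2**30:.1f} GiB")
    print(f"shared:   {shared / 2**30:.1f} GiB")
```

Under these assumed dimensions the cache halves when pairs of layers share it. Real savings depend on the model's actual sharing pattern, so measure on your own workload before drawing conclusions.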
Methodological frames 03
  • Skillify — every failure becomes a tested skill
    Turn each agent failure into a structural skill with tests, so the bug stops being a one-off and is eliminated by structure
  • Per-task AI sentiment, not aggregate
    Don't ask 'do you like AI'; ask per-task 'does it help with X' — aggregate polls produce noise
    via Ethan Mollick · x.com/emollick
  • World models over scale
    Common-sense reasoning via structured knowledge graphs (ATOMIC, COMET) outperforms scale-only on common-sense tasks — David vs Goliath
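
One way to picture the Skillify discipline is a registry where each observed failure is stored as a permanent test case on a named skill. This is an illustrative sketch only: `Skill`, `SkillRegistry`, and the path-normalizing example are hypothetical names, not part of Tan's GStack or Skillify tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]  # (input, expected output)


@dataclass
class Skill:
    name: str
    run: Callable[[str], str]
    tests: List[Case] = field(default_factory=list)

    def failing_cases(self) -> List[Case]:
        """Return every recorded case the current implementation gets wrong."""
        return [(inp, exp) for inp, exp in self.tests if self.run(inp) != exp]


class SkillRegistry:
    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def record_failure(self, name: str, failing_input: str, expected: str) -> None:
        """Turn an observed failure into a test that stays with the skill."""
        self.skills[name].tests.append((failing_input, expected))

    def check_all(self) -> Dict[str, List[Case]]:
        return {name: s.failing_cases() for name, s in self.skills.items()}


if __name__ == "__main__":
    registry = SkillRegistry()
    registry.register(Skill("normalize_path", lambda p: p.lower()))
    # The agent mishandled a trailing slash once; record it as a test.
    registry.record_failure("normalize_path", "/Docs/", "/docs")
    print(registry.check_all())  # the recorded case is reported as failing
    # Fix the skill; the recorded case now guards against regression.
    registry.skills["normalize_path"].run = lambda p: p.lower().rstrip("/")
    print(registry.check_all())
```

The design choice is that failures attach to the skill rather than to a one-off bug report, so every future change to the skill is checked against everything that has gone wrong before.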
Papers worth a closer read 02
  • GEPA: Reflective Prompt Evolution Can Outperform RL
    Prompt evolution beats RL on 6 tasks by 6-19pp with 35x fewer rollouts — language reflection IS a richer learning medium than scalar reward
  • Learning, Fast and Slow: Towards LLMs That Adapt Continually
    Formalizes fast-context + slow-weight composition; reduces catastrophic forgetting by 70% vs RL-only
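
The intuition behind "language reflection as a learning medium" can be shown with a toy reflect-and-revise loop. This is not GEPA's algorithm: the evaluator and reflector below are hypothetical deterministic stand-ins for LLM calls, and the loop omits the candidate selection a real system would need. It only shows how a textual critique tells the optimizer what to change, where a scalar score alone would not.

```python
from typing import Callable, Tuple

Feedback = Tuple[float, str]  # (score in [0, 1], natural-language critique)


def evolve_prompt(seed: str,
                  evaluate: Callable[[str], Feedback],
                  reflect: Callable[[str, str], str],
                  budget: int) -> Tuple[str, float, int]:
    """Greedy loop: revise the best prompt using the critique of its last rollout."""
    best_prompt = seed
    best_score, critique = evaluate(seed)
    rollouts = 1
    while rollouts < budget and best_score < 1.0:
        candidate = reflect(best_prompt, critique)
        score, new_critique = evaluate(candidate)
        rollouts += 1
        if score > best_score:  # keep only strict improvements
            best_prompt, best_score, critique = candidate, score, new_critique
    return best_prompt, best_score, rollouts


# Toy stand-ins (assumptions): a real setup would call a model in both places.
REQUIRED = ["cite sources", "answer in JSON"]


def toy_evaluate(prompt: str) -> Feedback:
    missing = [r for r in REQUIRED if r not in prompt]
    score = 1 - len(missing) / len(REQUIRED)
    critique = f"missing instruction: {missing[0]}" if missing else "ok"
    return score, critique


def toy_reflect(prompt: str, critique: str) -> str:
    prefix = "missing instruction: "
    if critique.startswith(prefix):
        return f"{prompt} Always {critique[len(prefix):]}."
    return prompt


if __name__ == "__main__":
    print(evolve_prompt("Summarize the paper.", toy_evaluate, toy_reflect, budget=10))
```

In the toy setting each rollout's critique names exactly what is missing, so the loop converges in a handful of rollouts; a scalar-only signal would have to discover the same edits by trial and error.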
05 / Quotes

Lines worth carrying

I've joined Anthropic. I think the next few years at the frontier of LLMs...
Andrej Karpathy
Skillify is the fix. Every failure becomes a skill. Every skill has tests. The bug becomes structural.
Garry Tan
Could generative AI turn out to be the tech industry's Vietnam? And could public backlash lead AI to a better place?
Gary Marcus
Asking people whether they like 'AI' is not going to be a useful poll.
Ethan Mollick
AI models can perform some tests extremely well, while surprising us with silly mistakes somewhere else.
Yejin Choi
06 / Cross-references

Who built on whom

07 / Source registry

The voices

Andrej Karpathy
@karpathy
blog · YouTube · X
Simon Willison
@simonw
simonwillison.net · X
Garry Tan
@garrytan
YouTube · X · GitHub
Ethan Mollick
@emollick
One Useful Thing · X
Gary Marcus
@garymarcus
Marcus on AI · X
Cassie Kozyrkov
Decision substack
Nathan Lambert
Interconnects substack
Arvind Narayanan
Normal Tech substack
Yejin Choi
papers · talks · NVIDIA/Stanford
Melanie Mitchell
@MelMitchell1
Santa Fe Institute · X
Yannic Kilcher
@ykilcher
YouTube
Sebastian Raschka
@rasbt
Ahead of AI substack · X
Lakshya Agrawal
@LakshyAAAgrawal
papers · X · UC Berkeley
MIT Technology Review
MIT Tech Review AI section
Jack Clark
@jackclarkSF
Import AI · X · Anthropic