Signal & Noise — 2026-05-21

01 / Summary

The window, distilled

The reporting period is dominated by Andrej Karpathy's move to Anthropic, a signal that reinforces the industry's shift toward agentic engineering as a formal discipline. While frontier labs compressed a year's worth of developer updates into ten days, a growing world-models coalition (Marcus, Choi, Mitchell) has begun to present a coordinated front against the scale-only narrative. On the technical side, the open-weight bonanza has shifted focus from raw parameter counts to long-context efficiency.

02 / Headlines

What landed

Andrej Karpathy joins Anthropic
OpenAI co-founder and ex-Tesla AI Director moves to the frontier of LLMs
The Vietnam metaphor for AI backlash
Gary Marcus crystallizes growing skepticism; argues a foundation-level rethink is required
Open-weight architecture bonanza
Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5 — efficiency-first design (KV sharing, compressed attention)
Prompt optimization beats RL with 35x fewer rollouts
Agrawal's GEPA wins ICLR oral; +6-19pp over GRPO with language-medium reflection
Three frontier dev events in 10 days
Code with Claude, Google I/O (Gemini Spark, Antigravity), GPT-5.5 preview

03 / Threads

What moved across voices

The institutionalization of agentic engineering

Andrej Karpathy · Simon Willison · Garry Tan · Jack Clark

Karpathy's move to Anthropic is the endorsement; Claude Code is the reference implementation. Tan's GStack + Skillify formalizes failure-as-skill discipline. Willison ships datasette-llm-accountant + datasette-llm-limits — durable observability infrastructure for LLM workflows.

The world-models coalition vs scale-only

Gary Marcus · Yejin Choi · Melanie Mitchell

A formerly fragmented group has converged into a coordinated alternative. Marcus's May 17 piece explicitly cites Mitchell and Choi. Choi's Toronto lecture (~400 attendees) attributes jagged intelligence to absent world models. A once-contrarian fringe is now a named paradigm.

Long-context economics + architectural efficiency

Sebastian Raschka · Nathan Lambert

As models keep more tokens in context for reasoning, KV-cache size and attention cost become the binding constraint. Recent responses include KV sharing (Gemma 4), layer-wise attention budgeting (Laguna XS.2), compressed convolutional attention (ZAYA1), and mHC (DeepSeek V4). The architectural differences are real; scale is no longer the only lever.

The AI backlash crystallized

Gary Marcus · Ethan Mollick · Jack Clark

Marcus published 11 critiques in 18 days, while conceding coding as a domain where AI delivers. Mollick pushes back on methodology ('AI' is too broad a term for useful polling). Clark approaches from a policy angle, treating data sovereignty as a regulatory framework.

04 / How to Do It This Week

The practitioner synthesis

Prompting & inference 01

Append 'enumerate your uncertainty' at the end of long queries

via Andrej Karpathy · x.com/karpathy
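
For anyone applying this tip from a script rather than by hand, a minimal sketch follows. The function name, the word-count threshold, and the exact suffix wording are illustrative assumptions, not taken from the original post.

```python
# Hypothetical helper: the name, threshold, and suffix text are assumptions.
UNCERTAINTY_SUFFIX = "Enumerate your uncertainty."


def with_uncertainty_prompt(query: str, min_words: int = 200) -> str:
    """Append the uncertainty instruction to long queries only."""
    text = query.rstrip()
    if len(text.split()) < min_words:
        return text  # short query: leave it unchanged
    if text.endswith(UNCERTAINTY_SUFFIX):
        return text  # already present: do not append it twice
    return f"{text}\n\n{UNCERTAINTY_SUFFIX}"
```

The threshold is only a stand-in for "long"; tune it to whatever length your own queries reach before answers start glossing over gaps.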

Tools, repos, libraries 04

datasette-llm-accountant 0.1a4

LLM cost/usage observability layered over Datasette

via Simon Willison · simonwillison.net/2026/May/19/datasette-llm-acc…
datasette-llm-limits 0.1a0

Rate-limit governance for LLM-driven workflows

via Simon Willison · simonwillison.net/2026/May/15/datasette-llm-lim…
github.com/gepa-ai/gepa

Prompt-evolution in production; outperforms GRPO by 6-19pp with 35x fewer rollouts

via Lakshya Agrawal · iclr.cc/virtual/2026/poster/10009493
GStack + Skillify

Claude Code skills for office hours, code review, failure-to-skill conversion

via Garry Tan · www.youtube.com/watch?v=wkv2ifxPpF8

Architectural & model-selection 03

KV-sharing models (Gemma 4)

If you ship long-context inference, evaluate KV-sharing variants against your default model; the long-context cost calculation is shifting

via Sebastian Raschka · magazine.sebastianraschka.com/p/recent-developm…
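
A back-of-the-envelope estimate shows why the cost calculation moves. The sketch below uses the usual KV-cache accounting (keys plus values, per caching layer, per KV head) and models KV sharing simply as fewer layers storing their own cache. Every configuration number here is hypothetical and is not Gemma 4's actual configuration.

```python
def kv_cache_bytes(seq_len, layers_with_cache, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size for one sequence.

    The leading factor of 2 counts keys and values; bytes_per_value=2
    assumes 16-bit storage.
    """
    return 2 * layers_with_cache * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical 48-layer model at a 128k-token context.
baseline = kv_cache_bytes(seq_len=128_000, layers_with_cache=48, kv_heads=8, head_dim=128)

# Same model, assuming pairs of layers share one cache (24 caching layers).
shared = kv_cache_bytes(seq_len=128_000, layers_with_cache=24, kv_heads=8, head_dim=128)

GIB = 1024 ** 3
print(f"baseline: {baseline / GIB:.1f} GiB, shared: {shared / GIB:.1f} GiB")
```

Because the cache grows linearly with context length, any reduction in caching layers carries through to every token held in memory, which is why the comparison is worth rerunning at your own sequence lengths.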
Compressed attention (DeepSeek V4 mHC, ZAYA1)

Inference economics are changing for long-form reasoning workloads; re-baseline before committing to a vendor

via Sebastian Raschka · magazine.sebastianraschka.com/p/recent-developm…
Fast-context + slow-weight composition (FST)

Up to 3x more sample-efficient than RL, 70% less KL divergence; consider for continual learning scenarios

via Tiwari, Sareen, Agrawal et al. (UC Berkeley + Mila) · arxiv.org/html/2605.12484v1

Methodological frames 03

Skillify — every failure becomes a tested skill

Turn each agent failure into a structural skill with tests, so the bug stops being a one-off and is eliminated by structure

via Garry Tan · www.youtube.com/watch?v=wkv2ifxPpF8
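
The failure-to-skill idea can be sketched in a few lines. This is not the GStack or Skillify implementation; the class names, fields, and the exact-match check are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class FailureCase:
    """One observed agent failure, kept as a regression test."""
    name: str
    prompt: str
    expected: str


@dataclass
class Skill:
    """An instruction plus every failure case that motivated it."""
    name: str
    instruction: str
    cases: list[FailureCase] = field(default_factory=list)

    def failing_cases(self, agent: Callable[[str], str]) -> list[str]:
        # Re-run each recorded failure; return the names that still fail.
        return [c.name for c in self.cases if agent(c.prompt) != c.expected]


def skillify(skills: dict[str, Skill], skill_name: str,
             instruction: str, failure: FailureCase) -> Skill:
    """Attach a failure to a skill, creating the skill if it is new."""
    skill = skills.setdefault(skill_name, Skill(skill_name, instruction))
    skill.cases.append(failure)
    return skill
```

In use, each new failure is passed to `skillify` once, and `failing_cases` is run against the agent on every change, so a regression shows up as a named test rather than a rediscovered bug.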
Per-task AI sentiment, not aggregate

Don't ask 'do you like AI'; ask per-task 'does it help with X' — aggregate polls produce noise

via Ethan Mollick · x.com/emollick
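
A toy calculation shows how an aggregate score can hide the per-task picture. The responses below are made-up numbers on a 1-5 scale, and the task labels are assumptions, not data from any poll.

```python
from collections import defaultdict
from statistics import mean


def per_task_means(responses):
    """Group (task, score) pairs by task and average each group."""
    by_task = defaultdict(list)
    for task, score in responses:
        by_task[task].append(score)
    return {task: mean(scores) for task, scores in by_task.items()}


# Made-up survey responses: (task, 1-5 helpfulness score).
responses = [
    ("coding", 5),
    ("coding", 4),
    ("customer support", 1),
    ("customer support", 2),
]

aggregate = mean(score for _, score in responses)
by_task = per_task_means(responses)
print(aggregate, by_task)
```

Here the aggregate sits at the neutral midpoint while the two tasks sit near opposite ends of the scale, which is the noise the per-task framing is meant to avoid.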
World models over scale

Reasoning over structured knowledge graphs (ATOMIC, COMET) outperforms scale-only models on common-sense tasks: David vs Goliath

via Yejin Choi · momentum.stanford.edu/stories/david-goliath-and…

Papers worth a closer read 02

GEPA: Reflective Prompt Evolution Can Outperform RL

Prompt evolution beats RL on 6 tasks by 6-19pp with 35x fewer rollouts; natural-language reflection is a richer learning medium than scalar reward

· iclr.cc/virtual/2026/poster/10009493
Learning, Fast and Slow: Towards LLMs That Adapt Continually

Formalizes fast-context + slow-weight composition; reports 70% less KL divergence than RL-only, a sign of reduced catastrophic forgetting

· arxiv.org/html/2605.12484v1

05 / Quotes

Lines worth carrying

I've joined Anthropic. I think the next few years at the frontier of LLMs...

Andrej Karpathy

Skillify is the fix. Every failure becomes a skill. Every skill has tests. The bug becomes structural.

Garry Tan

Could generative AI turn out to be the tech industry's Vietnam? And could public backlash lead AI to a better place?

Gary Marcus

Asking people whether they like 'AI' is not going to be a useful poll.

Ethan Mollick

AI models can perform some tests extremely well, while surprising us with silly mistakes somewhere else.

Yejin Choi

07 / Source registry

The voices

Andrej Karpathy

@karpathy

blog · YouTube · X

Simon Willison

@simonw

simonwillison.net · X

Garry Tan

@garrytan

YouTube · X · GitHub

Ethan Mollick

@emollick

One Useful Thing · X

Gary Marcus

@garymarcus

Marcus on AI · X

Cassie Kozyrkov

—

Decision substack

Nathan Lambert

—

Interconnects substack

Arvind Narayanan

—

Normal Tech substack

Yejin Choi

—

papers · talks · NVIDIA/Stanford

Melanie Mitchell

@MelMitchell1

Santa Fe Institute · X

Yannic Kilcher

@ykilcher

YouTube

Sebastian Raschka

@rasbt

Ahead of AI substack · X

Lakshya Agrawal

@LakshyAAAgrawal

papers · X · UC Berkeley

MIT Technology Review

—

MIT Tech Review AI section

Jack Clark

@jackclarkSF

Import AI · X · Anthropic
