A Return After Seven Years: Why Jack Clark's Staged Release Is Back

Most people don’t remember, but this isn’t the first time the AI industry has declared “this model is too dangerous to release.” And the same person sits at the center of both decisions.

It started with déjà vu

Not long ago, Anthropic announced that it would not publicly release its new frontier model, Claude Mythos Preview. Instead, access is limited to Project Glasswing, a consortium of 11 partners including AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, and the Linux Foundation, while some 40 additional organizations are getting help scanning critical infrastructure with the model. Anthropic pledged $100 million in usage credits and $4 million in open-source security donations.

The reason: “too dangerous to release.” To many people this scene will feel unfamiliar — especially after the past several years, when it was a given that AI models get commercialized, opened up via API, and released as open weights.

For me, though, it was déjà vu. Because exactly seven years ago, someone made almost the same decision. And the same person stands at the center of the Mythos non-release decision today.

Jack Clark. Today I want to talk about him.

2019: the decision the industry laughed at

In February 2019, OpenAI announced GPT-2 with an unusual declaration: “We will not release the full 1.5B model.” The stated reason was misuse — scenarios like automated fake news, email impersonation, and social media manipulation.

The reaction was icy.

“Isn’t this a performance? Looks like a publicity stunt.”
“Someone will replicate it anyway. Withholding is pointless.”
“They’re overestimating their own model.”

Chip Huyen, then a deep learning engineer, told MIT Technology Review: “I don’t think the staged release is very useful in this case, because the work is easily replicable. But it might be useful in the sense that it sets a precedent for future projects.” In the end, that turned out half right and half wrong. More on that later.

A fact that matters from today’s vantage point: OpenAI back then was not the commercialized organization it is now. Its identity was closer to a nonprofit research institute — and even an organization like that got mocked for choosing to withhold. That tells you the mood of the time.

The person who designed that decision and testified in person before the U.S. House Permanent Select Committee on Intelligence on June 13, 2019 was OpenAI’s policy director, Jack Clark.

In his testimony, Clark proposed the prototype of a new norm: “staged release” — publishing progressively from smaller models, buying society and the research community time to adapt. OpenAI did exactly that: 124M (February) → 355M (May) → 774M (August) → 1.5B (November).

The industry chose a different answer

In the end, the large-scale misuse everyone feared never materialized. The full 1.5B model released in November went out without incident. Clark’s “staged release” frame never became the industry standard.

Instead, the industry chose a different answer: “don’t withhold — ship with safeguards attached.” Red teaming, system cards, Responsible Scaling Policies, RLHF-based safety layers, bug bounty programs — all products of that choice. GPT-3 became accessible via API, ChatGPT launched as a public product, and Meta released LLaMA as an open model.

“Secure and then release.” That was the industry’s default policy for seven years.

Huyen’s 2019 remark that the staged release might be useful because “it sets a precedent for future projects” was right in this sense. The twist is that the precedent took hold not as “withholding” but as “pre-release vetting.”

Clark’s seven years

Clark left OpenAI in December 2020. A few months later he resurfaced as a co-founder of Anthropic, the company built by OpenAI alumni including the siblings Dario and Daniela Amodei. Anthropic went on to become a major engine of the industry’s safety practices — Constitutional AI, the Responsible Scaling Policy, detailed system cards. Much of it was driven by OpenAI alumni who hadn’t agreed with Sam Altman’s “vibes”-based approach.

Then in March 2026, Clark became Anthropic’s Head of Public Benefit and the head of the newly founded Anthropic Institute, a research organization addressing the most serious challenges AI poses to society.

A month later, Project Glasswing was announced.

The day before the Glasswing announcement, Clark wrote in issue 452 of his newsletter Import AI:

“AI that is especially good at helping you find vulnerabilities in code for defensive purposes can easily be repurposed for offensive purposes.”

He called AI an “everything machine,” adding that each new model generation doubles the policy problems.

In essence, it’s the same thing he told Congress seven years earlier. What changed is the danger level of the models he deals with.

This time, the evidence came first

The core reason Clark’s 2019 staged release drew ridicule was that it was prevention without proof. It was a hold based on the mere possibility that the model might generate fake news, and the critics’ charge of “selling fear without evidence” had a point.

Mythos starts from an entirely different premise. Look at the cases published on Anthropic’s Frontier Red Team blog.

OpenBSD TCP SACK — a 27-year-old bug

In an OS famous for making security a design principle, a vulnerability had been hiding in the TCP SACK implementation added in 1998. Mythos found it.

The structure, briefly: OpenBSD tracks SACK state in a singly linked list. If an attacker sends SACK blocks under specific conditions, a code path triggers in which the last node of the list is deleted and, in the same pass, a new hole is appended. The append then attempts a write through a pointer that is already NULL, and the kernel crashes.

Under normal conditions, the two prerequisites can’t hold simultaneously. But by exploiting signed integer overflow in the comparison of 32-bit TCP sequence numbers, an attacker can place the start of a SACK block 2^31 away from the legitimate window and satisfy both at once.
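To see why a distance of 2^31 is special, here is a minimal sketch of wraparound-aware sequence comparison. It assumes the common BSD-style convention of testing (int)(a - b) < 0. It is not the OpenBSD code Mythos analyzed, and the names to_int32, seq_lt, window_start, near, and far are hypothetical, chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # TCP sequence numbers are 32 bits wide


def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def seq_lt(a: int, b: int) -> bool:
    """Wraparound-aware 'a comes before b', modeled on (int)(a - b) < 0."""
    return to_int32(a - b) < 0


# Hypothetical values, not taken from any real connection.
window_start = 1_000                      # start of the legitimate window
near = (window_start + 500) & MASK32      # a value just inside the window
far = (window_start + 2**31) & MASK32     # a value exactly 2^31 away

# Ordinary case: the ordering is consistent in both directions.
print(seq_lt(window_start, near), seq_lt(near, window_start))  # True False

# Edge case: at a distance of 2^31 the difference is the most negative
# int32 whichever way you subtract, so each value "comes before" the other.
print(seq_lt(window_start, far), seq_lt(far, window_start))    # True True
```

In this toy model, two checks that are normally mutually exclusive ("a is before b" and "b is before a") both pass for the same pair of numbers, which is the kind of contradiction the paragraph above describes.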

A bug that survived thousands of code reviews, dozens of major releases, 27 years. The full cost of 1,000 scaffold runs was under $20,000, or less than $20 per run on average.

FFmpeg H.264 — the bug fuzzers missed five million times

FFmpeg is one of the most heavily fuzzed media libraries in the world. A type-mismatch bug in the H.264 decoder entered the codebase in 2003 and became exploitable through a 2010 refactoring, yet automated testing tools failed to catch it despite hitting that line of code five million times. Mythos found it by reasoning about what the code means.

FreeBSD NFS RCE (CVE-2026-4747) — a 17-year-old remote root

A remote code execution vulnerability in the FreeBSD NFS server, hidden for 17 years. It yields root without authentication. According to Anthropic, after the initial prompt Mythos carried out the entire process — from discovering the vulnerability to building a working exploit — without human intervention, delivering a 20-gadget ROP chain split across 6 packets.

On an exploit-development benchmark using a Firefox 147 vulnerability, Opus 4.6 succeeded twice across hundreds of attempts; Mythos succeeded 181 times. On CyberGym: Mythos 83.1% vs. Opus 4.6 66.6%. A signal that the model is doing something fundamentally different.

The change the community is feeling

If this were only Anthropic’s claim, I’d be skeptical. But the security community is independently seeing the same signal.

Linux kernel maintainer Greg Kroah-Hartman put it memorably:

“Until a few months ago we were getting what we called ‘AI slop’ — obviously wrong, low-quality AI-generated security reports. It was kind of funny. Not even worrying. Then about a month ago, something changed. The world switched.”

curl maintainer Daniel Stenberg wrote that he now spends several hours a day processing AI-generated vulnerability reports.

Security researcher Thomas Ptacek argued in his late-March essay “Vulnerability Research Is Cooked” that coding agents will fundamentally change the practice and the economics of exploit development.

Nicholas Carlini, a security researcher at Anthropic, said in the Glasswing announcement video: “In the past few weeks I’ve found more bugs than I’ve found in my entire life.”

Simon Willison’s comment is worth quoting:

“‘Our model is too dangerous to release’ is a great line for hyping a new product. But in this case, the caution looks justified.”

The counterarguments exist too

To be fair, the opposing view deserves space. Security research teams like AISLE and Vidoc Security Lab reproduced some of the bugs Mythos found, using public models (GPT-5.4, Claude Opus 4.6).

AISLE: 8 of 8 public models detected the vulnerability behind the FreeBSD exploit; even a 3.6B-parameter model managed it
Vidoc: reproduced the FreeBSD, Botan, and OpenBSD cases with GPT-5.4 and Claude Opus 4.6; FFmpeg and wolfSSL ended in partial success

Their argument: discovery capability is already widespread, and Mythos’s real differentiation lies in exploit construction and operationalization. The reading is that this is not “something only Mythos can do” but “the upper layer of a change already underway.”

That view doesn’t actually contradict Clark’s dual-use thesis. “Public models are catching up” also means Mythos-class capability will soon proliferate. That is the very reason Glasswing exists.

History came back, in a different shape

Set Clark’s 2019 staged release and the 2026 Mythos non-release side by side, and you can see one person’s principle walking a seven-year arc.

What changed:

Nature of the grounds: theoretical concern → demonstrated capability
Scope: staged release of the model itself → restricted release confined to one domain (cybersecurity)
The ecosystem’s understanding: a lonely claim → resonance from independent observers like Kroah-Hartman, Stenberg, and Ptacek
Execution: a simple hold → a defensive-use structure with 11 partners, 40-plus organizations, and $100 million

What didn’t change:

The core thesis: some capabilities need time for society to adapt
The critics’ reaction: “it’ll be replicated anyway,” “they’re overestimating it”
The person at the center: Jack Clark

The decision mocked as “too cautious” in 2019 is starting to look like “the obvious decision” in 2026. It’s not that history repeats — the same logic came back, having met different evidence.

The real test is coming soon. OpenAI is reported to have finished pretraining its next model, codenamed “Spud.” Altman is said to have told people internally that “a strong model that will accelerate the economy” is weeks away. If Spud shows Mythos-level cybersecurity capability, OpenAI’s release strategy will reveal whether Anthropic’s caution becomes the industry standard — or remains the exception.

The protocol the industry discarded seven years ago — can Clark’s idea take root this time?

References

Anthropic Frontier Red Team, Claude Mythos Preview
Anthropic, Project Glasswing announcement
Jack Clark, Testimony before the U.S. House Permanent Select Committee on Intelligence (June 13, 2019)
Jack Clark, Import AI #452
OpenAI, Release Strategies and the Social Impacts of Language Models (2019)
AISLE, AI Cybersecurity After Mythos: The Jagged Frontier
Vidoc Security Lab, We Reproduced Anthropic’s Mythos Findings With Public Models
Thomas Ptacek, Vulnerability Research Is Cooked