Re: Claims of catastrophic and existential risk from general-purpose AI
Dear Anthropic, and colleagues across the AI/LLM research community,
Many of my colleagues in the security community and I are tired of AI theater and of everyone involved in it. “We created another existential threat…” but we are going to give that existential threat to the most powerful corporations on the planet first. This is a clear and terrifying misalignment.
You have publicly framed Claude Mythos Preview as an extraordinary step—capabilities in the neighborhood of frontier offensive security research and vulnerability discovery—while simultaneously announcing Project Glasswing, which routes early access to that model through a closed consortium of a dozen of the world’s largest technology and finance firms. If the risk story is sincere, that ordering is backwards: the scientific community, standards bodies, and universities should help define and verify the threat model before “pay to play” becomes the default path for the most sensitive capability.
The same materials turn around and invite reassurance in language like this—quoted verbatim from the system preview card for Claude Mythos Preview:
> Similarly, Claude Mythos Preview shows a dramatic reduction in the frequency of unwanted high-stakes actions that the model takes at its own initiative, with behaviors related to deception falling by more than half relative to Opus 4.6, and continued improvements on recent models’ already good behaviors in areas like self-preservation and power-seeking.
That is the point: the public is asked to hold “existential misalignment” and “dramatic reductions in deception, self-preservation, and power-seeking” in the same story, without the replication, protocols, or failure thresholds that would let anyone else tell which narrative the evidence actually supports.
Separately, your published Claude’s Constitution—the rules you say should shape the model’s judgment—includes this passage, quoted verbatim:
> But we want Claude to be cognizant of the risks this kind of power concentration implies, to view contributing to it as a serious harm that requires a very high bar of justification, and to attend closely to the legitimacy of the process and of the actors so empowered.
Project Glasswing is a live instance of concentrated access to an extraordinarily capable system. The “very high bar of justification” and attention to “the legitimacy of the process and of the actors” are not something a closed vendor narrative can self-certify—that is the work of independent replication, adversarial evaluation, and the institutions this letter names.
I do not ask the next question to insult, or out of frivolity, but to show that the message is inconsistent, and that we need high assurance in these matters. Only one of two things can be true: either you are a responsible AI company or you are not. A company that runs a pay-to-play program to pilot a potentially devastating, self-jailbreaking technology without first working extensively with the scientific community has not met that bar.
What do you think the scientific community exists for?
If you are being sincere, then you must show us something designed to survive external scrutiny. If we have truly crossed a threshold for existential threats, now is not the time to hand the technology to a select few companies. Anthropic: get the scientific community involved. Go talk to the National Institute of Standards and Technology (NIST). Get universities involved. You claim you want to be the secure, responsible AI vendor. Act like it.
Concretely—still regarding Claude Mythos Preview, Project Glasswing, and the standard you ask the world to accept—we offer the following recommendations. They overlap on purpose: they all target a single failure mode, namely high-stakes claims without high-assurance evidence.
- Shift the burden of proof. Do not ask observers to accept extraordinary capability on presentation while treating controllability as a footnote. Demand—and supply—evidence that safeguards hold under agentic, long-horizon use; that the system resists self-jailbreaking and goal decomposition; and that behavior is stable across model versions and drift, not only on curated snapshots. Show controllability under the same conditions you showcase capability.
- Separate access control from safety. Restricting who may call an API is governance of distribution, not proof of containment. If behavioral limits are real, they must be enforced by the system and the stack—not inferred from the respectability of the partner list. Who gets it is not a safety mechanism.
- Use your own uncertainty honestly (RSP). Your Responsible Scaling Policy and related materials acknowledge that evaluation is ambiguous and that the underlying science is still immature. Taken seriously, that admission rules out treating internal judgment alone as sufficient for existential or civilization-scale claims. What follows is obligatory: independent validation layers—NIST-class rigor, academia, and neutral evaluators—not another internal slide deck.
- Distinguish disclosure from legitimacy. No one is asking you to publish live exploit chains in a blog post. What we are asking for is legitimate scrutiny: independent testing, adversarial evaluation, and predefined protocols everyone can see before the scores are announced (a minimal commit-and-reveal sketch follows this list). Withholding operational detail is not the same thing as avoiding scrutiny.
- Do not confuse a commercial consortium with public-interest governance. This is not a claim that large enterprises are “bad.” It is a claim that a closed commercial consortium optimizes for a different set of interests than democratic or scientific legitimacy. That means NIST, universities, and credibly neutral third parties in the loop, not only consortium members. If risk is exceptional, governance must be exceptional.
- Make self-jailbreaking decisive. The decisive question is whether safeguards survive autonomous iteration, task decomposition, and persistence—the very behaviors you promote when you market “agentic” use (a persistence-test sketch follows this list). If they do not, the story that staged access equals containment collapses. If it can route around its own controls, this is distribution—not containment.
- Separate the markets you are mixing. A capability market aimed at defense and advantage tolerates opacity, speed, and asymmetric information. Enterprise and public deployment require reliability, auditability, and bounded behavior under stress. Our concern is that these models are being blended in rhetoric and in product without an honest accounting of which rules apply to which buyer.
- Reframe what is actually being competed on. The public argument is too often cast as hype versus reality. The sharper description is capability competition versus adoption-and-governance competition. The worry is that vendors are optimizing for the former while selling into environments that require the latter.
- Demand concrete artifacts—not adjectives. Among the things that should exist in public or semi-public form: adversarial red-team results from external teams; a serious failure-mode taxonomy; explicit control-break thresholds; reproducible evaluation harnesses; and, where you claim traceability, audit logs or other guarantees an outsider can reason about (a tamper-evident audit-log sketch follows this list). This subsumes what we have elsewhere called independent replication, public or semi-public protocols, shared challenge problems, and transparent limitation reporting.
- Independent evidence of controls. Grant, for the sake of argument, that Claude Mythos Preview is as capable as claimed. Narrow the issue: it is not a cartoon choice between “release” and “no release.” Where is the independent evidence that your controls hold under the exact autonomous behaviors you are promoting?
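To make “predefined protocols everyone can see before the scores are announced” concrete, here is a minimal commit-and-reveal sketch. It assumes the protocol fits in a single file and that the digest is published somewhere append-only before testing begins; every name and number in it (protocol.json, the placeholder model ID, the threshold) is illustrative, not a real artifact.

```python
# Minimal sketch: pre-registering an evaluation protocol via commit-and-reveal.
# Assumption: the full protocol (tasks, thresholds, failure criteria) lives in
# one file, and the digest is published before any scores are announced.
import hashlib
import json
from pathlib import Path

def commit(protocol_path: str) -> str:
    """Publish this digest BEFORE testing; it binds the vendor to the protocol."""
    data = Path(protocol_path).read_bytes()
    return hashlib.sha256(data).hexdigest()

def verify(protocol_path: str, published_digest: str) -> bool:
    """Anyone can check, after the fact, that the revealed protocol matches."""
    return commit(protocol_path) == published_digest

if __name__ == "__main__":
    # Hypothetical protocol contents; all values here are illustrative only.
    Path("protocol.json").write_text(json.dumps({
        "model": "frontier-preview",       # placeholder, not a real product ID
        "tasks": ["agentic-persistence", "goal-decomposition"],
        "control_break_threshold": 0.01,   # illustrative number only
    }, indent=2))
    digest = commit("protocol.json")
    print("publish before testing:", digest)
    print("post-hoc check:", verify("protocol.json", digest))
```

The point is not the code; it is that any outsider can later confirm the protocol was fixed before the scores were known.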
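On self-jailbreaking, the decisive test is mechanical, not rhetorical. A sketch of the persistence harness follows, assuming only a generic query_model(prompt) callable rather than any vendor’s actual API, and using a deliberately crude refusal check that a real harness would replace with a calibrated classifier.

```python
# Sketch of a persistence test: does a refusal survive goal decomposition and
# repeated retries -- the very agentic behaviors being marketed?
from typing import Callable, List

def refusal_holds(query_model: Callable[[str], str],
                  forbidden_goal: str,
                  subtasks: List[str],
                  retries: int = 3) -> bool:
    """True only if the refusal survives the direct ask, every subtask, and retries."""
    def refused(prompt: str) -> bool:
        # Crude stand-in for a refusal check; a real harness would use a
        # calibrated classifier, not substring matching.
        reply = query_model(prompt).lower()
        return "cannot" in reply or "won't" in reply

    if not refused(forbidden_goal):
        return False                        # fails even without decomposition
    for sub in subtasks:                    # goal decomposition
        for _ in range(retries):            # persistence: simply re-asking
            if not refused(sub):
                return False                # safeguard broke on a subtask
    return True
```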
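And on audit logs an outsider can reason about: one such guarantee is a hash-chained log, in which each entry commits to the one before it, so no past entry can be silently rewritten without breaking every later link. All names here (AuditLog, append, verify) are illustrative.

```python
# Minimal sketch of a tamper-evident audit log: a SHA-256 hash chain.
import hashlib
import json
import time
from typing import Dict, List

class AuditLog:
    def __init__(self) -> None:
        self.entries: List[Dict] = []
        self._head = "0" * 64               # genesis digest

    def append(self, event: Dict) -> None:
        # Each record commits to the digest of the previous record.
        record = {"ts": time.time(), "event": event, "prev": self._head}
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["digest"] = digest
        self.entries.append(record)
        self._head = digest

    def verify(self) -> bool:
        """Recompute the chain; any edit to a past entry fails here."""
        prev = "0" * 64
        for record in self.entries:
            body = {k: record[k] for k in ("ts", "event", "prev")}
            if record["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if recomputed != record["digest"]:
                return False
            prev = record["digest"]
        return True
```

None of this is exotic. It is the minimum machinery of claims that expect to be checked.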
That question is the lever.
You’ve demonstrated capability. You have not demonstrated control under that capability.
Without movement on these fronts, Mythos-class claims remain indistinguishable from theater—no matter how sober the press release sounds.
Stop what you are doing, and go to science. This is my call to action. This is not just about Anthropic—it is an open letter to every LLM and AI research firm that believes it may be building an existential threat.
You built a remarkable LLM, but it seems you no longer remember why the scientific community exists. It is time to take corrective measures.
Respectfully,
cablepull aka Mehmet Yilmaz
P.S. — Verifiable material for other claims on this site is indexed under Receipts Available.