
OpenMythos: Someone Reverse-Engineered Anthropic's Most Dangerous AI. It Got 10,000 GitHub Stars in Two Weeks.

Anthropic decided not to publicly release Claude Mythos, its most capable model, because internal testing found it could autonomously discover zero-day vulnerabilities in every major operating system and browser. The decision was announced in April, and the model was locked inside Project Glasswing — a vetted consortium of twelve organizations including Apple, Google, and Microsoft. Two weeks later, a 22-year-old developer named Kye Gomez published OpenMythos on GitHub: a from-scratch reconstruction of what Mythos's architecture might be, built entirely from public research papers. The repo has no trained weights, no access to Anthropic's proprietary systems, and no official connection to the company. It also has over 10,000 GitHub stars.

The stars are not a measure of the project's accuracy. They are a measure of what happens when you tell the AI community that a model is too dangerous to release.

What OpenMythos Actually Is

Gomez spent weeks reading every public paper he could find on recurrent transformer architectures, inference-time scaling, and the behavioral patterns researchers had reported from Claude Mythos Preview. His core hypothesis is that Mythos is a Recurrent-Depth Transformer (RDT) — sometimes called a Looped Transformer — rather than a conventional stacked-layer architecture.

The distinction matters. A standard transformer gets deeper by stacking more unique layers in sequence: each layer processes the activations once, then passes them to the next. A Recurrent-Depth Transformer instead takes a smaller set of layers and passes the input through them repeatedly, so depth comes from repetition rather than accumulation. The analogy is the difference between consulting twenty different experts in sequence and consulting the same expert twenty times until the answer converges. For reasoning tasks that benefit from iterative refinement, the recurrent approach can match or exceed deeper fixed-depth architectures at smaller parameter counts.

OpenMythos implements this as three functional blocks: a Prelude (standard transformer layers, run once), a Recurrent Block (a set of layers run N times, where N can vary per position), and a Coda (standard transformer layers, run once). Parcae, a paper from UCSD and Together AI published in April 2026, provided the primary technical backing for the RDT hypothesis: a 770 million-parameter Parcae model matched the quality of a 1.3 billion-parameter standard transformer on standard benchmarks.
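The three-block control flow is simple enough to sketch in a few lines. This is a toy illustration of the hypothesized structure, not OpenMythos's actual code: real blocks would be transformer layers operating on token activations, and the scalar stand-in functions here exist only to make the looping visible.

```python
# Toy sketch of the Recurrent-Depth Transformer control flow that
# OpenMythos hypothesizes: a Prelude run once, a shared Recurrent Block
# run N times, a Coda run once. All function bodies are illustrative
# stand-ins, not real transformer layers.

def prelude(x):
    # Run once: embedding / early layers (stand-in: an affine map).
    return 2.0 * x + 1.0

def recurrent_block(h):
    # The shared weights applied repeatedly. This stand-in is
    # contractive, so repeated application converges: the "consult
    # the same expert until the answer settles" behavior.
    return 0.5 * h + 3.0

def coda(h):
    # Run once: final layers before the output head.
    return h - 1.0

def forward(x, n_loops):
    # Depth comes from repetition: the same recurrent_block runs
    # n_loops times. In the RDT hypothesis n_loops can vary per token
    # position; a single integer keeps the sketch simple.
    h = prelude(x)
    for _ in range(n_loops):
        h = recurrent_block(h)
    return coda(h)

print(forward(1.0, n_loops=1))   # 3.5, one shallow pass
print(forward(1.0, n_loops=20))  # near the block's fixed point, about 5.0
```

Extra loops refine the same computation with the same stored weights, which is the sense in which iteration can substitute for a taller stack of unique layers.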

OpenMythos ships with equations, citations, and a polite disclaimer that it has nothing to do with Anthropic. The code runs. There are no weights. It is a building with floor plans and no materials, based on what a building inspector could infer from the outside.

Why Anthropic Locked the Original

Understanding why OpenMythos matters requires understanding what made Anthropic choose not to release Mythos.

The Claude Mythos system card records what happened during internal testing. The model, when directed to find security vulnerabilities, identified thousands of zero-day flaws — previously unknown software weaknesses — across every major operating system and every major web browser. The model was capable of constructing multi-step attack chains: not just finding a vulnerability but designing a sequence of actions that would exploit it. The system card cites a 32-step network attack sequence as one documented example.

Anthropic's conclusion was that the cybersecurity capabilities could not be selectively disabled without fundamentally degrading the model's reasoning abilities. The features that let Mythos find software flaws are the same features that let it reason well across domains. You cannot turn off one without affecting the other. The model that discovers zero-day vulnerabilities and the model that helps researchers synthesize complex literature are the same model.

Project Glasswing was the compromise. Twelve organizations — all of them with demonstrated security infrastructure and accountability — get controlled access to Mythos for defensive purposes. Anthropic committed up to $100 million in usage credits and $4 million in direct donations to open-source security organizations. The logic: the only responsible use of an offensive cybersecurity tool at this capability level is to use it to fix the vulnerabilities before someone else finds them.

This is the backdrop against which OpenMythos appeared. Anthropic made a decision that the capabilities were too dangerous for unrestricted release. A developer decided to reconstruct the architecture from public information to understand what those capabilities might be built on.

What the 10,000 Stars Mean

Gomez posted about OpenMythos on X with a thread explaining the architecture. He later wrote that he "never expected this post to blow up at all" and that the project was meant as a research artifact, not a public statement. The repo's stars accumulated faster than most deliberate viral launches in the AI space.

The speed reflects a specific kind of demand. When a frontier model is locked away for safety reasons, the AI community does not generally accept the lock as the end of the conversation. Meta's open-source Llama models were built partly because the community wanted what was being kept behind API gates. Stable Diffusion became dominant partly because it gave the image generation community what DALL-E had made inaccessible. OpenMythos is the architectural equivalent: not the capabilities, but the blueprint.

The critical difference is that OpenMythos, even if Gomez's architectural hypothesis is correct, does not transfer Mythos's capabilities. The weights — the trained parameters that encode everything the model learned — are not here. A correct architecture reconstruction without weights is a map of a territory no one can enter. You could reproduce the Recurrent-Depth Transformer exactly, train it from scratch with sufficient compute and data, and potentially build something with similar architectural properties. But that is a multi-year, billion-dollar research program, not a GitHub clone.

The security concern is therefore not that OpenMythos immediately enables dangerous capabilities. It is that the project demonstrates how quickly motivated developers can synthesize public information into structured architectural hypotheses. If Gomez's reconstruction is substantially correct, Anthropic's decision not to publish a technical paper has not prevented the architecture from becoming at least partially public. It has only raised the effort required to get there.

The Broader Question OpenMythos Raises

AI safety research has long debated the relationship between model capability and architectural knowledge. The concern is not usually about the architecture itself — knowing that a building uses steel-reinforced concrete does not let you build one without resources. The concern is about what architectural knowledge accelerates.

Cloud Security Alliance's analysis of the Mythos situation noted that "capability diffusion may outpace centralized control when motivated developers and accessible compute combine." The OpenMythos timeline is consistent with this framing. Anthropic's decision to withhold Mythos was made in April. The architecture reconstruction appeared on GitHub within weeks, accumulated 10,000 stars, spawned multiple forks, and generated peer analysis from academic researchers — all before any official Anthropic comment.

The question is not whether Anthropic made the right call in locking Mythos. Most security researchers who reviewed the system card appear to agree that a model capable of autonomously constructing multi-step network attack sequences should not be generally available. The question is whether the lock accomplishes what it is designed to accomplish when the community's response is to reconstruct the architecture from public literature.

A related consideration: OpenMythos is fully transparent about what it is. The README makes clear that it is a theoretical reconstruction with no actual weights, no Anthropic connection, and no verified accuracy. The community knows what it is getting. The concern is not OpenMythos itself — it is the precedent that what is withheld can be partially recovered through motivated inference.

This is the pattern with most major AI safety decisions. The safety community debates the decision. The open-source community begins working around it. Both conversations produce useful information. Neither resolves the underlying tension between capability diffusion and access control. Anthropic's $100 million commitment through Project Glasswing is an attempt to use Mythos's capabilities defensively before that diffusion reaches a level where it cannot be managed.

What OpenMythos Gets Right About AI Architecture

Putting aside the question of whether the architecture is accurate, the Recurrent-Depth Transformer hypothesis that underlies OpenMythos is genuinely interesting as a research direction.

Standard transformer architectures achieve capability primarily through parameter count and context length. More parameters, better model — with predictable scaling laws. The RDT approach challenges this by asking whether depth through iteration can substitute for depth through accumulation. The Parcae paper's result — a 770M-parameter model matching a 1.3B-parameter baseline — suggests the substitution is real, at least at those scales.
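The arithmetic behind that substitution is easy to see with back-of-envelope parameter counting. The dimensions and layer counts below are illustrative assumptions, not Parcae's actual configuration:

```python
# Back-of-envelope comparison of depth-through-stacking versus
# depth-through-iteration. Dimensions and layer counts are illustrative
# assumptions, not numbers from the Parcae paper.

def layer_params(d_model, ffn_mult=4):
    # Rough per-layer weight count for a transformer block:
    # attention projections (4 * d^2 for Q, K, V, output) plus the
    # feed-forward pair (2 * ffn_mult * d^2), ignoring biases and norms.
    return 4 * d_model**2 + 2 * ffn_mult * d_model**2

d = 2048
standard = 24 * layer_params(d)            # 24 unique stacked layers
recurrent = (2 + 4 + 2) * layer_params(d)  # prelude(2) + shared block(4) + coda(2)

# Looping the 4-layer shared block 5 times yields 2 + 4*5 + 2 = 24
# layers of effective depth while storing only 8 layers of weights.
print(f"standard:  {standard / 1e6:.0f}M parameters in layers")
print(f"recurrent: {recurrent / 1e6:.0f}M parameters in layers")
```

Under these toy numbers the recurrent model stores a third of the layer weights for the same effective depth, which is the same shape of tradeoff the Parcae result reports at 770M versus 1.3B.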

For the field broadly, this matters because it suggests that the relationship between model size and capability may be more malleable than scaling laws imply. A recurrent architecture that iterates efficiently might achieve frontier-level reasoning at a fraction of the parameter count, which has significant implications for inference cost and latency. Whether Anthropic actually uses this approach in Mythos is unknown. That the community is now exploring it because of Mythos is not.

Gomez's project, independent of its accuracy, has introduced the RDT hypothesis to a much larger audience than the academic papers that inspired it. The 10,000 GitHub stars represent people who read the README, ran the code, and began thinking about what recurrent depth might enable. Some of them will build on it. Some will find it wrong and publish corrections. The feedback loop between a locked model and an open reconstruction has already produced more public discussion of inference-time architectural choices than the Mythos announcement itself.

Whether you work with AI tools for research, writing, or knowledge synthesis, the OpenMythos situation highlights something worth tracking: the gap between what frontier AI can do and what is publicly accessible is closing, but not in the direction most people expect. The capabilities diffuse not through the models but through the architectural understanding they generate. Tools that help capture and connect AI research insights — across papers, discussions, and repository documentation — are increasingly necessary to follow this field. The Mythos story is being written in system cards, GitHub readmes, and academic preprints simultaneously. The question is who is reading all three.
