Andrej Karpathy Published an LLM Wiki Pattern: 16 Million Views for a Folder Structure

Every time you ask an AI assistant about something you've been researching for months, it starts from zero. It doesn't remember the papers you asked it to summarize last week. It has no idea that the question you're asking now connects to three other questions from three months ago. RAG (retrieval-augmented generation), the dominant architecture for giving AI models access to external documents, treats each query as independent. Nothing accumulates. Nothing compounds.

On April 3, 2026, Andrej Karpathy, a co-founder of OpenAI and former head of AI at Tesla who is now an independent researcher, published a post on X describing how he gets around this problem, along with a GitHub Gist laying out the full architecture. The pattern, a three-part markdown setup where an AI model compiles, maintains, and queries a structured knowledge base without a vector database, generated 16 million views on that single post. For anyone following Karpathy's LLM work, the post landed as more than a technical curiosity. The Gist accumulated 5,000 stars and 485 comments within days.

The number is worth pausing on: 16 million views for a description of a folder structure. That's not a viral product launch. It's developers recognizing a problem they've been working around for a long time, and seeing someone with Karpathy's standing in the AI research community name it precisely.

What Happened: A GitHub Gist With 16 Million Views

The concept Karpathy described is straightforward in structure, even if the implications are not. Two folders and one file: raw/, wiki/, and index.md, which maps everything and is designed to fit within a single context window.

raw/ holds source material, including research papers, articles, GitHub READMEs, documentation, and YouTube transcripts. Nothing is organized, nothing is processed. It's a dump of everything potentially relevant. wiki/ is where the model comes in: periodically, a model runs over the raw material and compiles it into structured, encyclopedia-style Markdown articles. Each article synthesizes what the raw sources say about a concept, resolves conflicts between them, and builds links to related articles. index.md is the table of contents, a structured map of every article in the wiki, concise enough to fit in a single context load.
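The compile step described above can be sketched in a few lines. This is a minimal illustration, not Karpathy's actual implementation: the `synthesize` callable is a hypothetical stand-in for the LLM call that turns raw sources into encyclopedia-style articles, injected so the sketch stays model-agnostic.

```python
from pathlib import Path

RAW, WIKI = Path("raw"), Path("wiki")

def compile_wiki(synthesize):
    """One compile pass: synthesize raw sources into wiki articles,
    then rebuild index.md as a compact table of contents.

    `synthesize` takes the raw corpus text and returns a dict of
    {article_name: markdown_body} -- in practice an LLM call."""
    corpus = "\n\n".join(p.read_text() for p in sorted(RAW.glob("*.md")))
    WIKI.mkdir(exist_ok=True)
    for name, body in synthesize(corpus).items():
        (WIKI / f"{name}.md").write_text(body)
    # index.md: one line per article, using each article's first line
    # as its summary, so the whole map stays small enough for one
    # context load
    lines = [f"- [{p.stem}](wiki/{p.name}): {p.read_text().splitlines()[0]}"
             for p in sorted(WIKI.glob("*.md"))]
    Path("index.md").write_text("# Index\n" + "\n".join(lines) + "\n")
```

The design choice worth noting is that index.md is derived from wiki/, never edited directly: regenerating it after every compile pass keeps the map and the articles from drifting apart.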

When you query the system, the model reads index.md first, identifies which articles are relevant, loads them, and answers from the compiled knowledge rather than the raw source fragments. There is no embedding step, no vector database, no retrieval pipeline. The pattern doesn't retrieve information. It has already synthesized it.
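The two-step query flow, index first, then articles, can be sketched as follows. Again this is an illustrative assumption, not the published code; `ask_model(prompt) -> str` is a hypothetical stand-in for any chat-completion call.

```python
from pathlib import Path

def answer(question, ask_model):
    """Two-step query over the compiled wiki: load index.md, let the
    model pick relevant articles, then answer from those articles only.

    `ask_model(prompt) -> str` stands in for an LLM API call."""
    index = Path("index.md").read_text()
    # Step 1: the model sees only the index and names relevant articles
    picks = ask_model(
        f"{index}\n\nWhich articles (comma-separated stems) answer: {question}")
    articles = []
    for stem in (s.strip() for s in picks.split(",")):
        path = Path("wiki") / f"{stem}.md"
        if path.exists():
            articles.append(path.read_text())
    # Step 2: answer from the compiled articles, not raw fragments
    context = "\n\n---\n\n".join(articles)
    return ask_model(f"Answer from these articles only:\n{context}\n\nQ: {question}")
```

Note there is no embedding or similarity search anywhere in this flow; relevance is decided by the model reading the index, which is only possible because the index was built to fit in one context window.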

Karpathy's choice of distribution format was itself a statement. He published a Gist, not a repository, with no installable code, no package manager integration, and no README with setup instructions. His reasoning, widely quoted in subsequent coverage: "In the age of LLM agents, sharing an idea is more valuable than sharing code, because the other person's agent customizes and builds it for your specific needs." Within days, the community validated that framing by producing multiple independent implementations: obsidian-wiki plugins, a full LLM-wiki GitHub repository, and a v2 Gist that extended the original pattern with agent memory architecture.

VentureBeat's coverage of the release captured the practical appeal accurately: Karpathy's knowledge base architecture "bypasses RAG with an evolving markdown library maintained by AI." The framing matters: this isn't a new product or a trained model, it's an architectural pattern for how knowledge should be organized before it ever reaches a model. For developers who have built and maintained RAG pipelines, the prospect of replacing vector databases and embedding infrastructure with a folder of Markdown files is immediately attractive, and that's exactly the reaction the Gist received.

Why the Pattern Matters

The fundamental difference between RAG and this approach is when knowledge assembly happens. RAG assembles context at query time: you ask a question, the system retrieves relevant chunks from a vector store, and the model generates a response from those chunks. The quality of the response depends entirely on retrieval quality, including whether the right chunks were identified, whether they contain enough context, and whether they contradict each other.

The Karpathy approach assembles context at compile time. Before you ask any question, a model has already read all your raw sources, understood their relationships, and written structured articles that synthesize that understanding. When you ask a question, you're not retrieving fragments. You're querying a knowledge structure that was built for understanding rather than for retrieval.

Three practical implications follow from this difference:

Token efficiency. For small knowledge bases, loading an index and two or three wiki articles into context uses approximately 95% fewer tokens than loading equivalent raw source material. The more significant effect is context quality: a well-structured 3,000-word article is far more usable to a model than ten 300-word chunks retrieved from disparate documents.

Transparency. Every fact in an article is traceable to a Markdown file that a human can read, edit, or delete. Vector embeddings are not. If the system gives you a wrong answer, you can find and correct the source article. If a RAG pipeline gives you a wrong answer, the error may be in the embedding space in ways that are not directly accessible.

Knowledge accumulation. This is the property Karpathy emphasizes most. A model that maintains a wiki over months develops a form of institutional memory about the domain, not as a weight change, but as a structured artifact that encodes relationships between concepts. A researcher asking about a new paper's connection to prior work gets a qualitatively different answer from a system that has been compiling related papers for six months versus one that encounters each paper independently at query time.

The practical use cases reflect these properties. Researchers tracking a literature space across dozens of papers get cross-referenced synthesis rather than isolated summaries. Developers maintaining architecture decision records get a system that can trace why a design choice was made across months of context. Product managers tracking competitive intelligence get a repository that accumulates rather than forgets.

Why Most Implementations Quietly Fail

The 16 million views went to the idea. The 485 gist comments went to what breaks when you try to build it.

The most consistent pattern in community feedback is a variation on this: the folder structure gets built correctly, the first compilation produces impressive results, and then over the following weeks the knowledge base slowly becomes unreliable, redundant, or abandoned. This isn't a failure of understanding the concept. It's a failure of the concept to specify its own maintenance requirements.

Knowledge has a lifecycle that the pattern doesn't address. Markdown files, by default, are assumed to be permanently valid. But in practice, the bug you found last week is more relevant than the one from six months ago. The architectural pattern you've seen twelve times is more reliable than the one you've seen once. The original pattern provides no mechanism for representing temporal relevance or decay. Articles written in month one and month six coexist in wiki/ with no indication of which reflects current understanding.

Contradictions accumulate without automatic detection. As the repository grows, new compilation passes introduce articles that conflict with earlier ones. Karpathy's architecture describes a "lint pass," a periodic scan of the entire wiki to identify inconsistencies and resolve them. This works. It also requires someone to run it deliberately and regularly. Most implementations don't sustain this practice.
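A lint pass along the lines Karpathy describes could be sketched as a pairwise scan; this is a hypothetical sketch, with `check_conflict(a_text, b_text)` standing in for an LLM judgment call that returns True when two articles contradict each other.

```python
from itertools import combinations
from pathlib import Path

def lint_pass(check_conflict):
    """Scan every pair of wiki articles and flag contradictions.

    `check_conflict(a_text, b_text) -> bool` stands in for an LLM
    judgment call. The pairwise scan is O(n^2) in article count,
    which is part of why the pass tends not to get run at scale."""
    articles = sorted(Path("wiki").glob("*.md"))
    flagged = []
    for a, b in combinations(articles, 2):
        if check_conflict(a.read_text(), b.read_text()):
            flagged.append((a.name, b.name))
    return flagged
```

Even in this toy form, the operational problem is visible: the pass produces a list of conflicts for a human (or a further model call) to resolve, and nothing in the pattern forces that resolution to happen.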

The scale ceiling is hard and arrives unexpectedly. The approach works reliably below approximately 50,000 to 100,000 tokens of compiled content. Above that threshold, index.md itself no longer fits in a single context window, and the entire query model breaks. For a personal research project across a focused domain, this ceiling may never be reached. For anything approaching team or departmental scale, it's encountered quickly.
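Because the ceiling arrives unexpectedly, a pre-flight check is cheap insurance. The sketch below uses the rough four-characters-per-token heuristic (an assumption; a real tokenizer would be more accurate) against the approximate 100,000-token ceiling cited above.

```python
from pathlib import Path

CONTEXT_BUDGET_TOKENS = 100_000  # rough ceiling cited for the pattern

def within_budget(chars_per_token=4):
    """Estimate total tokens in index.md plus the compiled wiki using
    the common ~4-chars-per-token heuristic, and report whether the
    single-context-window query model is still viable."""
    total_chars = len(Path("index.md").read_text())
    total_chars += sum(len(p.read_text()) for p in Path("wiki").glob("*.md"))
    est_tokens = total_chars // chars_per_token
    return est_tokens, est_tokens <= CONTEXT_BUDGET_TOKENS
```

Running a check like this after each compile pass turns the hard, surprising failure described above into a visible trend line.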

The enterprise limitations are structural rather than solvable through iteration. A local folder of Markdown files managed via one person's file system has no access control model, no multi-user conflict resolution, and no audit trail. For teams, this is not a limitation to work around. It's a reason to use a different architecture.

LLM Wiki v2, published on GitHub Gist by developers extending Karpathy's original, explicitly addresses some of these gaps by adding persistence patterns from agent memory research, documenting failure modes at scale, and proposing mechanisms for knowledge lifecycle management. The existence of v2 within weeks of the original confirms that the pattern as described is incomplete for production use.

LLM Wiki vs RAG: When to Use Which

The "RAG is dead" framing that circulated after Karpathy's post is inaccurate in the same way that "email is dead" framings always are. The two approaches are not competitors for the same workload. They're suited to different scales and different requirements.

The Karpathy approach is the right choice when the knowledge base is small enough to compile into under 100,000 tokens, when the primary user is the person who built it or a small team with shared context, and when knowledge accumulation matters more than raw retrieval speed. RAG is the right choice when the knowledge base contains hundreds of thousands to millions of documents, when multiple users with different permission levels need access, or when content changes frequently enough that maintaining a curated compilation is impractical.

For most individual knowledge workers and small teams, this pattern is not just viable; it's better than the RAG infrastructure they would otherwise need to build and maintain. The complexity of setting up a vector database, managing embeddings, tuning retrieval parameters, and debugging retrieval noise is real. For a personal research system with a few hundred documents, that complexity is genuinely unnecessary.

The practical middle ground that Karpathy's workflow represents maps well to tools like Obsidian, where the Web Clipper handles source ingestion and the file system handles the folder structure. The community implementations that appeared within days of the original Gist are overwhelmingly Obsidian-based, which is not a coincidence.

What This Means for How We Build Knowledge Systems

Karpathy's framing of this as an "idea file" rather than a code release reflects something real about how AI development norms are shifting. In 2026, the interesting part of a knowledge management system is not the infrastructure. It is context engineering: how information is structured, synthesized, and made available to a model at query time.

The broader trend this represents is a reorientation of how developers and knowledge workers think about AI tools. The question used to be "how do I make the model smarter?" The question in 2026 is increasingly "how do I structure the information the model has access to?" Karpathy's pattern is one answer to the second question.

Ward Cunningham published the original WikiWikiWeb in 1995 as a pattern, not a product, not a commercial tool, but a description of how to organize collaborative knowledge. It took years for the idea to generalize into what we now call wikis. Karpathy's Gist is in a similar position: a pattern that works for its creator, spreads through community implementation, and generates variations that address its gaps. Analytics Vidhya put it well: the significance is less in the technical implementation and more in the paradigm shift it represents for how practitioners think about AI and persistent knowledge.

The question the 16 million views represent isn't really about a folder structure. It's whether the knowledge you accumulate over months and years actually compounds, or whether every new session starts from scratch. That's a question about architecture, but it's also a question about what kind of knowledge infrastructure you want to build around yourself.

The most honest answer from the community is that Karpathy's pattern works well for the person who built it, in the domain they care about, with the discipline to run lint passes regularly. For everyone else, it works until it doesn't, and then it requires either significant investment in tooling or a more opinionated system that handles the maintenance layer automatically. The discussion thread on the original Gist is itself a useful artifact: 485 comments documenting where the pattern breaks, what fixes people tried, and which extensions actually stuck. Reading it takes less time than building a wiki that fails.

If the problem Karpathy's pattern addresses, knowledge that should accumulate but doesn't, sounds familiar, it's because it's the central problem in personal and team knowledge management. Tools that treat each session as stateless leave valuable context on the table. remio's AI-native second brain is built around the same principle: the value of a knowledge system comes from what it builds up over time, not from what it retrieves on demand. Whether you're building your own pattern or looking for a tool that does the compounding for you, the question worth asking is the same: when you come back to this information in six months, will the AI have remembered anything?
