AlphaGenome Released: DeepMind’s 1Mb Context AI for Gene Regulation
- Ethan Carter

- Jan 29

On January 28, 2026, Google DeepMind released AlphaGenome, a new AI system designed to predict how DNA sequences control gene expression. Unlike the pure DNA language models we have seen recently, this tool focuses specifically on mapping sequence to function. It arrives with a significant technical headline: a context window of one million base pairs (1Mb), allowing it to analyze the genome at a scope and resolution that previous tools struggled to achieve.
For researchers and bioinformaticians, the arrival of AlphaGenome shifts the landscape of genomic deep learning models. It moves beyond predicting the next nucleotide to predicting actual biological readouts like RNA-seq and ATAC-seq. This distinction matters because it addresses the practical needs of labs trying to interpret the "dark genome": the roughly 98% of our DNA that doesn't code for proteins but contains the elements that regulate when and where genes are expressed.
The Community Verdict: AlphaGenome vs Evo 2

Before diving into the architecture described in the Nature paper, it is worth looking at how the bioinformatics community is actually reacting to this release. The primary comparison drawing attention on platforms like Reddit is between AlphaGenome and Evo 2, developed by the Arc Institute.
The consensus emerging from early user discussions is that while both models ingest DNA, they solve fundamentally different problems. Evo 2 operates as a DNA language model (DNALM), trained to predict the next token in a sequence. AlphaGenome is a sequence-to-function model. It takes a sequence and predicts experimental outputs.
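The difference is easiest to see in what each kind of model returns. The sketch below uses two toy stand-in functions (hypothetical, and not the real Evo 2 or AlphaGenome interfaces): the language-model stand-in returns a probability for each possible next base, while the sequence-to-function stand-in returns one value per input base for each named assay track.

```python
from __future__ import annotations

from collections import Counter

BASES = "ACGT"


def toy_language_model(seq: str) -> dict[str, float]:
    """Toy DNA language model: a probability for each possible next base.

    Uses add-one smoothed base frequencies of the input as a stand-in;
    a real DNALM would learn this distribution from training data.
    """
    counts = Counter(seq)
    total = len(seq) + len(BASES)
    return {base: (counts[base] + 1) / total for base in BASES}


def toy_sequence_to_function(seq: str) -> dict[str, list[float]]:
    """Toy sequence-to-function model: one value per base, per assay track.

    The track names and the GC-based values are placeholders only.
    """
    gc = [1.0 if base in "GC" else 0.0 for base in seq]
    return {
        "RNA_SEQ": gc,
        "ATAC_SEQ": [1.0 - value for value in gc],
    }


seq = "ACGTGG"
next_base_probs = toy_language_model(seq)   # 4 numbers, one per base
tracks = toy_sequence_to_function(seq)      # len(seq) numbers per track
```

The output shapes carry the distinction: the first answers "what comes next?", the second answers "what would this assay measure along this sequence?".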
User experience reports from the computational biology community highlight a specific frustration with existing DNALMs. Several users noted that Evo 2, while theoretically impressive, can be a "pain to use" in practical workflows. Reproducing results from the Evo 2 paper has proven difficult for independent labs, and its performance on standard genomic prediction tasks has not consistently hit state-of-the-art levels.
AlphaGenome is being received less as a theoretical breakthrough and more as a massive engineering achievement. DeepMind has largely resolved the "macro vs. micro" trade-off. Previously, models had to choose between seeing a wide context (hundreds of thousands of bases or more) and seeing high resolution (single bases). AlphaGenome manages both. It builds on the robust engineering frameworks seen in previous DeepMind projects, offering a more stable, albeit less open, tool for immediate research application.
Analyzing the 1Mb Context Window in AlphaGenome

The technical core of AlphaGenome is its ability to process a 1Mb context window with single-base resolution. This is a substantial jump from models like Borzoi, which operated with a context of around 500,000 bases.
This expanded window is not just a vanity metric. Gene regulation is a long-distance affair. Enhancers—short regions of DNA that increase the likelihood of transcription—can be located hundreds of thousands of base pairs away from the gene they regulate. If a model’s window is too narrow, it simply cannot "see" the connection between a remote mutation and a gene's malfunction.
By doubling the context window, AlphaGenome captures these long-range interactions. According to the release data, the model can simultaneously predict:
- 5,930 human genetic signals.
- 1,128 mouse genetic signals.
- Outcomes including gene expression, splicing, chromatin profiles, and chromosomal contact maps.
In benchmarking against 26 standard tasks, AlphaGenome matched or outperformed current state-of-the-art models in 25 of them. This suggests that the architecture is capturing genuine biological signal rather than just overfitting to training data.
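The point about window size can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the input window is centered on the variant and that the enhancer variant and the gene's transcription start site (TSS) sit at made-up coordinates 400 kb apart. It asks whether both fit inside a roughly 500 kb window versus a 1 Mb window.

```python
def window_covers(variant_pos: int, gene_tss: int, window_bp: int) -> bool:
    """Return True if a window of window_bp centered on the variant
    also contains the gene's TSS (half-open interval)."""
    half = window_bp // 2
    return variant_pos - half <= gene_tss < variant_pos + half


# Hypothetical coordinates: an enhancer variant 400 kb upstream of a gene.
enhancer_variant = 1_000_000
gene_tss = 1_400_000

print(window_covers(enhancer_variant, gene_tss, 500_000))    # False
print(window_covers(enhancer_variant, gene_tss, 1_000_000))  # True
```

In practice the window does not have to be centered on the variant, so this is a conservative check, but it shows why a pair of elements several hundred kilobases apart can fall out of reach of a narrower model.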
AlphaGenome for Non-Coding DNA Analysis
The primary utility of AlphaGenome lies in decoding the non-coding regions of DNA. Since only about 2% of the human genome codes for proteins, the vast majority of disease-associated variants sit in the non-coding regions. These areas act as the operating system of the cell, determining when and where genes are turned on.
Understanding this "dark genome" has been a stumbling block for clinical genetics. A mutation in a coding region is often easy to interpret (e.g., it breaks a protein). A mutation in a non-coding region is cryptic. It might do nothing, or it might disable a critical regulatory switch.
AlphaGenome offers a method to predict the effect of these single variants. By feeding the model a reference sequence and a mutated sequence, researchers can check for predicted differences in RNA-seq or ATAC-seq signals. This variant effect prediction capability is what transforms the model from a biological curiosity into a potential tool for identifying pathogenic mutations in rare diseases or cancer.
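The reference-versus-mutated comparison can be sketched in a few lines. The code below does not call AlphaGenome: `toy_predict_track` is a hypothetical stand-in that derives a per-base "signal" from local GC content, and the log2 ratio with a pseudocount is just one common way to summarize the difference, not necessarily the scoring DeepMind uses.

```python
import math


def toy_predict_track(seq: str) -> list[float]:
    """Stand-in for a model call (NOT AlphaGenome): per-base signal
    equal to the GC fraction in a small window around each position."""
    flank = 5
    signal = []
    for i in range(len(seq)):
        window = seq[max(0, i - flank): i + flank + 1]
        signal.append(sum(base in "GC" for base in window) / len(window))
    return signal


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if seq[pos] == alt:
        raise ValueError("alt allele equals the reference base")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(ref_seq: str, pos: int, alt: str) -> float:
    """log2 ratio of summed predicted signal, alt over ref (pseudocount 1)."""
    ref_signal = sum(toy_predict_track(ref_seq))
    alt_signal = sum(toy_predict_track(apply_snv(ref_seq, pos, alt)))
    return math.log2((alt_signal + 1.0) / (ref_signal + 1.0))


ref = "ATATATATATATATAT"
effect_g = variant_effect(ref, 8, "G")  # positive: the toy signal goes up
effect_t = variant_effect(ref, 8, "T")  # zero: the toy signal is unchanged
```

With a real model, the same pattern applies per track and per cell type: predict on both sequences, then compare the outputs in the region of interest.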
Assessing AlphaGenome API Access and Open Source Status

DeepMind’s distribution strategy for AlphaGenome follows a pattern established with AlphaFold 3, which has sparked debate regarding open science.
The code for the model architecture has been made available on GitHub via a JAX implementation (google-deepmind/alphagenome). However, the model weights—the massive numerical parameters learned during training—are not fully open source. Instead, access is gated through an API for non-commercial research use.
This approach constrains how researchers can work with the model. They can define inputs and get predictions, but they cannot run the full model locally on their own clusters without the weights. The community response has been mixed. While the API ensures accessibility for those without massive compute resources, it frustrates power users who need to modify the model or run high-throughput, privacy-sensitive screens on data that cannot leave local servers.
There is a strong demand for a "weights-available" release similar to the early days of AlphaFold 2. The current setup mimics the AlphaFold Server model: powerful and free for academics, but ultimately a "black box" regarding the underlying parameters.
Where AlphaGenome Still Struggles
Despite the high benchmark scores, AlphaGenome is not a universal solver for genetics. The Nature paper and subsequent expert commentary point out distinct limitations.
Anshul Kundaje’s lab, a leader in this field, noted that while the model is an improvement over Borzoi, it still struggles with cell-type-specific regulation. Predicting how a gene behaves in general is different from predicting how it behaves in a liver cell versus a neuron. AlphaGenome narrows this gap, but differences in gene expression between individual people remain hard to capture.
Furthermore, the model is strictly a research tool. It is not ready for clinical settings. The predictive scores can sometimes exaggerate the risk of certain genetic changes, leading to false positives if used for patient diagnosis without wet-lab validation.
Technical Requirements for Predicting Gene Regulation
For those looking to integrate AlphaGenome into their workflow, understanding the input-output requirements is essential.
The model relies heavily on data from the ENCODE consortium. It does not just read letters; it maps them to specific assay results. If you are designing an experiment to validate AlphaGenome’s predictions, you need to be looking at the same metrics:
- RNA-seq: For gene expression levels.
- ATAC-seq: For chromatin accessibility (is the DNA tightly wound or open?).
- Hi-C: For 3D genome structure and contact maps.
Because the model predicts these specific experimental modes, it acts as a "virtual assay." You can simulate thousands of mutations in silico to see which ones alter the chromatin accessibility, filtering down the candidates before spending money on actual reagents.
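A minimal sketch of that in silico screen follows. It enumerates every single-base substitution in a short sequence and ranks them by how much a predicted score changes. The scorer here is a hypothetical stand-in (it counts occurrences of an arbitrary motif), not an AlphaGenome call; in a real workflow it would be replaced by the model's predicted accessibility for the region.

```python
BASES = "ACGT"
MOTIF = "TGACTCA"  # arbitrary motif chosen for this toy scorer


def toy_accessibility(seq: str) -> float:
    """Stand-in accessibility score (NOT AlphaGenome): motif occurrence count."""
    last_start = len(seq) - len(MOTIF)
    return float(sum(seq[i:i + len(MOTIF)] == MOTIF for i in range(last_start + 1)))


def saturation_mutagenesis(ref_seq: str) -> list[tuple[int, str, str, float]]:
    """Score every single-base substitution as (pos, ref, alt, delta)."""
    ref_score = toy_accessibility(ref_seq)
    results = []
    for pos, ref_base in enumerate(ref_seq):
        for alt in BASES:
            if alt == ref_base:
                continue
            mutated = ref_seq[:pos] + alt + ref_seq[pos + 1:]
            results.append((pos, ref_base, alt, toy_accessibility(mutated) - ref_score))
    return results


def top_candidates(results, n: int):
    """Keep the n substitutions with the largest absolute predicted change."""
    return sorted(results, key=lambda r: abs(r[3]), reverse=True)[:n]


ref = "AAAATGACTCAAAAA"
results = saturation_mutagenesis(ref)
shortlist = top_candidates(results, 5)
```

The shortlist is what would move on to wet-lab validation. Given the false-positive risk noted above, the ranking is a way to prioritize experiments, not a substitute for them.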
The Future of Genomic Deep Learning Models

AlphaGenome represents a consolidation phase in genomic AI. We are moving away from the initial excitement of "can AI read DNA?" to the engineering reality of "can AI reliably predict experiments?"
The move to a 1Mb window with single-base resolution sets a new standard. Future models will likely need to match this scope to remain relevant. The competition between sequence-to-function models (like AlphaGenome) and generative language models (like Evo 2) will likely define the next year of research.
For now, AlphaGenome holds the advantage in specific, verifiable predictions. It is a tool built for the biologist who needs a specific answer about a specific variant, rather than a generative model designed to dream up new sequences. As the API sees wider use, we will likely see a surge in papers utilizing it to screen for non-coding drivers of disease, provided the community can navigate the limitations of the closed-weight system.
FAQ
What is the difference between AlphaGenome and Evo 2?
AlphaGenome is a sequence-to-function model that predicts biological assay results like gene expression. Evo 2 is a DNA language model (DNALM) designed to predict the next DNA sequence token. They have different architectures and intended uses: AlphaGenome is the more specialized tool for predicting experimental readouts and variant effects.
How can I access AlphaGenome for my research?
You can access AlphaGenome through a Google DeepMind API provided for non-commercial research. While the JAX code is available on GitHub, the model weights are not fully open-source for local execution.
Does AlphaGenome work on non-coding DNA?
Yes, analyzing non-coding DNA is its primary strength. It can predict how mutations in the 98% of the genome that does not code for proteins affect gene regulation, chromatin structure, and splicing.
What is the context window of AlphaGenome?
AlphaGenome features a context window of one million base pairs (1Mb). This allows it to identify long-range interactions between genes and regulatory elements that are far apart on the chromosome.
Is AlphaGenome ready for clinical use?
No. While it performs well on benchmarks, it is currently a research tool. It should not be used for medical diagnosis or patient care as it can still produce errors regarding the severity of specific genetic variants.


