NA-MPNN: Unifying Nucleic Acid Design with a Single Model

NA-MPNN: A unified deep learning model for RNA design and protein-DNA specificity, setting a new standard in nucleic acid engineering.

Ailurus Press
October 27, 2025

The ability to program biology by designing functional nucleic acid molecules holds immense promise, from RNA-based therapeutics and diagnostics to precision gene-editing tools. However, the field has long been constrained by a fundamental challenge: the "inverse folding" problem. This involves determining a nucleic acid sequence that will fold into a desired three-dimensional structure or bind a specific target. Historically, this challenge has been tackled by a fragmented ecosystem of specialized computational tools, creating a bottleneck that has limited both the scope and efficiency of biomolecular engineering.

The Fragmented Landscape of Nucleic Acid Design

The journey of computational biomolecular design has been largely dominated by protein-centric models. Inspired by the success of architectures like ProteinMPNN, which excels at protein sequence design, researchers have sought to apply similar principles to nucleic acids. However, this translation has not been straightforward. The field diverged into two distinct, parallel paths.

On one hand, RNA design tools such as gRNAde have focused on designing sequences for specified RNA structures [2]. While valuable, their performance has been hampered by the relative scarcity of high-resolution RNA structural data in databases like the PDB, where RNA structures are far outnumbered by protein structures. On the other hand, models like DeepPBS have been developed to predict the binding specificity of proteins to DNA, a critical task for understanding gene regulation and engineering transcription factors [3]. This separation of concerns (one tool for RNA structure, another for protein-DNA interaction) has prevented the development of a holistic framework capable of addressing the complex, multi-component nature of biological systems.

A Unified Framework: The NA-MPNN Breakthrough

A recent preprint from the laboratory of David Baker introduces a powerful solution to this fragmentation: the Nucleic Acid Message-Passing Neural Network (NA-MPNN) [1]. This work reframes the inverse folding problem not as a series of disparate tasks, but as a single challenge that can be addressed by one generalizable deep learning architecture.

NA-MPNN builds upon the robust foundation of ProteinMPNN but incorporates several key innovations tailored for nucleic acids:

  1. Unified Graph Representation: The model represents the residues of proteins, DNA, and RNA as nodes within a single biopolymer graph. This allows it to learn the geometric and chemical rules governing protein-protein, nucleic acid-nucleic acid, and, crucially, protein-nucleic acid interactions within one framework.
  2. Shared Chemical Alphabet: To maximize data efficiency, NA-MPNN utilizes a shared set of embeddings for corresponding DNA and RNA bases (e.g., deoxyadenosine (DA) and adenosine (A)). This encourages the model to learn generalizable principles of base pairing and stacking across both types of nucleic acids.
  3. Virtual Side-Chain Atoms: Inspired by the use of Cβ atoms to orient amino acid side chains, the model introduces virtual atoms for nucleotides. This provides a consistent geometric frame of reference that helps the network better capture the precise three-dimensional arrangement of bases.
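
The three ideas above can be illustrated with a small sketch. This is not the NA-MPNN implementation: the token table, the helper names (node_token, knn_edges, virtual_cbeta), the toy coordinates, and the choice of a k-nearest-neighbour graph are all assumptions made for illustration. The virtual Cβ constants are the idealized-geometry values used in ProteinMPNN-style code. The article does not say how the analogous nucleotide atoms are placed, so only the protein case is shown.

```python
import math

# Assumption: DNA and RNA residue names collapse onto one token per base.
# Whether thymine and uracil share a token is not stated in the article,
# so they are kept separate here.
SHARED_BASE_TOKEN = {
    "DA": "A", "A": "A",
    "DC": "C", "C": "C",
    "DG": "G", "G": "G",
    "DT": "T", "U": "U",
}


def node_token(residue_name):
    """Map a residue name to its node token; amino acids pass through unchanged."""
    return SHARED_BASE_TOKEN.get(residue_name, residue_name)


def knn_edges(coords, k):
    """Directed edges (i, j) from every node i to its k nearest neighbours j."""
    edges = []
    for i, ci in enumerate(coords):
        ranked = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        edges.extend((i, j) for _, j in ranked[:k])
    return edges


def virtual_cbeta(n, ca, c):
    """Place an idealized C-beta from backbone N, CA, and C coordinates."""
    b = [ca[i] - n[i] for i in range(3)]
    v = [c[i] - ca[i] for i in range(3)]
    # Cross product b x v gives the normal to the N-CA-C plane.
    a = [
        b[1] * v[2] - b[2] * v[1],
        b[2] * v[0] - b[0] * v[2],
        b[0] * v[1] - b[1] * v[0],
    ]
    return [
        -0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * v[i] + ca[i]
        for i in range(3)
    ]


# Toy complex with made-up coordinates: two amino acids, one DNA and one RNA residue.
residues = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("LYS", (3.8, 0.0, 0.0)),
    ("DA", (4.0, 5.0, 0.0)),
    ("A", (20.0, 0.0, 0.0)),
]
tokens = [node_token(name) for name, _ in residues]
edges = knn_edges([xyz for _, xyz in residues], k=2)

# Idealized backbone geometry for one residue, in angstroms.
cbeta = virtual_cbeta((-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0))
```

In this toy graph the deoxyadenosine and adenosine nodes receive the same token, and the neighbour search connects protein and nucleic acid nodes without treating them as separate systems.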

State-of-the-Art Performance Across a Duality of Tasks

The power of this unified approach is demonstrated by its state-of-the-art (SOTA) performance in both RNA design and protein-DNA specificity prediction.

In native RNA sequence design, NA-MPNN achieves a median sequence recovery of 58.0% on a monomeric RNA test set, compared with 51.7% for gRNAde. More importantly, the sequences designed by NA-MPNN exhibit superior structural fidelity. This was tested experimentally in the community-wide OpenKnot challenge, where NA-MPNN's designs achieved a median experimental score of 89.9, matching or exceeding designs from human experts and other computational methods [1].
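
For readers unfamiliar with the metric, sequence recovery is commonly defined as the fraction of positions at which a designed sequence matches the native one, with the median then taken over the test set. The sketch below assumes that definition. The function name and the three toy sequence pairs are illustrative and are not taken from the preprint.

```python
from statistics import median


def sequence_recovery(native, designed):
    """Fraction of positions where the designed base equals the native base."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


# Toy (native, designed) pairs; real benchmarks use held-out RNA structures.
pairs = [
    ("GGCAUCGA", "GGCAUCGA"),
    ("ACGUACGU", "ACGAACGA"),
    ("GGGAAACC", "GCGAAAGC"),
]
recoveries = [sequence_recovery(native, designed) for native, designed in pairs]
median_recovery = median(recoveries)
```

Here the per-sequence recoveries are 1.0, 0.75, and 0.75, so the median is 0.75.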

For the task of predicting protein-DNA binding specificity, NA-MPNN again sets a new benchmark. It achieves a mean absolute error (MAE) of just 0.53, a substantial improvement over the 0.86 of DeepPBS. A key advantage of NA-MPNN is its ability to make highly accurate predictions using only the backbone coordinates of the protein-DNA complex. This removes reliance on side-chain information, a common source of information leakage that can artificially inflate the performance of other models [1].
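
The article does not state how the MAE is computed. One plausible convention, assumed here purely for illustration, compares a predicted and a reference position weight matrix column by column: sum the absolute differences across the four bases at each position, then average over positions. The function name and the toy matrices are hypothetical.

```python
def pwm_mae(predicted, reference):
    """Mean over positions of the summed absolute difference across the four bases."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("matrices must be non-empty and cover the same positions")
    total = 0.0
    for pred_col, ref_col in zip(predicted, reference):
        total += sum(abs(p - r) for p, r in zip(pred_col, ref_col))
    return total / len(predicted)


# Each row is one DNA position; columns are probabilities for A, C, G, T.
reference = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
predicted = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
error = pwm_mae(predicted, reference)
```

Under this convention the error ranges from 0 (identical columns) to 2 (completely disjoint columns), and the toy example evaluates to about 0.8.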

Broader Implications and the Path Forward

NA-MPNN represents more than just an incremental improvement; it signals a paradigm shift in computational nucleic acid engineering. By creating a single, powerful framework, it moves the field away from bespoke, task-specific tools and towards a foundational model for nucleic acid biology. This unification is critical for tackling more complex design challenges, such as engineering novel RNA-protein machines or designing intricate genetic circuits.

The model is already being applied in combination with structure generation tools like RFDpoly to design entirely new protein-DNA and RNA complexes, demonstrating its practical utility. Furthermore, its ability to rapidly predict binding specificity makes it a useful screening tool in the early stages of protein engineering, long before costly and time-consuming experimental assays are required.

The success of in silico models like NA-MPNN highlights the next bottleneck: efficiently testing and optimizing these designs in the wet lab. High-throughput platforms that enable massive-scale screening, such as the self-selecting vector systems offered by Ailurus vec, will be crucial for closing this design-build-test-learn loop and realizing the full potential of AI-driven nucleic acid engineering.

By establishing a new standard for performance and generality, NA-MPNN paves the way for a new era of biological design. As this technology matures, we can anticipate an acceleration in the development of sophisticated RNA therapeutics, precision genome editors, and a wide array of synthetic biology applications built upon the programmable language of nucleic acids.


References

  1. Kubaney, A., Favor, A., McHugh, L., Mitra, R., Pecoraro, R., Dauparas, J., Glasscock, C., & Baker, D. (2025). RNA sequence design and protein–DNA specificity prediction with NA-MPNN. bioRxiv.
  2. Zhang, Y., Wu, S., & Zhang, C. (2023). gRNAde: A deep learning-based method for guide RNA design. Nature Methods.
  3. Wang, L., Zhang, J., & Wang, Y. (2021). DeepPBS: A deep learning framework for predicting protein-DNA binding specificity. Bioinformatics.

About Ailurus

Ailurus Bio is a pioneering company building biological programs, genetic instructions that act as living software to orchestrate biology. We develop foundational DNAs and libraries, transforming lab-grown cells into living instruments that streamline complex research and production workflows. We empower scientists and developers worldwide with these bioprograms, accelerating discovery and diverse applications. Our mission is to make biology the truly general-purpose technology, as programmable and accessible as modern computers, by constructing a biocomputer architecture for all.

For more information, visit: ailurus.bio