DNA Storage Research – Storage of binary data on DNA. Our mission is to unlock DNA as a new medium for archival storage in the cloud.

DNA Data Storage Overview

DNA, the blueprint of life, not only encodes genetic information but also holds great potential as a medium for storing archival data over the long term. Find out more about DNA data storage and our research at Imperial in the two videos below, which feature our work, and in the rest of this page. Do not forget to read our DNA storage FAQ, as well as the recent Chemistry World article to which we contributed!


Our Research Focus

DNA data storage starts by taking the binary representation of any type of digital file (documents, movies, music, databases) and encoding it into sequences over the four nucleotides (A, C, G, T), while incorporating error correction. These sequences are then synthesised into physical DNA molecules. The resulting molecules may be stored for years, if not decades, after which they are sequenced to recover the data. The sequencing reads are aligned and decoded, and any errors are corrected using the embedded error-correction codes to recover the original digital files.
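A toy version of the write/read cycle can make these steps concrete. The sketch below assumes a naive mapping of two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T) and stands in for real error-correction codes with a simple per-position majority vote over several aligned reads that contain only substitution errors. The names `encode`, `decode` and `consensus` are hypothetical; practical schemes also add indexing, handle insertions and deletions, and respect biochemical constraints that this mapping ignores.

```python
from collections import Counter

BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

def encode(data: bytes) -> str:
    """Map every byte to four nucleotides, two bits per base, most significant bits first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))

def decode(seq: str) -> bytes:
    """Invert encode(): pack every group of four nucleotides back into one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

def consensus(reads: list[str]) -> str:
    """Per-position majority vote over aligned, equal-length reads (substitution errors only)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

strand = encode(b"Hi")  # 'CAGACGGC'
# Three simulated sequencing reads, each with one substitution at a different position.
reads = ["TAGACGGC", "CAGAAGGC", "CAGACGGA"]
print(decode(consensus(reads)))  # b'Hi'
```

Because every read errs at a different position, the majority vote recovers the synthesised strand exactly; real codes (for example Reed-Solomon) achieve this with far less redundancy than storing whole extra copies.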

Our research focuses on all stages of the DNA data storage cycle:
      – Encoding, decoding and error correction for DNA data storage
      – Optimising synthesis processes for DNA data storage
      – Physical storage of DNA molecules for data storage
      – Sequencing for DNA storage based on Machine Learning/AI methods

Making DNA Data Storage a Reality

Our ultimate goal is to make DNA data storage a staple of modern data centres for storing archival data over the very long term.


DNA Data Storage

DNA data storage, the process of storing arbitrary binary data in DNA sequences, is actively researched by us and the wider community. Its properties, primarily its durability and compact form factor, make it an ideal medium for archiving data for decades. Watch the video here to see what makes DNA an ideal storage medium and how it all works.
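To give a sense of how compact the medium is, the back-of-the-envelope calculation below estimates a theoretical upper bound on density. It assumes two bits per nucleotide (log2 of the four possible bases) and an average mass of roughly 330 g/mol per nucleotide of single-stranded DNA; both the mass figure and the function name are illustrative assumptions. The bound ignores redundancy, error correction, indexing and the many physical copies needed in practice, so achievable densities are much lower.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def theoretical_density_bytes_per_gram(mass_per_nt_g_per_mol=330.0, bits_per_nt=2.0):
    """Upper bound on storage density: every nucleotide carries bits_per_nt bits, no overhead."""
    nucleotides_per_gram = AVOGADRO / mass_per_nt_g_per_mol
    return nucleotides_per_gram * bits_per_nt / 8  # 8 bits per byte

density = theoretical_density_bytes_per_gram()
print(f"{density:.2e} bytes/g, about {density / 1e18:.0f} exabytes per gram")
# prints: 4.56e+20 bytes/g, about 456 exabytes per gram
```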

Processing Data in DNA

Once data is stored in DNA, biomolecular processes can be used to process it at an unprecedented scale. More precisely, combinatorial problems (e.g., database joins, travelling salesperson problems and others) can be solved in DNA both very quickly, thanks to the massive parallelism of molecular reactions, and very energy-efficiently compared with traditional computing. Watch the video here to learn more.
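The parallelism argument is easiest to see in the generate-and-filter style of Adleman's 1994 experiment, which solved a small directed Hamiltonian path problem, a close relative of the travelling salesperson problem. The sketch below is only an in-silico analogy, not a model of the chemistry and not necessarily the method used in our work: in the test tube all candidate paths form simultaneously through hybridisation and ligation, whereas here they are enumerated one by one. The function name and the example graph are hypothetical.

```python
from itertools import product

def hamiltonian_paths(edges, n, start, end):
    """Generate-and-filter search for directed Hamiltonian paths on vertices 0..n-1."""
    edge_set = set(edges)
    # Generate: every walk of n vertices that follows edges of the graph.
    # (In the DNA experiment these walks assemble in parallel as ligated strands.)
    walks = [w for w in product(range(n), repeat=n)
             if all((a, b) in edge_set for a, b in zip(w, w[1:]))]
    # Filter 1: keep walks that begin at `start` and end at `end`.
    walks = [w for w in walks if w[0] == start and w[-1] == end]
    # Filter 2: keep walks that visit every vertex exactly once.
    return [w for w in walks if len(set(w)) == n]

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(hamiltonian_paths(edges, 4, 0, 3))  # [(0, 1, 2, 3)]
```

On a conventional computer the candidate set grows exponentially with the number of vertices; the appeal of DNA is that a single reaction volume can hold an enormous number of candidate strands at once.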

Modelling DNA Storage as a Constrained Channel

Key to storing binary data in synthetic DNA is the translation from the binary representation of digital data to the quaternary domain of DNA. This translation must adhere to constraints imposed by the synthesis and sequencing processes used to write and read the data, respectively. A technological advance in either process changes the constraints and renders current encoding schemes obsolete. In this line of work, we present a recipe for taking a set of constraints and producing an appropriate encoding scheme. Such a mechanism allows the encoding to evolve in lockstep with technological advances in the underlying processes. We further present a method for understanding the trade-offs between constraints for a given overhead, i.e., the number of extra bits needed to satisfy them.
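As an illustration of how a constraint translates into overhead, the sketch below takes one commonly discussed constraint, a maximum homopolymer run length (no more than `max_run` identical bases in a row), and computes its capacity in bits per nucleotide. It relies on the standard constrained-coding result that capacity equals log2 of the largest eigenvalue of the constraint's transition matrix, estimated here by power iteration. The choice of constraint and the function name are assumptions for illustration, not the recipe described above; the gap between 2 bits and the computed capacity is the minimum overhead per nucleotide any encoding for that constraint must pay.

```python
import math

def homopolymer_capacity(max_run, iterations=2000):
    """Capacity (bits per nucleotide) of A/C/G/T sequences with no run longer than max_run."""
    # counts[r] = relative number of valid sequences whose final run has length r + 1.
    counts = [4.0] + [0.0] * (max_run - 1)
    growth = 0.0
    for _ in range(iterations):
        total = sum(counts)
        # Appending one of the 3 other bases resets the run to length 1;
        # appending the same base extends the run, allowed only while it stays <= max_run.
        new = [3.0 * total] + counts[:-1]
        new_total = sum(new)
        growth = new_total / total            # converges to the largest eigenvalue
        counts = [c / new_total for c in new]  # normalise to avoid overflow
    return math.log2(growth)

for k in (1, 2, 3):
    cap = homopolymer_capacity(k)
    print(f"max run {k}: capacity {cap:.4f} bits/nt, overhead {2 - cap:.4f} bits/nt")
```

With `max_run=1` every base must differ from its predecessor, leaving 3 choices per position and a capacity of log2(3), about 1.585 bits per nucleotide; relaxing the constraint quickly brings the capacity close to the unconstrained 2 bits.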

Storing Data in DNA Nanostructures

Synthetic DNA has garnered attention as a storage medium due to its high density and durability, properties that make it well suited to archival storage. However, synthetic DNA is currently prohibitively expensive, so alternative forms of DNA storage need to be explored. Our vision is to use DNA nanostructures with molecular bumps to store data, in a similar manner to a compact disc, and to reconstruct the original data using microscopy followed by a decoding pipeline based on computer vision and machine learning. The following picture illustrates our approach.
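The final step of such a pipeline, turning observed bumps back into bits, can be sketched in a few lines. The sketch assumes a hypothetical layout in which each nanostructure exposes a grid of sites read row by row, a bump encodes 1 and its absence encodes 0, and microscopy yields one height value per site. A fixed threshold stands in for the computer-vision and machine-learning classifier mentioned above, and the height values are invented for illustration.

```python
def read_bits(height_map, threshold):
    """Classify each site as bump (1) or no bump (0) by comparing its height to a threshold."""
    return [[1 if h > threshold else 0 for h in row] for row in height_map]

def bits_to_bytes(bits):
    """Read the bit grid row by row, most significant bit first, and pack it into bytes."""
    flat = [b for row in bits for b in row]
    out = bytearray()
    for i in range(0, len(flat) - len(flat) % 8, 8):  # ignore any trailing partial byte
        byte = 0
        for b in flat[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

# Hypothetical 2 x 8 height map in nm: about 2 nm where a bump is present, about 0.2 nm otherwise.
heights = [
    [0.2, 2.1, 0.3, 0.1, 1.9, 0.2, 0.2, 0.1],  # 01001000 = 'H'
    [0.1, 2.0, 2.2, 0.3, 2.1, 0.2, 0.1, 1.8],  # 01101001 = 'i'
]
print(bits_to_bytes(read_bits(heights, threshold=1.0)))  # b'Hi'
```

In practice, noisy images, misaligned structures and ambiguous sites are what motivate a learned classifier and error correction rather than a single global threshold.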