Expert Speak Digital Frontiers
Published on Apr 13, 2023
Data storage in DNA could help replace large databanks and reduce the climate impact of data storage in the long run
Archiving the internet in DNA

Innovation in technology is often measured by gains in the efficiency of input, output, processing, and data storage. Data storage remains one of the core capabilities that any technological innovation depends on. Blockchain technologies, at one point, offered a plausible solution for secure data storage. However, their significant climate impact has made them less desirable. Since the 1950s, data has been stored on reels of magnetic tape, and few other innovations have provided viable alternatives that are space- and cost-friendly. As a result, data storage has remained relatively unchanged, with binary (0 and 1) as its foundation. As of 2021, there were approximately 10 trillion gigabytes of data, a figure that is still increasing daily. Storing this data on magnetic tape takes up space. In addition, tape degrades over time, so avoiding data loss regularly consumes large quantities of resources, such as space, climate control, fuel, magnetic tape, and metals.

Data storage in DNA 

In the last decade, bioengineers have begun researching the possibility of storing data as DNA. DNA, a well-researched and well-understood molecule, offers a structure comparable to binary code. It is built from four types of bases: adenine (A), thymine (T), guanine (G), and cytosine (C). These four chemicals are the structural basis of all DNA sequences and underpin life on Earth. Just as binary encodes information in sequences of two symbols, DNA can encode it in sequences of four, which makes it an alternative storage medium. Theoretically, because of its density, DNA could store all of the world’s data in the space of a coffee mug, whereas magnetic tape storage would require the space of multiple football fields.
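The correspondence between binary and the four bases can be made concrete with a minimal sketch. The mapping below (two bits per base, 00 to A, 01 to C, 10 to G, 11 to T) and the function names `encode` and `decode` are illustrative assumptions, not a scheme described in this article. Practical systems typically also add error correction and avoid long runs of a single base; this sketch omits both.

```python
# Illustrative two-bits-per-base mapping (an assumption, not a standard).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Under this mapping each byte becomes four bases, so the two-byte message above is written as an eight-base strand.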
The shape and structure of DNA bases allow pairs only between A and T, and between C and G, so the two strands of a DNA molecule carry complementary sequences, a property called Watson-Crick complementarity. This pairing means that each base on one strand determines its partner on the other. Thus, when DNA is copied, the sequence, and therefore the binary code it represents, can be reproduced accurately. As a result, DNA can be managed without high maintenance costs or effort and does not require physical filing systems. Furthermore, the data stored in DNA can be replicated at negligible cost, even across generations of DNA reproduction, without data loss or corruption. Writing data to DNA is currently expensive, and extracting it also requires significant investment; however, DNA is potentially more cost-, energy-, and time-efficient for archival purposes. Furthermore, if adequately held in salt, it can be preserved for decades without excessive climate control, lasting longer than data in controlled data centres. Data stored in plant seeds, for example, would not require temperature control, would last a long time, and could be replicated easily without loss of data.
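The pairing rule can be illustrated with a short sketch. The names `complement` and `mismatches` and the example strands are hypothetical, and the strands are compared position by position for simplicity, ignoring the fact that real DNA strands run in opposite directions.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position of a strand."""
    return "".join(PAIR[base] for base in strand)


def mismatches(strand: str, partner: str) -> list:
    """Positions where two aligned strands violate the pairing rule."""
    return [i for i, (a, b) in enumerate(zip(strand, partner)) if PAIR[a] != b]


original = "CAGACGGC"            # hypothetical example strand
partner = complement(original)   # GTCTGCCG
recovered = complement(partner)  # identical to the original strand

damaged = "GTCTGACG"             # partner strand with one altered base
print(recovered == original)          # True
print(mismatches(original, damaged))  # [5]
```

Because one strand fully determines the other, a copy can be rebuilt from either strand, and a base that breaks the pairing rule is detectable.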

Limitations and alternatives 

While DNA storage is a significant leap forward for green solutions and technological innovation, there are drawbacks. Primarily, DNA synthesis methods depend on organic chemistry and are still new; the process needs to become less expensive and has a long way to go before it is an everyday solution for regular users. Further, it is not only costly but also time-consuming, both to write data into DNA and to read it back. For these two reasons, the current recommendation is to use DNA storage for archival purposes or to create backups of critical and sensitive data, like blockchain passwords. Additionally, while DNA data storage doesn’t require climate control for naturally occurring DNA, like in plants, synthetic DNA requires cold and dark storage, reintroducing the problem of storage space seen with magnetic tape. DNA-based data storage is frequently suggested as a replacement for traditional electronic storage because of its durability, low replication cost, and small space requirements. However, since artificial DNA and microorganisms require modification and genetic coding for storage, an alternative approach, storing data in seeds, is recommended over storing it in plants or synthetic DNA.
Storing data in plant seeds is currently the most economical and sustainable solution for data storage. Furthermore, it counters the limitations mentioned above, since seeds are self-preserving and can be stored without additional synthetic protection. Despite the drawbacks of DNA storage, and encouraged by ongoing innovation in the field, many companies have begun selling it as a service in the last five years, storing data in both naturally occurring and synthetic DNA.

Government investment in DNA data storage 

In the United States (US), a molecular informatics programme announced in 2017 funded research into DNA-based data storage; a similar programme, the 2018 Molecular Information Storage Technology (MIST) programme, aims to scale DNA data storage to writing 1 TB and reading 10 TB of data a day. In the European Union (EU), the European Bioinformatics Institute has also received funding for research in this field since 2013.
Government funding in these research fields suggests that data-reading devices will emerge that can process data much faster than is possible today. In addition, these devices must support random-access retrieval and work with customisable DNA storage systems. These innovations are critical to DNA data storage and are set to significantly influence the data storage industry. While DNA data storage is not a solution for everyday or commercial use in the short term, it does allow archives to become more sustainable and supports the move towards climate efficiency and the economical use of space and other resources in a fundamental of technological growth: data and its storage. Before this innovation becomes part of everyday technology, it can thus be incorporated into archiving and record maintenance, replacing the large databanks that currently hold the world’s information. Additionally, this will help address the even more significant issue of the climate impact of data storage and blockchain technologies, reducing both the economic and social costs of data centres in the long run.
The views expressed above belong to the author(s).

Author

Shravishtha Ajaykumar

Shravishtha Ajaykumar is Associate Fellow at the Centre for Security, Strategy and Technology. Her fields of research include geospatial technology, data privacy, cybersecurity, and strategic ...
