
Content-Based Similarity Search in Large-Scale DNA Data Storage Systems

By Callista Bee, Yuan-Jyue Chen, David Ward, Xiaomeng Liu, Georg Seelig, Karin Strauss, Luis Ceze

Posted 27 May 2020
bioRxiv DOI: 10.1101/2020.05.25.115477

Synthetic DNA has the potential to store the world's continuously growing amount of data in an extremely dense and durable medium. Current proposals for DNA-based digital storage systems include the ability to retrieve individual files by their unique identifier, but not by their content. Here, we demonstrate content-based retrieval from a DNA database by learning a mapping from images to DNA sequences such that an encoded query image will retrieve visually similar images from the database via DNA hybridization. We encoded and synthesized a database of 1.6 million images and queried it with a variety of images, showing that each query retrieves a sample of the database in which visually similar images appear at a rate much greater than chance. We compare our results with several algorithms for similarity search in electronic systems, and demonstrate that our molecular approach is competitive with state-of-the-art electronics.

Competing Interest Statement: C.B., Y.C., G.S., K.S., and L.C. have filed a patent application on the core idea. K.S. and Y.C. are employed by Microsoft.

Download data

  • Downloaded 1,581 times
  • Download rankings, all-time:
    • Site-wide: 5,306 out of 88,613
    • In bioengineering: 101 out of 1,973
  • Year to date:
    • Site-wide: 697 out of 88,613
  • Since beginning of last month:
    • Site-wide: 465 out of 88,613


