HITInfrastructure

Genomic Data Compression Eases Cloud Data Migration, Storage

Geneformics' latest release compresses clinical genomic data for precision medicine, easing cloud data migration and storage.

By Elizabeth O'Dowd

Geneformics Data Systems announced the release of its distributed cloud compression solution, Geneformics D, to improve cloud data migration speeds and data storage.

Geneformics D aims to speed up uploads and downloads and to reduce storage and archiving requirements, helping healthcare organizations process, store, and retrieve data for precision medicine.

“Precision medicine is making the migration of genomic processes to a cloud infrastructure attractive for healthcare and research organizations, due to scalability, sharing features and ease of use,” Geneformics CEO Rafael Feitelberg said in a statement. “Now organizations can seamlessly scale with genomic compression in the cloud, while accelerating analyses and reducing storage requirements.”

Geneformics D helps clinicians and researchers access data from population sequencing, gene banks, and hereditary, rare disease, cancer, and genomics-based pharmaceutical efficacy research. The compression saves bandwidth, time, and storage space.

The tool is integrated into the cloud infrastructure as an Infrastructure-as-a-Service (IaaS) offering and is based on a lossless compression technology co-developed with the Weizmann Institute of Science. It implements a distributed, compressed file system for the cloud and, together with it, a compressed object-as-file system that behaves as a standard file system on Linux.

Geneformics D is currently available on AWS and will be available for other cloud service providers in the future. Features include:

  • Intelligent caching of decompressed file segments on instance-attached disks, which accelerates processing.
  • Seamless, virtually unbounded scale-out with the compute infrastructure.
  • Patent-pending, automatic management of genomic storage on the cloud, directing data to the most cost-effective storage tier. This provides up to 50 percent savings in addition to the compression savings, through fine-grained object storage tiering.
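
The segment caching described in the first feature can be illustrated with a small sketch. This is not Geneformics' implementation, which the announcement does not detail. It assumes a file stored as independently compressed segments, uses zlib from the Python standard library as a stand-in codec, and keeps decompressed segments in an in-memory cache rather than on instance-attached disks. The class name `SegmentedFile`, the 64 KiB segment size, and the cache size are all hypothetical.

```python
import zlib
from functools import lru_cache

SEGMENT_SIZE = 64 * 1024  # hypothetical segment size, chosen for illustration


class SegmentedFile:
    """A file held as independently compressed segments (illustrative only)."""

    def __init__(self, data: bytes, segment_size: int = SEGMENT_SIZE,
                 cache_segments: int = 8):
        self.segment_size = segment_size
        self.length = len(data)
        # Compress each fixed-size segment on its own so that a read
        # only has to decompress the segments it touches.
        self._segments = [
            zlib.compress(data[i:i + segment_size])
            for i in range(0, len(data), segment_size)
        ]
        self.decompress_calls = 0
        # Keep the most recently used decompressed segments in memory.
        self._load = lru_cache(maxsize=cache_segments)(self._decompress)

    def _decompress(self, index: int) -> bytes:
        self.decompress_calls += 1
        return zlib.decompress(self._segments[index])

    def read(self, offset: int, size: int) -> bytes:
        """Return up to `size` bytes starting at `offset` (offset >= 0)."""
        end = min(offset + size, self.length)
        if offset >= end:
            return b""
        first = offset // self.segment_size
        last = (end - 1) // self.segment_size
        chunk = b"".join(self._load(i) for i in range(first, last + 1))
        start = offset - first * self.segment_size
        return chunk[start:start + (end - offset)]
```

In this sketch, a second read that falls in an already decompressed segment is served from the cache without calling the decompressor again, which is the general effect the feature list attributes to caching.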

Healthcare organizations are using more data technology as they continue to collect more patient data for analytics and precision medicine. The data being collected is both voluminous and complex, and it requires tools that can handle it.

“With a single, whole-genome human sample involving approximately 250-300GB of data, we are addressing a major challenge faced by Next Generation Sequencing practitioners,” Geneformics CTO and founder Arik Keshet said in a statement. “With Geneformics D, compressed genomics data requires one tenth of the storage space that would have otherwise been needed. Additionally, Geneformics D works seamlessly and does not require changes to data formats or APIs.”
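
The figures Keshet cites can be turned into a quick back-of-the-envelope calculation. The sketch below takes the quoted 250-300GB per sample and the one-tenth storage claim at face value. The function names and the 1,000-sample cohort are hypothetical, chosen only to show the scale involved, and decimal units (1TB = 1,000GB) are assumed.

```python
def compressed_size_gb(raw_gb: float, ratio: float = 10.0) -> float:
    """Size after compression, assuming the quoted one-tenth (10:1) ratio."""
    return raw_gb / ratio


def cohort_storage_tb(samples: int, raw_gb_per_sample: float,
                      ratio: float = 10.0) -> float:
    """Total compressed storage for a cohort, in TB (1 TB = 1,000 GB)."""
    return samples * compressed_size_gb(raw_gb_per_sample, ratio) / 1000.0


if __name__ == "__main__":
    for raw in (250.0, 300.0):
        print(f"{raw:.0f} GB raw -> {compressed_size_gb(raw):.0f} GB compressed")
    # Hypothetical cohort of 1,000 whole-genome samples at 300 GB each.
    print(f"Uncompressed: {1000 * 300.0 / 1000.0:.0f} TB")
    print(f"Compressed:   {cohort_storage_tb(1000, 300.0):.0f} TB")
```

Under these assumptions, a single sample shrinks from 250-300GB to roughly 25-30GB, and a 1,000-sample cohort from about 300TB to about 30TB.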

Cloud data migration and data storage are large undertakings for any organization, but they are especially cumbersome for organizations collecting and storing data for precision medicine and analytics.

Data storage is always a pain point for healthcare organizations as they debate what data to keep on-premises and what data needs to move to the cloud. Storing too much data on-premises can be expensive and inflexible as storage needs ebb and flow. However, the cloud can also be a struggle for some organizations if it’s not managed properly.

Compressing data is one of the most important steps an organization can take when it faces storage constraints. However, simply compressing the data does not guarantee its accuracy, which is why lossless compression is such an important part of data storage and migration.

Compressing data without the guarantee that it is lossless can subtly but seriously alter the data.

Genomic data is both exact and complex, and compression that was not designed for such data can seriously alter it. Compressing an ordinary image file or document is much simpler than compressing genomic data, and incorrect compression can lead to serious misrepresentation of the data.
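
One way to see the difference is to compare checksums before and after a round trip. The sketch below is not Geneformics' algorithm. It uses zlib from the Python standard library as a generic lossless compressor and a made-up FASTQ-style record as input. The `bin_qualities` function is a hypothetical lossy step, included only for contrast, that rounds quality scores down into coarse bins.

```python
import hashlib
import zlib


def sha256(data: bytes) -> str:
    """Hex digest used to compare data before and after a round trip."""
    return hashlib.sha256(data).hexdigest()


def roundtrip_is_lossless(original: bytes) -> bool:
    """Compress and decompress with zlib, then check the bytes are identical."""
    compressed = zlib.compress(original, 9)
    restored = zlib.decompress(compressed)
    return sha256(restored) == sha256(original)


def bin_qualities(qualities: bytes, step: int = 10) -> bytes:
    """Hypothetical lossy step: round each Phred+33 quality score down
    to a multiple of `step`. The original scores cannot be recovered."""
    return bytes(33 + ((q - 33) // step) * step for q in qualities)


if __name__ == "__main__":
    # Made-up FASTQ-style record, repeated to give the compressor some work.
    record = b"@read1\nGATTACAGATTACA\n+\nIIIIHHGGFFEEDD\n" * 1000
    print("lossless round trip:", roundtrip_is_lossless(record))

    qualities = b"IIIIHHGGFFEEDD"
    binned = bin_qualities(qualities)
    print("lossy step preserved data:", sha256(binned) == sha256(qualities))
```

A lossless codec always passes the checksum comparison, while the binned quality string does not match the original, even though it may look similar to a casual reader.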

Healthcare organizations need to consider their storage and migration tools as they continue to collect and utilize patient data for precision medicine and analytics. The more data that is collected, the more important it becomes to compress, store, and migrate the data properly.