Healthcare organizations collect massive amounts of unstructured data, and many struggle to find tools that make that data usable. Making unstructured data actionable with machine learning and artificial intelligence (AI) can help clinicians gain valuable insight into patients’ health.
Understanding and reconciling the two major types of data, structured and unstructured information, is one of the major challenges for healthcare providers.
Structured data is stored within fixed confines, such as a defined field in a record or file. It is easier to store and analyze because it has clear boundaries and is created in a standardized format.
Patient demographic information, diagnosis and procedure codes, medication codes, and certain other data from the electronic health record are typically generated in a standardized, structured way. Traditional data warehouses are usually equipped to handle structured data.
Unstructured data is a little more difficult. It comes in many forms, including but not limited to emails, audio files, videos, text documents, genome files, and social media posts. Unstructured data has no predefined format and can’t be analyzed the same way as structured data, which is why it’s much harder for healthcare organizations to make it actionable.
Unstructured data sets are in the terabyte range and are expected to reach petabyte scales. More data is being collected and used for patient care, according to Condusiv Technologies CEO Jim D’Arezzo.
“Today, human whole genome sets are typically hundreds of gigabytes in size,” D’Arezzo told HITInfrastructure.com in a previous interview. “But, current figures indicate that the sequence data is now doubling every seven to nine months. In 2014 you had an estimated 228,000 genomes for sequence, but now the figure is estimated to be over 1.6 billion.”
Cloudera and MetiStream announced their joint project to help healthcare organizations mine unstructured data.
MetiStream, a healthcare analytics provider, has built an end-to-end interactive analytics platform, MetiStream Ember, on Cloudera’s machine learning platform.
The tool can give clinicians insights based on large volumes of handwritten notes and genomic data. MetiStream Ember is available on Microsoft Azure and Apache Spark and is currently being used by Rush University Medical Center.
"With Cloudera and MetiStream on Microsoft Azure, we can quickly spin up and down resources as our data processing needs change and evolve, and we can load huge volumes of data in days that would have taken weeks on premises,” Rush University Medical Center Chief Analytics Officer Dr. Bala Hota said in a statement.
“We have also been able to apply machine learning to discover new insights from our data, and by using Cloudera technologies, we are working to make development of new models easier and faster for our data scientists.”
About 80 percent of healthcare data is unstructured. Not only do organizations need tools to look back at the legacy data they have already stored, but they also have to deal with the increasing amount of data being produced every day. With the addition of connected medical and Internet of Things (IoT) devices, organizations are collecting unstructured data at an alarming rate.
"We believe that machine learning and analytics are powerful tools for understanding diseases, improving outcomes, containing costs and delivering better care where it's needed most," Cloudera Founder and Chief Strategy Officer Mike Olsen said in a statement.
"Today, healthcare organizations can do what was previously impossible,” he continued. “They can integrate complex data sets from EHR, genomics, and imaging with machine learning and analytics at massive scale for momentous transformations in patient care, engagement, and outcomes."
Healthcare organizations need to look into tools that will help them take advantage of the unstructured data they collect. Without introducing machine learning and AI into the process, organizations can’t use that data to gain better insight into patient conditions.