
Data Science Orgs Form GPU Initiative for Data Analytics

Continuum Analytics, H2O.ai, and MapD Technologies form the GPU Open Analytics Initiative (GOAI) to establish common data frameworks to support data analytics


By Elizabeth O'Dowd

Continuum Analytics, H2O.ai, and MapD Technologies announced the formation of the GPU Open Analytics Initiative (GOAI) to establish common data frameworks to support data analytics. The collaboration aims to support the use of graphics processing units (GPUs) for analytics across all major industries, including healthcare.

GOAI aims to assist the development of a standardized data science ecosystem on GPUs by allowing resident applications to interchange data efficiently.

Other founding members of GOAI include BlazingDB, Graphistry and Gunrock from UC Davis. All members will work together, contributing technical expertise to create the data framework.

GPUs are becoming increasingly popular for healthcare data analytics as machine learning workloads are growing too large, straining traditional central processing units (CPUs). The GOAI was formed to migrate workloads to GPUs and to establish a common standard so organizations can benefit from the power of end-to-end GPU computing.

A common GPU standard has the potential to enable intercommunication between data applications and enhance workflow by removing latency. It can also decrease the complexity of data flows between core analytical applications.

The GOAI announced an open source GPU Data Frame with a corresponding Python API as its first project. The GPU Data Frame is a common API that enables efficient interchange of data between processes running on the GPU.

End-to-end computation on the GPU avoids transfers back to the CPU or copying of in-memory data, according to the release. This reduces compute time and cost for high-performance analytics common in artificial intelligence workloads.
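The difference between handing data to the next stage by reference and handing it over by copy can be sketched on the CPU with nothing but the Python standard library. This is only an analogy, not the GPU Data Frame API: the function names `query_stage`, `scale_in_place`, and `handoff_by_copy` are hypothetical stand-ins for a database stage and an analytics stage, and a `memoryview` plays the role of a buffer shared between them.

```python
from array import array


def query_stage() -> array:
    """Hypothetical stand-in for a database stage that produces one result column."""
    return array("d", [1.0, 2.0, 3.0, 4.0])


def scale_in_place(view: memoryview, factor: float) -> None:
    """Hypothetical analytics stage: works directly on the shared buffer."""
    for i in range(len(view)):
        view[i] = view[i] * factor


def handoff_by_copy(column: array) -> array:
    """Copying handoff: the next stage receives its own duplicate of the data."""
    return array("d", column)


column = query_stage()

# Zero-copy handoff: the memoryview exposes the same underlying buffer,
# so the consumer's changes are visible to the producer without a transfer.
shared = memoryview(column)
scale_in_place(shared, 10.0)
shared_result = column.tolist()

# Copying handoff: the duplicate is independent, so the data exists twice
# and changes to the duplicate never reach the original.
duplicate = handoff_by_copy(column)
duplicate[0] = -1.0
original_first = column[0]

shared.release()
```

In this sketch the shared-buffer path touches the data once, while the copying path allocates and fills a second array before any work begins. That second allocation is the kind of overhead the release says end-to-end GPU computation avoids.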

MapD Core database users can output SQL query results into the GPU Data Frame. The results can then be manipulated with Continuum Analytics’ Anaconda NumPy-like Python API or used as input to the H2O suite of machine learning algorithms without additional data manipulation. The GOAI tested this approach and concluded that it produced faster processing times than passing data between applications on a CPU.

“The data science and analytics communities are rapidly adopting GPU computing for machine learning and deep learning. However, CPU-based systems still handle tasks like subsetting and preprocessing training data, which creates a significant bottleneck,” MapD Technologies CEO and Co-Founder Todd Mostak said in a statement.

“The GPU Data Frame makes it easy to run everything from ingestion to preprocessing to training and visualization directly on the GPU,” he continued. “This efficient data interchange will improve performance, encouraging development of ever more sophisticated GPU-based applications.”

Continuum Analytics Co-founder and Chief Data Scientist Travis Oliphant said that Anaconda is mobilizing the Open Data Science movement by helping organizations avoid data transfers between CPUs and GPUs.

Organizations are considering GPUs for large-scale analytics projects because GPUs can handle bigger blocks of data than CPUs. The more powerful hardware allows organizations to gain near real-time access to analyzed data.

Healthcare organizations in particular will benefit from GPU-based analytics solutions because they offer superior processing power and yield faster results. Organizations looking to cut costs can use analytics for more accurate diagnoses, potentially reducing the number of patient visits.

A GPU-based data analytics system will let clinicians receive analytic results during the initial appointment, rather than having the patient come back for a follow-up appointment.

For example, Fuzzy Logix and Kinetica announced a partnership in May 2017 to release a joint solution using Fuzzy Logix’s high-performance data analytics and Kinetica’s GPU-accelerated database.

The new tool will extend Kinetica’s in-database analytic capabilities with hundreds of additional GPU-accelerated machine learning and predictive analytics algorithms from Fuzzy Logix. The analytic functions will be able to utilize Kinetica’s distributed GPU pipeline through its User Defined Functions (UDFs).

Organizations need the processing power to quickly analyze data and produce near real-time results, saving money by eliminating the need for many patient follow-up visits. Organizations cannot take full advantage of analytics without deploying the proper infrastructure technology to support growing data demands.