AtScale Inc. has published the results of a new benchmark study of BI-on-Hadoop analytics engines. The study tested Hive, Impala, Presto and Spark SQL, and it found that each of the open source tools had its own “sweet spot.”
“There is no single ‘best engine,’” the study concluded. “Presto, Hive, Impala and Spark SQL were all able to effectively complete a range of queries on over 6 billion rows of data. The ‘winning’ engine for each of our benchmark queries was dependent on the query characteristics (join size, selectivity, group-bys).”
It added, “A successful BI-on-Hadoop architecture will likely require more than one SQL on Hadoop engine. Each engine has its strengths: Presto’s and Impala’s concurrency scaling support for quick metric queries, Spark SQL’s handling of large joins, Hive’s and Impala’s consistency across multiple query types. Enterprises might consider leveraging different engines for different query patterns.”
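As a rough illustration of what routing "different engines for different query patterns" could look like, here is a minimal Python sketch. It is not taken from the AtScale study: the `QueryProfile` fields, the thresholds, and the `choose_engine` function are all hypothetical, and the choice of Presto over Impala (or Hive over Impala) where the study names both is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical summary of a BI query; these fields are assumptions."""
    join_rows: int          # approximate rows in the largest join input
    concurrent_users: int   # users expected to issue this query at once
    is_quick_metric: bool   # small aggregate, e.g. a dashboard tile


# Illustrative thresholds only; the study does not publish cutoffs like these.
LARGE_JOIN_ROWS = 1_000_000_000
HIGH_CONCURRENCY = 10


def choose_engine(profile: QueryProfile) -> str:
    """Pick an engine name based on the strengths quoted from the study."""
    # Large joins: the study credits Spark SQL here.
    if profile.join_rows >= LARGE_JOIN_ROWS:
        return "spark_sql"
    # Quick metric queries under concurrency: Presto or Impala; Presto chosen arbitrarily.
    if profile.is_quick_metric and profile.concurrent_users >= HIGH_CONCURRENCY:
        return "presto"
    # Everything else: Hive or Impala for consistency; Hive chosen arbitrarily.
    return "hive"


if __name__ == "__main__":
    dashboard_tile = QueryProfile(join_rows=0, concurrent_users=50, is_quick_metric=True)
    print(choose_engine(dashboard_tile))
```

In practice the cutoffs would have to come from an organization's own benchmarks, since the study ties the "winning" engine to join size, selectivity and group-bys rather than to fixed numbers.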