Apache Spark, Cloudera Search, Impala — Which is best for Analytics?

This blog post was written by Justin Langseth, CEO of Zoomdata.

As the leading Big Data Visualization, Exploration and Analytics platform, Zoomdata has been designed to take advantage of the many advanced features in next-generation data stores such as Cloudera Impala, Cloudera Search and Apache Spark. In addition, rather than moving data from these data stores into a proprietary data environment for reporting, Zoomdata executes its queries directly in these data stores. As a result, Zoomdata’s customers are able to analyze terabytes and petabytes of data in a matter of seconds.

1. Direct Analysis of Raw HDFS Data

There is a lot to be said for analyzing raw data directly. Simply store it in HDFS, the Hadoop Distributed File System, and sort out the structure later. However, the downside of this “schema-on-read” flexibility is latency. Using MapReduce, queries can take anywhere from many minutes to hours, depending on the query and the size of the dataset. While there have been efforts to speed up MapReduce, its batch-oriented design means it will always suffer from higher latency than newer query frameworks.
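To make the “schema-on-read” idea concrete, here is a minimal sketch in plain Python. It is not HDFS or MapReduce code; the sample records, the field names, and the helper functions are all hypothetical. It only illustrates that the raw text is stored untouched and that structure is imposed each time a query runs, which is where the flexibility and the extra per-query work both come from.

```python
import csv
import io

# Hypothetical raw data, stored exactly as it arrived (no schema enforced on write).
RAW_LINES = """2014-01-03,east,19.99
2014-01-03,west,5.00
not a valid record
2014-01-04,east,12.50
"""

# The schema is applied only at read time; these names and types are assumptions.
SCHEMA = [("day", str), ("region", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Parse the raw text on every read, skipping lines that do not fit the schema."""
    for fields in csv.reader(io.StringIO(raw_text)):
        if len(fields) != len(schema):
            continue  # wrong number of columns: ignore instead of failing
        try:
            yield {name: cast(value) for (name, cast), value in zip(schema, fields)}
        except ValueError:
            continue  # a value could not be converted to its declared type


def total_by_region(raw_text):
    """A 'query': every call re-parses the raw data from scratch."""
    totals = {}
    for row in read_with_schema(raw_text, SCHEMA):
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


print(total_by_region(RAW_LINES))
```

Because nothing is validated or indexed when the data is written, every query pays the full cost of scanning and parsing, which is the latency trade-off described above.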

2. Analytic SQL with Impala

Impala is an open source analytic database that runs natively in Hadoop. It gives analysts and business users alike low-latency SQL queries for business intelligence and data discovery.

Zoomdata integrated with Impala early on, and the results were dramatic. The video below shows Zoomdata using the micro-query sharpening approach on Impala to analyze a billion rows of sales transaction data nearly instantly. Not only is Impala much faster than direct analysis of raw HDFS data, but we also believe it is the leading way to keep big data in Hadoop while retaining the ability to analyze it very quickly, especially when the data is stored in the standard columnar format, Parquet.
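The post does not spell out how micro-query sharpening works, so the following is only a sketch of one plausible reading: instead of one large aggregate query, issue many small queries (here, one per day) and merge their partial results so that a chart can be redrawn as the totals sharpen. Python's built-in sqlite3 stands in for Impala, and the `sales` table, its columns, and the `sharpen` function are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table; sqlite3 is only a stand-in for Impala."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount INTEGER)")
    rows = [
        ("2014-01-01", "east", 10), ("2014-01-01", "west", 4),
        ("2014-01-02", "east", 7), ("2014-01-02", "west", 9),
        ("2014-01-03", "east", 1),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def sharpen(conn):
    """Yield a progressively more complete total per region, one micro-query at a time."""
    days = [d for (d,) in conn.execute("SELECT DISTINCT day FROM sales ORDER BY day")]
    totals = {}
    for day in days:
        # Each micro-query touches only one slice of the data.
        micro = conn.execute(
            "SELECT region, SUM(amount) FROM sales WHERE day = ? GROUP BY region",
            (day,),
        ).fetchall()
        for region, subtotal in micro:
            totals[region] = totals.get(region, 0) + subtotal
        yield dict(totals)  # snapshot a client could draw immediately


conn = build_demo_db()
snapshots = list(sharpen(conn))
for snapshot in snapshots:
    print(snapshot)
```

The final snapshot matches what a single full-table `GROUP BY` would return; the difference is that a user sees a first approximation after the first micro-query rather than waiting for the whole scan.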

3. Full-Text Search with Apache Solr (Cloudera Search)

One of the strengths of Hadoop is that it can store full-fidelity data of any type, be it structured, semi-structured, or unstructured. Open source projects like Apache Solr, which powers Cloudera Search, make it easy to search the semi-structured and unstructured data. This full-text search engine also opens up the data in Hadoop to any user who simply knows how to “Google.” By indexing data into Cloudera Search, all data (regardless of structure) can be analyzed, with the added ability to run free-text searches and use facets.

The video below shows Zoomdata leveraging a Cloudera Search index of TripAdvisor hotel reviews, allowing for structured (graphs), semi-structured (facets), and unstructured (search) analysis all at once.
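As an illustration of combining free-text search with facets, the sketch below builds a request URL for Solr's standard select handler using the `q`, `rows`, `wt`, `facet`, and `facet.field` query parameters. The host, the `hotel_reviews` collection, and the `review_text`, `city`, and `rating` fields are assumptions made for the example, not details from the demo in the video.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Return a Solr select URL that runs a text query and requests facet counts."""
    params = [
        ("q", text),        # free-text (unstructured) part of the request
        ("rows", rows),     # number of matching documents to return
        ("wt", "json"),     # response format
        ("facet", "true"),  # turn faceting on
    ]
    # One facet.field parameter per field to count values for.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names.
url = build_facet_query(
    "http://localhost:8983/solr/hotel_reviews/select",
    'review_text:"friendly staff"',
    ["city", "rating"],
)
print(url)
print(parse_qs(urlsplit(url).query))
```

A single response to a request like this carries both the matching documents and per-value counts for each facet field, which is what allows a search box and facet filters to be driven from one index.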

4. In-Memory Analysis with Apache Spark

Apache Spark seems to be everywhere today. It’s a great way to process data, and it’s also a good place to do light data preparation work and test machine learning algorithms. If the dataset is small enough to fit in memory, Spark is blazingly fast. The video below shows Zoomdata operating on sales transaction data directly in Spark.

Why Choose?

All four approaches (raw HDFS, Impala, Search, and Spark) have their place and are good for different use cases. As a partner in Cloudera’s Accelerator program, Zoomdata is certified on Cloudera 5. Through the Cloudera Accelerator program, Zoomdata is working with the Cloudera team to highlight the use cases that take advantage of Impala and Spark technologies. From a pharmaceutical company exploring billions of rows of data on an iPad in seconds to an adtech company visualizing the location of millions of mobile phones on a map, Cloudera and Zoomdata are helping businesses take advantage of big data. The Zoomdata team is also working closely with Cloudera to maximize the impact of analytics across all of the Cloudera engines, to build bridges between them, and most importantly to make analysis using Impala and Spark fast, fun, and easy for business users, so they don’t need to worry about what is happening under the covers.
