Apache Drill’s Self-Service Capabilities By @MapR | @CloudExpo [#BigData]

Help Yourself: Leveraging Apache Drill's Self-Service Capabilities

Small data management solutions don't work in our brave new Big Data world. Back in the small data days, we talked proudly about having gigabytes of structured data that had been carefully denormalized to reduce latency as much as possible. Today's data is measured in petabytes, and it is dynamic, complex, and wildly varied in structure.

Small data was a nicely planned garden, but Big Data is a jungle: rich with resources, abundant in growth, but also a bit overwhelming and easy to get lost in.

Exploring that jungle requires solutions that enable interactive, self-service ways to work with historical as well as near real-time data. Hadoop and NoSQL on Hadoop solved many Big Data access and availability problems. Add Apache Drill and SQL-on-Hadoop to the mix and you have a solution designed to enable easy analysis of complex data structures and datasets using familiar SQL semantics.

If you want to blaze a path through the Big Data jungle, you want Apache Drill in your solution set.

Dig Deep with Drill
Apache Drill is a SQL query engine that works with numerous underlying data formats and sources. As a standalone query engine that supports multiple data sources, it works with the Hadoop and NoSQL database solutions that an organization may already have in place.

Apache Drill excels in demanding situations that require low-latency performance, such as data exploration, data discovery, ad hoc business intelligence (BI) queries, and Day Zero analytics. It enables efficient analytics operations ranging from a fast overview of a specific dataset to an extended, exploratory analysis of a very large data pool. Apache Drill supports interactive queries rather than batch-oriented requests, and it scales easily from a single laptop to a large cluster of servers.

And it's user-friendly. With minimal IT involvement, Apache Drill enables data to be queried in its native formats, including nested data, schema-less data, and dynamic data. There is no need to explicitly define and maintain schemas; Drill can automatically leverage the structure embedded in the data. This enables self-service data exploration. Analysts can work with live data as it arrives, with no need to prepare a schema and massage the data into a query-ready form. They can also change data sources on the fly without waiting for DBA services to structure the newly requested data.
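To make the idea of leveraging "the structure embedded in the data" concrete, here is a minimal Python sketch of schema discovery at read time. It is not Drill's implementation or API. The sample records and the helper names (flatten, discover_schema, select) are hypothetical and exist only for illustration. The sketch reads self-describing JSON records, works out which fields exist and what types they hold, and then pulls out a nested field without any schema having been declared up front.

```python
import json

# Hypothetical sample data: two self-describing JSON records whose shapes differ.
RAW = """
{"id": 1, "user": {"name": "ana", "tags": ["a", "b"]}}
{"id": 2, "user": {"name": "bo"}, "amount": 9.5}
"""


def flatten(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            yield from flatten(child, f"{prefix}[]")
    else:
        yield prefix, value


def discover_schema(records):
    """Map each dotted path to the set of Python type names observed there."""
    schema = {}
    for record in records:
        for path, leaf in flatten(record):
            schema.setdefault(path, set()).add(type(leaf).__name__)
    return schema


def select(records, path):
    """Return the value at a dotted path for each record (None where absent)."""
    results = []
    for record in records:
        node = record
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        results.append(node)
    return results


# Parse one JSON record per non-empty line; no schema is declared anywhere.
records = [json.loads(line) for line in RAW.splitlines() if line.strip()]
schema = discover_schema(records)
names = select(records, "user.name")
amounts = select(records, "amount")
```

The second record adds an amount field that the first lacks, and the sketch absorbs it without any schema change. That is the property the paragraph above attributes to querying data in its native form.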

Analysts can also leverage their existing SQL skills and BI tools to directly query self-describing data and process complex data types. This closes the gap that had existed between solutions built for efficient use of Big Data, such as Hadoop-based systems, and the need for standard SQL compatibility to access structured databases.

While we may quietly pride ourselves on the glorious Bigness of our Big Data, we all know that the data in itself is of little value. It's the knowledge that can be gained from it that is priceless. Apache Drill is an essential tool for knocking down the wall that has kept businesses from fully harnessing the power of Big Data.

Wait, what about...?
You may be wondering why such a fuss is being made in business and technology circles right now about Apache Drill. After all, there are dozens of other proprietary and open source projects providing SQL or SQL-comparable features on Hadoop.

The problem is that many of these solutions were designed with a "backwardly compatible" mindset. The intent was to take technology designed for small data and engineer it to work in a Big Data world. Useful tools were developed, but it's now time to develop solutions that are designed specifically to support the new ways that we use data.

While Apache Drill was initially inspired by Google's Dremel project, it is now a vehicle that can be used to bring forward-looking technologies to Big Data. Apache Drill is the ideal interactive SQL engine for Hadoop, which continues to gain popularity rapidly, because it fully supports the flexibility and agility of Hadoop (and HBase). Apache Drill is the only SQL engine for Hadoop that doesn't require schemas to be created and maintained or data to be transformed before it can be queried.

Validated and Approved
The open source community has greatly refined the original features of Google's Dremel, adding enhanced capabilities that include an extensible architecture, overall agility, support for full SQL, optional schema handling, and the ability to handle nested data (such as JSON, Protobuf, and Parquet).

The Apache Software Foundation announced in December 2014 that it had promoted Drill to a top-level project, where it joins other illustrious projects such as Apache Hadoop and httpd (the world's most popular Web server).

Drill's promotion to a top-level project demonstrates that Drill has a strong community of users and developers. Users can be confident that the project has proven itself and has a viable roadmap for its development. The community will continue to advance Apache Drill's key technologies and performance.

It's time to stop looking to the past for answers and begin driving the future. If you're ready to test-drive Drill, you can do so using the MapR Sandbox for Hadoop, which runs on PC, Mac and Linux platforms. MapR Technologies is the provider of the top-ranked distribution for Apache Hadoop.

You can also view a tutorial on analyzing real-world data using Drill.

More Stories By Nitin Bandugula

As a Sr. Product Marketing Manager at MapR, Nitin brings his engineering, business, and management skills together to market technology products. At MapR, Nitin focuses on SQL, batch and in-memory frameworks, and streaming technologies on Hadoop. Prior to MapR, Nitin worked for enterprise companies and startups in various roles including Engineering, Product Management, and Management Consulting. Nitin holds a Master's degree in Computer Science from the Illinois Institute of Technology and an MBA from the Johnson School at Cornell University.
