|By Elad Israeli||
|October 7, 2010 10:25 AM EDT||
In recent times, one of the most popular subjects related to the field of Business Intelligence (BI) has been In-memory BI technology. The subject gained popularity largely due to the success of QlikTech, provider of the in-memory-based QlikView BI product. Following QlikTech’s lead, many other BI vendors have jumped on the in-memory “hype wagon,” including the software giant, Microsoft, which has been aggressively marketing PowerPivot, their own in-memory database engine.
The increasing hype surrounding in-memory BI has caused BI consultants, analysts and even vendors to spew out endless articles, blog posts and white papers on the subject, many of which have also gone the extra mile to describe in-memory technology as the future of business intelligence, the death blow to the data warehouse and the swan song of OLAP technology. I find one of these in my inbox every couple of weeks.
Just so it is clear - the concept of in-memory business intelligence is not new. It has been around for many years. The only reason it became widely known recently is because it wasn’t feasible before 64-bit computing became commonly available. Before 64-bit processors, the maximum amount of RAM a computer could utilize was barely 4GB, which is hardly enough to accommodate even the simplest of multi-user BI solutions. Only when 64-bit systems became cheap enough did it became possible to consider in-memory technology as a practical option for BI.
The success of QlikTech and the relentless activities of Microsoft’s marketing machine have managed to confuse many in terms of what role in-memory technology plays in BI implementations. And that is why many of the articles out there, which are written by marketers or market analysts who are not proficient in the internal workings of database technology (and assume their readers aren’t either), are usually filled with inaccuracies and, in many cases, pure nonsense.
The purpose of this article is to put both in-memory and disk-based BI technologies in perspective, explain the differences between them and finally lay out, in simple terms, why disk-based BI technology isn’t on its way to extinction. Rather, disk-based BI technology is evolving into something that will significantly limit the use of in-memory technology in typical BI implementations.
But before we get to that, for the sake of those who are not very familiar with in-memory BI technology, here’s a brief introduction to the topic.
Disk and RAM
Generally speaking, your computer has two types of data storage mechanisms – disk (often called a hard disk) and RAM (random access memory). The important differences between them (for this discussion) are outlined in the following table:
Most modern computers have 15-100 times more available disk storage than they do RAM. My laptop, for example, has 8GB of RAM and 300GB of available disk space. However, reading data from disk is much slower than reading the same data from RAM. This is one of the reasons why 1GB of RAM costs approximately 320 times that of 1GB of disk space.
Another important distinction is what happens to the data when the computer is powered down: data stored on disk is unaffected (which is why your saved documents are still there the next time you turn on your computer), but data residing in RAM is instantly lost. So, while you don’t have to re-create your disk-stored Microsoft Word documents after a reboot, you do have to re-load the operating system, re-launch the word processor and reload your document. This is because applications and their internal data are partly, if not entirely, stored in RAM while they are running.
Disk-based Databases and In-memory Databases
Now that we have a general idea of what the basic differences between disk and RAM are, what are the differences between disk-based and in-memory databases? Well, all data is always kept on hard disks (so that they are saved even when the power goes down). When we talk about whether a database is disk-based or in-memory, we are talking about where the data resides while it is actively being queried by an application: with disk-based databases, the data is queried while stored on disk and with in-memory databases, the data being queried is first loaded into RAM.
Disk-based databases are engineered to efficiently query data residing on the hard drive. At a very basic level, these databases assume that the entire data cannot fit inside the relatively small amount of RAM available and therefore must have very efficient disk reads in order for queries to be returned within a reasonable time frame. The engineers of such databases have the benefit of unlimited storage, but must face the challenges of relying on relatively slow disk operations.
On the other hand, in-memory databases work under the opposite assumption that the data can, in fact, fit entirely inside the RAM. The engineers of in-memory databases benefit from utilizing the fastest storage system a computer has (RAM), but have much less of it at their disposal.
That is the fundamental trade-off in disk-based and in-memory technologies: faster reads and limited amounts of data versus slower reads and practically unlimited amounts of data. These are two critical considerations for business intelligence applications, as it is important both to have fast query response times and to have access to as much data as possible.
The Data Challenge
A business intelligence solution (almost) always has a single data store at its center. This data store is usually called a database, data warehouse, data mart or OLAP cube. This is where the data that can be queried by the BI application is stored.
The challenges in creating this data store using traditional disk-based technologies is what gave in-memory technology its 15 minutes (ok, maybe 30 minutes) of fame. Having the entire data model stored inside RAM allowed bypassing some of the challenges encountered by their disk-based counterparts, namely the issue of query response times or ‘slow queries.’
When saying ‘traditional disk-based’ technologies, we typically mean relational database management systems (RDBMS) such as SQL Server, Oracle, MySQL and many others. It’s true that having a BI solution perform well using these types of databases as their backbone is far more challenging than simply shoving the entire data model into RAM, where performance gains would be immediate due to the fact RAM is so much faster than disk.
It’s commonly thought that relational databases are too slow for BI queries over data in (or close to) its raw form due to the fact they are disk-based. The truth is, however, that it’s because of how they use the disk and how often they use it.
Relational databases were designed with transactional processing in mind. But having a database be able to support high-performance insertions and updates of transactions (i.e., rows in a table) as well as properly accommodating the types of queries typically executed in BI solutions (e.g., aggregating, grouping, joining) is impossible. These are two mutually-exclusive engineering goals, that is to say they require completely different architectures at the very core. You simply can’t use the same approach to ideally achieve both.
In addition, the standard query language used to extract transactions from relational databases (SQL) is syntactically designed for the efficient fetching of rows, while rare are the cases in BI where you would need to scan or retrieve an entire row of data. It is nearly impossible to formulate an efficient BI query using SQL syntax.
So while relational databases are great as the backbone of operational applications such as CRM, ERP or Web sites, where transactions are frequently and simultaneously inserted, they are a poor choice for supporting analytic applications which usually involve simultaneous retrieval of partial rows along with heavy calculations.
In-memory databases approach the querying problem by loading the entire dataset into RAM. In so doing, they remove the need to access the disk to run queries, thus gaining an immediate and substantial performance advantage (simply because scanning data in RAM is orders of magnitude faster than reading it from disk). Some of these databases introduce additional optimizations which further improve performance. Most of them also employ compression techniques to represent even more data in the same amount of RAM.
Regardless of what fancy footwork is used with an in-memory database, storing the entire dataset in RAM has a serious implication: the amount of data you can query with in-memory technology is limited by the amount of free RAM available, and there will always be much less available RAM than available disk space.
The bottom line is that this limited memory space means that the quality and effectiveness of your BI application will be hindered: the more historical data to which you have access and/or the more fields you can query, the better analysis, insight and, well, intelligence you can get.
You could add more and more RAM, but then the hardware you require becomes exponentially more expensive. The fact that 64-bit computers are cheap and can theoretically support unlimited amounts of RAM does not mean they actually do in practice. A standard desktop-class (read: cheap) computer with standard hardware physically supports up to 12GB of RAM today. If you need more, you can move on to a different class of computer which costs about twice as much and will allow you up to 64GB. Beyond 64GB, you can no longer use what is categorized as a personal computer but will require a full-blown server which brings you into very expensive computing territory.
It is also important to understand that the amount of RAM you need is not only affected by the amount of data you have, but also by the number of people simultaneously querying it. Having 5-10 people using the same in-memory BI application could easily double the amount of RAM required for intermediate calculations that need to be performed to generate the query results. A key success factor in most BI solutions is having a large number of users, so you need to tread carefully when considering in-memory technology for real-world BI. Otherwise, your hardware costs may spiral beyond what you are willing or able to spend (today, or in the future as your needs increase).
There are other implications to having your data model stored in memory, such as having to re-load it from disk to RAM every time the computer reboots and not being able to use the computer for anything other than the particular data model you’re using because its RAM is all used up.
A Note about QlikView and PowerPivot In-memory Technologies
QlikTech is the most active in-memory BI player out there so their QlikView in-memory technology is worth addressing in its own right. It has been repeatedly described as “unique, patented associative technology” but, in fact, there is nothing “associative” about QlikView’s in-memory technology. QlikView uses a simple tabular data model, stored entirely in-memory, with basic token-based compression applied to it. In QlikView’s case, the word associative relates to the functionality of its user interface, not how the data model is physically stored. Associative databases are a completely different beast and have nothing in common with QlikView’s technology.
PowerPivot uses a similar concept, but is engineered somewhat differently due to the fact it’s meant to be used largely within Excel. In this respect, PowerPivot relies on a columnar approach to storage that is better suited for the types of calculations conducted in Excel 2010, as well as for compression. Quality of compression is a significant differentiator between in-memory technologies as better compression means that you can store more data in the same amount RAM (i.e., more data is available for users to query). In its current version, however, PowerPivot is still very limited in the amounts of data it supports and requires a ridiculous amount of RAM.
The Present and Future Technologies
The destiny of BI lies in technologies that leverage the respective benefits of both disk-based and in-memory technologies to deliver fast query responses and extensive multi-user access without monstrous hardware requirements. Obviously, these technologies cannot be based on relational databases, but they must also not be designed to assume a massive amount of RAM, which is a very scarce resource.
These types of technologies are not theoretical anymore and are already utilized by businesses worldwide. Some are designed to distribute different portions of complex queries across multiple cheaper computers (this is a good option for cloud-based BI systems) and some are designed to take advantage of 21st-century hardware (multi-core architectures, upgraded CPU cache sizes, etc.) to extract more juice from off-the-shelf computers.
A Final Note: ElastiCube Technology
The technology developed by the company I co-founded, SiSense, belongs to the latter category. That is, SiSense utilizes technology which combines the best of disk-based and in-memory solutions, essentially eliminating the downsides of each. SiSense’s BI product, Prism, enables a standard PC to deliver a much wider variety of BI solutions, even when very large amounts of data, large numbers of users and/or large numbers of data sources are involved, as is the case in typical BI projects.
When we began our research at SiSense, our technological assumption was that it is possible to achieve in-memory-class query response times, even for hundreds of users simultaneously accessing massive data sets, while keeping the data (mostly) stored on disk. The result of our hybrid disk-based/in-memory technology is a BI solution based on what we now call ElastiCube, after which this blog is named. You can read more about this technological approach, which we call Just-in-Time In-memory Processing, at our BI Software Evolved technology page.
WebRTC is about the data channel as much as about video and audio conferencing. However, basically all commercial WebRTC applications have been built with a focus on audio and video. The handling of “data” has been limited to text chat and file download – all other data sharing seems to end with screensharing. What is holding back a more intensive use of peer-to-peer data? In her session at @ThingsExpo, Dr Silvia Pfeiffer, WebRTC Applications Team Lead at National ICT Australia, will look at different existing uses of peer-to-peer data sharing and how it can become useful in a live session to...
Oct. 7, 2015 05:00 PM EDT Reads: 552
NHK, Japan Broadcasting, will feature the upcoming @ThingsExpo Silicon Valley in a special 'Internet of Things' and smart technology documentary that will be filmed on the expo floor between November 3 to 5, 2015, in Santa Clara. NHK is the sole public TV network in Japan equivalent to the BBC in the UK and the largest in Asia with many award-winning science and technology programs. Japanese TV is producing a documentary about IoT and Smart technology and will be covering @ThingsExpo Silicon Valley. The program, to be aired during the peak viewership season of the year, will have a major impac...
Oct. 7, 2015 04:45 PM EDT Reads: 185
The buzz continues for cloud, data analytics and the Internet of Things (IoT) and their collective impact across all industries. But a new conversation is emerging - how do companies use industry disruption and technology enablers to lead in markets undergoing change, uncertainty and ambiguity? Organizations of all sizes need to evolve and transform, often under massive pressure, as industry lines blur and merge and traditional business models are assaulted and turned upside down. In this new data-driven world, marketplaces reign supreme while interoperability, APIs and applications deliver un...
Oct. 7, 2015 04:13 PM EDT
Through WebRTC, audio and video communications are being embedded more easily than ever into applications, helping carriers, enterprises and independent software vendors deliver greater functionality to their end users. With today’s business world increasingly focused on outcomes, users’ growing calls for ease of use, and businesses craving smarter, tighter integration, what’s the next step in delivering a richer, more immersive experience? That richer, more fully integrated experience comes about through a Communications Platform as a Service which allows for messaging, screen sharing, video...
Oct. 7, 2015 04:00 PM EDT Reads: 1,063
Developing software for the Internet of Things (IoT) comes with its own set of challenges. Security, privacy, and unified standards are a few key issues. In addition, each IoT product is comprised of at least three separate application components: the software embedded in the device, the backend big-data service, and the mobile application for the end user's controls. Each component is developed by a different team, using different technologies and practices, and deployed to a different stack/target - this makes the integration of these separate pipelines and the coordination of software upd...
Oct. 7, 2015 04:00 PM EDT Reads: 201
Internet of Things (IoT) will be a hybrid ecosystem of diverse devices and sensors collaborating with operational and enterprise systems to create the next big application. In their session at @ThingsExpo, Bramh Gupta, founder and CEO of robomq.io, and Fred Yatzeck, principal architect leading product development at robomq.io, discussed how choosing the right middleware and integration strategy from the get-go will enable IoT solution developers to adapt and grow with the industry, while at the same time reduce Time to Market (TTM) by using plug and play capabilities offered by a robust IoT ...
Oct. 7, 2015 04:00 PM EDT Reads: 2,131
Can call centers hang up the phones for good? Intuitive Solutions did. WebRTC enabled this contact center provider to eliminate antiquated telephony and desktop phone infrastructure with a pure web-based solution, allowing them to expand beyond brick-and-mortar confines to a home-based agent model. It also ensured scalability and better service for customers, including MUY! Companies, one of the country's largest franchise restaurant companies with 232 Pizza Hut locations. This is one example of WebRTC adoption today, but the potential is limitless when powered by IoT.
Oct. 7, 2015 03:30 PM EDT Reads: 7,417
SYS-CON Events announced today that Luxoft Holding, Inc., a leading provider of software development services and innovative IT solutions, has been named “Bronze Sponsor” of SYS-CON's @ThingsExpo, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Luxoft’s software development services consist of core and mission-critical custom software development and support, product engineering and testing, and technology consulting.
Oct. 7, 2015 03:15 PM EDT Reads: 607
“In the past year we've seen a lot of stabilization of WebRTC. You can now use it in production with a far greater degree of certainty. A lot of the real developments in the past year have been in things like the data channel, which will enable a whole new type of application," explained Peter Dunkley, Technical Director at Acision, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Oct. 7, 2015 03:00 PM EDT Reads: 6,902
"Matrix is an ambitious open standard and implementation that's set up to break down the fragmentation problems that exist in IP messaging and VoIP communication," explained John Woolf, Technical Evangelist at Matrix, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Oct. 7, 2015 02:00 PM EDT Reads: 5,848
You have your devices and your data, but what about the rest of your Internet of Things story? Two popular classes of technologies that nicely handle the Big Data analytics for Internet of Things are Apache Hadoop and NoSQL. Hadoop is designed for parallelizing analytical work across many servers and is ideal for the massive data volumes you create with IoT devices. NoSQL databases such as Apache HBase are ideal for storing and retrieving IoT data as “time series data.”
Oct. 7, 2015 01:45 PM EDT Reads: 479
There are so many tools and techniques for data analytics that even for a data scientist the choices, possible systems, and even the types of data can be daunting. In his session at @ThingsExpo, Chris Harrold, Global CTO for Big Data Solutions for EMC Corporation, will show how to perform a simple, but meaningful analysis of social sentiment data using freely available tools that take only minutes to download and install. Participants will get the download information, scripts, and complete end-to-end walkthrough of the analysis from start to finish. Participants will also be given the pract...
Oct. 7, 2015 01:45 PM EDT Reads: 108
Clearly the way forward is to move to cloud be it bare metal, VMs or containers. One aspect of the current public clouds that is slowing this cloud migration is cloud lock-in. Every cloud vendor is trying to make it very difficult to move out once a customer has chosen their cloud. In his session at 17th Cloud Expo, Naveen Nimmu, CEO of Clouber, Inc., will advocate that making the inter-cloud migration as simple as changing airlines would help the entire industry to quickly adopt the cloud without worrying about any lock-in fears. In fact by having standard APIs for IaaS would help PaaS expl...
Oct. 7, 2015 01:30 PM EDT Reads: 608
SYS-CON Events announced today that ProfitBricks, the provider of painless cloud infrastructure, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. ProfitBricks is the IaaS provider that offers a painless cloud experience for all IT users, with no learning curve. ProfitBricks boasts flexible cloud servers and networking, an integrated Data Center Designer tool for visual control over the cloud and the best price/performance value available. ProfitBricks was named one of the coolest Clo...
Oct. 7, 2015 01:00 PM EDT Reads: 748
Organizations already struggle with the simple collection of data resulting from the proliferation of IoT, lacking the right infrastructure to manage it. They can't only rely on the cloud to collect and utilize this data because many applications still require dedicated infrastructure for security, redundancy, performance, etc. In his session at 17th Cloud Expo, Emil Sayegh, CEO of Codero Hosting, will discuss how in order to resolve the inherent issues, companies need to combine dedicated and cloud solutions through hybrid hosting – a sustainable solution for the data required to manage I...
Oct. 7, 2015 01:00 PM EDT Reads: 454
Mobile messaging has been a popular communication channel for more than 20 years. Finnish engineer Matti Makkonen invented the idea for SMS (Short Message Service) in 1984, making his vision a reality on December 3, 1992 by sending the first message ("Happy Christmas") from a PC to a cell phone. Since then, the technology has evolved immensely, from both a technology standpoint, and in our everyday uses for it. Originally used for person-to-person (P2P) communication, i.e., Sally sends a text message to Betty – mobile messaging now offers tremendous value to businesses for customer and empl...
Oct. 7, 2015 12:15 PM EDT Reads: 193
SYS-CON Events announced today that IBM Cloud Data Services has been named “Bronze Sponsor” of SYS-CON's 17th Cloud Expo, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. IBM Cloud Data Services offers a portfolio of integrated, best-of-breed cloud data services for developers focused on mobile computing and analytics use cases.
Oct. 7, 2015 12:00 PM EDT Reads: 673
Scott Guthrie's keynote presentation "Journey to the intelligent cloud" is a must view video. This is from AzureCon 2015, September 29, 2015 I have reproduced some screen shots in case you are unable to view this long video for one reason or another. One of the highlights is 3 datacenters coming on line in India.
Oct. 7, 2015 12:00 PM EDT Reads: 256
Nowadays, a large number of sensors and devices are connected to the network. Leading-edge IoT technologies integrate various types of sensor data to create a new value for several business decision scenarios. The transparent cloud is a model of a new IoT emergence service platform. Many service providers store and access various types of sensor data in order to create and find out new business values by integrating such data.
Oct. 7, 2015 12:00 PM EDT Reads: 468
Apps and devices shouldn't stop working when there's limited or no network connectivity. Learn how to bring data stored in a cloud database to the edge of the network (and back again) whenever an Internet connection is available. In his session at 17th Cloud Expo, Bradley Holt, Developer Advocate at IBM Cloud Data Services, will demonstrate techniques for replicating cloud databases with devices in order to build offline-first mobile or Internet of Things (IoT) apps that can provide a better, faster user experience, both offline and online. The focus of this talk will be on IBM Cloudant, Apa...
Oct. 7, 2015 11:45 AM EDT Reads: 491