Challenges of Monitoring, Tracing and Profiling your Applications running in “The Cloud”

Cloud Computing presents unique opportunities for companies to reduce costs, outsource non-core functions and scale costs to match demand. However, the Cloud also introduces a new level of complexity that makes ensuring application performance a unique challenge, particularly given the many different usage and deployment scenarios available. Perhaps the most popular scenario at present uses the Cloud for tasks that need more computational power than is available in a local environment, e.g. running large-scale load tests or processing large amounts of input data. Another scenario that is becoming more attractive these days is to actually run applications in the Cloud.

The big question surrounding this second deployment scenario is whether to use a public or a private cloud. The use of public cloud services raises many questions:

  • Is my data really safe with the hosting service provider?
  • How reliable is that service?
  • How can I troubleshoot when something goes wrong?

Whether you deploy your application in a private or a public cloud, Cloud Computing requires a platform that can manage the dynamics of the application within this mostly virtual, opaque environment. One of the biggest challenges presented by the dynamic nature of the Cloud is troubleshooting performance issues. There are currently no good approaches to quickly identify the root cause of application performance issues in the Cloud, because existing tools and solutions are limited in the way they capture information. Solving issues in Cloud environments today involves inefficient manual effort from the most valuable resources of the application development team: the architects and engineers.

Looking at a Cloud Computing Platform

Cloud Computing Platforms, whether privately or publicly hosted, provide the ability to add resources dynamically as needed. This makes it possible, for example, to handle peak load on a hosted application so that end-user response times stay within SLAs (Service Level Agreements). Cloud Computing, however, is not only about adding more virtual servers or resources to your virtual infrastructure. Cloud Computing Platforms also offer services to the hosted applications, providing the foundation on which to build scalable applications. These services include data storage, messaging and caching, among others.

Cloud Services: Let’s take Data Storage as an example

Applications hosted in the Cloud can use Service Interfaces to access application-specific data. This data is stored “in the Cloud” and can be accessed by any component and any instance of the hosted application. The Data Services take care of concerns such as reliable access, concurrency and backup.
Instead of using an interface like JDBC, the application uses a data storage interface, such as that of an In-Memory Data Grid, to query objects from the data store and to add or manipulate data. Accessing the data via this interface enables the application to scale with the required bandwidth, the number of concurrent users or the number of concurrent HTTP requests. With increasing load on the application, the Cloud Computing Platform can deploy additional virtual machines to handle the additional transactions. The additional application instances work seamlessly against the same Data Service interfaces.
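
To make the idea concrete, here is a minimal sketch of two application instances working against one shared data service. Everything in it is an assumption made for illustration: DataGridStub, AppInstance and Order are hypothetical names, and the in-process dictionary only stands in for a real data grid, whose actual API will look different.

```python
from dataclasses import dataclass
from threading import Lock
from typing import Any, Callable, Dict, List


class DataGridStub:
    """Hypothetical stand-in for a cloud data service (not a real product API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}
        self._lock = Lock()  # a real service would manage concurrency itself

    def write(self, key: str, obj: Any) -> None:
        with self._lock:
            self._objects[key] = obj

    def query(self, predicate: Callable[[Any], bool]) -> List[Any]:
        with self._lock:
            return [o for o in self._objects.values() if predicate(o)]


@dataclass
class Order:
    order_id: str
    customer: str
    total: float


class AppInstance:
    """One virtual instance of the hosted application (hypothetical)."""

    def __init__(self, name: str, grid: DataGridStub) -> None:
        self.name = name
        self.grid = grid

    def place_order(self, order: Order) -> None:
        self.grid.write(order.order_id, order)

    def orders_for(self, customer: str) -> List[Order]:
        return self.grid.query(lambda o: o.customer == customer)


grid = DataGridStub()
instance_a = AppInstance("instance-a", grid)
instance_b = AppInstance("instance-b", grid)  # imagine this one was added under load

instance_a.place_order(Order("o-1", "alice", 25.0))
instance_b.place_order(Order("o-2", "alice", 40.0))

# Either instance sees the data written by the other one.
orders_seen_by_a = instance_a.orders_for("alice")
```

The point of the sketch is only that the instances share no state of their own: a newly deployed instance can serve requests immediately because all data lives behind the same service interface.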

Integration with services that run “outside the Cloud”

Most often, applications need access to resources other than those provided by the Cloud Services. These could be external services available on the internet, such as a payment, search or mapping service accessed via Web Service or RESTful interfaces. They could also be other applications that you run yourself, most likely on-premise, e.g. your in-house CRM. For that to work, the Cloud environment must allow outbound connections from any virtual instance.

The BIG PICTURE

The following illustration shows what an application architecture hosted in a virtual cloud environment could look like:

[Figure: Running Applications in a Cloud Environment]

On one side you have the end users who work with the application. Depending on the load and the response times, requests could be handled by one, two or many more virtual instances that host the application. The application makes use of Cloud Services like Data Storage Services to persist and share data between the virtual instances. External or in-house services might be called via remoting technologies.

The Cloud is a complex environment that can dynamically change. Each request that is executed by an end-user can take different routes through the system and can affect other parts of the overall environment.
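
A small sketch can illustrate what "different routes" means in practice. It assumes that every request is tagged with an ID and that each tier records the hop under that ID. The tier names, the instance list and the helper functions are all hypothetical, and the random choice only imitates a load balancer.

```python
import random
import uuid
from typing import Dict, List

# Hypothetical instance names; a real platform adds and removes these dynamically.
INSTANCES = ["app-1", "app-2", "app-3"]


def record_hop(traces: Dict[str, List[str]], trace_id: str, hop: str) -> None:
    """Append one hop to the ordered path recorded for this request."""
    traces.setdefault(trace_id, []).append(hop)


def handle_request(
    traces: Dict[str, List[str]],
    rng: random.Random,
    needs_external_call: bool,
    instances: List[str] = INSTANCES,
) -> str:
    """Simulate one end-user request and return its trace ID."""
    trace_id = uuid.uuid4().hex
    record_hop(traces, trace_id, "load-balancer")
    record_hop(traces, trace_id, rng.choice(instances))  # whichever instance is picked
    record_hop(traces, trace_id, "data-service")
    if needs_external_call:
        record_hop(traces, trace_id, "external-service")
    return trace_id


traces: Dict[str, List[str]] = {}
rng = random.Random(42)
first = handle_request(traces, rng, needs_external_call=False)
second = handle_request(traces, rng, needs_external_call=True)

for trace_id, path in traces.items():
    print(trace_id[:8], " -> ".join(path))
```

Two requests from the same user can end up on different instances and touch different services, which is why per-request paths say more than per-server averages.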

What is happening in my Cloud?

A pressing topic in Cloud Computing is monitoring, tracing and profiling. Meeting SLAs (Service Level Agreements) for the end user can be done rather easily: if application response times slow down, additional application instances are automatically deployed to handle the additional load and to distribute it across more instances. The Cloud Platform takes care of it.
But is that the correct approach? Adding new virtual instances to handle additional load is fine. But what if your application actually has a performance problem? Adding new virtual instances solves the problem in the short run, of course. But it is like taking more Advil for a toothache: it does not address the root cause of the problem, which might be a cavity or a broken tooth.
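
A back-of-the-envelope calculation shows why scaling out can hide a performance problem rather than fix it. The model below is deliberately crude and entirely an assumption: each instance handles one request at a time, and the load and service times are made-up numbers, not measurements.

```python
def instances_needed(load_rps: int, service_time_ms: int) -> int:
    """Instances required if each instance handles one request at a time.

    load_rps:        incoming requests per second (illustrative value)
    service_time_ms: time one request keeps an instance busy (illustrative value)
    """
    busy_ms_per_second = load_rps * service_time_ms
    # Integer ceiling of busy_ms_per_second / 1000, with at least one instance.
    return max(1, (busy_ms_per_second + 999) // 1000)


peak_load_rps = 100

# The application has a performance problem: each request takes 500 ms.
with_problem = instances_needed(peak_load_rps, service_time_ms=500)

# The same load after the root cause is fixed: each request takes 100 ms.
after_fix = instances_needed(peak_load_rps, service_time_ms=100)

print(f"instances with the problem: {with_problem}")
print(f"instances after the fix:    {after_fix}")
```

Under these assumptions the platform keeps response times in check either way, but it runs 50 instances instead of 10 to do so. The SLA is met while the cost of the underlying problem keeps growing with the load.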

Root-Cause Analysis in the Cloud

To understand why the current deployment cannot handle the current load, it is necessary to look beyond end-user response times and performance counters like CPU, memory, I/O and network utilization.
Monitoring the services running in the Cloud gives additional insight into where the time is spent, and it can also uncover problems in the application itself by identifying “improper” use of service interfaces. As in “non-cloud” applications, it is essential to be careful with the resources you have. In a traditional application you want to limit the number of roundtrips over remoting boundaries or to the database. You want to make sure that your SQL statements are well written and return only the data that you need.
The same rules apply to an application that runs in the Cloud and accesses the Cloud Service Interfaces. The challenge until now has been to monitor the activities of the application within the Cloud.
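
The roundtrip rule is easy to demonstrate with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for any remote data store. The orders table, its contents and the CountingConnection helper are assumptions made purely for illustration.

```python
import sqlite3


class CountingConnection:
    """Wraps a connection and counts each query as one roundtrip (hypothetical helper)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.roundtrips = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.roundtrips += 1
        return self.conn.execute(sql, params).fetchall()


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
raw.executemany(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    [(i, "alice" if i % 2 else "bob", 10.0 * i) for i in range(1, 11)],
)

# Chatty access: fetch the IDs first, then issue one query per order.
chatty = CountingConnection(raw)
ids = [row[0] for row in chatty.query(
    "SELECT id FROM orders WHERE customer = ? ORDER BY id", ("alice",))]
chatty_totals = [
    chatty.query("SELECT total FROM orders WHERE id = ?", (order_id,))[0][0]
    for order_id in ids
]

# Batched access: one query that returns only the column that is needed.
batched = CountingConnection(raw)
batched_totals = [row[0] for row in batched.query(
    "SELECT total FROM orders WHERE customer = ? ORDER BY id", ("alice",))]

print("chatty roundtrips: ", chatty.roundtrips)
print("batched roundtrips:", batched.roundtrips)
```

Both variants return the same data, but the chatty one needs one roundtrip per order plus one for the ID lookup. Against a local in-memory database the difference is negligible; against a service reached over the network, every extra roundtrip adds latency, which is exactly the kind of “improper” use that service-level monitoring can reveal.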

A big limitation is that it is not easy to remote-debug your code or to install a profiler on the virtual machines in order to really understand how the deployed application components communicate with other components or services.

The question that needs to be answered is:
How can we get insight into the dynamics of a deployed Cloud Application?

Instead of answering this question here, I want you to read the following blog article: Proof of Concept: dynaTrace provides Cloud Service Monitoring and Root Cause Analysis for GigaSpaces

That article explains how the questions raised in this post could be answered for an application running in a GigaSpaces Cloud Environment with the use of dynaTrace.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
