Accelerating Your Journey to AIOps for IBM Z
by Per Kroll, Director IBM Z AIOps

The Challenges of the Digitized Enterprise

A digital transformation of your enterprise enables you to engage your customers in a more personalized, digital experience. Remember the days when you had to drive to a bank to get your account book updated, take out cash, or make payments? Now these activities are done through your mobile device, radically growing the number of updates done to your account, e.g. each time you buy a cup of coffee. You expect sub-second response time 24 by 7. Based on your interactions with a variety of bank services, you also expect offers of new services to be personalized based on your unique situation.

Let’s look at three trends related to digital transformations that put strains on your IT organization.

  • Hybrid applications are increasingly commonplace, and with 67 of the Fortune 100 companies running on IBM Z, many of these hybrid applications use IBM Z as a backend. Hybrid applications are often more complex than single-platform applications, as they involve more moving parts. These parts are managed by more teams, often siloed by platform, who need to be engaged to identify and address any issues.
  • Workloads for most clients are growing fast, and the workloads are getting spikier, creating huge challenges. We have experienced that more than ever in the current pandemic, with government agencies seeing spikes as demand for their services increases, and financial services companies seeing spikes in their workloads as financial markets go awry. Those are just a couple of examples.
  • Managing the above complexity comes at a time when IT budgets are especially tight as we face economic downturns, and when a skilled generation of operators and system programmers is reaching retirement age.

In a recent survey by Digital Enterprise Journal, 68% of surveyed IT organizations reported increased customer expectations of engagement and experience. Considering that most enterprise applications rely on IBM Z for their backend processing, these increased expectations raise the importance of effectively addressing the above challenges. To do so, organizations must adopt intelligent technologies to deliver quality service and stay ahead of their competition.

The Opportunity of AIOps

IBM Z is a hardware platform optimized for resiliency. One contributing reason for that is the wealth of operational data available to help you run your systems without a glitch. With an ever-shrinking staff, how do you effectively sift through terabytes of data in real time to identify an issue before it becomes an outage? AI is well equipped to help you identify a potential issue, isolate the problem, do root cause analysis, and make quick and accurate decisions to resolve the issue.

Gartner defines AIOps as follows: “AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.” AI constitutes the brain. When it is combined with the right practices for collecting information, collaborating, driving process improvement, evaluating information, making decisions, and automating actions, customers experience substantial and tangible results. For example, companies deploying AIOps report a 63% reduction in time to isolate a performance issue. Let’s have a look at how you can accelerate your journey to AIOps, with the right combination of best practices, tools and skills.

AIOps Capability Areas

For AIOps to be effective, you need to integrate a broad set of practices and capabilities. We have divided these practices and capabilities into three Capability Areas.

  • Inspect. The goal of the first Capability Area, Inspect, is to identify potential issues as soon as possible, ideally before they disrupt your business. To accomplish this, we need to focus on three areas: monitoring our complete infrastructure and end-to-end application performance, generating alerts for incidents, and applying analytics for early detection of anomalies.
  • Evaluate. The goal of the second Capability Area, Evaluate, is to rapidly isolate the problem, do root cause analysis, and determine the right actions to take. Numerous practices and technologies are used to reach this goal, including artificial intelligence to aid in the analysis and the decision making, and ChatOps to collaborate across frequently siloed teams or team members.
  • Act. The goal of the third Capability Area, Act, is to apply Intelligent Automation to enable us to respond rapidly to preempt disruptions. This includes automating runbooks, increasing the level of automation so our systems can reduce the need for manual intervention while taking self-correcting actions for more and more issues, and delivering an integrated orchestration and automation solution across our hybrid cloud infrastructure.
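
To make "apply analytics for early detection of anomalies" a little more concrete, here is a minimal sketch of one simple technique: flagging a metric sample that deviates strongly from the samples just before it (a rolling z-score). This is an illustration only, not how any IBM Z product works. The function name, the window size, the threshold, and the sample response times are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` samples before it.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one spike at index 7.
response_times_ms = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
print(find_anomalies(response_times_ms))  # prints [7]
```

In practice the flagged index would feed the alerting step of Inspect, and the window and threshold would need tuning per metric to keep noise down.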

Accelerating the Journey to AIOps

Over the years, we have worked with hundreds of organizations to help them mature how they run their data center. To make the lessons learned from these client interactions more consumable, we have produced a framework that can be used as an aid to accelerate the journey to AIOps. This is a pragmatic framework, which is intended to incite a fact-based discussion around where you are and where it makes sense to go, based on your business drivers and your pain points. Let’s have a look.

We have divided the journey into 4 stages:

  • Firefighting. Companies finding themselves in firefighting mode attack problems as they happen. Teams work really hard to fix problems, finding that they are in a constant game of whack-a-mole. Just as you get rid of one problem, the next one pops up. There just seems to be no way of getting ahead of the game! The organization relies heavily on skilled individuals. The work is ad hoc, and no investments are made in best practices and tools to improve operational performance. As a result, customers experience frequent outages and mean time between failures (MTBF) is lower than desirable, driving up the cost of downtime. IBM Z is likely also managed as a silo, with little commonality with how operations are done in other parts of the company.
  • Reactive. Organizations moving from firefighting to reactive are investing in practices, skills and tools that allow them to identify problems and do root cause analysis faster. They work in a more structured manner, invest in process improvement (for example runbooks), and are increasing the level of automation for faster response times. These process improvements require some initial investment, but rapidly pay off through increased efficiency and repeatability in how operations are run and, more importantly, in improved SLAs as a result of improved resiliency. Still, a substantial amount of time is wasted in war rooms with limited efficiency.
  • Proactive. Organizations moving from reactive to proactive are adopting practices that help them detect problems earlier and continuously improve automation to respond to new problems, before they have a negative business impact. They are also maturing their best practices to handle the complexity of hybrid applications. They realize that to optimize for hybrid applications, which is pretty much all applications nowadays, you can no longer see IBM Z as an isolated island from an operations perspective. They invest in a unified approach for how to manage their hybrid cloud infrastructure and they break down organizational barriers where possible. Incremental focus is put on site reliability engineering gains, aided by digital war rooms.
  • Intelligent. While artificial intelligence may have been present in the previous stage for specific narrow applications, we now find a more pervasive adoption of AI. We apply machine learning to identify non-trivial anomalies, find trends, forecast problems, and remediate them before they become a service disruption. We also continue to focus on continuous process improvements. By integrating practices and tool environments across Inspect, Evaluate and Act into one integrated solution, we rapidly respond to more and more issues before they impact our business. This includes applying AI to make digital war rooms smarter, reducing noise and shortening time to resolution.

Each stage is defined by a set of practices. The stages and the associated practices constitute a framework, and based on your business needs, priorities and pain points, you may choose a different order in which to adopt the practices. As an example, you may have adopted many, but not all, of the recommended practices for the Reactive stage. Based on your needs and priorities, it may make more sense to invest in some practices for the Proactive stage before adopting the remaining practices for the Reactive stage.

Putting the AIOps Framework Together

As organizations go through the 4 stages of AIOps (Firefighting, Reactive, Proactive and Intelligent), they improve in each of the three capability areas: Inspect, Evaluate, and Act. The graphic below provides a summary view of how companies evolve in each of the capability areas as they go through their journey.

[Figure: summary of how companies evolve in each capability area across the four stages of the AIOps journey]

As mentioned above, this is not a prescriptive framework. A company may be at different stages of the AIOps journey for different capability areas and may choose a different order in which to improve for each capability area.
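
One way to capture such a per-area assessment as a starting point for discussion is sketched below. It is only an illustration: the `Stage` enum, the `least_mature` helper, and the example assessment are hypothetical and not part of the framework itself. Because the framework is not prescriptive, the least mature area is a prompt for conversation, not automatically the next investment.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The four stages of the journey, ordered by maturity."""
    FIREFIGHTING = 1
    REACTIVE = 2
    PROACTIVE = 3
    INTELLIGENT = 4


def least_mature(assessment):
    """Return the capability areas currently at the lowest stage."""
    lowest = min(assessment.values())
    return [area for area, stage in assessment.items() if stage == lowest]


# Hypothetical self-assessment: a different stage for each capability area.
assessment = {
    "Inspect": Stage.PROACTIVE,
    "Evaluate": Stage.REACTIVE,
    "Act": Stage.FIREFIGHTING,
}

print(least_mature(assessment))  # prints ['Act']
```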

Conclusion and Next Steps

The journey to AIOps is incremental and each customer will take a slightly different path. Based on our work with many customers, we have captured a set of practices that can help accelerate that journey. By organizing these practices into a well-defined descriptive framework, we aid a meaningful, fact-based discussion that can help your organization assess where you are on this journey and determine a plan for where you want to go.

Check out my next blog on the Inspect capability area of this AIOps framework. Happy traveling!



