How important is AIOps to the future of IT Operations? The 2022 Gartner Market Guide for AIOps Platforms puts it succinctly: “There is no future of IT operations that does not include AIOps. This is due to the rapid growth in data volumes and pace of change (exemplified by rate of application delivery and event-driven business models) that cannot wait on humans to derive insights.”*
Gartner shares that, “Over the past 12 months, AIOps formed part of the conversation in 40% of all inquiries with Gartner clients on IT performance analysis.”*
What’s driving this growth? The report breaks it down into “three separate but ultimately related areas:
- Digital business transformation
- Transitioning from a reactive posture to a proactive approach
- The need to make digital business observable”*
At BMC, our customers are increasingly interested in how AIOps can address the growing complexity and volume of their data, which are quickly outpacing humans’ ability to manage them manually. As Gartner states, “It is simply impossible for humans to make sense of thousands of events per second being generated by their IT systems.”
If your organization is ready to move forward, or if you just want the tl;dr version, here are my top three takeaways and action items for businesses, gleaned from the report:
- Focus on tangible outcomes with quantitative proof points.
- AIOps is all about productivity, such as enhanced workflows and improved staff efficiency.
- It’s more than just monitoring. “Leverage AIOps platforms for scenarios like adaptive anomaly detection or system-centric anomaly detection.”*
If you’re just beginning your AIOps journey, read on for key insights from the report and what to look for.
(And for anyone new to the concept, check out our introduction to AIOps.)
AIOps definition and characteristics
Gartner provides a straightforward market definition: “AIOps platforms analyze telemetry and events, and identify meaningful patterns that provide insights to support proactive responses.”*
Gartner defines AIOps platforms as having five characteristics:
- “Cross-domain data ingestion and analytics
- Topology assembly from implicit and explicit sources of asset relationship and dependency
- Correlation between related and redundant events associated with an incident
- Pattern recognition to detect incidents, their leading indicators or probable root cause
- Association of probable remediation”*
It’s our belief that cross-domain data ingestion underlines the importance of being able to consume large volumes of diverse data sets and apply ML and analytics. Today, the data from many point tools is siloed and not available to complementary tools to help solve problems.
Based on the characteristics, another key requirement is the ability to engage other IT disciplines and act on the rich, impactful insights that deliver value to the business. While monitoring and observability are the essential foundation of a successful AIOps strategy, the true game-changing value comes from engaging and acting on the data. Finding a problem is great; fixing it is the endgame.
AIOps and IT Service Management
Gartner continues its guidance that IT Service Management (ITSM) integration is an important part of an AIOps strategy and is one of three key tenets: Observe (Monitor), Engage (ITSM), and Act (Automation). The latest report observes that “AIOps platforms enhance a broad range of IT practices, including I&O, DevOps, SRE, security and service management.”* The application of AI to service management is known as AISM and, unlike traditional ITSM, opens the door to proactive prevention, a shorter mean time to repair (MTTR), rapid innovation, and a greatly improved employee and customer experience.
As the IT disciplines of ITSM and IT operations management (ITOM) overlap more and more (we refer to this as ServiceOps), ML and analytics can be an essential enabler of that convergence. With a holistic AIOps strategy that observes, engages, and acts, effective integrated use cases can be implemented across ITOM and ITSM for automated event remediation, incident and change management, and intelligent ticketing and routing.
We believe ServiceOps is critical to true proactive service resolution: the ability to discover, monitor, service, and remediate events as they occur. Proactive service resolution is one of a number of capabilities enabled by the BMC Helix Platform, which is a unified, open platform that connects service and operations teams and provides visibility across BMC Helix and third-party solutions.
Deriving actionable insights from ML and data analytics that support intelligent automation will deliver real value to ITOps teams. Successful execution will require robust integrations to orchestration tools as well as the CMDB for service impact mapping. The visibility, intelligence, speed, and insights that AIOps brings can revolutionize these latter stages of monitoring and drive significant benefits.
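To make the idea of service impact mapping concrete, here is a minimal Python sketch. It assumes a hypothetical dependency map of the kind a CMDB or discovery tool might supply; the component names and the impacted_by function are illustrative only and are not part of any BMC or Gartner API. Given a failed component, it walks the dependency graph to list everything that relies on it, directly or indirectly.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each key depends on the components listed in its value.
DEPENDS_ON = {
    "checkout-service": ["payments-api", "web-frontend"],
    "payments-api": ["db-cluster-1"],
    "web-frontend": ["load-balancer-1"],
    "reporting-service": ["db-cluster-1"],
}


def impacted_by(failed_component, depends_on):
    """Return everything that directly or transitively depends on failed_component."""
    # Invert the map: component -> items that depend on it.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(item)

    # Breadth-first walk upward from the failed component.
    impacted = set()
    queue = deque([failed_component])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return sorted(impacted)


print(impacted_by("db-cluster-1", DEPENDS_ON))
```

In this sketch, a failure of the hypothetical db-cluster-1 surfaces the two services that use it plus the checkout service that sits on top of one of them, which is the kind of context an operator or an automated workflow needs before acting.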
Requirements of AIOps software
Based on the five characteristics defined in the report and outlined above, an AIOps platform should be able to support the full spectrum of Observe, Engage, and Act through:
- Cross-domain discovery, data ingestion, and analytics. All organizations are operating in highly complex environments and need to be able to discover and assemble a unified topology and ingest data, events, and metrics from many different sources.
- Machine learning and analytics. These approaches are used for event correlation, pattern discovery and prediction, anomaly detection, and root cause isolation. The most important aspect of this is how ML and analytics support required use cases.
- Remediation. Leverage prescriptive advice to take automated action to resolve events and bridge ITOM and ITSM processes.
The way forward for ITOps teams is to leverage a single AIOps solution that:
- Unifies data from many sources into a single view
- Identifies hardware, software, and service dependencies across multi-cloud, hybrid, and on-premises environments with dynamic service modeling
- Automates event correlation to intelligently reduce event noise
- Enables pattern identification, analysis, and contextualization
- Delivers actionable ways of applying ML and analytics to event management challenges
In this age of private and public cloud and hybrid infrastructures, digital initiatives, and rapidly changing technology landscapes, the ITOps function is integral to IT’s ability to support the business. The most successful ITOps teams will be able to leverage new AI capabilities in a strategic and tactical way to drive efficiencies, cost reduction, and speed throughout IT processes.
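As a rough illustration of how automated event correlation can reduce event noise, the Python sketch below groups repeated events from the same source into a single group when they arrive close together in time. The Event type, the correlate function, and the 60-second window are assumptions made for this example; real AIOps platforms use richer signals such as topology and ML-derived patterns.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds, relative to any fixed starting point
    source: str       # hypothetical host or service name
    message: str


def correlate(events, window_seconds=60.0):
    """Group events with the same source and message when each one
    arrives within window_seconds of the previous event in its group."""
    groups = []
    latest_group = {}  # (source, message) -> index of its most recent group
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.source, event.message)
        idx = latest_group.get(key)
        if idx is not None and event.timestamp - groups[idx][-1].timestamp <= window_seconds:
            groups[idx].append(event)
        else:
            groups.append([event])
            latest_group[key] = len(groups) - 1
    return groups


events = [
    Event(0, "db1", "disk full"),
    Event(10, "db1", "disk full"),
    Event(20, "web1", "high latency"),
    Event(30, "db1", "disk full"),
    Event(500, "db1", "disk full"),
]
for group in correlate(events):
    print(group[0].source, group[0].message, "x", len(group))
```

Here five raw events collapse into three groups, so an operator would see one "disk full" item for the initial burst rather than three separate alerts.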
AIOps use cases
Based on shifting priorities revealed during the pandemic, Gartner now advises that organizations should “focus on tangible and incremental business outcomes with quantitative value-based proof points, leverage AIOps platforms for scenarios like adaptive anomaly detection or system-centric anomaly detection,” and “create an operations model to provide metadata and insights as a service to different departments such as finance, sales and marketing.”*
Using ML and analytics to identify patterns can help predict events and automate event resolution. Once these essential patterns are identified, AIOps use cases can be prioritized based on business needs, including:
- Dynamic (instead of static) thresholds to cut down on event noise and surface the most important events
- Anomaly detection to predictively alert on potential events and triage them before they impact the business
- Event correlation and log analytics to quickly perform root cause analysis and reduce mean time to repair (MTTR)
- Orchestrated workflows for automated event remediation of commonly recurring events, linked to ITSM for incident and change management
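The difference between static and dynamic thresholds in the first item above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular AIOps product works: the dynamic threshold is taken to be the mean plus k standard deviations of the preceding window of samples, and the function names, window size, k value, and sample data are all made up for the example.

```python
from statistics import mean, stdev


def static_threshold_alerts(values, limit):
    """Return indices of samples above a fixed limit."""
    return [i for i, v in enumerate(values) if v > limit]


def dynamic_threshold_alerts(values, window=5, k=3.0):
    """Return indices of samples above mean + k * stdev of the
    preceding `window` samples (a simple rolling baseline)."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        threshold = mean(history) + k * stdev(history)
        if values[i] > threshold:
            alerts.append(i)
    return alerts


# Hypothetical CPU utilization samples (percent) with one real spike.
cpu = [40, 42, 41, 43, 42, 41, 85, 42, 43, 41]

print("static :", static_threshold_alerts(cpu, limit=41))
print("dynamic:", dynamic_threshold_alerts(cpu))
```

With a badly chosen static limit, normal fluctuation produces several alerts, while the rolling baseline flags only the spike. A production approach would also need to handle seasonality and flat histories, where the standard deviation is zero.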
A top-down AIOps framework
According to Gartner, “AIOps lends itself to use cases spanning the hierarchy from the IT operator up to a line of business (LOB) owner or even a CIO.”* “In practice, these platforms only provide event correlation capabilities as an out-of-the-box use case, making them initially relevant for IT operators. Platform users are tasked with creating outcomes relevant to other roles, such as I&O leaders, system administrators, architects and LOB owners.”*
“Gartner recommends starting by creating a roadmap with an end-goal objective to be achieved through the use of AIOps platforms. For example, within a monitoring strategy, determine how AIOps can transform data for relevance to the target persona and how it helps address the purpose for the respective persona. Follow this by mapping out the steps leading up to the objective, starting with the current state of visibility within IT operations (e.g., noisy events, static-threshold-based alerts or leveraging dynamic thresholds).”*
“Select the AIOps platform best-suited to deliver out-of-the-box capabilities for the first step on the roadmap. The selected vendor should have capabilities or a roadmap aligned to the organization’s roadmap (for example, helping the organization to advance from event correlation to dynamic thresholds to behavior analysis with minimal effort). Watch for portability challenges in these platforms as use cases mature.”*
Insights over actions
Gartner also highlights the importance of using automation to yield valuable insights versus simply automating actions, saying, “IT organizations with a high level of maturity prefer automated insights over automated actions as a tangible goal.”* I&O leaders should prioritize tools that “reduce the visual overload for IT operators by identifying interesting data instead of treating the display screen as a dumping ground.”* For example, instead of visually analyzing multiple graphs, the AIOps platform should highlight areas that require human intervention.
Relevance for diverse personas
AIOps platforms are ideal for helping cross-functional digital business teams or “fusion teams” innovate and implement across the business. Gartner recommends the following use cases for each persona:
- DevOps: “As the DevOps practice matures, AIOps use cases broaden from a focus on preproduction to include production metrics like user engagement, quality and business relevance. This creates a need for new KPIs, comparison across multiple versions, and a product and platform focus. Considering this scenario, select platforms that can ingest instrumented data (traces, metrics and logs) and ease the effort to provide platform and product views for DevOps.”
- IT Operations: “Metric and log ingestion, followed by analytics are the primary requirements for I&O teams. The journey starts with event correlation and, as the team matures, broadens to analysis of metrics and logs followed by behavior analytics of systems and users. The primary goal here is anomaly detection, diagnostic information and root cause analysis.”
- Business: “User engagement, efficiency, productivity, and behavior analysis to help drive better decisions are the key requirements for business leaders. AIOps insights are progressively expanded, starting with correlation of user impact based on IT and broadened to include qualitative KPIs like the efficiency and productivity of technology, people and existing processes.”
- SRE: “For SRE use cases, select platforms that provide real-time topological and dependency insights for the IT architecture as one of the primary use cases and offer ease of comparative temporal and spatial analysis for multiple scenarios.”*
BMC for AIOps success
BMC’s AIOps solutions span cloud to mainframe and can help your organization proactively prevent issues before they impact service and quickly fix the problems that do occur.
BMC Helix Operations Management with AIOps is an open and scalable platform that can ingest data from hundreds of third-party tools and sources to provide cross-domain visibility, observability, and AI-driven automated actions and workflows. It combines service-centric monitoring, advanced event management, root cause isolation, and intelligent automation to effectively manage operations across complex IT environments and proactively improve performance and availability.
BMC AMI Ops is a forward-looking tool that helps mitigate issues on the mainframe before they become business problems. It uses ML to learn what normal is, detect anomalies, diagnose the probable cause, and minimize time to remediate.
BMC Helix Operations Management and BMC AMI Ops are integrated to provide end-to-end observability of the most complex applications and services.
Additional resources
To learn more about BMC’s AIOps offerings:
- Free trial: BMC Helix Operations Management
- BMC AIOps Blog
- IT Operations Trends and AIOps Adoption: Feedback from the Frontline
* Gartner, Market Guide for AIOps Platforms, Pankaj Prasad, Padraig Byrne, and Gregg Siegfried, May 30, 2022.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from BMC at the link provided.
These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.