The onset of digital transformation has led to organizations adopting and maintaining tools that never existed in the past, including automation, cloud computing, containers, microservices and more.
Maintaining these tools has taken its toll on the workforce, so IT operations teams are starting to automate maintenance as well. This has led to the advent of AIOps.
As engineers begin to bulk up their AIOps capabilities, what are the best tools they can start to use?
Looking at an AIOps Architecture
AIOps makes use of big data and machine learning (ML). Big data, in this case, means instrumenting applications and environments and then streaming their logs and metrics to an analytics tool. The tool is powered by ML and can detect and correlate anomalies within application data. It alerts on those anomalies and provides the context that helps administrators connect the dots and find the underlying cause.
There are several components within this framework, including:
- Sensors and collectors
These are software applications that monitor data from your ecosystem and store it in a data lake or data warehouse for analysis.
- Rule engines and workflows
These components are responsible for how data and alerts get moved around. If one of your metrics starts spiking, who ends up receiving information about it?
- Correlations and anomaly detection
These tools detect when a metric starts behaving anomalously. When several anomalies occur at once, they identify and flag the ones that are related.
- IT automation and configuration management
Certain tools, especially cloud management and container orchestration tools, can self-heal from minor instabilities or recover on their own from known bad configurations.
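As a rough illustration of the rules-and-workflows component described above, the sketch below routes an alert to a recipient using an ordered list of rules. It is not taken from any of the tools in this article; the `Alert` and `Rule` types, the metric names and the recipient names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Alert:
    metric: str     # hypothetical metric name, e.g. "db.connections"
    value: float
    severity: str   # "info", "warning" or "critical"


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Alert], bool]
    recipient: str


# Hypothetical routing table: rules are checked in order, first match wins.
RULES: List[Rule] = [
    Rule("critical-pages-oncall", lambda a: a.severity == "critical", "oncall-pager"),
    Rule("db-metrics-to-dba", lambda a: a.metric.startswith("db."), "dba-team"),
]
DEFAULT_RECIPIENT = "ops-channel"


def route(alert: Alert, rules: List[Rule] = RULES) -> str:
    """Return the recipient of the first rule that matches the alert."""
    for rule in rules:
        if rule.matches(alert):
            return rule.recipient
    return DEFAULT_RECIPIENT


print(route(Alert("db.connections", 950.0, "critical")))
```

Ordering the rules and taking the first match keeps the answer to "who receives this alert?" easy to reason about; a real rule engine would typically add deduplication, schedules and escalation on top.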
Each component of the AIOps stack presents a wide variety of options. Below, we’ll tell you the best open source solutions to help you get started.
Sensors and Collectors
- Prometheus
Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. The software specializes in numeric time series, which it collects by regularly polling (scraping) data sources. What's more, its service discovery can find new data sources automatically, without each one being configured by hand.
- StreamSets Data Collector
Something needs to collect and organize volumes of big data, and that something is StreamSets. StreamSets provides an easy way to build data pipelines that work with Kubernetes, Amazon S3, Azure Data Lake Storage, Google Dataproc and more.
- Telegraf
Telegraf works as an agent for collecting, processing, aggregating and writing metrics.
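To make the polling model used by collectors concrete, here is a minimal sketch of a pull-based collector that reads each registered source on every round and appends a timestamped sample to its series. It is not how Prometheus or Telegraf are implemented; `PollingCollector`, its methods and the source names are hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # (unix timestamp, value)


class PollingCollector:
    """Hypothetical pull-based collector: reads every source on each poll."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._sources: Dict[str, Callable[[], float]] = {}
        self._clock = clock
        self.series: Dict[str, List[Sample]] = defaultdict(list)

    def register(self, name: str, read: Callable[[], float]) -> None:
        self._sources[name] = read

    def poll_once(self) -> None:
        now = self._clock()
        for name, read in self._sources.items():
            try:
                value = float(read())
            except Exception:
                # A failed read is skipped so one broken source
                # does not stop the rest of the poll.
                continue
            self.series[name].append((now, value))

    def run(self, interval_s: float, rounds: int) -> None:
        for _ in range(rounds):
            self.poll_once()
            time.sleep(interval_s)


collector = PollingCollector()
collector.register("queue_depth", lambda: 7)  # stand-in for a real data source
collector.poll_once()
print(collector.series["queue_depth"])
```

Passing the clock in as a parameter is a design choice made here so the collector can be exercised without waiting on real time.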
Rule Engines & Workflows
- StackStorm
Using StackStorm, engineers can create rules that automate actions across an entire application ecosystem, using simple integrations.
- Prefect
The Prefect data engineering platform lets users create tasks that perform actions and flows that organize tasks. It is effectively a tool for running pre-written code and managing failure states.
- Apache Airflow
Similar to Prefect, Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule and monitor workflows. It treats configuration as code: you define a pipeline in Python, and Airflow schedules and executes it, with templating available to parameterize your scripts.
- Luigi
Using Luigi, engineers can build complex pipelines of batch jobs. The tool handles dependency resolution between tasks, helps you manage and visualize workflows and takes care of failure handling. It integrates with the command line and supports Hadoop right out of the box.
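The ideas these workflow tools share (tasks, dependencies between them, and managed failure states) can be sketched in a few lines. The example below is not the Prefect, Airflow or Luigi API; `run_workflow` and the task names are hypothetical, and it relies only on the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Run tasks in dependency order.

    Returns a status per task: "success", "failed", or "skipped"
    (skipped means an upstream task did not succeed).
    """
    graph = {name: set(depends_on.get(name, ())) for name in tasks}
    unknown = {up for ups in graph.values() for up in ups} - set(tasks)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")

    status: Dict[str, str] = {}
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(graph).static_order():
        if any(status[up] != "success" for up in graph[name]):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


def failing_transform() -> None:
    raise RuntimeError("simulated failure")


result = run_workflow(
    tasks={
        "extract": lambda: None,
        "transform": failing_transform,
        "load": lambda: None,
    },
    depends_on={"transform": {"extract"}, "load": {"transform"}},
)
print(result)
```

Skipping downstream tasks rather than aborting the whole run lets independent branches of a pipeline finish even when one branch fails.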
Correlations & Anomaly Detection
- Anodot
Anodot can ingest all of your metrics and process them quickly, finding seasonality on a daily, weekly, and quarterly basis, while also correlating related anomalies to speed up root cause analysis. Anodot gives users the ability to proactively detect anomalies in real time, allowing companies to mitigate them before customers notice.
- Prophet
With Prophet, analysts can fit an additive model that combines a non-linear trend with seasonality and holiday effects. Users should note that this model works best with metrics that display strong seasonality, and that it needs several seasons of historical data to function. With that said, Prophet typically handles outliers well and is robust to shifts in the trend as well as missing data.
- EGADS (Extensible Generic Anomaly Detection System)
Engineers who have a large amount of time-series data to analyze can turn to EGADS, an open-source Java package that specializes in anomaly detection.
- Anomaly Detection
It might not be imaginatively named, but Anomaly Detection does what it says. In addition to detecting anomalies in ordinary time-series data, this open-source R package can detect and filter out seasonality to uncover underlying trends.
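To show why filtering out seasonality matters for anomaly detection, here is a deliberately simple sketch: it subtracts a per-phase median (the seasonal profile) from the series and flags residuals that sit far from the rest, using the median absolute deviation (MAD) as a robust measure of spread. This is not the algorithm used by Anodot, Prophet, EGADS or the R package; the function name, the fixed `period` argument and the 3.5 threshold are assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence


def seasonal_anomalies(
    values: Sequence[float], period: int, threshold: float = 3.5
) -> List[int]:
    """Return indices whose deseasonalized value is a robust outlier."""
    if period <= 0 or len(values) < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Seasonal profile: the median of all points that share the same phase.
    profile = [median(values[phase::period]) for phase in range(period)]
    residuals = [v - profile[i % period] for i, v in enumerate(values)]

    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    # 1.4826 rescales MAD so it is comparable to a standard deviation
    # when the residuals are roughly normal.
    scale = 1.4826 * mad
    if scale == 0:
        # Residuals are almost all identical; anything different stands out.
        return [i for i, r in enumerate(residuals) if r != center]
    return [
        i for i, r in enumerate(residuals)
        if abs(r - center) / scale > threshold
    ]


# Synthetic series: a repeating 4-step pattern with small noise
# and one injected spike at index 9.
base = [10, 20, 30, 20]
noise = [1, -1, 0]
series = [base[i % 4] + noise[i % 3] for i in range(24)]
series[9] += 40
print(seasonal_anomalies(series, period=4))
```

The sketch assumes the period is known in advance and handles a single metric; detecting seasonality automatically across many metrics is the harder problem the tools above set out to solve.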
IT Automation & Configuration Management
- Ansible
Patching and configuration management can be a huge problem for IT operations, especially when performed at scale. Ansible makes that job easier by being simple to use: it is agentless, so it requires no custom code or agent installation on the endpoints it manages.
- Rundeck
Incidents happen, but many incidents are not unique. Engineers can use Rundeck to create automated error-handling procedures that can be run by support staff in case of specific incidents, decreasing time to resolution.
- Gunnery
Running tools across multiple servers requires many repetitive commands. Operators should not be tied to their desks in order to perform those commands. Gunnery offers a web-based interface so that users can run server commands from their smartphones and browsers instead.
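The idea of a predefined procedure for a known incident type can be sketched as a small runbook registry. This is not the Rundeck, Ansible or Gunnery API; the function names, incident types and steps below are hypothetical, and the steps are stand-ins for real remediation commands.

```python
from typing import Callable, Dict, List, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]

RUNBOOKS: Dict[str, List[Step]] = {}


def register_runbook(incident_type: str, steps: List[Step]) -> None:
    RUNBOOKS[incident_type] = steps


def handle_incident(incident_type: str) -> List[str]:
    """Run the steps for a known incident type and return a log of outcomes."""
    steps = RUNBOOKS.get(incident_type)
    if steps is None:
        return [f"no runbook for {incident_type!r}; escalate to an engineer"]

    log: List[str] = []
    for description, action in steps:
        ok = action()
        log.append(f"{'ok' if ok else 'FAILED'}: {description}")
        if not ok:
            # Stop at the first failed step instead of continuing blindly.
            log.append("stopping; escalate to an engineer")
            break
    return log


# Hypothetical runbook: the lambdas stand in for real commands.
register_runbook("disk-full", [
    ("rotate application logs", lambda: True),
    ("clear temporary files", lambda: True),
])

for line in handle_incident("disk-full"):
    print(line)
```

Stopping at the first failed step and escalating keeps an automated procedure from compounding a problem it was not written to handle.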
Completing Your AIOps Stack
Open source tools can provide a great deal of functionality and effectiveness without breaking the budget, but an AIOps stack is only as good as its analytics capability. Of the tools discussed here, only a few were able to detect and filter seasonality, and most needed to be specifically configured on a limited number of metrics in order to do so. This approach won’t scale, and it won’t provide the full range of abilities you need to truly lessen your workload.
An open-source analytics platform may not fit your needs, and the alternative, building an in-house analytics capability, can take years. For this element of your AIOps implementation, we recommend working with a trusted vendor who can provide you with analytics capabilities tailored to both your present and future needs.