Amazon CloudWatch Monitoring for Workloads Hosted on VMware Cloud on AWS
By Krishna Kumar R, AWS Cloud Architect – Kyndryl
By Sourish Bhattacharya, AWS Cloud Architect – Kyndryl
By Amit Kumar Jha, Sr. Partner Solutions Architect – AWS
By Mayank Singh, Sr. Partner Solutions Architect – AWS
Monitoring and managing cloud-based resources is crucial for maintaining performance, troubleshooting issues, and ensuring the health of your infrastructure.
Currently, there is no unified solution for monitoring virtual machines running on VMware Cloud on AWS, native AWS services, and on-premises infrastructure that also integrates with IT service management (ITSM) tools.
In this post, we explore how to integrate Amazon CloudWatch with VMware Cloud on AWS to monitor workloads running on virtual machines (VMs). We also describe the benefits of integrating CloudWatch with other AWS services and with third-party tools such as the ServiceNow ITSM tool and IBM Netcool.
These integrations enable centralized, real-time monitoring and incident management, and give administrators a unified service management experience with a comprehensive view of their infrastructure.
Kyndryl works extensively with customers who operate workloads on VMware Cloud on AWS, and developed this integration to monitor those workloads with Amazon CloudWatch.
Kyndryl is an AWS Premier Tier Services Partner and global IT infrastructure services provider that is relentlessly innovating to help customers with cloud-native transformation and make the journey seamless. Kyndryl designs, builds, manages, and modernizes the complex, mission-critical information systems companies depend on every day.
Solution Overview
Amazon CloudWatch is a robust monitoring and observability service provided by Amazon Web Services (AWS). With CloudWatch, customers gain system-wide visibility into resource utilization, application performance, and operational health.
The solution described here helps to monitor workloads hosted on a VMware Cloud on AWS software-defined data center (SDDC) environment using CloudWatch. It provides a unified service management experience for AWS-native and VMware Cloud on AWS workloads.
Flow Diagram
This section details the flow of events for the solution, as shown in Figure 1 below. In this flow, a custom AWS Lambda function processes CloudWatch alerts and forwards them to Netcool, which creates incidents in ServiceNow.
Figure 1 – Monitoring flow diagram.
The numbered events in the sequence diagram are:
- The Amazon CloudWatch agent is installed on the workload VMs hosted in the VMware Cloud on AWS SDDC. It collects metrics from the VMs and sends them to CloudWatch, where alarms are evaluated.
- The alarm action configured in CloudWatch publishes a notification to an Amazon Simple Notification Service (Amazon SNS) topic.
- Because the AWS Lambda function is subscribed to the SNS topic, the notification triggers the Lambda function.
- The Lambda function processes the CloudWatch alert and publishes the result to a second SNS topic for Netcool.
- Through a webhook integration, the subscription on this second SNS topic sends the CloudWatch alerts to Netcool. Because the webhook is a standard HTTPS endpoint, the same approach can be used to integrate with other third-party applications.
- For each unique alert it receives, Netcool creates an incident in ServiceNow. Alerts are identified uniquely by a combination of AWS account ID, region, alert, and host details. If an incident for the same node and alert is still open in Netcool or ServiceNow, Netcool suppresses the alert instead of creating a new incident, which avoids duplicate incidents.
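The de-duplication rule in the last step can be illustrated with a short Python sketch. It is not Netcool's implementation (Netcool applies its own correlation rules, which this post does not show); the `IncidentTracker` class, the tuple key layout, and the incident ID format are hypothetical, chosen only to show how a key built from account ID, region, alert, and host lets repeated alerts be suppressed while an incident is still open.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical de-duplication key: (account ID, region, alarm name, host).
DedupKey = Tuple[str, str, str, str]


@dataclass
class IncidentTracker:
    """Illustrative stand-in for Netcool-style correlation, not Netcool itself."""
    open_incidents: Dict[DedupKey, str] = field(default_factory=dict)
    next_id: int = 1

    def handle(self, account: str, region: str, alarm: str, host: str,
               state: str) -> Optional[str]:
        """Return 'created:<id>', 'suppressed:<id>', 'resolved:<id>', or None."""
        key = (account, region, alarm, host)
        existing = self.open_incidents.get(key)
        if state == "ALARM":
            if existing is not None:
                # An incident for this node and alert is still open: no duplicate.
                return f"suppressed:{existing}"
            incident_id = f"INC{self.next_id:07d}"  # hypothetical ID format
            self.next_id += 1
            self.open_incidents[key] = incident_id
            return f"created:{incident_id}"
        if state == "OK" and existing is not None:
            # The alarm cleared, so the matching incident is resolved.
            del self.open_incidents[key]
            return f"resolved:{existing}"
        # Any other state (for example INSUFFICIENT_DATA) is ignored.
        return None
```

Calling `handle` twice with the same key in the ALARM state returns `created:INC0000001` and then `suppressed:INC0000001`; an OK event for that key resolves the incident, so a later ALARM opens a new one.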
Architecture
Components 1-10 in Figure 2 describe the flow and the AWS services used to monitor workloads with CloudWatch.
Figure 2 – Monitoring architecture diagram.
- AWS Systems Manager Agent (SSM Agent) – Registers the VMware Cloud on AWS VMs with AWS Systems Manager as managed instances.
- Amazon CloudWatch Agent – Configured to collect custom metrics from the VMs.
- AWS Systems Manager (SSM) – Used to manage the VMs from the AWS Management Console and to run automation that configures the CloudWatch agents.
- SSM Parameter Store – Used as a central repository for the CloudWatch agent metrics configuration.
- CloudWatch Metrics – A custom namespace stores the metrics pushed by the CloudWatch agents.
- CloudWatch Alarms – Created for standard KPIs, with alarm actions configured.
- SNS Topic (for Lambda integration) – Used in the alarm action to invoke the custom Lambda function.
- AWS Lambda – Processes the incoming CloudWatch alarm message, modifies the message payload to meet IBM Netcool's requirements, and publishes it to a second SNS topic.
- SNS Topic (for Netcool webhook) – Triggers the Netcool webhook and passes along the processed message.
- IBM Netcool – Receives the alarm message and extracts values from it to create a ServiceNow incident.
- ServiceNow – Serves as the ITSM tool: it holds an incident ticket for each CloudWatch alarm, and the ticket follows the alarm's lifecycle.
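The CloudWatch metrics, alarm, and SNS topic components can be wired together with the CloudWatch `PutMetricAlarm` API. The following Python sketch builds the request for boto3's `put_metric_alarm`; the `VMC/Workloads` namespace, the alarm naming scheme, the 90 percent memory threshold, and the evaluation settings are assumptions for illustration. It also assumes the agent publishes each metric with only the default `host` dimension; if your agent configuration appends other dimensions, the alarm has to list them too.

```python
from typing import Any, Dict


def build_alarm_request(host: str, topic_arn: str,
                        namespace: str = "VMC/Workloads",  # hypothetical custom namespace
                        metric_name: str = "mem_used_percent",
                        threshold: float = 90.0) -> Dict[str, Any]:
    """Build keyword arguments for boto3's CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": f"{host}-{metric_name}-high",  # hypothetical naming scheme
        "Namespace": namespace,
        "MetricName": metric_name,
        # By default the CloudWatch agent adds a "host" dimension to its metrics.
        "Dimensions": [{"Name": "host", "Value": host}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        # Send both ALARM and OK transitions to the same SNS topic so the
        # downstream incident can be opened and later resolved.
        "AlarmActions": [topic_arn],
        "OKActions": [topic_arn],
        "ActionsEnabled": True,
    }


def create_alarm(cloudwatch_client: Any, **kwargs: Any) -> None:
    """Create or update the alarm through a boto3 CloudWatch client."""
    cloudwatch_client.put_metric_alarm(**build_alarm_request(**kwargs))


# Example (requires AWS credentials; the topic ARN is a placeholder):
#   import boto3
#   create_alarm(boto3.client("cloudwatch"), host="vm-app01",
#                topic_arn="arn:aws:sns:us-east-1:111122223333:cw-alarms-to-lambda")
```

Adding the topic to `OKActions` as well as `AlarmActions` is what lets the "OK" event travel the same path as the original alert.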
Solution Deployment
Prerequisites
- AWS account and VMware Cloud on AWS SDDC.
- Required permissions to use AWS resources such as Amazon CloudWatch, AWS Lambda, AWS Systems Manager, and Amazon SNS. Follow AWS least-privilege best practices when creating these permissions.
- The AWS Systems Manager Agent, the CloudWatch agent, AWS Command Line Interface (CLI) v2, and Python must be installed on the virtual machines hosted on VMware Cloud on AWS.
- Access and sufficient permissions to Netcool event management and ServiceNow ITSM.
Steps
- Install the AWS SSM Agent on the VMs running on VMware Cloud on AWS and configure an AWS Systems Manager hybrid activation. This registers each VM with Systems Manager as a hybrid managed node.
- Install the CloudWatch agent on the VMs using the SSM Run Command document AWS-ConfigureAWSPackage, and create the CloudWatch agent configuration file.
- Store the CloudWatch agent configuration in SSM Parameter Store. By default, the agent collects CPU, memory, disk, swap, and netstat metrics from the VM. Additional custom metrics can be configured with plugins such as procstat and StatsD, defined in the agent configuration.
Figure 3 – Linux disk usage metrics – Standard CloudWatch metric.
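The agent installation and configuration steps can be sketched with boto3, the AWS SDK for Python. This sketch assumes the VMs are already registered through a hybrid activation (their managed-node IDs start with `mi-`) and that the agent on each VM has AWS credentials configured, which is not shown. The parameter name, namespace, and process list are hypothetical; the `AmazonCloudWatch-` prefix is used because the AWS managed `CloudWatchAgentServerPolicy` grants read access to parameters with that prefix. StatsD collection is left out to keep the example short.

```python
import json
from typing import Any, Dict, List

# Hypothetical names chosen for this sketch.
CONFIG_PARAMETER = "AmazonCloudWatch-vmc-linux"
NAMESPACE = "VMC/Workloads"


def build_agent_config(process_names: List[str]) -> Dict[str, Any]:
    """CloudWatch agent config with CPU, memory, disk, swap, netstat, and procstat."""
    return {
        "agent": {"metrics_collection_interval": 60},
        "metrics": {
            "namespace": NAMESPACE,  # custom namespace instead of the default CWAgent
            "metrics_collected": {
                "cpu": {"measurement": ["usage_active"], "totcpu": True},
                "mem": {"measurement": ["used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["*"]},
                "swap": {"measurement": ["used_percent"]},
                "netstat": {"measurement": ["tcp_established", "tcp_time_wait"]},
                # procstat: per-process metrics, matched by executable name
                "procstat": [
                    {"exe": name, "measurement": ["cpu_usage", "memory_rss"]}
                    for name in process_names
                ],
            },
        },
    }


def deploy(ssm_client: Any, instance_ids: List[str], process_names: List[str]) -> None:
    """Store the config in Parameter Store, then install and configure the agent."""
    ssm_client.put_parameter(
        Name=CONFIG_PARAMETER,
        Value=json.dumps(build_agent_config(process_names)),
        Type="String",
        Overwrite=True,
    )
    # Install the agent package on the hybrid managed nodes.
    install = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-ConfigureAWSPackage",
        Parameters={"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
    )
    # Wait for the installation to finish on every node before configuring.
    waiter = ssm_client.get_waiter("command_executed")
    for instance_id in instance_ids:
        waiter.wait(CommandId=install["Command"]["CommandId"], InstanceId=instance_id)
    # Load the Parameter Store config; "onPremise" mode is meant for non-EC2 servers.
    ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AmazonCloudWatch-ManageAgent",
        Parameters={
            "action": ["configure"],
            "mode": ["onPremise"],
            "optionalConfigurationSource": ["ssm"],
            "optionalConfigurationLocation": [CONFIG_PARAMETER],
            "optionalRestart": ["yes"],
        },
    )


# Example (requires AWS credentials):
#   import boto3
#   deploy(boto3.client("ssm"), ["mi-0123456789abcdef0"], ["nginx"])
```

Keeping the configuration in Parameter Store means a change to the metric set is a single `put_parameter` update followed by re-running the configure command, rather than editing files on each VM.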
- Integrate CloudWatch with Netcool through a custom integration, because no out-of-the-box integration is available. Create a custom Lambda function and an SNS topic, and configure the SNS topic as the CloudWatch alarm action.
- The Lambda function processes the incoming CloudWatch alerts, modifies the message payload to meet IBM Netcool's requirements, and publishes the result to a second SNS topic.
- Netcool is integrated with the SNS topic as an HTTPS endpoint (webhook).
- The processed alarm message is sent to Netcool from SNS. Netcool creates incidents in ServiceNow based on the values extracted from the message, and the incidents have sufficient information to identify the CloudWatch alerts for VMs and metrics.
Figure 4 – ServiceNow incident.
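The Lambda function's processing step can be sketched in Python as follows. The Netcool field names (`Identifier`, `Node`, `AlertKey`, `Summary`, `Severity`, `Type`, `Manager`) follow common Netcool ObjectServer conventions, but the exact payload a given Netcool webhook expects is deployment-specific, so treat this mapping, and the hypothetical `NETCOOL_TOPIC_ARN` environment variable, as assumptions. The host is read from the alarm's `host` dimension, which assumes the alarm is defined on a CloudWatch agent metric that carries that dimension.

```python
import json
import os
from typing import Any, Dict, Optional

# Hypothetical environment variable holding the ARN of the SNS topic that
# Netcool's HTTPS webhook is subscribed to.
TOPIC_ENV = "NETCOOL_TOPIC_ARN"


def build_netcool_event(alarm: Dict[str, Any]) -> Dict[str, Any]:
    """Map a CloudWatch alarm notification to a hypothetical Netcool-style event."""
    # AlarmArn has the form arn:aws:cloudwatch:<region>:<account>:alarm:<name>
    arn_parts = alarm.get("AlarmArn", "").split(":")
    region = arn_parts[3] if len(arn_parts) > 3 else alarm.get("Region", "unknown")
    account = alarm.get("AWSAccountId", "unknown")
    # Dimensions in alarm notifications use lowercase "name"/"value" keys.
    dims = {d["name"]: d["value"] for d in alarm.get("Trigger", {}).get("Dimensions", [])}
    host = dims.get("host", "unknown")
    name = alarm.get("AlarmName", "")
    problem = alarm.get("NewStateValue") == "ALARM"
    return {
        # Unique key built from account ID, region, alert, and host.
        "Identifier": ":".join([account, region, name, host]),
        "Node": host,
        "AlertKey": name,
        "Summary": alarm.get("NewStateReason", ""),
        "Severity": 5 if problem else 0,  # 5 = critical, 0 = clear
        "Type": 1 if problem else 2,      # 1 = problem, 2 = resolution
        "Manager": "AmazonCloudWatch",
        "StateChangeTime": alarm.get("StateChangeTime", ""),
    }


def lambda_handler(event: Dict[str, Any], context: Any,
                   sns_client: Optional[Any] = None) -> Dict[str, int]:
    """Forward ALARM and OK transitions from the first SNS topic to the Netcool topic."""
    if sns_client is None:
        import boto3  # included in the AWS Lambda Python runtime
        sns_client = boto3.client("sns")
    topic_arn = os.environ[TOPIC_ENV]
    published = 0
    for record in event.get("Records", []):
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") not in ("ALARM", "OK"):
            continue  # skip INSUFFICIENT_DATA transitions
        sns_client.publish(TopicArn=topic_arn,
                           Message=json.dumps(build_netcool_event(alarm)))
        published += 1
    return {"published": published}
```

The optional `sns_client` parameter lets the handler be exercised locally with a stub; when Lambda invokes it as `lambda_handler(event, context)`, it creates a real SNS client.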
- Site reliability engineers (SREs) or operations teams act on the incidents, resolve the issues that triggered the CloudWatch alerts, and close the ServiceNow incidents.
- When the alarm returns to the OK state, the “OK” event is passed to Netcool through the same integration, and Netcool identifies the matching ServiceNow incident and resolves it.
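Because the ALARM and OK events both reach Netcool through the same SNS HTTPS subscription, the receiving endpoint has to distinguish SNS's HTTP message types as well as the event's state. The Python sketch below is a hypothetical stand-in for that receiving side; Netcool's own webhook handling is product-specific and is not described in this post. It assumes the processed message carries a hypothetical `Type` field (1 for a problem, 2 for a resolution), and it omits SNS message-signature verification, which a production endpoint should perform.

```python
import json
from typing import Any, Tuple


def classify_sns_post(body: str) -> Tuple[str, Any]:
    """Classify the JSON body of an SNS HTTPS POST.

    Returns ("confirm", SubscribeURL), ("event", parsed message), or ("ignore", None).
    """
    envelope = json.loads(body)
    kind = envelope.get("Type")
    if kind == "SubscriptionConfirmation":
        # The endpoint must visit SubscribeURL once to activate the subscription.
        return "confirm", envelope["SubscribeURL"]
    if kind == "Notification":
        # "Message" holds the string that was published to the topic.
        return "event", json.loads(envelope["Message"])
    return "ignore", None


def route(body: str) -> str:
    """Decide what the receiver does: confirm, open, resolve, or ignore."""
    action, payload = classify_sns_post(body)
    if action == "event":
        # Hypothetical convention: Type 2 marks the OK (resolution) event.
        return "resolve" if payload.get("Type") == 2 else "open"
    return action
```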
Summary
Monitoring on-premises and cloud landscapes is becoming increasingly important, and the absence of a centralized, cloud-native monitoring solution can add operational overhead. By implementing the solution described in this post, administrators get a unified service management experience and a comprehensive view of their entire infrastructure.
The solution also helps reduce operational overhead by providing incident management and dashboards for both cloud-native and VMware Cloud on AWS workloads.
Kyndryl – AWS Partner Spotlight
Kyndryl is an AWS Premier Tier Services Partner and IT infrastructure services provider that designs, builds, manages, and modernizes the complex, mission-critical information systems the world depends on every day.