AWS Cloud Practitioner Notes - Monitoring and Analytics

Published Jan 9, 2021. Last updated Feb 13, 2026.

The content here is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

AWS provides a comprehensive suite of monitoring and analytics services designed to help organizations observe systems, collect metrics, analyze logs, and make data-driven decisions. Effective monitoring enables proactive issue detection, performance optimization, and continuous improvement of cloud infrastructure (Services, 2024).

Monitoring supplies the data behind decisions about resource allocation, performance tuning, and operational improvements. Analytics complements it by surfacing patterns, trends, and anomalies across your AWS environment (Services, 2024).

Module 7 - Amazon CloudWatch

Amazon CloudWatch is a monitoring and observability service that provides actionable insights for AWS resources and applications. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing a unified view of AWS resources, applications, and services that run on AWS and on-premises servers (Services, 2024).

CloudWatch monitors your AWS infrastructure and applications in near real time, automatically collecting metrics from AWS services. Each metric is tied to a specific resource: for example, the CPU utilization of an EC2 instance, the request count of an Application Load Balancer, or the read and write operations of a DynamoDB table.

Core capabilities of CloudWatch

CloudWatch provides several key capabilities that help organizations maintain operational excellence:

  • Access all metrics from a central location: CloudWatch aggregates metrics from all AWS services, custom applications, and on-premises resources into a single dashboard, eliminating the need to switch between multiple monitoring tools.
  • Gain visibility across applications and services: CloudWatch provides end-to-end visibility into application performance, infrastructure health, and service dependencies, enabling teams to identify bottlenecks and optimize resource utilization.
  • Reduce Mean Time to Resolution (MTTR) and improve Total Cost of Ownership (TCO): By providing real-time alerts and automated responses to operational issues, CloudWatch helps teams respond faster to incidents and reduce operational costs (Services, 2024).
  • Drive insights to optimize applications: CloudWatch enables data-driven decision-making through customizable dashboards, statistical analysis, and integration with AWS services like Auto Scaling and Lambda.

CloudWatch metrics

CloudWatch metrics represent time-ordered data points that are published to CloudWatch. Each AWS service publishes specific metrics relevant to its operation. For example, EC2 publishes CPU utilization, network traffic, and disk I/O metrics, while RDS publishes database connections, read/write latency, and storage space metrics (Services, 2024).

Organizations can also publish custom metrics to CloudWatch for application-specific monitoring needs, such as business metrics (orders processed, user registrations) or technical metrics (API response times, queue depths).
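As a rough sketch of what publishing a custom metric involves, the snippet below builds a request payload shaped like the one the CloudWatch PutMetricData API accepts. The namespace `MyApp/Orders`, the metric name, and the dimension values are hypothetical, and the snippet only constructs the payload; it does not call AWS.

```python
from datetime import datetime, timezone


def build_metric_payload(namespace, name, value, unit="Count", dimensions=None):
    """Build a PutMetricData-style payload for one custom metric data point."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        # Dimensions are name/value pairs that identify what the metric describes.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical business metric: 42 orders processed in the "prod" environment.
payload = build_metric_payload(
    "MyApp/Orders", "OrdersProcessed", 42, dimensions={"Environment": "prod"}
)
print(payload["Namespace"], payload["MetricData"][0]["MetricName"])
```

With the AWS SDK for Python (boto3) installed and credentials configured, a payload of this shape could be sent with `boto3.client("cloudwatch").put_metric_data(**payload)`.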

CloudWatch Alarms

CloudWatch Alarms enable proactive monitoring by automatically initiating actions when metric thresholds are breached. A metric alarm watches a single metric (or the result of a metric math expression) over a specified time period and performs one or more actions based on the value of the metric relative to a threshold (Services, 2024).

When creating an alarm, you specify:

  • Metric: The data point to monitor (for example, CPU utilization).
  • Threshold: The value that triggers the alarm (for example, CPU utilization above 80%).
  • Evaluation period: How long the metric must breach the threshold before the alarm triggers.
  • Actions: What happens when the alarm state changes (for example, send an SNS notification, trigger an Auto Scaling action, or execute a Lambda function).
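To make the four settings concrete, here is a minimal sketch that assembles them into parameters shaped like those of the CloudWatch PutMetricAlarm API. The instance ID, the SNS topic ARN, and the choice of three 5-minute periods are assumptions for illustration; nothing is sent to AWS.

```python
def build_cpu_alarm(instance_id, topic_arn, threshold=80.0):
    """Return PutMetricAlarm-style parameters for a high-CPU alarm."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        # Metric: what to watch.
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        # Threshold: the value that counts as a breach.
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Evaluation period: 3 consecutive 5-minute periods must breach.
        "Period": 300,
        "EvaluationPeriods": 3,
        # Actions: notify an SNS topic when the alarm enters the ALARM state.
        "AlarmActions": [topic_arn],
    }


# Hypothetical instance ID and topic ARN, for illustration only.
params = build_cpu_alarm(
    "i-0123456789abcdef0", "arn:aws:sns:us-east-1:123456789012:ops-alerts"
)
print(params["AlarmName"])
```

With boto3 available, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.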

CloudWatch Alarms support three states:

  • OK: The metric is within the defined threshold.
  • ALARM: The metric has breached the threshold.
  • INSUFFICIENT_DATA: Not enough data is available to determine the alarm state.
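The three states can be illustrated with a deliberately simplified evaluator. This is a sketch of the idea only, assuming a "greater than" threshold and that every evaluated period must breach; the real service also supports options such as "M out of N" data points and configurable handling of missing data, which are ignored here.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return a simplified alarm state from the most recent data points.

    datapoints: metric values, oldest first, one value per period.
    evaluation_periods: number of most recent periods to check (at least 1).
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough periods have been reported yet.
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        # Every evaluated period breached the threshold.
        return "ALARM"
    return "OK"


print(evaluate_alarm([50, 85, 90, 95], threshold=80, evaluation_periods=3))
print(evaluate_alarm([85, 90, 70], threshold=80, evaluation_periods=3))
print(evaluate_alarm([95], threshold=80, evaluation_periods=3))
```

The three calls cover a sustained breach, a recovery in the latest period, and a metric with too little history.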

Alarms can integrate with AWS services to create automated responses, such as scaling an Auto Scaling group out or in, stopping underutilized EC2 instances, or triggering incident response workflows (Services, 2024).

CloudWatch dashboards

CloudWatch dashboards provide customizable home pages in the CloudWatch console that you can use to monitor your resources in a single view. Dashboards display metrics and alarms for your AWS resources, enabling you to get a unified view of your AWS environment and applications (Services, 2024).

Each dashboard can display multiple graphs and visualizations, including:

  • Line graphs showing metric trends over time.
  • Stacked area charts for comparing multiple metrics.
  • Number widgets displaying the current value of a metric.
  • Text widgets for adding context and documentation.
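A dashboard is defined by a JSON document that lists its widgets. The sketch below builds such a document with one widget for each visualization type listed above. The instance ID, titles, layout coordinates, and runbook text are hypothetical, and the result is only printed, not uploaded.

```python
import json


def build_dashboard_body(instance_id, region="us-east-1"):
    """Return a JSON dashboard body with one widget per visualization type."""
    cpu = ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
    net_in = ["AWS/EC2", "NetworkIn", "InstanceId", instance_id]
    net_out = ["AWS/EC2", "NetworkOut", "InstanceId", instance_id]

    def metric_widget(x, y, title, metrics, view, stacked=False):
        return {
            "type": "metric", "x": x, "y": y, "width": 12, "height": 6,
            "properties": {
                "title": title, "metrics": metrics, "view": view,
                "stacked": stacked, "stat": "Average", "period": 300,
                "region": region,
            },
        }

    widgets = [
        # Line graph of a metric trend over time.
        metric_widget(0, 0, "CPU trend", [cpu], "timeSeries"),
        # Stacked area chart comparing two metrics.
        metric_widget(12, 0, "Network in/out", [net_in, net_out], "timeSeries", stacked=True),
        # Number widget showing the latest value.
        metric_widget(0, 6, "Current CPU", [cpu], "singleValue"),
        # Text widget for context and documentation.
        {
            "type": "text", "x": 12, "y": 6, "width": 12, "height": 6,
            "properties": {"markdown": "## Runbook\nEscalate if CPU stays above 80%."},
        },
    ]
    return json.dumps({"widgets": widgets})


body = build_dashboard_body("i-0123456789abcdef0")  # hypothetical instance ID
print(len(json.loads(body)["widgets"]), "widgets")
```

With boto3 available, the string could be supplied as the `DashboardBody` argument of `put_dashboard`, alongside a `DashboardName` of your choosing.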

Dashboards can be shared across teams, embedded in operational tools, or displayed on large screens in operations centers. They support cross-region and cross-account monitoring, enabling centralized visibility into distributed systems.

CloudWatch Logs

CloudWatch Logs enables you to centralize logs from all your systems, applications, and AWS services. You can use CloudWatch Logs to monitor applications and systems in near real time, retain log data for compliance and audit requirements, and search and analyze log data with the CloudWatch Logs Insights query language (Services, 2024).

Key features of CloudWatch Logs include:

  • Log groups and streams: Organize logs hierarchically by application or resource type.
  • Metric filters: Extract metrics from log data to track specific patterns or values.
  • Log retention: Configure retention periods from one day to indefinite retention.
  • Logs Insights: Query and analyze log data using a purpose-built query language.
  • Integration with AWS services: Automatically collect logs from Lambda functions, EC2 instances, ECS containers, and other AWS services.

CloudWatch Logs Insights provides an interactive query interface for analyzing log data, enabling you to search billions of log events in seconds and visualize results with built-in charts and graphs.
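The idea behind a metric filter, turning matching log lines into a number you can graph or alarm on, can be sketched in a few lines. This is a local emulation under simplifying assumptions: it uses plain substring matching rather than the actual CloudWatch filter pattern syntax, and the log events and stream names are made up.

```python
from collections import Counter


def count_matches(log_events, term):
    """Count log messages containing `term`, grouped by log stream."""
    counts = Counter()
    for event in log_events:
        if term in event["message"]:
            counts[event["logStream"]] += 1
    return dict(counts)


# Hypothetical log events from two streams in one log group.
events = [
    {"logStream": "web-1", "message": "INFO request served in 12 ms"},
    {"logStream": "web-1", "message": "ERROR upstream timeout"},
    {"logStream": "web-2", "message": "ERROR database unavailable"},
    {"logStream": "web-2", "message": "ERROR database unavailable"},
]
print(count_matches(events, "ERROR"))
```

In CloudWatch the resulting count would be published as a metric, so an alarm could fire when errors exceed a threshold.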

Module 7 - AWS CloudTrail

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. CloudTrail records AWS API calls and related events made by or on behalf of your AWS account and delivers log files to an Amazon S3 bucket (Services, 2024).

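By default, CloudTrail records management events (control-plane API calls); data events, such as S3 object-level operations, must be enabled explicitly. Each recorded event includes: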

  • Actions taken by users, roles, or AWS services.
  • The time and date of the action.
  • The source IP address of the request.
  • The parameters used in the request.
  • The response elements returned by the AWS service.
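To show how these fields appear in practice, the sketch below parses a sample record and prints a one-line summary. The record is invented for illustration (user name, IP address, and instance ID are hypothetical), though the field names follow the CloudTrail record format.

```python
import json

# Hypothetical CloudTrail-style record; real records carry more fields.
SAMPLE_RECORD = json.loads("""
{
  "eventTime": "2024-05-01T12:34:56Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "StopInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {"type": "IAMUser", "userName": "alice"},
  "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}},
  "responseElements": null
}
""")


def summarize(record):
    """Answer 'who did what, when, and from where' for one record."""
    identity = record.get("userIdentity", {})
    # Prefer a user name, fall back to the ARN, then to the identity type.
    who = identity.get("userName") or identity.get("arn") or identity.get("type", "unknown")
    return (
        f'{record["eventTime"]} {who} called {record["eventName"]} '
        f'on {record["eventSource"]} from {record["sourceIPAddress"]}'
    )


print(summarize(SAMPLE_RECORD))
```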

Benefits of CloudTrail

CloudTrail provides several key benefits for organizations:

  • Security analysis and troubleshooting: Track changes to AWS resources and investigate unusual activity or unauthorized API calls. CloudTrail logs help security teams identify who made changes, when they occurred, and from where (Services, 2024).
  • Compliance auditing: Meet regulatory requirements by maintaining a complete audit trail of all API activity in your AWS account. CloudTrail logs can be used to demonstrate compliance with standards such as PCI DSS, HIPAA, and SOC.
  • Operational troubleshooting: Diagnose operational issues by examining the sequence of API calls that led to a problem. CloudTrail helps identify configuration changes that may have caused service disruptions.
  • Risk auditing: Monitor AWS account activity to detect potential security threats, such as privilege escalation attempts, unauthorized access, or suspicious patterns of API calls.

CloudTrail Insights

CloudTrail Insights helps you identify unusual operational activity in your AWS account by analyzing CloudTrail management events and automatically detecting anomalies. Insights uses machine learning to establish baselines of normal activity and alerts you when it detects patterns that deviate from the baseline (Services, 2024).

Common use cases for CloudTrail Insights include:

  • Detecting sudden increases in API call volumes.
  • Identifying unusual patterns of resource provisioning.
  • Alerting on atypical error rates.
  • Discovering changes in API call behavior that may indicate compromised credentials.
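The baseline-and-deviation idea can be illustrated with a simple statistical rule. This is not the algorithm CloudTrail Insights uses, which is not described here; it is a stand-in that flags hours whose API call count exceeds the baseline mean by more than a chosen number of standard deviations. The call counts and the parameters `baseline_hours` and `sigmas` are hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(hourly_counts, baseline_hours, sigmas=3.0):
    """Return indexes of hours whose count exceeds mean + sigmas * stddev.

    The first `baseline_hours` entries define normal activity; only later
    hours are checked against that baseline.
    """
    baseline = hourly_counts[:baseline_hours]
    limit = mean(baseline) + sigmas * pstdev(baseline)
    return [
        index
        for index, count in enumerate(hourly_counts)
        if index >= baseline_hours and count > limit
    ]


# Six hours of normal API call volume, then three new hours to check.
counts = [100, 110, 90, 105, 95, 100, 102, 400, 98]
print(find_anomalies(counts, baseline_hours=6))
```

Here the baseline averages 100 calls per hour with a standard deviation of about 6.5, so only the hour with 400 calls is flagged.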

Module 7 - AWS Trusted Advisor

AWS Trusted Advisor is an automated service that inspects your AWS environment and provides real-time recommendations to help you follow AWS best practices. Trusted Advisor analyzes your account and provides actionable recommendations across five categories (Services, 2024).

The five pillars of Trusted Advisor

Trusted Advisor groups its checks into the following five categories, referred to here as pillars. They overlap with, but are not identical to, the pillars of the AWS Well-Architected Framework:

Cost optimization

Trusted Advisor identifies opportunities to reduce costs by eliminating unused resources, rightsizing underutilized instances, and leveraging pricing models such as Reserved Instances or Savings Plans. Examples include:

  • Idle load balancers that incur charges without processing traffic.
  • Unattached Elastic IP addresses that generate hourly fees.
  • EC2 instances with low CPU utilization that could be downsized.
  • Outdated Reserved Instances that no longer match your usage patterns.

By following cost optimization recommendations, organizations can significantly reduce their AWS spending without impacting performance or availability (Services, 2024).

Performance

Performance recommendations help you improve the speed and responsiveness of your applications. Trusted Advisor checks for:

  • High utilization of EC2 instances that may benefit from larger instance types.
  • EBS volumes with throughput optimization opportunities.
  • CloudFront distributions that could benefit from additional edge locations.
  • Services approaching throughput limits that may cause performance degradation.

Security

Security checks help you identify gaps in your security posture and ensure you follow AWS security best practices. Trusted Advisor examines:

  • S3 buckets with public read or write permissions that may expose sensitive data.
  • Security groups with unrestricted access (0.0.0.0/0) on common ports.
  • IAM users with access keys that have not been rotated in 90 days.
  • Multi-factor authentication (MFA) status for the root account.
  • CloudTrail logging configuration to ensure audit trails are enabled.

Security recommendations help prevent data breaches, unauthorized access, and compliance violations (Services, 2024).
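The unrestricted-access check is easy to picture as code. The sketch below is an illustration of the idea, not Trusted Advisor's implementation: it scans security group rules, shaped like EC2 DescribeSecurityGroups output, for risky ports open to 0.0.0.0/0. The group IDs and the choice of ports 22 and 3389 as "common ports" are assumptions.

```python
RISKY_PORTS = (22, 3389)  # assumption: SSH and RDP stand in for "common ports"


def find_open_rules(security_groups):
    """Return (group_id, port) pairs where a risky port is open to 0.0.0.0/0."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            if "0.0.0.0/0" not in cidrs:
                continue
            # All-protocol rules omit the port range, so treat them as all ports.
            low = rule.get("FromPort", 0)
            high = rule.get("ToPort", 65535)
            for port in RISKY_PORTS:
                if low <= port <= high:
                    findings.append((group["GroupId"], port))
    return findings


# Hypothetical security groups for illustration.
groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-admin", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]
print(find_open_rules(groups))
```

Only `sg-admin` is reported: `sg-web` is open to the world but on a port outside the risky set, and `sg-db` is restricted to a private range.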

Fault tolerance

Fault tolerance checks help ensure your applications can withstand failures and maintain availability. Trusted Advisor evaluates:

  • EC2 Auto Scaling groups to ensure they span multiple Availability Zones.
  • RDS database instances without Multi-AZ configuration.
  • EBS volumes without recent snapshots for disaster recovery.
  • VPC configurations that lack redundancy.
  • Elastic Load Balancers without sufficient health checks.

By implementing fault tolerance recommendations, organizations can improve application resilience and reduce the impact of infrastructure failures.

Service limits

AWS sets default service limits (quotas) to protect customers from unexpected charges and to maintain service quality. Trusted Advisor monitors your usage and alerts you when you approach or exceed these limits, helping you avoid service disruptions (Services, 2024).

Service limit checks include:

  • EC2 instance limits per region.
  • VPC limits (subnets, internet gateways, elastic IPs).
  • RDS database instance limits.
  • S3 bucket limits per account.
  • IAM role and policy limits.
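A service limit check boils down to comparing current usage with the quota. The following sketch assumes a warning at 80% of the limit; the threshold, the status labels, and all usage and quota numbers are illustrative assumptions rather than values taken from Trusted Advisor or from current AWS defaults.

```python
def check_limits(usage, limits, warn_ratio=0.8):
    """Classify each quota as 'ok', 'warning', or 'exceeded'."""
    report = {}
    for name, limit in limits.items():
        used = usage.get(name, 0)
        if used >= limit:
            report[name] = "exceeded"
        elif used >= warn_ratio * limit:
            report[name] = "warning"
        else:
            report[name] = "ok"
    return report


# Hypothetical usage and quota values for one account and region.
usage = {"vpcs": 4, "elastic_ips": 5, "rds_instances": 10}
limits = {"vpcs": 5, "elastic_ips": 5, "rds_instances": 40}
print(check_limits(usage, limits))
```

A "warning" result is the cue to request a quota increase before a deployment fails for lack of capacity.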

Trusted Advisor access levels

The availability of Trusted Advisor checks depends on your AWS Support plan:

  • Basic and Developer Support: Access to seven core checks focused on security (S3 bucket permissions, security groups, IAM use, MFA on root account) and service limits.
  • Business and Enterprise Support: Access to the full set of Trusted Advisor checks across all five pillars, including cost optimization, performance, and fault tolerance recommendations. Business and Enterprise Support customers also receive programmatic access to Trusted Advisor via the AWS Support API.

Module 7 - AWS X-Ray

AWS X-Ray is a distributed tracing service that helps developers analyze and debug production applications, particularly those built using microservices architecture. X-Ray provides an end-to-end view of requests as they travel through your application and shows a map of your application’s underlying components (Services, 2024).

With X-Ray, you can:

  • Trace requests across microservices: Follow requests as they flow through multiple services, identifying bottlenecks and latency issues.
  • Identify performance bottlenecks: Analyze the time spent in each component of your application to pinpoint slow operations.
  • Understand dependencies: Visualize service dependencies and identify how services communicate with each other.
  • Detect errors and exceptions: Identify which components are generating errors and understand the conditions that lead to failures.
  • Analyze latency distribution: Examine response time distributions to identify outliers and performance issues affecting specific users or regions.

X-Ray integrates with various AWS services, including Lambda, API Gateway, Elastic Beanstalk, EC2, and ECS, making it easy to instrument applications without significant code changes (Services, 2024).
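Finding a bottleneck in a trace amounts to comparing how long each component spent on the request. The sketch below ranks trace segments by duration. The segment data is hypothetical, and only the `name`, `start_time`, and `end_time` fields (epoch seconds) of a segment document are modeled.

```python
def slowest_segments(segments, top=3):
    """Return (name, duration_seconds) pairs, slowest first."""
    durations = [
        (seg["name"], round(seg["end_time"] - seg["start_time"], 3))
        for seg in segments
    ]
    return sorted(durations, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical segments from one request passing through three components.
trace = [
    {"name": "api-gateway", "start_time": 1700000000.0, "end_time": 1700000000.75},
    {"name": "orders-service", "start_time": 1700000000.125, "end_time": 1700000000.625},
    {"name": "dynamodb", "start_time": 1700000000.25, "end_time": 1700000000.5},
]
for name, seconds in slowest_segments(trace):
    print(f"{name}: {seconds:.3f} s")
```

Because the segments are nested, the outermost one is always the longest; comparing each segment with the calls it makes shows where the time actually goes.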

Module 7 - Amazon EventBridge

Amazon EventBridge is a serverless event bus service that makes it easy to connect applications using events. EventBridge receives events (indicators of a change in environment) from your applications, AWS services, and third-party SaaS applications, and routes them to target services based on rules you define (Services, 2024).

Key features of EventBridge include:

  • Event-driven architecture: Build loosely coupled, distributed applications that respond to events in real-time.
  • Event filtering and routing: Define rules to route events to specific targets based on event content.
  • Integration with AWS services: Connect to over 90 AWS services as event sources or targets.
  • Schema registry: Discover, create, and manage event schemas for your applications.
  • Archive and replay: Store events for future processing or replay events to recover from failures.

EventBridge supports three types of event buses:

  • Default event bus: Receives events from AWS services.
  • Custom event buses: Receive events from your own applications and services.
  • Partner event buses: Receive events from SaaS partners integrated with EventBridge.
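Rule-based routing can be sketched as a small matcher. This is a simplified model of EventBridge event patterns, assuming exact-value matching only: each pattern field lists the accepted values, and nested objects are matched recursively. The real service supports further operators (prefix, numeric, and so on) that are omitted here, and the rule targets and instance ID are hypothetical names.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


def route(event, rules):
    """Return the targets of all rules whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


rules = [
    ({"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}}, "notify-ops"),
    ({"source": ["aws.s3"]}, "audit-queue"),
]
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(route(event, rules))
```

Fields that the pattern does not mention, such as `detail-type` here, are ignored, which is what lets one rule cover a whole family of events.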

Module 7 - AWS Health Dashboard

AWS Health Dashboard provides personalized information about events that might affect your AWS infrastructure and resources. The service offers two views (Services, 2024):

Service Health Dashboard

The Service Health Dashboard displays the general status of AWS services across all regions. This public dashboard shows:

  • Current operational status of all AWS services.
  • Historical information about service availability.
  • RSS feeds for service status updates.
  • Planned maintenance notifications.

The Service Health Dashboard is available to all AWS users without authentication and provides a global view of AWS service health.

Account Health Dashboard

The Account Health Dashboard (formerly AWS Personal Health Dashboard) provides personalized information about AWS events that may affect your specific resources. Unlike the Service Health Dashboard, which shows global service status, the Account Health Dashboard focuses on events relevant to your AWS account (Services, 2024).

The Account Health Dashboard includes:

  • Alerts for resources that may be affected by AWS infrastructure issues.
  • Notifications about scheduled maintenance activities.
  • Proactive notifications about changes that may impact your resources.
  • Guidance for addressing events and mitigating impact.
  • Integration with Amazon EventBridge (formerly CloudWatch Events) for automated responses.

Organizations can use the Account Health Dashboard to stay informed about operational issues, plan for scheduled changes, and automate responses to health events using EventBridge rules.

Best practices for AWS monitoring and analytics

Implementing effective monitoring and analytics practices helps organizations maintain operational excellence and respond quickly to issues. Consider these best practices when designing your monitoring strategy:

  1. Establish baselines: Understand normal operating parameters for your resources before setting alarms and thresholds.
  2. Use multiple metrics: Monitor complementary metrics to get a complete picture of system health (for example, combine CPU, memory, and disk I/O metrics for EC2 instances).
  3. Implement layered monitoring: Monitor at multiple levels, including infrastructure, application, and business metrics.
  4. Automate responses: Use CloudWatch Alarms with Auto Scaling, Lambda functions, or Systems Manager to automatically respond to operational issues.
  5. Centralize logs: Aggregate logs from all sources into CloudWatch Logs for unified analysis and troubleshooting.
  6. Enable CloudTrail: Maintain a complete audit trail of all API activity for security analysis and compliance.
  7. Review Trusted Advisor regularly: Act on Trusted Advisor recommendations to optimize costs, improve performance, and enhance security.
  8. Implement distributed tracing: Use X-Ray for microservices architectures to understand dependencies and identify performance bottlenecks.
  9. Set up notification channels: Configure SNS topics or integrate with incident management tools to ensure alerts reach the right teams.
  10. Document and test: Document your monitoring strategy and regularly test alarm configurations to ensure they trigger correctly.

Conclusion

AWS provides a comprehensive set of monitoring and analytics services that work together to provide visibility into your cloud infrastructure and applications. By combining CloudWatch for metrics and logs, CloudTrail for audit trails, Trusted Advisor for best practice recommendations, X-Ray for distributed tracing, EventBridge for event-driven automation, and the Health Dashboard for service status, organizations can build robust observability solutions that support operational excellence, security, and continuous improvement.

Effective use of these services enables teams to detect issues proactively, respond to incidents quickly, optimize resource utilization, and ensure compliance with organizational policies and regulatory requirements. As your AWS environment grows, investing in comprehensive monitoring and analytics becomes increasingly important for maintaining reliability, performance, and cost efficiency.

References

  1. Services, A. W. (2024). What is monitoring? https://docs.aws.amazon.com/wellarchitected/latest/management-and-governance-guide/monitoring.html
  2. Services, A. W. (2024). AWS Observability Best Practices. https://aws.amazon.com/solutions/implementations/aws-observability/
  3. Services, A. W. (2024). What is Amazon CloudWatch? https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html
  4. Services, A. W. (2024). Amazon CloudWatch Benefits. https://aws.amazon.com/cloudwatch/
  5. Services, A. W. (2024). Using Amazon CloudWatch metrics. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
  6. Services, A. W. (2024). Using Amazon CloudWatch alarms. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
  7. Services, A. W. (2024). Create alarms to stop, terminate, reboot, or recover an EC2 instance. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html
  8. Services, A. W. (2024). Using Amazon CloudWatch dashboards. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html
  9. Services, A. W. (2024). What is Amazon CloudWatch Logs? https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html
  10. Services, A. W. (2024). What is AWS CloudTrail? https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html
  11. Services, A. W. (2024). Security best practices in AWS CloudTrail. https://docs.aws.amazon.com/awscloudtrail/latest/userguide/best-practices-security.html
  12. Services, A. W. (2024). Logging Insights events for trails. https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-insights-events-with-cloudtrail.html
  13. Services, A. W. (2024). AWS Trusted Advisor. https://aws.amazon.com/premiumsupport/technology/trusted-advisor/
  14. Services, A. W. (2024). Cost Optimization with AWS Trusted Advisor. https://docs.aws.amazon.com/awssupport/latest/user/cost-optimization-checks.html
  15. Services, A. W. (2024). Security checks in AWS Trusted Advisor. https://docs.aws.amazon.com/awssupport/latest/user/security-checks.html
  16. Services, A. W. (2024). Service limits checks in AWS Trusted Advisor. https://docs.aws.amazon.com/awssupport/latest/user/service-limits-checks.html
  17. Services, A. W. (2024). What is AWS X-Ray? https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html
  18. Services, A. W. (2024). AWS X-Ray integrations with AWS services. https://docs.aws.amazon.com/xray/latest/devguide/xray-services.html
  19. Services, A. W. (2024). What is Amazon EventBridge? https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html
  20. Services, A. W. (2024). AWS Health Dashboard. https://docs.aws.amazon.com/health/latest/ug/what-is-aws-health.html
  21. Services, A. W. (2024). Getting started with the AWS Health Dashboard. https://docs.aws.amazon.com/health/latest/ug/getting-started-health-dashboard.html
