AWS Cloud Practitioner notes - EC2

Last updated Apr 6, 2024 · Published Jan 4, 2021

The content here is under the Attribution 4.0 International (CC BY 4.0) license


This post, based on the free official course from AWS (AWS Training & Certification, 2020), is the second in a series designed to cover the entire AWS Cloud Practitioner Essentials course content. The focus here is EC2 (Elastic Compute Cloud) and the broader set of compute services that AWS provides. It also covers the pay-as-you-go pricing model that underpins the economic case for moving workloads to the cloud.

Before diving into EC2 itself, it is worth establishing a precise definition of cloud computing: it is the on-demand delivery of IT resources and applications through the internet with pay-as-you-go pricing (Mell & Grance, 2011). That last phrase is critical. When you consume compute resources in a traditional data center, you pay for capacity upfront whether you use it or not. AWS aggregates demand from hundreds of thousands of customers, which allows it to buy hardware at scale, pass those savings on, and charge you only for what you actually consume. The result is that even a single developer can access the same quality of infrastructure that large enterprises use, without bearing the capital cost of owning it.
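The economic difference is easy to see with a little arithmetic. The sketch below compares an amortized upfront server purchase against pay-as-you-go hourly billing; all dollar figures and the utilization level are placeholders chosen for illustration, not real AWS rates.

```python
# Hypothetical figures for illustration only: compare an upfront server
# purchase against pay-as-you-go hourly billing for a bursty workload.

UPFRONT_SERVER_COST = 8000.0   # assumed purchase price of one on-prem server
ON_DEMAND_RATE = 0.10          # assumed $/hour for a comparable cloud instance
HOURS_PER_MONTH = 730

def on_prem_monthly(amortization_months: int = 36) -> float:
    """Amortized monthly cost of owned hardware, paid whether used or not."""
    return UPFRONT_SERVER_COST / amortization_months

def cloud_monthly(utilization: float) -> float:
    """Pay only for the hours actually consumed."""
    return ON_DEMAND_RATE * HOURS_PER_MONTH * utilization

# At 20% utilization the cloud bill tracks usage; the owned server does not.
print(round(on_prem_monthly(), 2))   # fixed cost regardless of usage
print(round(cloud_monthly(0.2), 2))  # 0.10 * 730 * 0.2 = 14.6
```

The exact break-even point depends on real prices and workload shape, but the structural point stands: the owned server's cost is fixed, while the pay-as-you-go bill scales down with idle time.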

Module 2 - Introduction to EC2 (CaaS)

Amazon EC2 (Elastic Compute Cloud) is the foundational compute service of AWS. To understand its value, consider what happens when a company needs a new server in a traditional on-premises environment: the team must submit a purchase order, wait for hardware to arrive, rack and cable it in a data center, configure the operating system, patch it, and only then can application code be deployed. That process can take weeks and carries significant upfront cost. EC2 collapses that timeline to minutes and converts a capital expenditure into an operational one.

When you launch an EC2 instance, AWS provisions a virtual server on physical hardware in one of its data centers. The physical host can run many instances simultaneously, a property known as multitenancy (Gartner Glossary, 2021). AWS uses a hypervisor to enforce strict isolation between tenants, meaning that one customer’s workload cannot read or affect the memory of another customer’s instance even if they share the same underlying host. From your perspective, the instance behaves exactly like a dedicated machine.

EC2 supports two major operating system families: Linux (in a wide variety of distributions, from Amazon Linux to Ubuntu, Red Hat, and SUSE) and Windows Server. Once an instance is running, you control not only the operating system but also the networking configuration, including which ports are open, which subnets the instance joins, and which security groups govern inbound and outbound traffic.

The shared responsibility model is fundamental to understanding what EC2 gives you and what it requires of you. AWS is responsible for the physical security of its data centers, the health of the underlying host hardware, the hypervisor layer, and the global network fabric. You, as the customer, are responsible for everything above the hypervisor: patching the operating system, configuring the firewall rules correctly, managing user access, setting up Auto Scaling to handle demand, and designing the architecture for high availability. EC2 is therefore described as Infrastructure as a Service (IaaS), sometimes also framed as Compute as a Service (CaaS), precisely because it hands you a compute primitive and leaves the higher-level decisions to you.

In practical terms, this means you must:

  • Set up and manage your instances, including installing software and configuring services
  • Apply operating system patches and security updates on a regular schedule
  • Define scaling policies so that capacity adjusts to workload changes
  • Design your deployment across multiple Availability Zones to achieve high availability

The benefit of that responsibility is full control. You can install any software, tune kernel parameters, attach custom storage volumes, and connect the instance to any networking topology you need. EC2 is flexible because it does not prescribe how your software should be structured. It is reliable because AWS underpins it with redundant power and networking in every data center. And it is scalable because you can change the size of an instance or add more instances in response to demand, as the scaling section below explains in detail.

Module 2 - EC2 instance types

EC2 instance types determine the combination of CPU, memory, storage, and network bandwidth available to your workload. AWS groups instance types into families, where each family is optimized for a different category of computing need. Choosing the right instance type is one of the most consequential decisions you make when deploying on EC2, because an undersized instance causes performance problems while an oversized one wastes money.

The term resources in the context of instance families refers to the aggregate of CPU cores, RAM, local storage throughput, and network bandwidth. Each family balances these resources differently:

  • General purpose instances (for example, the M and T families) provide a balanced ratio of CPU to memory and are suited for workloads where no single resource is the bottleneck. Web servers, application servers, and code repositories are typical candidates. The T family is burstable, meaning it accumulates CPU credits during idle periods and spends them during spikes, making it cost-effective for workloads that are mostly idle but occasionally active.

  • Compute-optimized instances (the C family) favor CPU over memory, making them appropriate for workloads that are CPU-bound rather than memory-bound. Game servers, high-performance computing (HPC) jobs, scientific modeling, and batch processing pipelines that perform intensive numerical transformations all benefit from the higher CPU-to-memory ratio of compute-optimized instances.

  • Memory-optimized instances (the R, X, and Z families) prioritize RAM. They are the right choice when your workload holds large datasets in memory for fast access: in-memory databases such as Redis or Memcached at very high throughput, real-time analytics engines that process streaming data, and SAP HANA or other in-memory ERP systems all fall into this category.

  • Accelerated computing instances (the P, G, Inf, and Trn families) attach hardware accelerators, such as GPUs or AWS-designed chips like Inferentia and Trainium, to the instance. Hardware accelerators are purpose-built silicon that excels at floating-point arithmetic, graphics rendering, and data pattern matching at a rate that general-purpose CPUs cannot match. Machine learning training, deep learning inference, and video transcoding are the primary workloads that justify the premium cost of these instance types.

  • Storage-optimized instances (the I, D, and H families) are designed for workloads that require very high read and write throughput to locally attached NVMe SSDs. Sequential and random I/O-intensive applications, such as NoSQL databases, data warehousing, and Hadoop distributed file systems, benefit from the low-latency local storage that these instances provide. The key difference from other families is that the bottleneck being relieved is storage I/O rather than CPU or memory.

Selecting the wrong family is a common source of both performance problems and unnecessary cost. A memory-optimized instance running a CPU-bound workload pays for RAM that goes unused, while a compute-optimized instance running an in-memory database will likely swap to disk under load. AWS recommends benchmarking your workload against multiple instance types before committing to a production size.
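The family descriptions above boil down to a simple question: which resource limits the workload? The sketch below encodes that decision as a lookup; the family letters follow the text above, while the key names and function are assumptions made for illustration.

```python
# Illustrative mapping from a workload's dominant bottleneck to an EC2
# instance family category; key names here are assumed, not AWS terminology.

FAMILY_BY_BOTTLENECK = {
    "balanced":    "general purpose (M, T)",
    "cpu":         "compute optimized (C)",
    "memory":      "memory optimized (R, X, Z)",
    "accelerator": "accelerated computing (P, G, Inf, Trn)",
    "storage_io":  "storage optimized (I, D, H)",
}

def suggest_family(bottleneck: str) -> str:
    """Return the instance family category for the limiting resource."""
    try:
        return FAMILY_BY_BOTTLENECK[bottleneck]
    except KeyError:
        raise ValueError(f"unknown bottleneck: {bottleneck!r}")

print(suggest_family("memory"))  # an in-memory database is memory-bound
```

In practice the bottleneck is discovered by benchmarking, as the paragraph above recommends, not assumed upfront.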

Module 2 - EC2 pricing

EC2 pricing is not a single flat rate. AWS offers several purchasing options, each carrying a different price point and a different contractual commitment. Understanding the trade-offs between them is a core objective of the Cloud Practitioner exam and a practical skill for anyone managing AWS costs (AWS, 2021).

On-demand instances carry no upfront commitment. You pay for compute capacity by the second (for Linux instances) or by the hour (for Windows and some other operating systems), and you stop paying the moment you terminate the instance. On-demand is the right choice for workloads with unpredictable traffic patterns, for short-term experiments, and for any situation where you cannot interrupt the workload mid-run. The trade-off is that on-demand carries the highest per-unit price of all purchasing options.

Spot instances allow you to request spare EC2 capacity in AWS data centers at a market-driven price. AWS determines the Spot price based on long-term supply and demand for each instance type in each Availability Zone. AWS can reclaim a spot instance with a two-minute warning when it needs the capacity back, so spot instances are not appropriate for workloads that cannot tolerate interruption. In exchange for accepting that interruption risk, AWS offers discounts of up to 90% compared to on-demand prices. Batch processing jobs, big data analytics, CI/CD pipeline workers, and rendering tasks are well-suited to spot instances because they can checkpoint their state and resume after an interruption.

Reserved instances are a billing commitment: you agree to use a specific instance type in a specific region for a one-year or three-year term, and in return AWS provides a discount of up to 75% compared to on-demand pricing. Standard Reserved Instances lock you into a specific instance family and operating system, while Convertible Reserved Instances allow you to exchange the instance type during the term at a slightly lower discount. Reserved instances make sense for workloads with predictable, steady-state resource consumption, such as a production database or a core application tier that runs continuously.

Savings plans are a more flexible commitment model introduced to address one limitation of Reserved Instances: they apply to any instance family, any size, any operating system, and any region. You commit to a minimum hourly spend (measured in dollars per hour) for one or three years, and any usage up to that commitment is billed at the discounted rate. Savings plans can reduce costs by up to 72% compared to on-demand and are often easier to manage at scale than a portfolio of Reserved Instances.

Dedicated hosts remove multitenancy entirely: you receive a physical server dedicated solely to your use, and no other customer’s instances run on that host. This is necessary for compliance requirements that mandate physical isolation (certain government and financial regulations specify this), as well as for software licenses that are tied to the number of physical CPU sockets rather than virtual cores. Dedicated hosts are the most expensive EC2 purchasing option.

Beyond the purchasing option, the total cost of an EC2 deployment depends on several additional factors: the instance type (a c6i.32xlarge costs far more per hour than a t3.micro), the AWS region selected (operating costs vary by geography and some regions carry a premium), the number of instances running simultaneously, the configuration of any attached load balancers, and the number of Elastic IP addresses allocated to the account. AWS provides a pricing calculator that allows you to model all of these variables and compare the total monthly cost across purchasing options before committing any resources.
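The purchasing options can be compared with a rough monthly model. The on-demand rate below is a placeholder; the discount percentages come from the text above (up to 90% spot, 75% reserved, 72% savings plans) and represent best-case figures, not quotes.

```python
# Rough monthly cost model for one always-on instance under different
# purchasing options. ON_DEMAND_HOURLY is an assumed placeholder rate.

ON_DEMAND_HOURLY = 0.0832   # assumed $/hour for a mid-size instance
HOURS_PER_MONTH = 730

DISCOUNTS = {                # best-case discounts from the sections above
    "on_demand":    0.00,
    "savings_plan": 0.72,
    "reserved":     0.75,
    "spot":         0.90,
}

def monthly_cost(option: str) -> float:
    """Monthly cost of one instance running 24/7 under the given option."""
    return ON_DEMAND_HOURLY * (1 - DISCOUNTS[option]) * HOURS_PER_MONTH

for option in DISCOUNTS:
    print(f"{option:>13}: ${monthly_cost(option):.2f}/month")
```

A model like this only captures the commitment discount; it ignores interruption risk (spot) and the contractual lock-in (reserved, savings plans), which is why the AWS pricing calculator mentioned above is the right tool for a real decision.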

Module 2 - EC2 scaling

One of the core promises of cloud computing is that capacity should match demand, not the other way around. In a traditional data center, you provision for peak load because you cannot add hardware quickly enough to respond to traffic spikes. The result is that most of the time you are running hardware at a fraction of its capacity. EC2 breaks that constraint through two distinct scaling dimensions.

Vertical scaling (sometimes called scaling up or down) means changing the size of an existing instance. Because EC2 instances are virtual machines rather than physical hardware, you can stop an instance, choose a larger or smaller instance type, and start it again. This gives you access to more or less CPU, memory, and storage without replacing any hardware. Vertical scaling is straightforward but has a ceiling: the largest EC2 instance types have a finite amount of CPU and memory, and there is always a brief period of downtime while the instance is stopped and restarted with the new type.

Horizontal scaling (scaling out or in) means adding or removing instances from a pool. Instead of making one instance larger, you run more instances in parallel and distribute work across all of them. Horizontal scaling has no practical ceiling, since AWS can provision additional instances in minutes, and it does not require downtime. The trade-off is that your application must be designed to run across multiple instances simultaneously, which is a non-trivial architectural requirement for stateful applications.

AWS Auto Scaling operationalizes horizontal scaling through two policies. Dynamic scaling reacts to real-time demand signals: when a CloudWatch metric such as average CPU utilization crosses a defined threshold, Auto Scaling adds instances to the group; when demand falls, it removes them. Predictive scaling uses machine learning to forecast future demand based on historical traffic patterns and provisions capacity in advance of a predicted spike, avoiding the latency that reactive scaling incurs when traffic climbs faster than new instances can be launched. For most production workloads, combining both policies provides the best balance of responsiveness and cost efficiency.
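The dynamic scaling policy described above can be sketched as a simple threshold rule. Real Auto Scaling evaluates CloudWatch alarms and supports several policy types; the thresholds and group limits below are assumptions chosen for illustration.

```python
# Toy model of a dynamic scaling policy: adjust group size when average
# CPU crosses assumed thresholds, bounded by assumed min/max group sizes.

SCALE_OUT_ABOVE = 70.0   # assumed % CPU that triggers adding an instance
SCALE_IN_BELOW = 30.0    # assumed % CPU that triggers removing one
MIN_SIZE, MAX_SIZE = 2, 10

def desired_capacity(current: int, avg_cpu: float) -> int:
    """Return the new group size after one evaluation period."""
    if avg_cpu > SCALE_OUT_ABOVE and current < MAX_SIZE:
        return current + 1
    if avg_cpu < SCALE_IN_BELOW and current > MIN_SIZE:
        return current - 1
    return current

print(desired_capacity(3, 85.0))  # high load: scale out to 4
print(desired_capacity(3, 12.0))  # idle: scale in to 2
print(desired_capacity(3, 50.0))  # within band: unchanged, 3
```

Predictive scaling differs precisely in that it does not wait for the metric to cross the threshold: it provisions ahead of a forecast spike, so the reactive rule above never has to play catch-up.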

Module 2 - Elastic load balancing (ELB)

Horizontal scaling raises an immediate question: once you have multiple instances handling requests, how do clients know which instance to contact? The answer is a load balancer, a component that sits in front of the instance pool and distributes incoming requests across all healthy instances.

AWS Elastic Load Balancing (ELB) is a managed load balancing service that operates at the regional level. Because it is managed, AWS handles the provisioning, maintenance, and scaling of the load balancer infrastructure itself. ELB integrates natively with Auto Scaling: as the Auto Scaling group adds or removes instances, ELB automatically routes traffic only to instances that have passed health checks, ensuring that clients never reach an instance that is starting up, shutting down, or unhealthy.

AWS offers three types of load balancers, each suited to a different protocol layer (AWS, 2021):

  • Application Load Balancer (ALB) operates at Layer 7 of the OSI model and understands HTTP and HTTPS. Because it inspects the content of each request, it can route traffic based on URL path, host header, query parameters, or HTTP method. This makes it the right choice for modern web applications and microservices architectures where different URL prefixes map to different backend services.

  • Network Load Balancer (NLB) operates at Layer 4 and forwards TCP, UDP, and TLS traffic without inspecting the payload. It is optimized for very high throughput and very low latency (sub-millisecond), making it appropriate for real-time streaming, gaming, and IoT workloads that require predictable performance under millions of concurrent connections.

  • Classic Load Balancer predates both ALB and NLB and provides basic request distribution across EC2 instances at both Layer 4 and Layer 7. It was designed for applications built in the original EC2-Classic network environment, which AWS has since retired. AWS recommends migrating Classic Load Balancers to ALB or NLB for all new workloads.

ELB is cost-efficient because you pay only for the capacity consumed, and it is automatically scalable because AWS provisions additional load balancer nodes as traffic grows, without requiring any manual intervention.
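Conceptually, a load balancer does two things: it tracks which targets are healthy, and it spreads requests across only those targets. The sketch below illustrates that idea with round-robin distribution; the class name, instance IDs, and round-robin choice are assumptions for illustration (ELB itself supports several routing algorithms and is fully managed).

```python
# Minimal conceptual sketch of load balancing: health-check a target pool
# and distribute requests round-robin across only the healthy targets.

from itertools import cycle

class TinyLoadBalancer:
    def __init__(self, targets):
        self.health = {t: True for t in targets}

    def mark(self, target, healthy):
        """Record the result of a health check on one target."""
        self.health[target] = healthy

    def healthy_targets(self):
        return [t for t, ok in self.health.items() if ok]

    def route(self, n_requests):
        """Spread n requests round-robin across healthy targets."""
        pool = cycle(self.healthy_targets())
        return [next(pool) for _ in range(n_requests)]

lb = TinyLoadBalancer(["i-aaa", "i-bbb", "i-ccc"])
lb.mark("i-bbb", False)   # failed health check: removed from rotation
print(lb.route(4))        # only i-aaa and i-ccc receive traffic
```

This is also what makes the ELB and Auto Scaling integration work: when the group adds an instance, it joins the pool only after passing health checks, and when one terminates, it is drained from the rotation first.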

Module 2 - Messaging and queueing

As applications grow more complex, they are often decomposed into multiple services that need to communicate with each other. The way those services communicate has a direct impact on how the overall system behaves when any individual component fails or slows down.

Tightly coupled architectures connect services directly: Service A sends a synchronous request to Service B and waits for a response before continuing. When everything is healthy, this works well. When Service B becomes slow or unavailable, however, Service A is blocked and the failure propagates through the entire dependency chain.

                    talks to
Application A --------------------> Application B

Loosely coupled architectures introduce an intermediary between services so that the sender and receiver do not need to be available simultaneously. The sender places a message in a queue or onto a topic; the receiver processes it when it is ready. A failure in the receiver does not block the sender, because the message simply waits in the intermediary until the receiver recovers.

              sends to                  process
                      _________________
Application A ------> | message queue | <-------- Application B
                      |_______________|

AWS provides two complementary services for building loosely coupled architectures.

AWS SQS (Simple Queue Service) is a fully managed message queue. Producers send messages to a queue; consumers poll the queue and process messages at their own pace. The payload of a message (the data it carries) is protected during transit and storage. SQS supports any message volume without message loss, and it retains messages until a consumer successfully processes and deletes them, or until a configurable retention period expires. SQS is the right choice when you need point-to-point communication where each message should be processed by exactly one consumer.

AWS SNS (Simple Notification Service) implements the publish/subscribe (pub/sub) pattern. A producer publishes a message to an SNS topic, and SNS fans the message out to all subscribers simultaneously. Subscribers can be SQS queues, HTTP/HTTPS endpoints, email addresses, SMS numbers, AWS Lambda functions, or mobile push notification endpoints. SNS is the right choice when a single event needs to trigger multiple downstream actions at the same time, for example, when a new user registers and you need to simultaneously send a welcome email, update a CRM system, and kick off an onboarding workflow.

The two services complement each other: a common pattern is to use SNS to fan out an event to multiple SQS queues, where each queue feeds a dedicated consumer service. This combination gives you both fan-out delivery and durable, backpressure-safe consumption.
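The fan-out pattern can be simulated in a few lines: one published event is copied into every subscribed queue, and each queue then feeds exactly one consumer. This is an in-memory illustration of the pattern, not SNS/SQS API code; the class and message names are assumptions.

```python
# In-memory simulation of the SNS-to-SQS fan-out pattern: one publish,
# one copy per subscribed queue, point-to-point consumption per queue.

from collections import deque

class Topic:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        for queue in self.subscribers:
            queue.append(message)   # fan-out: every queue gets its own copy

email_queue, crm_queue = deque(), deque()
user_signups = Topic()
user_signups.subscribe(email_queue)
user_signups.subscribe(crm_queue)

user_signups.publish("user_registered:42")

# Each consumer drains its own queue independently, at its own pace.
print(email_queue.popleft())   # user_registered:42
print(crm_queue.popleft())     # user_registered:42
```

Note how the sender never waits on either consumer: if the CRM service is down, its copy of the message simply sits in its queue until the service recovers, which is exactly the loose coupling described above.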

Module 2 - Additional compute services

EC2 gives you full control over virtual machines, but not every workload needs that level of control. AWS provides several higher-level compute abstractions that remove operational overhead in exchange for reduced configurability.

AWS Lambda

AWS Lambda is a serverless compute service: you upload your function code, configure a trigger, and AWS runs the code in a fully managed execution environment. You do not provision instances, patch operating systems, or configure Auto Scaling. AWS allocates the necessary compute resources for each invocation, scales to match the rate of incoming events, and deallocates resources when execution completes.

Lambda is designed for short-lived executions. The maximum duration of a single Lambda invocation is 15 minutes, which makes it well-suited for event-driven tasks such as processing records from an SQS queue, responding to API Gateway requests, transforming objects uploaded to S3, or reacting to database change streams. Because you pay only for the compute time consumed during execution (measured in milliseconds), Lambda can be extremely cost-effective for workloads with infrequent or unpredictable invocation patterns.

The trade-off is that Lambda’s execution environment is ephemeral and stateless. Any state that must persist between invocations must be stored externally in a service such as S3, DynamoDB, or ElastiCache. Lambda also imposes limits on memory (up to 10 GB per function), storage (/tmp is limited to 10 GB), and execution duration (15 minutes maximum), which means it is not appropriate for long-running processes or workloads that require persistent local state.
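A Lambda function in Python is just a handler that receives an event and a context object. The sketch below shows that shape; the event structure is a simplified stand-in for a batch-style trigger, with field names assumed for illustration rather than copied from any real event schema.

```python
# Shape of a minimal Lambda-style handler. The event layout below is a
# simplified, assumed structure, not the real SQS/API Gateway schema.

import json

def handler(event, context):
    """Process each record in the batch and return a summary."""
    processed = [json.loads(r["body"])["id"] for r in event["records"]]
    return {"statusCode": 200, "processed": processed}

# Local invocation with a fake event (context is unused here, so pass None).
fake_event = {"records": [{"body": json.dumps({"id": 1})},
                          {"body": json.dumps({"id": 2})}]}
print(handler(fake_event, None))  # {'statusCode': 200, 'processed': [1, 2]}
```

Notice that nothing survives between invocations: any state the handler needs next time must be written to an external store such as S3 or DynamoDB before it returns, per the statelessness constraint above.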

AWS ECS (Elastic Container Service) and EKS (Elastic Kubernetes Service)

Containers package an application and its dependencies into a portable, reproducible unit that runs consistently across different environments. Orchestrating containers at scale, meaning deciding which host to run each container on, restarting failed containers, distributing load, and rolling out updates, requires a container orchestration platform.

AWS ECS (Elastic Container Service) is AWS’s own container orchestration service. It manages a cluster of compute resources and schedules containers onto them based on CPU and memory requirements. ECS integrates deeply with other AWS services including IAM for fine-grained permissions, CloudWatch for metrics and logs, and ELB for load distribution across container tasks. ECS can run on two different compute planes:

  1. When running ECS on EC2, you manage the underlying EC2 instances that form the cluster. This gives you visibility into and control over the host operating system, the Docker daemon configuration, and the instance networking, but it also means you bear responsibility for patching the hosts, right-sizing the cluster, and handling host failures.

  2. When running ECS on AWS Fargate, AWS manages the compute infrastructure entirely. You specify the CPU and memory your container task requires, and AWS provisions, patches, and scales the underlying compute without exposing any host to you. Fargate removes the operational burden of managing EC2 instances from the container deployment workflow, at the cost of reduced access to host-level settings.

AWS EKS (Elastic Kubernetes Service) provides a managed Kubernetes control plane. If your organization has existing Kubernetes expertise or needs compatibility with the broader Kubernetes ecosystem (Helm charts, Kubernetes-native operators, service meshes), EKS allows you to run standard Kubernetes workloads on AWS without managing the control plane yourself. Like ECS, EKS can use either EC2 worker nodes (which you manage) or Fargate (which AWS manages).

The choice between ECS and EKS generally comes down to operational familiarity and ecosystem requirements: ECS is simpler and tightly integrated with AWS, while EKS provides portability and access to the Kubernetes ecosystem. Both support Fargate for serverless container execution.
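To make the ECS concepts concrete, the sketch below shows the kind of parameters a task definition carries: which image to run, the compute plane, and the CPU/memory the scheduler uses for placement. All values are placeholders; in practice a dict like this would be passed to boto3's `ecs.register_task_definition` call.

```python
# Sketch of an ECS task definition for the Fargate compute plane.
# Every value here is a placeholder chosen for illustration.

task_definition = {
    "family": "web-app",                     # hypothetical task family name
    "requiresCompatibilities": ["FARGATE"],  # run on the Fargate compute plane
    "cpu": "256",                            # 0.25 vCPU, expressed in CPU units
    "memory": "512",                         # MiB reserved for the task
    "networkMode": "awsvpc",                 # each task gets its own ENI
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",         # placeholder container image
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "essential": True,
        }
    ],
}

print(task_definition["containerDefinitions"][0]["image"])
```

The Fargate trade-off from the list above is visible here: CPU and memory are declared, but there is no instance type, AMI, or host configuration anywhere in the definition, because no host is exposed to you.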

For a deeper dive into ECS, see the dedicated page on that service.

Up next

Infrastructure and reliability

References

  1. AWS Training & Certification. (2020). AWS Cloud Practitioner Essentials. https://www.aws.training/Details/eLearning?id=60697
  2. Mell, P., & Grance, T. (2011). The NIST Definition of Cloud Computing (No. Special Publication 800-145; Issue Special Publication 800-145). National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
  3. Gartner Glossary. (2021). Multitenancy. https://www.gartner.com/en/information-technology/glossary/multitenancy
  4. AWS. (2021). Overview of Amazon Web Services. https://d1.awsstatic.com/whitepapers/aws-overview.pdf

Changelog

  • Apr 06, 2024 - Added pricing calculator reference under section EC2 pricing
