AWS EC2 Interview Questions & Answers (2026)

50+ AWS EC2 interview questions covering instance types, Auto Scaling, AMIs, pricing models, networking, storage, and troubleshooting — Basic to Advanced.

January 20, 2025 27 min read 50 Questions TechwithDB

Amazon EC2 (Elastic Compute Cloud) provides resizable virtual servers in the AWS cloud.

Problems it solves:

  • Eliminates upfront hardware investment (CapEx → OpEx)
  • Scales compute capacity in minutes, not months
  • Pay-as-you-go — no cost for idle capacity
  • Global availability across 30+ AWS Regions
  • Managed hypervisor, physical security, and networking

Key capabilities: Choose OS, instance type, storage, networking, and have root access.

Pricing Model | Savings | Commitment | Best For
On-Demand | 0% (baseline) | None | Unpredictable loads, dev/test
Reserved (1yr) | ~40% | 1 year | Steady-state production workloads
Reserved (3yr) | ~60% | 3 years | Long-term stable workloads
Spot | 70–90% | None (interruptible) | Batch jobs, stateless apps
Savings Plans | Up to 66% (Compute Savings Plans) | 1 or 3 year $/hr commit | Flexible: covers EC2, Fargate, and Lambda

Strategy: Run baseline on Savings Plans, burst with Spot, ad-hoc with On-Demand.
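
As a rough illustration of the strategy above, the sketch below estimates a blended hourly cost for a fleet split across the three purchase options. The on-demand rate, the fleet split, and the discount percentages are illustrative assumptions, not quoted AWS prices, and `blended_cost` is a hypothetical helper name.

```shell
#!/bin/sh
# Blended hourly cost for a mixed fleet. All numbers are assumptions for illustration.
# Prices are expressed in tenths of a cent so that integer arithmetic is enough.

blended_cost() {
  local od_price="$1"       # on-demand price per instance-hour, in tenths of a cent
  local n_sp="$2"           # instances covered by Savings Plans
  local n_spot="$3"         # instances running on Spot
  local n_od="$4"           # instances running On-Demand
  local sp_discount="$5"    # assumed Savings Plans discount, percent
  local spot_discount="$6"  # assumed Spot discount, percent

  local sp_cost=$(( n_sp * od_price * (100 - sp_discount) / 100 ))
  local spot_cost=$(( n_spot * od_price * (100 - spot_discount) / 100 ))
  local od_cost=$(( n_od * od_price ))
  echo $(( sp_cost + spot_cost + od_cost ))
}

# Hypothetical fleet: 10 instances at $0.100/hr on-demand (100 tenths of a cent),
# 7 on Savings Plans (50% off), 2 on Spot (70% off), 1 on On-Demand.
blended_cost 100 7 2 1 50 70   # prints 510, i.e. $0.51/hr versus $1.00/hr all On-Demand
```

The point of the exercise is only the shape of the calculation: the committed baseline dominates the bill, so the discount on that slice matters most.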

An Amazon Machine Image (AMI) is the template used to launch EC2 instances.

Contents:

  • Root volume snapshot (OS + pre-installed software)
  • Launch permissions (public, private, or shared with specific accounts)
  • Block device mapping (which EBS volumes to attach + sizes)

AMI types:

  • EBS-backed (default) — root on EBS, persists when stopped
  • Instance store-backed: ephemeral root; the instance cannot be stopped, and data is lost on termination or hardware failure

Create custom AMI:

aws ec2 create-image \
  --instance-id i-0abc123 \
  --name "WebApp-$(date +%Y%m%d)" \
  --no-reboot

Action | EBS Root | Instance Store | Public IP | Billing
Stop | Preserved | Lost | Released | No compute charge (EBS billed)
Terminate | Deleted (unless DeleteOnTermination=false) | Lost | Released | All charges stop
Hibernate | Preserved + RAM snapshot | N/A | Released | No compute charge

Key points:

  • Stopped instances can be restarted — they may land on a different physical host
  • Use --disable-api-termination to protect critical instances
  • Hibernate preserves in-memory state (RAM to EBS) for fast resume

A Security Group is a virtual stateful firewall that controls inbound and outbound traffic for EC2 instances.

Key characteristics:

  • Stateful — if you allow inbound traffic, the response is automatically allowed
  • Rules are allow-only — you cannot create deny rules
  • Applied at the instance/ENI level
  • Multiple security groups can be attached to one instance

Example — allow SSH and HTTP:

aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp --port 22 --cidr 203.0.113.0/32

aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

An Elastic IP (EIP) is a static, public IPv4 address you can associate with any EC2 instance or NAT Gateway.

When to use:

  • You need a fixed public IP that survives instance stop/start
  • Failover — quickly remap to a standby instance
  • Whitelisting by partners or firewalls that require a known IP

Important billing note:

AWS charges about $0.005/hr for every public IPv4 address, including Elastic IPs, whether or not the address is associated with a running instance (pricing in effect since February 2024).

# Allocate
aws ec2 allocate-address --domain vpc

# Associate
aws ec2 associate-address \
  --instance-id i-0abc123 \
  --allocation-id eipalloc-0abc123

Best practice: Use EIP only when required. For most web apps, use a Load Balancer DNS name instead.

EC2 instances are grouped into families optimised for specific workloads:

Family | Optimised For | Examples
General Purpose | Balanced CPU/memory/network | t3, t4g, m6i, m7g
Compute Optimised | High-performance processors | c6i, c7g, c7gn
Memory Optimised | Large in-memory datasets | r6i, r7g, x2idn
Storage Optimised | High sequential I/O, NVMe | i3, i4i, d3
Accelerated Computing | GPU/FPGA/ML inference | p4, g5, trn1, inf2
HPC Optimised | Tightly coupled HPC | hpc6a, hpc7g

Naming convention: m7g.2xlarge

  • m = family (general purpose)
  • 7 = generation
  • g = processor (Graviton)
  • 2xlarge = size
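
The naming convention above can be sketched as a small parser. This is a minimal illustration, assuming the common `family + generation + attributes . size` shape; `parse_instance_type` is a hypothetical helper, and names that do not fit the pattern (for example the `u-` high-memory types) are rejected rather than guessed at.

```shell
#!/bin/sh
# Split an instance type name into family, generation, attribute letters, and size.
parse_instance_type() {
  local parsed
  parsed=$(printf '%s\n' "$1" | sed -nE \
    's/^([a-z]+)([0-9]+)([a-z-]*)\.([a-z0-9-]+)$/family=\1 generation=\2 attributes=\3 size=\4/p')
  if [ -n "$parsed" ]; then
    echo "$parsed"
  else
    echo "unrecognised instance type: $1" >&2
    return 1
  fi
}

parse_instance_type m7g.2xlarge     # family=m generation=7 attributes=g size=2xlarge
parse_instance_type hpc6a.48xlarge  # family=hpc generation=6 attributes=a size=48xlarge
```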

User Data is a script that runs once automatically at instance first launch via cloud-init.

Use cases:

  • Install packages and dependencies
  • Pull application code from S3 or Git
  • Configure system settings
  • Register with a load balancer or service mesh

Example — Bootstrap a web server:

#!/bin/bash
yum update -y
yum install -y httpd
echo "<h1>Hello from $(hostname -f)</h1>" > /var/www/html/index.html
systemctl enable --now httpd

View logs:

cat /var/log/cloud-init-output.log

Note: User Data runs as root. It is limited to 16 KB. For larger scripts, store in S3 and download via User Data.
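
A quick pre-flight check for the size limit in the note above might look like the following sketch. It assumes the limit applies to the raw script (before base64 encoding); `check_user_data_size` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Check that a user data script fits within the 16 KB limit.
USER_DATA_LIMIT=16384   # 16 KB in bytes

check_user_data_size() {
  local size
  # tr strips the padding that some wc implementations add
  size=$(wc -c < "$1" | tr -d ' ')
  [ -n "$size" ] || return 2   # file missing or unreadable
  if [ "$size" -le "$USER_DATA_LIMIT" ]; then
    echo "ok: $size bytes"
  else
    echo "too large: $size bytes (limit $USER_DATA_LIMIT)" >&2
    return 1
  fi
}

# Hypothetical usage, where bootstrap.sh is a placeholder file name:
#   check_user_data_size bootstrap.sh || echo "store the script in S3 instead"
```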

Instance Metadata is data about your running instance accessible from within the instance at a link-local IP address.

Available data includes: instance ID, AMI ID, IAM role credentials, hostname, public IP, AZ, and more.

Access via IMDSv2 (recommended):

# Step 1: Get session token
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Step 2: Query metadata
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/

IMDSv2 vs IMDSv1:

Aspect | IMDSv1 | IMDSv2
Auth | None | Session token required
SSRF protection | No | Yes
AWS recommendation | Disable | Use always

A Key Pair consists of a public key (held by AWS) and a private key (held by you), used for passwordless SSH authentication.

How it works:

  1. Create a key pair — AWS stores the public key
  2. Specify it when launching an instance
  3. AWS places the public key in ~/.ssh/authorized_keys on the instance
  4. You authenticate using the .pem private key

SSH access:

chmod 400 my-key.pem
ssh -i my-key.pem ec2-user@<public-ip>   # Amazon Linux
ssh -i my-key.pem ubuntu@<public-ip>     # Ubuntu
ssh -i my-key.pem centos@<public-ip>     # CentOS

If you lose your private key, AWS cannot recover it. Use EC2 Instance Connect or SSM Session Manager as alternatives.

IP Type | Assigned By | Persists After Stop? | Publicly Routable?
Public IP | AWS (auto) | No, changes on stop/start | Yes
Private IP | VPC CIDR | Yes | No (RFC 1918)
Elastic IP | AWS (manual) | Yes | Yes

Private IP: always present, used for internal VPC communication. Public IP: auto-assigned in public subnets, ephemeral. Elastic IP: static public IP, manually attached. Like every public IPv4 address, it is billed hourly whether or not it is associated.

EC2 Hibernate suspends an instance and saves its RAM contents to the encrypted EBS root volume.

Stop vs Hibernate:

Aspect | Stop | Hibernate
RAM preserved | No | Yes
Resume time | Full boot | Seconds
EBS required | Yes | Yes (encrypted)

Use cases:

  • Long-running in-memory computations you want to pause
  • Applications with slow startup or warm-up time
  • Dev environments — resume exactly where you left off

Requirements:

  • Root EBS volume must be encrypted
  • RAM size must be ≤ 150 GB
  • Supported on many instance families, including C5, M5, R5, T2, T3 and several newer generations; check the current EC2 documentation for the full list
  • Not supported on bare metal instances

Auto Scaling Group (ASG) automatically adjusts instance count based on demand.

Key components:

  • Launch Template — defines what to launch (AMI, type, SG, user data)
  • ASG — min/max/desired capacity, health check config
  • Scaling Policy — when and how much to scale

Policy Types:

1. Target Tracking (recommended for most workloads)

Goal: Keep average CPU utilization at 60%
AWS adds/removes instances automatically to maintain the target.

2. Step Scaling (fine-grained control)

CPU 70–85%  → add 2 instances
CPU 85–100% → add 4 instances
CPU < 30% (10 min) → remove 1 instance
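
The step thresholds above amount to a simple lookup from a metric value to a capacity adjustment. The sketch below models that mapping only; `step_adjustment` is a hypothetical helper, and the sustained 10-minute condition on the scale-in step is not modelled.

```shell
#!/bin/sh
# Map a CPU utilisation percentage (integer) to an instance-count adjustment,
# using the example thresholds from the step scaling policy above.
step_adjustment() {
  local cpu="$1"
  if [ "$cpu" -ge 85 ]; then
    printf '%s\n' 4
  elif [ "$cpu" -ge 70 ]; then
    printf '%s\n' 2
  elif [ "$cpu" -lt 30 ]; then
    printf '%s\n' -1
  else
    printf '%s\n' 0    # between 30% and 70%: no change
  fi
}

for cpu in 20 50 75 90; do
  echo "cpu=${cpu}% adjustment=$(step_adjustment "$cpu")"
done
```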

3. Scheduled Scaling

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name prod-asg \
  --scheduled-action-name morning-scale-up \
  --recurrence "0 8 * * MON-FRI" \
  --desired-capacity 10

4. Predictive Scaling: ML-based, forecasts the next 48 hours and scales proactively.

Feature | Security Group | Network ACL
Applied at | Instance/ENI level | Subnet level
Stateful? | Yes, return traffic auto-allowed | No, must allow both directions
Rule types | Allow only | Allow and Deny
Rule evaluation | All rules evaluated | Numbered rules, lowest first
Default inbound | Deny all | Allow all (default VPC NACL)
Default outbound | Allow all | Allow all

When to use NACLs: Block specific IPs/CIDRs at subnet level (e.g., known bad actors). Use SGs for normal per-app access control.

Best practice: SGs as primary control, NACLs as supplementary subnet-level guardrails.
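
The "numbered rules, lowest first" behaviour of NACLs is easy to get wrong in an interview, so here is a deliberately simplified model of it. Rules are matched on port only (real NACLs also match protocol and CIDR), and `nacl_decision` and the rule list are hypothetical.

```shell
#!/bin/sh
# Evaluate NACL-style rules: lowest rule number first, first match wins,
# implicit deny when nothing matches.
# Each rule line is "number action port"; a port of "*" matches any port.
nacl_decision() {
  local port="$1" rules="$2" result
  result=$(printf '%s\n' "$rules" | sort -n | while read -r num action rule_port; do
    if [ "$rule_port" = "*" ] || [ "$rule_port" = "$port" ]; then
      echo "$action (rule $num)"
      break
    fi
  done)
  echo "${result:-deny (implicit)}"
}

rules='200 allow *
100 deny 22
150 allow 443'

nacl_decision 22 "$rules"    # deny (rule 100): evaluated before the catch-all allow
nacl_decision 443 "$rules"   # allow (rule 150)
nacl_decision 80 "$rules"    # allow (rule 200)
```

A Security Group has no equivalent of rule 100 here: with allow-only rules there is no ordering to reason about, which is why blocking a specific source is a NACL job.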

Placement groups control physical placement of instances on AWS infrastructure.

Cluster — all in same AZ, same rack

  • Ultra-low latency, 25 Gbps enhanced networking
  • Risk: single rack failure affects all instances
  • Use: HPC, tightly-coupled distributed systems

Spread — each instance on different hardware

  • Max 7 per AZ per group
  • Use: critical instances (ZooKeeper, master nodes)

Partition — logical partitions, each on separate rack

  • Up to 7 partitions per AZ, hundreds of instances
  • Use: Hadoop, Cassandra, HDFS

aws ec2 create-placement-group \
  --group-name hpc-cluster \
  --strategy cluster

Feature | EBS (Elastic Block Store) | Instance Store
Persistence | Survives stop/start | Ephemeral, lost on stop/terminate
Performance | Up to 256,000 IOPS (io2) | Very high NVMe SSD
Snapshots | S3 snapshots supported | Manual copy needed
Cost | Billed separately per GB | Included in instance price
Use case | OS, databases, persistent data | Cache, buffers, temp scratch space
Detachable | Yes | No

When to use Instance Store: Applications that replicate data across nodes (Redis cluster, Kafka, Cassandra) where speed matters and data loss is tolerable.

Volume Type | Category | Max IOPS | Max Throughput | Use Case
gp3 | SSD | 16,000 | 1,000 MB/s | General purpose, boot volumes (default)
gp2 | SSD | 16,000 | 250 MB/s | Legacy general purpose
io2 Block Express | SSD | 256,000 | 4,000 MB/s | Mission-critical databases (Oracle, SAP)
io1 | SSD | 64,000 | 1,000 MB/s | I/O-intensive databases
st1 | HDD | 500 | 500 MB/s | Big data, data warehouses, log processing
sc1 | HDD | 250 | 250 MB/s | Cold data, infrequent access

gp3 vs gp2: gp3 is cheaper and allows independent IOPS/throughput configuration. Always prefer gp3 for new volumes.

An IAM Role attached to an EC2 instance grants the instance permissions to call AWS APIs without storing any credentials on disk.

Why better than access keys:

  • No static credentials to rotate or accidentally leak
  • Temporary credentials are automatically rotated every few hours
  • Credentials delivered via IMDS — not stored in files
  • Easy to audit permissions via CloudTrail

How it works:

EC2 Instance → assumes IAM Role → gets temp credentials via IMDS
Application → reads credentials from SDK default credential chain
SDK → calls AWS API using those credentials

Attach a role via CLI:

aws ec2 associate-iam-instance-profile \
  --instance-id i-0abc123 \
  --iam-instance-profile Name=MyEC2Role

Never store aws_access_key_id in .env or config files on an EC2 instance.

Feature | Launch Template | Launch Configuration
Versioning | Multiple versions supported | Immutable
Spot + On-Demand mix | Supported | Not supported
EC2 Fleet support | Yes | No
Parameter inheritance | Can inherit from base template | No
AWS recommendation | Preferred | Deprecated

Create a Launch Template:

aws ec2 create-launch-template \
  --launch-template-name "prod-web" \
  --version-description "v1" \
  --launch-template-data '{
    "ImageId": "ami-0abc123",
    "InstanceType": "m6i.large",
    "KeyName": "my-key",
    "SecurityGroupIds": ["sg-0abc123"],
    "UserData": "BASE64_ENCODED_SCRIPT"
  }'

Spot Instances use spare AWS capacity at up to 90% discount. AWS can reclaim them with a 2-minute warning.

Interruption handling best practices:

  • Poll for termination notice every 5 seconds:
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 30")
curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/spot/termination-time
  • Use EventBridge + Lambda to drain ELB targets before interruption
  • Enable checkpointing to save progress to S3 or EFS
  • Use Spot Fleet with capacity-optimized strategy across multiple instance types and AZs
  • Mix Spot with On-Demand for baseline capacity

Best workloads for Spot: CI/CD workers, batch processing, ML training, rendering, big data.

Cooldown Period — after a scaling activity, ASG waits before triggering another scaling action. Prevents thrashing.

  • Default: 300 seconds
  • Apply shorter cooldowns for scale-out, longer for scale-in

Instance Warm-Up — time allowed for a new instance to start contributing metrics before being counted in scaling decisions.

  • Used with Target Tracking and Step Scaling
  • Prevents new (not-yet-ready) instances from triggering premature scale-out

Lifecycle Hooks — pause instance transitions to run custom actions:

Launching   → Pending:Wait → [run bootstrap] → Pending:Proceed   → InService
Terminating → Terminating:Wait → [drain]     → Terminating:Proceed → Terminated
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name "install-agent" \
  --auto-scaling-group-name prod-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
  --heartbeat-timeout 300

ELB distributes incoming traffic across multiple EC2 instances, containers, or Lambda functions.

Type | Layer | Protocol | Key Feature
ALB | 7 (Application) | HTTP/HTTPS/gRPC | Path/host-based routing, WebSocket
NLB | 4 (Network) | TCP/UDP/TLS | Ultra-low latency, static IP, millions of RPS
GWLB | 3 (Gateway) | IP | Inline network appliances (firewall, IDS)
CLB | 4 and 7 | HTTP/TCP | Legacy, avoid for new deployments

ALB routing rules example:

/api/*              → Target Group: API servers
/images/*           → Target Group: Static servers
Host: admin.example.com → Target Group: Admin servers

Best practice: Use ALB for web apps and microservices, NLB for TCP/UDP services needing extreme performance or fixed IPs.

The AWS Nitro System is the underlying hypervisor platform for all modern EC2 instances. It offloads virtualisation functions to dedicated Nitro hardware.

Components:

  • Nitro Cards — dedicated hardware for VPC networking and EBS I/O
  • Nitro Security Chip — hardware root of trust, cryptographic attestation
  • Nitro Hypervisor — lightweight, minimal attack surface

Advantages over Xen:

  • Near bare-metal performance (less than 2% virtualisation overhead)
  • Higher network bandwidth (up to 400 Gbps with EFA)
  • Hardware-enforced security isolation
  • Enables Bare Metal instances (i3.metal, m5.metal, etc.)
  • Foundation for Nitro Enclaves (isolated secure compute)

All instances from 5th generation onwards (m5, c5, r5, etc.) run on Nitro.

EBS Snapshots are incremental backups stored in S3 (managed by AWS, not visible in your S3 console).

Create a snapshot:

aws ec2 create-snapshot \
  --volume-id vol-0abc123 \
  --description "Daily backup $(date +%Y-%m-%d)"

Best practices:

  • Use Amazon Data Lifecycle Manager (DLM) to automate snapshot schedules and retention
  • Take snapshots during low I/O periods to reduce performance impact
  • Enable EBS Snapshot Archive to move cold snapshots to cheaper storage (75% cost saving)
  • Copy snapshots cross-region for disaster recovery

Restore from snapshot:

aws ec2 create-volume \
  --snapshot-id snap-0abc123 \
  --availability-zone us-east-1a \
  --encrypted

Enhanced Networking uses SR-IOV (Single Root I/O Virtualisation) to provide higher bandwidth, higher packet-per-second rates, and lower latency — at no extra cost.

Technology | Max Bandwidth | Use Case
ENA (Elastic Network Adapter) | 100 Gbps | All modern instances (5th gen+)
EFA (Elastic Fabric Adapter) | 400 Gbps | HPC, MPI, ML distributed training
Intel 82599 VF | 10 Gbps | Older instance families (c3, i2)

Check ENA support:

aws ec2 describe-instances \
  --query 'Reservations[].Instances[].EnaSupport'

# Enable on a stopped instance
aws ec2 modify-instance-attribute \
  --instance-id i-0abc123 \
  --ena-support

Use EFA for: Tightly-coupled MPI workloads and distributed deep learning (PyTorch DDP, Megatron-LM).

Two types of status checks:

System Status Check = AWS infrastructure problem (hardware/hypervisor)

# Fix: Stop then Start (moves to new physical host)
aws ec2 stop-instances --instance-ids i-xxx
aws ec2 start-instances --instance-ids i-xxx

Instance Status Check = OS/software problem

# Step 1: Get console output
aws ec2 get-console-output --instance-id i-xxx --output text

# Step 2: Get screenshot
aws ec2 get-console-screenshot --instance-id i-xxx

Common causes and fixes:

Symptom | Cause | Fix
0/2 system checks | Hardware failure | Stop/Start instance
1/2 checks (instance) | Kernel panic / OOM | Check console output
SSH timeout | SG missing port 22 | Add port 22 inbound rule
SSH auth failure | Wrong key or username | Use ec2-user (AL2), ubuntu (Ubuntu)
Full disk | / or /var full | Extend EBS, clean up logs

Debug without SSH:

aws ssm start-session --target i-xxx

Core HA principles: Eliminate single points of failure, deploy across multiple AZs.

Reference architecture:

Route 53 (latency/failover routing)
        |
  CloudFront (CDN + WAF)
        |
Application Load Balancer (multi-AZ)
      /             \
AZ-1 (us-east-1a)   AZ-2 (us-east-1b)
Auto Scaling Group   Auto Scaling Group
  EC2 x 2             EC2 x 2
        \               /
   Multi-AZ RDS (synchronous replication)
   ElastiCache (Multi-AZ with replication)

Checklist:

  • Min 2 AZs for all compute and data tiers
  • ASG with health checks replacing unhealthy instances automatically
  • ALB health checks draining targets gracefully before replacement
  • Multi-AZ RDS with automated failover
  • S3 for static assets (11 nines durability)
  • CloudWatch alarms and SNS for incident notification
  • Route 53 health checks for DNS-level failover

Feature | EC2 Fleet | Spot Fleet
Purchase types | On-Demand + Spot + Reserved | Primarily Spot
Target unit types | vCPUs, Memory, Units | Instances, Spot units
API | create-fleet | request-spot-fleet
AWS recommendation | Yes (preferred) | Older API

EC2 Fleet config example:

{
  "TargetCapacitySpecification": {
    "TotalTargetCapacity": 100,
    "OnDemandTargetCapacity": 20,
    "SpotTargetCapacity": 80
  },
  "SpotOptions": {
    "AllocationStrategy": "capacity-optimized"
  },
  "LaunchTemplateConfigs": [{
    "LaunchTemplateSpecification": {
      "LaunchTemplateId": "lt-xxx",
      "Version": "1"
    },
    "Overrides": [
      { "InstanceType": "m5.xlarge", "WeightedCapacity": 4 },
      { "InstanceType": "m5a.xlarge", "WeightedCapacity": 4 },
      { "InstanceType": "m6i.xlarge", "WeightedCapacity": 4 }
    ]
  }]
}

Spot allocation strategies:

  • price-capacity-optimized: balances price against interruption risk (AWS's recommended default for most Spot workloads)
  • capacity-optimized: picks the pools with the most spare capacity to minimise interruptions
  • lowest-price: cheapest pool (higher interruption risk)

Multi-layer cost optimisation strategy:

1. Right-Sizing with Compute Optimizer:

aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=Finding,values=Overprovisioned

2. Purchase option mix:

70% Compute Savings Plans (baseline, 3-year)
20% Spot Instances (flexible workloads)
10% On-Demand (true burst / unpredictable)

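3. Graviton migration (up to 40% price/performance improvement):

# An x86_64 instance cannot be switched to Graviton in place: the AMI must be arm64.
# This command only applies to a stopped instance that was launched from an arm64 AMI.
aws ec2 modify-instance-attribute \
  --instance-id i-xxx \
  --instance-type "{\"Value\": \"m7g.large\"}"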

4. Scheduling non-production instances:

  • Use AWS Instance Scheduler to stop dev/test instances nights and weekends
  • Typical saving: ~65% on non-prod compute
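
The "~65%" figure is consistent with a common office-hours schedule, as the arithmetic sketch below shows. The 12 hours by 5 days schedule is an assumption chosen for illustration, and `schedule_saving_pct` is a hypothetical helper.

```shell
#!/bin/sh
# Percentage of weekly instance-hours saved by an on/off schedule.
schedule_saving_pct() {
  local hours_per_day="$1" days_per_week="$2"
  local running=$(( hours_per_day * days_per_week ))
  # 168 hours in a week; integer division rounds down
  echo $(( (168 - running) * 100 / 168 ))
}

schedule_saving_pct 12 5   # prints 64: 60 running hours out of 168
```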

5. Storage cleanup:

# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size}'

Feature | Capacity Reservation | Reserved Instance
Purpose | Guarantees capacity in an AZ | Billing discount
Commitment | None, cancel anytime | 1 or 3 years
Billing when unused | Charged at On-Demand rate | Charged regardless
Capacity guarantee | Yes | No
Discount | None | Up to 72%

Best practice — combine both:

Capacity Reservation → guarantees physical capacity exists in AZ
+
Reserved Instance / Savings Plan → applies billing discount
= Guaranteed capacity at discounted price
aws ec2 create-capacity-reservation \
  --instance-type m6i.large \
  --instance-platform Linux/UNIX \
  --availability-zone us-east-1a \
  --instance-count 10 \
  --instance-match-criteria targeted

Nitro Enclaves are isolated, hardened virtual machines within an EC2 instance for processing highly sensitive data.

Key properties:

  • No persistent storage, no external networking, no interactive access
  • Communicates with parent instance only via vsock
  • Cryptographically attested using AWS KMS
  • Memory and CPU fully isolated from parent instance and AWS operators

Use cases:

  • Processing PII/PHI (HIPAA, PCI-DSS workloads)
  • Secure cryptographic key operations
  • ML inference on confidential data (medical images, financial records)
  • Digital rights management (DRM)

# Enable Nitro Enclaves when launching the parent instance
# (the AMI ID and instance type here are placeholders)
aws ec2 run-instances \
  --image-id ami-0abc123 \
  --instance-type m5.xlarge \
  --enclave-options 'Enabled=true'

# Build and run enclave image
nitro-cli build-enclave \
  --docker-uri myapp:latest \
  --output-file myapp.eif

nitro-cli run-enclave \
  --eif-path myapp.eif \
  --memory 2048 \
  --cpu-count 2 \
  --enclave-cid 16

ASG supports two types of health checks:

Health Check Type | What it checks | When to use
EC2 (default) | Instance reachability only | Basic setups
ELB | Application-level health (HTTP 200) | Production, more accurate

Enable ELB health checks on ASG:

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name prod-asg \
  --health-check-type ELB \
  --health-check-grace-period 300

Flow when an instance fails ELB health check:

1. ALB marks target unhealthy
2. ALB stops sending new requests to that instance
3. In-flight requests complete (connection draining)
4. ASG detects unhealthy instance and terminates it
5. ASG launches replacement instance in same or different AZ
6. New instance registers with ALB, passes health check, receives traffic

Grace period = time ASG waits before starting health checks on a new instance. Set slightly longer than your application startup time.

EC2 Image Builder automates the creation, testing, patching, and distribution of AMIs on a schedule.

Pipeline stages:

Base Image (source AMI or OS)
      |
Build Components (AWSTOE documents)
  - Install packages
  - Harden OS (CIS benchmarks)
  - Install agents (SSM, CloudWatch)
      |
Test Components
  - Boot test
  - Custom validation scripts
      |
Distribution
  - Copy AMI to multiple regions
  - Share with member accounts
  - Update Launch Templates

Benefits:

  • Automated patch management — always start from a patched base
  • Centralised golden AMI governance
  • Integrated with AWS Organizations for multi-account distribution
  • Built-in CIS hardening components available
aws imagebuilder start-image-pipeline-execution \
  --image-pipeline-arn arn:aws:imagebuilder:us-east-1:123:image-pipeline/my-pipeline

Three secure access patterns — no inbound port 22 required:

1. AWS Systems Manager Session Manager (recommended):

# No SG inbound rules needed, no key pair, full audit trail in CloudTrail
aws ssm start-session --target i-0abc123

# Port forwarding via SSM
aws ssm start-session \
  --target i-0abc123 \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["3306"],"localPortNumber":["3306"]}'

2. EC2 Instance Connect:

aws ec2-instance-connect send-ssh-public-key \
  --instance-id i-0abc123 \
  --instance-os-user ec2-user \
  --ssh-public-key file://temp-key.pub

ssh -i temp-key.pem ec2-user@<ip>

The temporary key is valid for 60 seconds only.

3. Bastion Host (jump box) in public subnet:

ssh -J ec2-user@<bastion-ip> ec2-user@<private-ip>

Security comparison:

Method | Key Pair | Inbound Rule | Audit Trail
SSM Session Manager | Not needed | Not needed | CloudTrail
EC2 Instance Connect | Temporary | Port 22 required | CloudTrail
Bastion Host | Required | Port 22 required | Partial

Aspect | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out)
Method | Increase instance size | Add more instances
Downtime | Required (stop/start) | None with ASG
Limit | Hardware ceiling (largest instance) | Virtually unlimited
High Availability | Single point of failure | Multi-AZ resilient
App changes needed | None | App must be stateless/distributed

Vertical scale (resize instance):

aws ec2 stop-instances --instance-ids i-xxx
aws ec2 modify-instance-attribute \
  --instance-id i-xxx \
  --instance-type "{\"Value\": \"m6i.4xlarge\"}"
aws ec2 start-instances --instance-ids i-xxx

AWS recommendation: Prefer horizontal scaling for resilience. Use vertical scaling only for stateful workloads (e.g., single-node databases) that cannot be distributed.

T-series instances (t3, t3a, t4g) earn CPU credits when idle and spend them during bursts.

Credit mechanics:

  • Credits are earned at a fixed hourly rate set by the instance size; the balance grows whenever CPU usage is below baseline
  • 1 CPU credit = 1 vCPU at 100% for 1 minute
  • Credits accumulate up to a maximum balance (varies by instance size)

Instance | Baseline CPU (per vCPU) | Credits Earned/hr | Max Credits
t3.micro | 10% | 12 | 288
t3.medium | 20% | 24 | 576
t3.large | 30% | 36 | 864

Standard vs Unlimited mode:

  • Standard — instance throttled when credits exhausted
  • Unlimited — bursts beyond credit balance at extra cost ($0.05/vCPU-hr)

Switch an instance to Unlimited mode:

aws ec2 modify-instance-credit-specification \
  --instance-credit-specifications "InstanceId=i-xxx,CpuCredits=unlimited"

Monitor credit balance:

CloudWatch metric: CPUCreditBalance
Alert when < 20 credits to prevent throttling
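
The credit figures follow directly from the definition of a credit (1 vCPU at 100% for 1 minute). The sketch below reproduces the hourly earn rate from the baseline and estimates how long a full-speed burst can last; it assumes 2 vCPUs for t3.medium and t3.large, ignores credits earned during the burst, and uses hypothetical helper names.

```shell
#!/bin/sh
# Credits earned per hour = vCPUs x baseline% x 60 minutes / 100.
credits_per_hour() {
  local vcpus="$1" baseline_pct="$2"
  echo $(( vcpus * baseline_pct * 60 / 100 ))
}

# Minutes of 100% usage on all vCPUs that a credit balance can fund
# (each busy vCPU spends one credit per minute).
burst_minutes() {
  local balance="$1" vcpus="$2"
  echo $(( balance / vcpus ))
}

credits_per_hour 2 30   # t3.large: prints 36, matching the table
credits_per_hour 2 20   # t3.medium: prints 24
burst_minutes 864 2     # a full t3.large balance: prints 432
```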

An EC2 instance transitions through several states:

pending → running → stopping → stopped
                  → shutting-down → terminated
                  → hibernating → stopped (hibernated)

State | Description | Billed?
pending | Instance booting, resources provisioning | No
running | Normal operation | Yes
stopping | Graceful shutdown in progress | No (Yes while preparing to hibernate)
stopped | Powered off, EBS retained | EBS only
shutting-down | Termination in progress | No
terminated | Deleted | No

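Key rule: Compute charges accrue only in the running state (and while stopping to hibernate), billed per second with a 60-second minimum. EBS volumes and public IPv4 addresses are billed separately.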
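
The per-second rule with its 60-second floor can be sketched as follows. `billed_seconds` is a hypothetical helper that models only the minimum charge, not any actual price.

```shell
#!/bin/sh
# Seconds of compute billed for one run of an instance:
# actual runtime, but never less than 60 seconds.
billed_seconds() {
  local runtime="$1"
  if [ "$runtime" -lt 60 ]; then
    echo 60
  else
    echo "$runtime"
  fi
}

billed_seconds 45     # prints 60: short runs are rounded up to the minimum
billed_seconds 3725   # prints 3725: longer runs are billed to the second
```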

Protect from accidental termination:

aws ec2 modify-instance-attribute \
  --instance-id i-xxx \
  --disable-api-termination

You cannot directly encrypt an existing unencrypted EBS volume. The process requires a snapshot copy:

Step-by-step:

# 1. Create snapshot of unencrypted volume
aws ec2 create-snapshot \
  --volume-id vol-unencrypted \
  --description "Pre-encryption snapshot"

# 2. Copy snapshot with encryption enabled
aws ec2 copy-snapshot \
  --source-region us-east-1 \
  --source-snapshot-id snap-xxx \
  --encrypted \
  --kms-key-id alias/aws/ebs \
  --description "Encrypted copy"

# 3. Create new volume from encrypted snapshot
aws ec2 create-volume \
  --snapshot-id snap-encrypted \
  --availability-zone us-east-1a \
  --encrypted

# 4. Stop instance, detach old volume, attach new encrypted volume

Prevent future unencrypted volumes account-wide:

aws ec2 enable-ebs-encryption-by-default --region us-east-1

EC2 Serial Console provides access to the serial port of an instance — useful when the OS is unresponsive, SSH is broken, or the network is misconfigured.

Enable at account level:

aws ec2 enable-serial-console-access

Connect:

aws ec2-instance-connect send-serial-console-ssh-public-key \
  --instance-id i-xxx \
  --serial-port 0 \
  --ssh-public-key file://my-key.pub

ssh -i my-key i-xxx.port0@serial-console.ec2-instance-connect.us-east-1.aws

Use cases:

  • Fix misconfigured /etc/ssh/sshd_config
  • Debug kernel panic or boot failures
  • Fix iptables rules that locked out SSH
  • Access GRUB menu to boot into single-user/rescue mode
  • Fix misconfigured network interface configuration

Requirements: The instance must be Nitro-based. The pushed SSH key only opens the serial connection; logging in to the OS from the console still requires a user with a password set.

Predictive Scaling uses ML to forecast future traffic and proactively scales your ASG before demand hits.

How it works:

  1. Analyses CloudWatch metrics history (at least 24 hours of data required; up to 14 days are used)
  2. Forecasts load 48 hours ahead
  3. Schedules scaling actions in advance
  4. Can be combined with dynamic scaling policies

Modes:

Mode | Behaviour
Forecast only | Shows predictions, takes no action
Forecast and scale | Automatically schedules scaling actions

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name prod-asg \
  --policy-name predictive-cpu \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration '{
    "MetricSpecifications": [{
      "TargetValue": 60,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }],
    "Mode": "ForecastAndScale",
    "SchedulingBufferTime": 300
  }'

SchedulingBufferTime — how many seconds before predicted peak to start scaling (allow for instance boot and app warm-up time).

These are three different EC2 networking concepts — often confused in interviews:

Aspect | ENI | ENA | EFA
Full name | Elastic Network Interface | Elastic Network Adapter | Elastic Fabric Adapter
What it is | Virtual NIC (software construct) | SR-IOV driver for enhanced networking | High-performance RDMA-capable NIC
Max bandwidth | Depends on instance | Up to 100 Gbps | Up to 400 Gbps
Latency | Standard | Lower than legacy VF | Under 15 µs (MPI-level)
Use case | All EC2 networking | Modern instances (default on 5th gen+) | HPC, distributed ML training
ENI — logical network interface attached to instances. Every instance has at least one. Can be detached and moved (useful for failover with fixed private IPs).

ENA — the SR-IOV driver that makes networking fast on Nitro instances. Enabled by default on 5th gen+ instances.

EFA — optional adapter for HPC workloads using MPI/NCCL. Bypasses the OS kernel for inter-node communication using the SRD protocol.

Instance Refresh replaces instances in an ASG in a rolling fashion — useful for updating AMI, instance type, or User Data without manual intervention.

Trigger a refresh:

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name prod-asg \
  --desired-configuration '{
    "LaunchTemplate": {
      "LaunchTemplateId": "lt-xxx",
      "Version": "$Latest"
    }
  }' \
  --preferences '{
    "MinHealthyPercentage": 80,
    "InstanceWarmup": 300,
    "SkipMatching": true
  }'

Key preferences:

Preference | Description
MinHealthyPercentage | Minimum % healthy during refresh (controls blast radius)
InstanceWarmup | Seconds before counting new instance as healthy
SkipMatching | Skip instances already on the correct template version
CheckpointPercentages | Pause at these % for manual validation

Monitor progress:

aws autoscaling describe-instance-refreshes \
  --auto-scaling-group-name prod-asg

Default metrics (no agent needed) — 5-minute intervals:

Metric | Description
CPUUtilization | Percentage of CPU used
NetworkIn / NetworkOut | Bytes transferred in and out
DiskReadOps / DiskWriteOps | Instance store disk I/O
StatusCheckFailed | 0 or 1

Requires CloudWatch Agent (not available by default):

Metric | Why it needs the agent
mem_used_percent | OS-level memory, which the hypervisor cannot see
disk_used_percent | Per-filesystem disk usage
swap_used | Swap utilisation
Custom app logs | Application-specific metrics and log ingestion

Install agent via SSM:

aws ssm send-command \
  --document-name "AWS-ConfigureAWSPackage" \
  --targets "Key=instanceids,Values=i-xxx" \
  --parameters '{"action":["Install"],"name":["AmazonCloudWatchAgent"]}'

Feature | Dedicated Instance | Dedicated Host
Isolation | Dedicated hardware, AWS manages placement | Full physical server dedicated to you
Visibility | No host details | Full visibility (host ID, sockets, cores)
BYOL support | Not supported | Supported (Windows, SQL Server, Oracle)
Placement control | No | Yes, target a specific host
Billing | Per instance | Per host per hour
Compliance | Basic isolation | Strict regulatory requirements

Use Dedicated Host when:

  • Bring Your Own License (BYOL) is required
  • Compliance mandates physical server isolation
  • You need to demonstrate hardware tenancy to auditors
aws ec2 allocate-hosts \
  --instance-type m6i.large \
  --quantity 1 \
  --availability-zone us-east-1a \
  --auto-placement on

EC2 does not support hot resizing, but you can achieve near-zero downtime using these strategies:

Strategy 1 — Instance Refresh on ASG (recommended):

# Update Launch Template with new instance type, then:
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name prod-asg \
  --preferences '{
    "MinHealthyPercentage": 90,
    "InstanceWarmup": 300,
    "CheckpointPercentages": [25, 50, 75, 100],
    "CheckpointDelay": 120
  }'

The ASG replaces instances in rolling batches, and the ALB shifts traffic to healthy instances automatically.

Strategy 2 — Blue/Green with separate ASG:

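1. Launch a new ASG with the new instance type
2. Attach the new ASG to its own Target Group and add that group to the ALB listener's weighted forward action (weights apply per Target Group, not within one)
3. Gradually shift the ALB weights: 90/10 → 50/50 → 0/100
4. Decommission the old ASG once traffic is fully shifted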
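
The weight shift in Strategy 2 can be sketched as a function that builds the listener's forward action for each step. ALB weights are set per Target Group, so this assumes the old ("blue") and new ("green") ASGs are attached to separate Target Groups. The ARNs are placeholders and the function name is hypothetical; the returned structure follows the shape that `aws elbv2 modify-listener --default-actions` accepts.

```python
import json


def forward_action(blue_tg_arn: str, green_tg_arn: str, green_weight: int) -> dict:
    """Build a weighted forward action sending green_weight% of traffic to green."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg_arn, "Weight": green_weight},
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; substitute the real Target Group ARNs.
    blue, green = "arn:blue-target-group", "arn:green-target-group"
    for step in (10, 50, 100):  # 90/10 -> 50/50 -> 0/100
        print(json.dumps([forward_action(blue, green, step)]))
```

Each printed JSON array could be passed as the listener's default actions once health checks on the green Target Group pass.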

Strategy 3 — Standby for single instance:

1. Launch new instance with correct type
2. Register with Target Group, wait for health check
3. Deregister old instance (connection draining runs)
4. Terminate old instance

AWS Graviton processors are ARM64-based chips designed by AWS, offering improved price-performance over x86.

Dimension | Graviton (ARM64) | Intel / AMD (x86_64)
Architecture | ARM64 | x86_64
Price/performance | Up to 40% better | Baseline
Generation examples | m7g, c7g, r7g, t4g | m7i, c7a, r7i
OS support | Linux only: Amazon Linux 2/2023, Ubuntu 20.04+, RHEL 8+ (Windows is not supported) | Universal
Energy efficiency | Up to 60% less power | Baseline

Migration steps:

# 1. Build Docker images for ARM64
docker buildx build --platform linux/arm64 -t myapp:arm64 .

# 2. Test on a t4g instance (Graviton2; confirm current free-tier eligibility for your account)

# 3. Switch Launch Template to Graviton instance type
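# modify-launch-template only changes the default version, so create a new version instead
aws ec2 create-launch-template-version \
  --launch-template-id lt-xxx \
  --source-version '$Latest' \
  --launch-template-data '{"InstanceType":"m7g.large","ImageId":"ami-xxx"}'  # ImageId must be an arm64 AMI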

Best suited for Graviton: Web services, microservices, containerised apps, Java workloads, open-source databases (MySQL, PostgreSQL, Redis).
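
To see what a price/performance figure means in practice, the sketch below compares throughput per dollar for two instances. All prices and throughput numbers are hypothetical placeholders, not AWS pricing; replace them with your own benchmark results and current rates.

```python
def price_performance_gain(
    base_price: float, base_throughput: float,
    new_price: float, new_throughput: float,
) -> float:
    """Percentage improvement in throughput per dollar of the new instance over the base."""
    base_ratio = base_throughput / base_price
    new_ratio = new_throughput / new_price
    return (new_ratio / base_ratio - 1) * 100


if __name__ == "__main__":
    # Hypothetical: same throughput, Graviton instance 20% cheaper per hour.
    gain = price_performance_gain(
        base_price=0.10, base_throughput=1000,
        new_price=0.08, new_throughput=1000,
    )
    print(f"Price/performance gain: {gain:.1f}%")
```

Note that a 20% lower price at equal throughput works out to a 25% price/performance gain, which is why benchmark results on your own workload matter more than the headline figure.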

Tenancy controls whether your instance runs on shared or dedicated hardware.

Tenancy | Hardware | Cost | Use Case
default | Shared with other AWS customers | Standard pricing | Most workloads
dedicated | Dedicated hardware, AWS manages placement | ~10% premium | Compliance, isolation requirements
host | Specific physical host you control | Higher | BYOL, strict regulatory audits

Set at instance level:

aws ec2 run-instances \
  --instance-type m6i.large \
  --placement '{"Tenancy":"dedicated"}' \
  --image-id ami-xxx

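Set at VPC level (all instances launched into the VPC use dedicated tenancy). This must be chosen when the VPC is created:

# 10.0.0.0/16 is an example CIDR block
aws ec2 create-vpc \
  --cidr-block 10.0.0.0/16 \
  --instance-tenancy dedicated

modify-vpc-tenancy can only switch an existing VPC from dedicated to default, not the other way round.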

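After launch, an instance's tenancy can only be switched between dedicated and host, and the instance must be stopped first. An instance launched with default tenancy cannot be moved to dedicated or host, and a dedicated or host instance cannot be moved to default.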

These three terms are technically distinct but often used interchangeably:

IAM Role
  |-- Trust Policy (allows ec2.amazonaws.com to assume this role)
  |-- Permission Policies (what actions the role can perform)
        |
Instance Profile (container that holds the role for EC2)
        |
EC2 Instance (applications read credentials from IMDS)
  • IAM Role — the set of permissions (policies attached)
  • Instance Profile — the wrapper that allows a role to be assigned to an EC2 instance
  • In the AWS Console, creating an EC2 role auto-creates an instance profile with the same name
  • Via CLI you must create them separately:
aws iam create-role \
  --role-name EC2S3ReadRole \
  --assume-role-policy-document file://ec2-trust.json

aws iam create-instance-profile \
  --instance-profile-name EC2S3ReadProfile

aws iam add-role-to-instance-profile \
  --instance-profile-name EC2S3ReadProfile \
  --role-name EC2S3ReadRole

aws ec2 associate-iam-instance-profile \
  --instance-id i-xxx \
  --iam-instance-profile Name=EC2S3ReadProfile
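
The `ec2-trust.json` file referenced by `create-role` above holds the trust policy that lets the EC2 service assume the role. A minimal sketch that generates it is shown below; the function name is hypothetical, and the policy itself is the standard EC2 trust relationship.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust policy allowing the EC2 service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Redirect the output to ec2-trust.json before running create-role.
    print(json.dumps(ec2_trust_policy(), indent=2))
```

This document covers only who may assume the role; what the role can do (for example, reading from S3) comes from the permission policies attached separately.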

A Spot capacity pool is a set of unused EC2 instances of the same instance type in the same AZ. Each pool has its own price and interruption rate.

Key insight: Interruption frequency is driven by demand for a specific pool, not just price.

Choosing pools wisely:

1. Use Spot Instance Advisor:

  • Shows historical interruption frequency bands (less than 5%, 5-10%, and so on) per instance type and Region
  • Check before choosing instance types

2. Diversify across pools:

"Overrides": [
  { "InstanceType": "m5.xlarge",  "SubnetId": "subnet-az1" },
  { "InstanceType": "m5a.xlarge", "SubnetId": "subnet-az1" },
  { "InstanceType": "m4.xlarge",  "SubnetId": "subnet-az1" },
  { "InstanceType": "m5.xlarge",  "SubnetId": "subnet-az2" },
  { "InstanceType": "m5a.xlarge", "SubnetId": "subnet-az2" }
]

3. Use capacity-optimized allocation strategy:

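  • AWS selects the pools with the most available capacity
  • This lowers interruption probability rather than minimising price; the price-capacity-optimized strategy, which AWS now recommends for most workloads, weighs both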

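4. Avoid thin pools, which carry a high interruption risk. AWS does not publish per-pool instance counts, so use the Spot placement score (1 to 10) as a proxy for available capacity.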
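
The diversification in step 2 amounts to taking every combination of instance type and subnet. A minimal sketch that generates such an Overrides list is shown below; the function name is hypothetical and the subnet IDs are the same placeholders used above.

```python
import json
from itertools import product


def build_overrides(instance_types: list[str], subnet_ids: list[str]) -> list[dict]:
    """One override per (subnet, instance type) pair, i.e. one per Spot capacity pool."""
    return [
        {"InstanceType": instance_type, "SubnetId": subnet_id}
        for subnet_id, instance_type in product(subnet_ids, instance_types)
    ]


if __name__ == "__main__":
    overrides = build_overrides(
        ["m5.xlarge", "m5a.xlarge", "m4.xlarge"],
        ["subnet-az1", "subnet-az2"],  # placeholder subnet IDs, one per AZ
    )
    print(f"{len(overrides)} pools")
    print(json.dumps(overrides, indent=2))
```

Three instance types across two AZs yields six pools, so adding one more similarly sized type or one more AZ widens the choice quickly.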

Many EC2 instances support network bandwidth bursting — they can temporarily exceed their baseline network bandwidth for short periods.

How it works:

  • Instances earn and spend network I/O credits similar to CPU credits on T-series
  • Bursting applies when the instance has been idle and accumulated credits
  • Sustained traffic is capped at the baseline bandwidth

Example (m5.large):

Metric | Value
Baseline bandwidth | 0.75 Gbps
Burst bandwidth | Up to 10 Gbps
Burst availability | Based on credit accumulation (best effort)
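
The credit behaviour can be illustrated with a toy token-bucket model. AWS does not publish the exact mechanism or bucket size, so the bucket capacity here is a hypothetical value and the model is only a sketch; the 0.75 Gbps baseline and 10 Gbps burst are the figures AWS lists for m5.large.

```python
def simulate_throughput(
    demand_gbps: float, baseline_gbps: float, burst_gbps: float,
    bucket_gbit: float, seconds: int,
) -> list[float]:
    """Per-second throughput under a simple token bucket that starts full."""
    credits = bucket_gbit
    throughput = []
    for _ in range(seconds):
        # Each second the instance may spend its baseline allowance plus saved credits.
        allowed = min(demand_gbps, burst_gbps, baseline_gbps + credits)
        # Unused baseline refills the bucket, up to its capacity.
        credits = min(bucket_gbit, credits + baseline_gbps - allowed)
        throughput.append(allowed)
    return throughput


if __name__ == "__main__":
    # Hypothetical 18.5 Gbit bucket; a sustained 10 Gbps demand drains it quickly.
    print(simulate_throughput(10, 0.75, 10, bucket_gbit=18.5, seconds=4))
```

In this model the instance sustains the burst rate only until the bucket empties, then falls back to the baseline, which is the pattern to look for in NetworkIn and NetworkOut graphs.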

Check current bandwidth limits:

aws ec2 describe-instance-types \
  --instance-types m5.large \
  --query 'InstanceTypes[].NetworkInfo'

Best practices:

  • For sustained high-throughput workloads, choose instances with dedicated (non-burstable) bandwidth (m5.8xlarge and above; sizes up to 4xlarge are listed as "up to" a bandwidth and rely on burst credits)
  • Use Placement Groups (Cluster) to maximise intra-cluster bandwidth
  • Monitor NetworkIn and NetworkOut in CloudWatch to detect sustained saturation
  • Enable Jumbo Frames (9001 MTU) within a VPC to reduce overhead:
sudo ip link set dev eth0 mtu 9001  # interface name varies by OS and instance family (for example ens5)
