Episode 97: Domain 3 Audio Quiz: Scenario Walkthroughs
In this episode, we’ll practice scenario walkthroughs—a method that mirrors both real-world troubleshooting and the way exam questions are structured. The approach is simple: start with the symptoms, identify the risks they represent, and apply the minimal viable AWS fix that resolves the issue without overbuilding. The exam rewards candidates who can recognize the right balance of simplicity, security, and resilience. By working through these examples step by step, you’ll train yourself to respond calmly and methodically under time pressure.
Walkthrough one: a bucket of public S3 content flagged for security risk. The symptom is exposure to the internet. The risk is data leakage and noncompliance. The minimal viable fix is to enforce Block Public Access (BPA) at the bucket and account level, then serve content safely through CloudFront with Origin Access Control. This both protects the bucket and allows only CloudFront to fetch objects, solving the exposure without breaking delivery.
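To make that fix concrete, here is a minimal sketch in Python of the two pieces of configuration involved: the four Block Public Access settings, and a bucket policy that lets a single CloudFront distribution read objects. The bucket name, account ID, and distribution ID are placeholders chosen for illustration, not values from this episode.

```python
import json

# Hypothetical placeholders for illustration only.
BUCKET = "example-content-bucket"
ACCOUNT_ID = "111122223333"
DISTRIBUTION_ID = "EDFDVBD6EXAMPLE"

# The four Block Public Access switches, all turned on.
block_public_access = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def oac_bucket_policy(bucket, account_id, distribution_id):
    """Bucket policy that allows object reads only from one CloudFront distribution."""
    source_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"StringEquals": {"AWS:SourceArn": source_arn}},
            }
        ],
    }


policy = oac_bucket_policy(BUCKET, ACCOUNT_ID, DISTRIBUTION_ID)
print(json.dumps(policy, indent=2))
```

If you use boto3, these structures would typically be passed to the S3 client's put_public_access_block and put_bucket_policy calls, with the policy serialized through json.dumps.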
Walkthrough two: an Application Load Balancer shows high 5xx errors. The symptom is failed requests. Risks include outages and user frustration. Minimal fixes include checking target group health checks, reviewing the idle timeout against backend response times, and ensuring Auto Scaling is in place so backend targets can scale with load. On the exam, the expected answer is usually to check target group health and scaling first, not to replace the ALB.
Walkthrough three: spiky workloads overwhelm compute resources. The symptom is inconsistent performance. Risk is unpredictable availability. The AWS-native fix is Auto Scaling Groups with target tracking policies, possibly enhanced with warm pools to reduce cold-start lag. This ensures elasticity without manual intervention.
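As a rough illustration, a target tracking policy is mostly a small block of configuration. The sketch below builds the arguments you might pass to the Auto Scaling put_scaling_policy call in boto3. The group name and the 50 percent CPU target are assumptions made for the example.

```python
def target_tracking_policy(asg_name, target_cpu=50.0):
    """Build arguments for a CPU-based target tracking scaling policy.

    asg_name and target_cpu are illustrative values, not recommendations.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            # The group adds or removes instances to hold average CPU near this value.
            "TargetValue": target_cpu,
        },
    }


policy_args = target_tracking_policy("web-asg")
print(policy_args)
```

The point of the design is that you state the target, not the steps: the service works out how many instances to add or remove.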
Walkthrough four: users complain of high latency across geographies. The symptom is poor performance for global users. Risk is churn and cost inefficiency. Minimal fix is CloudFront, which caches content at edge locations close to users. Price classes can then be tuned to limit which edge regions are used, trading some latency in the excluded regions for lower cost. This reduces latency for most users while keeping cost under control.
Walkthrough five: a relational database is saturated with read traffic. The symptom is slow queries and timeouts. Risk is application downtime. The minimal viable fix is to add RDS or Aurora read replicas and, where possible, offload repeated queries into an ElastiCache layer. Together, these distribute load and reduce pressure on the primary database without full re-architecture.
Walkthrough six: an EC2 tenant experiences “noisy neighbor” contention. The symptom is performance degradation. The risk is SLA violations. Minimal fix is to right-size instances or move to a different instance family that resolves the resource contention cleanly. Where the workload needs hardware isolation from other tenants, Dedicated Instances or Dedicated Hosts provide it. A load balancer spreads connections across targets, but it does not remove contention on the underlying host.
Walkthrough seven: containerized applications need secure credential management. The symptom is secrets stored insecurely in environment variables. Risk is credential leakage. Minimal viable fix is to use IAM task roles on ECS, or IAM roles for service accounts on EKS, and store sensitive information in Secrets Manager. This eliminates static secrets in code and scopes each workload to least privilege.
Walkthrough eight: APIs are overwhelmed by traffic spikes. The symptom is backend overload. Risk is downtime and cost overrun. Minimal fix is API Gateway throttling, with rate and burst limits, combined with dead-letter queues on the asynchronous path behind the API, such as SQS queues or Lambda functions, to capture unprocessed events safely. This shields the backend while retaining data.
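API Gateway describes its throttling in terms of a token bucket, with a steady rate and a burst capacity. The sketch below is a simplified model of that idea in Python, not the service's actual implementation. The class name and the numbers are assumptions for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a real API would answer with 429 Too Many Requests


bucket = TokenBucket(rate=2, burst=3)
first_second = [bucket.allow(0.0) for _ in range(4)]
next_second = [bucket.allow(1.0) for _ in range(3)]
print(first_second, next_second)
```

In this run, the first three requests use up the burst and the fourth is rejected. One second later, two tokens have refilled, so two more requests pass and the third is rejected.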
Walkthrough nine: cross-account teams need access to resources. The symptom is inconsistent permissions. Risk is over-permissive roles or shadow accounts. Minimal fix is cross-account IAM roles that users assume through STS AssumeRole, governed by Service Control Policies at the organizational level to enforce guardrails. This provides flexibility with centralized governance.
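For reference, the heart of a cross-account role is its trust policy, which names who is allowed to assume it. Here is a minimal sketch that builds one in Python. The account ID and external ID are placeholders, and the optional external ID condition is included only as an illustration of how the trust can be narrowed.

```python
import json


def cross_account_trust_policy(trusted_account_id, external_id=None):
    """Trust policy letting principals in another account call sts:AssumeRole."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # Optional condition: callers must also present this external ID.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Hypothetical account ID and external ID.
trust = cross_account_trust_policy("111122223333", external_id="example-id")
print(json.dumps(trust, indent=2))
```

The trust policy only controls who may assume the role. What the role can do once assumed is set by its permissions policies, and Service Control Policies cap both.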
Walkthrough ten: logs are missing from some accounts. The symptom is blind spots in auditing. Risk is compliance failure. Minimal fix is to enable an organization-wide CloudTrail, centralizing logs into an S3 bucket with a defined retention policy. This provides continuous audit coverage without gaps.
Walkthrough eleven: drift between infrastructure configurations is detected. The symptom is inconsistent environments. Risk is noncompliance and hard-to-debug errors. Minimal fix is AWS Config rules with automatic remediation actions to reset misconfigured resources. This makes drift detection proactive instead of reactive.
Walkthrough twelve: hybrid workloads struggle with DNS resolution. The symptom is name resolution failures across cloud and on-premises. Risk is connectivity gaps. Minimal viable fix is Route 53 Resolver endpoints: an inbound endpoint so on-premises DNS servers can forward queries into the VPC, and an outbound endpoint with forwarding rules so resources in the VPC can resolve on-premises domains. This provides consistent hybrid DNS paths.
Walkthrough thirteen: a real-time analytics team needs scalable ingestion. The symptom is bottlenecks in logging pipelines. Risk is lost events. Minimal viable fix is Kinesis Data Streams or Firehose with S3 partitioning. Partitioning improves downstream query performance, while Firehose handles buffering and compression. This solution provides scale and durability for continuous ingestion.
For more cyber-related content and books, please check out cyber author dot me. Also, there are other prepcasts on cybersecurity and more at Bare Metal Cyber dot com.
Let’s continue with scenario walkthroughs, this time focusing on playbooks—repeatable fixes that you can apply under pressure. Each example follows the same method: symptoms, risks, and the minimal viable AWS control that restores safety and reliability.
Playbook one: an application is under attack from suspicious traffic patterns. The symptom is malicious requests hitting a web application. The risk is data breach or downtime. The minimal viable fix is to add a new AWS WAF rule, test it in count mode to confirm it doesn’t block legitimate users, then roll it out. The exam often expects “add a WAF rule and test” rather than building a custom firewall appliance.
Playbook two: IAM permissions are overly broad, using wildcards in policies. The symptom is uncontrolled access. The risk is privilege escalation. The fix is to scope permissions down to specific ARNs or actions, replacing wildcards with least-privilege roles. The exam cue is “principle of least privilege,” which always points toward refining IAM policies, not broadening them.
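One way to practice spotting this pattern is a small checker that flags wildcards in Allow statements. The Python below is a minimal sketch: the function name and the two sample policies are made up for illustration, and it only catches the obvious cases, a bare star or a service-wide action like s3 colon star.

```python
def find_wildcards(policy):
    """Return (statement index, field, value) for broad wildcards in Allow statements."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if value == "*" or value.endswith(":*"):
                    findings.append((index, field, value))
    return findings


# Hypothetical policies: one overly broad, one scoped to a single bucket prefix.
broad = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
scoped = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/reports/*",
        }
    ]
}

print(find_wildcards(broad))
print(find_wildcards(scoped))
```

The broad policy produces two findings, one for the action and one for the resource, while the scoped policy produces none.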
Playbook three: a GuardDuty finding signals suspicious API calls from an unusual location. The symptom is a potential compromise. The risk is data loss or account misuse. The fix is to route GuardDuty findings into EventBridge, automatically creating security tickets or invoking a Lambda for remediation. This playbook turns alerts into actions, which the exam often highlights under “automated security response.”
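As a sketch of how that routing might look, the Python below defines an EventBridge event pattern that matches GuardDuty findings and a small Lambda-style handler that turns a finding into a ticket summary. The severity threshold of 7, the handler name, and the shape of the returned summary are assumptions for illustration. A real playbook would create a ticket or start a remediation step at that point.

```python
import json

# Event pattern: GuardDuty findings with severity 7 or higher.
guardduty_pattern = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}


def handler(event, context=None):
    """Summarize a GuardDuty finding event so it can be filed or acted on."""
    detail = event.get("detail", {})
    return {
        "finding_id": detail.get("id"),
        "type": detail.get("type"),
        "severity": detail.get("severity"),
        "account": event.get("account"),
        "region": event.get("region"),
    }


# Hypothetical, heavily trimmed sample event.
sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "account": "111122223333",
    "region": "us-east-1",
    "detail": {"id": "example-finding-id", "type": "Example:FindingType", "severity": 8},
}

print(json.dumps(guardduty_pattern))
print(handler(sample_event))
```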
Playbook four: Inspector identifies CVEs on EC2 instances. The symptom is known vulnerabilities. The risk is exploitation. The minimal fix is to use Systems Manager (SSM) runbooks to patch instances, then verify with Inspector scans. The exam cue is “vulnerability management,” which maps directly to Inspector plus SSM automation.
Playbook five: an EBS volume hits throughput bottlenecks. The symptom is sluggish disk performance. Risk is application slowdown. The minimal fix is to upgrade to gp3 with provisioned throughput or io2 for mission-critical workloads. Exam keywords like “high I/O needs” or “throughput bottleneck” point directly to tuning EBS type.
Playbook six: DynamoDB partitions show uneven load. The symptom is hot partitions. Risk is throttling and failed requests. The fix is to redesign partition keys to distribute traffic more evenly, and in some cases, add a Global Secondary Index to balance queries. The exam signals this with “hot key” or “partition imbalance.”
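One common way to redesign a hot key is write sharding, where a suffix is appended to the partition key so that writes for one logical key spread across several partitions. The episode does not name this technique, so treat the Python below as one possible sketch. The function names, the shard count of ten, and the key format are all assumptions for illustration.

```python
import hashlib


def sharded_key(base_key, item_id, shards=10):
    """Derive a partition key like '2024-06-01#7' from a hot base key and an item ID."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % shards
    return f"{base_key}#{suffix}"


def all_shard_keys(base_key, shards=10):
    """Every partition key a reader must query to see all items for base_key."""
    return [f"{base_key}#{n}" for n in range(shards)]


# Hypothetical hot key: every order for one day shares the same date.
keys = {sharded_key("2024-06-01", f"order-{n}") for n in range(1000)}
print(sorted(keys))
```

The trade-off is on the read side: to fetch everything for the base key, a reader queries each shard key and merges the results.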
Playbook seven: a critical workload must continue if a Region fails. The symptom is single-Region exposure. Risk is downtime. The minimal viable fix is a Multi-Region failover pattern using Route 53 health checks to redirect traffic if the primary Region fails. If the exam mentions “active/active,” think global tables or Global Accelerator; if it says “failover,” think Route 53 health checks.
Playbook eight: a service currently reaches AWS APIs over the internet. The symptom is unnecessary exposure. Risk is attack surface. The minimal fix is to add VPC endpoints or PrivateLink for private connectivity. The exam keywords “keep traffic private” or “no internet access” point to this answer.
Playbook nine: a media company wants to protect premium video content. The symptom is unauthorized downloads. The risk is revenue loss. The minimal fix is CloudFront with signed URLs or cookies, enforcing time-limited, user-specific access. On the exam, “temporary, secure distribution” always maps to signed URLs or cookies.
Playbook ten: a Step Functions workflow fails unpredictably. The symptom is incomplete executions. Risk is lost business processes. The fix is to implement retries with exponential backoff and catch handlers to redirect failures. The exam keywords “workflow reliability” or “handle errors gracefully” point to Step Functions retries and catches.
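To picture what that looks like in a state machine definition, here is a minimal sketch in Python of a single task state with a Retry block and a Catch block, plus a helper that works out the wait before each retry. The Lambda ARN, the state names, and the specific retry numbers are placeholders for illustration.

```python
# One task state from a hypothetical state machine definition.
task_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:111122223333:function:ProcessOrder",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,  # wait before the first retry
            "MaxAttempts": 3,      # number of retries, not counting the first try
            "BackoffRate": 2.0,    # multiplier applied to the wait after each retry
        }
    ],
    "Catch": [
        # If retries are exhausted, or another error occurs, go to a handler state.
        {"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}
    ],
    "Next": "Done",
}


def retry_delays(retrier):
    """Seconds waited before each retry attempt under exponential backoff."""
    interval = retrier.get("IntervalSeconds", 1)
    rate = retrier.get("BackoffRate", 2.0)
    attempts = retrier.get("MaxAttempts", 3)
    return [interval * rate ** n for n in range(attempts)]


delays = retry_delays(task_state["Retry"][0])
print(delays)
```

With these numbers the waits are two, four, and eight seconds. After the third retry fails, the Catch block sends the execution to the failure handler instead of ending it abruptly.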
Playbook eleven: a company plans a database migration with minimal downtime. The symptom is a long, risky cutover window. The risk is an extended outage. The fix is to rehearse with Database Migration Service (DMS), using Change Data Capture to keep the target in sync with ongoing updates, then perform a controlled cutover. The exam cue “migrate with minimal downtime” nearly always maps to DMS.
Playbook twelve: teams lack consistent visibility into system health. The symptom is fragmented monitoring. Risk is blind spots. The minimal viable fix is to establish an observability baseline: CloudWatch dashboards, key alarms, and logs collected centrally. For exam purposes, “monitor metrics and set alarms” always maps to CloudWatch, not ad hoc scripts.
Finally, remember the exam lens: always justify your choice with the least operational overhead and the highest safety. Managed, secure, cost-aware solutions are the ones AWS expects you to pick. Over-engineered answers are distractors, and insecure ones are traps. The right choice is the simplest fix that resolves risk and aligns with AWS best practices.
In conclusion, repeatable walkthroughs turn complex scenarios into fast, safe decisions. By practicing symptoms → risks → minimal fix, you’ll train yourself to move confidently through exam questions and real-world designs.
