Company guide — Amazon

Amazon Interview Questions — Leadership Principles, Bar Raiser, and What L5/L6/L7 Loops Actually Test

The honest map of how Amazon interviews — sixteen Leadership Principles assessed in every round, an independent Bar Raiser with override authority, and a behavioral signal that matters as much as your coding screen. Real questions, STAR-format answer guidance, and the follow-ups Bar Raisers actually ask.

Amazon's hiring loop is the most documented and the most misunderstood process in big tech. The core thing to internalize is this: every interviewer in the loop is rating you on Leadership Principles, regardless of whether the round is labeled "coding" or "system design" or "behavioral." A coding round at Amazon is partly a behavioral round, because the interviewer is also rating Customer Obsession (did you ask about the user?), Dive Deep (did you go past the surface?), and Deliver Results (did you actually solve the problem in the time available?). The 16 Leadership Principles are not a poster on the wall — they are the literal rubric on the feedback form, with a numeric score per principle per round.

Layered on top is the Bar Raiser. The Bar Raiser is an interviewer from outside the hiring team, trained to evaluate independently and given veto power on the hire decision. Even if the hiring manager loves you and four interviewers vote to hire, a single No from the Bar Raiser can sink the loop. This is by design: the Bar Raiser exists to ensure each new hire raises the average bar of the team they are joining. In practice, this means one round of your loop will go deeper, harder, and more behaviorally probing than any other — and you will not always know which round it was. Plan accordingly.

How Amazon interviews are structured

The standard SDE loop is five rounds, sometimes six for SDE III and above. Early-career candidates face an online assessment first; then comes a phone screen with the hiring manager or a senior engineer, then the onsite (now usually a virtual onsite) of four to six back-to-back interviews. Expect roughly: two coding rounds, one or two system design rounds (depending on level), one Bar Raiser-led behavioral round, and one round with the hiring manager focused on team fit and depth.

Behavioral content shows up in every round. A typical 60-minute round runs 10 to 20 minutes of LP-style behavioral questions up front, then the technical content, then 5 minutes of candidate questions at the end. The behavioral portion is not a warmup — it is scored. Many candidates over-prepare the technical rounds, under-prepare the LPs, and lose the loop on behavioral signal even after crushing the algorithms.

Amazon expects STAR format for behavioral answers: Situation, Task, Action, Result. Interviewers will literally interrupt and ask "what was the situation specifically?" if you start at the action. Build a bank of 10 to 15 stories from the past 18 to 24 months that map to the 16 LPs, with at least two stories per principle so you do not repeat the same project across rounds. Senior candidates (SDE III, L7) need stories with broader scope — cross-org impact, multi-quarter timelines, leadership through influence rather than authority.

Level expectations: SDE I (L4) is scoped to single-feature work with a strong coding bar. SDE II (L5) is the workhorse level — system design becomes serious, ownership of services is expected, and the LPs start being scored more critically. SDE III (L6) requires multi-team scope, ambiguity navigation, and measurable business impact in your stories. Principal (L7) is set apart by the ability to define strategy, raise the org-wide bar, and make the calls that leadership defers to.

Leadership Principles questions

One question per major LP, with STAR-format answer guidance and the Bar Raiser-style follow-ups to expect. These ten LPs come up most frequently — bring two stories for each.

Customer Obsession

Tell me about a time you put the customer first when it was inconvenient for your team.

Use a story where you pushed back on an internal preference because the customer outcome was clearly worse. Situation: pick a launch with measurable customer signal — NPS, support tickets, churn. Task: name the trade-off explicitly (faster ship vs. customer harm). Action: describe the specific mechanism you used to surface customer data — interviews, telemetry, a dogfood week — and the conversation where you said no to your own team. Result: quantify the customer impact you protected (tickets avoided, retention preserved) and what your team learned. The trap here is generic empathy stories; Amazon wants a moment where being customer-obsessed cost you something. End with the metric that proved it was the right call. Bar Raisers will follow up with: "What did you do that someone less customer-obsessed would not have done?"

Ownership

Describe a time you took on something far outside your scope to fix a problem nobody owned.

Pick a cross-team failure that was falling through the cracks — an on-call incident, a stalled migration, a deprecated dependency about to break production. Situation: explain why it was nobody's job. Task: clarify why you decided to own it despite no mandate. Action: walk through how you built coalition — which teams you pulled in, what artifact you created (doc, runbook, dashboard), how you escalated when blocked. Action also includes the unglamorous work: writing the postmortem, owning the rollback plan, doing the migration cleanup at midnight. Result: long-term outcome — did the problem stay solved? Did you transfer ownership to a permanent owner? Amazon's Ownership LP specifically calls out long-term thinking, so include the second-order effects: what process changed, what runbook now exists. Avoid stories where you took ownership of something already assigned to you.

Invent and Simplify

Tell me about a time you invented something new — a tool, a process, a system — that simplified a hard problem.

The bar here is genuine novelty within your context, not novelty against the global state of the art. Situation: describe a recurring pain point with a real cost (engineering hours per week, customer latency, error rate). Task: state why existing solutions did not work — you tried them or evaluated them. Action: walk through your invention — what you built, why the design was non-obvious, what you removed rather than added. Simplification matters: if your solution added complexity, this is the wrong story. Result: adoption metrics. Did other teams use it? Did it become the standard? Amazon weighs adoption heavily because invention without adoption is just side-project energy. Bar Raisers often probe: "What was the simplest version you considered, and why did you not pick that?"

Are Right, A Lot

Describe a decision you made with incomplete information that turned out to be correct — and how you knew.

This LP rewards calibrated judgment, not luck. Pick a moment where you had to decide before the data was conclusive. Situation: name the pressure (timeline, blast radius, stakeholder split). Task: enumerate the options you considered, including the one you rejected. Action: describe the heuristics, mental models, or analogies you used to break the tie — past on-call incidents, the shape of similar systems, customer behavior patterns. Crucially, mention the disconfirming evidence you actively sought before committing. Result: how you validated correctness afterward — A/B test, observed metric, a decision review. Strong answers also include a counterfactual: what you would have done differently with one more day of data, and whether that gap mattered. Avoid binary outcomes where you guessed and got lucky.

Have Backbone; Disagree and Commit

Tell me about a time you disagreed with your manager or a senior leader and how you handled it.

This is probably the single most asked LP question. The trap is choosing a story where you were right and the leader was wrong — that reads as ego. Pick a real disagreement on substance, not style. Situation: the decision and why you saw it differently. Task: clarify what was at stake — customer harm, technical debt, team morale. Action: describe how you escalated respectfully — written doc, 1:1 conversation, explicit request for a decision meeting, data you pulled together. Then describe the commit: even if the leader did not change their mind, how did you back the decision once it was made? Result: outcome of the project, and your relationship with the leader afterward. Bar Raisers want to see backbone (you spoke up) and commit (you did not sandbag execution). The strongest endings: "I still think my read was right, but the team needed alignment more than it needed me to be right."

Deliver Results

Tell me about a time you had to deliver results under tight constraints — timeline, budget, or scope.

Amazon's Deliver Results LP is about outcomes, not effort. Situation: name the constraint precisely. A two-week launch deadline. A frozen headcount. A regulatory date that could not slip. Task: describe what "delivered" actually meant — define the success metric the same way the business defined it. Action: this is the bulk of the answer. Walk through the specific trade-offs you made — what you cut, what you parallelized, where you took on calculated debt, how you sequenced. Mention the stakeholder management — who you communicated with, how often, and how you handled the moment something started slipping. Result: shipped, on time, with measured customer impact. Quantify everything you can. Then mention what you would do again and what you would not. Avoid stories where you delivered through pure heroics — Amazon respects mechanism, not martyrdom.

Insist on the Highest Standards

Describe a time you refused to ship something even though leadership wanted it shipped.

Use a story where you held the line on quality and the cost of delay was real. Situation: name the leadership pressure and the quality concern — a memory leak, a flaky test suite, a security review gap, a UX bug that would embarrass the team. Task: define your standard explicitly — what "good enough" looked like and why the current state did not clear it. Action: describe how you communicated up. Strong answers include the artifact you used — a clear written risk doc, a measured argument with specific failure modes, an offer of a smaller-scope alternative. Show that you did not just refuse; you offered a path. Result: what shipped, when, and what would have happened if you had not pushed back. Bar Raisers often follow with: "What would you have done if leadership still said ship?" Have an answer ready that does not violate the LP.

Earn Trust

Tell me about a time you had to rebuild trust with a peer or partner team after a mistake.

The strongest version of this story has you owning a real mistake — not a manufactured one and not a near-miss. Situation: what happened, what you broke, who got hurt. Task: name the trust deficit precisely — was the partner team going to stop integrating with you? Was your manager questioning your judgment? Action: walk through the repair work. The structure that lands: acknowledge clearly, take ownership without deflection, write the postmortem, propose the mechanism that prevents recurrence, then earn back trust through follow-through. Mention the small things — showing up to their standups uninvited, sending the weekly status update they did not ask for. Result: a measurable change in the relationship. Bar Raisers probe for the specific moment trust came back — was it a public acknowledgment, a shipped fix, a renewed partnership? Avoid stories where the trust break was not your fault.

Dive Deep

Walk me through a debugging or investigation moment where you went deeper than anyone else on the team.

This LP tests technical depth and intellectual honesty. Pick a story with multiple layers — a bug that had a wrong answer at every shallow layer. Situation: a production incident, a metric anomaly, a customer complaint nobody could reproduce. Task: clarify what was at stake and why surface-level debugging was insufficient. Action: walk through the investigation as a sequence of hypotheses and disconfirmations. "I assumed it was the cache, ran X, found Y, which ruled it out. Then I looked at the network layer..." The answer should sound like a story, not a list. Include the moment you found the actual root cause and why it was non-obvious. Result: the fix, the prevention mechanism, and what you taught the team. Bar Raisers love this question because it is hard to fake — they will ask you to draw the system on a whiteboard mid-story.

Frugality

Tell me about a time you accomplished significantly more with significantly fewer resources.

Frugality at Amazon is about constraint as a creativity multiplier, not just cost cutting. Situation: a project with a real resource gap — half the headcount you needed, no infrastructure budget, a third-party vendor you could not afford. Task: define the goal that did not change despite the constraint. Action: walk through the substitutions and reuse you found. Did you repurpose an internal tool? Did you negotiate scope with the customer? Did you build a cheaper version of an expensive system? The action should show inventiveness, not just suffering. Mention the moments you protected the team from the constraint — what you took on yourself so the team could move. Result: outcome relative to the original ask, plus what the cheaper approach taught you. Avoid stories that are really just "we worked harder."

Bar Raiser-specific questions

The Bar Raiser round goes deeper than the others. Expect probing follow-ups, scope challenges, and questions that test self-awareness as much as accomplishment.

Tell me about your biggest professional failure.

The Bar Raiser is testing self-awareness, not punishing failure. Pick a real failure — a project that missed, a hire that did not work, a launch that hurt customers. Avoid the clichéd "I worked too hard" deflection. Situation: name the failure directly and own it. Task: explain what you were trying to do and why it mattered. Action: walk through what went wrong, including the moments you saw warning signs and did not act. Crucially, name the specific thing you did wrong — not the team, not the org, not the timing. Then describe what you changed afterward: a habit, a mechanism, a written checklist, a different escalation pattern. Result: how you applied the lesson on the next project, with evidence. Bar Raisers will probe with "What would you do differently today?" — have a specific answer that proves the lesson stuck. The strongest version is a failure you would still own publicly today.

Walk me through your most complex technical project end-to-end.

This question is about scope, ownership, and depth all at once. Pick a project where you owned a non-trivial slice — ideally one with cross-team dependencies, real ambiguity, and measurable customer impact. Structure the answer in five parts. First, the customer or business problem (one minute). Second, the technical landscape — existing systems, constraints, alternatives you rejected (two minutes). Third, the design — diagrams in your head, key trade-offs, the specific decision points (three minutes). Fourth, the execution — the unglamorous parts: rollout plan, on-call coverage, rollback strategy, the weekend you spent migrating data. Fifth, the result — adoption, latency, cost, customer feedback. Throughout, the Bar Raiser will interrupt with depth probes: "Why not use Kafka instead?" "What was your write throughput?" Be ready to defend every choice with specifics. The trap is starting at architecture before establishing the customer problem.

Describe a time you raised the bar for your team — culturally or technically.

The Bar Raiser role is named for exactly this idea: every hire and every contribution should raise the average. Pick a moment where you set a new standard that outlived your involvement. Situation: the prior baseline — "our deploy frequency was once a week and breakage was common." Task: why you decided to push for change, even when it was not asked of you. Action: the mechanism. Did you write a doc, run a hackathon, refactor a critical service, mentor a new pattern, introduce a code review checklist that other teams adopted? The action must be a mechanism, not just an opinion. Result: durable change. Do other teams use it? Did it become the team's new normal? The strongest version: someone you mentored is now doing the same thing for the next person. Bar Raisers want to see leverage, not just personal contribution.

Tell me about a time you had to make a long-term decision that hurt short-term metrics.

This blends Ownership with Are Right, A Lot. Situation: a moment when the right thing to do for the next two years looked bad in the next two weeks — paying down debt, deprecating a feature with active users, rewriting a service mid-launch. Task: define the time horizon trade-off precisely. What did you give up in the short term, and what were you trying to protect in the long term? Action: walk through how you communicated the cost. Who needed to be aligned? What artifact did you produce — a written tenet, a roadmap, a stakeholder doc? Describe the moment leadership questioned the trade-off and how you held the line. Result: the long-term outcome, ideally with retrospective evidence that the short-term pain was worth it. If the long-term outcome has not played out yet, name the leading indicator that confirms the call.

Coding round questions

Amazon coding rounds skew toward arrays, strings, trees, and graphs. Expect one warm-up problem and one harder follow-up, with depth probes on time complexity and edge cases.

Given an array of integers, find the contiguous subarray with the largest sum. Return the sum.

Kadane's algorithm. Walk the array maintaining two running values: the best sum ending at the current index, and the global best. At each step, choose between extending the previous subarray or starting fresh at the current element. Time O(n), space O(1). Amazon interviewers expect you to first state the brute-force O(n^2) solution, then propose Kadane's, then explain why the local-vs-global recurrence is correct. Edge cases that come up: all-negative arrays (return the largest single element), empty arrays (clarify with the interviewer), and integer overflow on huge inputs. Common follow-up: return the actual subarray, not just the sum — track the start and end indices. Second follow-up: do it for a 2D matrix (this becomes a much harder O(n^3) problem using Kadane on each pair of row boundaries). Always narrate your trade-offs out loud; Amazon weighs problem-solving communication.
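
To make the local-vs-global recurrence concrete, here is a minimal Python sketch of Kadane's that also tracks start and end indices, which covers the first follow-up:

```python
def max_subarray(nums):
    """Kadane's algorithm: O(n) time, O(1) extra space.

    Returns (best_sum, start, end) so the 'return the actual
    subarray' follow-up is already handled.
    """
    if not nums:
        raise ValueError("empty array — clarify expected behavior with the interviewer")

    best_sum = current_sum = nums[0]
    best_start = best_end = current_start = 0

    for i in range(1, len(nums)):
        # Extend the previous subarray, or start fresh at nums[i] —
        # whichever gives the larger sum ending here.
        if current_sum + nums[i] < nums[i]:
            current_sum = nums[i]
            current_start = i
        else:
            current_sum += nums[i]
        if current_sum > best_sum:
            best_sum = current_sum
            best_start, best_end = current_start, i

    return best_sum, best_start, best_end
```

The all-negative case falls out naturally: the running sum restarts whenever extending would hurt, so the largest single element wins without special-casing.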

Design a function that takes a list of strings and groups anagrams together.

Two valid approaches. First, sort each string and use the sorted form as a hash key — O(n * k log k), where k is the average string length. Second, use a 26-element character-count tuple as the key — O(n * k). The second is faster for long strings; the first is simpler. Amazon interviewers will let you pick, but they want you to articulate the choice. Walk through the hash map mechanics, the bucket grouping, and the final flattening. Edge cases: empty strings (one group), Unicode strings (the 26-slot trick breaks; clarify the ASCII assumption), and case sensitivity (decide and state it). Common follow-up: what if the strings are very long and memory is constrained? Discuss streaming and external sorting. Second follow-up: "how would you parallelize this across machines?" — partition by hash key and reduce.
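
A minimal sketch of the count-key approach, assuming lowercase ASCII (state that assumption out loud before leaning on it):

```python
from collections import defaultdict

def group_anagrams(words):
    """Group anagrams using a 26-slot character-count key.

    O(n * k) for n words of average length k. Assumes lowercase
    ASCII input — the 26-slot array breaks on general Unicode.
    """
    groups = defaultdict(list)
    for word in words:
        counts = [0] * 26
        for ch in word:
            counts[ord(ch) - ord('a')] += 1
        # Tuples are hashable; lists are not — a classic slip here.
        groups[tuple(counts)].append(word)
    return list(groups.values())

# group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
# -> [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```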

Given a binary tree, return its level-order traversal as a list of lists.

Standard BFS with a queue. Initialize the queue with the root, then loop: for each level, snapshot the queue length, drain that many nodes, collect their values, and enqueue their children. Time O(n); space O(n) in the worst case, driven by the widest level. Amazon often follows with variations: zigzag traversal (alternate left-to-right and right-to-left per level), right-side view (last node of each level), or vertical-order traversal (a much harder problem requiring a sorted map keyed on column index). Edge cases: empty tree (return empty list), single-node tree, and skewed tree (the BFS queue stays small since each level holds one node, but the recursive-DFS alternative recurses as deep as the tree is tall — watch stack limits). Discuss recursive DFS as an alternative — pass the level as a parameter and append into the right bucket. Both work, but BFS reads more naturally for level-order. Be ready to draw the tree on a whiteboard and trace your code line by line.
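
A minimal sketch of the queue-based version — the level-size snapshot is the detail interviewers watch for:

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def level_order(root):
    """BFS level-order traversal: O(n) time, space bounded by the widest level."""
    if root is None:
        return []
    result, queue = [], deque([root])
    while queue:
        level_size = len(queue)   # snapshot BEFORE this level drains
        level = []
        for _ in range(level_size):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level)      # for zigzag: reverse every other level here
    return result
```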

Implement an LRU cache with O(1) get and put.

The canonical answer is a hash map plus a doubly linked list. The hash map gives O(1) lookup from key to node; the doubly linked list maintains recency order with O(1) insert and remove. On get, move the node to the head (most recently used). On put, insert at the head, and if capacity is exceeded, evict the tail. Walk through the pointer manipulation carefully — Amazon interviewers will ask you to code it cleanly, not just describe it. Edge cases: capacity of zero, updating an existing key (move and update value, not insert), and thread safety (mention but defer unless asked). Common follow-ups: "What if multiple threads access this concurrently?" Discuss a coarse mutex versus per-bucket locks versus a lock-free design. Second follow-up: "How would you make this distributed?" Now you are designing a sharded cache like Memcached, with consistent hashing for placement.
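
A compact sketch of the map-plus-list design; sentinel head and tail nodes keep the pointer surgery branch-free:

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """Hash map + doubly linked list; get and put are both O(1).

    Head side = most recently used, tail side = least recently used.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}
        self.head, self.tail = _Node(), _Node()   # sentinels
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._unlink(node)        # refresh recency
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if self.capacity <= 0:
            return                # degenerate capacity — clarify with interviewer
        if key in self.map:       # update in place; do NOT insert a duplicate
            node = self.map[key]
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            lru = self.tail.prev  # evict least recently used
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
```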

Given a directed graph, detect if it contains a cycle.

Use DFS with three colors: white (unvisited), gray (in the current recursion stack), black (fully processed). If during DFS you encounter a gray node, you have found a back edge and therefore a cycle. Time O(V + E), space O(V). Amazon interviewers want you to handle disconnected graphs — wrap the DFS in a loop over all nodes. Edge cases: self-loops (a single node with an edge to itself is a cycle), multi-edges (treat as one for cycle detection), and isolated nodes. Common follow-up: "What if the graph is undirected?" The three-color trick still works, but now you need to track the parent to avoid mistaking the edge you came in on as a back edge. Second follow-up: "What if the graph is too large for memory?" Discuss external-memory algorithms or streaming approximations. Also be ready to compare DFS-cycle-detection with topological sort: a topological sort exists if and only if no cycle exists.
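
A minimal sketch of the three-color DFS, assuming an adjacency-list dict; note the recursion-depth caveat on very deep graphs:

```python
WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(graph):
    """Three-color DFS cycle detection on a directed graph.

    graph: dict mapping node -> list of neighbors.
    O(V + E) time. Recursive, so very deep graphs need an
    iterative rewrite or a raised recursion limit.
    """
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GRAY                     # on the current recursion stack
        for neighbor in graph.get(node, ()):
            if color.get(neighbor, WHITE) == GRAY:
                return True                    # back edge -> cycle
            if color.get(neighbor, WHITE) == WHITE and dfs(neighbor):
                return True
        color[node] = BLACK                    # fully processed
        return False

    # Loop over all nodes so disconnected components are covered.
    return any(color[node] == WHITE and dfs(node) for node in graph)

# has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})  -> True
# has_cycle({"a": ["b"], "b": ["c"], "c": []})     -> False
```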

System design questions

System design at Amazon is heavily product-flavored. Interviewers expect you to ground every decision in customer impact, scale numbers, and trade-offs you can defend.

Design Amazon's product recommendation system.

Start by clarifying scope. Are we designing real-time on-page recommendations, email recommendations, or the full personalization platform? Pick a slice — for example, real-time "customers who viewed this also viewed" on a product page. Define functional requirements: query latency under 100ms, fresh enough to reflect recent activity, ranked by relevance. Non-functional: handle Amazon-scale read traffic, degrade gracefully on candidate-generation failures. Move to high-level architecture: an offline pipeline computing item-item co-occurrence and embeddings, a feature store keyed by user and item, a candidate generator (collaborative filtering plus content-based plus recently-trending), a ranker (a learned model over hand-crafted plus learned features), and a final business-rules filter (in stock, ships to user, not blocked). Discuss the data flow: clickstream into Kinesis, into a batch layer for training and a streaming layer for fresh signals. Talk trade-offs: cold-start users (fall back to popularity), exploration vs. exploitation (a small randomized slot), evaluation (offline NDCG plus online A/B). Bar Raisers will push on the cold-start, the eval methodology, and the latency budget breakdown.
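
The offline co-occurrence step is easy to sketch in miniature. The function below is a toy, single-machine stand-in for what would really be a distributed batch job; the normalization caveat in the docstring flags what it leaves out:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_candidates(sessions, top_k=10):
    """Toy version of the offline 'viewed X also viewed Y' step.

    sessions: iterable of per-user lists of viewed item IDs.
    Returns item -> top_k co-viewed items by raw count. A production
    pipeline would run this as a distributed job and normalize the
    counts (lift, cosine) so globally popular items don't dominate.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in combinations(set(session), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        item: sorted(neighbors, key=neighbors.get, reverse=True)[:top_k]
        for item, neighbors in counts.items()
    }
```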

Design a DynamoDB-style distributed key-value store.

This is a deeply Amazon-flavored question. Start by stating the core design choices DynamoDB makes: key-value with optional sort key, eventual or strong consistency at the user's choice, single-digit-millisecond latency at any scale, fully managed. Sketch the architecture: partitioning by hash of the partition key using consistent hashing, replication across multiple availability zones (typically three replicas), quorum-based reads and writes for tunable consistency. Walk through the write path: client hits a request router, request router locates the partition's primary, primary writes to a local log, replicates to followers, acknowledges based on consistency level. Walk through the read path with both consistency modes. Discuss conflict resolution — DynamoDB uses last-writer-wins with timestamps; Dynamo (the original paper) used vector clocks and reconciliation at read time. Cover repartitioning when a partition gets hot (split), and merging when traffic drops. Bar Raisers will ask about: hot keys, rebalancing during a split, the gossip protocol for membership, and how you handle a region outage. Reference the original Dynamo paper if you have read it.
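
Consistent hashing is the one component here worth being able to sketch cold. A minimal ring with virtual nodes, illustrating placement only — replication, gossip membership, and rebalancing are layered on top in a real system:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes.

    Virtual nodes smooth the key distribution so adding or removing
    one physical node only moves a small slice of the keyspace.
    """
    def __init__(self, nodes, vnodes=100):
        self.ring = []                          # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def nodes_for(self, key, n_replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self.ring, (self._hash(key), ""))
        replicas, seen = [], set()
        for i in range(len(self.ring)):
            _, node = self.ring[(start + i) % len(self.ring)]
            if node not in seen:
                seen.add(node)
                replicas.append(node)
                if len(replicas) == n_replicas:
                    break
        return replicas
```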

Design Prime Video's streaming architecture.

Clarify scope first — are we designing the playback path, the upload and transcoding pipeline, or both? Most interviewers want the playback path with a quick nod to ingest. Functional requirements: serve millions of concurrent viewers globally, adaptive bitrate, low startup latency, DRM-protected content. Non-functional: global CDN reach, graceful degradation when a region fails. High-level architecture: video ingest produces multiple bitrates and segments using HLS or DASH, segments and manifests are stored in object storage and pushed to a CDN with edge caching, the player fetches the manifest then segments in sequence, switching bitrate based on bandwidth probes. Discuss the control plane: catalog service, entitlement service (does this user have access in this region?), license service (DRM key issuance), watch-history service. Talk trade-offs: pre-positioning popular content to edge POPs vs. cache-on-demand, peer-assisted delivery for live events, the playback startup vs. quality trade-off. Bar Raisers love depth probes here: how does Apple HLS differ from DASH, what does ABR look like under flaky mobile networks, how do you handle an edge POP losing power mid-stream. Mention measurement: rebuffering ratio, video-start failure rate, average bitrate delivered.
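
A toy throughput-based rule shows the shape of the player-side ABR decision. Real clients — the HLS and DASH specs deliberately leave rate selection to the player — blend throughput estimates with buffer occupancy to avoid oscillation:

```python
def pick_bitrate(ladder_kbps, measured_kbps, safety=0.8):
    """Toy ABR rule: highest rendition that fits within a safety
    fraction of measured bandwidth.

    safety < 1 absorbs measurement noise; without it the player
    flip-flops between adjacent renditions on every probe.
    """
    budget = measured_kbps * safety
    candidates = [r for r in sorted(ladder_kbps) if r <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)

# pick_bitrate([400, 1200, 2500, 5000], measured_kbps=3000)
# -> 1200 (2500 exceeds the 2400 kbps safety budget)
```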

Map your LP stories in real time

PhantomCode listens to the question being asked and surfaces the most relevant LP framing, the STAR structure to follow, and the story from your prepared bank that fits — all without showing up on the interviewer's screen share. Built specifically for Amazon-style behavioral pressure where every round scores on Leadership Principles.

See the interview copilot

Related guides

  • All interview question banks
  • Google interview questions