PhantomCodeAIPhantomCodeAI
FeaturesMock InterviewDashboardJobsPricing
FeaturesMock InterviewDashboardJobsPricing
HomeInterview QuestionsBackend Engineer
Updated for 2026 backend loops

Backend Engineer Interview Questions — What You Actually Get Asked

Backend interviews look superficially similar to generic software-engineer loops — a coding round, a system design round, a behavioural round — but the bar inside each round is calibrated differently. Coding rounds skew toward concurrency, throughput, and data-structure choices that matter at server scale (rate limiters, LRU caches with TTL, top-K on a stream) rather than pure algorithm puzzles. System design is the heaviest weight — for L5+ backend roles it often takes two rounds — and the interviewer expects you to drive the back-of-envelope numbers, name specific datastores and their trade-offs, and have opinions on consistency, partitioning, and failure modes. The behavioural round screens hard for on-call maturity: how you reason about outages, post-mortems, and the long tail of operational work that backend engineers actually own. What separates a passing answer from a no-hire is not whether you know the right architecture — there is rarely one right architecture — but whether you can articulate the trade-offs at the level of a system, name what you would measure to validate the choice, and own the consequences when the choice turns out to be wrong. The questions below come from real loops at companies with strong backend hiring bars and are written for L5+ depth; for junior backend roles you can skip some of the L5 follow-ups and focus on the core implementation.

How backend interview loops are structured

  • Coding rounds, but server-shaped. Expect problems where the data structure choice is the point — sliding windows, heaps, LRU caches, rate limiters, concurrent maps. Pure algorithmic puzzles (DP on grids, tricky tree traversals) are less common than in generalist loops. Talk about thread safety unprompted; it signals you have written real server code.
  • Two system design rounds at L5+. One is usually a big system (notifications, payments, feed, search), the other a focused subsystem (rate limiter, idempotency layer, cache invalidation). The big one tests breadth; the focused one tests depth. Both require capacity math — run the numbers out loud.
  • Database round, sometimes embedded. Many backend loops include explicit DB questions: index design, query plans, when to denormalise, sharding strategies, consistency choices. If the JD mentions a specific datastore (Postgres, DynamoDB, Spanner, Cassandra), expect questions on its quirks.
  • On-call and operations show up everywhere. The behavioural round always has an outage question. Even coding rounds quietly probe operational thinking — "what happens if this service restarts mid-request?" The senior signal is treating the running system, not just the code, as the artefact you own.

Coding round Q&A

Backend coding rounds reward implementations that are correct under concurrency and efficient under load. Talk about locks, atomicity, and big-O before you start typing.

Q1. Implement a rate limiter that supports 1000 requests/sec per user, with burst handling, in Go (or Python/Java — your choice).

There are three standard approaches and the interviewer wants you to compare them before you write code. Fixed window counter is the simplest — bucket requests by floor(now / window) and increment a counter — but it allows 2× the limit at the window boundary (999 requests at second 0.99, another 1000 at second 1.01). Sliding window log keeps a sorted list of timestamps per user and drops anything older than the window; it is exact, but memory grows with QPS. Token bucket is the standard backend answer because it handles bursts naturally: each user has a bucket of capacity B that refills at R tokens/second, every request consumes one token, requests are rejected when the bucket is empty. To allow short bursts above 1000/sec, set B = 2000 and R = 1000. Implementation-wise, on each request you compute tokens_now = min(B, tokens_last + (now - last_refill) * R), check tokens_now >= 1, decrement, and store. Wrap the read-modify-write in an atomic or a mutex per user — sync.Mutex in Go, AtomicLong with CAS in Java, asyncio.Lock per key in Python. For a single instance, an in-memory map keyed by user_id works. The L5 follow-up is always 'now make it distributed' — and the answer is Redis with a Lua script that does the read-modify-write atomically (INCR plus EXPIRE is not sufficient because it is fixed-window, not token bucket). Lua keeps the script server-side so the round-trip is one RTT. The harder L5+ follow-up is 'what if Redis is a hot spot' — answer is local token bucket with periodic reconciliation: each instance gets a share of the global rate (1000/N), borrows extra tokens from a global pool every ~1s, and accepts that under skewed traffic you allow ~10% over-quota briefly. Probabilistic approaches (request a token from Redis with probability p, locally allow with 1-p) trade accuracy for fewer Redis hits.

Q2. Given a stream of incoming HTTP requests with timestamps, design a function that returns the top-K most-requested endpoints in the last 5 minutes, with O(log K) per insert and O(K) per query.

The right shape is a sliding window plus a min-heap of size K plus a hashmap from endpoint to count and to its heap position. On each request: append (endpoint, timestamp) to a global deque, increment hashmap[endpoint], then update the heap. The heap stores (count, endpoint). If endpoint is already in the heap, sift it up since its count increased; if not, and the heap has fewer than K entries, push it; if the heap is full and the new count exceeds the min, pop and push. Updating an arbitrary heap entry is O(log K) only if you keep an index map from endpoint to heap position — a plain heap will not support it. The sliding window expiry runs lazily: before answering a query, pop from the front of the deque while the front timestamp is older than now - 300s, and decrement hashmap[endpoint] for each popped entry; if a decremented endpoint is in the heap, sift it down (or remove if it drops below the min). Query returns the heap contents sorted by count — O(K). Memory is O(N) for the deque and O(M) for the hashmap where M is distinct endpoints. The L5+ follow-up is the high-QPS case where you cannot afford O(1) per request anymore — answer is count-min sketch for the per-endpoint counts (sublinear memory, slight overcounting) combined with a heavy-hitters algorithm like Misra-Gries or the SpaceSaving algorithm. For approximate top-K at very high cardinality, a t-digest or HyperLogLog++ for distinct counts also comes up. Mention that the bucketed-window variant (one bucket per 10 seconds, 30 buckets sliding) is cheaper than per-request expiry if approximate freshness is acceptable.

Q3. Write a function that idempotently processes a payment given a request ID. Discuss what 'idempotent' means in a distributed system.

Idempotent means: calling the operation N times has the same effect as calling it once. In a distributed system this matters because the network will retry — a client times out, a load balancer retries, a queue redelivers — and you must not charge the card twice. The canonical pattern: the client generates an idempotency_key (UUID), sends it as a header on every retry of the same logical request. The server has an idempotency table with columns (key PRIMARY KEY, user_id, request_hash, response, status, created_at). On request arrival: SELECT FROM idempotency WHERE key = ?. If a row exists with status='completed', return the stored response without re-executing. If status='in_progress', return 409 Conflict (or block on it). If no row exists, INSERT with status='in_progress' inside the same transaction as the payment write — if the INSERT fails on the unique constraint, another request won the race and you should re-read. The inserted-then-failed problem is the real trap: you insert the idempotency row, the payment side-effect succeeds, then the process crashes before you write status='completed' and the response. On retry, the row exists but is 'in_progress' forever. Two fixes: (1) timeout-based recovery — treat 'in_progress' older than N seconds as failed and re-execute, accepting the risk; (2) make the payment side-effect itself idempotent at the downstream layer (Stripe accepts an idempotency key too — pass it through). Strong answers mention idempotency at three layers: HTTP header (client-generated key), DB unique constraint (the key column), and downstream provider key. For event publishing alongside the DB write, use the transactional outbox pattern — write the event row in the same transaction as the payment, then a separate poller ships it to Kafka with at-least-once delivery, and consumers dedupe by event_id. Trade-offs: storing idempotency rows costs storage (~hundreds of bytes per request, TTL them after 24h); the lookup adds one DB query per request; the alternative — relying on a UUID in the payment table with a unique constraint — works but loses the ability to return the original response on retry.

Q4. Implement an LRU cache with TTL. Discuss thread safety.

Standard LRU is a HashMap from key to a doubly-linked-list node, plus a doubly-linked list ordered by recency with sentinel head and tail. get(k): look up the node, unlink and re-link at the head, return its value. put(k, v): if k exists, update and move to head; otherwise create a node, insert at head, and if size > capacity evict the tail and remove its key from the map. Adding TTL: each node also stores expires_at = now + ttl. On get(k), if expires_at <= now, evict it and return miss. For proactive expiry (so expired entries do not sit in memory until accessed), add a min-heap keyed by expires_at, or a hashed timer wheel for O(1) per-tick expiry with bounded resolution. A background goroutine/thread pops the heap and evicts expired entries every few hundred ms. Trade-off: heap-based expiry is O(log N) per insert; timer wheel is O(1) but adds bucket-resolution slop. Thread safety: a single mutex around all operations is fine for an interview — write it explicitly so the interviewer sees you thought about it. The L5+ follow-up is 'now make it scale to high concurrency'. Three real options: (1) striped locking — shard by hash(key) % N into N independent LRUs each with its own mutex, contention drops by N×, at the cost of slightly inaccurate global LRU ordering (good enough in practice — this is what Guava and Caffeine do); (2) lock-free with a concurrent linked hash map — Java's ConcurrentLinkedHashMap or Caffeine, which use a write buffer that batches LRU reorderings off the hot path; (3) RCU-style optimistic concurrency for read-heavy workloads. Mention the eviction trade-off when TTL expiry and capacity eviction both fire simultaneously — the natural ordering is 'evict expired first, then LRU' so you do not waste capacity holding stale entries. Also flag the thundering-herd problem when a hot key expires: solutions are request coalescing (single-flight pattern, where the first miss takes the lock to recompute and other callers wait on the same future) or probabilistic early expiration (refresh-ahead — refresh with probability p as the entry approaches its TTL).

System design Q&A

Start with requirements and capacity numbers before drawing boxes. Name specific datastores, queues, and protocols — generic "use a database" reads as junior.

Q1. Design a notification service that handles 100M users, supports push/email/SMS, and guarantees delivery.

Start with the ingestion layer. Producers (other internal services — billing, social graph, marketing) write notification requests onto a Kafka topic partitioned by user_id. Kafka absorbs traffic spikes and decouples producers from the delivery pipeline. The notification service reads from Kafka in worker pools. For each message, it looks up the user's channel preferences from a sharded DB (user_id → {push: true, email: true, sms: false, quiet_hours: 22-08}) — this should be cached in Redis with a write-through pattern since preferences are read every notification and rarely written. The routing layer fans out: if push is enabled, queue to the push pipeline; if email, to the email pipeline; if SMS, to the SMS pipeline. Each pipeline has provider integrations — FCM and APNS for push (mobile platforms differ; you need device-token storage with the user, and you must handle token rotation and invalidation responses), SES or SendGrid for email, Twilio or MessageBird for SMS. Providers fail and rate-limit, so each pipeline needs: exponential backoff with jitter on transient failures (5xx, 429), a circuit breaker per provider to fail fast when a provider is down, a dead-letter queue for messages that exhaust retries (humans inspect these). Deduplication is critical because Kafka is at-least-once: dedupe by (user_id, notification_id) in a short-TTL Redis set or by a unique constraint on a deliveries table. Idempotency keys flow through to the provider where supported (FCM accepts a message_id you can replay safely). The 'guarantees delivery' question is where most candidates stumble — be honest. At-least-once is the default and achievable: write the request durably, retry until provider acknowledges. Exactly-once across third-party providers is impossible — the SMS provider may charge you and have its own retries; you cannot make a downstream system idempotent that isn't. 'Delivered' vs 'sent' is also different: 'sent' means you handed it to the provider; 'delivered' means the device or mailbox confirmed receipt, which arrives asynchronously via webhook (FCM delivery receipts, SES bounce/complaint webhooks). Store both states on the delivery row. Capacity: 100M users × ~5 notifications/day average = 500M/day = ~6K/sec average, with peaks of 50–100K/sec during marketing blasts — partition Kafka with enough partitions to parallelise workers (rough rule: one partition per 1–2K/sec per consumer). The harder problems: quiet hours and per-channel rate limiting (do not send three notifications in five minutes — bucket per user with a sliding window), and ordering (the user wants the 'message received' notification before 'message read' — partition Kafka by user_id to preserve per-user ordering, accept that across-user ordering is undefined).

Q2. Design a payment system that handles millions of transactions per day, supports refunds and chargebacks, and is auditable.

The architectural choice that everything else flows from is the double-entry ledger. Every transaction is recorded as two journal entries that sum to zero: a charge debits the customer's account and credits revenue; a refund credits the customer's account and debits revenue. Ledger entries are append-only — you never update them, you post a reversing entry. This is non-negotiable for auditability and is how every real payments system is built (Stripe, Square, Adyen). Schema sketch: accounts(account_id, type, currency) and journal_entries(entry_id, transaction_id, account_id, amount, direction, posted_at). The sum of all entries for a transaction must be zero — enforce in app logic and verify with a periodic reconciliation job. API layer: every endpoint that mutates money takes an idempotency_key and stores the response keyed by it, so retries are safe. For external integrations (card networks, bank rails) use the two-phase authorize → capture pattern. Authorize reserves funds on the card and returns a token; capture (later, possibly minutes or days) actually moves the money. This lets you cancel cleanly if a downstream step fails — the auth expires on its own. The multi-service flow (reserve inventory + charge card + send confirmation + grant access) needs the saga pattern: each step has a compensating action, and a coordinator (or a choreographed event chain) walks the steps. If step 3 fails, compensations for steps 1 and 2 run. Sagas are eventually consistent — for the brief window between steps the system is inconsistent, which is acceptable for payments because everything is reconcilable from the ledger. Event sourcing fits naturally: every state change is an event (PaymentInitiated, PaymentAuthorized, PaymentCaptured, PaymentRefunded, PaymentDisputed), stored in an append-only event log. The current state of any transaction is a fold over its event history. This gives you the audit trail for free — auditors and finance can replay any transaction's full history. It also gives you point-in-time reconstruction for disaster recovery: replay the event log up to timestamp T to get the system's state at T. Refunds: a refund is a new transaction that posts reversing entries against the original — never delete or modify the original. Chargebacks: when the card network notifies you of a dispute, post a ChargebackInitiated event, freeze the linked transaction (do not block other transactions on the same card — that is a UX disaster), and respond within the network's deadline (usually 30 days). Track win/loss outcomes for product analytics. Sharding strategy: partition by account_id (merchant in a marketplace context, user in a P2P context) so a single account's history is on one shard and reads are fast. Cross-account queries (treasury, reporting) run against a separate analytics warehouse fed by CDC. Don't shard by transaction_id — you lose locality. Storage: Postgres or CockroachDB for the ledger (ACID is mandatory), Kafka for the event bus, S3 for cold event archives older than ~90 days. Reconciliation job runs nightly: sum every account's entries, compare to provider statements (Stripe payouts, bank statements), flag discrepancies. Discrepancies happen — provider fees, FX rounding, partial captures — and a human reviews them. Also discuss: PCI scope (do not store raw card numbers, use the provider's vault and store only tokens; this collapses your PCI footprint from PCI-DSS Level 1 to SAQ A), currency handling (store all amounts as integer minor units — cents, paise — never floats), and the latency budget (auth path must complete in ~3s end-to-end or the card network times out).

Q3. Design a global URL shortener (bit.ly clone) that handles 10B URLs and 100M redirects/day with sub-50ms latency globally.

Two distinct paths with very different characteristics: the write path (create short URL) is rare and tolerates ~100ms; the read path (redirect) is hot and must be sub-50ms p99 globally. Optimise them separately. ID generation: avoid collisions by using a counter-based ID instead of hashing. A Snowflake-style ID is 64 bits (timestamp + datacenter + sequence) generated in-process — no coordination per request. Base62 encode (0-9, a-z, A-Z) to get a 7-character short URL — 62^7 ≈ 3.5 trillion, enough for 10B with massive headroom. Hashing the long URL (truncating SHA-256 to 7 chars) is simpler but has collision risk that grows fast at 10B scale — birthday collision probability becomes meaningful by ~100M URLs at 7 chars, and resolving collisions on insert means a read-modify-write round trip. Snowflake-style avoids the problem entirely. Storage: a globally replicated DB. Three real choices — CockroachDB or Spanner (strongly consistent, expensive, hard to operate globally), DynamoDB Global Tables (eventually consistent across regions, fully managed, cheaper), or a homegrown solution with per-region writes and async replication. For a URL shortener, eventual consistency is acceptable — a brand-new short URL might 404 for ~1s in a far region, which users tolerate. DynamoDB Global Tables is the pragmatic choice. Schema: PK = short_code, attributes = long_url, owner_id, created_at, expires_at, custom_slug_flag. Index on owner_id for the user's dashboard. Read path (the hot path): user hits short.ly/abc1234, request lands at the nearest CDN PoP via anycast. CDN looks up abc1234 in its cache — if hit (~95% hit rate after a few hours of warmup), serve a 301 redirect immediately, ~10ms. If miss, CDN fetches from the regional DB read replica with stale-while-revalidate, caches the response with a long TTL (24h+), serves the redirect. Set Cache-Control: public, max-age=86400 on the redirect response so the CDN and intermediate caches respect it. Use 301 (permanent) not 302 (temporary) because 301 is cacheable by browsers — repeat clicks never hit your infra at all. Edge cases: you cannot use 301 if the URL might change (custom slugs being remapped) or expire — use 302 in those cases and accept the cost. Analytics pipeline (the second hot path): every redirect fires an async event to Kafka (short_code, timestamp, geo, user_agent, referrer) — non-blocking, do not delay the redirect on Kafka write failure. Kafka → ClickHouse or Druid for click analytics. ClickHouse is the standard choice for this workload — columnar, fast aggregations, cheap storage. Dashboards query ClickHouse with sub-second response on billions of rows. Collisions are non-issue with Snowflake IDs. Custom slugs are user-chosen short codes (short.ly/myevent2026) — enforce uniqueness with a conditional insert (INSERT IF NOT EXISTS) and return a clean error on conflict. Expiry: cheap expiration at 10B scale is the real challenge — you cannot scan the table. Solutions: (1) DynamoDB TTL deletes expired rows automatically in the background, cheap and good enough; (2) for stricter freshness, partition by expiry-week and drop entire partitions; (3) for free-tier abuse mitigation, do lazy expiry on read — check expires_at when serving the redirect, return 410 Gone if expired, and let TTL clean up later. Abuse: spammers will create millions of short URLs pointing at malware. Mitigations — rate limit creation per IP/user, scan target URLs against Google Safe Browsing on creation and periodically thereafter, allow user reports, and serve a warning interstitial for flagged URLs. Capacity: 100M redirects/day = ~1.2K/sec average, peaks ~10K/sec, easily handled by a single CDN tier. 10B URLs at ~200 bytes each is 2 TB — trivial for any cloud DB. The hard cost is the analytics pipeline at sustained 1K+ events/sec.

Behavioural Q&A

Backend behavioural rounds always include an outage question. The interviewer wants specifics — tools, timestamps, the actual post-mortem action items you owned.

Q1. Tell me about an outage you owned end-to-end.

The interviewer is grading four phases: detection, diagnosis, mitigation, and follow-through. Strong answers walk through all four with specific tools and timestamps. Detection: how you found out — pager from an alert (which alert, what threshold), customer reports, a graph you happened to glance at. The bonus signal is whether you owned the alert quality: 'we had a synthetic check on the checkout flow that fired within 90 seconds of the regression, which is how we caught it before customer reports'. If detection was slow, own it — 'we did not have alerting on that path and learned about it from a tweet' is a legitimate answer if you fixed the gap afterward. Diagnosis: name the tools. Logs in Datadog or Splunk, metrics in Prometheus or Cloudwatch, distributed traces in Honeycomb or Jaeger, profiles in Pyroscope. Walk through the funnel — 'error rate spiked on the checkout service, traces showed the slow span was the inventory call, logs on the inventory service showed connection pool exhaustion, metrics confirmed DB CPU was at 100%'. The signal here is whether you have a reproducible debugging method or you flail. Mitigation: what stopped the bleeding. Rollback is the most common and the right default — 'we reverted the deploy, error rate dropped within 90 seconds'. Other mitigations: shifting traffic to a healthy region, scaling up capacity, disabling a feature flag, restarting a stuck process. Be precise about the time to mitigate (TTM) — 'detection to mitigation was 12 minutes' beats 'we fixed it quickly'. Post-mortem: this is the senior signal and where most candidates underweight. Mention the document — every mature org runs blameless post-mortems with structured sections (timeline, contributing factors, what went well, what went poorly, action items with owners and dates). The senior version: 'the immediate fix was the rollback, but the post-mortem surfaced that the connection pool size was hardcoded — I owned the action item to make it dynamic and added a load test that would have caught the regression in CI'. Concrete action items shipped, not just filed. Avoid blaming individuals — the failure was systemic (insufficient alerting, missing test coverage, brittle abstraction). Even if a person made a mistake, the systemic fix is what to change. L6 candidates should also mention organisational changes — 'we changed our deploy process to canary every change to 1% for 30 minutes before fleet-wide' — not just code fixes.

Q2. Describe a system you designed that turned out to be wrong.

This is the inverse of 'tell me about your biggest accomplishment' and it is much harder to answer well because it requires real self-awareness, not a polished narrative. Pick a real architectural mistake — not a small bug, not 'we picked the wrong library', but a design-level commitment that you owned and that failed in a meaningful way. The shape: (1) you made a design decision with imperfect information — describe what you knew and did not know at the time, and why the decision seemed reasonable; (2) the failure mode emerged later — usually one of: scale (the design did not survive 10× growth), an edge case (a rare user behaviour broke the model), or ops complexity (the on-call burden was higher than anticipated); (3) you owned the corrective action — sometimes a partial rewrite, sometimes a hard migration, sometimes 'we ate the cost for a year while we built the replacement'. The reflective close is the most important part: what habit changed afterward. 'Now I always prototype the failure case before committing to the design — I write the disaster scenario as a one-pager and review it with two skeptics' is the kind of close that lands. 'Now I do more upfront design' is too vague. Concrete examples that work: chose eventual consistency for a flow that turned out to need strong consistency, leading to user-visible weirdness that took 18 months to migrate off; designed a microservice boundary that cut across a transaction, leading to constant distributed-state bugs; built a config-driven system that became unmaintainable when configs grew past ~50 entries; over-indexed on a single dimension (latency, cost, simplicity) and missed the cost on the others. Avoid trauma-dumping — the interviewer does not need every gory detail, they need to know you have the capacity for unflinching post-hoc analysis. Avoid 'it turned out fine in the end' — that defeats the point of the question. The strongest signal is calmness about it: you describe the mistake without defensiveness, you took the lesson, you moved on.

Q3. Tell me about a time you simplified a system.

This is the opposite of 'I built something big' and an equally important signal for senior engineers — sometimes more important, because the bias in tech is toward addition. The interviewer is checking whether you can recognise complexity that is not paying its way and reduce it. Strong answers describe replacing a multi-service flow with a simpler one (collapsing three services back into one when the boundaries were wrong), deleting code or services (sunsetting a feature that nobody used, removing a config option that nothing read), merging components that had grown apart (consolidating two near-duplicate auth libraries into one), or replacing a complex custom system with a simpler off-the-shelf one (ripping out a homegrown job queue for SQS). The measurable outcome is what makes the answer concrete: fewer pages on the on-call rotation (from 12/week to 3/week), less ops burden (no more weekly cert rotation), easier onboarding (new engineers ship in week 1 instead of week 4), reduced infra cost (cut compute spend by 40%), faster builds (CI dropped from 22 minutes to 6). Numbers land. Process: how you got buy-in matters. Simplification almost always crosses team boundaries — 'I want to delete the service your team depends on' is a hard conversation. Walk through how you mapped the dependencies, found the actual consumers (often half the listed consumers are dead), negotiated a deprecation window, wrote a migration guide, and shepherded the last laggards. Mention the artefact — the migration doc, the deprecation timeline, the decision record explaining why this complexity was no longer worth keeping. The senior reflection: 'I look for the patterns — when I see the same kind of bug filed three times against the same module, that is usually a signal the abstraction is wrong, and the fix is to simplify the underlying model, not to keep patching'. Avoid vague claims — 'I removed legacy code' with no specifics is what every candidate says. Name the system, the specific complexity, the change you made, and the result. Bonus signal: mention what you considered removing and decided not to — sometimes the complexity exists for a real reason that is not obvious until you look closely, and recognising that is also a senior skill.

Prep the way you'll actually be tested

Reading sample answers is preparation. The loop itself is performance under pressure — and backend rounds in particular reward thinking out loud about trade-offs that you cannot rehearse cold. Drill coding under realistic conditions with PhantomCodeAI's coding copilot so you build the habit of narrating your data-structure choices and concurrency reasoning. For the system design half of the loop, run live drills against the prompts above with system design mock interviews — the transcript afterward shows exactly where you skipped capacity math, hand-waved a consistency trade-off, or named a datastore without justifying it. That is what the real interviewer is grading.