Airbnb's interview loop is structured like most senior software engineering loops (coding, system design, behavioral), but with two distinctive tilts that shape how you should prepare. The first is the technical center of gravity: Airbnb is, mechanically, a marketplace built on top of a search and ranking system. Almost every product surface (the search page, the trip planner, the host pricing tool, the trust and safety queue) is some variant of "rank these candidates for this query." That means the coding and system design rounds skew toward search, ranking, marketplace dynamics, and ML-adjacent infrastructure, even for backend engineers not on a search team. The second is the behavioral round, which Airbnb calls "core values." It is heavily weighted in the loop, an explicit hire/no-hire signal on its own (not just a tiebreaker), and run by an interviewer trained specifically to evaluate alignment with the company's named values. Candidates who treat it as a polite chat about teamwork miss the bar. The seniority bar at Airbnb is relatively high: the company runs a smaller engineering org than its FAANG peers and intentionally maintains a senior-heavy distribution. That shows up in the loop as deeper system design rounds and a behavioral round that explicitly probes for ownership and cross-functional leadership, not just collaboration.
Two coding rounds, often marketplace-flavored. The questions below match the actual shape of what Airbnb asks — search, ranking, pricing, attribution — rather than generic algorithm puzzles.
Three viable approaches and Airbnb actually uses a hybrid. A trie keyed on lowercased city/neighborhood names gives O(L) prefix lookup where L is the query length, and you store the top-K most-popular completions at each internal node so lookup is constant-extra after the descent — clean and fast, but a single typo breaks it (the user types ‘nwe york’ and gets nothing). An n-gram inverted index (bigrams or trigrams of every location name) handles typos naturally — score candidate matches with Jaccard similarity or BM25 over the n-gram overlap, take the top-K — but it is slower than the trie and the scoring needs tuning to push popular cities ahead of small neighborhoods that happen to overlap more n-grams. The modern third option is dense embeddings: encode every location name with a small text encoder, encode the user query with the same encoder, run ANN search (HNSW) over the location index. Embeddings give you semantic matches for free (‘the city of lights’ → Paris, ‘NYC’ → New York), but they need an offline training corpus and are overkill for the common case. The pragmatic production design is: trie for exact prefix (covers ~80% of queries), n-gram fallback when the trie returns fewer than K results (covers typos), and a learned re-ranker on top that blends popularity, personalization (the guest’s previous searches), and seasonality. The L5 follow-up is multi-lingual and synonyms: Airbnb operates in 60+ languages, so you maintain a synonym map (‘NY’ → ‘New York’, ‘NYC’ → ‘New York City’ → bridges to neighborhoods like ‘Manhattan’, ‘Brooklyn’) and an alias index keyed by language. State the hybrid out loud early — interviewers grade the trade-off discussion more than the implementation.
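A minimal sketch of the trie-plus-n-gram part of the hybrid described above, assuming an in-memory dict of location names to popularity scores. The class name `Autocomplete`, the constant `K`, and the sample data are illustrative assumptions, not Airbnb's implementation; the learned re-ranker, synonym map, and embedding path are left out.

```python
from collections import defaultdict

K = 3  # completions cached per trie node (assumed value)


class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []  # up to K (popularity, name) pairs, most popular first


class Autocomplete:
    def __init__(self, locations):
        """locations: dict mapping display name -> popularity score."""
        self.root = TrieNode()
        self.popularity = dict(locations)
        self.trigram_index = defaultdict(set)  # trigram -> names containing it
        for name, pop in locations.items():
            self._insert(name, pop)
            for gram in self._trigrams(name):
                self.trigram_index[gram].add(name)

    @staticmethod
    def _trigrams(text):
        padded = f"  {text.lower()} "
        return {padded[i:i + 3] for i in range(len(padded) - 2)}

    def _insert(self, name, pop):
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((pop, name))
            node.top.sort(key=lambda pair: (-pair[0], pair[1]))
            del node.top[K:]  # keep only the K most popular completions

    def _prefix(self, query):
        node = self.root
        for ch in query.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        return [name for _, name in node.top]

    def _fuzzy(self, query):
        q = self._trigrams(query)
        candidates = set()
        for gram in q:
            candidates |= self.trigram_index.get(gram, set())
        scored = []
        for name in candidates:
            grams = self._trigrams(name)
            jaccard = len(q & grams) / len(q | grams)
            # Sort by similarity first, popularity as the tie-breaker.
            scored.append((-jaccard, -self.popularity[name], name))
        return [name for _, _, name in sorted(scored)]

    def suggest(self, query):
        results = self._prefix(query)
        if len(results) < K:  # trie came up short: fall back to trigram overlap
            for name in self._fuzzy(query):
                if name not in results:
                    results.append(name)
                if len(results) == K:
                    break
        return results
```

With sample data such as `{"New York": 100, "New Orleans": 60, "Newark": 40, "Paris": 90}`, the prefix `new` is served entirely from the trie, while the typo `nwe york` misses the trie and is recovered by the trigram fallback.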
Pragmatic ranking problem and the interview answer is mostly about feature engineering, not the sort. Step one is the hard filter — drop listings that don’t match location radius, don’t have all the requested dates available, fail the price ceiling, or lack must-have amenities (pets, wifi). Step two is the scoring function. As a baseline, write a linear score: w1 × location_score (inverse distance from the query centroid) + w2 × price_score (where in the local price percentile this listing sits — guests often prefer mid-range over the cheapest) + w3 × amenity_match_count + w4 × listing_quality (rating × log(review_count) — Bayesian-smoothed for new listings) + w5 × host_response_signal. State explicitly that in production this linear function is replaced by a gradient-boosted tree (XGBoost or LightGBM) trained on click and booking labels, using the same features plus interaction features (price × season, amenity × guest_segment). The cold-start problem matters: new listings have no booking history, so you fall back to content-based features (photo quality, description completeness, host superhost status) and give them a small exploration boost so the ranker can learn. The personalization layer sits on top — a separate model takes the top-100 listings from the base ranker and re-orders them based on the guest’s previous bookings (same destination type, same price band, same group size). Mention the eval setup: offline NDCG@10 against held-out booking sessions, online A/B with bookings-per-search as the north-star metric. The L5 signal is naming the train/serve skew problem — features computed offline must match what the serving path computes, or your offline NDCG numbers lie.
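The filter-then-score baseline above might be sketched as follows. The weights, the Bayesian prior, the `Listing` fields, and the exact shape of each sub-score are assumptions chosen for illustration; a production system would learn them, as the text notes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    listing_id: str
    distance_km: float         # distance from the query centroid
    nightly_price: float
    amenities: frozenset
    rating: float
    review_count: int
    host_response_rate: float  # 0..1
    available: bool            # all requested nights free


# Hypothetical hand-set weights (w1..w5 in the text).
WEIGHTS = {"location": 3.0, "price": 1.0, "amenity": 0.5, "quality": 1.0, "host": 0.5}
PRIOR_RATING, PRIOR_COUNT = 4.5, 10  # assumed prior for smoothing new listings


def passes_hard_filter(listing, radius_km, max_price, must_have):
    return (listing.available
            and listing.distance_km <= radius_km
            and listing.nightly_price <= max_price
            and must_have <= listing.amenities)


def quality(listing):
    n = listing.review_count
    # Shrink the rating toward the prior when there are few reviews.
    smoothed = (PRIOR_RATING * PRIOR_COUNT + listing.rating * n) / (PRIOR_COUNT + n)
    return smoothed * math.log(PRIOR_COUNT + n)


def rank(listings, radius_km, max_price, must_have, nice_to_have=frozenset()):
    candidates = [l for l in listings
                  if passes_hard_filter(l, radius_km, max_price, must_have)]
    prices = [l.nightly_price for l in candidates]

    def score(l):
        percentile = sum(p <= l.nightly_price for p in prices) / len(prices)
        price_score = 1.0 - abs(percentile - 0.5) * 2.0  # peaks at mid-range
        return (WEIGHTS["location"] / (1.0 + l.distance_km)
                + WEIGHTS["price"] * price_score
                + WEIGHTS["amenity"] * len(nice_to_have & l.amenities)
                + WEIGHTS["quality"] * quality(l)
                + WEIGHTS["host"] * l.host_response_rate)

    return sorted(candidates, key=score, reverse=True)
```

Keeping the hard filter separate from the score makes it easy to swap the linear `score` for a tree model later without touching the eligibility rules.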
Walk through the inputs first because they’re messier than they look. Per-listing pricing config: a base nightly rate, an optional rate calendar (overrides for specific nights; hosts set higher prices for known events), a weekend uplift (e.g. +20% Fri/Sat), length-of-stay discount tiers (e.g. 10% off at 7+ nights, 25% off at 28+ nights), and a currency. The algorithm: iterate every night in [checkin, checkout) (note the half-open range: the checkout date itself is not charged), and for each night look up the rate calendar override or fall back to the base, apply the weekend uplift if it’s a Fri or Sat, and sum the nightly subtotals. Then apply the length-of-stay discount tier based on total nights. Then add cleaning fee, service fee, and occupancy taxes (these are jurisdiction-specific, so they come from a separate tax service, not inline logic). Then round to the currency’s minor unit (USD: 2 decimals; JPY: 0 decimals) using banker’s rounding, so half-unit ties don’t systematically round up. Edge cases the interviewer probes: same-day checkout (zero nights, return an error), stays near a discount boundary (tiers key off total nights, not calendar weeks, so a 5-night Monday-to-Saturday stay gets no weekly discount while any 7-night stay does), date ranges straddling a seasonal price change (handled by the per-night iteration, no special logic needed), and DST transitions (count calendar nights, not hours). The harder follow-up is dynamic pricing: hosts opt into Airbnb’s ML pricing engine, which sets per-night prices based on demand. Then the function reads from the model’s output rather than the static config. Discuss caching (the model writes prices nightly into the rate calendar, the function just reads) versus on-demand (recompute on every quote, more accurate but adds latency). Senior signal: bring up the consistency requirement. The price the guest sees in search must match what they see in checkout, so cache invalidation matters.
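The per-night loop above could look roughly like this. It is a sketch under stated assumptions: the function name `quote`, its parameters, and the two-currency `MINOR_UNIT` table are illustrative, and service fee and taxes are omitted because the text places them in a separate service.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_EVEN

# Smallest representable amount per currency (assumed, two currencies only).
MINOR_UNIT = {"USD": Decimal("0.01"), "JPY": Decimal("1")}


def quote(checkin, checkout, base_rate, currency, overrides=None,
          weekend_uplift=Decimal("0.20"),
          los_tiers=((28, Decimal("0.25")), (7, Decimal("0.10"))),
          cleaning_fee=Decimal("0")):
    """Price a stay over the half-open range [checkin, checkout)."""
    nights = (checkout - checkin).days  # calendar nights, so DST is irrelevant
    if nights <= 0:
        raise ValueError("checkout must be at least one night after checkin")
    overrides = overrides or {}

    subtotal = Decimal("0")
    for offset in range(nights):
        night = checkin + timedelta(days=offset)
        rate = overrides.get(night, base_rate)  # calendar override wins
        if night.weekday() in (4, 5):           # Friday or Saturday night
            rate *= 1 + weekend_uplift
        subtotal += rate

    # Tiers are ordered longest-first; take the first one the stay qualifies for.
    discount = next((pct for min_nights, pct in los_tiers if nights >= min_nights),
                    Decimal("0"))
    total = subtotal * (1 - discount) + cleaning_fee
    return total.quantize(MINOR_UNIT[currency], rounding=ROUND_HALF_EVEN)
```

Using `Decimal` throughout avoids binary floating-point drift in money math, and `ROUND_HALF_EVEN` is the standard library's name for banker's rounding.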
Attribution problem. Data structure: per user, a list of (timestamp, click_source, click_id) tuples kept in chronological order. The conversion event is a single (user_id, timestamp, value) record, typically a booking, with value being the booking revenue. Bound the attribution window: 30 days is a common default, and clicks older than that don’t count. Last-click model is trivial: filter clicks within the window before the conversion timestamp, take the most recent, assign 100% of credit. Time-decay applies an exponential weight: for a click at time t before conversion C, the weight is 2^(-(C - t) / half_life), with 7 days as a typical half_life. Normalize weights to sum to 1, multiply by conversion value to get per-click credit. Position-based (also called U-shaped): if there are N>=3 eligible clicks, give 40% to the first click in the window, 40% to the last, and split the remaining 20% equally across the middle clicks; if N=2 there are no middle clicks to take the 20%, so split 50/50; if N=1 it gets 100%. State the trade-offs explicitly: last-click is simple and easy to debug but systematically under-credits top-of-funnel discovery channels (the user discovers Airbnb via a brand search ad, then days later returns via direct and books; direct gets all the credit, the brand ad gets none). Time-decay is a middle ground but the half-life is a hyperparameter you have to tune per business cycle. Position-based explicitly rewards both discovery and conversion channels but the 40/40/20 split is arbitrary. The real production answer is data-driven attribution (Shapley values over channel sequences), which Airbnb uses for marketing-mix modeling, but it’s expensive and requires lots of conversion data per channel. Data-quality caveats to bring up: click tracking is noisy (ad blockers, ITP cookie restrictions on Safari), users span devices (the same person clicks an ad on mobile and books on desktop, which needs identity resolution via login), and the conversion itself may be later canceled (you need a refund-aware version that retracts credit).
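The three rule-based models can be sketched in a few functions. Function names and defaults are illustrative assumptions; clicks are the `(timestamp, click_source, click_id)` tuples from the text. One explicit assumption: with exactly two eligible clicks the position-based model splits 50/50, since a 40/40/20 split would leave 20% with no middle click to receive it.

```python
from datetime import datetime, timedelta


def eligible_clicks(clicks, conversion_time, window_days=30):
    """Clicks inside the attribution window, strictly before the conversion."""
    window_start = conversion_time - timedelta(days=window_days)
    inside = [c for c in clicks if window_start <= c[0] < conversion_time]
    return sorted(inside, key=lambda c: c[0])


def last_click(clicks, conversion_time, value):
    eligible = eligible_clicks(clicks, conversion_time)
    return {eligible[-1][2]: value} if eligible else {}


def time_decay(clicks, conversion_time, value, half_life_days=7.0):
    weights = {}
    for ts, _source, click_id in eligible_clicks(clicks, conversion_time):
        age_days = (conversion_time - ts).total_seconds() / 86400
        weights[click_id] = 2 ** (-age_days / half_life_days)
    total = sum(weights.values())
    return {cid: value * w / total for cid, w in weights.items()}


def position_based(clicks, conversion_time, value):
    ids = [c[2] for c in eligible_clicks(clicks, conversion_time)]
    n = len(ids)
    if n == 0:
        return {}
    if n == 1:
        return {ids[0]: value}
    if n == 2:  # assumption: no middle clicks, so split evenly
        return {ids[0]: value * 0.5, ids[1]: value * 0.5}
    credit = {cid: value * 0.2 / (n - 2) for cid in ids[1:-1]}
    credit[ids[0]] = value * 0.4
    credit[ids[-1]] = value * 0.4
    return credit
```

A refund-aware variant would store these per-click credits and emit negating entries when the booking is canceled, rather than recomputing history.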
Airbnb system design rounds skew toward search, ranking, ML infrastructure, and marketplace integrity. You drive — the interviewer will not feed you the structure. Start with requirements, capacity, and the API surface.
Start with the latency budget: 300ms end-to-end, so figure ~50ms network, ~150ms backend, ~100ms render. Backend has to do geo-filter, availability-filter, hard-filter on price/amenities/guest count, and rank — in 150ms. The architecture has four layers. The indexing layer is Elasticsearch (or an in-house equivalent), sharded geographically — Airbnb has historically used per-city shards because most queries are city-scoped, with country-level fallback shards for ‘Spain’-grain queries. Each listing document has its lat/lon (indexed as geohash for fast range queries), amenities (keyword fields), price (numeric), guest capacity, and pre-computed quality scores. The availability layer is separate because availability changes constantly — represent each listing’s calendar as a bitmap (one bit per future night, ~2 years out) and store the bitmap in Redis keyed by listing_id. To filter a search like ‘checkin 2026-06-12, checkout 2026-06-15’, AND the relevant bit-ranges across candidate listings — this happens after the geo/filter pass against an already-reduced candidate set of a few thousand. The ranking layer takes the top ~1000 candidates from the index, fetches richer features (recent click-through rate, host signals, photo embeddings), and scores with a gradient-boosted tree model deployed inline. Add the personalization layer for logged-in guests: a second model takes the top-100 from the base ranker and re-orders based on the user’s history (saved searches, previous bookings, similar guests). Global latency is solved by per-region deployment: index shards replicated to each region (Americas, EU, APAC), traffic routed to nearest region. Cold-start considerations: new guest has no personalization signal, fall back to popularity-based personalization (what guests like them tend to like — segmentation by query intent). 
The long-tail problem: 10M+ listings globally, most queries hit <1% of inventory, so the index has long-tail bloat that hurts shard load balance — solution is hot/cold separation (active listings in fast shards, dormant listings archived to a slower shard accessed only when filters explicitly include them). Talk about query types: a ‘flexible dates’ query expands the date filter to a range and requires either pre-computing flexible-date availability (expensive) or accepting higher latency on those queries.
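To make the availability bitmap concrete, here is a small sketch that uses Python integers as bitmaps, with a set bit meaning the night is free. The `EPOCH` date, the function names, and the plain dict standing in for the Redis store are all assumptions for illustration.

```python
from datetime import date

EPOCH = date(2026, 1, 1)  # hypothetical day 0 of every calendar bitmap
HORIZON_NIGHTS = 730      # roughly two years of future nights


def night_mask(checkin, checkout):
    """Bitmask covering the half-open night range [checkin, checkout)."""
    start = (checkin - EPOCH).days
    nights = (checkout - checkin).days
    if start < 0 or nights <= 0 or start + nights > HORIZON_NIGHTS:
        raise ValueError("stay falls outside the calendar horizon")
    return ((1 << nights) - 1) << start


def is_available(calendar_bits, checkin, checkout):
    mask = night_mask(checkin, checkout)
    return (calendar_bits & mask) == mask  # every requested night is free


def book(calendar_bits, checkin, checkout):
    return calendar_bits & ~night_mask(checkin, checkout)  # clear booked nights


def filter_available(calendars, candidate_ids, checkin, checkout):
    """calendars: dict listing_id -> bitmap, a stand-in for the Redis lookup."""
    mask = night_mask(checkin, checkout)
    return [lid for lid in candidate_ids if (calendars.get(lid, 0) & mask) == mask]
```

The mask is built once per query and reused across all candidates, so the per-listing cost is one AND and one comparison.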
This is an ML system end to end. The training data is historical: for every (listing, night) tuple, you know the price the host set, whether it got booked, and the booking revenue. The model predicts P(booking | price, features), a binary classifier with price as one of the inputs. Features split into three buckets. Per-listing features: location (lat/lon, neighborhood embedding), size (bedrooms, beds, max guests), amenities, rating, review count, photo quality score, host superhost status. Per-market features: aggregate booking velocity in the listing’s neighborhood, competing supply (how many similar listings are available for the same night), seasonality (day of week, lead time; bookings spike 4–6 weeks out for most markets), known events (concerts and conferences inflate demand). Per-night features: how many nights out it is, whether it’s a weekend, how the surrounding nights in the same listing are doing. Model choice: gradient-boosted trees (LightGBM) are the historical workhorse: interpretable, fast to train, and tolerant of missing features. Newer deep models that use sequence features (the booking funnel of searches, clicks, and bookings over time) can outperform but require more infrastructure. Once you can predict P(booking | price), the optimization is: find the price that maximizes E[revenue] = price × P(booking | price). A monotonically decreasing P does not by itself make this concave, but for well-behaved demand curves expected revenue is unimodal in price (it rises, peaks once, then falls), so a 1-D search works. But you don’t just present that price to the host, because hosts have anchors. Surveys show hosts reject suggested prices that are >20% below what they themselves set, even when the math says lower price + higher booking rate = more revenue. So the product layer caps the suggestion to within X% of the host’s current price, and surfaces explanations (‘bookings in your area for this date are X% above last year’).
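As an illustration of the optimization step, the sketch below scans a price grid for the revenue-maximizing price and then clamps the result to a band around the host's current price. The logistic `booking_probability` is a hypothetical stand-in for the trained model, and `max_change` is left as a required argument because the text does not specify the cap.

```python
import math


def booking_probability(price, midpoint=150.0, scale=30.0):
    """Hypothetical demand curve standing in for the model: decreasing in price."""
    return 1.0 / (1.0 + math.exp((price - midpoint) / scale))


def best_price(p_booking, low, high, step=1.0):
    """Grid scan for the price maximizing price * P(booking | price)."""
    steps = int((high - low) / step) + 1
    candidates = [low + i * step for i in range(steps)]
    return max(candidates, key=lambda p: p * p_booking(p))


def capped_suggestion(p_booking, host_price, max_change, low, high):
    """Clamp the revenue-optimal price to within max_change of the host's price."""
    optimum = best_price(p_booking, low, high)
    floor = host_price * (1.0 - max_change)
    ceiling = host_price * (1.0 + max_change)
    return min(max(optimum, floor), ceiling)
```

A grid scan is used here instead of ternary or golden-section search because it stays correct even if a learned curve is not perfectly unimodal; with one evaluation per candidate price it is cheap enough for a nightly batch.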
Cold-start for new listings: use content-based features only (no booking history), and explicitly run a price-exploration regime that gives the new listing varied prices over the first weeks to gather data. Freshness: as the stay date approaches and the night remains unbooked, the optimal price drops (an empty night earns nothing). Re-run the optimization daily as the window closes. The eval setup is tricky: you can’t just A/B prices and measure bookings naively, because the host’s own price-setting is a confound. Use a holdout where some hosts get suggestions and some don’t, and measure the delta in bookings and revenue at the host level over a quarter.
Multi-layer detection feeding a human review queue. The text layer: ML classifiers for several violation categories, including scam detection (off-platform payment requests in messages, suspiciously low prices, urgency language), policy violations (parties, which are banned in many cities; prohibited items), and abusive language. These are typically transformer-based classifiers fine-tuned per category and scored at multiple thresholds: high-confidence positives are auto-actioned (delete message, flag listing), mid-confidence ones go to human review, and low-confidence ones are logged for offline analysis. The image layer: photo verification (checks the listing’s photos against the host’s claimed location and listing type; a ‘beachfront condo’ with no water in any photo is suspicious), duplicate detection (the same set of photos appearing across multiple listings is a known fraud pattern; use perceptual hashing like pHash, then ANN search over the hash space), and face detection (identifiable guests in photos are a privacy issue that requires blurring). The network layer is the most powerful and the most overlooked: build a graph where nodes are accounts (hosts, guests) and edges are interactions (bookings, messages, shared payment instruments, shared IPs, shared devices). Coordinated fraud rings show up as dense subgraphs, and community detection (Louvain or Leiden) over this graph surfaces them. The pipeline has two paths: real-time scoring on every new listing, message, and review (must be fast, sub-second, and runs the lightest models); and async batch scoring that runs heavier models nightly and re-evaluates the whole corpus when models are retrained or policy changes.
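The multi-threshold routing described for the classifiers can be sketched as a small lookup. The category names and threshold values below are made-up placeholders for illustration; real operating points would be set by policy per category.

```python
from enum import Enum


class Action(Enum):
    AUTO_ACTION = "auto_action"    # e.g. delete message, flag listing
    HUMAN_REVIEW = "human_review"  # enqueue for the review team
    LOG_ONLY = "log_only"          # keep for offline analysis


# Hypothetical (auto_action, human_review) score thresholds per category.
THRESHOLDS = {
    "scam": (0.95, 0.60),
    "abusive_language": (0.90, 0.50),
}


def route(category, score):
    """Map a classifier score in [0, 1] to one of the three handling paths."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    auto_threshold, review_threshold = THRESHOLDS[category]
    if score >= auto_threshold:
        return Action.AUTO_ACTION
    if score >= review_threshold:
        return Action.HUMAN_REVIEW
    return Action.LOG_ONLY
```

Keeping thresholds in data rather than code means a policy change is a config update, which also makes it straightforward to re-route the whole corpus in the batch path.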
The investigation tool for the human review team is its own product — surfaces the flagged content with context (host’s history, guest’s history, related accounts in the network graph, similar past cases), lets the reviewer take action (warn, suspend listing, suspend account, escalate to law enforcement), and logs the decision as training data for the next model iteration. The precision-recall trade-off is policy-driven, not just ML — too aggressive blocks legitimate hosts (a single false positive ban can lose a long-time superhost and surface as a press incident); too loose lets scams through and surfaces as a different kind of press incident. Different categories pick different operating points (zero-tolerance: child safety, weapons; tolerant-with-appeal: party violation, photo policy). The appeal process is part of the design — every automated action must have a path to human review, with SLAs on response time. The regulatory layer: different countries impose different requirements (Germany requires registration numbers for short-term rentals in many cities; New York City has its own short-term rental law). The system must be country-aware and enforce country-specific policies at the listing level.
Airbnb's distinctive behavioral round. The interviewer is trained to evaluate alignment with the four named values, and the round carries its own hire/no-hire signal. Specific stories beat polished generalities.
Pick one and tell a real story — the failure mode is reciting the value back at the interviewer. ‘Be a cereal entrepreneur’ is the easiest to land if you have a genuine scrappy story (the value refers to the founders selling custom-branded cereal boxes to fund the company in the early days — it’s about resourcefulness under constraint, not about hustle for hustle’s sake). A good ‘cereal entrepreneur’ story: you were blocked on something that the normal path said would take months — getting a vendor approved, getting a team to prioritize a request, getting access to data — and you found a 10% solution that unblocked you in days. Walk through what the obstacle was, why the official path was too slow, what shortcut you took, what you explicitly traded off (probably some elegance, some long-term tech debt), and what it earned (the project shipped on time, a hypothesis got validated). ‘Champion the mission’ is the second-most-picked and the hardest to do well — most candidates say something vague about ‘connecting people’. Ground it in a specific moment where you made a decision that traded off short-term optimization for the long-term experience of a host or guest. ‘Be a host’ is about hospitality applied to teammates — a story about how you onboarded a new engineer or supported a teammate through a rough quarter. ‘Embrace the adventure’ is about comfort with uncertainty — a project you took on where you didn’t know if it would work. The unifying signal: don’t name the value, show it. The interviewer will tell their HC that you embody the value — that’s the goal, not getting them to nod.
Airbnb explicitly screens for diversity of thought — the question is grading whether you can hold a position based on evidence others haven’t seen, and whether you can either persuade the room or be persuaded by them. Pick a contested decision where you genuinely held a minority view based on a piece of evidence you’d gathered. The shape: there was a meeting or doc review where the room was converging on a direction; you said something like ‘I’ve been looking at X and I think we’re missing Y’; you brought a specific piece of evidence (data, a prototype, a customer interview, a paper) that the room hadn’t considered; the room engaged with the evidence and either updated to your position, met you halfway, or persuaded you back with counter-evidence. The reflective close is the most important part: name what made you confident enough to hold the position (the evidence was robust, you’d stress-tested it with a trusted peer first), and name what would have changed your mind. Avoid two failure modes: (1) showing stubbornness — a story where you held the position despite being persuasively countered reads as someone who can’t be moved by evidence; (2) showing manufactured contrarianism — taking the opposite view for the sake of it reads as performative. The strongest version of this story ends with ‘they were right and I updated’ — Airbnb values intellectual honesty over winning, and a story where you held a minority view, presented your evidence well, and then updated when persuaded shows both the courage to dissent and the openness to change.
This question is core to Airbnb’s product worldview — the company sells travel as a route to cross-cultural empathy, and they screen for whether you have it. Pick a real cross-cultural experience. Strong examples: working with a team in a country very different from your own (the daily-work differences matter — meeting cadence, decision-making norms, written-vs-verbal communication preferences), living abroad for a sustained period (not a two-week vacation), partnering with a customer from a context very different from your own, building product for users in a market you didn’t grow up in. Walk through specific behavioral adjustments you made: ‘I learned that direct disagreement in meetings landed differently than I expected, so I started raising concerns 1:1 with the most senior person before the meeting, and the meetings got more productive.’ Or: ‘I noticed my emails were getting shorter, more transactional responses from one of the offices; I switched to longer-form context and the working relationship improved.’ The reflective close: what did this experience change about how you approach unfamiliar contexts now? A strong close describes a new habit — ‘I now always ask one question about working style at the start of a partnership with a new team’ — that you’ve carried into other settings. Avoid two anti-patterns: (1) framing the other culture as exotic or as an obstacle (‘their culture was very different from mine and it was hard to adjust’) — reads as low cultural literacy; (2) generic platitudes about empathy (‘I learned to listen more’) without specific behavior change. Specifics over sentiment.
Airbnb’s culture values direct communication — the question is grading whether you bring up hard things rather than letting them fester. Pick a real example: telling a teammate their work wasn’t meeting the bar, raising a concern about a leadership decision, pushing back on a manager’s call, telling a peer that their behavior in meetings was undermining a junior teammate. The shape of a strong answer: there was a situation where avoiding the conversation was the easy path; you decided to have it; you prepared (you wrote down what specifically you wanted to say, you chose the right setting — usually 1:1, sometimes after a meeting rather than during); you framed it as a problem to solve together rather than as a verdict; you led with the specific behavior, not the person’s character; you listened to their response and didn’t just deliver your prepared speech. The close: how did the relationship change afterward? The strongest version is ‘the relationship deepened — they thanked me later, and we work more directly now’; a second-strongest is ‘it was uncomfortable but we both adjusted and it’s now resolved’; the version to avoid is ‘they were defensive and it ended badly’ — even if true, that suggests you didn’t frame it well. Specific behaviors that land: bringing evidence (‘in the last three reviews you’ve made the change at the end rather than discussing earlier — here are the examples’), not absolutes (‘this happens a lot’); naming impact, not intent (‘the effect on the team was X’, not ‘you don’t care about quality’); asking for their view before delivering yours. The reflective close: what did you learn about when and how to have these conversations? A strong answer names a shift — ‘I now have these conversations earlier, when the issue is smaller, rather than letting them build up’.
Reading sample answers is preparation. The loop itself is performance under pressure, and Airbnb's core values round in particular grades how you sound when you're telling a story under time pressure.