SQL: Interview Reference

SDE interviews test a narrow slice of SQL: the join types, when an index helps (and when it's quietly ignored), GROUP BY/HAVING, and window functions. Get those cold and you clear most rounds. Below is exactly that surface area, plus 15 classic problems with worked PostgreSQL solutions.

Dialect: PostgreSQL (notes for MySQL where they differ).

Order of Execution

Mental model: SQL is written in one order but executed in another.

Number FROM/WHERE/GROUP BY/HAVING/SELECT/ORDER BY/LIMIT in execution order

Written	Executed
SELECT	5
FROM	1
WHERE	2
GROUP BY	3
HAVING	4
SELECT	5
ORDER BY	6
LIMIT	7

Implication: aliases in SELECT can't be used in WHERE (still being built), but can in ORDER BY.

Joins

-- INNER: rows matching on both sides
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;

-- LEFT: all left rows; right NULL if no match
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

-- FULL OUTER: union of LEFT + RIGHT
SELECT u.name, o.total
FROM users u
FULL OUTER JOIN orders o ON o.user_id = u.id;

-- SELF JOIN: same table twice (employee → manager)
SELECT e.name AS emp, m.name AS mgr
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

-- CROSS JOIN: cartesian product (use sparingly)
SELECT a.id, b.id FROM a CROSS JOIN b;

Predict the pattern

Dataset — users: (id=1, name='alice'), (id=2, name='bob'). orders: (id=1, user_id=1, total=50).

SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

What is o.total for the row where u.name = 'bob'?

LEFT JOIN keeps every row from the left table. Bob (id=2) has no matching order, so all right-side columns are filled with NULL — the row is not dropped.

Predict the pattern

Dataset — same as above: users: (id=1 alice), (id=2 bob). orders: one row with user_id=1.

SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;

How many rows does this INNER JOIN return?

INNER JOIN only keeps rows where the join condition matches on both sides. Bob (id=2) has no order, so he is excluded. Only alice matches → 1 row.

Anti-join / semi-join

-- Users with NO orders
SELECT u.* FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;

-- Or:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- Or (safer with NULLs):
SELECT * FROM users u WHERE NOT EXISTS
  (SELECT 1 FROM orders o WHERE o.user_id = u.id);

NOT IN trap: if subquery returns any NULL, the whole NOT IN is unknown → 0 rows. Prefer NOT EXISTS.

Aggregates + GROUP BY / HAVING

SELECT user_id, COUNT(*) AS n_orders, SUM(total) AS revenue
FROM orders
WHERE created_at >= '2025-01-01'
GROUP BY user_id
HAVING SUM(total) > 1000
ORDER BY revenue DESC;

Predict the pattern

Dataset — emp table, 3 rows: bonus values are 100, NULL, 200.

SELECT COUNT(bonus) FROM emp;

What does this query return?

COUNT(col) counts non-NULL values only; the row with bonus = NULL is silently skipped. COUNT(*) would return 3. COUNT(bonus) returns 2.

Predict the pattern

Dataset — table t with 3 rows: col = 'a', col = 'x', col = NULL.

SELECT * FROM t WHERE col != 'x';

Rows where col IS NULL are ___ from the result.

Any comparison with NULL yields UNKNOWN (not TRUE), so NULL != 'x' is UNKNOWN, which fails the WHERE filter. Only col = 'a' passes → NULL rows are excluded.

WHERE filters rows before grouping.
HAVING filters groups after.
Every non-aggregated column in SELECT must be in GROUP BY (Postgres strict; MySQL lax → enable ONLY_FULL_GROUP_BY).

Conditional aggregates

SELECT
  COUNT(*) FILTER (WHERE status = 'paid')   AS paid,
  COUNT(*) FILTER (WHERE status = 'failed') AS failed
FROM orders;

-- Portable form:
SELECT
  SUM(CASE WHEN status='paid'   THEN 1 ELSE 0 END) AS paid,
  SUM(CASE WHEN status='failed' THEN 1 ELSE 0 END) AS failed
FROM orders;

Window Functions

The interview-grade tool. OVER (PARTITION BY ... ORDER BY ...).

-- Running total per user
SELECT user_id, order_date, total,
       SUM(total) OVER (PARTITION BY user_id ORDER BY order_date) AS running
FROM orders;

-- Rank orders within user by total
SELECT user_id, total,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS rn,
       RANK()       OVER (PARTITION BY user_id ORDER BY total DESC) AS rk,
       DENSE_RANK() OVER (PARTITION BY user_id ORDER BY total DESC) AS drk
FROM orders;

-- Previous / next row
SELECT user_id, order_date, total,
       LAG(total)  OVER (PARTITION BY user_id ORDER BY order_date) AS prev_total,
       LEAD(total) OVER (PARTITION BY user_id ORDER BY order_date) AS next_total
FROM orders;

Function	Output
`ROW_NUMBER()`	unique sequential rank, no ties
`RANK()`	ties share rank, leaves gaps (1,1,3)
`DENSE_RANK()`	ties share rank, no gaps (1,1,2)
`NTILE(k)`	distribute rows into k buckets
`LAG(col, n)`	value from row n behind
`LEAD(col, n)`	value from row n ahead
`FIRST_VALUE`, `LAST_VALUE`	first/last in window
`SUM/AVG/COUNT OVER`	running aggregate

Frame clause

SUM(x) OVER (
  PARTITION BY u
  ORDER BY t
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW   -- running total
)
-- BETWEEN 6 PRECEDING AND CURRENT ROW              -- 7-day moving sum

CTEs (WITH)

WITH recent_orders AS (
  SELECT user_id, SUM(total) AS spent
  FROM orders
  WHERE created_at >= NOW() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT u.name, r.spent
FROM users u JOIN recent_orders r ON r.user_id = u.id
WHERE r.spent > 500;

Improves readability over deep subqueries.
Postgres ≥ 12: CTEs no longer always materialize; use MATERIALIZED/NOT MATERIALIZED to control.

Recursive CTE (graph / hierarchy)

-- All descendants of manager id 1
WITH RECURSIVE tree AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.name, e.manager_id, t.depth + 1
  FROM employees e JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree;

Use for: org charts, category trees, BOMs, shortest-path on small graphs, "explode" sequences.

Subqueries

Type	Example
Scalar	`SELECT name, (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) FROM users u`
`IN`	`WHERE id IN (SELECT user_id FROM orders)`
`EXISTS`	`WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id)`
Correlated	subquery references outer query, runs per row

EXISTS short-circuits → typically faster than IN on large data, and NULL-safe.

Indexes

Types

Type	Best for
B-tree (default)	equality + range; ORDER BY
Hash	equality only
GIN	full-text, JSONB, array contains
GiST	geometry, ranges
BRIN	huge, append-only, naturally ordered tables

Composite index

CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

Useful for:

WHERE user_id = ? AND created_at > ?
WHERE user_id = ?
Not WHERE created_at > ? (leftmost-prefix rule).

Covering index

CREATE INDEX idx_orders_user_total
ON orders(user_id) INCLUDE (total);
-- Query can be answered from index alone (no table fetch).

Partial index

CREATE INDEX idx_orders_pending
ON orders(created_at) WHERE status = 'pending';

When indexes hurt

Write-heavy tables: every index = extra write cost.
Low-cardinality columns (e.g. boolean): full scan often wins.
Tiny tables.

Index doesn't get used? Common reasons

Function on column: WHERE LOWER(email) = ? → use functional index or normalize at write.
Implicit cast: WHERE int_col = '5' → planner may not use index.
Leading wildcard: LIKE '%abc'.
OR across different columns (sometimes).
Stats stale: ANALYZE table_name.

EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT u.name, SUM(o.total)
FROM users u JOIN orders o ON o.user_id = u.id
GROUP BY u.name;

Look for:

Seq Scan on a large table with selective predicate → missing index.
High cost discrepancy vs actual time → stale statistics.
Nested Loop with huge outer side → consider Hash Join.
Rows Removed by Filter large → filter too late, push to index.
Sort with Disk mention → bump work_mem or add ORDER BY index.

Normalization

Form	Rule
1NF	atomic values, unique rows
2NF	+ no partial dependency on composite PK
3NF	+ no transitive dependency
BCNF	+ every determinant is a candidate key

In practice: aim for 3NF, then denormalize for performance when joins are too expensive (e.g. precomputed counts, materialized views, search indexes).

Transactions & Isolation

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Isolation level	Phenomena allowed
READ UNCOMMITTED	dirty reads
READ COMMITTED	non-repeatable reads, phantoms (Postgres default)
REPEATABLE READ	phantoms (MySQL default; Postgres prevents phantoms here)
SERIALIZABLE	none

Pessimistic lock

SELECT * FROM seats WHERE show_id = 1 AND id = 42 FOR UPDATE;
-- ... reserve ...
COMMIT;

Optimistic concurrency

UPDATE orders SET total = ?, version = version + 1
WHERE id = ? AND version = ?;   -- 0 rows updated => someone beat you

Useful PostgreSQL features

RETURNING: get inserted/updated row(s) back.

INSERT INTO users(name) VALUES ('a') RETURNING id;

INSERT ... ON CONFLICT ... DO UPDATE (upsert):

INSERT INTO clicks(url, n) VALUES ('/x', 1)
ON CONFLICT (url) DO UPDATE SET n = clicks.n + 1;

generate_series: synthetic rows for date spines.

SELECT d::date FROM generate_series('2025-01-01','2025-01-31','1 day') d;

JSONB: col->'key', col->>'key' (as text), col @> '{"a":1}' containment.
Arrays: int[], text[]; ANY, unnest.
CTE with DELETE/UPDATE + RETURNING for "move rows" patterns.

15 Classic SQL Interview Questions

Schema we'll use unless stated:

employees(id, name, dept_id, salary, manager_id, hire_date)
departments(id, name)
orders(id, user_id, total, status, created_at)
users(id, name, email)
logins(user_id, login_date)

Q1. Nth highest salary

-- 2nd highest, distinct
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- General Nth
SELECT salary
FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) rk
      FROM employees) t
WHERE rk = N;

Q2. Employees earning more than their manager

SELECT e.name
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Q3. Top 3 highest-paid per department

SELECT id, name, dept_id, salary
FROM (
  SELECT *, DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) rk
  FROM employees
) t
WHERE rk <= 3;

Q4. Departments with no employees

SELECT d.name
FROM departments d
LEFT JOIN employees e ON e.dept_id = d.id
WHERE e.id IS NULL;

Q5. Duplicate emails

SELECT email, COUNT(*) AS n
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

Q6. Delete duplicates, keep lowest id

DELETE FROM users a
USING users b
WHERE a.email = b.email AND a.id > b.id;

Classic "gaps & islands":

WITH grp AS (
  SELECT user_id, login_date,
         login_date - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date))::int AS bucket
  FROM logins
)
SELECT user_id, MIN(login_date) AS start_d, MAX(login_date) AS end_d, COUNT(*) AS streak
FROM grp
GROUP BY user_id, bucket
HAVING COUNT(*) >= 3;

Q8. Median salary

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;

Q9. Year-over-year revenue growth

WITH yearly AS (
  SELECT EXTRACT(YEAR FROM created_at)::int AS yr, SUM(total) AS rev
  FROM orders GROUP BY 1
)
SELECT yr, rev,
       LAG(rev) OVER (ORDER BY yr) AS prev_rev,
       ROUND(100.0 * (rev - LAG(rev) OVER (ORDER BY yr))
             / NULLIF(LAG(rev) OVER (ORDER BY yr), 0), 2) AS pct_growth
FROM yearly;

Q10. Pivot status counts (one row per user)

SELECT user_id,
  COUNT(*) FILTER (WHERE status='paid')    AS paid,
  COUNT(*) FILTER (WHERE status='pending') AS pending,
  COUNT(*) FILTER (WHERE status='failed')  AS failed
FROM orders
GROUP BY user_id;

Q11. Users who placed an order every month of last year

SELECT user_id
FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
GROUP BY user_id
HAVING COUNT(DISTINCT EXTRACT(MONTH FROM created_at)) = 12;

Q12. Active users last 7 days but not in the 7 before

WITH last7  AS (SELECT DISTINCT user_id FROM logins
                WHERE login_date >= CURRENT_DATE - 7),
     prev7  AS (SELECT DISTINCT user_id FROM logins
                WHERE login_date >= CURRENT_DATE - 14 AND login_date < CURRENT_DATE - 7)
SELECT user_id FROM last7 WHERE user_id NOT IN (SELECT user_id FROM prev7);

Q13. Cumulative orders per user

SELECT user_id, created_at, total,
       SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) AS cum_total
FROM orders;

Q14. Self-join: friend recommendations (friends of friends, not already friends)

-- friendships(a, b)
SELECT DISTINCT f1.a AS u, f2.b AS rec
FROM friendships f1
JOIN friendships f2 ON f1.b = f2.a
WHERE f1.a != f2.b
  AND NOT EXISTS (
    SELECT 1 FROM friendships f3
    WHERE f3.a = f1.a AND f3.b = f2.b
  );

Q15. Tree depth (recursive)

WITH RECURSIVE r AS (
  SELECT id, manager_id, 1 AS depth FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.manager_id, r.depth + 1
  FROM employees e JOIN r ON e.manager_id = r.id
)
SELECT MAX(depth) AS tree_depth FROM r;

Performance gotchas: quick list

SELECT * ships every column over the wire and disables covering indexes. Name columns.
OFFSET 100000 LIMIT 20 is slow. Use cursor pagination (WHERE id > last_id ORDER BY id LIMIT 20).
COUNT(*) over a huge table is slow in Postgres. Use estimates (pg_class.reltuples) if approximation is fine.
Implicit casts kill index usage.
ORDER BY ... LIMIT 1 over indexed column is O(log n); without index it sorts everything.
WHERE col IN (huge list) may be slower than a temp table join.
IN vs EXISTS: EXISTS is safer with NULLs and short-circuits.
Transactions held open block writers. Keep them short.
Each index slows down writes. Audit unused indexes (pg_stat_user_indexes).

Quick reference

Need	SQL
Distinct values	`SELECT DISTINCT col FROM t`
Count of distinct	`COUNT(DISTINCT col)`
String contains	`WHERE col ILIKE '%abc%'`
Null check	`WHERE col IS NULL` (never `= NULL`)
Coalesce	`COALESCE(col, 'default')`
Conditional	`CASE WHEN x THEN a ELSE b END`
Cast	`col::int` (PG) / `CAST(col AS INT)`
Concat	`a \|\| b` (PG/Standard), `CONCAT(a,b)` (MySQL)
Date diff (days)	`(d1 - d2)::int` (PG), `DATEDIFF(d1,d2)` (MySQL)
Top N per group	`ROW_NUMBER() OVER (PARTITION BY g ORDER BY x DESC) <= N`
Pagination	cursor: `WHERE id > $last ORDER BY id LIMIT 20`

Order of Execution

Joins

Anti-join / semi-join

Aggregates + GROUP BY / HAVING

Conditional aggregates

Window Functions

Frame clause

CTEs (WITH)

Recursive CTE (graph / hierarchy)

Subqueries

Indexes

Types

Composite index

Covering index

Partial index

When indexes hurt

Index doesn't get used? Common reasons

EXPLAIN ANALYZE

Normalization

Transactions & Isolation

Pessimistic lock

Optimistic concurrency

Useful PostgreSQL features

15 Classic SQL Interview Questions

Q1. Nth highest salary

Q2. Employees earning more than their manager

Q3. Top 3 highest-paid per department

Q4. Departments with no employees

Q5. Duplicate emails

Q6. Delete duplicates, keep lowest id

Q7. Consecutive login streak (≥ 3 days)

Q8. Median salary

Q9. Year-over-year revenue growth

Q10. Pivot status counts (one row per user)

Q11. Users who placed an order every month of last year

Q12. Active users last 7 days but not in the 7 before

Q13. Cumulative orders per user

Q14. Self-join: friend recommendations (friends of friends, not already friends)

Q15. Tree depth (recursive)

Performance gotchas: quick list

Quick reference