SQL
The SQL surface area asked in SDE interviews — joins, indexes, window functions — plus 15 classic problems with solutions (PostgreSQL).
Free reference · last reviewed
SDE interviews test a narrow slice of SQL: the join types, when an index helps (and when it's quietly ignored), GROUP BY/HAVING, and window functions. Get those cold and you clear most rounds. Below is exactly that surface area, plus 15 classic problems with worked PostgreSQL solutions.
Dialect: PostgreSQL (notes for MySQL where they differ).
Order of Execution
Mental model: SQL is written in one order but executed in another.
Number FROM/WHERE/GROUP BY/HAVING/SELECT/ORDER BY/LIMIT in execution order
| Written | Executed |
|---|---|
| SELECT | 5 |
| FROM | 1 |
| WHERE | 2 |
| GROUP BY | 3 |
| HAVING | 4 |
| SELECT | 5 |
| ORDER BY | 6 |
| LIMIT | 7 |
Implication: aliases in SELECT can't be used in WHERE (still being built), but can in ORDER BY.
Joins
-- INNER: rows matching on both sides
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;
-- LEFT: all left rows; right NULL if no match
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
-- FULL OUTER: union of LEFT + RIGHT
SELECT u.name, o.total
FROM users u
FULL OUTER JOIN orders o ON o.user_id = u.id;
-- SELF JOIN: same table twice (employee → manager)
SELECT e.name AS emp, m.name AS mgr
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
-- CROSS JOIN: cartesian product (use sparingly)
SELECT a.id, b.id FROM a CROSS JOIN b;
Predict the pattern
Dataset — users: (id=1, name='alice'), (id=2, name='bob'). orders: (id=1, user_id=1, total=50).
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;
What is o.total for the row where u.name = 'bob'?
LEFT JOIN keeps every row from the left table. Bob (id=2) has no matching order, so all right-side columns are filled with NULL — the row is not dropped.
Predict the pattern
Dataset — same as above: users: (id=1 alice), (id=2 bob). orders: one row with user_id=1.
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;
How many rows does this INNER JOIN return?
INNER JOIN only keeps rows where the join condition matches on both sides. Bob (id=2) has no order, so he is excluded. Only alice matches → 1 row.
Anti-join / semi-join
-- Users with NO orders
SELECT u.* FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
-- Or:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- Or (safer with NULLs):
SELECT * FROM users u WHERE NOT EXISTS
(SELECT 1 FROM orders o WHERE o.user_id = u.id);
NOT IN trap: if subquery returns any NULL, the whole NOT IN is unknown → 0 rows. Prefer NOT EXISTS.
Aggregates + GROUP BY / HAVING
SELECT user_id, COUNT(*) AS n_orders, SUM(total) AS revenue
FROM orders
WHERE created_at >= '2025-01-01'
GROUP BY user_id
HAVING SUM(total) > 1000
ORDER BY revenue DESC;
Predict the pattern
Dataset — emp table, 3 rows: bonus values are 100, NULL, 200.
SELECT COUNT(bonus) FROM emp;
What does this query return?
COUNT(col) counts non-NULL values only; the row with bonus = NULL is silently skipped. COUNT(*) would return 3. COUNT(bonus) returns 2.
Predict the pattern
Dataset — table t with 3 rows: col = 'a', col = 'x', col = NULL.
SELECT * FROM t WHERE col != 'x';
Rows where col IS NULL are ___ from the result.
Any comparison with NULL yields UNKNOWN (not TRUE), so NULL != 'x' is UNKNOWN, which fails the WHERE filter. Only col = 'a' passes → NULL rows are excluded.
WHEREfilters rows before grouping.HAVINGfilters groups after.- Every non-aggregated column in SELECT must be in GROUP BY (Postgres strict; MySQL lax → enable
ONLY_FULL_GROUP_BY).
Conditional aggregates
SELECT
COUNT(*) FILTER (WHERE status = 'paid') AS paid,
COUNT(*) FILTER (WHERE status = 'failed') AS failed
FROM orders;
-- Portable form:
SELECT
SUM(CASE WHEN status='paid' THEN 1 ELSE 0 END) AS paid,
SUM(CASE WHEN status='failed' THEN 1 ELSE 0 END) AS failed
FROM orders;
Window Functions
The interview-grade tool. OVER (PARTITION BY ... ORDER BY ...).
-- Running total per user
SELECT user_id, order_date, total,
SUM(total) OVER (PARTITION BY user_id ORDER BY order_date) AS running
FROM orders;
-- Rank orders within user by total
SELECT user_id, total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS rn,
RANK() OVER (PARTITION BY user_id ORDER BY total DESC) AS rk,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY total DESC) AS drk
FROM orders;
-- Previous / next row
SELECT user_id, order_date, total,
LAG(total) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_total,
LEAD(total) OVER (PARTITION BY user_id ORDER BY order_date) AS next_total
FROM orders;
| Function | Output |
|---|---|
ROW_NUMBER() | unique sequential rank, no ties |
RANK() | ties share rank, leaves gaps (1,1,3) |
DENSE_RANK() | ties share rank, no gaps (1,1,2) |
NTILE(k) | distribute rows into k buckets |
LAG(col, n) | value from row n behind |
LEAD(col, n) | value from row n ahead |
FIRST_VALUE, LAST_VALUE | first/last in window |
SUM/AVG/COUNT OVER | running aggregate |
Frame clause
SUM(x) OVER (
PARTITION BY u
ORDER BY t
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- running total
)
-- BETWEEN 6 PRECEDING AND CURRENT ROW -- 7-day moving sum
CTEs (WITH)
WITH recent_orders AS (
SELECT user_id, SUM(total) AS spent
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT u.name, r.spent
FROM users u JOIN recent_orders r ON r.user_id = u.id
WHERE r.spent > 500;
- Improves readability over deep subqueries.
- Postgres ≥ 12: CTEs no longer always materialize; use
MATERIALIZED/NOT MATERIALIZEDto control.
Recursive CTE (graph / hierarchy)
-- All descendants of manager id 1
WITH RECURSIVE tree AS (
SELECT id, name, manager_id, 1 AS depth
FROM employees WHERE id = 1
UNION ALL
SELECT e.id, e.name, e.manager_id, t.depth + 1
FROM employees e JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree;
Use for: org charts, category trees, BOMs, shortest-path on small graphs, "explode" sequences.
Subqueries
| Type | Example |
|---|---|
| Scalar | SELECT name, (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) FROM users u |
IN | WHERE id IN (SELECT user_id FROM orders) |
EXISTS | WHERE EXISTS (SELECT 1 FROM orders WHERE user_id = u.id) |
| Correlated | subquery references outer query, runs per row |
EXISTS short-circuits → typically faster than IN on large data, and NULL-safe.
Indexes
Types
| Type | Best for |
|---|---|
| B-tree (default) | equality + range; ORDER BY |
| Hash | equality only |
| GIN | full-text, JSONB, array contains |
| GiST | geometry, ranges |
| BRIN | huge, append-only, naturally ordered tables |
Composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
Useful for:
WHERE user_id = ? AND created_at > ?WHERE user_id = ?- Not
WHERE created_at > ?(leftmost-prefix rule).
Covering index
CREATE INDEX idx_orders_user_total
ON orders(user_id) INCLUDE (total);
-- Query can be answered from index alone (no table fetch).
Partial index
CREATE INDEX idx_orders_pending
ON orders(created_at) WHERE status = 'pending';
When indexes hurt
- Write-heavy tables: every index = extra write cost.
- Low-cardinality columns (e.g. boolean): full scan often wins.
- Tiny tables.
Index doesn't get used? Common reasons
- Function on column:
WHERE LOWER(email) = ?→ use functional index or normalize at write. - Implicit cast:
WHERE int_col = '5'→ planner may not use index. - Leading wildcard:
LIKE '%abc'. ORacross different columns (sometimes).- Stats stale:
ANALYZE table_name.
EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT u.name, SUM(o.total)
FROM users u JOIN orders o ON o.user_id = u.id
GROUP BY u.name;
Look for:
Seq Scanon a large table with selective predicate → missing index.- High
costdiscrepancy vsactual time→ stale statistics. Nested Loopwith huge outer side → considerHash Join.Rows Removed by Filterlarge → filter too late, push to index.SortwithDiskmention → bumpwork_memor add ORDER BY index.
Normalization
| Form | Rule |
|---|---|
| 1NF | atomic values, unique rows |
| 2NF | + no partial dependency on composite PK |
| 3NF | + no transitive dependency |
| BCNF | + every determinant is a candidate key |
In practice: aim for 3NF, then denormalize for performance when joins are too expensive (e.g. precomputed counts, materialized views, search indexes).
Transactions & Isolation
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
| Isolation level | Phenomena allowed |
|---|---|
| READ UNCOMMITTED | dirty reads |
| READ COMMITTED | non-repeatable reads, phantoms (Postgres default) |
| REPEATABLE READ | phantoms (MySQL default; Postgres prevents phantoms here) |
| SERIALIZABLE | none |
Pessimistic lock
SELECT * FROM seats WHERE show_id = 1 AND id = 42 FOR UPDATE;
-- ... reserve ...
COMMIT;
Optimistic concurrency
UPDATE orders SET total = ?, version = version + 1
WHERE id = ? AND version = ?; -- 0 rows updated => someone beat you
Useful PostgreSQL features
RETURNING: get inserted/updated row(s) back.
INSERT INTO users(name) VALUES ('a') RETURNING id;
INSERT ... ON CONFLICT ... DO UPDATE(upsert):
INSERT INTO clicks(url, n) VALUES ('/x', 1)
ON CONFLICT (url) DO UPDATE SET n = clicks.n + 1;
generate_series: synthetic rows for date spines.
SELECT d::date FROM generate_series('2025-01-01','2025-01-31','1 day') d;
- JSONB:
col->'key',col->>'key'(as text),col @> '{"a":1}'containment. - Arrays:
int[],text[];ANY,unnest. - CTE with
DELETE/UPDATE+ RETURNING for "move rows" patterns.
15 Classic SQL Interview Questions
Schema we'll use unless stated:
employees(id, name, dept_id, salary, manager_id, hire_date)
departments(id, name)
orders(id, user_id, total, status, created_at)
users(id, name, email)
logins(user_id, login_date)
Q1. Nth highest salary
-- 2nd highest, distinct
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- General Nth
SELECT salary
FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) rk
FROM employees) t
WHERE rk = N;
Q2. Employees earning more than their manager
SELECT e.name
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
Q3. Top 3 highest-paid per department
SELECT id, name, dept_id, salary
FROM (
SELECT *, DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) rk
FROM employees
) t
WHERE rk <= 3;
Q4. Departments with no employees
SELECT d.name
FROM departments d
LEFT JOIN employees e ON e.dept_id = d.id
WHERE e.id IS NULL;
Q5. Duplicate emails
SELECT email, COUNT(*) AS n
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
Q6. Delete duplicates, keep lowest id
DELETE FROM users a
USING users b
WHERE a.email = b.email AND a.id > b.id;
Q7. Consecutive login streak (≥ 3 days)
Classic "gaps & islands":
WITH grp AS (
SELECT user_id, login_date,
login_date - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date))::int AS bucket
FROM logins
)
SELECT user_id, MIN(login_date) AS start_d, MAX(login_date) AS end_d, COUNT(*) AS streak
FROM grp
GROUP BY user_id, bucket
HAVING COUNT(*) >= 3;
Q8. Median salary
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;
Q9. Year-over-year revenue growth
WITH yearly AS (
SELECT EXTRACT(YEAR FROM created_at)::int AS yr, SUM(total) AS rev
FROM orders GROUP BY 1
)
SELECT yr, rev,
LAG(rev) OVER (ORDER BY yr) AS prev_rev,
ROUND(100.0 * (rev - LAG(rev) OVER (ORDER BY yr))
/ NULLIF(LAG(rev) OVER (ORDER BY yr), 0), 2) AS pct_growth
FROM yearly;
Q10. Pivot status counts (one row per user)
SELECT user_id,
COUNT(*) FILTER (WHERE status='paid') AS paid,
COUNT(*) FILTER (WHERE status='pending') AS pending,
COUNT(*) FILTER (WHERE status='failed') AS failed
FROM orders
GROUP BY user_id;
Q11. Users who placed an order every month of last year
SELECT user_id
FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
GROUP BY user_id
HAVING COUNT(DISTINCT EXTRACT(MONTH FROM created_at)) = 12;
Q12. Active users last 7 days but not in the 7 before
WITH last7 AS (SELECT DISTINCT user_id FROM logins
WHERE login_date >= CURRENT_DATE - 7),
prev7 AS (SELECT DISTINCT user_id FROM logins
WHERE login_date >= CURRENT_DATE - 14 AND login_date < CURRENT_DATE - 7)
SELECT user_id FROM last7 WHERE user_id NOT IN (SELECT user_id FROM prev7);
Q13. Cumulative orders per user
SELECT user_id, created_at, total,
SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) AS cum_total
FROM orders;
Q14. Self-join: friend recommendations (friends of friends, not already friends)
-- friendships(a, b)
SELECT DISTINCT f1.a AS u, f2.b AS rec
FROM friendships f1
JOIN friendships f2 ON f1.b = f2.a
WHERE f1.a != f2.b
AND NOT EXISTS (
SELECT 1 FROM friendships f3
WHERE f3.a = f1.a AND f3.b = f2.b
);
Q15. Tree depth (recursive)
WITH RECURSIVE r AS (
SELECT id, manager_id, 1 AS depth FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.manager_id, r.depth + 1
FROM employees e JOIN r ON e.manager_id = r.id
)
SELECT MAX(depth) AS tree_depth FROM r;
Performance gotchas: quick list
SELECT *ships every column over the wire and disables covering indexes. Name columns.OFFSET 100000 LIMIT 20is slow. Use cursor pagination (WHERE id > last_id ORDER BY id LIMIT 20).COUNT(*)over a huge table is slow in Postgres. Use estimates (pg_class.reltuples) if approximation is fine.- Implicit casts kill index usage.
ORDER BY ... LIMIT 1over indexed column is O(log n); without index it sorts everything.WHERE col IN (huge list)may be slower than a temp table join.INvsEXISTS:EXISTSis safer with NULLs and short-circuits.- Transactions held open block writers. Keep them short.
- Each index slows down writes. Audit unused indexes (
pg_stat_user_indexes).
Quick reference
| Need | SQL |
|---|---|
| Distinct values | SELECT DISTINCT col FROM t |
| Count of distinct | COUNT(DISTINCT col) |
| String contains | WHERE col ILIKE '%abc%' |
| Null check | WHERE col IS NULL (never = NULL) |
| Coalesce | COALESCE(col, 'default') |
| Conditional | CASE WHEN x THEN a ELSE b END |
| Cast | col::int (PG) / CAST(col AS INT) |
| Concat | a || b (PG/Standard), CONCAT(a,b) (MySQL) |
| Date diff (days) | (d1 - d2)::int (PG), DATEDIFF(d1,d2) (MySQL) |
| Top N per group | ROW_NUMBER() OVER (PARTITION BY g ORDER BY x DESC) <= N |
| Pagination | cursor: WHERE id > $last ORDER BY id LIMIT 20 |