Introduction
INNER JOIN and LEFT JOIN are staples in SQL querying, adept at handling the majority of database queries. However, certain nuanced scenarios require more specialized join types. These include cases where you need to count the results returned by a function row-by-row, filter rows based on their existence in another table, or return rows that do not match any entry in another table.
LATERAL joins, Semi-joins, and Anti-joins are advanced techniques designed to address these specialized cases with precision. LATERAL joins enable subqueries to reference columns from preceding tables in the same FROM clause, Semi-joins allow for retrieving rows with a match in another table without duplicating them, and Anti-joins retrieve rows without matches in another table.
Let’s delve into how these techniques can be applied in practice.
SIDE JOINTS
A LATERAL subquery in the FROM clause can reference columns from previous tables in the same FROM clause. Without LATERAL, a subquery in FROM is evaluated independently and cannot see these columns.
Utilizing LATERAL is particularly crucial when employing set-returning functions, which yield multiple rows per input. These functions can be invoked in the SELECT list, but applying them row-by-row to a column within an external table in the FROM clause necessitates the use of LATERAL.
Example: Counting Occurrences of Words
This scenario involves counting the frequency of the words “bull” and “bear” within a content column. The matches must be case-insensitive, and substrings such as “bullish” or “bearer” must be excluded.
The data is stored in the google_file_store table as follows:
| File Name | Content |
|---|---|
| draft1.txt | The stock market is predicting a bull market that would delight many investors. |
| draft2.txt | The stock market is predicting a bull market… but analysts are warning… we’re expecting a bear market. |
| final.txt | The stock market is predicting a bull market…a bear market. As always, predicting the future of the market is uncertain… |
The following SQL code employs regexp_matches() to return a row per match. By placing it in the FROM clause with LATERAL, it ensures execution once per line of google_file_store and counts all matches in the array. The anchors m and M serve as PostgreSQL word boundaries, excluding “bullish” and “bearer”.
SELECT 'bull' AS word, COUNT(*) AS nentry
FROM google_file_store, LATERAL regexp_matches(LOWER(contents), 'm(bull)M', 'g')
UNION ALL
SELECT 'bear' AS word, COUNT(*) AS nentry
FROM google_file_store, LATERAL regexp_matches(LOWER(contents), 'm(bear)M', 'g');
| Word | Count |
|---|---|
| bull | 3 |
| bear | 2 |
Semi-joins
A semi-join returns rows from the left table where at least one match exists in the right table, with each row in the left table appearing at most once. Unlike INNER JOIN, which duplicates left rows if there are multiple matches on the right, semi-joins do not.
There are two SQL implementations for semi-joins:
- WHERE EXISTS (SELECT 1 FROM…)
- WHERE col IN (SELECT col FROM …)
The EXISTS form is more general, accommodating multi-column join conditions and correlated subqueries without requiring query rewriting.
Example: Finding High-Value Customers
This scenario involves identifying customers who have placed at least one order over $100, returning their customer ID and name.
Data:
| Customer ID | Customer Name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carole Williams |
| … | … |
| 10 | Jack Anderson |
Orders:
| Order ID | Customer ID | Amount | Status |
|---|---|---|---|
| 101 | 1 | 150 | paid |
| 102 | 1 | 200 | paid |
| 103 | 1 | 75 | paid |
| … | … | … | … |
| 115 | 9 | 450 | paid |
The EXISTS subquery verifies whether at least one order over $100 exists for each customer. The SELECT 1 convention prevails as EXISTS focuses solely on row existence rather than content.
SELECT c.customer_id, c.customer_name
FROM online_store_customers c
WHERE EXISTS (
SELECT 1
FROM online_store_orders o
WHERE o.customer_id = c.customer_id AND o.amount > 100
);
Anti-joins
An anti-join returns rows from the left table where no matches exist in the right table, essentially the opposite of a semi-join.
There are two SQL implementations for anti-joins:
- LEFT JOIN … WHERE right_table.col IS NULL
- WHERE DOES NOT EXIST (SELECT 1 FROM…)
Both approaches yield the same results. However, NOT EXISTS often provides a superior query plan in modern PostgreSQL versions and is more intuitively readable. The LEFT JOIN + IS NULL pattern is older yet useful when retaining right-side columns for unmatched rows.
Example: Free Users Without Calls in April
This scenario involves returning free users who made no calls in April 2020.
Data:
| Call ID | Call Date | User ID |
|---|---|---|
| 0 | 2020-04-19 01:06:00 | 1218 |
| 1 | 2020-03-01 16:51:00 | 1554 |
| 2 | 2020-03-29 07:06:00 | 1857 |
| 3 | 2020-03-07 02:01:00 | 1525 |
| … | … | … |
| 39 | 2020-03-11 08:33:00 | 1884 |
Users:
| User ID | Status | Company ID |
|---|---|---|
| 1218 | free | 1 |
| 1554 | inactive | 1 |
| 1857 | free | 2 |
| … | … | … |
| 1884 | free | 1 |
In this example, the date filter resides in the ON clause, not the WHERE clause, distinguishing it as an anti-join. Placing the date filter in WHERE would eliminate rows where the LEFT JOIN generated NULLs, reducing it to an INNER JOIN. With the filter enabled, qualifying April Call Free users consistently produce a row, with NULL values on the right side, and the IS NULL check retains only those rows.
SELECT DISTINCT u.user_id
FROM rc_users u
LEFT JOIN rc_calls c ON u.user_id = c.user_id AND c.call_date BETWEEN '2020-04-01' AND '2020-04-30'
WHERE u.status = 'free' AND c.user_id IS NULL;
Conclusion
Advanced join techniques like LATERAL, EXISTS, and NOT EXISTS, or LEFT JOIN + IS NULL are invaluable when INNER JOIN and LEFT JOIN fall short:
- LATERAL is essential when invoking set-returning functions row-by-row within FROM.
- EXISTS avoids duplication by providing “matched rows” without the redundancy of INNER JOIN.
- NOT EXISTS or LEFT JOIN + IS NULL efficiently delivers “unmatched rows”.
When INNER JOIN duplicates rows you don’t want, use EXISTS. When you need rows without matches, use NOT EXISTS or LEFT JOIN + IS NULL. When a subquery in FROM must reference columns from an external table, add LATERAL. Practice these techniques on real SQL interview questions and the syntax becomes second nature.
Nate Rosidi is a data scientist and product strategy specialist. He is also an assistant professor teaching analytics and is the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions asked by big companies. Nate writes about the latest career market trends, gives interview advice, shares data science projects, and covers all things SQL.
For more insights and examples, visit Here.
“`

