Understanding SQL: Counting Columns from Separate Tables Based on a Certain Value
If you are new to SQL, it’s essential to learn how to extract data from multiple tables. In this article, we’ll use correlated subqueries and explicit join syntax to solve a common problem: counting the related rows in separate tables that match a certain value.
Background Information
Before we dive into the solution, let’s review some essential SQL concepts:
- JOINs: Used to combine rows from two or more tables based on a related column between them.
- WHERE clause: Filters data based on conditions specified in the query.
- GROUP BY: Groups results by one or more columns and applies an aggregate function (e.g., SUM, COUNT) to each group.
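The three concepts above can be seen working together in one small query. The following is a minimal sketch using Python's built-in sqlite3 module; the dept and employee tables and their rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           salary INTEGER, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES
        (1, 'Ann', 50000, 1),
        (2, 'Bob', 40000, 1),
        (3, 'Cy',  30000, 2);
""")

rows = conn.execute("""
    SELECT d.name, COUNT(*) AS headcount, SUM(e.salary) AS payroll
    FROM dept d
    JOIN employee e ON e.dept_id = d.id   -- JOIN: match rows on the related column
    WHERE e.salary >= 35000               -- WHERE: filter rows before grouping
    GROUP BY d.name                       -- GROUP BY: one output row per department
    ORDER BY d.name
""").fetchall()

print(rows)
```

Note that the Support department does not appear in the output at all: its only employee is removed by the WHERE clause before the groups are formed.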
Problem Description
We have three tables: Office, Staff, and Listing. Each table contains relevant information for our task:
Office table:
Column Name | Data Type |
---|---|
id | Number |
office_suburb | String |
office_manager_id | Number |
Staff table:
Column Name | Data Type |
---|---|
id | Number |
first_name | String |
last_name | String |
salary | Number |
office_id | Number |
Listing table:
Column Name | Data Type |
---|---|
id | Number |
office_id | Number |
staff_id | Number |
buyer_id | Number |
The task is to write an SQL query that returns, for each office, the suburb, the manager’s name, the total number of staff (excluding the manager), and the number of listings for sale. A listing counts as for sale while its buyer_id is NULL.
The Original Query
The original query attempts to solve this problem using a WHERE clause and GROUP BY:
SELECT
o.office_suburb AS "Office Suburb",
(s.first_name || ' ' || s.last_name) AS "Manager",
COUNT(s.salary)-1 AS "No. of Staff",
COUNT(l.id) AS "No. of Listings for Sale"
FROM staff s, office o, listing l
WHERE o.id = s.office_id
AND s.id = o.office_manager_id
AND o.id = l.office_id
AND s.id = l.staff_id
AND l.buyer_id IS NULL
GROUP BY o.office_suburb, s.first_name, s.last_name
Issues with the Original Query
The original query has several issues:
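- All three tables are joined in the WHERE clause using comma-separated table names. This is legal SQL, but it buries the join conditions among the filters and makes it easy to produce an accidental cross join when one condition is left out.
- The single alias s is asked to play two roles. The condition s.id = o.office_manager_id restricts s to the manager’s row, so COUNT(s.salary)-1 can never count the other staff in the office, and s.id = l.staff_id restricts the listings to those handled by the manager personally. Because these are inner joins, an office whose manager has no unsold listing drops out of the result entirely.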
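To see these issues concretely, the original query can be run against a tiny data set. This is a sketch using Python's sqlite3 module; the column types and every sample row are assumptions made up for the illustration, not data from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE office (id INTEGER PRIMARY KEY, office_suburb TEXT,
                         office_manager_id INTEGER);
    CREATE TABLE staff (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                        salary INTEGER, office_id INTEGER);
    CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER,
                          staff_id INTEGER, buyer_id INTEGER);

    -- Hypothetical sample data.
    INSERT INTO office VALUES (1, 'Newtown', 1), (2, 'Glebe', 4);
    INSERT INTO staff VALUES
        (1, 'Alice', 'Ng',  90000, 1),   -- manager of office 1
        (2, 'Ben',   'Ray', 60000, 1),
        (3, 'Cara',  'Lee', 60000, 1),
        (4, 'Dan',   'Fox', 85000, 2),   -- manager of office 2
        (5, 'Eve',   'Kim', 55000, 2);
    INSERT INTO listing VALUES
        (1, 1, 2, NULL),
        (2, 1, 3, NULL),
        (3, 1, 1, NULL),
        (4, 1, 2, 7),                    -- already sold
        (5, 2, 5, NULL);
""")

# The original query, unchanged.
original_rows = conn.execute("""
    SELECT
        o.office_suburb AS "Office Suburb",
        (s.first_name || ' ' || s.last_name) AS "Manager",
        COUNT(s.salary)-1 AS "No. of Staff",
        COUNT(l.id) AS "No. of Listings for Sale"
    FROM staff s, office o, listing l
    WHERE o.id = s.office_id
      AND s.id = o.office_manager_id
      AND o.id = l.office_id
      AND s.id = l.staff_id
      AND l.buyer_id IS NULL
    GROUP BY o.office_suburb, s.first_name, s.last_name
""").fetchall()

print(original_rows)
```

With this sample data the Newtown office should report 2 staff and 3 listings for sale, but the query reports 0 and 1, because only the manager's own row and the manager's own listing survive the join. The Glebe office is missing altogether, since its manager has no unsold listing.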
Solution Using Correlated Subqueries
The provided answer uses correlated subqueries to solve this problem:
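SELECT o.office_suburb,
       (sm.first_name || ' ' || sm.last_name) AS manager,
       (SELECT COUNT(*)
        FROM staff s2
        WHERE s2.office_id = o.id
          AND s2.id <> o.office_manager_id  -- exclude the manager, as the task requires
       ) AS num_staff,
       (SELECT COUNT(*)
        FROM listing l                      -- same table name as in the original query
        WHERE l.office_id = o.id
          AND l.buyer_id IS NULL            -- no buyer yet, so still for sale
       ) AS num_listings
FROM office o
JOIN staff sm
  ON sm.id = o.office_manager_id;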
Explanation of the Solution
This solution consists of two parts:
- Join Syntax: Instead of joining the tables in the WHERE clause, we use explicit JOIN syntax to attach each office to its manager’s row in the Staff table. This is considered best practice for readability and maintainability.
- Correlated Subqueries: We use correlated subqueries to calculate the number of staff and the number of listings for each office. Each subquery refers to the outer Office row through o.id, so it is evaluated per office and counts only that office’s rows. Because the two counts are computed independently, staff rows and listing rows never multiply each other.
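The correlated-subquery approach can be checked against sample data in the same way. The sketch below uses Python's sqlite3 module with hypothetical rows; it assumes the table is named listing, as in the original query, and it uses a variant of the staff subquery that leaves out the manager's own row, as the task statement asks. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE office (id INTEGER PRIMARY KEY, office_suburb TEXT,
                         office_manager_id INTEGER);
    CREATE TABLE staff (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                        salary INTEGER, office_id INTEGER);
    CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER,
                          staff_id INTEGER, buyer_id INTEGER);

    -- Hypothetical sample data.
    INSERT INTO office VALUES (1, 'Newtown', 1), (2, 'Glebe', 4);
    INSERT INTO staff VALUES
        (1, 'Alice', 'Ng',  90000, 1),   -- manager of office 1
        (2, 'Ben',   'Ray', 60000, 1),
        (3, 'Cara',  'Lee', 60000, 1),
        (4, 'Dan',   'Fox', 85000, 2),   -- manager of office 2
        (5, 'Eve',   'Kim', 55000, 2);
    INSERT INTO listing VALUES
        (1, 1, 2, NULL),
        (2, 1, 3, NULL),
        (3, 1, 1, NULL),
        (4, 1, 2, 7),                    -- already sold
        (5, 2, 5, NULL);
""")

solution_rows = conn.execute("""
    SELECT o.office_suburb,
           (sm.first_name || ' ' || sm.last_name) AS manager,
           (SELECT COUNT(*)
            FROM staff s2
            WHERE s2.office_id = o.id
              AND s2.id <> o.office_manager_id   -- leave out the manager
           ) AS num_staff,
           (SELECT COUNT(*)
            FROM listing l
            WHERE l.office_id = o.id
              AND l.buyer_id IS NULL             -- unsold listings only
           ) AS num_listings
    FROM office o
    JOIN staff sm ON sm.id = o.office_manager_id
    ORDER BY o.office_suburb
""").fetchall()

for row in solution_rows:
    print(row)
```

Both offices now appear, and each count reflects the whole office rather than only the manager's rows.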
Best Practices
When working with multiple tables in SQL, follow these best practices:
- Use explicit JOIN syntax instead of comma-separated table names.
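- Use GROUP BY when you need an aggregate per group, but be careful when the query first joins more than one one-to-many table: the joined rows multiply each other and inflate the counts.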
- Consider using correlated subqueries for complex calculations that require referencing related data from another table.
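One reason for caution with GROUP BY is row multiplication: when an office is joined to both its staff and its listings, every staff row pairs with every listing row before the grouping happens. The sketch below (Python's sqlite3 module, hypothetical sample rows, table names assumed to match the article's queries) compares plain COUNT with COUNT(DISTINCT ...) on such a join. It is an illustration of the pitfall, not a replacement for the correlated-subquery solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE office (id INTEGER PRIMARY KEY, office_suburb TEXT,
                         office_manager_id INTEGER);
    CREATE TABLE staff (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                        salary INTEGER, office_id INTEGER);
    CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER,
                          staff_id INTEGER, buyer_id INTEGER);

    -- Hypothetical sample data: one office, two non-manager staff,
    -- three unsold listings and one sold listing.
    INSERT INTO office VALUES (1, 'Newtown', 1);
    INSERT INTO staff VALUES
        (1, 'Alice', 'Ng',  90000, 1),   -- manager
        (2, 'Ben',   'Ray', 60000, 1),
        (3, 'Cara',  'Lee', 60000, 1);
    INSERT INTO listing VALUES
        (1, 1, 2, NULL),
        (2, 1, 3, NULL),
        (3, 1, 1, NULL),
        (4, 1, 2, 7);
""")

fanout_row = conn.execute("""
    SELECT o.office_suburb,
           COUNT(s.id)          AS naive_staff,     -- inflated by the listing join
           COUNT(l.id)          AS naive_listings,  -- inflated by the staff join
           COUNT(DISTINCT s.id) AS staff,
           COUNT(DISTINCT l.id) AS listings
    FROM office o
    LEFT JOIN staff s
           ON s.office_id = o.id AND s.id <> o.office_manager_id
    LEFT JOIN listing l
           ON l.office_id = o.id AND l.buyer_id IS NULL
    GROUP BY o.id, o.office_suburb
""").fetchone()

print(fanout_row)
```

The two staff rows pair with the three unsold listings to give six joined rows, so both naive counts come out as 6. COUNT(DISTINCT ...) recovers the true figures, and correlated subqueries avoid the multiplication in the first place.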
Last modified on 2025-04-04