How to Count Columns from Separate Tables Based on a Certain Value Using SQL

Understanding SQL: Counting Columns from Separate Tables Based on a Certain Value

When you are learning SQL, one of the first hurdles is pulling data from several tables at once. In this article, we'll use explicit join syntax and correlated subqueries to solve a common problem: counting related rows in separate tables (staff and listings) for each office, where one of the counts depends on a certain value (whether a listing has a buyer yet).

Background Information

Before we dive into the solution, let’s review some essential SQL concepts:

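JOINs: Combine rows from two or more tables based on a related column between them.
WHERE clause: Filters rows based on conditions specified in the query.
GROUP BY: Groups rows that share the same values in one or more columns, so that aggregate functions in the SELECT list (e.g., SUM, COUNT) return one value per group.
Correlated subquery: A subquery that refers to a column of the outer query, so it is logically evaluated once for each outer row.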
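To make JOIN, WHERE and GROUP BY concrete, here is a minimal sketch using Python's built-in sqlite3 module and a cut-down, hypothetical version of the Office and Staff tables introduced below. It counts, per office suburb, the staff earning more than 56,000; the rows and the salary threshold are invented purely for illustration.

```python
import sqlite3

# Hypothetical cut-down Office and Staff tables, held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office (id INTEGER PRIMARY KEY, office_suburb TEXT);
CREATE TABLE staff  (id INTEGER PRIMARY KEY, first_name TEXT,
                     salary INTEGER, office_id INTEGER);
INSERT INTO office VALUES (1, 'Carlton'), (2, 'Fitzroy');
INSERT INTO staff VALUES (10, 'Ann', 90000, 1), (11, 'Bob', 60000, 1),
                         (12, 'Cat', 55000, 1), (20, 'Dan', 95000, 2);
""")

rows = conn.execute("""
    SELECT o.office_suburb, COUNT(*) AS staff_count
    FROM office o
    JOIN staff s ON s.office_id = o.id  -- JOIN: pair each staff row with its office
    WHERE s.salary > 56000              -- WHERE: drop rows before grouping
    GROUP BY o.id, o.office_suburb      -- GROUP BY: one output row per office
    ORDER BY o.id
""").fetchall()
print(rows)  # [('Carlton', 2), ('Fitzroy', 1)]
```

Cat is filtered out by the WHERE clause before grouping, which is why Carlton reports 2 rather than 3.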

Problem Description

We have three tables: Office, Staff, and Listing. Each table contains relevant information for our task:

Office
Column Name	Data Type
id	Number
office_suburb	String
office_manager_id	Number

Staff
Column Name	Data Type
id	Number
first_name	String
last_name	String
salary	Number
office_id	Number

Listing
Column Name	Data Type
id	Number
office_id	Number
staff_id	Number
buyer_id	Number

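The task is to write an SQL query that returns, for each office, the suburb, the manager's name, the total number of staff (excluding the manager), and the number of listings for sale. Office.office_manager_id refers to Staff.id, and a listing is for sale when its buyer_id is NULL.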
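If you want to experiment, the schema can be mocked up in an in-memory SQLite database. The sketch below assumes SQLite types (INTEGER, TEXT) in place of Number and String, and all rows are hypothetical: Carlton (office 1) is managed by Ann Lee and has two other staff and three unsold listings; Fitzroy (office 2) is managed by Dan Ho and has one other staff member and one unsold listing. The expected answer is therefore Carlton: 2 staff, 3 listings; Fitzroy: 1 staff, 1 listing.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the schema above.
SETUP = """
CREATE TABLE office (
  id INTEGER PRIMARY KEY,
  office_suburb TEXT,
  office_manager_id INTEGER     -- refers to staff.id
);
CREATE TABLE staff (
  id INTEGER PRIMARY KEY,
  first_name TEXT, last_name TEXT,
  salary INTEGER,
  office_id INTEGER             -- refers to office.id
);
CREATE TABLE listing (
  id INTEGER PRIMARY KEY,
  office_id INTEGER, staff_id INTEGER,
  buyer_id INTEGER              -- NULL means the listing is still for sale
);
INSERT INTO office VALUES (1, 'Carlton', 10), (2, 'Fitzroy', 20);
INSERT INTO staff VALUES
  (10, 'Ann', 'Lee', 90000, 1), (11, 'Bob', 'Ng', 60000, 1),
  (12, 'Cat', 'Wu', 55000, 1), (20, 'Dan', 'Ho', 95000, 2),
  (21, 'Eve', 'Ma', 50000, 2);
INSERT INTO listing VALUES
  (100, 1, 10, NULL), (101, 1, 11, NULL), (102, 1, 11, 500),
  (103, 1, 12, NULL), (104, 2, 21, NULL), (105, 2, 21, 501);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Sanity check: how many listings are still for sale in each office?
for_sale = conn.execute("""
    SELECT office_id, COUNT(*) FROM listing
    WHERE buyer_id IS NULL
    GROUP BY office_id ORDER BY office_id
""").fetchall()
print(for_sale)  # [(1, 3), (2, 1)]
```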

The Original Query

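The original query attempts to solve this problem by listing all three tables in the FROM clause, joining and filtering them in a single WHERE clause, and then aggregating with GROUP BY: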

SELECT 
    o.office_suburb AS "Office Suburb", 
    (s.first_name || ' ' || s.last_name) AS "Manager",
    COUNT(s.salary)-1 AS "No. of Staff", 
    COUNT(l.id) AS "No. of Listings for Sale"
FROM staff s, office o, listing l
WHERE o.id = s.office_id 
    AND s.id = o.office_manager_id 
    AND o.id = l.office_id 
    AND s.id = l.staff_id
    AND l.buyer_id IS NULL
GROUP BY o.office_suburb, s.first_name, s.last_name
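Running the original query unchanged against a small hypothetical dataset in SQLite shows how far off it is. In this data, Carlton (managed by Ann Lee) should report 2 staff and 3 listings for sale, and Fitzroy (managed by Dan Ho) should report 1 and 1.

```python
import sqlite3

# Hypothetical data: office 1 = Carlton (manager 10, Ann Lee),
# office 2 = Fitzroy (manager 20, Dan Ho); NULL buyer_id = for sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office  (id INTEGER PRIMARY KEY, office_suburb TEXT, office_manager_id INTEGER);
CREATE TABLE staff   (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                      salary INTEGER, office_id INTEGER);
CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER, staff_id INTEGER,
                      buyer_id INTEGER);
INSERT INTO office VALUES (1, 'Carlton', 10), (2, 'Fitzroy', 20);
INSERT INTO staff VALUES (10, 'Ann', 'Lee', 90000, 1), (11, 'Bob', 'Ng', 60000, 1),
  (12, 'Cat', 'Wu', 55000, 1), (20, 'Dan', 'Ho', 95000, 2), (21, 'Eve', 'Ma', 50000, 2);
INSERT INTO listing VALUES (100, 1, 10, NULL), (101, 1, 11, NULL), (102, 1, 11, 500),
  (103, 1, 12, NULL), (104, 2, 21, NULL), (105, 2, 21, 501);
""")

# The original query, exactly as written above.
rows = conn.execute("""
SELECT
    o.office_suburb AS "Office Suburb",
    (s.first_name || ' ' || s.last_name) AS "Manager",
    COUNT(s.salary)-1 AS "No. of Staff",
    COUNT(l.id) AS "No. of Listings for Sale"
FROM staff s, office o, listing l
WHERE o.id = s.office_id
    AND s.id = o.office_manager_id
    AND o.id = l.office_id
    AND s.id = l.staff_id
    AND l.buyer_id IS NULL
GROUP BY o.office_suburb, s.first_name, s.last_name
""").fetchall()
print(rows)  # [('Carlton', 'Ann Lee', 0, 1)]
```

Carlton reports 0 staff and 1 listing (only Ann's own unsold listing survives the filters), and Fitzroy is missing altogether because Dan Ho has no listings.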

Issues with the Original Query

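The original query has several issues:

The condition s.id = o.office_manager_id keeps only the manager's row from Staff. COUNT(s.salary)-1 therefore does not count staff at all: it counts the joined rows in each group (one per unsold listing handled by the manager) and subtracts one.
The condition s.id = l.staff_id keeps only listings handled by the manager, so COUNT(l.id) ignores listings belonging to other staff. Because every join here is an inner join, an office whose manager has no unsold listings disappears from the result entirely.
Removing those two conditions would not be enough. Joining Staff and Listing to the same office in one FROM clause multiplies rows: an office with 3 staff and 3 unsold listings yields 3 × 3 = 9 joined rows, so both counts come out inflated.
The comma-separated FROM list is not an error in itself (it behaves like an inner join), but mixing join conditions and filters in one WHERE clause makes mistakes like these easy to overlook. Grouping by office_suburb and the manager's name instead of o.id is also fragile, since two offices with the same suburb and manager name would be merged into one row.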
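A separate problem with putting Staff and Listing in one FROM clause is row multiplication (often called fan-out). The sketch below, on hypothetical SQLite data, joins each office to all of its staff and all of its unsold listings and then counts. The COUNT(DISTINCT ...) columns are shown only for comparison; they are not part of the original question or answer.

```python
import sqlite3

# Hypothetical data: Carlton has 3 staff and 3 unsold listings,
# Fitzroy has 2 staff and 1 unsold listing (NULL buyer_id = for sale).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office  (id INTEGER PRIMARY KEY, office_suburb TEXT, office_manager_id INTEGER);
CREATE TABLE staff   (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                      salary INTEGER, office_id INTEGER);
CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER, staff_id INTEGER,
                      buyer_id INTEGER);
INSERT INTO office VALUES (1, 'Carlton', 10), (2, 'Fitzroy', 20);
INSERT INTO staff VALUES (10, 'Ann', 'Lee', 90000, 1), (11, 'Bob', 'Ng', 60000, 1),
  (12, 'Cat', 'Wu', 55000, 1), (20, 'Dan', 'Ho', 95000, 2), (21, 'Eve', 'Ma', 50000, 2);
INSERT INTO listing VALUES (100, 1, 10, NULL), (101, 1, 11, NULL), (102, 1, 11, 500),
  (103, 1, 12, NULL), (104, 2, 21, NULL), (105, 2, 21, 501);
""")

rows = conn.execute("""
    SELECT o.office_suburb,
           COUNT(s.id),           -- one row per (staff, listing) pair
           COUNT(l.id),
           COUNT(DISTINCT s.id),  -- actual staff count (manager included)
           COUNT(DISTINCT l.id)   -- actual unsold listing count
    FROM office o
    JOIN staff s   ON s.office_id = o.id
    JOIN listing l ON l.office_id = o.id AND l.buyer_id IS NULL
    GROUP BY o.id, o.office_suburb
    ORDER BY o.id
""").fetchall()
print(rows)  # [('Carlton', 9, 9, 3, 3), ('Fitzroy', 2, 2, 2, 1)]
```

COUNT(DISTINCT ...) happens to recover the true counts here, but the same trick does not repair totals such as SUM(s.salary), which is one reason to compute each count separately.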

Solution Using Correlated Subqueries

The provided answer uses correlated subqueries to solve this problem:

SELECT o.office_suburb, (sm.first_name || ' ' || sm.last_name) AS Manager,
       -- staff in this office, excluding the manager
       (SELECT COUNT(*)
        FROM staff s2
        WHERE s2.office_id = o.id
          AND s2.id <> o.office_manager_id
       ) AS num_staff,
       -- listings in this office that have no buyer yet (still for sale)
       (SELECT COUNT(*)
        FROM listing l
        WHERE l.office_id = o.id AND l.buyer_id IS NULL
       ) AS num_listings
FROM office o JOIN
     staff sm
     ON sm.id = o.office_manager_id;
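Here is the correlated-subquery approach run against hypothetical data in SQLite. This sketch reads from the listing table (the name used in the schema and in the original query) and excludes the manager with s2.id <> o.office_manager_id, as the task requires. The expected answer for this data is Carlton: 2 staff, 3 listings; Fitzroy: 1 staff, 1 listing.

```python
import sqlite3

# Hypothetical data: office 1 = Carlton (manager 10, Ann Lee),
# office 2 = Fitzroy (manager 20, Dan Ho); NULL buyer_id = for sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office  (id INTEGER PRIMARY KEY, office_suburb TEXT, office_manager_id INTEGER);
CREATE TABLE staff   (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                      salary INTEGER, office_id INTEGER);
CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER, staff_id INTEGER,
                      buyer_id INTEGER);
INSERT INTO office VALUES (1, 'Carlton', 10), (2, 'Fitzroy', 20);
INSERT INTO staff VALUES (10, 'Ann', 'Lee', 90000, 1), (11, 'Bob', 'Ng', 60000, 1),
  (12, 'Cat', 'Wu', 55000, 1), (20, 'Dan', 'Ho', 95000, 2), (21, 'Eve', 'Ma', 50000, 2);
INSERT INTO listing VALUES (100, 1, 10, NULL), (101, 1, 11, NULL), (102, 1, 11, 500),
  (103, 1, 12, NULL), (104, 2, 21, NULL), (105, 2, 21, 501);
""")

rows = conn.execute("""
    SELECT o.office_suburb, (sm.first_name || ' ' || sm.last_name) AS Manager,
           (SELECT COUNT(*)
            FROM staff s2
            WHERE s2.office_id = o.id
              AND s2.id <> o.office_manager_id  -- exclude the manager
           ) AS num_staff,
           (SELECT COUNT(*)
            FROM listing l
            WHERE l.office_id = o.id AND l.buyer_id IS NULL
           ) AS num_listings
    FROM office o JOIN
         staff sm
         ON sm.id = o.office_manager_id
""").fetchall()
rows.sort()  # the query has no ORDER BY, so sort for a stable printout
print(rows)  # [('Carlton', 'Ann Lee', 2, 3), ('Fitzroy', 'Dan Ho', 1, 1)]
```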

Explanation of the Solution

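This solution consists of two parts:

Join Syntax: Instead of joining all three tables in the WHERE clause, the outer query uses an explicit JOIN between Office and Staff, and only to look up each office's manager. Explicit JOIN syntax is widely considered best practice for readability and maintainability.
Correlated Subqueries: Two correlated subqueries calculate the number of staff and the number of unsold listings for each office. Each one refers to the outer office row through o.id and reads only its own table (Staff or Listing), so the two counts are computed independently, rows are never multiplied, and no GROUP BY is needed. An office with no unsold listings still appears, with a count of 0.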
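Logically, a correlated subquery behaves as if the inner query were re-run for each outer row with that row's o.id plugged in. The sketch below, on hypothetical data that includes a third office (Richmond) with no listings, compares an explicit Python loop with a single correlated subquery. Query planners are free to execute the subquery differently (for example by rewriting it as a join), so the loop is a mental model, not necessarily what the database does.

```python
import sqlite3

# Hypothetical Office and Listing rows (NULL buyer_id = still for sale).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office  (id INTEGER PRIMARY KEY, office_suburb TEXT);
CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER, buyer_id INTEGER);
INSERT INTO office VALUES (1, 'Carlton'), (2, 'Fitzroy'), (3, 'Richmond');
INSERT INTO listing VALUES (100, 1, NULL), (101, 1, NULL), (102, 1, 500),
                           (103, 1, NULL), (104, 2, NULL), (105, 2, 501);
""")

# Mental model: run the inner COUNT once per outer office row, binding its id.
looped = []
for office_id, suburb in conn.execute(
        "SELECT id, office_suburb FROM office ORDER BY id").fetchall():
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM listing WHERE office_id = ? AND buyer_id IS NULL",
        (office_id,)).fetchone()
    looped.append((suburb, n))

# The same result from one query with a correlated subquery.
correlated = conn.execute("""
    SELECT o.office_suburb,
           (SELECT COUNT(*) FROM listing l
            WHERE l.office_id = o.id AND l.buyer_id IS NULL)
    FROM office o
    ORDER BY o.id
""").fetchall()

print(looped)                # [('Carlton', 3), ('Fitzroy', 1), ('Richmond', 0)]
print(looped == correlated)  # True
```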

Best Practices

When working with multiple tables in SQL, follow these best practices:

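Use explicit JOIN syntax instead of comma-separated table names, so that join conditions stay separate from filters.
Avoid aggregating with GROUP BY over a join to two or more independent one-to-many tables (here, Staff and Listing per office), because the row multiplication inflates every count.
Consider correlated subqueries when each output row needs separate counts or totals from several related tables.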
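Correlated subqueries are not the only way to keep the counts independent. A common alternative, not used in the original answer, is to pre-aggregate each related table in a derived table (one row per office) and LEFT JOIN those results to Office. The sketch below assumes hypothetical SQLite data, including a third office, Richmond, whose manager has no other staff and no listings, to show why the LEFT JOIN and COALESCE are needed.

```python
import sqlite3

# Hypothetical data; Richmond (office 3) has only its manager and no listings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE office  (id INTEGER PRIMARY KEY, office_suburb TEXT, office_manager_id INTEGER);
CREATE TABLE staff   (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                      salary INTEGER, office_id INTEGER);
CREATE TABLE listing (id INTEGER PRIMARY KEY, office_id INTEGER, staff_id INTEGER,
                      buyer_id INTEGER);
INSERT INTO office VALUES (1, 'Carlton', 10), (2, 'Fitzroy', 20), (3, 'Richmond', 30);
INSERT INTO staff VALUES (10, 'Ann', 'Lee', 90000, 1), (11, 'Bob', 'Ng', 60000, 1),
  (12, 'Cat', 'Wu', 55000, 1), (20, 'Dan', 'Ho', 95000, 2), (21, 'Eve', 'Ma', 50000, 2),
  (30, 'Fay', 'Yu', 80000, 3);
INSERT INTO listing VALUES (100, 1, 10, NULL), (101, 1, 11, NULL), (102, 1, 11, 500),
  (103, 1, 12, NULL), (104, 2, 21, NULL), (105, 2, 21, 501);
""")

rows = conn.execute("""
    SELECT o.office_suburb,
           sm.first_name || ' ' || sm.last_name AS manager,
           COALESCE(st.n, 0) AS num_staff,     -- LEFT JOIN yields NULL when no rows
           COALESCE(li.n, 0) AS num_listings
    FROM office o
    JOIN staff sm ON sm.id = o.office_manager_id
    LEFT JOIN (SELECT s.office_id, COUNT(*) AS n  -- staff per office, manager excluded
               FROM staff s
               JOIN office o2 ON o2.id = s.office_id
               WHERE s.id <> o2.office_manager_id
               GROUP BY s.office_id) st ON st.office_id = o.id
    LEFT JOIN (SELECT office_id, COUNT(*) AS n    -- unsold listings per office
               FROM listing
               WHERE buyer_id IS NULL
               GROUP BY office_id) li ON li.office_id = o.id
    ORDER BY o.id
""").fetchall()
print(rows)
# [('Carlton', 'Ann Lee', 2, 3), ('Fitzroy', 'Dan Ho', 1, 1), ('Richmond', 'Fay Yu', 0, 0)]
```

GROUP BY is still used here, but inside each derived table and before the join, so each derived table has at most one row per office and no row multiplication can occur. Which form performs better depends on the database and its query planner.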

Last modified on 2025-04-04