Top N Rows Per Group in MySQL: A Step-by-Step Guide

Restrict Results to Top N Rows Per Group

Introduction

When working with data that needs to be processed in groups, it’s often necessary to restrict the results to a certain number of rows per group. This can be achieved using various techniques depending on the database management system (DBMS) being used. In this article, we’ll explore how to achieve this using MySQL 8 and later versions, as well as MySQL 5.x.

Problem Statement

The following query yields:

year | id  | rate
2006 | p01 |  8.0
2003 | p01 |  7.4
2008 | p01 |  6.8
2001 | p01 |  5.9
2007 | p01 |  5.3
2009 | p01 |  4.4
2002 | p01 |  3.9
2004 | p01 |  3.5
2005 | p01 |  2.1
2000 | p01 |  0.8
2001 | p02 | 12.5
2004 | p02 | 12.4
2002 | p02 | 12.2
2003 | p02 | 10.3
2000 | p02 |  8.7
2006 | p02 |  4.6
2007 | p02 |  3.3

We want only the top 5 rows for each id.

Solution using MySQL 8 and Later Versions

In MySQL 8 or later, you can use the ROW_NUMBER, RANK, or DENSE_RANK function depending on the exact definition of “top N rows per group”.

Using ROW_NUMBER Function

The following query uses the ROW_NUMBER function to assign a unique number to each row within each partition (group) based on the value sorted in descending order. The results are then filtered to show only the top 5 rows for each id.

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC) AS n
    FROM t
) AS x
WHERE n <= 5;

Using RANK Function

The following query uses poor man’s rank over partition to achieve the desired result. It outer joins the table with itself and for each row, counts the number of rows before it (e.g., the before row could be the one with a higher value).

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid;

Using DENSE_RANK Function

To produce results similar to DENSE_RANK, make the following change:

SELECT t.pkid, t/catid, t.value, COUNT(DISTINCT b.value) AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(DISTINCT b.value) <= 5;

Solution using MySQL 5.x

In MySQL 5.x, you can use poor man’s rank over partition to achieve the desired result. The following query produces results similar to RANK.

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid;

Make the following change to produce results similar to DENSE_RANK.

SELECT t.pkid, t/catid, t.value, COUNT(DISTINCT b.value) AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(DISTINCT b.value) <= 5;

Make the following change to produce results similar to ROW_NUMBER.

SELECT t.pkid, t/catid, t.value, ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5;

Conclusion

When working with data that needs to be processed in groups, it’s often necessary to restrict the results to a certain number of rows per group. Using MySQL 8 or later versions, you can achieve this using various functions like ROW_NUMBER, RANK, and DENSE_RANK. In MySQL 5.x, poor man’s rank over partition can be used to achieve similar results.

Example Use Cases

  • Restricting search results to the top N results per category
  • Displaying a list of top-scoring articles per author
  • Showing the top-rated products per product category

Step-by-Step Solution

  1. Identify the columns required for grouping (e.g., id, catid)
  2. Choose the function to use based on your specific requirements (e.g., ROW_NUMBER for unique rows, RANK for tied values, DENSE_RANK for no gaps between ranks)
  3. Write the query using the chosen function
  4. Filter the results to show only the top N rows per group

Last modified on 2025-02-11