Identifying Consecutive Duplicates in Oracle: LAG() vs MODEL Clause

Comparing Multiple Columns in Oracle with the Same Columns in the Previous Row

When working with large datasets, it’s not uncommon to encounter duplicate records that sit back to back. In this article, we’ll explore how to compare multiple columns of a row in Oracle with the same columns of the previous row.

Understanding Duplicate Records

Duplicate records are records that have identical values in certain columns. A consecutive duplicate is narrower: it is a row whose compared columns all have the same values as the corresponding columns of the row immediately before it. “Immediately before” only has a meaning once the rows are put in a defined order, so every approach below starts from an explicit ordering.
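As a minimal illustration of the difference, consider a hypothetical four-row data set (the names demo, SEQ, VAL, and REPEATS_PREVIOUS are invented for this sketch). The value 'A' occurs three times, but only one of those rows repeats the row directly before it. The sketch uses LAG(), which the next section covers.

```sql
WITH demo ( SEQ, VAL ) AS
    ( Select 1, 'A' From Dual Union All
      Select 2, 'A' From Dual Union All
      Select 3, 'B' From Dual Union All
      Select 4, 'A' From Dual
    )
SELECT  SEQ, VAL,
        -- 'Y' only when this row's VAL equals the VAL of the row just before it
        CASE WHEN VAL = LAG(VAL) OVER (ORDER BY SEQ)
             THEN 'Y' ELSE 'N'
        END AS REPEATS_PREVIOUS
FROM    demo
ORDER BY SEQ;
```

Only the row with SEQ = 2 is flagged 'Y'. The row with SEQ = 4 is a duplicate of earlier rows, but not of its immediate predecessor, so it is not a consecutive duplicate.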

Query Approach Using LAG() Function

One approach is the LAG() analytic function, which returns a value from an earlier row of the result set, where “earlier” is defined by the ORDER BY in its OVER clause. Here’s an example query that identifies consecutive duplicates:

WITH    --  Sample data
    tbl ( A_NUMBER, A_VARCHAR, A_DATE ) AS
        ( Select 1, 'text 1', DATE '2023-01-01' From Dual Union All
          Select 3, 'text 3', DATE '2023-03-03' From Dual Union All
          Select null, null, DATE '2023-02-02' From Dual Union All
          Select 2, 'text 2', null From Dual Union All
          Select 3, 'text 3', DATE '2023-03-03' From Dual Union All
          Select 1, 'text 1', DATE '2023-01-01' From Dual Union All
          Select 2, 'text 2', DATE '2023-02-02' From Dual
        ),
    --  Number the rows so that "previous row" is well defined.
    --  The sort order is an assumption for this demo; use whatever defines row order in your data.
    grid AS
        ( Select ROW_NUMBER() OVER (ORDER BY A_NUMBER, A_VARCHAR, A_DATE) AS RN,
                 A_NUMBER, A_VARCHAR, A_DATE
          From   tbl
        )
SELECT  A_NUMBER, A_VARCHAR, A_DATE
FROM    ( SELECT  RN, A_NUMBER, A_VARCHAR, A_DATE,
                  LAG(A_NUMBER)  OVER (ORDER BY RN) AS PREV_NUMBER,
                  LAG(A_VARCHAR) OVER (ORDER BY RN) AS PREV_VARCHAR,
                  LAG(A_DATE)    OVER (ORDER BY RN) AS PREV_DATE
          FROM    grid
        )
--  Plain "=" never matches when either side is null
WHERE   A_NUMBER  = PREV_NUMBER  AND
        A_VARCHAR = PREV_VARCHAR AND
        A_DATE    = PREV_DATE
ORDER BY RN;

Limitations of the LAG() Function Approach

While this query approach is effective in identifying consecutive duplicates, it has some limitations. For example:

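  • It does not handle null values: a comparison with = is never true when either side is null, so two adjacent rows that are both null in a compared column are not reported as duplicates.
  • The list of columns is hard-coded: every compared column needs its own LAG() expression and its own predicate in the WHERE clause.
  • The result depends on the ordering in the OVER clause: if the sort key has ties between different rows, which row counts as “previous” is not deterministic.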
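One way to address the null limitation is to compare with DECODE, which treats two nulls as equal. The following is a sketch only, not a drop-in fix; the data set demo and its columns SEQ, NUM_COL, and TXT_COL are hypothetical.

```sql
WITH demo ( SEQ, NUM_COL, TXT_COL ) AS
    ( Select 1, 10,   'x'  From Dual Union All
      Select 2, 10,   'x'  From Dual Union All
      Select 3, null, 'y'  From Dual Union All
      Select 4, null, 'y'  From Dual Union All
      Select 5, 20,   null From Dual
    )
SELECT  SEQ, NUM_COL, TXT_COL
FROM    ( SELECT  SEQ, NUM_COL, TXT_COL,
                  LAG(SEQ)     OVER (ORDER BY SEQ) AS PREV_SEQ,
                  LAG(NUM_COL) OVER (ORDER BY SEQ) AS PREV_NUM,
                  LAG(TXT_COL) OVER (ORDER BY SEQ) AS PREV_TXT
          FROM    demo
        )
WHERE   PREV_SEQ IS NOT NULL                 -- skip the first row, which has no predecessor
  AND   DECODE(NUM_COL, PREV_NUM, 1, 0) = 1  -- DECODE treats null and null as a match
  AND   DECODE(TXT_COL, PREV_TXT, 1, 0) = 1
ORDER BY SEQ;
```

With this data the query returns the rows with SEQ = 2 and SEQ = 4. A plain = comparison would miss SEQ = 4, because NUM_COL is null in both that row and its predecessor. The PREV_SEQ check matters once nulls count as equal: without it, a first row that is null in every compared column would match the nulls LAG() returns when there is no previous row.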

Alternative Approach Using MODEL Clause

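Another approach is to use Oracle’s MODEL clause, which provides spreadsheet-like processing: rows are addressed as cells by a dimension (here a row number), so a rule can read the cell at the previous position directly. Here’s an example query that uses the MODEL clause to identify consecutive duplicates: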

WITH    --  Sample data
    tbl ( A_NUMBER, A_VARCHAR, A_DATE ) AS
        ( Select 1, 'text 1', DATE '2023-01-01' From Dual Union All
          Select 3, 'text 3', DATE '2023-03-03' From Dual Union All
          Select null, null, DATE '2023-02-02' From Dual Union All
          Select 2, 'text 2', null From Dual Union All
          Select 3, 'text 3', DATE '2023-03-03' From Dual Union All
          Select 1, 'text 1', DATE '2023-01-01' From Dual Union All
          Select 2, 'text 2', DATE '2023-02-02' From Dual
        ),
    --  Number the rows; RN becomes the MODEL dimension.
    --  The sort order is an assumption for this demo.
    grid AS
        ( Select ROW_NUMBER() OVER (ORDER BY A_NUMBER, A_VARCHAR, A_DATE) AS RN,
                 A_NUMBER, A_VARCHAR, A_DATE
          From   tbl
        )
SELECT  A_NUMBER, A_VARCHAR, A_DATE
FROM    ( SELECT  RN, A_NUMBER, A_VARCHAR, A_DATE, PREV_NUMBER, PREV_VARCHAR, PREV_DATE
          FROM    grid
          MODEL   Dimension By (RN)
                  Measures ( A_NUMBER, A_VARCHAR, A_DATE,
                             CAST(null AS NUMBER)       AS PREV_NUMBER,
                             CAST(null AS VARCHAR2(32)) AS PREV_VARCHAR,
                             CAST(null AS DATE)         AS PREV_DATE )
                  --  CV() - 1 addresses the previous row; nulls and the missing
                  --  row before RN = 1 are replaced by sentinel values
                  RULES  (  PREV_NUMBER[ANY]  = Nvl(A_NUMBER[CV() - 1], -999999),
                            PREV_VARCHAR[ANY] = Nvl(A_VARCHAR[CV() - 1], '**//**//'),
                            PREV_DATE[ANY]    = Nvl(A_DATE[CV() - 1], DATE '2062-10-11')
                         )
        )
--  The same sentinels on the current row make null match null.
--  The sentinel values must never occur in the real data.
WHERE   Nvl(A_NUMBER, -999999)         = PREV_NUMBER  AND
        Nvl(A_VARCHAR, '**//**//')     = PREV_VARCHAR AND
        Nvl(A_DATE, DATE '2062-10-11') = PREV_DATE
ORDER BY RN;
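To see the previous-cell reference in isolation, here is a minimal sketch on a hypothetical three-row data set (demo, RN, VAL, and PREV_VAL are illustrative names). CV() returns the dimension value of the cell being assigned, so VAL[CV() - 1] reads the cell one position earlier.

```sql
WITH demo ( RN, VAL ) AS
    ( Select 1, 10 From Dual Union All
      Select 2, 10 From Dual Union All
      Select 3, 30 From Dual
    )
SELECT  RN, VAL, PREV_VAL
FROM    demo
MODEL   Dimension By (RN)                                   -- RN must identify each row uniquely
        Measures ( VAL, CAST(null AS NUMBER) AS PREV_VAL )  -- PREV_VAL starts empty
        RULES ( PREV_VAL[ANY] = VAL[CV() - 1] )             -- copy VAL from the previous position
ORDER BY RN;
```

For RN = 1 there is no cell at position 0, so the reference yields null by default and PREV_VAL stays null; that is why the full query wraps each reference in Nvl with a sentinel. For RN = 2 and RN = 3, PREV_VAL receives the VAL of the row before.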

Conclusion

Identifying consecutive duplicates in a dataset can be challenging, but using the LAG() function approach or Oracle’s MODEL clause can provide effective solutions. By understanding the limitations of each approach and choosing the right one for your specific use case, you can efficiently identify duplicate records that are back-to-back or next to each other.

Example Result

Here’s the result set for the sample data, with the rows sorted by A_NUMBER, A_VARCHAR, and A_DATE (dates shown in ISO format):

A_NUMBER  A_VARCHAR  A_DATE
       1  text 1     2023-01-01
       3  text 3     2023-03-03

Each returned row is the second of two identical adjacent rows: all three of its columns match the previous row. The two rows with A_NUMBER = 2 are not reported because one of them has a null A_DATE, and the row with a null A_NUMBER and A_VARCHAR has no matching neighbor.


Last modified on 2025-03-28