Grouping Data with LINQ and Removing Duplicate Records

When working with data from multiple tables in Entity Framework, it’s not uncommon to want to perform aggregations based on groups of records. In this article, we’ll explore how to use LINQ to group data from two tables, remove duplicate records based on a common key, and calculate the average value for each group.

Understanding the Problem

Let’s consider an example where we have two tables: Authors and Books. The Books table has columns for BookId, Title, Genre, and AuthorId, which references the AuthorId column in the Authors table. We want to calculate the average age of authors grouped by genre, counting each author only once per genre even if they have written several books in it.
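To see why duplicates matter here, consider a small worked case. This is only a sketch with made-up rows (the tuple layout, the ages, and the DuplicateSkewDemo name are assumptions for illustration): one author aged 40 has two Fantasy books and another aged 60 has one. Averaging over the raw joined rows counts the first author twice, giving (40 + 40 + 60) / 3, while averaging over distinct rows gives (40 + 60) / 2 = 50.

```csharp
using System;
using System.Linq;

public static class DuplicateSkewDemo
{
    // Hypothetical joined rows, one per book: (AuthorId, Age, Genre).
    public static readonly (int AuthorId, int Age, string Genre)[] Rows =
    {
        (1, 40, "Fantasy"), // author 1, first Fantasy book
        (1, 40, "Fantasy"), // author 1, second Fantasy book
        (2, 60, "Fantasy"), // author 2, only Fantasy book
    };

    // Averages over every joined row, so author 1 is counted twice.
    public static double NaiveAverage() => Rows.Average(r => r.Age);

    // Tuples compare by value, so Distinct() keeps one row per author and genre.
    public static double DeduplicatedAverage() => Rows.Distinct().Average(r => r.Age);

    public static void Main()
    {
        Console.WriteLine(NaiveAverage());        // skewed toward author 1
        Console.WriteLine(DeduplicatedAverage()); // 50
    }
}
```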

Using Entity Framework Core

We’ll be using Entity Framework Core, a popular ORM (Object-Relational Mapping) tool for .NET developers. If you’re new to EF Core, I recommend checking out the official documentation and tutorials.

Step 1: Define the Models

First, let’s define our models:

public class Author
{
    public int AuthorId { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public string Genre { get; set; }
    public int AuthorId { get; set; }
}

Step 2: Create the Database Context

Next, let’s create a database context that will help us interact with our tables:

public class MyDbContext : DbContext
{
    public DbSet<Book> Books { get; set; }
    public DbSet<Author> Authors { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=MyDB;Trusted_Connection=True;");
}

Step 3: Query the Data

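Now, let’s write our LINQ query to group the data and remove duplicates. To keep the example simple, we load both tables into memory and run the query with LINQ to Objects:

// Load both tables into memory so the query below runs as LINQ to Objects.
using var context = new MyDbContext();
var authors = context.Authors.ToList();
var books = context.Books.ToList();

// Join, drop repeated (author, genre) rows, then average the ages per genre.
var averageAgeByGenre = authors.Join(books, a => a.AuthorId, b => b.AuthorId,
    (author, book) => new { author.AuthorId, author.Age, book.Genre })
    .Distinct()
    .GroupBy(r => r.Genre)
    .Select(g => new
    {
        Genre = g.Key,
        Age = g.Average(x => x.Age)
    })
    .ToList();

Step 4: Run the Query

Finally, let’s iterate over the query results and print them:

foreach (var genre in averageAgeByGenre)
{
    Console.WriteLine($"Genre: {genre.Genre}, Average Age: {genre.Age}");
}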

Understanding the LINQ Query

Our LINQ query consists of several parts. Let’s break it down:

  • authors.Join(books, a => a.AuthorId, b => b.AuthorId, (author, book) => new { author.AuthorId, author.Age, book.Genre }): This line joins our two lists of authors and books based on the common key of AuthorId.
  • .Distinct(): After joining the data, we use the Distinct() method to remove duplicate records. Because the projection contains only AuthorId, Age, and Genre, an author with several books in the same genre collapses to a single row for that genre; an author who writes in several genres keeps one row per genre.
  • .GroupBy(r => r.Genre): Next, we group our results by the Genre column.
  • .Select(g => new { Genre = g.Key, Age = g.Average(x => x.Age) }): Finally, we select a subset of columns from each group and calculate the average age for each genre.
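The steps above can be tried without a database. The following is a minimal sketch under stated assumptions: the records are compact stand-ins for the article’s Author and Book classes, and the sample names, titles, and the GenreStats helper are hypothetical. With this data, Fantasy averages 50 (authors aged 40 and 60, each counted once) and Mystery averages 40.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact stand-ins for the article's models (assumption: records instead of classes).
public record Author(int AuthorId, string Name, int Age);
public record Book(int BookId, string Title, string Genre, int AuthorId);

public static class GenreStats
{
    // Join -> Distinct -> GroupBy -> Average, as described in the breakdown above.
    public static Dictionary<string, double> AverageAgeByGenre(
        IEnumerable<Author> authors, IEnumerable<Book> books) =>
        authors
            .Join(books, a => a.AuthorId, b => b.AuthorId,
                (a, b) => new { a.AuthorId, a.Age, b.Genre })
            .Distinct() // anonymous types compare by value
            .GroupBy(r => r.Genre)
            .ToDictionary(g => g.Key, g => g.Average(r => r.Age));

    public static void Main()
    {
        // Hypothetical sample data: Ann has two Fantasy books and one Mystery.
        var authors = new List<Author> { new(1, "Ann", 40), new(2, "Bob", 60) };
        var books = new List<Book>
        {
            new(1, "A1", "Fantasy", 1),
            new(2, "A2", "Fantasy", 1),
            new(3, "A3", "Mystery", 1),
            new(4, "B1", "Fantasy", 2),
        };

        foreach (var (genre, average) in AverageAgeByGenre(authors, books))
        {
            Console.WriteLine($"Genre: {genre}, Average Age: {average}");
        }
    }
}
```

Returning a dictionary here is a design choice for easy lookup by genre; the article’s own query projects to an anonymous type instead.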

Conclusion

In this article, we’ve explored how to use LINQ to group data from two tables, remove duplicate records based on a common key, and calculate the average value for each group. By following these steps and using the techniques outlined in this article, you should be able to create more efficient and effective queries when working with multiple tables in Entity Framework Core.

Common Pitfalls

Here are some common pitfalls to watch out for when writing LINQ queries:

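  • Overusing Distinct(): Distinct() compares every field in the projection, so it only removes rows that match on all of them. In this example it works because we projected just AuthorId, Age, and Genre; adding a per-book field such as BookId or Title would make every row unique and leave the duplicates in place.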
  • Failing to Join Correctly: When joining tables in LINQ, make sure you’re using the correct keys and join types. Incorrect joins can lead to incorrect results or errors.
  • Not Grouping Correctly: When grouping data with LINQ, consider what columns you want to group by and what aggregation functions you want to apply.
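The Distinct() pitfall is easy to reproduce. The sketch below uses hypothetical joined rows (the tuple layout and the DistinctPitfallDemo name are assumptions) and counts what survives deduplication under three projections. DistinctBy() is available in System.Linq from .NET 6 onward; it is shown here as one possible alternative, not as the article’s method.

```csharp
using System;
using System.Linq;

public static class DistinctPitfallDemo
{
    // Hypothetical joined rows, one per book: (BookId, AuthorId, Age, Genre).
    static readonly (int BookId, int AuthorId, int Age, string Genre)[] Joined =
    {
        (1, 1, 40, "Fantasy"),
        (2, 1, 40, "Fantasy"),
        (3, 2, 60, "Fantasy"),
    };

    // BookId makes every row unique, so Distinct() removes nothing: 3 rows remain.
    public static int CountWithBookId() =>
        Joined.Select(r => new { r.BookId, r.AuthorId, r.Age, r.Genre })
              .Distinct()
              .Count();

    // Without BookId, the two rows for author 1 are equal: 2 rows remain.
    public static int CountWithoutBookId() =>
        Joined.Select(r => new { r.AuthorId, r.Age, r.Genre })
              .Distinct()
              .Count();

    // DistinctBy (.NET 6+) deduplicates on an explicit key and keeps the full row.
    public static int CountDistinctBy() =>
        Joined.DistinctBy(r => (r.AuthorId, r.Genre)).Count();

    public static void Main()
    {
        Console.WriteLine(CountWithBookId());    // 3
        Console.WriteLine(CountWithoutBookId()); // 2
        Console.WriteLine(CountDistinctBy());    // 2
    }
}
```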

Next Steps

If you’ve made it this far in our article, congratulations! You now have a solid understanding of how to use LINQ to group data from multiple tables and remove duplicates based on a common key. Here are some next steps:

  • Try More Complex Queries: Experiment with more complex queries that involve grouping, aggregating, and joining data.
  • Use Other LINQ Methods: Familiarize yourself with other LINQ methods, such as Where(), OrderBy(), and ThenBy().
  • Practice Regularly: The best way to improve your skills is to practice regularly. Try writing different queries and experimenting with new techniques.
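As a starting point for the second suggestion, here is a small sketch that filters and sorts per-genre averages with Where(), OrderBy(), and ThenBy(). The result rows, the threshold of 40, and the SortingDemo name are made up for illustration.

```csharp
using System;
using System.Linq;

public static class SortingDemo
{
    // Hypothetical per-genre results, shaped like the output of the grouping query.
    static readonly (string Genre, double AverageAge)[] Results =
    {
        ("Mystery", 40),
        ("Fantasy", 50),
        ("Poetry", 35),
        ("Drama", 50),
    };

    // Keep genres averaging at least 40, sort by average, break ties by genre name.
    public static string[] FilteredAndSorted() =>
        Results.Where(r => r.AverageAge >= 40)
               .OrderBy(r => r.AverageAge)
               .ThenBy(r => r.Genre)
               .Select(r => r.Genre)
               .ToArray();

    public static void Main()
    {
        // Prints: Mystery, Drama, Fantasy
        Console.WriteLine(string.Join(", ", FilteredAndSorted()));
    }
}
```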

Code Example

Here’s the complete code example from our article:

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Author
{
    public int AuthorId { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public string Genre { get; set; }
    public int AuthorId { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Book> Books { get; set; }
    public DbSet<Author> Authors { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=MyDB;Trusted_Connection=True;");
}

class Program
{
    static void Main(string[] args)
    {
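        // Load both tables into memory so the query below runs as LINQ to Objects.
        using var context = new MyDbContext();
        var authors = context.Authors.ToList();
        var books = context.Books.ToList();

        // Join, drop repeated (author, genre) rows, then average the ages per genre.
        var averageAgeByGenre = authors.Join(books, a => a.AuthorId, b => b.AuthorId,
            (author, book) => new { author.AuthorId, author.Age, book.Genre })
            .Distinct()
            .GroupBy(r => r.Genre)
            .Select(g => new
            {
                Genre = g.Key,
                Age = g.Average(x => x.Age)
            })
            .ToList();

        foreach (var genre in averageAgeByGenre)
        {
            Console.WriteLine($"Genre: {genre.Genre}, Average Age: {genre.Age}");
        }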
    }
}

Discussion

What do you think about the code example? Have any questions or suggestions for improvement?

Please let me know, and I’ll be happy to help.


Last modified on 2025-04-06