Grouping Data with LINQ and Removing Duplicate Records
When working with data from multiple tables in Entity Framework, it’s not uncommon to want to perform aggregations based on groups of records. In this article, we’ll explore how to use LINQ to group data from two tables, remove duplicate records based on a common key, and calculate the average value for each group.
Understanding the Problem
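Let’s consider an example where we have two tables: `Authors` and `Books`. The `Books` table has columns for `BookId`, `Title`, `Genre`, and `AuthorId`, which references the `AuthorId` column in the `Authors` table. The `Authors` table also stores each author’s `Age`. We want to calculate the average age of authors grouped by genre, counting each author only once per genre even if they have written several books in it.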
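To see why the duplicates matter, here is a minimal in-memory sketch. It uses LINQ to Objects rather than a database, C# records instead of the model classes defined later, and sample authors and books that are invented purely for illustration. It compares an average taken over every joined row with one taken after duplicates are removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Records are used for brevity (assumes C# 9 or later).
public record Author(int AuthorId, string Name, int Age);
public record Book(int BookId, string Title, string Genre, int AuthorId);

public static class DuplicateDemo
{
    // Hypothetical sample data, not taken from the article.
    public static readonly List<Author> Authors = new()
    {
        new Author(1, "Ann", 30),
        new Author(2, "Bob", 60),
    };

    public static readonly List<Book> Books = new()
    {
        new Book(1, "Fantasy One", "Fantasy", 1),
        new Book(2, "Fantasy Two", "Fantasy", 1),
        new Book(3, "Fantasy Three", "Fantasy", 1),
        new Book(4, "Fantasy Four", "Fantasy", 2),
    };

    // Averages over every joined row, so Ann's age is counted three times.
    public static double NaiveAverage() =>
        Authors.Join(Books, a => a.AuthorId, b => b.AuthorId,
                     (a, b) => new { a.AuthorId, a.Age, b.Genre })
               .Average(r => r.Age);

    // Collapses the rows to one per (author, genre) pair before averaging.
    public static double DeduplicatedAverage() =>
        Authors.Join(Books, a => a.AuthorId, b => b.AuthorId,
                     (a, b) => new { a.AuthorId, a.Age, b.Genre })
               .Distinct()
               .Average(r => r.Age);

    public static void Main()
    {
        Console.WriteLine(NaiveAverage());        // (30 + 30 + 30 + 60) / 4 = 37.5
        Console.WriteLine(DeduplicatedAverage()); // (30 + 60) / 2 = 45
    }
}
```

With this data the naive average is pulled toward Ann’s age simply because she has written more books, which is the skew the rest of the article sets out to avoid.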
Using Entity Framework Core
We’ll be using Entity Framework Core, a popular ORM (Object-Relational Mapping) tool for .NET developers. If you’re new to EF Core, I recommend checking out the official documentation and tutorials.
Step 1: Define the Models
First, let’s define our models:
// One row in the Authors table.
public class Author
{
    public int AuthorId { get; set; }   // primary key
    public string Name { get; set; }
    public int Age { get; set; }
}

// One row in the Books table.
public class Book
{
    public int BookId { get; set; }     // primary key
    public string Title { get; set; }
    public string Genre { get; set; }
    public int AuthorId { get; set; }   // foreign key to Author.AuthorId
}
Step 2: Create the Database Context
Next, let’s create a database context that will help us interact with our tables:
using Microsoft.EntityFrameworkCore;

public class MyDbContext : DbContext
{
    public DbSet<Book> Books { get; set; }
    public DbSet<Author> Authors { get; set; }

    // Points the context at a local SQL Server (LocalDB) database.
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=MyDB;Trusted_Connection=True;");
}
Step 3: Query the Data
Now, let’s write our LINQ query to group the data and remove duplicates:
using var context = new MyDbContext();

// Query the DbSets on the context rather than empty in-memory lists,
// and keep the result in a variable so we can enumerate it later.
var averageAgeByGenre = context.Authors
    .Join(context.Books, a => a.AuthorId, b => b.AuthorId,
        (author, book) => new { author.AuthorId, author.Age, book.Genre })
    .Distinct()                 // one row per (author, genre) pair
    .GroupBy(r => r.Genre)
    .Select(g => new
    {
        Genre = g.Key,
        Age = g.Average(x => x.Age)
    });
Step 4: Run the Query
Finally, let’s run our query and print the results. The query is only executed when we enumerate it:
foreach (var genre in averageAgeByGenre)
{
    Console.WriteLine($"Genre: {genre.Genre}, Average Age: {genre.Age}");
}
Understanding the LINQ Query
Our LINQ query consists of several parts. Let’s break it down:
- `.Join(..., a => a.AuthorId, b => b.AuthorId, (author, book) => new { author.AuthorId, author.Age, book.Genre })`: Joins authors and books on the common key `AuthorId` and projects one row per book, containing the author’s ID, the author’s age, and the book’s genre.
- `.Distinct()`: Removes duplicate rows after the join. Two rows count as duplicates only when `AuthorId`, `Age`, and `Genre` all match, so an author with several books in the same genre is reduced to a single row for that genre. An author who writes in several genres still appears once in each of them.
- `.GroupBy(r => r.Genre)`: Groups the remaining rows by the `Genre` column.
- `.Select(g => new { Genre = g.Key, Age = g.Average(x => x.Age) })`: Projects each group to its genre and the average age of the authors in it.
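The stages above can be traced on a small data set. The following is a sketch only: it runs in memory with LINQ to Objects instead of EF Core, uses C# records in place of the model classes, and the authors and books are hypothetical. It materializes each stage so the row counts can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Records are used for brevity (assumes C# 9 or later).
public record Author(int AuthorId, string Name, int Age);
public record Book(int BookId, string Title, string Genre, int AuthorId);

public static class PipelineTrace
{
    // Hypothetical sample data, invented for illustration only.
    public static readonly List<Author> Authors = new()
    {
        new Author(1, "Ann", 30),
        new Author(2, "Bob", 60),
    };

    public static readonly List<Book> Books = new()
    {
        new Book(1, "Fantasy One", "Fantasy", 1),
        new Book(2, "Fantasy Two", "Fantasy", 1),
        new Book(3, "Mystery One", "Mystery", 1),
        new Book(4, "Fantasy Three", "Fantasy", 2),
    };

    // Returns the joined row count, the distinct row count,
    // and the average author age per genre.
    public static (int JoinedRows, int DistinctRows, Dictionary<string, double> Averages) Run()
    {
        // Stage 1: one row per book, carrying the author's ID and age.
        var joined = Authors
            .Join(Books, a => a.AuthorId, b => b.AuthorId,
                (author, book) => new { author.AuthorId, author.Age, book.Genre })
            .ToList();

        // Stage 2: Ann's two Fantasy rows collapse into one;
        // her Mystery row is a different value and is kept.
        var distinct = joined.Distinct().ToList();

        // Stages 3 and 4: group by genre and average the ages in each group.
        var averages = distinct
            .GroupBy(r => r.Genre)
            .ToDictionary(g => g.Key, g => g.Average(x => x.Age));

        return (joined.Count, distinct.Count, averages);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"Joined rows: {result.JoinedRows}");     // 4
        Console.WriteLine($"Distinct rows: {result.DistinctRows}"); // 3
        foreach (var pair in result.Averages)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

In this sample Ann appears in both Fantasy and Mystery after `Distinct()`, which shows that deduplication happens per author and genre pair, not per author.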
Conclusion
In this article, we’ve explored how to use LINQ to group data from two tables, remove duplicate records based on a common key, and calculate the average value for each group. By following these steps and using the techniques outlined in this article, you should be able to create more efficient and effective queries when working with multiple tables in Entity Framework Core.
Common Pitfalls
Here are some common pitfalls to watch out for when writing LINQ queries:
- Overusing `Distinct()`: While `Distinct()` can be useful for removing duplicates, it’s not always the best solution. It compares every field in the projection, so it only removes rows that are identical in all of them. In this example, that works because the projection contains just `AuthorId`, `Age`, and `Genre`, so duplicates are effectively removed per author and genre.
- Failing to Join Correctly: When joining tables in LINQ, make sure you’re using the correct keys and join types. Incorrect joins can lead to incorrect results or errors.
- Not Grouping Correctly: When grouping data with LINQ, consider what columns you want to group by and what aggregation functions you want to apply.
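As a rough illustration of the first pitfall, the sketch below shows how the choice of projected fields decides whether `Distinct()` removes anything. It runs in memory with LINQ to Objects, uses C# records instead of the article’s model classes, and relies on hypothetical sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Records are used for brevity (assumes C# 9 or later).
public record Author(int AuthorId, string Name, int Age);
public record Book(int BookId, string Title, string Genre, int AuthorId);

public static class DistinctPitfall
{
    // Hypothetical sample data, invented for illustration only.
    public static readonly List<Author> Authors = new()
    {
        new Author(1, "Ann", 30),
        new Author(2, "Bob", 60),
    };

    public static readonly List<Book> Books = new()
    {
        new Book(1, "Fantasy One", "Fantasy", 1),
        new Book(2, "Fantasy Two", "Fantasy", 1),
        new Book(3, "Fantasy Three", "Fantasy", 2),
    };

    // BookId differs on every row, so no two rows are equal
    // and Distinct() removes nothing.
    public static int RowsWhenProjectionIncludesBookId() =>
        Authors.Join(Books, a => a.AuthorId, b => b.AuthorId,
                     (a, b) => new { a.AuthorId, a.Age, b.Genre, b.BookId })
               .Distinct()
               .Count();

    // Only AuthorId, Age, and Genre are compared,
    // so Ann's two Fantasy rows collapse into one.
    public static int RowsWhenProjectionOmitsBookId() =>
        Authors.Join(Books, a => a.AuthorId, b => b.AuthorId,
                     (a, b) => new { a.AuthorId, a.Age, b.Genre })
               .Distinct()
               .Count();

    public static void Main()
    {
        Console.WriteLine(RowsWhenProjectionIncludesBookId()); // 3
        Console.WriteLine(RowsWhenProjectionOmitsBookId());    // 2
    }
}
```

If a per-book column is needed elsewhere in the query, one option is to project it separately, since adding it to the deduplicated projection quietly turns `Distinct()` into a no-op.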
Next Steps
If you’ve made it this far in our article, congratulations! You now have a solid understanding of how to use LINQ to group data from multiple tables and remove duplicates based on a common key. Here are some next steps:
- Try More Complex Queries: Experiment with more complex queries that involve grouping, aggregating, and joining data.
- Use Other LINQ Methods: Familiarize yourself with other LINQ methods, such as `Where()`, `OrderBy()`, and `ThenBy()`.
- Practice Regularly: The best way to improve your skills is to practice regularly. Try writing different queries and experimenting with new techniques.
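As a starting point for experimenting with `Where()`, `OrderBy()`, and `ThenBy()`, here is a small in-memory sketch. The `Book` record, the sample titles, and the excluded genre are all hypothetical, and the query runs with LINQ to Objects rather than EF Core.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A record is used for brevity (assumes C# 9 or later).
public record Book(int BookId, string Title, string Genre, int AuthorId);

public static class SortingDemo
{
    // Hypothetical sample data, invented for illustration only.
    public static readonly List<Book> Books = new()
    {
        new Book(1, "Starfall", "SciFi", 1),
        new Book(2, "Harvest", "Romance", 2),
        new Book(3, "Andromeda Run", "SciFi", 1),
        new Book(4, "Bread Basics", "Cooking", 3),
    };

    // Filters out one genre, then sorts by genre and, within a genre, by title.
    // Ordinal comparison keeps the ordering independent of the current culture.
    public static List<string> SortedLabels() =>
        Books.Where(b => b.Genre != "Cooking")
             .OrderBy(b => b.Genre, StringComparer.Ordinal)
             .ThenBy(b => b.Title, StringComparer.Ordinal)
             .Select(b => $"{b.Genre}: {b.Title}")
             .ToList();

    public static void Main()
    {
        foreach (var label in SortedLabels())
            Console.WriteLine(label);
        // Romance: Harvest
        // SciFi: Andromeda Run
        // SciFi: Starfall
    }
}
```

`ThenBy()` only makes sense after `OrderBy()`: it breaks ties left by the first sort key instead of re-sorting the whole sequence.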
References
Here are some additional resources to help you learn more about Entity Framework Core:
- Entity Framework Core Official Documentation
- EF Core Tutorials on Microsoft Learn
- [LINQ Tutorial on MSDN](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords LINQ)
Code Example
Here’s the complete code example from our article:
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Author
{
    public int AuthorId { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public string Genre { get; set; }
    public int AuthorId { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Book> Books { get; set; }
    public DbSet<Author> Authors { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=MyDB;Trusted_Connection=True;");
}

class Program
{
    static void Main(string[] args)
    {
        using var context = new MyDbContext();

        // Join authors to their books, keep one row per (author, genre) pair,
        // then average the ages within each genre.
        var averageAgeByGenre = context.Authors
            .Join(context.Books, a => a.AuthorId, b => b.AuthorId,
                (author, book) => new { author.AuthorId, author.Age, book.Genre })
            .Distinct()
            .GroupBy(r => r.Genre)
            .Select(g => new
            {
                Genre = g.Key,
                Age = g.Average(x => x.Age)
            });

        // Enumerating the query executes it.
        foreach (var genre in averageAgeByGenre)
        {
            Console.WriteLine($"Genre: {genre.Genre}, Average Age: {genre.Age}");
        }
    }
}
Last modified on 2025-04-06