Peeking Into LINQ DistinctBy Source Code

Cesar Aguirre - Aug 1 '22 - Dev Community

I originally published this post on my blog a couple of weeks ago. It's part of a post series about LINQ.


"You should be ready, willing, and able to read the source code of your dependencies."

That's a piece of advice I found and shared in a past edition of my Monday Links.

Inspired by that advice, let's see what's inside the new LINQ DistinctBy method.

What DistinctBy does

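DistinctBy returns the elements of a collection that have unique values for a key, usually one of their properties, picked with a selector function. When several elements share the same key, it keeps only the first one. It works on collections of complex objects, not just on plain values.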

DistinctBy is one of the new LINQ methods introduced in .NET 6.

Here's how to find unique movies by release year.

var movies = new List<Movie>
{
    new Movie("Schindler's List", 1993, 8.9f),
    new Movie("The Lord of the Rings: The Return of the King", 2003, 8.9f),
    new Movie("Pulp Fiction", 1994, 8.8f),
    new Movie("Forrest Gump", 1994, 8.7f),
    new Movie("Inception", 2010, 8.7f)
};

// Here we use the DistinctBy method with the ReleaseYear property
var distinctByReleaseYear = movies.DistinctBy(movie => movie.ReleaseYear);
//                                 👆👆👆

foreach (var movie in distinctByReleaseYear)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}
// Output:
// Schindler's List: [1993]
// The Lord of the Rings: The Return of the King: [2003]
// Pulp Fiction: [1994]
// Inception: [2010]

record Movie(string Name, int ReleaseYear, float Score);

We called DistinctBy() directly on the list of movies. We didn't have to extract the release years, find the unique ones, and then look up one movie for each of them.
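For comparison, here's a sketch of a common way to get the same result before .NET 6: group the movies by release year and take the first movie from each group. It uses (Name, ReleaseYear) tuples instead of the Movie record so the snippet stays self-contained; that swap is my assumption, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same movies as above, as (Name, ReleaseYear) tuples to keep the sketch short
var movies = new List<(string Name, int ReleaseYear)>
{
    ("Schindler's List", 1993),
    ("The Lord of the Rings: The Return of the King", 2003),
    ("Pulp Fiction", 1994),
    ("Forrest Gump", 1994),
    ("Inception", 2010)
};

// .NET 6+: keep the first movie for each release year
var withDistinctBy = movies.DistinctBy(movie => movie.ReleaseYear).ToList();

// Before .NET 6: group by release year, then take the first movie of each group
var withGroupBy = movies
    .GroupBy(movie => movie.ReleaseYear)
    .Select(group => group.First())
    .ToList();

foreach (var movie in withDistinctBy)
{
    Console.WriteLine($"{movie.Name}: [{movie.ReleaseYear}]");
}
```

Both lists should contain the same four movies in the same order. One difference: GroupBy reads the whole input before returning its first group, while DistinctBy returns each element as soon as it finds a new key.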

Let's peek into DistinctBy source code. (Photo of a puppy looking inside a gift bag, by freestocks on Unsplash)

LINQ DistinctBy source code

This is the source code for the DistinctBy method. [Source]

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer)
{
    if (source is null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
    }
    if (keySelector is null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.keySelector);
    }

    // Step 1️⃣
    return DistinctByIterator(source, keySelector, comparer);
}

private static IEnumerable<TSource> DistinctByIterator<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer)
{
    // Step 2️⃣
    using IEnumerator<TSource> enumerator = source.GetEnumerator();

    // Step 3️⃣
    if (enumerator.MoveNext())
    {
        // Step 4️⃣
        var set = new HashSet<TKey>(DefaultInternalSetCapacity, comparer);
        do
        {
            // Step 5️⃣
            TSource element = enumerator.Current;
            if (set.Add(keySelector(element)))
            {
                yield return element;
            }
        }
        // Step 6️⃣
        while (enumerator.MoveNext());
    }
}

Well, it doesn't look that complicated. Let's go through it.

1. Iterating over the input collection

First, DistinctBy() checks its parameters and then calls DistinctByIterator().

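This is a common pattern in LINQ methods: check the parameters in one method and then call a separate iterator method to do the actual work. Since methods using yield return only run when their result is enumerated, this split makes DistinctBy() throw an ArgumentNullException right when it's called, not later when its result is first enumerated. (See Step 1 in the code sample above.)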
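To see why that split matters, here's a minimal sketch with two made-up methods, LazyValidation and EagerValidation (hypothetical names, not part of LINQ). The first one checks its argument inside the iterator; the second one follows the check-then-delegate pattern.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical iterator that checks its argument inside the iterator body.
// Because of yield return, none of this body runs until enumeration starts.
static IEnumerable<int> LazyValidation(IEnumerable<int> source)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    foreach (var item in source) yield return item;
}

// Hypothetical method following the LINQ pattern:
// check the argument eagerly, then hand off to a separate iterator.
static IEnumerable<int> EagerValidation(IEnumerable<int> source)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    return Iterator(source);

    static IEnumerable<int> Iterator(IEnumerable<int> items)
    {
        foreach (var item in items) yield return item;
    }
}

// Calling the lazy version with null doesn't throw...
IEnumerable<int> lazy = LazyValidation(null);

// ...until its result is enumerated
bool lazyThrewOnEnumeration = false;
try
{
    foreach (var element in lazy) { }
}
catch (ArgumentNullException)
{
    lazyThrewOnEnumeration = true;
}

// The eager version throws as soon as it's called
bool eagerThrewOnCall = false;
try
{
    EagerValidation(null);
}
catch (ArgumentNullException)
{
    eagerThrewOnCall = true;
}

Console.WriteLine($"Lazy version threw on enumeration: {lazyThrewOnEnumeration}");
Console.WriteLine($"Eager version threw on call: {eagerThrewOnCall}");
```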

Then, DistinctByIterator() gets the underlying enumerator of the input collection by calling GetEnumerator(), a method every IEnumerable<T> has. It does that with a using declaration, so the enumerator is disposed once DistinctByIterator() is done with it. (See Step 2.)

The IEnumerator type has:

  • a MoveNext() method to advance the enumerator to the next position
  • a Current property to hold the element at the current position.

If a collection is empty or the enumerator reaches the end of the collection, MoveNext() returns false. When MoveNext() returns true, Current holds the element at the new position. [Source]
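Here's a small sketch of using an enumerator by hand, roughly what a foreach loop does for us. The list of years is just sample data.

```csharp
using System;
using System.Collections.Generic;

var years = new List<int> { 1993, 2003, 1994 };
var visited = new List<int>();

// The using declaration disposes the enumerator when it goes out of scope,
// like in DistinctByIterator()
using IEnumerator<int> enumerator = years.GetEnumerator();

// The enumerator starts before the first element,
// so MoveNext() must be called before reading Current
while (enumerator.MoveNext())
{
    visited.Add(enumerator.Current);
    Console.WriteLine(enumerator.Current);
}

// Once the end is reached, MoveNext() keeps returning false
bool movedPastEnd = enumerator.MoveNext();
Console.WriteLine($"MoveNext() after the end: {movedPastEnd}");
```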

Next, to start reading the input collection, DistinctByIterator() calls MoveNext() to move the enumerator from its initial position, before the first element, to the first element. (See Step 3.) This first if also avoids allocating memory: when the collection is empty, the set from the next step is never created.
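A minimal sketch of that if-first structure, assuming a hypothetical DistinctWithLazySet helper that works on plain values and counts how many sets it creates, only to make the allocation visible:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int setsCreated = 0;

// Hypothetical helper mirroring Steps 3 and 4: the HashSet is only
// created once we know the source has at least one element
IEnumerable<T> DistinctWithLazySet<T>(IEnumerable<T> source)
{
    using IEnumerator<T> enumerator = source.GetEnumerator();
    if (enumerator.MoveNext())
    {
        setsCreated++; // counted only to show when a set gets allocated
        var set = new HashSet<T>();
        do
        {
            T element = enumerator.Current;
            if (set.Add(element)) yield return element;
        }
        while (enumerator.MoveNext());
    }
}

var fromEmpty = DistinctWithLazySet(Array.Empty<int>()).ToList();
int setsAfterEmpty = setsCreated;

var fromNumbers = DistinctWithLazySet(new[] { 1, 1, 2 }).ToList();
int setsAfterNonEmpty = setsCreated;

Console.WriteLine($"Sets created for an empty source: {setsAfterEmpty}");         // 0
Console.WriteLine($"Sets created after a non-empty source: {setsAfterNonEmpty}"); // 1
```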

2. Finding unique elements

After that, DistinctByIterator() creates a HashSet<TKey> with a default initial capacity and the optional comparer. This set keeps track of the unique keys found so far. When no comparer is passed, the set uses the default equality comparer for TKey. (See Step 4.)
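Here's how the optional comparer changes the result, using the DistinctBy overload that takes an IEqualityComparer<TKey>. The movie and director tuples are made-up sample data, with one name written in lowercase on purpose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var directors = new List<(string Movie, string Director)>
{
    ("Schindler's List", "Steven Spielberg"),
    ("Jurassic Park", "steven spielberg"),
    ("Pulp Fiction", "Quentin Tarantino")
};

// Without a comparer, keys are compared with the default (case-sensitive) string equality
var caseSensitive = directors.DistinctBy(d => d.Director).ToList();

// With a comparer, the set inside DistinctBy uses it to compare keys
var caseInsensitive = directors
    .DistinctBy(d => d.Director, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(caseSensitive.Count);   // 3
Console.WriteLine(caseInsensitive.Count); // 2
```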

The next step is to read the current element, compute its key with keySelector, and try to add that key to the set. (See Step 5.)

If the set doesn't already contain that key, Add() adds it and returns true. Otherwise, it returns false and leaves the set unchanged. And, when the set exceeds its capacity, it gets resized. [Source]
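A quick sketch of the HashSet<T>.Add() behavior DistinctBy relies on, with a few sample release years:

```csharp
using System;
using System.Collections.Generic;

var releaseYears = new HashSet<int>();

bool first = releaseYears.Add(1994);  // true: 1994 wasn't in the set yet
bool second = releaseYears.Add(1994); // false: 1994 is already there
bool third = releaseYears.Add(2010);  // true

Console.WriteLine($"{first}, {second}, {third}"); // True, False, True
Console.WriteLine(releaseYears.Count);           // 2
```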

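If the current element's key was added to the set, the element is returned with yield return. Since a key can only be added once, DistinctBy keeps the first element for each key and skips the rest. This way, DistinctByIterator() returns one element at a time, as the caller asks for them.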
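Because DistinctBy returns elements one at a time, it can even work on a sequence that never ends, as long as we stop asking for elements. A sketch, assuming a hypothetical Naturals() iterator that produces 0, 1, 2, and so on:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical endless sequence: 0, 1, 2, 3, ...
static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++) yield return i;
}

// Key: remainder after dividing by 3, so there are only 3 distinct keys.
// DistinctBy yields each new key's element right away, so Take(3)
// can stop the enumeration after the third one.
var firstPerRemainder = Naturals().DistinctBy(n => n % 3).Take(3).ToList();

Console.WriteLine(string.Join(", ", firstPerRemainder)); // 0, 1, 2
```

Only three keys exist here, so asking for a fourth element would never finish. Take(3) avoids that by stopping after the third one.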

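Step 5 is wrapped inside a do-while loop that runs until MoveNext() returns false, that is, until the enumerator reaches the end of the collection. (See Step 6.) A do-while fits here because the if from Step 3 has already moved the enumerator to the first element.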
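Putting Steps 1 to 6 together, here's a minimal re-implementation sketch. MyDistinctBy is a hypothetical name; the real method is Enumerable.DistinctBy. Since DefaultInternalSetCapacity is internal to System.Linq, this sketch uses the HashSet constructor without a capacity, and it's a plain static method instead of an extension method to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical re-implementation following Steps 1 to 6
static IEnumerable<TSource> MyDistinctBy<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey>? comparer = null)
{
    // Step 1: check parameters eagerly, then delegate to the lazy iterator
    if (source is null) throw new ArgumentNullException(nameof(source));
    if (keySelector is null) throw new ArgumentNullException(nameof(keySelector));
    return Iterator(source, keySelector, comparer);

    static IEnumerable<TSource> Iterator(
        IEnumerable<TSource> items,
        Func<TSource, TKey> selector,
        IEqualityComparer<TKey>? keyComparer)
    {
        // Step 2: get the enumerator and dispose it when done
        using IEnumerator<TSource> enumerator = items.GetEnumerator();

        // Step 3: only continue if there's at least one element
        if (enumerator.MoveNext())
        {
            // Step 4: a null comparer means the default equality comparer
            var seenKeys = new HashSet<TKey>(keyComparer);
            do
            {
                // Step 5: yield the element only the first time its key shows up
                TSource element = enumerator.Current;
                if (seenKeys.Add(selector(element)))
                {
                    yield return element;
                }
            }
            // Step 6: keep going until the end of the collection
            while (enumerator.MoveNext());
        }
    }
}

var movies = new (string Name, int Year)[]
{
    ("Pulp Fiction", 1994),
    ("Forrest Gump", 1994),
    ("Inception", 2010)
};

var unique = MyDistinctBy(movies, movie => movie.Year).ToList();
Console.WriteLine(string.Join(", ", unique.Select(movie => movie.Name))); // Pulp Fiction, Inception
```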

Voilà! That's the DistinctBy source code. Simple but effective.

Not that intimidating, after all. The trick was to use a set. Reading the source code of standard libraries is a good exercise to pick up conventions and patterns.


Want to write more expressive code for collections? Join my Udemy course, Getting Started with LINQ, and master everything you need to work productively with LINQ — all in less than two hours!

Happy coding!
