LINQ Distinct with Custom Classes in C#

Posted in software by Christopher R. Wirz on Wed Aug 10 2011



Language Integrated Query (LINQ) is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages. One of its more powerful extension methods is .Distinct(), which returns the unique elements of a sequence, where "unique" is defined by an equality comparison. Because a custom class can define equality in more than one way, this has led to some ambiguity, although many third parties follow Microsoft's guidance.

Note: Microsoft recommends implementing IEquatable<T>, while StyleCop recommends using IEqualityComparer<T>.
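
For comparison, the IEqualityComparer<T> approach keeps the equality logic outside the class and passes it to the Distinct() overload that accepts a comparer. The following is a minimal sketch, assuming the plain TestClass used in this post (redeclared here so the example compiles on its own); TestClassComparer is a hypothetical name chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Testing
{
    // Same shape as the plain TestClass in this post: no equality members of its own.
    class TestClass
    {
        public int A = 0;
        public int B = 0;
    }

    // Hypothetical comparer name; equality is defined outside TestClass.
    class TestClassComparer : IEqualityComparer<TestClass>
    {
        public bool Equals(TestClass x, TestClass y)
        {
            if (Object.ReferenceEquals(x, y)) { return true; }
            if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) { return false; }
            return x.A == y.A && x.B == y.B;
        }

        // Must return the same value for any two objects that Equals() considers equal.
        public int GetHashCode(TestClass obj)
        {
            if (Object.ReferenceEquals(obj, null)) { return 0; }
            return obj.A.GetHashCode() ^ obj.B.GetHashCode();
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            var list = new List<TestClass>();
            for (int i = 0; i < 50; i++)
            {
                list.Add(new TestClass());
            }

            // This overload uses the supplied comparer instead of
            // EqualityComparer<TestClass>.Default.
            Console.WriteLine("Found " + list.Distinct(new TestClassComparer()).Count());
        }
    }
}
```

Because the comparer is passed explicitly, this prints Found 1 without changing TestClass at all; calling list.Distinct() with no argument on the same list would still find 50.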

To illustrate the concept, consider the following class:


namespace Testing
{
    class TestClass
    {
        public int A = 0;
        public int B = 0;
    }
}

Now consider the following experiment...


using System;
using System.Collections.Generic;
using System.Linq;

namespace Testing
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var list = new List<TestClass>();
            for (int i = 0; i < 50; i++)
            {
                list.Add(new TestClass());
            }
            Console.WriteLine("Found " + list.Distinct().Count());
            Console.ReadKey();
        }
    }
}

When we run the test, it prints:

    Found 50

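No surprise there: TestClass does not define equality, so the comparison falls back to reference equality, and 50 separately constructed instances are 50 distinct objects. Let's see whether Microsoft's documentation can tell us how to get .Distinct() to return 1 result.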

If you want to return distinct elements from sequences of objects of some custom data type, you have to implement the IEquatable<T> generic interface in the class.

So let's try it:


using System;

class TestClass : IEquatable<TestClass>
{
	public int A = 0;
	public int B = 0;

	#region IEquatable<TestClass> implementation
	/// <summary>
	///     Checks whether another TestClass object is equal to this instance.
	/// </summary>
	/// <param name="other">The other class object.</param>
	/// <returns><c>true</c> if equal, <c>false</c> if otherwise</returns>
	public bool Equals(TestClass other)
	{
		if (Object.ReferenceEquals(other, null)) { return false; }
		if (Object.ReferenceEquals(other, this)) { return true; }
		return A == other.A && B == other.B;
	}

	/// <summary>
	///     Returns a hash code for this instance.
	/// </summary>
	/// <remarks>
	///     Suitable for use in hashing algorithms and data structures like a hash table.
	/// </remarks>
	/// <returns>A hash code (integer value) for this instance</returns>
	public override int GetHashCode()
	{
		return (A.GetHashCode()) ^ (B.GetHashCode());
	}
	#endregion
}

The results:

    Found 1

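That works! Implementing IEquatable<TestClass>, together with a matching GetHashCode() override, makes Distinct() treat all 50 objects as equal. But do we actually need the interface, or is a public Equals(TestClass) method enough?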


using System;

class TestClass
{
	public int A = 0;
	public int B = 0;

	/// <summary>
	///     Checks whether another TestClass object is equal to this instance.
	/// </summary>
	/// <param name="other">The other class object.</param>
	/// <returns><c>true</c> if equal, <c>false</c> if otherwise</returns>
	public bool Equals(TestClass other)
	{
		if (Object.ReferenceEquals(other, null)) { return false; }
		if (Object.ReferenceEquals(other, this)) { return true; }
		return A == other.A && B == other.B;
	}

	/// <summary>
	///     Returns a hash code for this instance.
	/// </summary>
	/// <remarks>
	///     Suitable for use in hashing algorithms and data structures like a hash table.
	/// </remarks>
	/// <returns>A hash code (integer value) for this instance</returns>
	public override int GetHashCode()
	{
		return (A.GetHashCode()) ^ (B.GetHashCode());
	}
}

The results:

    Found 50

A public Equals(TestClass) method is not enough on its own; without the IEquatable<TestClass> interface, Distinct() never calls it. But what if we override Equals(object) instead?


using System;

class TestClass
{
	public int A = 0;
	public int B = 0;

	/// <summary>
	///     Checks whether another TestClass object is equal to this instance.
	/// </summary>
	/// <param name="other">The other class object.</param>
	/// <returns><c>true</c> if equal, <c>false</c> if otherwise</returns>
	private bool Equals(TestClass other)
	{
		if (Object.ReferenceEquals(other, null)) { return false; }
		if (Object.ReferenceEquals(other, this)) { return true; }
		return A == other.A && B == other.B;
	}

	/// <summary>
	///     Returns a hash code for this instance.
	/// </summary>
	/// <remarks>
	///     Suitable for use in hashing algorithms and data structures like a hash table.
	/// </remarks>
	/// <returns>A hash code (integer value) for this instance</returns>
	public override int GetHashCode()
	{
		return (A.GetHashCode()) ^ (B.GetHashCode());
	}

	/// <summary>
	///     Determines whether the specified <see cref="System.Object" />
	///     is equal to this instance.
	/// </summary>
	/// <param name="obj">The <see cref="System.Object" /> to compare with this instance.</param>
	/// <returns>
	///   <c>true</c> if the specified <see cref="System.Object" /> is equal to this instance;
	///  otherwise, <c>false</c>.
	/// </returns>
	public override bool Equals(object obj)
	{
		// 'as' yields null for null or non-TestClass arguments,
		// and Equals(TestClass) returns false for null.
		return this.Equals(obj as TestClass);
	}
}

The results:

    Found 1

So the IEquatable<TestClass> interface is not strictly needed when Equals(object obj) is overridden, because the default comparison falls back to that override. But what about the equality operators?


using System;

class TestClass
{
	public int A = 0;
	public int B = 0;

	/// <summary>
	///     Checks whether another TestClass object is equal to this instance.
	/// </summary>
	/// <param name="other">The other class object.</param>
	/// <returns><c>true</c> if equal, <c>false</c> if otherwise</returns>
	private bool Equals(TestClass other)
	{
		if (Object.ReferenceEquals(other, null)) { return false; }
		if (Object.ReferenceEquals(other, this)) { return true; }
		return A == other.A && B == other.B;
	}

	/// <summary>
	///     Returns a hash code for this instance.
	/// </summary>
	/// <remarks>
	///     Suitable for use in hashing algorithms and data structures like a hash table.
	/// </remarks>
	/// <returns>A hash code (integer value) for this instance</returns>
	public override int GetHashCode()
	{
		return (A.GetHashCode()) ^ (B.GetHashCode());
	}

	/// <summary>
	///     Implements the operator == (equals).
	/// </summary>
	/// <param name="lhs">The left hand side.</param>
	/// <param name="rhs">The right hand side.</param>
	/// <returns><c>true</c> if equal, <c>false</c> if not equal</returns>
	public static bool operator ==(TestClass lhs, TestClass rhs)
		=> Object.ReferenceEquals(lhs, null)
			? Object.ReferenceEquals(rhs, null)
			: lhs.Equals(rhs);

	/// <summary>
	///     Implements the operator != (not equals)
	/// </summary>
	/// <param name="lhs">The left hand side.</param>
	/// <param name="rhs">The right hand side.</param>
	/// <returns><c>true</c> if not equal, <c>false</c> if equal</returns>
	public static bool operator !=(TestClass lhs, TestClass rhs)
		=> Object.ReferenceEquals(lhs, null)
			? !Object.ReferenceEquals(rhs, null)
			: !lhs.Equals(rhs);
}

The results:

    Found 50

Overloading == and != does not help establish distinctness: Distinct() compares elements through an equality comparer, which never calls these operators.

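So why all this testing? As it turns out, Distinct() uses System.Collections.Generic.EqualityComparer<TestClass>.Default to decide uniqueness when no IEqualityComparer<TestClass> is specified. That default comparer checks whether TestClass implements IEquatable<TestClass>: if it does, the comparer calls Equals(TestClass other); if it does not, it falls back to the virtual Equals(object obj). Distinct() also asks the comparer for hash codes, which come from the GetHashCode() override. This is why the public Equals(TestClass) method was ignored until the interface was added, and why the operators were never used.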
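
To make the "under the hood" point concrete, here is a simplified sketch of the behavior Distinct() exhibits: it keeps the first occurrence of each element, in order, and detects duplicates with a hash set built on the supplied comparer or EqualityComparer<T>.Default. This is not the actual .NET source (the real implementation differs in detail), and DistinctSketch and MyDistinct are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

namespace Testing
{
    // Hypothetical, simplified sketch of Distinct() semantics; not the .NET source.
    internal static class DistinctSketch
    {
        public static IEnumerable<T> MyDistinct<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
        {
            if (source == null) { throw new ArgumentNullException(nameof(source)); }
            // Same fallback Distinct() uses when no comparer is specified.
            return MyDistinctIterator(source, comparer ?? EqualityComparer<T>.Default);
        }

        private static IEnumerable<T> MyDistinctIterator<T>(IEnumerable<T> source, IEqualityComparer<T> comparer)
        {
            var seen = new HashSet<T>(comparer);
            foreach (var item in source)
            {
                // Add returns false when the set already holds an element with the
                // same hash code for which comparer.Equals returns true.
                if (seen.Add(item))
                {
                    yield return item;
                }
            }
        }

        static void Main()
        {
            var words = new[] { "apple", "APPLE", "pear", "apple" };
            Console.WriteLine(string.Join(",", words.MyDistinct()));                             // apple,APPLE,pear
            Console.WriteLine(string.Join(",", words.MyDistinct(StringComparer.OrdinalIgnoreCase))); // apple,pear
        }
    }
}
```

With the default comparer, "apple" and "APPLE" are different strings; passing StringComparer.OrdinalIgnoreCase changes what counts as a duplicate, which is the same hook the real Distinct(IEqualityComparer<T>) overload exposes.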
Let's verify this with an experiment:


if (System.Collections.Generic.EqualityComparer<TestClass>.Default.Equals(new TestClass(), new TestClass()))
{
    Console.WriteLine("Found Equals");
}
else
{
    Console.WriteLine("Not Equals");
}

Next, we'll make Equals(object obj) always return false, while Equals(TestClass other) compares the fields as before.


using System;

class TestClass : IEquatable<TestClass>
{
	public int A = 0;
	public int B = 0;

	#region IEquatable<TestClass> implementation
	/// <summary>
	///     Checks whether another TestClass object is equal to this instance.
	/// </summary>
	/// <param name="other">The other class object.</param>
	/// <returns><c>true</c> if equal, <c>false</c> if otherwise</returns>
	public bool Equals(TestClass other)
	{
		if (Object.ReferenceEquals(other, null)) { return false; }
		if (Object.ReferenceEquals(other, this)) { return true; }
		return A == other.A && B == other.B;
	}

	/// <summary>
	///     Returns a hash code for this instance.
	/// </summary>
	/// <remarks>
	///     Suitable for use in hashing algorithms and data structures like a hash table.
	/// </remarks>
	/// <returns>A hash code (integer value) for this instance</returns>
	public override int GetHashCode()
	{
		return (A.GetHashCode()) ^ (B.GetHashCode());
	}
	#endregion

	/// <summary>
	///     Determines whether the specified <see cref="System.Object" />
	///     is equal to this instance.
	/// </summary>
	/// <param name="obj">The <see cref="System.Object" /> to compare with this instance.</param>
	/// <returns>
	///   <c>true</c> if the specified <see cref="System.Object" /> is equal to this instance;
	///  otherwise, <c>false</c>.
	/// </returns>
	public override bool Equals(object obj)
	{
		return false;
	}
}

The results:

    Found Equals
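
Every TestClass version above also overrides GetHashCode(), and that is not optional: Distinct() only calls Equals on elements whose hash codes match, so equal objects must return equal hash codes. The sketch below uses a hypothetical BrokenHashClass that compares fields in Equals(object) but deliberately returns a different hash code for every instance, violating that rule:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Testing
{
    // Hypothetical class for illustration only: Equals(object) compares fields,
    // but GetHashCode() deliberately returns a different value for every instance.
    class BrokenHashClass
    {
        private static int nextId = 0;
        private readonly int id = nextId++;

        public int A = 0;
        public int B = 0;

        public override bool Equals(object obj)
        {
            var other = obj as BrokenHashClass;
            return !Object.ReferenceEquals(other, null) && A == other.A && B == other.B;
        }

        // Breaks the rule "equal objects must return equal hash codes".
        public override int GetHashCode()
        {
            return id;
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            var list = new List<BrokenHashClass>();
            for (int i = 0; i < 50; i++)
            {
                list.Add(new BrokenHashClass());
            }

            // Any two elements are Equals() to each other...
            Console.WriteLine(list[0].Equals(list[1]));
            // ...but their hash codes differ, so Distinct() never compares them.
            Console.WriteLine("Found " + list.Distinct().Count());
        }
    }
}
```

Even though any two of the 50 objects are equal according to Equals(), this prints True and then Found 50, because elements with different hash codes never reach the Equals() check.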

In conclusion, follow the Microsoft recommendation, and override GetHashCode() consistently with your equality logic:

If you want to return distinct elements from sequences of objects of some custom data type, you have to implement the IEquatable<T> generic interface in the class.
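
Putting the experiments together, a class that works with Distinct() and with code that calls Equals(object) directly might look like the following sketch. It is one reasonable combination rather than the only correct one; the hash combination (multiplying by the prime 397 before XOR) is an assumption that reduces collisions compared with the plain A ^ B used above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Testing
{
    // One possible combination of the pieces tested above: IEquatable<T>,
    // an Equals(object) override that agrees with it, and a matching GetHashCode().
    class TestClass : IEquatable<TestClass>
    {
        public int A = 0;
        public int B = 0;

        public bool Equals(TestClass other)
        {
            if (Object.ReferenceEquals(other, null)) { return false; }
            if (Object.ReferenceEquals(other, this)) { return true; }
            return A == other.A && B == other.B;
        }

        // Delegates to Equals(TestClass) so both code paths agree.
        public override bool Equals(object obj)
        {
            return Equals(obj as TestClass);
        }

        public override int GetHashCode()
        {
            unchecked
            {
                // A ^ B maps (1, 2) and (2, 1) to the same value; scaling A first avoids that.
                return (A.GetHashCode() * 397) ^ B.GetHashCode();
            }
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            var list = new List<TestClass>();
            for (int i = 0; i < 50; i++)
            {
                list.Add(new TestClass());
            }
            Console.WriteLine("Found " + list.Distinct().Count());

            var a = new TestClass { A = 1, B = 2 };
            var b = new TestClass { A = 1, B = 2 };
            // The IEquatable<T> path (used by the default comparer) and the object path agree.
            Console.WriteLine(EqualityComparer<TestClass>.Default.Equals(a, b));
            Console.WriteLine(((object)a).Equals(b));
        }
    }
}
```

This prints Found 1 followed by True twice, since EqualityComparer<TestClass>.Default (via IEquatable<TestClass>) and the Equals(object) override now give the same answer.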