Why does Enumerable.Single() iterate all elements, even when more than one item has already been found?

C#Linq.Net 4.0

C# Problem Overview


When profiling one of our applications, we discovered a mysterious slowdown in some code where we were calling Enumerable.Single(source, predicate) for a large collection that had more than one item that matched the predicate near the start of the collection.

Investigation revealed that the implementation of Enumerable.Single() is as follows:

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
		TSource result = default(TSource);
		long count = 0;
        // Note how this always iterates through ALL the elements:
		foreach (TSource element in source) { 
			if (predicate(element)) {
				result = element;
				checked { count++; }
			}
		}
		switch (count) {
			case 0: throw Error.NoMatch();
			case 1: return result;
		}
		throw Error.MoreThanOneMatch();
	}
	

That implementation will iterate through every element of the sequence, even if more than one element has already matched the predicate.

The following implementation would appear to yield the same results:

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
	TSource result = default(TSource);
	long count = 0;
	foreach (TSource element in source) {
		if (predicate(element)) {
			if (count == 1) // Exit loop immediately if more than one match found.
				throw Error.MoreThanOneMatch();
				
			result = element;
			count++; // "checked" is no longer needed.
		}
	}

    if (count == 0)
        throw Error.NoMatch();

	return result;
}

Does anyone know why the actual implementation doesn't use this obvious optimization? Is there something I'm missing? (I can't imagine that such an obvious optimization would be overlooked, and therefore there must be some concrete reason for it.)

(Note: I realize that this question may attract answers that are opinions; I'm hoping for answers that provide concrete reasons for iterating all elements. If the answer is actually "because the designers didn't think such an optimization was necessary", then this question is unanswerable and I guess I should just delete it...)


For comparison, look at the implementation of Single() which does not take a predicate:

public static TSource Single<TSource>(this IEnumerable<TSource> source) 
{
	IList<TSource> list = source as IList<TSource>;
	if (list != null) {
		switch (list.Count) {
			case 0: throw Error.NoElements();
			case 1: return list[0];
		}
	}
	else {
		using (IEnumerator<TSource> e = source.GetEnumerator()) {
			if (!e.MoveNext()) throw Error.NoElements();
			TSource result = e.Current;
			if (!e.MoveNext()) return result;
		}
	}
	throw Error.MoreThanOneElement();
}

In this case, they've gone to the effort of adding an optimisation for IList.

C# Solutions


Solution 1 - C#

You didn't seem to be the only one thinking that. The .NET Core implementation has an optimized version:

using (IEnumerator<TSource> e = source.GetEnumerator())
{
    while (e.MoveNext())
    {
        TSource result = e.Current;
        if (predicate(result))
        {
            while (e.MoveNext())
            {
                if (predicate(e.Current))
                {
                    throw Error.MoreThanOneMatch();
                }
            }

            return result;
        }
    }
}

So to answer your question: there doesn't seem to be a 'good' reason, other than just a developer not thinking about optimizing this use case.

Solution 2 - C#

The optimization was applied in .NET Core

The code now is :

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw Error.ArgumentNull(nameof(source));
    }

    if (predicate == null)
    {
        throw Error.ArgumentNull(nameof(predicate));
    }

    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        while (e.MoveNext())
        {
            TSource result = e.Current;
            if (predicate(result))
            {
                while (e.MoveNext())
                {
                    if (predicate(e.Current))
                    {
                        throw Error.MoreThanOneMatch();
                    }
                }

                return result;
            }
        }
    }

    throw Error.NoMatch();
}

Wherever possible, the code even checks whether the target is an IList<T> so it can avoid iterating:

public static TSource Single<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull(nameof(source));
    }

    if (source is IList<TSource> list)
    {
        switch (list.Count)
        {
            case 0:
                throw Error.NoElements();
            case 1:
                return list[0];
        }
    }
    else
    {
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                throw Error.NoElements();
            }

            TSource result = e.Current;
            if (!e.MoveNext())
            {
                return result;
            }
        }
    }

    throw Error.MoreThanOneElement();
}

UPDATE

Checking the git blame output shows that the iteration optimization was applied back in 2016!

The IList<> optimization was added 1 year ago, probably as part of the Core 2.1 optimizations

Solution 3 - C#

As the other answers have pointed out, the optimization has been applied, but I would just like to raise the hypothesis that they had done it that way originally thinking of the fact that they have no way to guarantee that the predicate function does not have side effects.

I'm not sure that there would truly be a case where such behavior would be used/useful, but it is a consideration to keep in mind.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMatthew WatsonView Question on Stackoverflow
Solution 1 - C#Patrick HofmanView Answer on Stackoverflow
Solution 2 - C#Panagiotis KanavosView Answer on Stackoverflow
Solution 3 - C#TurtleKwittyView Answer on Stackoverflow