Split an IEnumerable<T> into fixed-sized chunks (return an IEnumerable<IEnumerable<T>> where the inner sequences are of fixed length)

C#LinqIenumerable

C# Problem Overview


I want to take an IEnumerable<T> and split it up into fixed-sized chunks.

I have this, but it seems inelegant due to all the list creation/copying:

private static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    List<T> partition = new List<T>(partitionSize);
    foreach (T item in items)
    {
        partition.Add(item);
        if (partition.Count == partitionSize)
        {
            yield return partition;
            partition = new List<T>(partitionSize);
        }
    }
    // Cope with items.Count % partitionSize != 0
    if (partition.Count > 0) yield return partition;
}

Is there something more idiomatic?

EDIT: Although this has been marked as a duplicate of https://stackoverflow.com/questions/3210824/divide-array-into-an-array-of-subsequence-array it is not - that question deals with splitting an array, whereas this is about IEnumerable<T>. In addition that question requires that the last subsequence is padded. The two questions are closely related but aren't the same.

C# Solutions


Solution 1 - C#

You could try to implement Batch method mentioned above on your own like this:

    static class MyLinqExtensions 
    { 
        public static IEnumerable<IEnumerable<T>> Batch<T>( 
            this IEnumerable<T> source, int batchSize) 
        { 
            using (var enumerator = source.GetEnumerator()) 
                while (enumerator.MoveNext()) 
                    yield return YieldBatchElements(enumerator, batchSize - 1); 
        } 

        private static IEnumerable<T> YieldBatchElements<T>( 
            IEnumerator<T> source, int batchSize) 
        { 
            yield return source.Current; 
            for (int i = 0; i < batchSize && source.MoveNext(); i++) 
                yield return source.Current; 
        } 
    }

I've grabbed this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.

UPDATE: Please note, that this implementation not only lazily evaluates batches but also items inside batches, which means it will only produce correct results when batch is enumerated only after all previous batches were enumerated. For example:

public static void Main(string[] args)
{
    var xs = Enumerable.Range(1, 20);
    Print(xs.Batch(5).Skip(1)); // should skip first batch with 5 elements
}

public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
    foreach (var batch in batches)
    {
        Console.WriteLine($"[{string.Join(", ", batch)}]");
    }
}

will output:

[2, 3, 4, 5, 6] //only first element is skipped.
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]

So, if you use case assumes batching when batches are sequentially evaluated, then lazy solution above will work, otherwise if you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution which eagerly enumerates batch content, similar to one mentioned in the question above or in the MoreLINQ

Solution 2 - C#

It feels like you want two iterator blocks ("yield return methods"). I wrote this extension method:

static class Extensions
{
  public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
  {
    return new PartitionHelper<T>(items, partitionSize);
  }

  private sealed class PartitionHelper<T> : IEnumerable<IEnumerable<T>>
  {
    readonly IEnumerable<T> items;
    readonly int partitionSize;
    bool hasMoreItems;

    internal PartitionHelper(IEnumerable<T> i, int ps)
    {
      items = i;
      partitionSize = ps;
    }

    public IEnumerator<IEnumerable<T>> GetEnumerator()
    {
      using (var enumerator = items.GetEnumerator())
      {
        hasMoreItems = enumerator.MoveNext();
        while (hasMoreItems)
          yield return GetNextBatch(enumerator).ToList();
      }
    }

    IEnumerable<T> GetNextBatch(IEnumerator<T> enumerator)
    {
      for (int i = 0; i < partitionSize; ++i)
      {
        yield return enumerator.Current;
        hasMoreItems = enumerator.MoveNext();
        if (!hasMoreItems)
          yield break;
      }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
      return GetEnumerator();      
    }
  }
}

Solution 3 - C#

Maybe?

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items.Select((item, inx) => new { item, inx })
                .GroupBy(x => x.inx / partitionSize)
                .Select(g => g.Select(x => x.item));
}

There is an already implemented one too: morelinq's Batch.

Solution 4 - C#

Craziest solution (with Reactive Extensions):

public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items
            .ToObservable() // Converting sequence to observable sequence
            .Buffer(partitionSize) // Splitting it on spececified "partitions"
            .ToEnumerable(); // Converting it back to ordinary sequence
}

I know that I changed signature but anyway we all know that we'll have some fixed size collection as a chunk.

BTW if you will use iterator block do not forget to split your implementation into two methods to validate arguments eagerly!

Solution 5 - C#

For elegant solution, You can also have a look at MoreLinq.Batch.

It batches the source sequence into sized buckets.

Example:

int[] ints = new int[] {1,2,3,4,5,6};
var batches = ints.Batch(2); // batches -> [0] : 1,2 ; [1]:3,4 ; [2] :5,6

Solution 6 - C#

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, 
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}

Solution 7 - C#

Solution 8 - C#

You can do this using an overload of Enumerable.GroupBy and taking advantage of integer division.

return items.Select((element, index) => new { Element = element, Index = index })
    .GroupBy(obj => obj.Index / partitionSize, (_, partition) => partition);

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAlastair MawView Question on Stackoverflow
Solution 1 - C#takemyoxygenView Answer on Stackoverflow
Solution 2 - C#Jeppe Stig NielsenView Answer on Stackoverflow
Solution 3 - C#L.BView Answer on Stackoverflow
Solution 4 - C#Sergey TeplyakovView Answer on Stackoverflow
Solution 5 - C#TilakView Answer on Stackoverflow
Solution 6 - C#JustAnotherUserView Answer on Stackoverflow
Solution 7 - C#ChristofferView Answer on Stackoverflow
Solution 8 - C#Adam MarasView Answer on Stackoverflow