Split an IEnumerable<T> into fixed-sized chunks (return an IEnumerable<IEnumerable<T>> where the inner sequences are of fixed length)
C#LinqIenumerableC# Problem Overview
I want to take an IEnumerable<T>
and split it up into fixed-sized chunks.
I have this, but it seems inelegant due to all the list creation/copying:
private static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
List<T> partition = new List<T>(partitionSize);
foreach (T item in items)
{
partition.Add(item);
if (partition.Count == partitionSize)
{
yield return partition;
partition = new List<T>(partitionSize);
}
}
// Cope with items.Count % partitionSize != 0
if (partition.Count > 0) yield return partition;
}
Is there something more idiomatic?
EDIT: Although this has been marked as a duplicate of https://stackoverflow.com/questions/3210824/divide-array-into-an-array-of-subsequence-array it is not - that question deals with splitting an array, whereas this is about IEnumerable<T>
. In addition that question requires that the last subsequence is padded. The two questions are closely related but aren't the same.
C# Solutions
Solution 1 - C#
You could try to implement Batch method mentioned above on your own like this:
static class MyLinqExtensions
{
public static IEnumerable<IEnumerable<T>> Batch<T>(
this IEnumerable<T> source, int batchSize)
{
using (var enumerator = source.GetEnumerator())
while (enumerator.MoveNext())
yield return YieldBatchElements(enumerator, batchSize - 1);
}
private static IEnumerable<T> YieldBatchElements<T>(
IEnumerator<T> source, int batchSize)
{
yield return source.Current;
for (int i = 0; i < batchSize && source.MoveNext(); i++)
yield return source.Current;
}
}
I've grabbed this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.
UPDATE: Please note, that this implementation not only lazily evaluates batches but also items inside batches, which means it will only produce correct results when batch is enumerated only after all previous batches were enumerated. For example:
public static void Main(string[] args)
{
var xs = Enumerable.Range(1, 20);
Print(xs.Batch(5).Skip(1)); // should skip first batch with 5 elements
}
public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
foreach (var batch in batches)
{
Console.WriteLine($"[{string.Join(", ", batch)}]");
}
}
will output:
[2, 3, 4, 5, 6] //only first element is skipped.
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]
So, if you use case assumes batching when batches are sequentially evaluated, then lazy solution above will work, otherwise if you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution which eagerly enumerates batch content, similar to one mentioned in the question above or in the MoreLINQ
Solution 2 - C#
It feels like you want two iterator blocks ("yield return
methods"). I wrote this extension method:
static class Extensions
{
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
return new PartitionHelper<T>(items, partitionSize);
}
private sealed class PartitionHelper<T> : IEnumerable<IEnumerable<T>>
{
readonly IEnumerable<T> items;
readonly int partitionSize;
bool hasMoreItems;
internal PartitionHelper(IEnumerable<T> i, int ps)
{
items = i;
partitionSize = ps;
}
public IEnumerator<IEnumerable<T>> GetEnumerator()
{
using (var enumerator = items.GetEnumerator())
{
hasMoreItems = enumerator.MoveNext();
while (hasMoreItems)
yield return GetNextBatch(enumerator).ToList();
}
}
IEnumerable<T> GetNextBatch(IEnumerator<T> enumerator)
{
for (int i = 0; i < partitionSize; ++i)
{
yield return enumerator.Current;
hasMoreItems = enumerator.MoveNext();
if (!hasMoreItems)
yield break;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
Solution 3 - C#
Maybe?
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
return items.Select((item, inx) => new { item, inx })
.GroupBy(x => x.inx / partitionSize)
.Select(g => g.Select(x => x.item));
}
There is an already implemented one too: morelinq's Batch.
Solution 4 - C#
Craziest solution (with Reactive Extensions):
public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
return items
.ToObservable() // Converting sequence to observable sequence
.Buffer(partitionSize) // Splitting it on spececified "partitions"
.ToEnumerable(); // Converting it back to ordinary sequence
}
I know that I changed signature but anyway we all know that we'll have some fixed size collection as a chunk.
BTW if you will use iterator block do not forget to split your implementation into two methods to validate arguments eagerly!
Solution 5 - C#
For elegant solution, You can also have a look at MoreLinq.Batch.
It batches the source sequence into sized buckets.
Example:
int[] ints = new int[] {1,2,3,4,5,6};
var batches = ints.Batch(2); // batches -> [0] : 1,2 ; [1]:3,4 ; [2] :5,6
Solution 6 - C#
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
int partitionSize)
{
int i = 0;
return items.GroupBy(x => i++ / partitionSize).ToArray();
}
Solution 7 - C#
How about the partitioner classes in the http://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx">System.Collections.Concurrent</a> namespace?
Solution 8 - C#
You can do this using an overload of Enumerable.GroupBy
and taking advantage of integer division.
return items.Select((element, index) => new { Element = element, Index = index })
.GroupBy(obj => obj.Index / partitionSize, (_, partition) => partition);