Splitting up a collection into batches using LINQ
Quite often I’ve found the need to split a large list of data into multiple batches. Usually I’d just increment a counter and treat it hitting a multiple of 100 as the end of a batch, e.g.
int i = 1;
foreach (var item in items)
{
    if (i % 100 == 0)
    {
        //End of batch, submit changes here
    }
    //Perform operation
    i++;
}
This is obviously quite ugly, and prone to error if the counter isn’t incremented correctly. Instead, I’ve started using the extension method below:
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int batchSize)
{
    var items = new TSource[batchSize];
    var count = 0;
    foreach (var item in source)
    {
        items[count++] = item;
        if (count == batchSize)
        {
            yield return items;
            items = new TSource[batchSize];
            count = 0;
        }
    }
    if (count > 0)
        yield return items.Take(count);
}
The initial code can then be refactored as below:
foreach (var batch in items.Batch(100))
{
    foreach (var item in batch)
    {
        //Perform operation
    }
    //End of batch, submit changes here
}
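To illustrate how the final, partially-filled batch falls out of the items.Take(count) branch, here is a minimal sketch (it assumes the Batch method above is compiled into a static extension class that is in scope):

using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // 250 items split into batches of 100 should give batch sizes of
        // 100, 100 and 50 – the last one produced by items.Take(count).
        var numbers = Enumerable.Range(1, 250);

        foreach (var batch in numbers.Batch(100))
        {
            Console.WriteLine(batch.Count());
        }
        // Output: 100, 100, 50
    }
}

Note that the method allocates a fresh array for each full batch before yielding, so a caller can safely hold on to earlier batches while continuing to enumerate.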
One scenario where this method is extremely useful is when you’re using the collection’s items in a SQL ‘IN’ clause. SQL Server allows at most 2100 parameters per query, so a parameterised IN clause is effectively capped at around 2100 items; by splitting your data into batches within that limit and running your query once per batch (then concatenating the results afterwards), you can work around the restriction cleanly.
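Sketching that idea (the Order type and QueryOrdersForCustomers are hypothetical stand-ins for whatever data-access code you already have; only the batching and concatenation pattern is the point here):

using System.Collections.Generic;
using System.Linq;

IEnumerable<Order> GetAllOrders(IEnumerable<int> customerIds)
{
    return customerIds
        .Batch(2000)  // stay safely under the 2100-parameter limit
        .SelectMany(batch => QueryOrdersForCustomers(batch.ToList()))
        .ToList();    // concatenate the per-batch results
}

Each call to the (assumed) QueryOrdersForCustomers runs a parameterised “WHERE CustomerId IN (...)” query for one batch, and SelectMany flattens the per-batch result sets back into a single sequence.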
I have been wondering since I read the code. Although this is exactly what I have been missing, I wonder if there is a better solution than an O(n²) one.