Grouping sequential blocks of data using LINQ – GroupWhile
A few days ago, I needed to group sequential blocks of data into separate ‘chunks’. Consider the following sequence of integers:
1, 2, 4, 3, 5, 6, 7, 9, 10, 11
We’d want 1 and 2 in one group, but 4 should go into a separate group because it does not directly follow 2 (3 is missing). So the groups would look like this:
| Group | Items |
|---|---|
| A | 1, 2 |
| B | 4 |
| C | 3 |
| D | 5, 6, 7 |
| E | 9, 10, 11 |
I decided to write an extension method that does this, but is also generic enough to stay as flexible as possible. In the broadest sense, all we need to do is compare each item with the one before it and keep adding items to the current group while a condition is true; as soon as the condition is false, start a new group and carry on.
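```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Must be declared inside a non-nested, non-generic static class.
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(this IEnumerable<T> source, Func<T, T, bool> condition)
{
    // First item opens the first group (First() throws if source is empty).
    T previous = source.First();
    var list = new List<T>() { previous };
    foreach (T item in source.Skip(1))
    {
        if (!condition(previous, item))
        {
            // Condition broken: emit the current group, start a new one.
            yield return list;
            list = new List<T>();
        }
        list.Add(item);
        previous = item;
    }
    yield return list; // the last group
}
```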
Usage looks like this:

```csharp
items.GroupWhile((x, y) => y.Sequence - x.Sequence == 1);
```
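As a quick check against the grouping table, here is a minimal, self-contained console sketch. It runs the post's GroupWhile on plain integers, so the lambda compares the values directly (`y - x == 1`) instead of a `Sequence` property; the class names `GroupWhileExtensions` and `Program` are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupWhileExtensions
{
    // Same logic as the post's GroupWhile: keep adding items to the current
    // group while condition(previous, item) is true; start a new group otherwise.
    public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
        this IEnumerable<T> source, Func<T, T, bool> condition)
    {
        T previous = source.First();          // throws on an empty sequence
        var list = new List<T> { previous };
        foreach (T item in source.Skip(1))    // enumerates source a second time
        {
            if (!condition(previous, item))
            {
                yield return list;
                list = new List<T>();
            }
            list.Add(item);
            previous = item;
        }
        yield return list;
    }
}

public static class Program
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 4, 3, 5, 6, 7, 9, 10, 11 };

        // A new group starts whenever the next value is not exactly previous + 1.
        foreach (IEnumerable<int> group in numbers.GroupWhile((x, y) => y - x == 1))
        {
            Console.WriteLine(string.Join(", ", group));
        }
    }
}
```

Running it should print five lines, one per group A to E: `1, 2`, `4`, `3`, `5, 6, 7` and `9, 10, 11`.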
As you can imagine, you can put any logic inside the lambda expression. For example, you could keep grouping until a certain flag is set:

```csharp
items.GroupWhile((x, y) => !x.EndOfGroup);
```
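A sketch of the flag-based grouping, assuming a hypothetical `Item` type with `Id` and `EndOfGroup` properties (neither is defined in the post). The lambda only looks at the previous item, so a flagged item closes its own group and the item after it starts a new one. The GroupWhile body is the post's version, written out so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type: EndOfGroup marks the last item of a group.
public sealed class Item
{
    public Item(int id, bool endOfGroup) { Id = id; EndOfGroup = endOfGroup; }
    public int Id { get; }
    public bool EndOfGroup { get; }
}

public static class GroupWhileExtensions
{
    // The post's GroupWhile, unchanged in behavior.
    public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
        this IEnumerable<T> source, Func<T, T, bool> condition)
    {
        T previous = source.First();
        var list = new List<T> { previous };
        foreach (T item in source.Skip(1))
        {
            if (!condition(previous, item))
            {
                yield return list;
                list = new List<T>();
            }
            list.Add(item);
            previous = item;
        }
        yield return list;
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<Item>
        {
            new Item(1, false), new Item(2, true),
            new Item(3, true),
            new Item(4, false), new Item(5, false), new Item(6, true),
        };

        // Keep grouping while the previous item is NOT flagged as the end of a group.
        foreach (IEnumerable<Item> group in items.GroupWhile((x, y) => !x.EndOfGroup))
        {
            Console.WriteLine(string.Join(", ", group.Select(i => i.Id)));
        }
    }
}
```

With the data above, the expected groups are `1, 2`, then `3`, then `4, 5, 6`.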
I haven’t found that I need to use this often, but it’s a nice tool to add to your library. Note that, of course, it will not work with LINQ to SQL, because the provider cannot translate this code into a SQL query.
Hi Chris.
Nice generalized example. A few comments.
1. Calling AsEnumerable() on any IQueryable converts the query to an IEnumerable, so the operators after it (such as GroupWhile) run in memory. This definitely has side effects, but it can still be a useful trick.
2. In general, you should avoid enumerating a sequence more than once, since this can also have side effects. For example, if the sequence comes from IQueryable.AsEnumerable(), you will hit the database twice!
There are several ways to treat the first time through a loop as a special case. One is to set a Boolean or counter that indicates this. Another is to “unroll” the enumeration a little and call MoveNext in more than one place, which you cannot do with foreach.
First, here is how to use a Boolean. In this case, I take advantage of the fact that the list is null only before the first time through the loop:
```csharp
public static IEnumerable<IEnumerable<T>> GroupWhile2<T>(
    [NotNull] this IEnumerable<T> source,
    [NotNull] Func<T, T, bool> groupWhileTruePredicate)
{
    // Because this is an iterator, these checks run on first enumeration, not at call time.
    if (source is null) throw new ArgumentNullException(nameof(source));
    if (groupWhileTruePredicate is null) throw new ArgumentNullException(nameof(groupWhileTruePredicate));
    List<T> list = null; // null until the first item has been seen
    T previous = default;
    foreach (T current in source)
    {
        if (list is null)
        {
            // First item: nothing to compare with yet.
            list = new List<T>();
        }
        else if (!groupWhileTruePredicate(previous, current))
        {
            yield return list;
            list = new List<T>();
        }
        previous = current;
        list.Add(current);
    }
    // An empty source never creates a list, so no group is returned.
    if (!(list is null)) yield return list;
}
```
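The null check on `list` stands in for the Boolean here. For comparison, this is a minimal sketch of the same idea with an explicit `bool` flag; `GroupWhileFlag` and `GroupWhileFlagExtensions` are hypothetical names, and the behavior is meant to match GroupWhile2 (no groups for an empty source, single pass over the input).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupWhileFlagExtensions
{
    // Hypothetical variant: an explicit bool marks the first pass through the loop.
    public static IEnumerable<IEnumerable<T>> GroupWhileFlag<T>(
        this IEnumerable<T> source, Func<T, T, bool> condition)
    {
        bool isFirst = true;
        T previous = default(T);
        var list = new List<T>();
        foreach (T current in source)
        {
            if (isFirst)
            {
                isFirst = false;               // first item: nothing to compare with
            }
            else if (!condition(previous, current))
            {
                yield return list;             // close the current group
                list = new List<T>();
            }
            list.Add(current);
            previous = current;
        }
        if (!isFirst) yield return list;       // empty source: yield nothing
    }
}

public static class Program
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 4, 3, 5, 6, 7, 9, 10, 11 };
        foreach (IEnumerable<int> group in numbers.GroupWhileFlag((x, y) => y - x == 1))
        {
            Console.WriteLine(string.Join(", ", group));
        }

        // An empty source produces no groups at all.
        Console.WriteLine(new int[0].GroupWhileFlag((x, y) => true).Count());
    }
}
```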
This second example shows how to use MoveNext.
```csharp
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
    this IEnumerable<T> source, Func<T, T, bool> predicate)
{
    using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
    {
        // empty sequence
        if (!sourceEnumerator.MoveNext()) yield break;
        // handle the first item
        T previous = sourceEnumerator.Current;
        var list = new List<T> { previous };
        // handle items 2..n
        while (sourceEnumerator.MoveNext())
        {
            T current = sourceEnumerator.Current;
            if (!predicate(previous, current))
            {
                yield return list;
                list = new List<T>();
            }
            previous = current;
            list.Add(previous);
        }
        yield return list;
    }
}
```
I actually like the MoveNext version a bit more, since it handles the n == 0 and n == 1 cases at the start and needs no extra checks after that.
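A quick check of the n == 0 and n == 1 claim, as a self-contained sketch built on the MoveNext version above (generic type parameters written out; the class names `GroupingExtensions` and `Program` are placeholders).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingExtensions
{
    // MoveNext-based GroupWhile from the comment.
    public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
        {
            // n == 0: the first MoveNext fails, so no groups are produced.
            if (!sourceEnumerator.MoveNext()) yield break;

            // n >= 1: the first item always opens the first group.
            T previous = sourceEnumerator.Current;
            var list = new List<T> { previous };

            // Items 2..n: no special cases left inside the loop.
            while (sourceEnumerator.MoveNext())
            {
                T current = sourceEnumerator.Current;
                if (!predicate(previous, current))
                {
                    yield return list;
                    list = new List<T>();
                }
                previous = current;
                list.Add(previous);
            }

            // n == 1 ends up here with a one-element group.
            yield return list;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Func<int, int, bool> consecutive = (x, y) => y - x == 1;

        Console.WriteLine(new int[0].GroupWhile(consecutive).Count());    // 0
        Console.WriteLine(new[] { 42 }.GroupWhile(consecutive).Count());  // 1
        Console.WriteLine(string.Join(", ", new[] { 42 }.GroupWhile(consecutive).Single())); // 42
    }
}
```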
A minor flaw in the code is that the source is enumerated twice: once by the source.First() call and again by foreach(… source.Skip(1)).
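To see the double enumeration directly, the sketch below wraps an array in a hypothetical `CountingSequence<T>` that counts `GetEnumerator()` calls, then runs both implementations over it. `GroupWhileFirstSkip` (the post's First()/Skip(1) approach) and `GroupWhileMoveNext` (the commenter's approach) are placeholder names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: wraps a sequence and counts calls to GetEnumerator().
public sealed class CountingSequence<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _items;
    public CountingSequence(IEnumerable<T> items) { _items = items; }
    public int EnumerationCount { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        EnumerationCount++;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class GroupingVariants
{
    // First() and Skip(1) each start their own enumeration of source.
    public static IEnumerable<IEnumerable<T>> GroupWhileFirstSkip<T>(
        this IEnumerable<T> source, Func<T, T, bool> condition)
    {
        T previous = source.First();
        var list = new List<T> { previous };
        foreach (T item in source.Skip(1))
        {
            if (!condition(previous, item)) { yield return list; list = new List<T>(); }
            list.Add(item);
            previous = item;
        }
        yield return list;
    }

    // A single enumerator drives the whole pass.
    public static IEnumerable<IEnumerable<T>> GroupWhileMoveNext<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;
            T previous = e.Current;
            var list = new List<T> { previous };
            while (e.MoveNext())
            {
                T current = e.Current;
                if (!predicate(previous, current)) { yield return list; list = new List<T>(); }
                previous = current;
                list.Add(previous);
            }
            yield return list;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Func<int, int, bool> consecutive = (x, y) => y - x == 1;
        int[] numbers = { 1, 2, 4, 3, 5, 6, 7, 9, 10, 11 };

        var twice = new CountingSequence<int>(numbers);
        twice.GroupWhileFirstSkip(consecutive).ToList();  // force full enumeration
        Console.WriteLine(twice.EnumerationCount);        // 2

        var once = new CountingSequence<int>(numbers);
        once.GroupWhileMoveNext(consecutive).ToList();
        Console.WriteLine(once.EnumerationCount);         // 1
    }
}
```

With a database-backed source exposed through AsEnumerable(), each of those enumerations would run the query again.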