Fabian's Mix

Mixins, .NET, and more

re-linq: How to recognize if a method is a query operator?

without comments

On the users’ mailing list, Alex Norcliffe, Lead Architect of Umbraco 5, describes a problem they are currently facing with re-linq. As an illustration, consider the following two queries, especially the sub-queries within the where clauses:

var query1 = from c in QuerySource

             where c.Assistants.Any ()

             select c;


var query2 = from c in QuerySource

             where c.GetAssistants().Any()

             select c;

When re-linq analyzes those sub-queries within the where clauses, the first query will produce a SubQueryExpression with a QueryModel whose MainFromClause has the following expression: [c].Assistants. In other words, the items produced by the sub-query are those identified by the Assistants property.

The second query, however, will produce an exception:

Remotion.Linq.Parsing.ParserException : Cannot parse expression ‘c’ as it has an unsupported type. Only query sources (that is, expressions that implement IEnumerable) and query operators can be parsed.

—-> Remotion.Linq.Utilities.ArgumentTypeException : Expected a type implementing IEnumerable<T>, but found ‘Remotion.Linq.UnitTests.Linq.Core.TestDomain.Cook’.

Why’s that?

re-linq assumes that all methods occurring in a query operator call chain should be treated like query operators (Where, Select, etc.). This means that for the sub-query within query2, re-linq regards c as the start of the query operator chain. And, since c’s type does not implement IEnumerable<T>, it throws the exception shown above. Even if c’s type implemented IEnumerable<T>, an exception would be thrown that
GetAssistants() “is currently not supported”, unless one registers a custom node parser for that method.

Of course, what Alex actually wanted was re-linq treating both query1 and query2 in an equivalent way. I.e., a SubQueryExpression with a QueryModel whose MainFromClause has the following expression: [c].GetAssistants().

There is an easy workaround for now (see the mailing list), but I’m wondering how we could change re-linq to produce this result out of the box. I can think of two possibilities, both of which have certain drawbacks:

1 – Have re-linq treat MethodCallExpressions the same way as MemberExpressions. I.e, if the method has a registered node parser, treat it as a query operator. Otherwise, treat it (and all expression parts of the call chain before it) as the start of the query.

This would work quite well in the scenario shown above, and it would be nicely symmetric to how MemberExpressions work in re-linq.

However, it would become a very breaking change regarding diagnostics. Consider this example, in which CustomOperator is actually a custom query operator:


Currently, re-linq will throw an exception that it can’t parse CustomOperator() if one forgets to register the respective parser, and the LINQ provider backend won’t even get a QueryModel to process.

If we change this behavior, the frontend will no longer throw an exception, and the backend will suddenly get a MainFromClause with a FromExpression of "[source].Where(…).CustomOperator()". I think it would be difficult to understand for LINQ provider implementers why exactly this occurs. I can even imagine people believing this must be "right" (as no exception occurred) and start manually parsing the Where(…) and CustomOperator() calls, effectively reimplementing logic from re-linq…

2 – Have re-linq only treat MethodCallExpressions called on enumerables as query operators. Otherwise, treat them (and all expression parts of the call chain before the method call) as the start of the query.

This would also work in the given scenario, and it has the advantage of still providing good diagnostics when methods taking IEnumerable<T> have no associated expression node parser. However, it’s still a heuristic way of parsing, and it is asymmetric (both with MemberExpressions and in itself). Consider the following three examples:


re-linq would parse the StartQueryHere method in the first example as a query operator (and throw an exception if there isn’t an expression node parser registered for it). The StartQueryHere method and property in the second and third example, on the other hand, would parse just fine. I believe this is difficult to understand just as well.

What do other people think of these two options? If you want to see this scenario to be supported out of the box, please give me some feedback about it on the developer mailing list: http://groups.google.com/group/re-motion-dev/t/f9f6198bbbecd796.

Written by Fabian

December 20th, 2011 at 9:29 am

Posted in re-linq

Leave a Reply