Fabian's Mix

Mixins, .NET, and more

re-linq: Query operators defined by interfaces

And another question that makes for a good blog post: Why don’t we support automatic detection of interface implementation in the re-linq front-end when we detect query operators?

Consider the following example:

// This works
var cookNamesArray = new[] { "john", "joe", "jim" };

var query1 =
    from c in Cooks
    where cookNamesArray.Contains (c.FirstName)
    select c.FirstName;

As the comment implies, this works: the Contains call is detected as a query operator because the expression above refers to Enumerable.Contains<T>(), for which a node type parser is registered by default in the MethodCallExpressionNodeTypeRegistry. re-linq uses this registry to determine which methods are query operators and how to parse them.

Now, consider this example:

// This does not work
var cookNamesMyList = new MyList<string> { "john", "joe", "jim" };

var query2 =
    from c in Cooks
    where cookNamesMyList.Contains (c.FirstName)
    select c.FirstName;

// This works again
IEnumerable<string> cookNamesMyListAsEnumerable = new MyList<string> { "john", "joe", "jim" };

var query3 =
    from c in Cooks
    where cookNamesMyListAsEnumerable.Contains (c.FirstName)
    select c.FirstName;

The first Contains call in that second example (query2) is not recognized as a query operator because MyList<T>.Contains() has not been registered as one. re-linq analyzes the expression tree generated by the compiler, and the compiler binds the call according to the static type of the variable. In query3 the variable is typed as IEnumerable<string>, so the call is Enumerable.Contains<T>() again and is therefore recognized.
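To see which method the compiler actually puts into the expression tree, the following minimal sketch inspects the MethodCallExpression directly. It uses only the BCL; the MyList<T> class here is a hypothetical stand-in for the one in the example (it is assumed to declare its own Contains method), and the ContainsBinding helper is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the post's MyList<T>; assumed to declare its own Contains.
public class MyList<T> : List<T>
{
    public new bool Contains(T item) => base.Contains(item);
}

public static class ContainsBinding
{
    // Returns the type declaring the method that is called at the root of the predicate.
    public static Type DeclaringTypeOfCall(Expression<Func<string, bool>> predicate)
    {
        var call = (MethodCallExpression) predicate.Body;
        return call.Method.DeclaringType;
    }

    // Static type MyList<string>: the instance method wins overload resolution.
    public static Type ViaMyList()
    {
        var names = new MyList<string> { "john", "joe", "jim" };
        return DeclaringTypeOfCall(name => names.Contains(name));
    }

    // Static type IEnumerable<string>: only the Enumerable extension method applies.
    public static Type ViaEnumerable()
    {
        IEnumerable<string> names = new MyList<string> { "john", "joe", "jim" };
        return DeclaringTypeOfCall(name => names.Contains(name));
    }
}
```

ViaMyList() yields typeof(MyList<string>), whereas ViaEnumerable() yields typeof(Enumerable), although both operate on the same kind of object at run time. A front-end that keys on the MethodInfo therefore sees two unrelated methods.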

Now, a sensible suggestion would be the following: Since MyList<T> probably implements ICollection<T>, why not register a parser for ICollection<T>.Contains()?

And the answer is: because it wouldn’t help. (Unless you cast to ICollection<T>, that is.)

As long as the method in the expression tree is MyList<T>.Contains() rather than ICollection<T>.Contains(), re-linq cannot determine that the registered parser should be used. But why not? After all, a human reader can see that one implements the other, so why can’t re-linq?

Automatic detection of interface implementations for query operators is a difficult topic, primarily because it’s hard to get right with good performance. Currently, the re-linq front-end is quite fast because it only does dictionary lookups for the methods coming in via MethodCallExpressions (plus GetGenericTypeDefinition()/GetGenericMethodDefinition()).
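The kind of lookup described above can be sketched roughly as follows. This is not re-linq's actual registry: OperatorRegistry, its members, and the use of a plain string as the "parser" are assumptions made for illustration. Only the generic method case is normalized; methods declared on generic types would need a comparable step based on GetGenericTypeDefinition(), which is left out here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical registry; name and shape are illustrative, not re-linq's API.
public sealed class OperatorRegistry
{
    private readonly Dictionary<MethodInfo, string> _parsers = new Dictionary<MethodInfo, string>();

    public void Register(MethodInfo method, string parserName)
    {
        _parsers[Normalize(method)] = parserName;
    }

    // One hash lookup per call; false means "not a known query operator".
    public bool TryGetParser(MethodInfo method, out string parserName)
    {
        return _parsers.TryGetValue(Normalize(method), out parserName);
    }

    // Maps a closed generic method such as Contains<string> to its open definition.
    private static MethodInfo Normalize(MethodInfo method)
    {
        return method.IsGenericMethod ? method.GetGenericMethodDefinition() : method;
    }
}

public static class OperatorRegistryDemo
{
    private static OperatorRegistry CreateRegistry()
    {
        var registry = new OperatorRegistry();
        MethodInfo contains = typeof(Enumerable).GetMethods()
            .Single(m => m.Name == "Contains" && m.GetParameters().Length == 2);
        registry.Register(contains, "ContainsExpressionNode");
        return registry;
    }

    private static string Lookup(MethodInfo method)
    {
        string parserName;
        return CreateRegistry().TryGetParser(method, out parserName) ? parserName : "(not an operator)";
    }

    // Enumerable.Contains<string> is found via its generic method definition.
    public static string LookupEnumerableContains()
    {
        Expression<Func<IEnumerable<string>, string, bool>> e = (xs, s) => xs.Contains(s);
        return Lookup(((MethodCallExpression) e.Body).Method);
    }

    // string.Contains was never registered, so the lookup misses.
    public static string LookupStringContains()
    {
        Expression<Func<string, string, bool>> e = (a, b) => a.Contains(b);
        return Lookup(((MethodCallExpression) e.Body).Method);
    }
}
```

Both the hit and the miss cost a single dictionary probe, which is what keeps this approach cheap even when every MethodCallExpression in a tree has to be checked.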

With interface method implementations, this is not that simple. We can’t rely on dictionary lookups, which are O(1) operations (leaving the calls mentioned above aside); we have to get an interface map for each registered interface and perform a linear search through the TargetMethods array of that map. This makes the lookup an O(N*M) operation, where N is the number of registered interface types and M is the number of methods in those interfaces. (GetInterfaceMap() itself probably adds further cost on top of that.)
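A minimal sketch of such an interface-map search, using only System.Reflection, might look like this. The class and method names are hypothetical, and the loop is a variant of the description above: it walks the interfaces of the declaring type and checks each against the registered set, rather than walking the registered interfaces. The nested loops are still where the cost comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class InterfaceOperatorLookup
{
    // Returns the interface method that 'called' implements, provided the interface
    // (or its generic type definition) is in 'registeredInterfaces'; otherwise null.
    public static MethodInfo FindImplementedInterfaceMethod(
        MethodInfo called, ICollection<Type> registeredInterfaces)
    {
        Type declaringType = called.DeclaringType;
        if (declaringType == null || declaringType.IsInterface)
            return null;

        foreach (Type candidate in declaringType.GetInterfaces())
        {
            Type key = candidate.IsGenericType ? candidate.GetGenericTypeDefinition() : candidate;
            if (!registeredInterfaces.Contains(key))
                continue;

            // Linear search through the implementing methods of this interface.
            InterfaceMapping map = declaringType.GetInterfaceMap(candidate);
            for (int i = 0; i < map.TargetMethods.Length; i++)
            {
                if (called.Equals(map.TargetMethods[i]))
                    return map.InterfaceMethods[i];
            }
        }
        return null;
    }
}

public static class InterfaceOperatorLookupDemo
{
    private static readonly HashSet<Type> Registered = new HashSet<Type> { typeof(ICollection<>) };

    // List<string>.Contains implements ICollection<string>.Contains.
    public static MethodInfo ResolveListContains()
    {
        MethodInfo called = typeof(List<string>).GetMethod("Contains");
        return InterfaceOperatorLookup.FindImplementedInterfaceMethod(called, Registered);
    }

    // string.Contains implements none of the registered interfaces.
    public static MethodInfo ResolveStringContains()
    {
        MethodInfo called = typeof(string).GetMethod("Contains", new[] { typeof(string) });
        return InterfaceOperatorLookup.FindImplementedInterfaceMethod(called, Registered);
    }
}
```

Note that the miss case (string.Contains) still has to enumerate every interface of the declaring type before giving up, and misses are by far the most common case in a typical expression tree.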

We could try to make it perform better by detecting interface methods by name first, and only then check the interface map, but it would still be much slower than an ordinary lookup – and you must not forget that re-linq has to perform that lookup for every MethodCallExpression in an expression tree in order to detect whether a method call refers to a query operator or not. So this has to be really fast.

Complex logic like this could only be made fast enough via sophisticated caching, which we’ve been able to avoid so far. We’d have to devise a good caching scheme, expose control over the cache’s lifetime to the user of re-linq (i.e., the LINQ provider), and so on.
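To give an idea of what even the simplest form of such a cache involves, here is a rough sketch. CachedMethodLookup is a hypothetical wrapper, not something re-linq provides: it memoizes the result of an arbitrary slow lookup per MethodInfo (including negative results) and hands lifetime control to the caller via Clear().

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical cache wrapper; not part of re-linq.
public sealed class CachedMethodLookup<TResult>
{
    private readonly ConcurrentDictionary<MethodInfo, TResult> _cache =
        new ConcurrentDictionary<MethodInfo, TResult>();
    private readonly Func<MethodInfo, TResult> _slowLookup;

    public CachedMethodLookup(Func<MethodInfo, TResult> slowLookup)
    {
        if (slowLookup == null) throw new ArgumentNullException("slowLookup");
        _slowLookup = slowLookup;
    }

    // Runs the slow lookup on a cache miss only. Under contention GetOrAdd may
    // invoke the delegate more than once, so the lookup should be side-effect free.
    public TResult Find(MethodInfo method)
    {
        return _cache.GetOrAdd(method, _slowLookup);
    }

    // Lets the owner (e.g., the LINQ provider) decide when cached entries are dropped.
    public void Clear()
    {
        _cache.Clear();
    }
}

public static class CachedMethodLookupDemo
{
    // Returns how often the slow lookup ran: after two Find calls,
    // and then after Clear plus one more Find.
    public static int[] CountSlowLookups()
    {
        int slowCalls = 0;
        var lookup = new CachedMethodLookup<bool>(method =>
        {
            slowCalls++;
            return method.Name == "Contains";
        });
        MethodInfo contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });

        lookup.Find(contains);
        lookup.Find(contains);
        int afterTwoFinds = slowCalls;

        lookup.Clear();
        lookup.Find(contains);
        return new[] { afterTwoFinds, slowCalls };
    }
}
```

This sketch deliberately ignores the harder questions, such as bounding the cache size or invalidating entries when registrations change, which is where the real design effort would go.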

So, this is not a small feature to implement, which is why we haven’t done it up to now. However, I’ve created a JIRA feature request for it, and we’ll consider implementing it in the future. (Oh, and we accept patches if they come with the necessary unit tests. If somebody would like to implement this feature, let’s discuss the design on the re-motion developers list.)

Until the feature is implemented, there are a few workarounds:

  • You can use Enumerable.Contains<T>(), as illustrated above. This works out of the box right now.
  • You can register a parser for common collection types (e.g., List<T>) to at least support those. (List<T>.Contains() will become a default operator with RM-3340.)
  • You can register parsers for ICollection<T>.Contains() or IList<T>.Contains(), but have to cast the collection to the standard interface type: where ((ICollection<string>) cookNamesMyList).Contains (c.FirstName). This will also be implemented with RM-3340.
  • You can allow your users to register parsers for their own query operators (MyList<T>.Contains()) by exposing the MethodCallExpressionNodeTypeRegistry.
  • You can register a parser for the “Contains” method name. This is not a good option because it would also cause re-linq to detect string.Contains() as a query operator.

The standard ContainsExpressionNode parser should work for all of these cases.
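The effect of the cast workaround can be checked without re-linq by looking at the expression tree the compiler produces. The sketch below uses List<string> as a stand-in for a custom collection type; the CastWorkaround class is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class CastWorkaround
{
    // Returns the type declaring the method that is called at the root of the predicate.
    private static Type DeclaringTypeOfCall(Expression<Func<List<string>, string, bool>> predicate)
    {
        return ((MethodCallExpression) predicate.Body).Method.DeclaringType;
    }

    // Without a cast, the tree refers to the concrete type's own method.
    public static Type Direct()
    {
        return DeclaringTypeOfCall((names, name) => names.Contains(name));
    }

    // With the cast, the tree refers to the interface method instead.
    public static Type ViaCast()
    {
        return DeclaringTypeOfCall((names, name) => ((ICollection<string>) names).Contains(name));
    }
}
```

Direct() yields typeof(List<string>), while ViaCast() yields typeof(ICollection<string>), so a parser registered for ICollection<T>.Contains() can only be matched by a plain MethodInfo lookup in the second form.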

Written by Fabian

September 23rd, 2010 at 9:35 am

Posted in re-linq

4 Responses to 're-linq: Query operators defined by interfaces'


  1. //…
    else
    {
        IList valueRange;
        // the InMemoryEvalCandidateFinder has found all lists to evaluate already and has converted them to constants. All other objects which
        // have a contains call on them aren’t supported.
        switch((int)handledSource.NodeType)
        {
            case (int)ExpressionType.Constant:
                // check if the object implements IList. If so assign the object directly. If not, convert to IList.
                ConstantExpression handledSourceAsConstant = (ConstantExpression)handledSource;
                if(handledSourceAsConstant.Value is IList)
                {
                    valueRange = (IList)handledSourceAsConstant.Value;
                }
                else
                {
                    // assume IEnumerable, otherwise the linq query wouldn’t compile. Use a linq to objects query to convert the IEnumerable to a list.
                    valueRange = (from object v in (IEnumerable)handledSourceAsConstant.Value select v).ToList();
                }
                break;
            default:
                throw new ORMQueryConstructionException(string.Format("The Contains method on the type {0} isn’t convertable to an LLBLGen Pro construct.",
                    declaringType.FullName));
        }

        toReturn = new InClauseExpression(valueRange, handledOperand);
    }
    …//

    tests:
    using(DataAccessAdapter adapter = new DataAccessAdapter())
    {
        LinqMetaData metaData = new LinqMetaData(adapter);

        List<string> countries = new List<string>() { "USA", "UK" };
        var q = from c in metaData.Customer
                where countries.Contains(c.Country)
                select c;

        List<CustomerEntity> results = q.ToList();
        Assert.AreEqual(20, results.Count);
    }

    using(DataAccessAdapter adapter = new DataAccessAdapter())
    {
        LinqMetaData metaData = new LinqMetaData(adapter);

        var q = from c in metaData.Customer
                where new List<string>() { "USA", "UK" }.Contains(c.Country)
                select c;

        List<CustomerEntity> results = q.ToList();
        Assert.AreEqual(20, results.Count);
    }

    using(DataAccessAdapter adapter = new DataAccessAdapter())
    {
        LinqMetaData metaData = new LinqMetaData(adapter);

        var q = from c in metaData.Customer
                where c.Orders.Select(oc => new { EID = oc.EmployeeId, CID = oc.CustomerId }).Contains(
                    (from o in metaData.Order where o.CustomerId == "CHOPS" select new { EID = o.EmployeeId, CID = o.CustomerId }).First())
                select c;

        List<CustomerEntity> results = q.ToList();
        Assert.AreEqual(1, results.Count);
        Assert.AreEqual("CHOPS", results[0].CustomerId);
    }

    etc.

    It’s not that hard really. You DO have to handle the source of the Contains call up front and check whether it’s an in-memory materialized constant (which is detected by a funcletizer and which can be an IEnumerable etc.) or a set (i.e. you need to use IQueryable.Contains()). Whether it’s a List<T> or an IEnumerable<T> is irrelevant.

    The above code snippet is all it took, and it works OK.

    Frans Bouma

    23 Sep 10 at 10:15

  2. To elaborate a bit more:
    1) the switch/case could be an if, but it was written in the dark, so I didn’t know whether I’d need other states/types
    2) arrays and the like are also supported: every constant with a Contains() call on it is either supported (a type implementing IList) or it’s a type with a custom Contains() call, which isn’t supported (it doesn’t implement IList). You can extend the code for other interfaces, but the point is the same: the list of values is a constant, and as soon as you see that, you know it’s not an IQueryable.

    Frans Bouma

    23 Sep 10 at 10:21

  3. Hi Frans,

    Of course; we have very similar code in our own SQL back-end. But this is not about detecting whether an expression is a constant collection (quite easy) but to detect whether the method being called on it is one of the supported Contains methods.

    We enable users to register their own handlers for different methods, and we (currently) can only associate handlers by MethodInfo and by method name. We cannot associate by declaring interface of the MethodInfo. Which is the problem here.

    How do you do it? You probably just check whether the name is "Contains" and the declaring type is not String, right?

    That’s an ad-hoc solution that probably works (most of the time) but doesn’t integrate into the extensible handlers mechanism mentioned above.

    Although we could simulate it by allowing handlers to be registered by name (which we do) and "opt-out" if a name match wasn’t correct…

    Fabian

    fabian

    23 Sep 10 at 10:23

  4. And I like that concept (filters for name-based handler registrations) so much I’ve created a JIRA issue for it: https://www.re-motion.org/jira/browse/RM-3343 🙂

    fabian

    23 Sep 10 at 10:49
