This is the second part of a two-part series of posts. Read the first part for a very short introduction to re-linq; for more background, read Stefan Wenig’s post or my whitepaper.
As promised, here’s an introduction to the steps that need to be taken to implement a LINQ provider using re-linq.
Interfaces to start with
First, let’s take a look at the classes and interfaces LINQ and re-linq require you to implement. To start with, you need to provide an implementation of IQueryable<T>. That’s LINQ’s main query interface, and all of LINQ’s query methods, such as Queryable.Where, Queryable.OrderBy, or Queryable.Select, are written against it. re-linq provides a base class, QueryableBase<T>, from which you can derive to implement this interface. Doing so is fairly trivial: it only requires adding two constructors, one used by your provider’s clients and one used by the LINQ infrastructure in the .NET framework.
Then, you need an implementation of IQueryProvider. LINQ query methods use this interface to create new queries around an existing IQueryable<T> and to actually execute queries. For example, a call to Queryable.Where will take an existing query and wrap its expression so that it now represents a query with a where clause. A call to Queryable.Single will use the IQueryProvider.Execute method to actually execute the query. Enumerating queries will also delegate to IQueryProvider.Execute.
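To see this wrapping in action without writing a provider first, here's a minimal sketch that uses the IQueryable implementation built into LINQ to Objects (AsQueryable) as a stand-in. The names WrappingDemo, BuildQuery, and RootMethodName are made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class WrappingDemo
{
    // Builds a query over an in-memory array; AsQueryable supplies the
    // IQueryable<T>/IQueryProvider pair that a custom provider would supply.
    public static IQueryable<int> BuildQuery()
    {
        IQueryable<int> source = new[] { 5, 12, 20 }.AsQueryable();

        // Queryable.Where does not filter anything here. It asks the provider
        // to create a new query whose expression wraps source.Expression.
        return source.Where(n => n > 10);
    }

    // Returns the name of the method call at the root of the query's
    // expression tree, or null if the root is not a method call.
    public static string RootMethodName(IQueryable query)
    {
        var call = query.Expression as MethodCallExpression;
        return call == null ? null : call.Method.Name;
    }
}
```

RootMethodName(BuildQuery()) returns "Where": the query is only a description at that point. Calling BuildQuery().Count() is what finally goes through IQueryProvider.Execute.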
re-linq provides an abstract base class, QueryProviderBase, and a default implementation, DefaultQueryProvider, which implement the IQueryProvider interface. Usually, DefaultQueryProvider is completely sufficient, so QueryableBase<T> uses that implementation by default.
While DefaultQueryProvider implements the query creation part of IQueryProvider, it of course cannot pre-implement the actual execution of a query against the target query system. Instead, it parses the query’s expression tree into a QueryModel and passes that QueryModel to an implementation of IQueryExecutor.
IQueryExecutor is an interface representing the details of executing a query against a target queryable system. This means it needs to be implemented by you, of course, since you are the one who knows how to build queries for that system.
IQueryExecutor and result operators
When you take a look at IQueryExecutor, you can see that it has three methods: ExecuteScalar, ExecuteSingle, and ExecuteCollection.
Let’s start with ExecuteCollection, since that is the simplest of the three methods. Take a look at the following code:
var query = from o in QueryFactory.CreateLinqQuery<Order>()
            where o.OrderNumber > 10
            select o;

foreach (var order in query)
{
  Console.WriteLine (order.OrderNumber);
}
When you execute that code, the query is enumerated and expected to return a collection (or sequence) of items. That’s why IQueryExecutor.ExecuteCollection() is called for that query (at least when the object returned by QueryFactory.CreateLinqQuery<T>() is based on QueryableBase<T>). ExecuteCollection is passed a QueryModel that has exactly one MainFromClause, one WhereClause, and one SelectClause. In short, the QueryModel directly corresponds to the LINQ query written above.
Now, what about ExecuteSingle and ExecuteScalar? Take a look at the following two queries:
var count = (from o in QueryFactory.CreateLinqQuery<Order>()
             where o.OrderNumber > 10
             select o).Count();

var item = (from o in QueryFactory.CreateLinqQuery<Order> ()
            where o.OrderNumber > 10
            select o).First ();
These two queries are different in that they are not expected to return collections. Instead, they are expected to return scalar, calculated values and single items from the sequence, respectively. Their QueryModels have operators attached to them that represent the calculation or single item selection. re-linq calls those ResultOperators.
The first query has a CountResultOperator, which represents a scalar value calculated from the query’s result sequence. Therefore, IQueryExecutor.ExecuteScalar is called in order to execute it. Other scalar operators are LongCountResultOperator, ContainsResultOperator, SumResultOperator, and AverageResultOperator.
The second query has a FirstResultOperator, which represents a single item that is selected from the result sequence. Therefore, IQueryExecutor.ExecuteSingle is called in order to execute it. Other single operators are SingleResultOperator, LastResultOperator, MinResultOperator, and MaxResultOperator. All of those choose a single item from the query sequence, so all of them are treated the same way. Note that even when those operators return a scalar value because the query returns a sequence of scalar values, they still invoke ExecuteSingle because a single item is chosen from the list rather than calculated.
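To make the three entry points concrete, here's a rough sketch of an executor that implements only the collection case and lets the other two methods reuse it. It does not reference re-linq: FakeQueryModel is a hypothetical stand-in for QueryModel that simply carries the rows the target system would return, and DelegatingExecutor is a made-up class that mirrors the three method names without implementing the real IQueryExecutor. The parameter lists, including returnDefaultWhenEmpty, are written from memory and should be checked against the interface in your re-linq version.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for re-linq's QueryModel: it just holds the rows
// that the target system would return for the translated query.
public sealed class FakeQueryModel
{
    public FakeQueryModel(params object[] rows) { Rows = rows; }
    public object[] Rows { get; private set; }
}

// Mirrors the three method names of IQueryExecutor (signatures assumed).
public sealed class DelegatingExecutor
{
    // The only method that "talks" to the target system in this sketch.
    public IEnumerable<T> ExecuteCollection<T>(FakeQueryModel queryModel)
    {
        return queryModel.Rows.Cast<T>();
    }

    // Single-item queries (First, Single, Min, ...): run the query as a
    // collection and pick the one item it yields.
    public T ExecuteSingle<T>(FakeQueryModel queryModel, bool returnDefaultWhenEmpty)
    {
        IEnumerable<T> items = ExecuteCollection<T>(queryModel);
        return returnDefaultWhenEmpty ? items.SingleOrDefault() : items.Single();
    }

    // Scalar queries (Count, Sum, ...): the target system is expected to
    // return exactly one row holding the calculated value.
    public T ExecuteScalar<T>(FakeQueryModel queryModel)
    {
        return ExecuteCollection<T>(queryModel).Single();
    }
}
```

With this shape, new DelegatingExecutor().ExecuteScalar<int>(new FakeQueryModel(3)) yields 3. The design only works when the translated query already contains the result operator, so that the target system does the counting or item selection itself.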
Translating queries
For many target queryable systems it will be possible to simply implement ExecuteCollection and just delegate to that from ExecuteSingle or ExecuteScalar. For others, it might be important to take note of the semantic differences. Whichever path you follow, you’ll eventually have to ask one important question: “How the heck do I create a query in my target system’s format from a QueryModel?”
And the answer is, of course, “That depends on your target system!” :)
However, re-linq gives you two important tools to do so: IQueryModelVisitor and ExpressionTreeVisitor.
The first of those two visitors operates on a large scale: it provides a way to execute specific code for each clause within a QueryModel, allowing you to translate one clause at a time. You can collect the partial results of your translations, and finally make one query for your target system from those parts.
The simplest way to make use of IQueryModelVisitor is to derive from QueryModelVisitorBase. That class implements the interface by automatically iterating over sub-clauses and collections, dispatching to the correct visitor methods for every element of the query. It’s also hardened against modifications of the QueryModel being iterated, but more about this later. Simply override its Visit… methods for the query components you want to handle, and generate your target query parts accordingly. Note that you need to handle all the clauses, result operators, and so on defined by re-linq. If you don’t at least throw an exception for those constructs you simply cannot translate, you’ll get invalid query translations.
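The idea of collecting partial results and assembling them at the end can be sketched without any re-linq types. QueryPartsCollector below is a hypothetical helper, and the HQL-like output format is only an assumption for the sake of the example. A visitor derived from QueryModelVisitorBase would call the Add… and Set… methods from its Visit… overrides and call BuildQuery once the whole QueryModel has been visited.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: gathers the text generated for each clause and
// combines it into one query string for the target system.
public sealed class QueryPartsCollector
{
    private readonly List<string> _fromParts = new List<string>();
    private readonly List<string> _whereParts = new List<string>();
    private string _selectPart;

    // Called once per from clause, e.g. ("Order", "o").
    public void AddFromPart(string entityName, string alias)
    {
        _fromParts.Add(entityName + " as " + alias);
    }

    // Called once per where clause; multiple conditions are and-ed together.
    public void AddWherePart(string condition)
    {
        _whereParts.Add(condition);
    }

    // Called for the select clause.
    public void SetSelectPart(string selector)
    {
        _selectPart = selector;
    }

    // Assembles the final query text from the collected parts.
    public string BuildQuery()
    {
        if (_selectPart == null || _fromParts.Count == 0)
            throw new InvalidOperationException("A query needs a select part and at least one from part.");

        string text = "select " + _selectPart + " from " + string.Join(", ", _fromParts.ToArray());
        if (_whereParts.Count > 0)
            text += " where " + string.Join(" and ", _whereParts.ToArray());
        return text;
    }
}
```

For the query shown earlier, adding the from part ("Order", "o"), the where part "(o.OrderNumber > 10)", and the select part "o" produces "select o from Order as o where (o.OrderNumber > 10)".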
While you’re visiting the clauses and result operators, you’ll notice that some of them contain LINQ Expressions. For example, WhereClause.Predicate contains an Expression, SelectClause.Selector does, and even MainFromClause.FromExpression is an expression tree. Now, didn’t I say earlier that LINQ expressions are inherently complex and hard to understand?
They are, but the expressions you can find in re-linq’s clauses have already been simplified. In them,
- references to outer variables (closures) and other evaluatable expressions have already been pre-evaluated into constants,
- sub-queries have been parsed and replaced by QueryModels wrapped in SubQueryExpressions, and, most importantly,
- transparent identifiers have been removed and references to query sources (from clauses, joins) have been replaced by QuerySourceReferenceExpressions, which link back to the respective query source.
Therefore, the expressions you find in re-linq’s clauses are usually quite straightforward to translate to the target query system. Depending on the target query system, of course.
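To illustrate what pre-evaluating a closure reference means, here's a deliberately simplified sketch built on the .NET framework's own ExpressionVisitor. It is not re-linq's implementation, which handles evaluatable expressions in general. ClosureEvaluatingVisitor and ClosureDemo are made-up names, and the visitor only handles the one pattern the C# compiler emits for a captured variable: a member access on a constant closure object.

```csharp
using System;
using System.Linq.Expressions;

// Simplified illustration only: replaces "field of a constant closure
// object" with a plain constant holding the field's current value.
public sealed class ClosureEvaluatingVisitor : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression is ConstantExpression)
        {
            // Compile and run just this sub-expression to obtain its value.
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

public static class ClosureDemo
{
    // Builds "n => n > limit", where limit is captured from the enclosing
    // method, and returns the tree with the capture turned into a constant.
    public static Expression<Func<int, bool>> BuildSimplifiedPredicate(int limit)
    {
        Expression<Func<int, bool>> predicate = n => n > limit;
        return (Expression<Func<int, bool>>) new ClosureEvaluatingVisitor().Visit(predicate);
    }
}
```

Before the rewrite, the right-hand side of the comparison is a member access on a compiler-generated closure class. Afterwards it is a ConstantExpression, which is far easier to turn into a query literal or parameter.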
To implement the translation of expressions, you derive a class from ExpressionTreeVisitor or, better, ThrowingExpressionTreeVisitor. Both of them are meant to iterate over an expression tree and to visit each of the nodes in the tree, but ThrowingExpressionTreeVisitor throws an exception for unsupported node types by default.
Simply override the Visit… methods for those node types you want to support, and generate a semantically equivalent query element for your target query system. Then, from your IQueryModelVisitor, take the elements and integrate them into the current query part.
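Here's a minimal sketch of such a translating visitor. To keep it runnable without re-linq, it derives from the .NET framework's ExpressionVisitor instead of ExpressionTreeVisitor, and it imitates the throwing behavior by rejecting every node type it does not explicitly list. HqlLikeTranslator, the Order class, and the HQL-like output syntax are assumptions made for this example.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Sample entity, assumed for the example.
public class Order
{
    public int OrderNumber { get; set; }
}

public sealed class HqlLikeTranslator : ExpressionVisitor
{
    private readonly StringBuilder _text = new StringBuilder();

    public static string Translate(Expression expression)
    {
        var translator = new HqlLikeTranslator();
        translator.Visit(expression);
        return translator._text.ToString();
    }

    // Convenience helper: translates the predicate "o => o.OrderNumber > 10".
    public static string TranslateSamplePredicate()
    {
        Expression<Func<Order, bool>> predicate = o => o.OrderNumber > 10;
        return Translate(predicate.Body);
    }

    // Reject anything that is not explicitly supported, so that unknown
    // constructs fail loudly instead of silently vanishing from the query.
    public override Expression Visit(Expression node)
    {
        if (node == null) return null;
        switch (node.NodeType)
        {
            case ExpressionType.GreaterThan:
            case ExpressionType.LessThan:
            case ExpressionType.Equal:
            case ExpressionType.AndAlso:
            case ExpressionType.MemberAccess:
            case ExpressionType.Parameter:
            case ExpressionType.Constant:
                return base.Visit(node);
            default:
                throw new NotSupportedException("Cannot translate node type " + node.NodeType + ".");
        }
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _text.Append("(");
        Visit(node.Left);
        _text.Append(OperatorFor(node.NodeType));
        Visit(node.Right);
        _text.Append(")");
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        Visit(node.Expression);
        _text.Append(".").Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _text.Append(node.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // A real provider would emit a query parameter instead of inlining.
        _text.Append(Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }

    private static string OperatorFor(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.GreaterThan: return " > ";
            case ExpressionType.LessThan: return " < ";
            case ExpressionType.Equal: return " = ";
            case ExpressionType.AndAlso: return " and ";
            default: throw new NotSupportedException("Cannot translate operator " + type + ".");
        }
    }
}
```

HqlLikeTranslator.TranslateSamplePredicate() returns "(o.OrderNumber > 10)". In a re-linq provider, the parameter case would instead handle QuerySourceReferenceExpressions, and the resulting text would be handed to the query model visitor as the current where part.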
All of this works very well. Unless, of course, you encounter a construct that’s just way incompatible with your target query system. What now, throw a NotSupportedException? Realistically, you’ll have to do that, sometimes. But in other cases, it would actually be possible to support some of these constructs, although you’d have to simulate them using other query mechanisms… somehow…
Transforming queries
For example, your target query system might not support sub-queries in from clauses. But sometimes, sub-queries in from clauses can be flattened, thus turning the unsupported query into a supported one.
Or, in other scenarios, you might want to move a Where clause from one side of a join to the other side in order to avoid creating a dependent sub-query. Or you might want to detect group clauses with aggregates if those are well-translatable into your target query system.
While re-linq does not – and cannot – pre-implement all conceivable query model transformations, it does provide a lot of infrastructural support for them. Here’s a list of what we do in order to make transformations less difficult:
- Apart from QuerySourceReferenceExpressions, there are no ordering dependencies between clauses in a QueryModel. You can simply remove clauses from the model, move them around, or insert new ones without any problems. Only when QuerySourceReferenceExpressions reference those clauses do you need to be more careful. Usually, referenced query sources must stay in the query, prior to the point where they are referenced, or the references must be updated (see below).
- All properties of clauses are settable, i.e. it’s easy to replace a WhereClause’s predicate or change an AdditionalFromClause’s item name.
- If both the original and the transformed QueryModel must be retained, the QueryModel.Clone() method provides a simple way of generating a deep copy (including clones of all query elements) of the QueryModel before it is transformed.
- QueryModel.TransformExpressions() provides an easy-to-use mechanism to transform all expressions held by a query model in one go.
- ReferenceReplacingExpressionTreeVisitor provides an easy-to-use mechanism to replace references to query sources after they were modified or removed, even across sub-queries. Use it in combination with QueryModel.TransformExpressions() whenever replacing a query source or moving a clause from one QueryModel to another.
- ExpressionTreeVisitor supports custom modification of the expression tree being visited. Simply return new nodes from any of its Visit… methods, and ExpressionTreeVisitor will automatically create an expression tree containing your new nodes.
- QueryModelVisitorBase is hardened against changes made to the QueryModel while it is being visited. This means that from any QueryModelVisitorBase.Visit… method, you can modify any element of the QueryModel without having to fear exceptions because you’ve just modified a collection being iterated.
- Whenever you need to get information about the data produced by a QueryModel or a result operator, you can use the GetOutputDataInfo() methods to calculate the kind (single item, scalar value, sequence) and type of the data being returned.
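The two visitor-based items in this list, returning new nodes from Visit… methods and replacing references, follow a pattern that can be shown with the .NET framework's ExpressionVisitor alone. The sketch below is an analogy, not re-linq code: it replaces references to one lambda parameter with references to another, which is roughly what has to happen to a predicate when its clause is moved next to a different query source. ParameterReplacingVisitor and PredicateMerger are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;

// Returns a replacement node for one specific parameter; the base class
// rebuilds all parent nodes around the replacement automatically.
public sealed class ParameterReplacingVisitor : ExpressionVisitor
{
    private readonly ParameterExpression _oldParameter;
    private readonly Expression _replacement;

    public ParameterReplacingVisitor(ParameterExpression oldParameter, Expression replacement)
    {
        _oldParameter = oldParameter;
        _replacement = replacement;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return node == _oldParameter ? _replacement : base.VisitParameter(node);
    }
}

public static class PredicateMerger
{
    // Combines two predicates into one by re-pointing the second predicate's
    // parameter references at the first predicate's parameter.
    public static Expression<Func<T, bool>> Merge<T>(
        Expression<Func<T, bool>> first, Expression<Func<T, bool>> second)
    {
        var visitor = new ParameterReplacingVisitor(second.Parameters[0], first.Parameters[0]);
        Expression secondBody = visitor.Visit(second.Body);
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(first.Body, secondBody), first.Parameters);
    }

    // Demo: "n > 10" merged with "m < 20", compiled for a quick check.
    public static Func<int, bool> BuildDemo()
    {
        return Merge<int>(n => n > 10, m => m < 20).Compile();
    }
}
```

Without the replacement step, the merged tree would still refer to the second lambda's parameter, and compiling it would fail. Forgetting to update QuerySourceReferenceExpressions after moving a clause leads to the same class of problem.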
Writing custom extensions
Last, but not least, you may also run into situations where you’d like to have support for a certain feature that is not supported by re-linq or even LINQ. It happens quite often that LINQ providers define their own, target system-specific query methods, for example to implement full-text querying or query hinting.
For such scenarios, re-linq provides options on several levels. On the query method level, you can implement a custom IExpressionNode parser class. These classes are used to analyze the structure of a LINQ expression tree and to build the QueryModel corresponding to that tree. To make use of this extension point, derive from the MethodCallExpressionNodeBase or ResultOperatorExpressionNodeBase classes, depending on your scenario. Then, create a MethodCallExpressionNodeTypeRegistry instance and register your new parser classes. Pass that registry to the DefaultQueryProvider from your QueryableBase<T> implementation.
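The query method itself needs nothing from re-linq: it is an ordinary extension method that records a call to itself in the expression tree, just like the methods in Queryable do. Here's a sketch of a hypothetical WithHint method (the name, the parameter, and the containing class are invented for this example). The custom expression node parser described above would be the piece that recognizes this call when the tree is parsed.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class HintingQueryableExtensions
{
    // Hypothetical provider-specific query method. It executes nothing; it
    // only wraps the source's expression in a call to WithHint.
    public static IQueryable<T> WithHint<T>(this IQueryable<T> source, string hint)
    {
        if (source == null) throw new ArgumentNullException("source");

        // MethodInfo of this very method, closed over the current T.
        MethodInfo self = new Func<IQueryable<T>, string, IQueryable<T>>(WithHint<T>).Method;

        MethodCallExpression call = Expression.Call(
            self, source.Expression, Expression.Constant(hint, typeof(string)));

        return source.Provider.CreateQuery<T>(call);
    }
}
```

After new[] { 1, 2 }.AsQueryable().WithHint("some hint"), the root of the query's expression tree is a call to WithHint. LINQ to Objects cannot execute such a query, since it knows nothing about the method; a provider has to give it a meaning during parsing and translation.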
On the QueryModel level, you can provide custom IBodyClause implementations, derive from MainFromClause and SelectClause, or subclass ResultOperatorBase. How you integrate them into the QueryModel depends on your use case, but most often, you’ll integrate them from your expression node parser’s (see above) Apply methods.
Wrapping it up
Now, this text, which has turned out to be more of an article than a blog post, has given a short overview of the concepts and features of re-linq and how to use them when writing a LINQ provider.
All the options provided by re-linq may seem a little overwhelming, but actually, re-linq is quite straightforward. A basic LINQ provider only needs to implement a few interfaces to start with, as well as two visitors: one for the QueryModel, one for the expression trees. Sample code for this can be found at the Remotion-Contrib repository; the sample builds a LINQ provider for the open-source O/R mapper NHibernate based on the query language HQL.
As the LINQ provider evolves, it will need to support queries that are more difficult to translate to the target system, so it will start using query transformations. Transformations are incremental, so you can add new transformations on a feature-by-feature basis. Sophisticated LINQ providers will also want to provide their own query methods in addition to the standard query operators, and again, re-linq supports this in an incremental fashion.
All in all, I’m quite proud of re-linq’s architecture; I think we’ve managed to build a robust piece of framework code with great utility. So, as I said in part I:
Are you planning to write a LINQ provider? Try re-linq – it’s open-source (LGPL) – and it will save you a lot of headaches.