Fabian's Mix

Mixins, .NET, and more

Archive for the ‘re-linq’ Category

Handling the DefaultIfEmpty result operator in a re-linq-based provider


My recent musings about the DefaultIfEmpty LINQ query operator and how it is represented by re-linq had a cause: On the NHibernate mailing list, there is a discussion going on about (among other things) how to implement DefaultIfEmpty in NHibernate’s LINQ provider. Since that LINQ provider is based on re-linq, I thought I’d write a few things about this.

Let’s take a look at an example query similar to the one I used in my last post. I’ve simplified it a bit because I don’t want to talk about group joins in this post. Group joins are difficult and may be a topic for a future post on their own; here, I want to talk about DefaultIfEmpty. So, here’s the simplified query:

var query = from c in Cooks
            from k in Kitchens.DefaultIfEmpty()
            select new { c, k };

We have a query selecting the Cartesian product of the elements in the Cooks and Kitchens sequences, and we use DefaultIfEmpty so that we get tuples for each cook in the first sequence even if there are no kitchens in the second one.

This is the pretty-printed query model of that query:

from Cook c in TestQueryable<Cook>()
from Kitchen k in {TestQueryable<Kitchen>() => DefaultIfEmpty()}
select new <>f__AnonymousType2f`2(c = [c], k = [k])

As in the last post, we can see that the DefaultIfEmpty query operator causes a sub-query to be created, which selects its items from a query source (TestQueryable<Kitchen>()) before applying the operator.

The easiest way to handle the DefaultIfEmpty result operator in this query model is to keep the sub-query structure intact while analyzing the query. To illustrate what I mean, I’m going to show you step-by-step how I’d translate the second from clause to SQL if I were writing a SQL-generating LINQ provider (which, in fact, I am):

Okay, so here I am, I’ve implemented a QueryModelVisitor class which visits the query model’s clauses, one after the other, and generates SQL while doing so. I’ve already visited the main from clause, but no other clause yet, so my generated SQL looks as follows:

SELECT ???
FROM [CookTable] [t0]

Now, I’m in the VisitAdditionalFromClause method of my visitor, inspecting the additional from clause. An additional from clause usually means a Cartesian product, which is called “CROSS JOIN” in SQL:

SELECT ???
FROM [CookTable] [t0]
CROSS JOIN

I apply my ExpressionTreeVisitor to the clause’s FromExpression, and the visitor finds a SubQueryExpression. I keep the sub-query structure for now, and recursively generate SQL for the nested query by invoking a new instance of my QueryModelVisitor on the sub-query’s query model. I visit the sub-query’s main from clause and its select clause:

(SELECT [t1].*
FROM [KitchenTable] [t1])

Then, my visitor’s VisitResultOperator method is invoked, and I detect the DefaultIfEmpty result operator. What now? I need a way to ensure that the query I just generated will always return at least one item; if there is none, I want NULLs – this is the semantics of DefaultIfEmpty.

In SQL, the way to achieve this is to perform a left outer join – but with what? I’m inside my sub-query, so I don’t have a left side to join with. Maybe I shouldn’t have descended into the sub-query so quickly? Maybe I should have flattened it somehow?

Yes, I could have done this – in this specific scenario. However, the important thing to notice about DefaultIfEmpty is that it can occur in the most absurd of places – for example on the source of the outermost main from clause. In such cases, there is no simple way to flatten the subquery. If I want a generic solution, I’ll have to find a way that always achieves the DefaultIfEmpty semantics in SQL. And, luckily, there is one:

(SELECT [q2].*
FROM (SELECT NULL AS [Dummy]) [Dummy]
LEFT OUTER JOIN (SELECT [t1].* FROM [KitchenTable] [t1]) [q2] ON 1 = 1
)

Can you see what I’ve done? I’ve put the sub-query emitted so far into yet another sub-query – and used this as the right side of a left outer join – with a dummy left side. Such are the tricks of those who translate LINQ queries to SQL…

Okay, I’ve now finished translating the sub-query, I can insert the sub-query into the outer query and finish translating the outer query:

SELECT [t0].*, [q3].*
FROM [CookTable] [t0]
CROSS JOIN (SELECT [q2].*
FROM (SELECT NULL AS [Dummy]) [Dummy]
LEFT OUTER JOIN (SELECT [t1].* FROM [KitchenTable] [t1]) [q2] ON 1 = 1
) [q3]

This does not look pretty, no it doesn’t. But you know what? It works. And, most importantly, this way of translating DefaultIfEmpty to SQL always works, no matter where a DefaultIfEmpty stands in the query. The reason why it works is that I was able to keep the sub-query structure re-linq presented to me and find an implementation for DefaultIfEmpty that works locally within that sub-query. This is a convenient effect of the sub-query + result operators representation re-linq chooses for complex queries. If you can find a way to keep the sub-queries and honor the result operators locally, you can literally translate every LINQ query to your target query system.
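To make that idea of honoring the result operator locally a bit more concrete, here is a rough sketch of how such a visitor might react to it. This is illustration only, not actual re-linq or backend code: the SQL-building helper is a made-up name, and ResultOperatorBase, DefaultIfEmptyResultOperator, and the VisitResultOperator signature are the re-linq names as I remember them.

public override void VisitResultOperator (
    ResultOperatorBase resultOperator, QueryModel queryModel, int index)
{
  // Hypothetical sketch: _sqlBuilder and WrapInLeftOuterJoinOverDummy are invented names.
  if (resultOperator is DefaultIfEmptyResultOperator)
  {
    // Wrap the SQL generated for this (sub-)query model so far into the
    // "dummy LEFT OUTER JOIN" construct shown above; a purely local transformation.
    _sqlBuilder.WrapInLeftOuterJoinOverDummy ();
    return;
  }

  base.VisitResultOperator (resultOperator, queryModel, index);
}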

But what about optimization?

There is definitely potential for optimization here: Whenever I’m inserting sub-queries into their outer queries, I can apply a sub-query optimizer. The optimizer checks whether the sub-query contains a TOP, COUNT, GROUP BY, or similar construct, and, if it doesn’t, realizes that the sub-query’s content can be inlined. It first does so at the innermost sub-query, the one within the join:

(SELECT [t1].*
FROM (SELECT NULL AS [Dummy]) [Dummy]
LEFT OUTER JOIN [KitchenTable] [t1] ON 1 = 1
)

Then, it does the same at the next level:

SELECT [t0].*, [t1].*
FROM [CookTable] [t0]
CROSS JOIN (SELECT NULL AS [Dummy]) [Dummy]
LEFT OUTER JOIN [KitchenTable] [t1] ON 1 = 1

If I also implement the dummy select in such a way that it only appears if the LEFT OUTER JOIN would otherwise stand as the first table in the generated SQL, I’ve got a good final SQL query from my translation:

SELECT [t0].*, [t1].*
FROM [CookTable] [t0]
LEFT OUTER JOIN [KitchenTable] [t1] ON 1 = 1

Note that the CROSS JOIN has just become a LEFT OUTER JOIN. Since I’m doing all this while still in the VisitAdditionalFromClause method, this shouldn’t be too hard. There is a bit of work to do to get the SELECT clause to reference the right sub-query (q2, not q3), but that’s totally doable as well.
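For illustration, the optimizer’s check whether a sub-query can be inlined might boil down to something like the following sketch. SqlSubStatement and its properties are made-up names for this post, not the actual backend types.

private static bool CanInlineSubStatement (SqlSubStatement subStatement)
{
  // A sub-query may only be flattened into its parent if it contains no construct
  // whose semantics would change by the merge (TOP, DISTINCT, aggregates, GROUP BY, ...).
  return subStatement.TopExpression == null
      && !subStatement.IsDistinct
      && !subStatement.HasAggregation
      && !subStatement.HasGroupBy;
}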

While working on the re-linq SQL backend (which supports DefaultIfEmpty in exactly the way outlined above, but at the time of this writing still lacks the sub-query optimizer – that one is planned), I’ve learned one thing: when dealing with complex LINQ queries, it’s really important to solve transformations as locally as possible. I always try to perform my transformations just in the scope of the current query model; consolidation of the sub-query with the outer query can be done when the visitors return to that outer query.

Any other way lies madness. In my opinion.

Written by Fabian

September 4th, 2010 at 9:40 pm

Posted in re-linq

How re-linq represents the DefaultIfEmpty query operator


In my last post, I explained what the DefaultIfEmpty query operator actually does, and why anybody should use it. In this post, I’ll analyze how re-linq’s front-end represents that query operator in its query model.

First, let’s look at one of the queries from the last post:

var query1 = from c in Cooks
             join a in Cooks on c equals a.Assisted into assistants
             from a in assistants.DefaultIfEmpty()
             select new { c, a };

Console.WriteLine (new QueryParser ().GetParsedQuery (query1.Expression));

This code causes the LINQ Expression tree built for the query to be analyzed by the re-linq front-end, and the ToString representation to be written to the console. After adding newlines and indentation, it looks like this:

from Cook c in TestQueryable<Cook>()

join Cook a in TestQueryable<Cook>()
  on [c] equals [a].Assisted
  into IEnumerable`1 assistants

from Cook a in {[assistants] => DefaultIfEmpty()}
select new <>f__AnonymousType30`2(c = [c], a = [a])

Looks astoundingly similar to the C# query, doesn’t it? But what’s that “{[assistants] => DefaultIfEmpty()}” part?

Before explaining this notation, I should probably say a few more words about the philosophy behind re-linq’s query model. It’s not a coincidence that the query model’s ToString output looks similar to the C# query above. The main goal behind the re-linq front-end was to simplify the LINQ query and represent it in a form that would be easy to understand for programmers building LINQ providers.

As mentioned above, the front-end constructs this from the Expression tree produced for the query, which looks like this (this is the ToString representation of the Expression tree):

TestQueryable<Cook>()
  .GroupJoin(
    TestQueryable<Cook>(),
    c => c,
    a => a.Assisted,
    (c, assistants) => new <>f__AnonymousType2f`2(
      c = c,
      assistants = assistants))
  .SelectMany(
    <>h__TransparentIdentifierf =>
      <>h__TransparentIdentifierf.assistants.DefaultIfEmpty(),
    (<>h__TransparentIdentifierf, a) =>
      new <>f__AnonymousType30`2(
        c = <>h__TransparentIdentifierf.c,
        a = a))

It then creates a number of clauses – the main from clause, additional from clauses, where clauses, a select clause, etc – and puts them together into a query model object. This is very similar to how a C# programmer would think of the query, and very different from what the Expression tree looks like.
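For example, the clauses of the query model parsed above can be inspected roughly like this (a sketch; the member names follow the re-linq front-end API, but take them as illustration rather than as a reference):

var queryModel = new QueryParser ().GetParsedQuery (query1.Expression);

Console.WriteLine (queryModel.MainFromClause.ItemName);  // "c"
foreach (var bodyClause in queryModel.BodyClauses)       // the join clause and the second from clause
  Console.WriteLine (bodyClause);
Console.WriteLine (queryModel.SelectClause.Selector);    // the anonymous type projection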

The problem we had, however, is that a simple model can only represent a subset of what’s possible with LINQ. For example, in re-linq’s query model – just like in a C# query expression – you can only have one select clause per query. In the expression tree, however, there can be more than one call to the Select method. We didn’t want to introduce the complexity of having multiple select clauses in the query model, but we couldn’t ignore that such things were possible (and even easy) to do with pure LINQ.

When we devised the query model, we therefore used a famous trick: we made the data structure recursive. The query model was kept very simple, but we added the feature of sub-query models. In other words, when the model could not represent a certain query, for example because it had two Select method calls, we put part of the query into a sub-query, which was again simple and well-defined, and nested that sub-query inside an outer query, again simple and well-defined.

In re-linq, sub-queries can stand, among other things, for source sequences, e.g., in a from clause. The curly braces in the “{[assistants] => DefaultIfEmpty()}” part of the query model above represent such a sub-query. It’s shorthand for “(from x in [assistants] select x) => DefaultIfEmpty()”. The square brackets around assistants indicate a reference to items coming from another query source.

The arrow represents a second trick we used: a result operator. Result operators are used by re-linq to embed information about query operators for which we don’t have clauses. For example, Count() is a result operator, as is Distinct(), or DefaultIfEmpty(). Result operators always stand at the end of a query model – they operate on the results of the query. If they can’t stand at the end because of the semantics of the query, a sub-query is built around them; then they can stand at the end again.

That explained, let’s take a look at the full query model again:

from Cook c in TestQueryable<Cook>()

join Cook a in TestQueryable<Cook>()
  on [c] equals [a].Assisted
  into IEnumerable`1 assistants

from Cook a in {[assistants] => DefaultIfEmpty()}
select new <>f__AnonymousType30`2(c = [c], a = [a])

Knowing what we do now, we can see that the second from clause selects its Cook items (named “a”) from a sub-query that gets its items directly from the elements of the assistants query source and has a DefaultIfEmpty result operator attached to its end.
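Continuing the sketch from above, a provider could get at that sub-query and its result operator along these lines (again illustration; type names such as AdditionalFromClause, SubQueryExpression, and DefaultIfEmptyResultOperator follow the re-linq API, but treat the exact shapes as assumptions):

// The join clause is queryModel.BodyClauses[0], the second from clause is BodyClauses[1].
var additionalFromClause = (AdditionalFromClause) queryModel.BodyClauses[1];
var subQueryExpression = (SubQueryExpression) additionalFromClause.FromExpression;

bool hasDefaultIfEmpty = subQueryExpression.QueryModel.ResultOperators
    .OfType<DefaultIfEmptyResultOperator> ()
    .Any ();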

Now, the million-dollar question: How do I handle such a sub-query with a DefaultIfEmpty result operator in my re-linq-based LINQ provider?

That will be answered in my next post.

Written by Fabian

September 4th, 2010 at 8:35 pm

Posted in re-linq

The DefaultIfEmpty LINQ query operator


The DefaultIfEmpty LINQ query operator is a strange beast. According to its documentation, it:

Returns the elements of the specified sequence or the type parameter’s default value in a singleton collection if the sequence is empty.

What? I thought DefaultIfEmpty had something to do with joins?

Let’s dig into this; first, let’s analyze what the documentation says. When applied to a non-empty sequence, such as {1, 2, 3}, DefaultIfEmpty will return an equivalent sequence: {1, 2, 3}. When applied to an empty sequence, {}, it will return a singleton sequence containing the default value of the sequence’s element type: {0} (assuming the original sequence was an empty sequence of integers).
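A quick sanity check with LINQ to Objects shows exactly that behavior:

var nonEmpty = new[] { 1, 2, 3 }.DefaultIfEmpty ();   // yields 1, 2, 3
var empty = new int[0].DefaultIfEmpty ();             // yields a single 0
var explicitDefault = new int[0].DefaultIfEmpty (-1); // overload with an explicit default: yields -1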

But what would anyone use that for?

When you think about this question, it turns out that DefaultIfEmpty is quite useful when sequences are combined; for instance, with additional from clauses. Consider the following example:

var query = from c in Cooks
            from a in c.Assistants
            select new { c, a };

The second from clause in this query (this is really a SelectMany query operator, when C#’s syntactic sugar is removed) causes a Cartesian product to be formed from the elements of the Cooks sequence and the elements of the sequences returned by the c.Assistants expression, which is evaluated once for each c.
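With the syntactic sugar removed, the query above is equivalent to this method-syntax form:

var query = Cooks.SelectMany (c => c.Assistants, (c, a) => new { c, a });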

Consider, for example, the following input data:

Cooks = { c1, c2 }
c1.Assistants = { a1, a2 }
c2.Assistants = { a3, a4 }

With this as input, the result of the query will be as follows:

{ c1, a1 }, { c1, a2 }, { c2, a3 }, { c2, a4 }

Consider, now, one of the Assistants sequences to be empty:

Cooks = { c1, c2 }
c1.Assistants = { a1, a2 }
c2.Assistants = { }

In this case, the result will be:

{ c1, a1 }, { c1, a2 }

As you can see, the result contains no entries for c2 – the Cartesian product discards items from the left sequence for which there is no item in the right sequence. Here is where DefaultIfEmpty comes in handy. We can rewrite the query as follows:

var query = from c in Cooks
            from a in c.Assistants.DefaultIfEmpty()
            select new { c, a };

Now, the DefaultIfEmpty operator will guarantee that the right sequence always contains at least one element, and therefore, each of the elements of the Cooks collection is contained in the result set at least once:

{ c1, a1 }, { c1, a2 }, { c2, null }

If you know your SQL, this will indeed remind you strongly of the concept of left outer joins, and in fact, the MSDN documentation for DefaultIfEmpty does hint at this in its Remarks section:

This method can be used to produce a left outer join when it is combined with the GroupJoin) [sic] method.

It’s true, DefaultIfEmpty can be combined with a GroupJoin, to form something very similar to a SQL left outer join:

var query = from c in Cooks
            join a in Cooks on c equals a.Assisted into assistants
            from a in assistants.DefaultIfEmpty()
            select new { c, a };

But as shown above, DefaultIfEmpty can also be very useful when applied to sequences other than a GroupJoin’s result.

And, of course, it can also be used in contexts outside of joins or Cartesian products:

var query = Cooks.DefaultIfEmpty ();

// This should really be done using the Any() operator
var query = from c in Cooks
            where c.Assistants.DefaultIfEmpty ().First() != null
            select c;

These situations don’t really match the concept of a left outer join (at first sight), but they’re still perfectly legal usages of DefaultIfEmpty. All things considered, the MSDN authors probably got it right with their summary. Generally speaking, that query operator simply:

Returns the elements of the specified sequence or the type parameter’s default value in a singleton collection if the sequence is empty.

Exactly.

Written by Fabian

September 4th, 2010 at 7:43 pm

Posted in re-linq

Making it hard to implement language-compatible LINQ providers


There are two things compiler writers can do in order to give us LINQ provider writers significant headaches:

  • Firstly, they can make certain kinds of expressions much more common than other languages do. For example, they can use overloads of query operators or produce expression combinations that are otherwise rarely used.
  • Secondly, they can embed completely custom operators in their expression trees, masked as MethodCallExpressions to some compiler service methods.

The first issue is annoying because missing features that aren’t a problem for other languages suddenly get really important. Implementing a LINQ provider is a LOT of work, so there will always have to be some feature prioritization. One of my favorite quotes on this is by Frans Bouma, author of LLBLGen Pro’s LINQ provider:

Writing a Linq provider is a lot of work which requires a lot of code. If you’re dealing with a Linq provider which is just, say, 32KB in size, you can be sure it will not support the majority of situations you will run into. However, the O/R mapper developer likely simply said ‘We have Linq support’, and it’s even likely the provider can handle the more basic examples of a single entity type fetch with a Where, an Order By or even a Group By. But in real life, once you as a developer have tasted the joy of writing compact, powerful queries using Linq, you will write queries with much more complexity than these Linq 101 examples. Will the Linq ‘provider’ you chose be able to handle these as well? In other words: is it a full Linq provider or, as some would say, a ‘toy’ ?

Usually, there is a time limit when implementing a LINQ provider, so you have to decide what features to implement, what query methods to support – and what query methods not to support. You’d usually implement the most important ones first, and if you decide one query method is not that important because you don’t know another language requires it, that might be a problem.

However, it could be argued that these features also provide benefits for users of the other languages. For example, while the C# compiler won’t specify a result selector in the GroupBy calls it generates, C# users can of course manually call the respective method overloads. So, implementing support for that overload of GroupBy is not a language-specific feature, it’s just the priority that differs between the languages.
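For instance, the GroupBy overload with a result selector is just as callable from C#; whether a provider supports it is a matter of priorities, not of language. (The Kitchen property on Cook is assumed here purely for illustration.)

var cooksPerKitchen = Cooks.GroupBy (
    c => c.Kitchen,                                                          // key selector
    (kitchen, cooks) => new { Kitchen = kitchen, Count = cooks.Count () });  // result selector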

The second issue, on the other hand, the one of embedding custom operators in the expression tree, is the real problem of LINQ provider compatibility. You need to support these operators only for the sake of one programming language. You won’t even know these operators exist unless you perform extensive testing in that language. The programming language might just as well construct expression trees with self-defined nodes instead of using the standard ones – the effect would be similar: there’s no way the LINQ provider could handle them unless it was built with explicit support for that language.

That’s a problem. Especially since VisualBasic.NET, of course, does both of these things.

VB.NET does make use of expressions uncommon in, say, a C# program; for example, it uses the GroupBy overloads taking a result selector, it generates additional, unnecessary Select (x => x) calls, and so on. These things you thought no programmer would commonly use in a LINQ query – the VB.NET compiler does use them, all too often.

And VB.NET does define its own operators. Expression trees coming from VB.NET may contain calls to the Microsoft.VisualBasic.CompilerServices.LikeOperator’s methods because VB does have a LIKE operator – which can’t be represented using standard expression types. And even string comparisons, which are usually represented as BinaryExpressions, cause special operators to be used.

Of course, there are good arguments as to why VB needs those operators: there is no standard expression type for LIKE, and VB has some specific string comparison semantics. Still, I wonder if the LINQ teams at Microsoft shouldn’t have thought about these problems a little harder, a little longer before releasing VB.NET’s LINQ support as is.

At the very least, some documentation of these VB-specifics would have saved a few headaches. Definitely.

Written by Fabian

July 16th, 2010 at 12:51 pm

Posted in re-linq

Support for VB-specific string comparisons in re-linq


In a previous post of mine, I’ve discussed the VB.NET compiler’s characteristics of emitting very special expressions when a string comparison is performed in a LINQ query. Instead of emitting a simple BinaryExpression, as C# would, VB.NET emits a MethodCallExpression to the Microsoft.VisualBasic.CompilerServices.Operators.CompareString method. The reason for this is that VB.NET has some special features regarding string comparison – and the standard BinaryExpressions cannot represent those features.

Many LINQ providers are written in C# and therefore never learn of that specialty, until one day a VB.NET user submits a bug report about string comparisons leading to an exception in the LINQ provider. That’s why I thought it sensible to provide support for this VB.NET quirk directly in re-linq’s front-end.

It has taken some time – there was a lot of other work to do –, but starting with re-linq 1.13.65, there is now built-in support for VB.NET string comparisons.

(Unfortunately, the release notes of that version describe the feature in a different form than was implemented. The release notes for version 1.13.68 will contain an updated version of the feature description that is consistent with my explanations below.)

There were a few different options about how to actually handle this issue because of two conflicting goals:

  • make it as simple for the standard LINQ provider – one that simply needs to know that two strings were compared, without any VB-specific quirks – as possible, and
  • provide all necessary information to those LINQ providers that want to simulate VB’s string comparison semantics.

The first goal’s trivial implementation would be to have re-linq detect the MethodCallExpressions and replace them with standard BinaryExpressions [1]. That way, the LINQ provider would never even notice that VB.NET produces different expression trees; VB.NET’s comparisons would be handled just like C#’s.
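A sketch of that trivial replacement might look like this (not re-linq’s actual implementation; as far as I know, the VB compiler encodes a simple string equality as “CompareString(left, right, textCompare) == 0”, and that’s the pattern being matched here):

private static Expression SimplifyVBStringComparison (BinaryExpression expression)
{
  var compareStringCall = expression.Left as MethodCallExpression;
  var zero = expression.Right as ConstantExpression;

  if (expression.NodeType == ExpressionType.Equal
      && compareStringCall != null
      && zero != null && Equals (zero.Value, 0)
      && compareStringCall.Method.Name == "CompareString"
      && compareStringCall.Method.DeclaringType.FullName
          == "Microsoft.VisualBasic.CompilerServices.Operators")
  {
    // Compare the two strings directly. Note that the textCompare flag
    // (compareStringCall.Arguments[2]) is silently dropped, which is exactly the problem
    // discussed below.
    return Expression.Equal (
        compareStringCall.Arguments[0], compareStringCall.Arguments[1]);
  }

  return expression;
}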

On the other hand, this couldn’t possibly fulfill the second goal. Once the MethodCallExpression has been replaced, vital information (the textCompare parameter to the CompareString method, for example) is lost. To fulfill the second goal, the most sensible thing would therefore be to leave the MethodCallExpression in the expression tree – or to replace it with a specific custom expression, let’s call it VBStringComparisonExpression, which would hold all the required metadata about the comparison. Then, the LINQ provider could recognize this expression and deal with it the VB way. On the downside, this would require every LINQ provider to deal with those VBStringComparisonExpressions – and we’d be back to square one.

So, the question was, how to reach both goals at once?

One possibility would have been to make it configurable. I didn’t like this option for a number of reasons. First, as a library user, I like APIs that just work, without my having to configure anything, read through documentation, and so on. Second, re-linq is not currently designed to be configurable at this part of the front-end, and while it would have been possible to add this feature, it seemed strange to add a configuration infrastructure just for this one option of turning automatic VB support on and off. Third, I knew there had to be a better way.

And find one I did: the .NET BCL 4.0 made some changes to the LINQ Expression classes, which were required because the Dynamic Language Runtime uses the same classes to represent its abstract syntax trees. There were a few features I liked so much that I re-implemented them in re-linq (which also has to run on .NET 3.5), in the ExtensionExpression base class. That class contains some boilerplate code for the Visitor pattern, and it also has a feature for reducing expressions. If an extension expression returns true from its CanReduce property, its Reduce method can be used to transform the expression into a simplified version with (more or less) the same semantics. And this is the feature I used to implement the VBStringComparisonExpression.

Here are the facts:

  • When parsing an expression tree into a QueryModel, re-linq’s frontend will now automatically detect calls to VB.NET’s CompareString operator.
  • It will simplify that comparison by using a BinaryExpression [1]. The BinaryExpression will be wrapped into a VBStringComparisonExpression, which also contains the textCompare flag defining how VB.NET would compare the string.
  • VBStringComparisonExpression is implemented so that its VisitChildren method will automatically visit the simplified BinaryExpression. This means that ordinary expression visitors (derived from the ExpressionTreeVisitor base class) can simply choose to ignore the VBStringComparisonExpressions. The simplified BinaryExpressions will automatically be visited.
  • VBStringComparisonExpression.Reduce is implemented so that it returns the simplified BinaryExpression. Visitors that are defined to throw an exception on any expression whose type they don’t know should try to reduce such expressions before giving up. That way, they can deal with complex, unknown expressions as long as they reduce to something simpler. ThrowingExpressionTreeVisitor, the base class that implements this behavior, does this automatically. This means that throwing expression visitors (derived from the ThrowingExpressionTreeVisitor base class) can also ignore the VBStringComparisonExpressions. The BinaryExpression will automatically be used instead.
  • If a visitor wants to explicitly deal with VBStringComparisonExpressions, it can implement the IVBSpecificExpressionVisitor. (Of course, such a visitor has to visit the expression before any ThrowingExpressionTreeVisitor reduces the nodes. Ordinary ExpressionTreeVisitors won’t reduce, so they are safe to be used at any time.)

And here is a picture:

[Diagram: VBStringComparisonExpression]

The concept works quite well; I’ve tested it with re-linq’s own SQL backend – which does not handle VB comparisons in a special way. So far, the backend works nicely with VB expressions without explicitly dealing with them in any way.

It seems as if it was indeed possible to have the cake, and eat it too.


[1] This is actually a simplification. VB.NET’s CompareString is also produced for the <, <=, >, and >= operators. BinaryExpression can’t represent those operators for strings, so the “standard” expressions for them involve a MethodCallExpression to String.CompareTo.

Written by Fabian

July 15th, 2010 at 6:03 pm

Posted in re-linq

Building tuple expressions


In my previous post I explained how the re-linq SQL backend moves OrderBy clauses from sub-queries to the outermost surrounding query in order to be compatible with SQL sub-query restrictions. Because the outer query might not have access to the properties required for the ordering, it is necessary to replace the subquery’s projection with a tuple containing both the original projection and all the ordering expressions:

from o in Orders
from product in (from oi in o.OrderItems
                 orderby oi.Quantity descending
                 select oi.Product)
select new { Order = o, Product = product }

becomes

from o in Orders
from product in (
  from oi in o.OrderItems
  select new KeyValuePair<string, int> (oi.Product, oi.Quantity))
orderby product.Value descending
select new { Order = o, Product = product.Key }

The SQL backend uses .NET’s predefined KeyValuePair type to construct tuples, but this presents a problem: what to do if there is more than one OrderBy expression in the sub-query? KeyValuePair can only store two values – the original projection and one OrderBy expression. So what if we have to store more than that?

[Side note: Of course, we could just implement our own Triple with three elements. And Quadruple, Quintuple, Sextuple with four, five, and six items. But we can’t build a tuple type that can take an arbitrary number of items. We could use an array, but the re-linq backend does not currently support analyzing array index operations.]

One elegant solution is to use KeyValuePair as a recursive data type. If we need to store more than two values, we’ll put a second KeyValuePair into the first one’s Value property – and thus gain one more place to store something:

Values to be stored    Expression
1                      1
1,2                    new KVP(1,2)
1,2,3                  new KVP(1,new KVP(2,3))
1,2,3,4                new KVP(1,new KVP(2,new KVP(3,4)))

As you can see, this is quite flexible. It is, by the way, how many functional and logic-oriented programming languages construct lists.

Since building such tuple expressions can be handy in different situations while building a LINQ provider, re-linq now contains a helper class, TupleExpressionBuilder, which helps with building such tuples and accessing the members held by a tuple.

Here is the code:

public class TupleExpressionBuilder
{
  public static Expression AggregateExpressionsIntoTuple (
      IEnumerable<Expression> expressions)
  {
    ArgumentUtility.CheckNotNull ("expressions", expressions);

    return expressions
        .Reverse ()
        .Aggregate ((current, expression) => CreateTupleExpression (
            expression, current));
  }

  public static IEnumerable<Expression> GetExpressionsFromTuple (
      Expression tupleExpression)
  {
    ArgumentUtility.CheckNotNull ("tupleExpression", tupleExpression);

    while (tupleExpression.Type.IsGenericType
        && tupleExpression.Type.GetGenericTypeDefinition()
            == typeof (KeyValuePair<,>))
    {
      yield return Expression.MakeMemberAccess (
          tupleExpression,
          tupleExpression.Type.GetProperty ("Key"));

      tupleExpression = Expression.MakeMemberAccess (
          tupleExpression,
          tupleExpression.Type.GetProperty ("Value"));
    }

    yield return tupleExpression;
  }

  private static Expression CreateTupleExpression (
      Expression left,
      Expression right)
  {
    var tupleType =
        typeof (KeyValuePair<,>).MakeGenericType (left.Type, right.Type);

    var newTupleExpression =
        Expression.New (
            tupleType.GetConstructor (new[] { left.Type, right.Type }),
            new[] { left, right },
            new[] {
                tupleType.GetMethod ("get_Key"),
                tupleType.GetMethod ("get_Value") });

    return newTupleExpression;
  }
}
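To make the recursive structure tangible, here is a small usage sketch (the constants are just placeholder values):

var expressions = new Expression[]
{
    Expression.Constant ("Chair"), // the original projection
    Expression.Constant (7),       // first ordering expression
    Expression.Constant (3.5)      // second ordering expression
};

var tuple = TupleExpressionBuilder.AggregateExpressionsIntoTuple (expressions);
// tuple: new KeyValuePair<string, KeyValuePair<int, double>> ("Chair", new KeyValuePair<int, double> (7, 3.5))

var members = TupleExpressionBuilder.GetExpressionsFromTuple (tuple).ToList ();
// members: accessors equivalent to tuple.Key, tuple.Value.Key, and tuple.Value.Value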

Is anybody else feeling the LISP-ness of those tuples? I’ll just say car and cdr…
(Although, actually, I prefer the ./2 predicate.)

Exercise for the reader: Get rid of the while loop in GetExpressionsFromTuple. It would be so much easier if we had a NIL element at the end of the list…

Written by Fabian

July 2nd, 2010 at 8:24 pm

Posted in re-linq

Translating sub-queries with OrderBy to SQL


Recently, Andreas and I have been working hard on a new SQL backend for re-linq. We’re creating a set of visitors and transformers that translate the QueryModels provided by the re-linq frontend for any given LINQ query into a SQL statement. This post is about one particular issue we’ve recently solved: OrderBy in sub-queries.

Consider a user writing the following LINQ query:

from o in Orders
from product in
  o.OrderItems.OrderByDescending (oi => oi.Quantity).Select (oi => oi.Product)
select new { Order = o, Product = product }

This effectively returns a flattened list of Order/Product combinations, where products are sorted descending by quantity per Order. The second from clause contains a sub-query; it could also be written like this:

from o in Orders
from product in (from oi in o.OrderItems
                 orderby oi.Quantity descending
                 select oi.Product)
select new { Order = o, Product = product }

A feature that we’ve recently implemented in the backend is automatic OrderBy extraction. You see, a straightforward SQL translation of the queries above might be the following:

SELECT [t0].*, [q1].[value]
FROM [Order] [t0]
CROSS APPLY (
  SELECT [t2].[Product]
  FROM [OrderItem] [t2]
  WHERE [t2].[OrderID] = [t0].[ID]
  ORDER BY [t2].[Quantity] DESC) [q1]

However, SQL Server (starting from 2005) will not accept this query; it responds with an error: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

The source of this error is an impedance mismatch between LINQ and SQL queries. In LINQ, the order of items returned by a subquery will be retained when they are combined with the items from the outer query; in the query above, the Order/Product tuples are sorted by quantity within each Order. In SQL, the order of items returned by a subquery is not guaranteed to have any effect on the surrounding query. That’s why it is forbidden to include an ORDER BY clause in the sub-query unless you have a sub-query where it actually makes a difference. Oh, you can force the database to accept your ORDER BY clause by including a TOP 100 Percent clause, and it will most likely achieve the result you want – unless an optimization kicks in that causes reordering of the sub-query result set. The point is that SQL does not guarantee any ordering of items coming from a sub-query.

So, what could we do? Of course, we could have thrown an exception (“Cannot translate sub-queries with OrderBy clauses to SQL because SQL does not allow sub-query results to be ordered.”) – after all, the user could rewrite the query. However, we didn’t really want to place the burden of rewriting on the user, especially since there were some situations where our SQL backend would itself move parts of the query to sub-queries – and the exception wouldn’t be very helpful in those cases at all.

So, we decided to find a better way: we implemented an automatic transformation that moves the OrderBy clauses from sub-queries to the outer query in the SQL backend.

Here is what the backend transforms queries such as the ones above to:

from o in Orders
from product in (
  from oi in o.OrderItems
  select new KeyValuePair<string, int> (oi.Product, oi.Quantity))
orderby product.Value descending
select new { Order = o, Product = product.Key }

We remove the OrderBy clause from the sub-query and replace the inner projection with a tuple containing both the original projection and the OrderBy expressions. Then we perform the ordering in the outer query and replace all usages of the range variable (product) with a member access to the original projection (product.Key).

Here’s the corresponding SQL:

SELECT [t0].*, [q1].[Key]
FROM [Order] [t0]
CROSS JOIN (
  SELECT [t2].[Product] AS [Key], [t2].[Quantity] AS [Value]
  FROM [OrderItem] [t2]
  WHERE [t2].[OrderID] = [t0].[ID]) [q1]
ORDER BY [q1].[Value] DESC

With that solution, the backend is able to support arbitrary OrderBy clauses in sub-queries, be they added by the user or because re-linq decided to move part of a statement to a sub-query. Another impedance mismatch solved – but certainly not the only one.

Written by Fabian

July 2nd, 2010 at 7:31 pm

Posted in re-linq

re-linq now with partial trust support


I’m happy to announce that re-linq is going to support partial trust environments out of the box starting from build 1.13.63, which is due at the end of this week. Once it’s ready, you can get it here: http://www.re-motion.org/builds/. The SVN trunk already includes the partial trust feature.

re-linq can also be used from ASP.NET medium trust, but some LINQ query features (such as anonymous types created by the C# compiler or access to local variables from a query) require reflection permissions not available in default medium trust. Note that the C# compiler often automatically uses anonymous types when you write a LINQ query with multiple from clauses or joins.

In order to be able to parse such queries using re-linq (or any other LINQ provider, I assume), the ReflectionPermission must explicitly be granted with the MemberAccess permission flag in addition to the default ASP.NET medium trust permissions.

Copied from our mailing list. Please post any questions about this feature to the list.

Actually, there wasn’t much work to do to enable partial trust support within re-linq; it was only a matter of applying the AllowPartiallyTrustedCallersAttribute to the relevant assemblies. However, before we published this feature, we wanted to be able to run our integration tests within a partially trusted sandbox in order to ensure that everything really worked. Andreas Obkircher took care of this: he implemented Sandbox utility classes and even had to patch NUnit, which otherwise wouldn’t support partial trust in the version we are using.

So, I just want to say: Good job, Andreas!

Written by Fabian

June 10th, 2010 at 9:56 am

Posted in re-linq

How a Visitor implementation can handle unknown nodes


Two weeks ago, I explained how we use interfaces and dynamic type casts to overcome the Visitor pattern’s problems with extensible class hierarchies in re-linq. In a nutshell, we have a visitor base class, ExpressionTreeVisitor, with Visit methods for all standard expression node types. For extension nodes, we define specific visitor interfaces, and we test whether a specific visitor instance supports the interface using a runtime type check. This pattern is known as the Acyclic Visitor pattern (to some, at least…).

Here’s an example; the Accept method of SqlCaseExpression (an extension node type):

public override Expression Accept (ExpressionTreeVisitor visitor)
{
  ArgumentUtility.CheckNotNull ("visitor", visitor);

  var specificVisitor = visitor as ISqlSpecificExpressionVisitor;
  if (specificVisitor != null)
    return specificVisitor.VisitSqlCaseExpression (this);
  else
    return base.Accept (visitor);
}

The base class’ Accept method, which is called if the visitor instance does not support the extension node, is implemented as follows:

public virtual Expression Accept (ExpressionTreeVisitor visitor)
{
  ArgumentUtility.CheckNotNull ("visitor", visitor);

  return visitor.VisitUnknownExpression (this);
}

Last time I said that there were three possibilities for how the VisitUnknownExpression method could be implemented in a specific visitor: it could ignore the unknown expression, it could throw an exception, or it could do … something else. But what could that third option be?

The answer sounds simple: Handle the unknown node in a generic way.

Sometimes, that’s easy. For example, re-linq has a visitor called FormattingExpressionVisitor, which is used to generate a string representation of expression trees. (We need that because the implementation of Expression.ToString acts just plain stupid when confronted with a non-standard expression being a child of a standard one.) That visitor  handles unknown nodes in a generic way – it just calls ToString on them.
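In such a visitor, the generic handling boils down to something like this (a sketch rather than the actual source; _stringBuilder is an assumed field):

protected internal override Expression VisitUnknownExpression (Expression expression)
{
  // Generic handling: append the node's own string representation and leave it unchanged.
  _stringBuilder.Append (expression.ToString ());
  return expression;
}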

Sometimes, it’s harder. While some expression visitors, such as a SQL-generating visitor or the FormattingExpressionTreeVisitor, are used to transform expression trees into a completely different representation, the much more common scenario is that specialized visitors search for certain node patterns. They then either return what they’ve found or transform the nodes, replacing them with other structures.

For example, consider a visitor that looks for ConditionalExpressions (the equivalent of C#’s conditional operator ?: in expression trees) and transforms those into SqlCaseExpressions:

protected override Expression VisitConditionalExpression (
    ConditionalExpression expression)
{
  ArgumentUtility.CheckNotNull ("expression", expression);

  return new SqlCaseExpression (
    VisitExpression (expression.Test),
    VisitExpression (expression.IfTrue),
    VisitExpression (expression.IfFalse));
}

How should such a pattern matching visitor handle unknown expressions in a generic way? After all, a ConditionalExpression might be located beneath any unknown expression!

[Diagram: an unknown expression containing a conditional expression]

Ideally, a pattern matching visitor should be able to ignore all unknown expressions and at the same time analyze the expressions’ child nodes. And, voilà, this is exactly what the default implementation of ExpressionTreeVisitor.VisitUnknownExpression does:

protected internal virtual Expression VisitUnknownExpression (
    Expression expression)
{
  var extensionExpression = expression as ExtensionExpression;
  if (extensionExpression != null)
    return extensionExpression.VisitChildren (this);
  else
  {
    var message = string.Format (
        "Expression type {0} is not supported by this {1}.",
        expression.NodeType,
        GetType().Name);
    throw new NotSupportedException (message);
  }
}

Every extension expression has a VisitChildren method used by visitors that need to visit a specific node’s children even when they don’t know the node’s type. Here’s SqlCaseExpression’s implementation of VisitChildren:

protected override Expression VisitChildren (ExpressionTreeVisitor visitor)
{
  ArgumentUtility.CheckNotNull ("visitor", visitor);

  var testPredicate = visitor.VisitExpression (TestPredicate);
  var thenValue = visitor.VisitExpression (ThenValue);
  var elseValue = visitor.VisitExpression (ElseValue);

  if (testPredicate != TestPredicate
      || thenValue != ThenValue
      || elseValue != ElseValue)
    return new SqlCaseExpression (testPredicate, thenValue, elseValue);
  else
    return this;
}

This clever design was not my own idea; it originates from the Dynamic Language Runtime’s abstract syntax tree implementation, and it was later included in .NET 4.0’s Expression class. The method is called VisitExtension there; you can read my short summary of that feature in my post about .NET 4.0’s extension expressions. As promised in that post, we copied the design for re-linq and .NET 3.5.

So, this post concludes my little series about the visitor pattern. To summarize,

“And yet, in this instance, a function by any other name is still as virtual,” she smiled, and glided away. Her words drifted off with her as she passed out of sight: “And a Visitor pattern by any other name is still as useful, and is more aptly described. But that is the way of things….”

Conversations: By Any Other Name, Jim Hyslop and Herb Sutter, Dr. Dobb’s

Written by Fabian

May 24th, 2010 at 7:54 pm

Posted in patterns, re-linq

The second problem of Visitors – and a solution


In my last post, I explained that one big weakness of the Visitor design pattern is that it can only be used when supported by the visited classes. In LINQ, I explained, we have a good scenario for the Visitor pattern because we have to define a lot of algorithms, the specifics of which depend on the node types present in an expression tree. Since, however, the LINQ expression class designers didn’t provide the necessary hooks, we can’t make use of the pattern (and are left with a bad simulation of Visitor that is built on runtime type checks).

I promised that in the course of this next post I’d talk about the second big weakness of the Visitor design pattern, but before I do this, I’d like to reiterate why Visitor is a Good Thing, despite its weaknesses. I’ll quote the good consequences of using Visitor from Design Patterns by Gamma, Helm, Johnson, and Vlissides, the famous Gang of Four (the explanations are my own):

  • Visitor makes adding new operations easy. It defines a framework that, when added to a class, allows easy addition of new operations. Classic case of Open/Closed.
  • A visitor gathers related operations and separates unrelated ones. It provides clean separation of concerns and avoids unrelated operations being entangled in a single class.
  • Visiting across class hierarchies. It allows processing of object graphs in which not all objects have a single base class.
  • Accumulating state. The visitor implementation can gather state while operating on the objects in the graph being visited.

Those are good consequences of using the pattern. But Design Patterns also lists that second big disadvantage I want to talk about: Adding new ConcreteElement classes is hard. What does that mean?

To explain, remember that Visitor requires an interface (or base class) that defines the contract of all visitor implementations:

interface IFruitVisitor
{
  void VisitApple (Apple apple);
  void VisitOrange (Orange orange);
}

This interface is required for the double dispatch mechanism Visitor is based on: the visited object decides what visitor method to call based on its own type.

So, the visitor interface contains one method per visitable class; in our example, those would be Apple and Orange. Now consider what happens if we add a new class, Boysenberry. To be able to write algorithms visiting instances of this new type, we’d need to add a VisitBoysenberry method to the interface. And thus to every existing visitor class based on that interface. Of which there might be many. In a nutshell: Adding new ConcreteElement classes is hard.
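In our little fruit example, adding Boysenberry would force a change like the following, and with it a change to every existing visitor implementation:

interface IFruitVisitor
{
  void VisitApple (Apple apple);
  void VisitOrange (Orange orange);
  void VisitBoysenberry (Boysenberry boysenberry); // new; breaks every existing implementation
}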

It might even be impossible.

We had that problem in the new SQL backend we’re currently building for re-linq. re-linq’s frontend defines the expression visitor interface, but the SQL backend defines a number of additional SQL-specific expression types. Adapting the visitor interface to add Visit methods for those new expression types would require adding a dependency from the front-end to the SQL backend, which we didn’t want at all.

There are alternatives to the Visitor pattern that do not require the class hierarchy to be stable, some of which re-linq uses in places where a lot of extensibility is needed (e.g., a dictionary mapping node types to implementations of a strategy interface). But in the situation of transforming expression trees? Shouldn’t the Visitor pattern be the cleanest, most object-oriented approach to this problem?

Therefore, we thought hard about how to overcome the issue of low flexibility, and this is our solution:

[Diagram: standard and SQL-specific expression types and their visitors]

The .NET framework (mscorlib) defines the standard expression types. Normally, it would also define the standard visitor interface (or base class), but as mentioned before, re-linq has to simulate the Visitor implementation in its frontend (Remotion.Data.Linq). In that assembly, there are also several visitor implementations operating on the standard expression types, for example a PartialEvaluator. Those classes implement the standard visitor interface (deriving from ExpressionTreeVisitor) and thus only know how to deal with standard expression types.

The SQL backend (Remotion.Data.Linq.SqlBackend) defines additional expression types, e.g., SqlCaseExpression. It also defines special visitor interfaces for visitors that can handle the additional expressions, such as ISqlSpecificExpressionVisitor. Expression visitors in the backend, e.g., SqlGenerator, implement both the standard and the custom visitor interfaces and are thus able to visit both standard and custom expression nodes.

Here’s what the Accept method of SqlCaseExpression looks like:

public override Expression Accept (ExpressionTreeVisitor visitor)
{
  ArgumentUtility.CheckNotNull ("visitor", visitor);

  var specificVisitor = visitor as ISqlSpecificExpressionVisitor;
  if (specificVisitor != null)
    return specificVisitor.VisitSqlCaseExpression (this);
  else
    return base.Accept (visitor);
}

[Side note: the re-motion team has recently switched to Visual Studio 2010. As you can see, the Copy As HTML add-in can now copy the syntax highlighting performed by ReSharper. Here’s a description of how to adapt Copy As HTML to run in VS 2010.]

As you can see, the concept of custom expressions and dedicated visitors now requires a runtime type check: being of a non-standard expression type, the SqlCaseExpression instance has to determine whether the visitor can deal with it; only visitors implementing the ISqlSpecificExpressionVisitor interface have a VisitSqlCaseExpression method to dispatch to.

But what happens if a standard visitor, such as PartialEvaluator, is used to visit an object graph containing a custom expression, such as SqlCaseExpression?

The answer is in the call to base.Accept(). As you may have noticed, the listing above and the picture don’t actually match. Expression does not have an Accept method in .NET 3.5. So how can SqlCaseExpression override it?

The answer is that SqlCaseExpression is not derived directly from Expression, but from ExtensionExpression. That class contains boilerplate code for the Visitor pattern, and it contains the following base implementation of Accept:

public virtual Expression Accept (ExpressionTreeVisitor visitor)
{
  ArgumentUtility.CheckNotNull ("visitor", visitor);

  return visitor.VisitUnknownExpression (this);
}

So, if a visitor does not support a custom expression, a catch-all VisitUnknownExpression method is called. Depending on its purpose, a visitor can implement that method to ignore the unknown expression, throw an exception, or … do something completely different.

What that is, I will explain in my next blog post.

Written by Fabian

May 9th, 2010 at 5:29 pm

Posted in patterns, re-linq