Archive for April, 2009
As I explained in my last post, the idea of re-linq is to transform ugly IQueryable-based ASTs into nice query expression ASTs (from … select).
A discussion with Frans Bouma, distinguished LINQ tamer, brought up a point we discussed for quite a while internally: does this approach limit our ability to express every conceivable LINQ query? And if not, will it still provide value, or will this result in ASTs that are just as hard to use as IQueryable ASTs, only different?
Frans brings up the following example:
from c in md.Customer.Where(x=>x.Country=="Germany") join o in md.... select c.Orders;
Initially we were thinking that re-linq would just lead to an 80+% solution, because using Queryable’s extension methods directly lets you do things that cannot be expressed using query expressions. (That was good enough for us anyway, since all we wanted to do back then was to get serious, though not necessarily complete, LINQ support for re-store, our own ORM. So we just went ahead.)
When we thought about turning re-linq into a much more generic beast, we revisited this assumption and discovered that any direct use of Queryable can also be expressed using query expressions (from … select notation). What is more, these expressions are easier to process, because when we get down to it, a full LINQ provider is only useful if we transform the query to a powerful enough query language, like SQL, HQL or Entity SQL. Those languages usually support sub-queries, but none of them supports anything even remotely similar to direct invocations of Where(), SelectMany() etc.
For Frans’ example, that would be:
from c in (from x in md.Customer where x.Country=="Germany" select x) join o in md.... select c.Orders;
The transformation result might not be shorter, but it’s much easier to process and transform because
- it is structurally similar to the output model,
- you can use existing transformations in a modular way for sub-queries, and
- everything strange (like transparent identifiers) has already been dealt with at that point.
A more intelligent transformation could transform the whole thing to just
from c in md.Customer join o in md.... where c.Country=="Germany" select c.Orders;
These queries are semantically identical (i.e., there should only be a difference if you execute these statements imperatively using LINQ to Objects). This is a very interesting transformation that performs a nice optimization. For any simple SQL-emitting back-end, it would also lead to much cleaner SQL queries without the back-end even being aware of this particular optimization.
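To make the comparison concrete, here is a minimal sketch that runs the three forms of the query against LINQ to Objects. The Customer and Order classes, the sample data, the join key (Id / CustomerId) and the string projection are all assumptions made up for illustration; Frans’ original join clause is elided and is not reconstructed here. The sketch only shows that the three forms return the same rows, not how re-linq would perform the rewrite.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for md.Customer and the elided join source.
class Customer
{
    public int Id;
    public string Country;
}

class Order
{
    public int CustomerId;
    public string Item;
}

static class JoinForms
{
    static readonly List<Customer> Customers = new List<Customer>
    {
        new Customer { Id = 1, Country = "Germany" },
        new Customer { Id = 2, Country = "Austria" },
        new Customer { Id = 3, Country = "Germany" },
    };

    static readonly List<Order> Orders = new List<Order>
    {
        new Order { CustomerId = 1, Item = "x" },
        new Order { CustomerId = 2, Item = "y" },
        new Order { CustomerId = 3, Item = "z" },
    };

    // Form 1: Where() is invoked directly inside the from clause.
    public static List<string> Direct()
    {
        return (from c in Customers.Where(x => x.Country == "Germany")
                join o in Orders on c.Id equals o.CustomerId
                select c.Id + ":" + o.Item).ToList();
    }

    // Form 2: the direct call is expressed as a sub-query.
    public static List<string> Nested()
    {
        return (from c in (from x in Customers where x.Country == "Germany" select x)
                join o in Orders on c.Id equals o.CustomerId
                select c.Id + ":" + o.Item).ToList();
    }

    // Form 3: the filter is moved into the outer query.
    public static List<string> Flattened()
    {
        return (from c in Customers
                join o in Orders on c.Id equals o.CustomerId
                where c.Country == "Germany"
                select c.Id + ":" + o.Item).ToList();
    }

    static void Main()
    {
        Console.WriteLine(Direct().SequenceEqual(Nested()));    // prints True
        Console.WriteLine(Nested().SequenceEqual(Flattened())); // prints True
    }
}
```

In LINQ to Objects the third form joins every customer before filtering, so it does more work than the first two even though the results match. That is the kind of difference that disappears once the query is translated to SQL.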
At the end of the day, this is a matter of how many special cases you want to support. LINQ to SQL is known to go out of its way to produce the simplest SQL possible even for frightening LINQ queries. But some optimizations might be even easier to achieve using re-linq, because the patterns are more easily recognizable.
Either way, moving this kind of optimization into a shared OSS project sounds more promising to me than solving it for every single provider. It doesn’t only reduce the overall effort, it also means that all optimizations that are done on the query model itself can be shared between providers. Everyone who thinks that re-linq cannot do some trick that they absolutely need is welcome to contribute to re-linq. (For instance, we here at rubicon are not particularly interested in optimizing LINQ joins, because joins are usually implicit in ORMs, and a good DB like SQL Server would recognize the pattern anyway and come up with identical execution plans for both queries. But if someone else builds this optimization, and it doesn’t cost us an arm and a leg in terms of performance, we will gladly use it.)
Now we’re not there yet, and I cannot promise that we’re not going to discover situations that are more difficult to solve than we anticipated, but we’re confident that this is not a dead-end approach that will just fail for more complex scenarios. The design behind re-linq’s transformations is not trivial, and we believe it is solid ground to build on. I’m sure Fabian will find some time to go into internals when he’s back at work.
I’m pretty sure that everyone who has ever tried to create a serious LINQ provider has suffered through the same stages:
- Fascination: The way C# 3’s new features were combined to create query comprehensions is cool. Lambdas, anonymous types, extension methods, and all the rest are really useful language features on their own, and the query expression syntax (from … select) is really just a very thin layer on top of them (although they somehow managed to get Monads into the game).
- Excitement: Via IQueryable and Expression<Func<…>>, we even get ASTs for the queries we write, and we can process them to generate SQL!
- Hope: LINQ query expressions look much like SQL, so there must be a simple way to transform one into the other.
- Frustration: The Queryable class makes a big mess out of those nice-looking query expressions. As we look at more complex queries and read the C# specs, we realize that the transformation from query expressions into method invocations follows some strange rules. They are very necessary for creating expressions that can be compiled and executed in C#, but unfortunately, we get the same bloated expressions for our ASTs. Now transformation to SQL doesn’t look so simple anymore. Transparent identifiers, anyone?
- Despair: System.Core does not contain any useful code to deal with those ASTs. LINQ to SQL does, but it doesn’t expose it (unfortunately, because LINQ to SQL is one of the few LINQ providers that can deal with anything you throw at it).
- Capitulation: We settle with a simple LINQ provider that understands just a primitive subset of LINQ (and throws exceptions for everything else), or we find that other LINQ providers might be good enough.
OK, now the last point is an exaggeration. Not everyone surrenders to the complexity of LINQ, or we would not have any useful LINQ providers. But few people actually get past this point. Just have a look at Frans Bouma’s 15-part series Developing Linq to LLBLGen Pro. And honestly, it’s a real PITA that everyone has to do this from scratch.
If we look at the nice syntax that query expressions have, it’s really a pity. Why can’t we have ASTs that resemble this beautiful syntax? After all, they are so much more like anything we’d want to transform them to (except for compiled code). SQL, XQuery, NHibernate’s HQL, Entity Framework’s Entity SQL, you name it.
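To see the gap between the syntax and the AST for yourself, here is a minimal sketch that prints the expression tree the compiler builds for a small query expression over an IQueryable. The class name, the sample array and the query are made up for illustration. The let clause is what forces the compiler to introduce a transparent identifier; the exact printed form, including the compiler-generated names, depends on the compiler and runtime you use.

```csharp
using System;
using System.Linq;

static class TransparentIdentifierDemo
{
    // Builds a small query expression and returns the textual form
    // of the expression tree that IQueryable records for it.
    public static string Describe()
    {
        IQueryable<int> numbers = new[] { 1, 2, 3 }.AsQueryable();

        var query = from n in numbers
                    let square = n * n      // introduces a transparent identifier
                    where square > 1
                    select n + square;

        // Nothing is executed here; we only inspect the AST.
        return query.Expression.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

With Microsoft’s compiler the output is typically a Select(...).Where(...).Select(...) chain in which n and square are wrapped in an anonymous type and every later lambda takes that wrapper as its single parameter. The four readable lines of the query expression are still in there, but a provider has to dig them out again.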
Starting last year, we created a LINQ infrastructure that supports our own O/R mapper (re-store), and we started from exactly this point: To reduce complexity, we’d transform LINQ ASTs back into ASTs that resemble the original query expression syntax. (In some cases this is not the original syntax, since the extension methods of Queryable can also be called directly. But in compiled code this makes no difference, so we transform everything into query expressions.)
The result is re-linq. If you are thinking about writing a LINQ provider that supports more than just from/where/select, I’d like to encourage you to take a close look at re-linq. Also, do not hesitate to contact us about missing features or chances to participate. (Unfortunately, public access to our issue tracker and a mailing list are still on our to-do list, but supply will follow demand!)
Fabian has written a short paper about re-linq (including open issues), and we’re going to publish more information in this blog soon. Stay tuned.