Fabian's Mix

Mixins, .NET, and more

Archive for the ‘patterns’ Category

How a Visitor implementation can handle unknown nodes

without comments

Two weeks ago, I explained how we use interfaces and dynamic type casts to overcome the Visitor pattern’s problems with extensible class hierarchies in re-linq. In a nutshell, we have a visitor base class, ExpressionTreeVisitor, with Visit methods for all standard expression node types. For extension nodes, we define specific visitor interfaces, and we test whether a specific visitor instance supports the interface using a runtime type check. This pattern is known as the Acyclic Visitor pattern (to some, at least…).

Here’s an example; the Accept method of SqlCaseExpression (an extension node type):

public override Expression Accept (ExpressionTreeVisitor visitor)

{

  ArgumentUtility.CheckNotNull (“visitor”, visitor);

 

  var specificVisitor = visitor as ISqlSpecificExpressionVisitor;

  if (specificVisitor != null)

    return specificVisitor.VisitSqlCaseExpression (this);

  else

    return base.Accept (visitor);

}

The base class’ Accept method, which is called if the visitor instance does not support the extension node, is implemented as follows:

public virtual Expression Accept (ExpressionTreeVisitor visitor)

{

  ArgumentUtility.CheckNotNull (“visitor”, visitor);

 

  return visitor.VisitUnknownExpression (this);

}

Last time I said that there were three possibilities of how the VisitUnknownExpression method could be implemented in a specific visitor: it could ignore the unknown expression, it could throw an exception, or do … something else. But what could that third option be?

The answer sounds simple: Handle the unknown node in a generic way.

Sometimes, that’s easy. For example, re-linq has a visitor called FormattingExpressionVisitor, which is used to generate a string representation of expression trees. (We need that because the implementation of Expression.ToString acts just plain stupid when confronted with a non-standard expression being a child of a standard one.) That visitor  handles unknown nodes in a generic way – it just calls ToString on them.

Sometimes, it’s harder. While some expression visitors, such as a SQL-generating visitor or the FormattingExpressionTreeVisitor, are used to transform expression trees into a completely different representation, the much more common scenario is that specialized visitors search for certain node patterns. They then either return what they’ve found or transform the nodes, replacing them with other structures.

For example, consider a visitor that looks for ConditionalExpressions (the equivalent of C#’s conditional operator ?: in expression trees) and transforms those into SqlCaseExpressions:

protected override Expression VisitConditionalExpression (
    ConditionalExpression expression)

{

  ArgumentUtility.CheckNotNull (“expression”, expression);

 

  return new SqlCaseExpression (
    VisitExpression (expression.Test),
    VisitExpression (expression.IfTrue),
    VisitExpression (expression.IfFalse));

}

How should such a pattern matching visitor handle unknown expressions in a generic way? After all, a ConditionalExpression might be located beneath any unknown expression!

Unknown expression containing conditional expression

Ideally, a pattern matching visitor should be able to ignore all unknown expressions and at the same time analyze the expressions’ child nodes. And, voilà, this is exactly what the default implementation of ExpressionTreeVisitor.VisitUnknownExpression does:

protected internal virtual Expression VisitUnknownExpression (
    Expression expression)

{

  var extensionExpression = expression as ExtensionExpression;

  if (extensionExpression != null)

    return extensionExpression.VisitChildren (this);

  else

  {

    var message = string.Format (
        “Expression type {0} is not supported by this {1}.”,
        expression.NodeType,
        GetType().Name);

    throw new NotSupportedException (message);

  }

}

Every extension expression has a VisitChildren method used by visitors that need to visit a specific node’s children even when they don’t know the node’s type. Here’s SqlCaseExpression’s implementation of VisitChildren:

protected override Expression VisitChildren (ExpressionTreeVisitor visitor)

{

  ArgumentUtility.CheckNotNull (“visitor”, visitor);

 

  var testPredicate = visitor.VisitExpression (TestPredicate);

  var thenValue = visitor.VisitExpression (ThenValue);

  var elseValue = visitor.VisitExpression (ElseValue);

 

  if (testPredicate != TestPredicate
      || thenValue != ThenValue
      || elseValue != ElseValue)

    return new SqlCaseExpression (testPredicate, thenValue, elseValue);

  else

    return this;

}

This clever design was not my own idea, it originates from the Dynamic Language Runtime’s abstract syntax tree implementation, and it has later been included in .NET 4.0’s Expression class. The method is called VisitExtension there; you can read my short summary of that feature in my post about .NET 4.0’s extension expressions. As I promised in that post, we copied the design for re-linq and .NET 3.5.

So, this post concludes my little series about the visitor pattern. To summarize,

“And yet, in this instance, a function by any other name is still as virtual,” she smiled, and glided away. Her words drifted off with her as she passed out of sight: “And a Visitor pattern by any other name is still as useful, and is more aptly described. But that is the way of things….”

Conversations: By Any Other Name, Jim Hyslop and Herb Sutter, Dr. Dobb’s

Written by Fabian

May 24th, 2010 at 7:54 pm

Posted in patterns,re-linq

The second problem of Visitors – and a solution

without comments

In my last post, I explained that one big weakness of the Visitor design pattern is that it can only be used when supported by the visited classes. In LINQ, I explained, we have a good scenario for the Visitor pattern because we have to define a lot of algorithms, the specifics of which depend on the node types present in an expression tree. Since, however, the LINQ expression class designers didn’t provide the necessary hooks, we can’t make use of the pattern (and are left with a bad simulation of Visitor that is built on runtime type checks).

I promised that in the course of this next post I’d talk about the second big weakness of the Visitor design pattern, but before I do this, I’d like to reiterate why Visitor is a Good Thing, despite its weaknesses. I’ll quote the good consequences of using Visitor from Design Patterns by Gamma, Helm, Johnson, and Vlissides, the famous Gang of Four (the explanations are my own):

  • Visitor makes adding new operations easy. It defines a framework that, when added to a class, allows easy addition of new operations. Classic case of Open/Closed.
  • A visitor gathers related operations and separates unrelated ones. It provides clean separation of concerns and avoids unrelated operations being entangled in a single class.
  • Visiting across class hierarchies. It allows processing of object graphs in which not all objects have a single base class.
  • Accumulating state. The visitor implementation can gather state while operating on the objects in the graph being visited.

Those are good consequences of using the pattern. But Design Patterns also lists that second big disadvantage I want to talk about: Adding new ConcreteElement classes is hard. What does that mean?

To explain, remember that Visitor requires an interface (or base class) that defines the contract of all visitor implementations:

interface IFruitVisitor

{

  void VisitApple (Apple apple);

  void VisitOrange (Orange orange);

}

This interface is required for the double dispatch mechanism Visitor is based on: the visited object decides what visitor method to call based on its own type.

So, the visitor interface contains one method per visitable class; in our example, those would be Apple and Orange. Now consider what happens if we add a new class, Boysenberry. To be able to write algorithms visiting instances of this new type, we’d need to add a VisitBoysenberry method to the interface. And thus to every existing visitor class based on that interface. Of which there might be many. In a nutshell: Adding new ConcreteElement classes is hard.

It might even be impossible.

We had that problem in the new SQL backend we’re currently building for re-linq. re-linq’s frontend defines the expression visitor interface, but the SQL backend defines a number of additional SQL-specific expression types. Adapting the visitor interface to add Visit methods for those new expression types would require adding a dependency from the front-end to the SQL backend, which we didn’t want at all.

There are alternatives to the Visitor pattern that do not require the class hierarchy to be stable, some of which re-linq uses in places where a lot of extensibility is needed (e.g., a dictionary mapping node types to implementations of a strategy interface). But in the situation of transforming expression trees? Shouldn’t the Visitor pattern be the cleanest, most object-oriented approach to this problem?

Therefore, we thought hard to overcome the issue of low flexibility, and this is our solution:

image

The .NET framework (mscorlib) defines the standard expression types. Normally, it would also define the standard visitor interface (or base class), but as mentioned before, re-linq has to simulate the Visitor implementation in its frontend (Remotion.Data.Linq). In that assembly, there are also several visitor implementations operating on the standard expression types, for example a PartialEvaluator. Those classes implement the standard visitor interface (deriving from ExpressionTreeVisitor) and thus only know how to deal with standard expression types.

The SQL backend (Remotion.Data.Linq.SqlBackend) defines additional expression types, e.g., SqlCaseExpression. It also defines special visitor interfaces for visitors that can handle the additional expressions, such as ISqlSpecificExpressionVisitor. Expression visitors in the backend, e.g., SqlGenerator, implement both the standard and the custom visitor interfaces and are thus able to visit both standard and custom expression nodes.

Here’s what the Accept method of SqlCaseExpression looks like:

public override Expression Accept (ExpressionTreeVisitor visitor)

{

  ArgumentUtility.CheckNotNull (“visitor”, visitor);

 

  var specificVisitor = visitor as ISqlSpecificExpressionVisitor;

  if (specificVisitor != null)

    return specificVisitor.VisitSqlCaseExpression (this);

  else

    return base.Accept (visitor);

}

[Side note: the re-motion team has recently switched to Visual Studio 2010. As you can see, the Copy As HTML add-in can now copy the syntax highlighting performed by ReSharper. Here’s a description of how to adapt Copy As HTML to run in VS 2010.]

As you can see, the concept of custom expressions and dedicated visitors now requires a runtime type check: being of a non-standard expression type, the SqlCaseExpression instance has to determine whether the visitor can deal with it; only visitors implementing the ISqlSpecificExpressionVisitor interface have a VisitSqlCaseExpression method to dispatch to.

But what happens if a standard visitor, such as PartialEvaluator, is used to visit an object graph containing a custom expression, such as SqlCaseExpression?

The answer is in the call to base.Accept(). As you may have noticed, the listing above and the picture don’t actually match. Expression does not have an Accept method in .NET 3.5. So how can SqlCaseExpression override it?

The answer is that SqlCaseExpression is not derived directly from Expression, but from ExtensionExpression. That class contains boilerplate code for the Visitor pattern, and it contains the following base implementation of Accept:

public virtual Expression Accept (ExpressionTreeVisitor visitor)

{

  ArgumentUtility.CheckNotNull (“visitor”, visitor);

 

  return visitor.VisitUnknownExpression (this);

}

So, if a visitor does not support a custom expression, a catch-all VisitUnknownExpression method is called. Depending on its purpose, a visitor can implement that method to ignore the unknown expression, throw an expression, or … do something completely different.

What that is, I will explain in my next blog post.

Written by Fabian

May 9th, 2010 at 5:29 pm

Posted in patterns,re-linq

LINQ: A good Visitor use case (but bad implementation)

without comments

Last week, I wrote about the Visitor design pattern as a means of executing different code depending on the runtime type of an object. For example, if you want to deal with Apples and Oranges in different ways, Visitor is the way to go.

Another, and maybe a little more realistic area of application of the pattern is the analysis and transformation of tree data structures. In fact, Gamma, Helm, Johnson, and Vlissides used exactly this – the processing of an abstract syntax tree – as the motivating example in their explanation of Visitor in the Design Patterns book. Recently, I’ve been doing a lot of abstract syntax tree processing because I’m working on a new SQL backend for re-linq.

In LINQ, everything is built around the concept of expression trees, abstract syntax trees that represent queries. The re-linq front-end parses those trees and constructs query models from them in order to enable easier interpretation of those queries. The expressions used inside the query, such as select projections, where conditions, and so on, are kept as (simplified) expression trees, however. Therefore, anybody building a LINQ provider based on re-linq will have to write some code to analyze these expression trees.

The analyzing code might be very simple, such as “for constant expressions, emit the constant value to the SQL statement being built”, or more complex, such as, “for member expressions, determine whether the member denotes an entity and, if so, emit a join”. But no matter how complex or simple it is, it will always be code that depends on the actual type of the expression nodes in the tree. And how is type dependent code handled most elegantly? Here comes the Visitor pattern: re-linq provides an ExpressionTreeVisitor base class with methods such as VisitConstantExpression and VisitMemberExpression that are automatically called depending on the type of expression being processed by the visitor. Just derive from that class and override the Visit… methods to execute your type-dependent code.

However, re-linq has to cheat for bringing the Visitor pattern to .NET 3.5 expression tree data structures. Due to some oversight (or, more likely, time constraints), .NET 3.5’s expression nodes do not provide Accept methods, so the visitor base class cannot use double dispatch to execute the right Visit… method for an expression. Instead, it has to resort to dynamic type comparisons and discriminator (= NodeType property) checks. This dirty workaround is hidden in re-linq’s ExpressionTreeVisitor.VisitExpression method:

 

public virtual Expression VisitExpression (Expression expression)

{

  // …

 

  switch (expression.NodeType)

  {

    case ExpressionType.ArrayLength:

    case ExpressionType.Convert:

    case ExpressionType.ConvertChecked:

    case ExpressionType.Negate:

    case ExpressionType.NegateChecked:

    case ExpressionType.Not:

    case ExpressionType.Quote:

    case ExpressionType.TypeAs:

    case ExpressionType.UnaryPlus:

      return VisitUnaryExpression ((UnaryExpression) expression);

    case ExpressionType.Add:

    case ExpressionType.AddChecked:

    case ExpressionType.Divide:

    case ExpressionType.Modulo:

    case ExpressionType.Multiply:

    case ExpressionType.MultiplyChecked:

    case ExpressionType.Power:

    case ExpressionType.Subtract:

    case ExpressionType.SubtractChecked:

    case ExpressionType.And:

    case ExpressionType.Or:

    case ExpressionType.ExclusiveOr:

    case ExpressionType.LeftShift:

    case ExpressionType.RightShift:

    case ExpressionType.AndAlso:

    case ExpressionType.OrElse:

    case ExpressionType.Equal:

    case ExpressionType.NotEqual:

    case ExpressionType.GreaterThanOrEqual:

    case ExpressionType.GreaterThan:

    case ExpressionType.LessThan:

    case ExpressionType.LessThanOrEqual:

    case ExpressionType.Coalesce:

    case ExpressionType.ArrayIndex:

      return VisitBinaryExpression ((BinaryExpression) expression);

    case ExpressionType.Conditional:

      return VisitConditionalExpression ((ConditionalExpression) expression);

    case ExpressionType.Constant:

      return VisitConstantExpression ((ConstantExpression) expression);

    case ExpressionType.Invoke:

      return VisitInvocationExpression ((InvocationExpression) expression);

    case ExpressionType.Lambda:

      return VisitLambdaExpression ((LambdaExpression) expression);

    case ExpressionType.MemberAccess:

      return VisitMemberExpression ((MemberExpression) expression);

    case ExpressionType.Call:

      return VisitMethodCallExpression ((MethodCallExpression) expression);

    case ExpressionType.New:

      return VisitNewExpression ((NewExpression) expression);

    case ExpressionType.NewArrayBounds:

    case ExpressionType.NewArrayInit:

      return VisitNewArrayExpression ((NewArrayExpression) expression);

    case ExpressionType.MemberInit:

      return VisitMemberInitExpression ((MemberInitExpression) expression);

    case ExpressionType.ListInit:

      return VisitListInitExpression ((ListInitExpression) expression);

    case ExpressionType.Parameter:

      return VisitParameterExpression ((ParameterExpression) expression);

    case ExpressionType.TypeIs:

      return VisitTypeBinaryExpression ((TypeBinaryExpression) expression);

 

    default:

      if (expression is SubQueryExpression)

        return VisitSubQueryExpression ((SubQueryExpression) expression);

      else if (expression is QuerySourceReferenceExpression)

        return VisitQuerySourceReferenceExpression (
            (QuerySourceReferenceExpression) expression);

      else

        return VisitUnknownExpression (expression);

  }

}

In .NET 4.0, this has finally been addressed, and the Expression base class provides an Accept method that can be used to implement double dispatch. However, .NET 3.5 expressions show one of the two big weaknesses of the Visitor pattern: The visited class hierarchy must support the pattern for it to be used. If the authors of the class hierarchy didn’t anticipate the need for double dispatch, or if they didn’t knew the pattern, it’s out of reach.

Next time: the second big problem of the Visitor pattern. Can you guess what it is?

Written by Fabian

April 18th, 2010 at 3:50 pm

Posted in patterns

Virtual methods outside of classes: The Visitor Pattern

with 6 comments

A great many years ago (i.e., in 2003), I worked at a the research institution “CT SE2” at Siemens in Munich, Germany. At that time, a friend and myself implemented a Profiling-API-based AOP tool for the .NET 1.1 runtime, and that tool had to be written in C++. Because C++ is a difficult language that can be devastatingly complex, but also extremely elegant, depending on how you use it, I started to read the excellent Conversations column published by Jim Hyslop and Herb Sutter in the Dr. Dobb’s journal. In Conversations, the two authors would present common C++ idioms, pitfalls, and patterns wrapped in short stories about a young programmer who is taught those things by the Guru.

One of the stories, called “By Any Other Name”, was about the Visitor pattern. The story described how Visitor was not a good name for the pattern:

“Consider the parable yet again. What is Personnel::Accept?” – “Uh, um?” I said rather dimly. “This class implements the Poorly Named Pattern, or PNP. It is also known as the Visitor pattern.” “Uh, um!” I said, a little more brightly now. “I’ve read about Visitor. But that’s just for having objects that can iteratively visit each other. Isn’t it?” She sighed. “A common misapprehension. Think that the V means, not so much Visitor, but Virtual,” she explained. “The PNP’s most useful application is to enable the addition of virtual functions to an existing class hierarchy without further changing that hierarchy.

The visitor pattern is often seen as a solution for iterating over an inhomogenous collection of objects. The story was about how it’s really much more than that – it enables “the addition of virtual functions to an existing class hierarchy without further changing that hierarchy”, as the Guru described. But even if you already know the Visitor pattern, it might not exactly be clear at the first glance what this actually means.

To explain this idea, let’s try to motivate an implementation of the pattern via a vitamin-packed example. Consider a class hierarchy, such as Fruit, Apple, and Orange, with Apple and Orange each derived from Fruit. Consider further that you have the requirement to be able to eat a piece of fruit, the algorithm of course depending on what kind of fruit it is. You might realize this operation by adding an abstract method on the Fruit class and implement it in the derived classes:

abstract class Fruit

{

  public abstract void BeEaten (Person eater);

}

 

class Apple : Fruit

{

  public override void BeEaten(Person eater)

  {

    while (HasFlesh)

      eater.Bite (this);

  }

 

  // …

}

 

class Orange : Fruit

{

  public override void BeEaten(Person eater)

  {

    eater.Peel (this);

    while (Slices.Count != 0)

    {

      var slice = Slices.RemoveNext ();

      while (slice.HasFlesh)

      {

        eater.Bite (slice);

      }

    }

  }

 

  // …

}

And after implementing this piece of code, you’d hopefully think, “BeEaten? What kind of method name is that?”

The awkwardness of the method name is a sign that the code is not in the right location – in reality, not the fruit should know how to be eaten, the person should know how to eat the different kinds of fruit. Time for refactoring – a naive Person.Eat method could look like this:

class Person

{

  public void Eat (Fruit fruit)

  {

    var apple = fruit as Apple;

    if (apple != null)

    {

      while (apple.hasFlesh)

        Bite (apple);

 

      return;

    }

 

    var orange = fruit as Orange;

    if (orange != null)

    {

      Peel (orange);

      while (orange.Slices.Count != 0)

      {

        var slice = orange.Slices.RemoveNext ();

        while (slice.HasFlesh)

        {

          Bite (slice);

        }

      }

      return;

    }

 

    throw new NotSupportedException (
        “Don’t know how to eat a(n) “ + fruit.GetType ().Name + “.”);

  }

 

  // …

}

Now, the code is in the right place (the Person class), but that kind of type test is of course not a good coding practice, and it’s not acceptable in most cases.

A better solution is to apply the Visitor pattern. Effectively, a Person is just a FruitVisitor:

interface IFruitVisitor

{

  void VisitApple (Apple apple);

  void VisitOrange (Orange orange);

}

 

class Person : IFruitVisitor

{

  public void Eat (Fruit fruit)

  {

    fruit.Accept (this);

  }

 

  void IFruitVisitor.VisitApple(Apple apple)

  {

    while (apple.hasFlesh)

      Bite (apple);

  }

 

  void IFruitVisitor.VisitOrange(Orange orange)

  {

    Peel (this);

    while (orange.Slices.Count != 0)

    {

      var slice = orange.Slices.RemoveNext ();

      while (slice.HasFlesh)

      {

        Bite (slice);

      }

    }

  }

 

  // …

}

Now, Fruit only needs an abstract method Accept (IFruitVisitor visitor), which Apple implements by calling visitor.VisitApple (this), and Orange implements by calling visitor.VisitOrange (this). The Accept method dispatches the method call to the right Visit… method of the IFruitVisitor, and because this happens after the person.Eat (fruit) method call is dispatched to the Person.Eat method implementation, the concept is also called double dispatch.

My, this is a nice implementation, isn’t it? The code is in the right place, we don’t need awkward type checks – by implementing the interface, the compiler will even tell us if we forget to implement the algorithm for a specific kind of fruit!

Back to the original question: Nice the Visitor-based implementation might be, but how exactly did it allow us to add a new virtual method to the Fruit hierarchy without actually changing that hierarchy?

Well, first of all, it’s not really a virtual method that is added. Actually, it’s just code that differentiates between whether it is executed for an Apple or an Orange, just like a virtual method overridden in Apple and Orange would. For brevity, let’s call that type-dependent code because it is code that depends on the type of the fruit instance.

Second, “without changing that hierarchy” is true only when the Visitor pattern is already in place in the class hierarchy. There must already be an IFruitVisitor interface, and the Fruit classes must define Accept methods.

With these clarifications, the question is now: How exactly did the Visitor pattern allow us to add new type-dependent code to the Fruit hierarchy (which implements the Accept method) without actually changing that hierarchy?

In our example, the type-dependent code is of course the implementation of the algorithms to eat the different kinds of fruit. If you compare the last listing with the first one, you can see that the VisitApple and VisitOrange methods contain more or less the same code we would have written into the overrides of a virtual BeEaten method. By using the Visitor pattern, we were therefore able to add that type-dependent code (which usually requires a virtual method) without actually touching the Fruit hierarchy. And this is what the Guru meant in the story above.

If you think about it, the Accept method defined by the pattern is nothing else than an extensibility hook in your class hierarchy for external type-dependent code. One general-purpose virtual method for others to use on their convenience.

There remains one question, of course: When to use the Visitor pattern? Should we just avoid using virtual methods at all and instead put every type-dependent code into visitor classes? Of course not. Virtual methods are still the way to go for type-dependent code that belongs into the class hierarchy itself. The Visitor pattern is for type-dependent code that belongs somewhere else. It helps you to achieve better separation of concerns by allowing you to put your code where it belongs.

And this, as I’m sure the Guru would agree, is something to meditate on.

Written by Fabian

April 10th, 2010 at 5:45 pm

Posted in patterns