LRN Quick Tips: Don’t trust SyntaxNode.ToFullString()

At Code Connect we’ve been working hard on a project that rewrites a user’s code and then compiles the rewritten solution. Without giving too much away, it essentially acts as a logger and can intercept all variable assignments.

So the following code:


class MyClass
{
void MyMethod()
{
int x = 5;
}
}

view raw

BasicClass.cs

hosted with ❤ by GitHub

Is rewritten to:


class MyClass
{
void MyMethod()
{
int x = LogValue("x", 5);
}
public static T LogValue<T>(string name, T value)
{
//Write to file
//…
return value;
}
}

view raw

Rewritten.cs

hosted with ❤ by GitHub

At some point we decided we no longer wanted to inject the LogValue method directly into the code we were instrumenting. We would place it in an entirely different namespace: CodeConnect.Instrumentation.

Here was how we originally created the invocation to LogValue:


var newNode = SyntaxFactory.InvocationExpression(
SyntaxFactory.IdentifierName(
"LogValue"))
.WithArgumentList(
//…
);

So we figured we’d incorporate our new namespace as follows:


var newNode = SyntaxFactory.InvocationExpression(
SyntaxFactory.IdentifierName(
"CodeConnect.Instrumentation.LogValue"))
.WithArgumentList(
//…
);

At first glance, everything checked out. Calling .ToFullString() on this node revealed what looked like a proper invocation:

CodeConnect.Instrumentation.LogValue("x", 5);

However, try as we may we could not get our code to compile. We would constantly errors telling us the CodeConnect.Instrumentation type didn’t exist:

CS0103 at line 12 (character 19): The name 'CodeConnect.Instrumentation' does not exist in the current context.

We checked our references, we checked our rewritten solution and we checked the spelling at least a dozen times. Everything checked out. Eventually we did the classic “Which change broken this functionality?” walk through out Git history. (Note: If I had written more complete unit tests when I first wrote the rewriter, we would have caught this a lot quicker!)

We realized it was the change to the rewriter that introduced this problem. It took some time before we realized exactly what was wrong. Once again the ever-truthful Syntax Visualizer (now packaged with the .NET Compiler Tools SDK) came to the rescue and showed us what the syntax tree for a valid invocation of  CodeConnect.Instrumentation.LogValue("x", 5); would look like:

graph

After looking over the above tree, we realized we were supposed to create a chain of SimpleMemberAccessExpressions, not a single InvocationExpression with the name “CodeConnect.Instrumentation.LogValue”. When the binder when to bind this invocation to a symbol, it failed as no method can be declared with this name. The tree we had created was invalid and could never have been parsed out of a user’s source code.

This is a key point to understand when creating or rewriting syntax trees:

No one will stop you from creating invalid syntax trees.

Or from Kevin Pilch-Bisson on Stack Overflow:

> The Roslyn Syntax construction API does not guarantee that you can only build valid programs.

We can take this to its logical extreme and create trees that don’t make any sense. For example, we can create WhitespaceTrivia whose value is “public void M() { }” .


//Make some impossible non-whitespace trivia
var trivia = SyntaxFactory.SyntaxTrivia(SyntaxKind.WhitespaceTrivia,
@"
public void M() {
}");
var node = SyntaxFactory.ClassDeclaration(" test")
.WithOpenBraceToken(
SyntaxFactory.Token(SyntaxKind.OpenBraceToken)
.WithTrailingTrivia(trivia)
);
Console.WriteLine(node.ToFullString());

The above prints the following misleading string:

class test{
public void M() {

}}

The output is misleading as it causes the reader to make assumptions about how the syntax tree must be formed. In my experience the only true arbiter of truth when it comes to syntax tree is the Roslyn Syntax Visualizer. I’d love for it to be expanded to visualize in memory trees while debugging.

The takeaways here:

  • Don’t trust .ToString() or .ToFullString()
  • Understand that you may accidentally generate invalid trees
  • Write tests

5 thoughts on “LRN Quick Tips: Don’t trust SyntaxNode.ToFullString()

  1. One catch (at least for me). Occasionally when I rewrite expression statements, I’d like to replace one expression statement with two.

    My method for doing this is to put both statements inside a single block statement. I construct the block statement with missing open and close tokens.

    So any time I’m (ab)using this approach, I won’t actually get the same tree I started with.

  2. First of all, thanks for your great series, it really helped learn and understand Roslyn, keep on the great work, and I hope my comments will help you, so to pay back a little…

    I would recommend that you should make use as much as possible of the SyntaxFactory.ParseExpression() API call, as it will save you a lot of headaches, including this one… use the other API calls of the SyntaxFactory only when you don’t have the expression string…

Leave a reply to joshvarty Cancel reply