At Code Connect we’ve been working hard on a project that rewrites a user’s code and then compiles the rewritten solution. Without giving too much away, it essentially acts as a logger and can intercept all variable assignments.
So the following code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class MyClass | |
{ | |
void MyMethod() | |
{ | |
int x = 5; | |
} | |
} |
Is rewritten to:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class MyClass | |
{ | |
void MyMethod() | |
{ | |
int x = LogValue("x", 5); | |
} | |
public static T LogValue<T>(string name, T value) | |
{ | |
//Write to file | |
//… | |
return value; | |
} | |
} |
At some point we decided we no longer wanted to inject the LogValue
method directly into the code we were instrumenting. We would place it in an entirely different namespace: CodeConnect.Instrumentation
.
Here was how we originally created the invocation to LogValue
:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
var newNode = SyntaxFactory.InvocationExpression( | |
SyntaxFactory.IdentifierName( | |
"LogValue")) | |
.WithArgumentList( | |
//… | |
); |
So we figured we’d incorporate our new namespace as follows:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
var newNode = SyntaxFactory.InvocationExpression( | |
SyntaxFactory.IdentifierName( | |
"CodeConnect.Instrumentation.LogValue")) | |
.WithArgumentList( | |
//… | |
); |
At first glance, everything checked out. Calling .ToFullString()
on this node revealed what looked like a proper invocation:
CodeConnect.Instrumentation.LogValue("x", 5);
However, try as we may we could not get our code to compile. We would constantly errors telling us the CodeConnect.Instrumentation
type didn’t exist:
CS0103 at line 12 (character 19): The name 'CodeConnect.Instrumentation' does not exist in the current context.
We checked our references, we checked our rewritten solution and we checked the spelling at least a dozen times. Everything checked out. Eventually we did the classic “Which change broken this functionality?” walk through out Git history. (Note: If I had written more complete unit tests when I first wrote the rewriter, we would have caught this a lot quicker!)
We realized it was the change to the rewriter that introduced this problem. It took some time before we realized exactly what was wrong. Once again the ever-truthful Syntax Visualizer (now packaged with the .NET Compiler Tools SDK) came to the rescue and showed us what the syntax tree for a valid invocation of CodeConnect.Instrumentation.LogValue("x", 5)
; would look like:
After looking over the above tree, we realized we were supposed to create a chain of SimpleMemberAccessExpressions
, not a single InvocationExpression
with the name “CodeConnect.Instrumentation.LogValue”. When the binder when to bind this invocation to a symbol, it failed as no method can be declared with this name. The tree we had created was invalid and could never have been parsed out of a user’s source code.
This is a key point to understand when creating or rewriting syntax trees:
No one will stop you from creating invalid syntax trees.
Or from Kevin Pilch-Bisson on Stack Overflow:
> The Roslyn Syntax construction API does not guarantee that you can only build valid programs.
We can take this to its logical extreme and create trees that don’t make any sense. For example, we can create WhitespaceTrivia
whose value is “public void M() { }
” .
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
//Make some impossible non-whitespace trivia | |
var trivia = SyntaxFactory.SyntaxTrivia(SyntaxKind.WhitespaceTrivia, | |
@" | |
public void M() { | |
}"); | |
var node = SyntaxFactory.ClassDeclaration(" test") | |
.WithOpenBraceToken( | |
SyntaxFactory.Token(SyntaxKind.OpenBraceToken) | |
.WithTrailingTrivia(trivia) | |
); | |
Console.WriteLine(node.ToFullString()); |
The above prints the following misleading string:
class test{
public void M() {
}}
The output is misleading as it causes the reader to make assumptions about how the syntax tree must be formed. In my experience the only true arbiter of truth when it comes to syntax tree is the Roslyn Syntax Visualizer. I’d love for it to be expanded to visualize in memory trees while debugging.
The takeaways here:
- Don’t trust .ToString() or .ToFullString()
- Understand that you may accidentally generate invalid trees
- Write tests
When in doubt, use ToFullString() and send the output to the parser. Did you get the same tree you started with?
One catch (at least for me). Occasionally when I rewrite expression statements, I’d like to replace one expression statement with two.
My method for doing this is to put both statements inside a single block statement. I construct the block statement with missing open and close tokens.
So any time I’m (ab)using this approach, I won’t actually get the same tree I started with.
The Syntax Visualizer link doesn’t work
Thanks, I’ve updated the article with the new link for the .NET Compiler Platform SDK: https://visualstudiogallery.msdn.microsoft.com/2ddb7240-5249-4c8c-969e-5d05823bcb89
First of all, thanks for your great series, it really helped learn and understand Roslyn, keep on the great work, and I hope my comments will help you, so to pay back a little…
I would recommend that you should make use as much as possible of the SyntaxFactory.ParseExpression() API call, as it will save you a lot of headaches, including this one… use the other API calls of the SyntaxFactory only when you don’t have the expression string…