Learn Roslyn Now: Part 4 CSharpSyntaxWalker

In Part 2: Analyzing Syntax Trees With LINQ, we explored different approaches to picking apart pieces of the syntax tree. Those approaches work well when you’re only interested in specific pieces of syntax (methods, classes, throw statements, etc.). They’re great for singling out certain parts of the syntax tree for further investigation.

However, sometimes you’d like to operate on all nodes and tokens within a tree. Alternatively, the order in which you visit these nodes might be important. Perhaps you’re trying to convert C# into VB.Net. Or maybe you’d like to analyze a C# file and output a static HTML file with correct colorization. Both of these programs would require us to visit all nodes and tokens within a syntax tree in the correct order.

The abstract class CSharpSyntaxWalker allows us to construct our own syntax walker that can visit all nodes, tokens and trivia. We can simply inherit from CSharpSyntaxWalker and override the Visit() method to visit all nodes within the tree.


using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public class CustomWalker : CSharpSyntaxWalker
{
    static int Tabs = 0;

    public override void Visit(SyntaxNode node)
    {
        Tabs++;
        var indents = new String('\t', Tabs);
        Console.WriteLine(indents + node.Kind());
        base.Visit(node);
        // Restore the indentation level once this node's children are done.
        Tabs--;
    }
}

public static class Program
{
    static void Main(string[] args)
    {
        var tree = CSharpSyntaxTree.ParseText(@"
            public class MyClass
            {
                public void MyMethod()
                {
                }
                public void MyMethod(int n)
                {
                }
            }");
        var walker = new CustomWalker();
        walker.Visit(tree.GetRoot());
    }
}

This short sample contains an implementation of CSharpSyntaxWalker called CustomWalker. CustomWalker overrides the Visit() method and prints the kind of the node currently being visited. It’s important to note that CustomWalker.Visit() also calls the base.Visit(SyntaxNode) method. This allows the CSharpSyntaxWalker to visit all the child nodes of the current node.

The output for this program:

[Image: output1, the node kinds printed by CustomWalker, indented by depth]

We can clearly see the various nodes of the syntax tree and their relationship with one another. There are two sibling MethodDeclarations that share the same parent ClassDeclaration.

The above example only visits the nodes of a syntax tree, but we can modify CustomWalker to visit tokens and trivia as well. The abstract class CSharpSyntaxWalker has a constructor that allows us to specify the depth to which we want to walk.

We can modify the above sample to print out the nodes and their corresponding tokens at each depth of the syntax tree.


using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public class DeeperWalker : CSharpSyntaxWalker
{
    static int Tabs = 0;

    //NOTE: Make sure you invoke the base constructor with
    //the correct SyntaxWalkerDepth. Otherwise VisitToken()
    //will never get run.
    public DeeperWalker() : base(SyntaxWalkerDepth.Token)
    {
    }

    public override void Visit(SyntaxNode node)
    {
        Tabs++;
        var indents = new String('\t', Tabs);
        Console.WriteLine(indents + node.Kind());
        base.Visit(node);
        // Restore the indentation level once this node's children are done.
        Tabs--;
    }

    public override void VisitToken(SyntaxToken token)
    {
        var indents = new String('\t', Tabs);
        Console.WriteLine(indents + token);
        base.VisitToken(token);
    }
}

Note: It’s important to pass the appropriate SyntaxWalkerDepth argument to the CSharpSyntaxWalker constructor. Otherwise, the overridden VisitToken() method is never called. Personally, I don’t think the constructor’s argument should be optional. When I was learning how to use this class, it was unclear to me that the most conservative depth, SyntaxWalkerDepth.Node, is walked by default.

The output when we use this CSharpSyntaxWalker:

[Image: output2, the node kinds and their tokens printed by DeeperWalker, indented by depth]

The previous sample and this one share the same syntax tree. The output contains the same syntax nodes, but we’ve added the corresponding syntax tokens for each node.
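
The same constructor argument reaches trivia as well. As a minimal sketch (CommentWalker and its Comments list are hypothetical names, not part of the samples above), a walker constructed with SyntaxWalkerDepth.Trivia can override VisitTrivia() and pick out the comments in a file:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical walker for illustration: collects comment trivia in source order.
public class CommentWalker : CSharpSyntaxWalker
{
    public List<string> Comments { get; } = new List<string>();

    // Trivia is only visited when the base constructor receives
    // SyntaxWalkerDepth.Trivia (or a deeper setting).
    public CommentWalker() : base(SyntaxWalkerDepth.Trivia)
    {
    }

    public override void VisitTrivia(SyntaxTrivia trivia)
    {
        // Whitespace and end-of-line trivia also arrive here, so filter by kind.
        if (trivia.IsKind(SyntaxKind.SingleLineCommentTrivia) ||
            trivia.IsKind(SyntaxKind.MultiLineCommentTrivia))
        {
            Comments.Add(trivia.ToString());
        }
        base.VisitTrivia(trivia);
    }
}

public static class Program
{
    public static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(@"
            // A comment above the class
            public class MyClass
            {
                /* A comment inside the class */
                public void MyMethod()
                {
                }
            }");
        var walker = new CommentWalker();
        walker.Visit(tree.GetRoot());
        foreach (var comment in walker.Comments)
        {
            Console.WriteLine(comment);
        }
    }
}
```

Run against this input, the sketch should print the two comments in the order they appear in the source. Collecting into a list instead of writing straight to the console is just a choice made here so the results can be inspected afterwards.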

In the above examples, we’ve visited all nodes and all tokens within a syntax tree. However, sometimes we’d only like to visit certain nodes, but in the predefined order that the CSharpSyntaxWalker provides. Thankfully the API allows us to filter the nodes we’d like to visit based on their syntax.

Instead of visiting all nodes as we did in previous samples, the following only visits ClassDeclarationSyntax and MethodDeclarationSyntax nodes. It’s extremely simple, just printing out the concatenation of the class’ name with the method’s name.


using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public class ClassMethodWalker : CSharpSyntaxWalker
{
    string className = String.Empty;

    public override void VisitClassDeclaration(ClassDeclarationSyntax node)
    {
        // Remember the enclosing class before descending into its members.
        className = node.Identifier.ToString();
        base.VisitClassDeclaration(node);
    }

    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        string methodName = node.Identifier.ToString();
        Console.WriteLine(className + '.' + methodName);
        base.VisitMethodDeclaration(node);
    }
}

public static class Program
{
    static void Main(string[] args)
    {
        var tree = CSharpSyntaxTree.ParseText(@"
            public class MyClass
            {
                public void MyMethod()
                {
                }
            }
            public class MyOtherClass
            {
                public void MyMethod(int n)
                {
                }
            }
            ");
        var walker = new ClassMethodWalker();
        walker.Visit(tree.GetRoot());
    }
}

This sample simply outputs:
MyClass.MyMethod
MyOtherClass.MyMethod
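
The same kind of override exists for other syntax node types. As a rough sketch (InvocationWalker and its Calls list are hypothetical names chosen for illustration), overriding VisitInvocationExpression() is enough to list every method call together with the text of its arguments, using syntax alone:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical walker for illustration: records each call and its argument text.
public class InvocationWalker : CSharpSyntaxWalker
{
    public List<string> Calls { get; } = new List<string>();

    public override void VisitInvocationExpression(InvocationExpressionSyntax node)
    {
        // Each ArgumentSyntax wraps the expression that was passed in.
        var arguments = node.ArgumentList.Arguments
            .Select(argument => argument.Expression.ToString());
        Calls.Add(node.Expression.ToString() + "(" + string.Join(", ", arguments) + ")");

        // Keep walking so that calls nested inside the arguments are found too.
        base.VisitInvocationExpression(node);
    }
}

public static class Program
{
    public static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(@"
            public class MyClass
            {
                public int Sum(int x, int y) { return x + y; }
                public void MyMethod()
                {
                    var a = Sum(2, Sum(3, 4));
                }
            }");
        var walker = new InvocationWalker();
        walker.Visit(tree.GetRoot());
        foreach (var call in walker.Calls)
        {
            Console.WriteLine(call);
        }
    }
}
```

Because the call is recorded before base.VisitInvocationExpression() runs, the outer call is listed ahead of the one nested in its arguments. Note that this only sees the text of the arguments; working out what a name refers to would still need the semantic model.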

The CSharpSyntaxWalker is a really great API for analyzing syntax trees. It allows one to accomplish a lot without resorting to the semantic model and forcing a (possibly) expensive compilation. Whenever you need to inspect a syntax tree and the order of traversal matters, the CSharpSyntaxWalker is usually what you’re looking for.

9 thoughts on “Learn Roslyn Now: Part 4 CSharpSyntaxWalker”

  1. I need to create a syntax tree for C rather than C# for an embedded system.
    Can I just use a class containing unsafe code to create the syntax tree?
    It will then be used for a chip design that executes C statements without the overhead of
    compiling to an ISA to run on a CPU.

    1. Not really. Roslyn is C# specific so there are bound to be C statements and expressions that you are unable to create with Roslyn. There are also many, many C# syntax trees that are not valid C trees.

      I suspect something like Clang is what you’re looking for: http://clang.llvm.org/

  2. Thanks Josh, it boils down to:
    I need the if/else, for, while, do blocks and assignments with operator precedence.
    Here’s what I am doing:
    1) Create a main() function containing my source code.
    2) Compile, run and debug.
    3) Compile text to build the AST.
    4) Use SyntaxWalker to get statement nodes.
    5) Use SyntaxWalker to monitor nodes and when a SimpleAssignmentExpression appears, the BinaryExpression sequence is the evaluation order according to operator precedence. Or I can use the tokens to convert infix to postfix RPN and use postfix evaluation.

    It is very hard to find AST operator precedence, but when I discovered that the AST itself is created based on the shunting yard algorithm then reverse engineering was straightforward.

    Next I will format the content for a pre-configured FPGA that has 3 embedded block memories(control, stack, and operands/data), an ALU, a multiplier, and some counters.

    The execution speed is close to a custom FPGA design, but it can be re-loaded quicker.

    I am not implementing a full C compiler, but rather a kind of computer that does C statements, calls, and assignments very efficiently.

    Thanks for the helpful Learn Roslyn Now: Syntax articles.

  3. Hey!!
    I am super pumped with your tutorials.
    I have a case where I need to reach the arguments of a method call like
    a = Sum(2,3);
    I have to reach the 2 and the 3 and check whether they are declared by the user, or improvise a way to diagnose it.
    Could you help me out?
    Are there any methods for that already?

  4. Great tutorials! One suggestion, Tabs++ should appear after the node kind is written, immediately before base.Visit(node), so that tokens are properly indented under their parent nodes. The root node therefore will (correctly) have no indentation.
