Syntax trees are made up of three things: Syntax Nodes, Syntax Tokens and Trivia.
The Roslyn documentation describes Syntax Nodes and Syntax Tokens as follows:
Syntax nodes are one of the primary elements of syntax trees. These nodes represent syntactic constructs such as declarations, statements, clauses, and expressions. Each category of syntax nodes is represented by a separate class derived from SyntaxNode.
Syntax tokens are the terminals of the language grammar, representing the smallest syntactic fragments of the code. They are never parents of other nodes or tokens. Syntax tokens consist of keywords, identifiers, literals, and punctuation.
While both definitions are accurate, they don’t give newcomers much insight into the difference between the two.
Let’s take a look at the following class and its Syntax Tree.
```csharp
class SimpleClass
{
    public void SimpleMethod()
    {
    }
}
```
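The tree can also be inspected from code instead of from a tool window. The sketch below assumes a .NET console project (C# 9 or later, for top-level statements) that references the Microsoft.CodeAnalysis.CSharp NuGet package. It parses the class above and records an indented outline that tags every entry as a Node or a Token, roughly what the Syntax Visualizer displays. The `Walk` helper and the `outline` list are illustrative names, not part of Roslyn.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// The class from above, parsed from a string.
const string source = @"class SimpleClass
{
    public void SimpleMethod()
    {
    }
}";

SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
SyntaxNode root = tree.GetRoot();

// Indented outline of the tree: one entry per node or token (hypothetical helper state).
var outline = new List<string>();

void Walk(SyntaxNodeOrToken item, int depth)
{
    string indent = new string(' ', depth * 2);
    if (item.IsNode)
    {
        // Nodes have children, so recurse into every child node and token.
        outline.Add($"{indent}Node  {item.Kind()}");
        foreach (SyntaxNodeOrToken child in item.AsNode()!.ChildNodesAndTokens())
            Walk(child, depth + 1);
    }
    else
    {
        // Tokens are leaves: record their kind and their source text.
        outline.Add($"{indent}Token {item.Kind()} '{item.AsToken().Text}'");
    }
}

Walk(root, 0);
foreach (string line in outline)
    Console.WriteLine(line);
```

The root is a CompilationUnit node, and the last entry is an empty EndOfFileToken that Roslyn adds to every file; both are easy to overlook when reading the tree for the first time.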
Using Roslyn’s Syntax Visualizer, we can take a peek at the syntax tree.
The Syntax Visualizer shows Syntax Nodes in blue and Syntax Tokens in green.
Syntax Nodes:
ClassDeclaration
MethodDeclaration
PredefinedType
ParameterList
Block
Syntax Tokens:
class
SimpleClass
Punctuation: { } ( )
public
void
SimpleMethod
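Both lists can be pulled straight out of the tree: `DescendantNodes()` yields only Syntax Nodes and `DescendantTokens()` yields only Syntax Tokens. This is a sketch assuming a console project with top-level statements and a reference to the Microsoft.CodeAnalysis.CSharp package. It skips the empty EndOfFileToken, and the root CompilationUnit does not appear because `DescendantNodes()` excludes the node it is called on.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

const string source = @"class SimpleClass
{
    public void SimpleMethod()
    {
    }
}";

SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

// Syntax Nodes below the root CompilationUnit, in source order.
string[] nodeKinds = root.DescendantNodes()
    .Select(node => node.Kind().ToString())
    .ToArray();

// Syntax Tokens, skipping the zero-width end-of-file token.
string[] tokenTexts = root.DescendantTokens()
    .Where(token => token.Kind() != SyntaxKind.EndOfFileToken)
    .Select(token => token.Text)
    .ToArray();

Console.WriteLine("Syntax Nodes:  " + string.Join(", ", nodeKinds));
Console.WriteLine("Syntax Tokens: " + string.Join(" ", tokenTexts));
```

For the class above, the node line lists ClassDeclaration, MethodDeclaration, PredefinedType, ParameterList and Block, and the token line prints every token from `class` through the final `}`.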
Syntax Tokens cannot be broken into simpler pieces. They are the atomic units that make up a C# program and the leaves of a syntax tree. Within a tree, a token always has a parent, and that parent is always a Syntax Node, since a Syntax Token can never be the parent of anything.
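The leaf property shows up directly in the API. The following sketch (assuming top-level statements and a reference to the Microsoft.CodeAnalysis.CSharp package) prints each token's parent node, confirms that every token in the tree has one, and checks that `SyntaxToken` is a struct rather than a `SyntaxNode` subclass. It also builds a token with `SyntaxFactory.Token` to show that a token created on its own has no parent until it is placed in a tree.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

const string source = @"class SimpleClass
{
    public void SimpleMethod()
    {
    }
}";

SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

// Every token inside the tree reports the Syntax Node that contains it.
foreach (SyntaxToken token in root.DescendantTokens())
    Console.WriteLine($"{token.Kind(),-18} parent: {token.Parent?.Kind()}");

bool everyTokenHasNodeParent = root.DescendantTokens().All(t => t.Parent != null);

// Parent is typed SyntaxNode, and SyntaxToken is a struct rather than a
// SyntaxNode subclass, so a token can never be another element's parent.
bool tokenIsValueType = typeof(SyntaxToken).IsValueType;

// A token created on its own is not attached to any tree yet.
SyntaxToken detached = SyntaxFactory.Token(SyntaxKind.ClassKeyword);
bool detachedHasNoParent = detached.Parent == null;
```

Note that `void` reports PredefinedType as its parent, not MethodDeclaration: the return type is wrapped in its own small node.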
Syntax Nodes, on the other hand, are combinations of other Syntax Nodes and Syntax Tokens. They can always be broken into smaller pieces. In my experience, you’re most interested in Syntax Nodes when trying to reason about a syntax tree.
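To see a node decompose, the sketch below lists the immediate children of `SimpleMethod`'s MethodDeclaration and then rebuilds the method's full text from its tokens alone. It assumes top-level statements and a reference to the Microsoft.CodeAnalysis.CSharp package; `children`, `rebuilt` and `roundTrips` are illustrative names.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

const string source = @"class SimpleClass
{
    public void SimpleMethod()
    {
    }
}";

SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
MethodDeclarationSyntax method = root.DescendantNodes().OfType<MethodDeclarationSyntax>().Single();

// Immediate children of the method: a mix of tokens and smaller nodes.
string[] children = method.ChildNodesAndTokens()
    .Select(child => (child.IsNode ? "Node " : "Token ") + child.Kind())
    .ToArray();
Console.WriteLine(string.Join(Environment.NewLine, children));

// A node's full text is exactly its tokens' full text (trivia included) joined together.
string rebuilt = string.Concat(method.DescendantTokens().Select(t => t.ToFullString()));
bool roundTrips = rebuilt == method.ToFullString();
```

The method's children are the `public` and `SimpleMethod` tokens plus three smaller nodes (PredefinedType, ParameterList and Block), each of which can be broken down again until only tokens remain.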
For completeness, here is the Roslyn documentation’s definition of the third building block, Syntax Trivia:
Syntax trivia represent the parts of the source text that are largely insignificant for normal understanding of the code, such as whitespace, comments, and preprocessor directives.
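Trivia is not a third kind of tree element alongside nodes and tokens; Roslyn attaches it to tokens as leading and trailing trivia. The sketch below (assuming top-level statements and a reference to the Microsoft.CodeAnalysis.CSharp package) parses a class preceded by a comment and shows that the comment is reachable through the `class` keyword's `LeadingTrivia` and through `DescendantTrivia()`, but never appears among the tokens.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

const string source = @"// A class with a comment
class SimpleClass
{
}";

SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
SyntaxToken classKeyword = root.DescendantTokens().First(t => t.Kind() == SyntaxKind.ClassKeyword);

// Trivia hangs off tokens: the comment and its line break lead the class keyword.
SyntaxKind[] leading = classKeyword.LeadingTrivia.Select(t => t.Kind()).ToArray();
SyntaxKind[] trailing = classKeyword.TrailingTrivia.Select(t => t.Kind()).ToArray();
Console.WriteLine("Leading:  " + string.Join(", ", leading));
Console.WriteLine("Trailing: " + string.Join(", ", trailing));

// The comment is visible through DescendantTrivia() but never through DescendantTokens().
bool commentIsTrivia = root.DescendantTrivia().Any(t => t.Kind() == SyntaxKind.SingleLineCommentTrivia);
bool commentIsNotToken = root.DescendantTokens().All(t => !t.Text.StartsWith("//"));

// ToString() drops a node's outer trivia; ToFullString() keeps it.
SyntaxNode classDeclaration = root.DescendantNodes().First();
bool toStringDropsComment = !classDeclaration.ToString().Contains("//")
    && classDeclaration.ToFullString().Contains("//");
```

Because trivia rides along on tokens, a tree can reproduce its source text character for character, comments and whitespace included, while code that only walks nodes and tokens can ignore it.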