LRN: Quick Tips – Working with nameof

The nameof operator has gone through five iterations as the Roslyn team worked to nail down its syntax and semantics. Now that the design of the nameof operator has been finalized, we can look at some simple examples.

Within C#, nameof is a contextual keyword. This means that, at the syntax level, there is no way to distinguish the nameof keyword from a call to a method that happens to be named nameof.

Lucian Wischik elaborates:

In C#, nameof is stored in a normal InvocationExpressionSyntax node with a single argument. That is because in C# ‘nameof’ is a contextual keyword, which will only become the “nameof” operator if it doesn’t already bind to a programmatic symbol named “nameof”

Identifying nameof Expressions

This means we can only identify nameof expressions at the semantic level. We do so by finding all invocations of “nameof” that do not bind to any symbol. These invocations must also be standalone (i.e., not part of a member access like MyClass.nameof()).
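Here’s a minimal sketch of that check. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package and a throwaway single-tree compilation. NameofFinder and FindNameofInvocations are hypothetical names used only for illustration. The two filters mirror the two rules above: the target must be a bare nameof identifier, and it must bind to no symbol and no candidates.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class NameofFinder
{
    // Returns the invocations in `source` that the compiler treats as the
    // nameof operator rather than as a call to a method named "nameof".
    public static List<InvocationExpressionSyntax> FindNameofInvocations(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "NameofDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);

        return tree.GetRoot()
            .DescendantNodes()
            .OfType<InvocationExpressionSyntax>()
            // Syntax check: a bare `nameof(...)`, not `MyClass.nameof(...)`.
            .Where(invocation => invocation.Expression is IdentifierNameSyntax name
                                 && name.Identifier.ValueText == "nameof")
            // Semantic check: nothing named nameof was bound, not even as a candidate.
            .Where(invocation =>
            {
                SymbolInfo info = model.GetSymbolInfo(invocation);
                return info.Symbol == null && info.CandidateSymbols.IsEmpty;
            })
            .ToList();
    }
}
```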

As with all contextual keywords, it’s a bit of a pain to work with, but that’s the price we pay for backward compatibility.

Creating a nameof expression

Previous versions of the Roslyn API allowed for direct creation of a NameOfExpressionSyntax. Now we must create an InvocationExpressionSyntax with the identifier “nameof”.

For example, we can generate the following nameof expression:

string result = nameof(result);

Side Note: As a mere mortal, I have no idea how to work with the SyntaxFactory API. I generated the above code with Kirill Osenkov’s fantastic Roslyn Quoter tool.

The important takeaway is that (at the syntax level) nameof expressions are no different from regular invocations.
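To make that concrete, here’s a hand-written sketch that builds the same statement with SyntaxFactory. It is not Roslyn Quoter output. The nameof part is just InvocationExpression(IdentifierName("nameof")) with an ordinary argument list. NameofFactory and CreateNameofDeclaration are hypothetical names, and the code assumes the Microsoft.CodeAnalysis.CSharp package.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

public static class NameofFactory
{
    // Builds `string <name> = nameof(<name>);`
    public static LocalDeclarationStatementSyntax CreateNameofDeclaration(string variableName)
    {
        // At the syntax level, nameof(...) is an ordinary invocation whose
        // target happens to be the identifier "nameof".
        InvocationExpressionSyntax nameofCall =
            InvocationExpression(IdentifierName("nameof"))
                .WithArgumentList(ArgumentList(SingletonSeparatedList(
                    Argument(IdentifierName(variableName)))));

        return LocalDeclarationStatement(
                VariableDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)))
                    .WithVariables(SingletonSeparatedList(
                        VariableDeclarator(Identifier(variableName))
                            .WithInitializer(EqualsValueClause(nameofCall)))))
            // Insert the spaces a human would write ("string result = ...").
            .NormalizeWhitespace();
    }
}
```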

Learn Roslyn Now: Part 8 Data Flow Analysis

Writing this blog post has been really painful. It’s been three months since I last published my introduction to the semantic model and I’ve been putting off this post for as long as I could. I started a new series called Learn Roslyn Now Quick Tips, I helped build Source Browser, and I even submitted a small pull request to clean up the analysis APIs. Basically, I’ve done everything but learn and write about these APIs.

The two reasons I’ve struggled to write about AnalyzeControlFlow and AnalyzeDataFlow are:

  1. I’ve struggled to imagine how one would use them in an analyzer or extension.
  2. They’re weird, unintuitive and they frighten me.

I put out a tweet asking how others were using them, and it appears they’re only really used within Microsoft to implement the “Extract Method” functionality. A handful of questions on Stack Overflow have mentioned these APIs, so I’m sure someone out there is putting them to good use.

Data Flow Analysis

This API can be used to inspect how variables are read and written within a given block of code. Perhaps you’d like to make a Visual Studio extension that captures and logs all assignments to a certain variable. You could use the data flow analysis API to find the statements, and a rewriter to log them.

To demonstrate the capabilities of this API, we’ll be looking at a modified piece of code posted on Stack Overflow. I’ve cleaned it up slightly, but it shows a number of interesting behaviors consumers of this API should be aware of.

We can analyze the for-loop in the following code:
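The original snippet isn’t reproduced here. The sketch below uses a hypothetical method shaped to match the variables discussed in this post (outerArray, index, innerArray). It assumes the Microsoft.CodeAnalysis.CSharp package, and ForLoopDataFlow and its members are illustrative names. The code parses the source, builds a compilation, and hands the for statement to SemanticModel.AnalyzeDataFlow().

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ForLoopDataFlow
{
    // Hypothetical sample shaped to match the variables discussed in this post.
    public const string Sample = @"
class C
{
    public void Run()
    {
        int[] outerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
        for (int index = 0; index < 10; index++)
        {
            int[] innerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
            index = index + 2;
            outerArray[index - 1] = 5;
        }
    }
}";

    // Analyzes the first for statement found in `source`.
    public static DataFlowAnalysis AnalyzeFirstForLoop(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "DataFlowDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);

        ForStatementSyntax forLoop = tree.GetRoot()
            .DescendantNodes()
            .OfType<ForStatementSyntax>()
            .First();

        // The region of interest is the whole for statement.
        return model.AnalyzeDataFlow(forLoop);
    }

    // Formats one of the symbol lists, e.g. "index, outerArray".
    public static string Names(ImmutableArray<ISymbol> symbols) =>
        string.Join(", ", symbols.Select(symbol => symbol.Name));
}
```

Printing ForLoopDataFlow.Names(analysis.ReadInside) and its siblings should give the kinds of lists discussed below. I wouldn’t rely on the order of the symbols within each list.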

At this point we’ve got access to a DataFlowAnalysis object.

Perhaps the most important property on this object is Succeeded. This tells you if the data flow analysis completed successfully. In my experience the API has been pretty good at dealing with semantically invalid code. Neither invocations to missing methods nor use of undeclared variables seemed to trip it up. The documentation notes that if the analyzed region does not span a single expression or statement then analysis is likely to fail.

The DataFlowAnalysis object exposes a pretty rich API for users to consume. It exposes information about unsafe addresses, local variables captured by anonymous methods and much more.

In our case, we’re interested in the following properties:

To refresh, the code we’ve analyzed is displayed below. The region we’ve declared interest in is the for-loop.

The results from analysis are as follows:

AlwaysAssigned: index
index is always assigned because it is declared in the initializer of the for-loop, which runs unconditionally.

WrittenInside: index, innerArray
Both index and innerArray are clearly written within the loop.

One important point is that outerArray is not written inside the loop. While we’re mutating the array, we’re not mutating the reference stored in the outerArray variable, so it does not show up in this list.

WrittenOutside: outerArray, this
outerArray is clearly written to outside of the for-loop.

However, it surprised me that “this” showed up as a parameter symbol within the WrittenOutside list. It appears that “this” is treated as an implicit parameter of the instance method. Like any other parameter, it is considered written on entry to the method, which is outside the analyzed region. This appears to be by design, although I suspect most consumers of this API will be surprised, and likely ignore this value.

ReadInside: index, outerArray
It is clear that the value of index is read within the loop.

It was surprising to me that outerArray is considered to be “read” inside the loop as we’re not reading its value directly. I suppose that technically we must first read the value of outerArray in order to calculate the offset and retrieve the correct address for the given element of the array. So we’re performing a sort of “implicit read” inside the loop here.

VariablesDeclared: index, innerArray
This is fairly straightforward. index is declared within the loop initializer and innerArray within the body of the for-loop.

Final Thoughts

The general weirdness of the data flow analysis API has long kept me from writing about it. The issues with “this” and with what’s considered a read vs. a write are pretty off-putting to me. I suspect these kinds of issues will prevent a lot of people from taking advantage of this API, but I could be wrong. It’s difficult to say this early in the game, and I have not seen very much discussion about this API and the above problems.

 

LRN: Quick Tips – Fields and Symbols

One recurring problem I’ve seen people run into with Roslyn is working with fields and symbols. Consider the following:

The above program consists of a ClassDeclarationSyntax with child FieldDeclarationSyntax, PropertyDeclarationSyntax and MethodDeclarationSyntax nodes.

In previous blog posts, we discussed how we could use SemanticModel.GetDeclaredSymbol(SyntaxNode) to retrieve the symbol for pieces of declaration syntax. So it seems reasonable to expect that we can get the symbols for our field, property and method with the same approach.

Typically one would try the following:
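Here’s a sketch of that naive attempt. It assumes the Microsoft.CodeAnalysis.CSharp package and a hypothetical MyClass with one field, one property and one method. DeclaredSymbolDemo is likewise an illustrative name.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class DeclaredSymbolDemo
{
    // Hypothetical sample: one field, one property, one method.
    public const string Sample = @"
class MyClass
{
    int myField = 0;
    int MyProperty { get; set; }
    void MyMethod() { }
}";

    public static (ISymbol Field, ISymbol Property, ISymbol Method) GetSymbols(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "FieldDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);
        SyntaxNode root = tree.GetRoot();

        var field = root.DescendantNodes().OfType<FieldDeclarationSyntax>().Single();
        var property = root.DescendantNodes().OfType<PropertyDeclarationSyntax>().Single();
        var method = root.DescendantNodes().OfType<MethodDeclarationSyntax>().Single();

        return (model.GetDeclaredSymbol(field),     // null for a field declaration
                model.GetDeclaredSymbol(property),  // the IPropertySymbol
                model.GetDeclaredSymbol(method));   // the IMethodSymbol
    }
}
```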

However, there’s a problem here. fieldSymbol is null! Our approach worked for methods and properties, but didn’t for fields. The reason for this is actually quite simple:

A single field declaration can declare multiple symbols.

For example:

This is even clearer when we look at the syntax tree (I’ve omitted tokens and trivia).


What symbol could be returned for the above FieldDeclarationSyntax? In order to access these symbols we instead look at the individual variables within the field as shown below:
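Here’s a minimal sketch of that per-declarator approach, using the same kind of single-tree compilation as before. FieldSymbolDemo and GetFieldSymbols are hypothetical names. For a declaration such as int x, y, z; it yields three IFieldSymbols, one per declarator.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class FieldSymbolDemo
{
    public static List<IFieldSymbol> GetFieldSymbols(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "FieldDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);

        var symbols = new List<IFieldSymbol>();
        foreach (FieldDeclarationSyntax field in tree.GetRoot().DescendantNodes().OfType<FieldDeclarationSyntax>())
        {
            // Ask about each declarator (x, y, z) rather than the whole declaration.
            foreach (VariableDeclaratorSyntax variable in field.Declaration.Variables)
            {
                if (model.GetDeclaredSymbol(variable) is IFieldSymbol fieldSymbol)
                {
                    symbols.Add(fieldSymbol);
                }
            }
        }
        return symbols;
    }
}
```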

It turns out fields are not the only “special syntax” that cannot be converted into a symbol. If you’re interested, you can see them all in the Roslyn Reference Source. They are:

  • Global Statements – Global statements don’t declare anything, even though they inherit from MemberDeclarationSyntax.
  • IncompleteMembers – Incomplete members don’t declare any symbols.
  • Event Field Declaration – Can contain multiple variable declarators. GetDeclaredSymbol should be called on them (the declarators) directly.
  • Field Declaration – Can contain multiple variable declarators. GetDeclaredSymbol should be called on them (the declarators) directly.

This bit me in August and I submitted an issue to the Roslyn team about it. I originally thought an exception should be thrown in these cases, but I’ve since changed my mind. Instead, I think there needs to be clearer documentation on the GetDeclaredSymbol() function. It might also be appropriate for someone to create an analyzer that detects when people do this and warns them.

LRN: Quick Tips – Working with Regions

Structured trivia are an interesting corner case within Roslyn. Take a look at the following syntax tree representing a simple program with regions.


Typically, pieces of trivia are the leaves of a Roslyn syntax tree. However, in the above program we can see that a RegionDirectiveTrivia has a child RegionDirectiveTriviaSyntax node.

Well, why don’t we try getting access to these syntax nodes? Originally, I tried the following:
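One attempt along these lines is sketched below. NaiveRegionSearch and its sample are hypothetical, and the code assumes the Microsoft.CodeAnalysis.CSharp package. It simply asks for all descendant nodes of type RegionDirectiveTriviaSyntax.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class NaiveRegionSearch
{
    // Hypothetical sample containing a single #region.
    public const string Sample = @"
class C
{
    #region Fields
    int x;
    #endregion
}";

    // Uses DescendantNodes() with its default arguments.
    public static int CountRegions(string source) =>
        CSharpSyntaxTree.ParseText(source)
            .GetRoot()
            .DescendantNodes()
            .OfType<RegionDirectiveTriviaSyntax>()
            .Count();
}
```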

Oddly enough (to me, at least), neither of these approaches worked: both returned nothing, and neither threw or indicated any error on my part. Thankfully @Pilchie pointed out that DescendantNodes has a couple of optional parameters.

  • Func<SyntaxNode, bool> descendIntoChildren – A function that decides whether or not we should descend into the children of a given node. We could use this to avoid descending into nodes we know we don’t care about.
  • bool descendIntoTrivia – A simple boolean that signals if we’d like to descend into the children of trivia when searching for nodes.

In our case, we’d like to search for syntax nodes within trivia, so we’ll signal that.
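Here’s a sketch of the fix on a hypothetical sample of the same shape (RegionSearch is an illustrative name). Passing descendIntoTrivia: true lets DescendantNodes() walk into structured trivia, so the RegionDirectiveTriviaSyntax node is found.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class RegionSearch
{
    // Hypothetical sample containing a single #region.
    public const string Sample = @"
class C
{
    #region Fields
    int x;
    #endregion
}";

    public static List<RegionDirectiveTriviaSyntax> FindRegions(string source) =>
        CSharpSyntaxTree.ParseText(source)
            .GetRoot()
            // descendIntoTrivia: true also visits the nodes inside structured
            // trivia, such as the #region directive.
            .DescendantNodes(descendIntoTrivia: true)
            .OfType<RegionDirectiveTriviaSyntax>()
            .ToList();
}
```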

This time, everything works as we’d expect and we get access to our RegionDirectiveTriviaSyntax node.

So why does Roslyn avoid descending into trivia by default? My guess is performance. Most consumers of Roslyn will be looking for nodes such as methods, properties, and fields. None of these are contained within syntax trivia so it would be a waste of CPU cycles to consider their children.

This is something to be aware of when working with structured trivia and something that’s not immediately obvious.

Introducing Source Browser

I’m happy to announce the release of a project we’ve been working on at Code Connect called Source Browser.

Source Browser is based on Kirill Osenkov’s Reference Source tool. The basic idea is that we can take C# files and generate static HTML that is linkable and searchable.

Want to see what that method invocation does? Just click it.
Want to see where that property was defined? Just click it.

It’s a simple concept, but it makes it an absolute joy to explore third party libraries your software depends upon. At Code Connect we use the Roslyn Reference Source on a daily basis.

Source Browser currently supports open source C# and VB.NET projects hosted on GitHub. Just provide the link to your GitHub repository to get started.

Further Development

We built Source Browser with more than C# and VB.NET in mind. We’re hoping to add support for other statically typed languages such as F# and C++. We’ve worked on an intermediate model that we believe most statically typed languages can, in principle, fit into. This means that once a source file for a given language is converted to this model, the rest of Source Browser is language agnostic. We’re hoping individuals will work with us to create “converters” for other languages.

Special Thanks

We’re especially grateful to the following individuals who helped commit code to Source Browser. Their advice and guidance have helped shape this tool.

Code Connect for Visual Studio 2015

I’m proud to announce that Code Connect now supports Visual Studio 14 CTP 4, the preview of Visual Studio 2015.

Download for VS 2015

With this release comes a number of improvements:

  • Unlimited project size
    We’ve used Code Connect on projects spanning millions of lines of code
  • Reduced loading time
    Code Connect starts in just seconds
  • A responsive and clean user-interface
    Hold <ALT> to highlight the relationships between nodes

If you’re using Visual Studio 2013, we haven’t forgotten about you! You can enjoy the above improvements in Code Connect for Visual Studio 2013 as well!

 

Learn Roslyn Now: Part 7 Introducing the Semantic Model

Up until this point we’ve been working with C# code on a purely syntactical level. We can find property declarations, but we can’t track down references to this property within our source code. We can identify invocations, but we can’t tell what’s being invoked. And God help us if we want to try to solve the really hard problems like overload resolution.

In this developer’s opinion, the semantic layer is where the power of Roslyn really shines. Roslyn’s semantic model can answer all the hard compile-time questions we might have. However, this power comes at a cost. Querying the semantic model is typically more expensive than querying syntax trees. This is because requesting a semantic model often triggers a compilation.

There are 3 different ways to request the semantic model:

1. Document.GetSemanticModelAsync()
2. Compilation.GetSemanticModel(SyntaxTree)
3. Various Diagnostic AnalysisContexts including CodeBlockStartAnalysisContext.SemanticModel and SemanticModelAnalysisContext.SemanticModel

To avoid the boilerplate involved in setting up our own Workspace, we’ll simply create compilations for individual syntax trees as follows:
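Here’s a minimal sketch of that setup, assuming the Microsoft.CodeAnalysis.CSharp package. Referencing the assembly that contains System.Object gives the compilation a core library, so predefined types such as int resolve. SemanticModelFactory is a hypothetical name.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class SemanticModelFactory
{
    public static SemanticModel CreateModel(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // The core library (mscorlib or System.Private.CoreLib) defines int, string, etc.
        MetadataReference corlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);

        CSharpCompilation compilation = CSharpCompilation.Create(
            "MyCompilation",
            syntaxTrees: new[] { tree },
            references: new[] { corlib });

        // Requesting the model is what does the (comparatively expensive) binding work.
        return compilation.GetSemanticModel(tree);
    }
}
```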

Symbols

Before continuing, it’s worth taking a moment to discuss Symbols.

C# programs are composed of unique elements, such as types, methods, properties and so on. Symbols represent almost everything the compiler knows about each of these unique elements.

At a high level, every symbol contains information about:

  • Where this element is declared in source or metadata (it may have come from an external assembly)
  • What namespace and type this symbol exists within
  • Various truths about the symbol being abstract, static, sealed etc.
  • More information may be found in ISymbol.

Other, more context-dependent information may also be uncovered. When dealing with methods, IMethodSymbol allows us to determine:

  • Whether the method hides a base method.
  • The symbol representing the return type of the method.
  • The extension method from which this symbol was reduced.

Requesting Symbols

The semantic model is our bridge between the world of syntax and the world of symbols.

SemanticModel.GetDeclaredSymbol() accepts declaration syntax and provides the corresponding symbol.

SemanticModel.GetSymbolInfo() accepts expression syntax (e.g., InvocationExpressionSyntax) and returns a SymbolInfo describing the symbol the expression binds to. If the model could not successfully resolve a symbol, the SymbolInfo provides candidate symbols, which can serve as best guesses.

Below, we retrieve the symbol for a method via its declaration syntax. We then retrieve the same symbol via an invocation (InvocationExpressionSyntax) instead.
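Here’s a sketch of both lookups against a hypothetical two-method class (SymbolLookupDemo is an illustrative name). It assumes the Microsoft.CodeAnalysis.CSharp package. Falling back to CandidateSymbols is one way to use the best guesses mentioned above. Whether that’s appropriate depends on your scenario.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SymbolLookupDemo
{
    // Hypothetical sample: Method is declared once and invoked once.
    public const string Sample = @"
class C
{
    void Method() { }
    void Caller() { Method(); }
}";

    public static (IMethodSymbol Declared, ISymbol Invoked) GetBoth(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "SymbolDemo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);
        SyntaxNode root = tree.GetRoot();

        MethodDeclarationSyntax declaration = root.DescendantNodes().OfType<MethodDeclarationSyntax>().First();
        InvocationExpressionSyntax invocation = root.DescendantNodes().OfType<InvocationExpressionSyntax>().First();

        // Declaration syntax -> symbol.
        IMethodSymbol declared = model.GetDeclaredSymbol(declaration);

        // Expression syntax -> SymbolInfo; Symbol is null if binding failed.
        SymbolInfo info = model.GetSymbolInfo(invocation);
        ISymbol invoked = info.Symbol ?? info.CandidateSymbols.FirstOrDefault();

        return (declared, invoked);
    }
}
```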

Note on performance:

The documentation for SemanticModel notes the following:

An instance of SemanticModel caches local symbols and semantic information. Thus, it is much more efficient to use a single instance of SemanticModel when asking multiple questions about a syntax tree, because information from the first question may be reused. This also means that holding onto an instance of SemanticModel for a long time may keep a significant amount of memory from being garbage collected.

Essentially, Roslyn is allowing you to make the tradeoff between memory and computation. When querying the semantic model repeatedly, it may be in your best interest to keep an instance of it around, instead of requesting a new model from a compilation or document.

Next Time

We’ve only scratched the surface of the Semantic Model. Next time we’ll take a look at the control and data flow analysis APIs.

Learn Roslyn Now: Part 6 Working with Workspaces

Special thanks to @JasonMalinowski for his help clarifying some of the subtleties of the workspace API.

Until this point, we’ve simply been constructing syntax trees from strings. This approach works well when creating short samples, but often we’d like to work with entire solutions. Enter: Workspaces.

Workspaces are the root node of a C# hierarchy that consists of a solution, child projects and child documents. A fundamental tenet within Roslyn is that most objects are immutable. This means we can’t hold on to a reference to a solution and expect it to be up-to-date forever. The moment a change is made, this solution will be out of date and a new, updated solution will have been created.

Workspaces are our root node. Unlike solutions, projects and documents, they won’t become invalid and always contain a reference to the current, most up-to-date solution. There are four Workspace variants to consider:

Workspace

The abstract base class for all other workspaces. It’s a little disingenuous to claim that it’s a workspace variant, as you’ll never actually have an instance of it. Instead, this class serves as a sort of API around which actual workspace implementations can be created. It can be tempting to think of workspaces solely within the context of Visual Studio. After all, for most C# developers this is the only way we’ve dealt with solutions and projects. However, Workspace is meant to be agnostic as to the physical source of the files it represents. Individual implementations might store the files on the local filesystem, within a database, or even on a remote machine. One simply inherits from this class and overrides Workspace’s empty implementations as they see fit.

MSBuildWorkspace

A workspace that has been built to handle MSBuild solution (.sln) and project (.csproj, .vbproj) files. Unfortunately it cannot currently write to .sln files, which means we can’t use it to add projects or create new solutions.

The following example shows how we can iterate over all the documents in a solution:
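Here’s a sketch that assumes the Microsoft.CodeAnalysis.Workspaces.MSBuild package. It also assumes MSBuild has already been located: recent versions typically want MSBuildLocator.RegisterDefaults() from the Microsoft.Build.Locator package to run first. The document enumeration lives in its own method, so it works on any Solution, whichever workspace produced it. SolutionWalker and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.MSBuild;

public static class SolutionWalker
{
    // Flattens solution -> projects -> documents into document names.
    public static IEnumerable<string> DocumentNames(Solution solution) =>
        solution.Projects
                .SelectMany(project => project.Documents)
                .Select(document => document.Name);

    // Opens a .sln from disk and prints every document in it.
    // Assumes MSBuild has been located before this is called.
    public static async Task PrintDocumentsAsync(string solutionPath)
    {
        using var workspace = MSBuildWorkspace.Create();
        Solution solution = await workspace.OpenSolutionAsync(solutionPath);
        foreach (string name in DocumentNames(solution))
        {
            Console.WriteLine(name);
        }
    }
}
```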

For more information see Learn Roslyn Now – E06 – MSBuildWorkspace.

AdhocWorkspace

A workspace that allows one to add solution and project files manually. One should note that the API for adding and removing solution items is different within AdhocWorkspace when compared to the other workspaces. Instead of calling TryApplyChanges(), methods for adding projects and documents are provided at the workspace level. This workspace is meant to be consumed by those who just need a quick and easy way to create a workspace and add projects and documents to it.
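Here’s a minimal sketch of that workflow. It assumes the Microsoft.CodeAnalysis.CSharp.Workspaces package, which supplies the C# language services AdhocWorkspace needs. AdhocDemo and the project and file names are hypothetical.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

public static class AdhocDemo
{
    public static Solution BuildSolution()
    {
        var workspace = new AdhocWorkspace();

        // Projects and documents are added through the workspace itself,
        // rather than by editing a Solution and calling TryApplyChanges().
        Project project = workspace.AddProject("DemoProject", LanguageNames.CSharp);
        workspace.AddDocument(project.Id, "Program.cs", SourceText.From("class Program { }"));

        // CurrentSolution reflects everything added above.
        return workspace.CurrentSolution;
    }
}
```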

For more information see Learn Roslyn Now – E08 – AdhocWorkspace.

VisualStudioWorkspace

The active workspace consumed within Visual Studio packages. As this workspace is tightly integrated with Visual Studio, it’s difficult to provide a small example on how to use this workspace. Steps:

  1. Create a new VSPackage.
  2. Add a reference to the Microsoft.VisualStudio.LanguageServices.dll. It’s now available on NuGet.
  3. Navigate to the <VSPackageName>Package.cs file (where <VSPackageName> is the name you chose for your solution).
  4. Find the Initialize() method.
  5. Place the following code within Initialize():

When writing VSPackages, one of the most useful pieces of functionality exposed by the workspace is the WorkspaceChanged event. This event allows our VSPackage to respond to any changes made by the user or any other VSPackage. Naturally, the best way to familiarize oneself with workspaces is to use them. Roslyn’s immutability can impose a slight learning curve so we’ll be exploring how to modify documents and projects in future posts.

For more information see Learn Roslyn Now – E07 – VisualStudioWorkspace.

Code Connect Alpha Announced

We’re pleased to announce September 2, 2014 as the release date of the Code Connect Alpha.

We’ve released a video that covers some of the features you’ll see in Code Connect next week.

It’s important to stress that Code Connect is far from complete at this point. There remains a lot of work to be done when working with large solutions and undoubtedly many of you will uncover bugs.

In particular, Code Connect currently struggles with large solutions. Our current implementation naively pre-loads the entire solution, which can take considerable time when dealing with projects consisting of more than 200 C# files.

There also remains a lot of work to be done when it comes to the user interface and overall polish. You will continue to see improvements in this area during future releases of Code Connect.

One other point of note: Code Connect requires a copy of Visual Studio 2013 running Microsoft’s Roslyn compiler. We’ve compiled step-by-step instructions for installing Roslyn at: joshvarty.wordpress.com/2014/07/06/learn-roslyn-now-part-1-installing-roslyn/

As Microsoft’s Roslyn compiler has not yet reached a stable release, we would recommend against using it in a production environment.

We’re excited to bring the Code Connect experience to life and we’re looking forward to hearing your feedback.

Learn Roslyn Now: Part 5 CSharpSyntaxRewriter

In Part 4, we discussed the abstract CSharpSyntaxWalker and how we could navigate the syntax tree with the visitor pattern. Today, we go one step further with the CSharpSyntaxRewriter, and “modify” the syntax tree as we traverse it. It’s important to note that we’re not actually mutating the original syntax tree, as Roslyn’s syntax trees are immutable. Instead, the CSharpSyntaxRewriter creates a new syntax tree resulting from our changes.

The CSharpSyntaxRewriter can visit all nodes, tokens or trivia within a syntax tree. Like the CSharpSyntaxVisitor, we can selectively choose what pieces of syntax we’d like to visit. We do this by overriding various methods and returning one of the following:

  • The original, unchanged node, token or trivia.
  • Null, signalling the node, token or trivia is to be removed.
  • A new syntax node, token or trivia.

As with most APIs, the CSharpSyntaxRewriter is best understood through examples. A recent question on Stack Overflow asked, “How can I remove redundant semicolons in code with SyntaxRewriter?”

Roslyn treats all redundant semicolons as part of an EmptyStatementSyntax node. Below, we demonstrate how to solve the base case: an unnecessary semicolon on a line of its own.
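Here’s a minimal sketch of the base case (EmptyStatementRemoval is a hypothetical name). Overriding VisitEmptyStatement and returning null removes the node. This is only safe where the empty statement sits in a statement list, such as a block. An empty statement that forms the entire body of an if or while can’t just be dropped this way.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public class EmptyStatementRemoval : CSharpSyntaxRewriter
{
    // Returning null asks the rewriter to remove the node from its list.
    // Any trivia attached to the semicolon (comments, line breaks) goes with it.
    public override SyntaxNode? VisitEmptyStatement(EmptyStatementSyntax node) => null;
}
```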

Running this produces a simple program without any redundant semicolons.

However, odulkanberoglu points out some problems with this approach. When either leading or trailing trivia is present, this trivia is removed. This means comments above and below the semicolon will be stripped out.

svick has a pretty clever workaround. By constructing an EmptyStatementSyntax with a missing token instead of a semicolon, we can manage to remove the semicolon from the original tree. His approach is demonstrated below:
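Here’s a minimal sketch of that idea. The class name is hypothetical, and this isn’t necessarily svick’s exact code. The semicolon is swapped for SyntaxFactory.MissingToken(SyntaxKind.SemicolonToken), and the original token’s leading and trailing trivia are copied onto it, so comments survive.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public class EmptyStatementRemovalKeepTrivia : CSharpSyntaxRewriter
{
    public override SyntaxNode? VisitEmptyStatement(EmptyStatementSyntax node)
    {
        // A missing token has no text, so the ';' disappears from the output,
        // but the trivia we attach to it is still written out.
        SyntaxToken missingSemicolon = SyntaxFactory.MissingToken(SyntaxKind.SemicolonToken)
            .WithLeadingTrivia(node.SemicolonToken.LeadingTrivia)
            .WithTrailingTrivia(node.SemicolonToken.TrailingTrivia);

        return node.WithSemicolonToken(missingSemicolon);
    }
}
```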

The output of this approach is:

This approach has the side effect of leaving a blank line wherever there was a redundant semicolon. That being said, I think it’s probably worth the trade-off as there doesn’t seem to be a way to retain trivia otherwise. Ultimately, the trivia can only be retained by attaching it to a node, and then returning that node.

An aside: I suspect this will be the de facto approach to removing any syntax nodes in the future. It’s quite common for a node one wishes to remove to have associated comment trivia. The only way to remove the node while retaining the trivia is to construct a replacement node. The best candidate for replacement will likely be an EmptyStatementSyntax with a missing semicolon.

This might also indicate a limitation with the CSharpSyntaxRewriter. It seems like it should be easier to remove nodes, while retaining their trivia.