LRN Quick Tip: PCL References and MSBuildWorkspace

2015/07/23 Edit: All of these problems should now be fixed in the latest Roslyn NuGet packages

We first looked at MSBuildWorkspace in Part 6 Working with Workspaces. MSBuildWorkpace works really well when loading up solutions from .sln files. It properly understands .csproj files so we don’t have to worry about tracking down references, documents,  or MSBuild targets.

However, when compiling solutions that contained Portable Class Libraries (PCLs) I had been continuously running into frustrating problems with missing references to System.Runtime.dll. For example I’d see a handful of errors like:

error CS0012: The type 'Object' is defined in an assembly that is not referenced.
You must add a reference to assembly 'System.Runtime, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'.

I’d also seen this issue pop up within Visual Studio when working on XAML and when debugging.

After learning some more about PCLs, it turns out that PCLs must reference facade assemblies that delegate the actual work to proper assemblies later. For the full story on PCLs and facade references see: http://alxandr.me/2014/07/20/the-problems-with-portable-class-libraries-and-the-road-to-solving-them/

In our case we needed to add a reference to System.Runtime.dll. I won’t bother to show the code for this as it’s fraught with its own set of problems. Although it resolves our System.Object reference, we quickly run into other problems with other types such as System.Tasks.Task. Manually adding these references was definitely not going to scale.

The Workaround

Originally this problem was reported as a bug within MSBuildWorkspace. After all, why wouldn’t it resolve the references properly when we open the solution? Visual Studio obviously seemed to be capable of figuring the references out…

It turns out there’s a MSBuild property called CheckForSystemRuntimeDependency. And we can set this to true and all our PCL worries seem to go away.

Simply create your MSBuildWorkspace with:

That’s it. That’s all. And best of all the Roslyn team will be removing this property when Roslyn v1.0 ships.

More info may be found at: https://github.com/dotnet/roslyn/issues/560

Thanks to @Pilchie and @JasonMalinowski for helping out with this problem and for the workaround!

LRN Quick Tips: Don’t trust SyntaxNode.ToFullString()

At Code Connect we’ve been working hard on a project that rewrites a user’s code and then compiles the rewritten solution. Without giving too much away, it essentially acts as a logger and can intercept all variable assignments.

So the following code:

Is rewritten to:

At some point we decided we no longer wanted to inject the LogValue method directly into the code we were instrumenting. We would place it in an entirely different namespace: CodeConnect.Instrumentation.

Here was how we originally created the invocation to LogValue:

So we figured we’d incorporate our new namespace as follows:

At first glance, everything checked out. Calling .ToFullString() on this node revealed what looked like a proper invocation:

CodeConnect.Instrumentation.LogValue("x", 5);

However, try as we may we could not get our code to compile. We would constantly errors telling us the CodeConnect.Instrumentation type didn’t exist:

CS0103 at line 12 (character 19): The name 'CodeConnect.Instrumentation' does not exist in the current context.

We checked our references, we checked our rewritten solution and we checked the spelling at least a dozen times. Everything checked out. Eventually we did the classic “Which change broken this functionality?” walk through out Git history. (Note: If I had written more complete unit tests when I first wrote the rewriter, we would have caught this a lot quicker!)

We realized it was the change to the rewriter that introduced this problem. It took some time before we realized exactly what was wrong. Once again the ever-truthful Syntax Visualizer (now packaged with the .NET Compiler Tools SDK) came to the rescue and showed us what the syntax tree for a valid invocation of  CodeConnect.Instrumentation.LogValue("x", 5); would look like:

graph

After looking over the above tree, we realized we were supposed to create a chain of SimpleMemberAccessExpressions, not a single InvocationExpression with the name “CodeConnect.Instrumentation.LogValue”. When the binder when to bind this invocation to a symbol, it failed as no method can be declared with this name. The tree we had created was invalid and could never have been parsed out of a user’s source code.

This is a key point to understand when creating or rewriting syntax trees:

No one will stop you from creating invalid syntax trees.

Or from Kevin Pilch-Bisson on Stack Overflow:

> The Roslyn Syntax construction API does not guarantee that you can only build valid programs.

We can take this to its logical extreme and create trees that don’t make any sense. For example, we can create WhitespaceTrivia whose value is “public void M() { }” .

The above prints the following misleading string:

class test{
public void M() {

}}

The output is misleading as it causes the reader to make assumptions about how the syntax tree must be formed. In my experience the only true arbiter of truth when it comes to syntax tree is the Roslyn Syntax Visualizer. I’d love for it to be expanded to visualize in memory trees while debugging.

The takeaways here:

  • Don’t trust .ToString() or .ToFullString()
  • Understand that you may accidentally generate invalid trees
  • Write tests

Learn Roslyn Now: Part 9 Control Flow Analysis

Control flow analysis is used to understand the various entry and exit points within a block of code and to answer questions about reachability. If we’re analyzing a method, we might be interested in all the points at which we can return out of the method. If we’re analyzing a for-loop, we  might be interested in all the places we break or continue.

We trigger control flow analysis via an extension method on the SemanticModel. This returns an instance of ControlFlowAnalysis to us that exposes the following properties:

  • EntryPoints – The set of statements inside the region that are the destination of branches outside the region.
  • ExitPoints – The set of statements inside a region that jump to locations outside the region.
  • EndPointIsReachable – Indicates whether a region completes normally. Returns true if and only if the end of the last statement is reachable or the entire region contains no statements.
  • StartPointIsReachable – Indicates whether a region can begin normally.
  • ReturnStatements – The set of returns statements within a region.
  • Succeeded – Returns true if and only if analysis was successful. Analysis can fail if the region does not properly span a single expression, a single statement, or a contiguous series of statements within the enclosing block.

Basic usage of the API:

Alternatively, we can specify two statements and analyze the statements between the two. The following example demonstrates this and the usage of EntryPoints:

In the above example, we see an example of a possible entry point label L3. To the best of my knowledge, labels are the only possible entry points.

Finally, we’ll take a look at answering questions about reachability. In the following, neither the start point or the end point is reachable:

Overall, the Control Flow API seems a lot more intuitive than the Data Flow Analysis API. It requires less knowledge of the C# specification and is straightforward to work with. At Code Connect, we’ve been using it when rewriting and logging methods. Although it looks like no one has experimented much with this API, I’m really interested to see what uses others will come up with.

LRN Quick Tips – Working with nameof

The nameof operator has gone through five iterations as the Roslyn team worked to nail down its syntax and semantics. Now that the design of the nameof operator has been finalized, we can look at some simple examples.

Within C#, nameof is a contextual keyword. This means there is no way to distinguish the nameof keyword from a call to a method that happens to be named nameof.

Lucian Wischik elaborates:

In C#, nameof is stored in a normal InvocationExpressionSyntax node with a single argument. That is because in C# ‘nameof’ is a contextual keyword, which will only become the “nameof” operator if it doesn’t already bind to a programmatic symbol named “nameof”

Identifying nameof Expressions

This means we can only identify nameof expressions at the semantic level. We do so by finding all invocations to “nameof” that do not bind to any symbol. These invocations must also be standalone (ie. not part of a member access like MyClass.nameof())

As with all contextual keywords, it’s a bit of a pain to work with. But that’s the price we pay for backwards compatability.

Creating a nameof expression

Previous versions of the Roslyn API allowed for direct creation of a NameOfExpressionSyntax. Now we must create an InvocationExpressionSyntax with the identifer “nameof”.

For example, we can generate the following nameof expression:

string result = nameof(result);

Side Note: As a mere mortal, I have no idea how to work with the SyntaxFactory API. I generated the above code with Kirill Osenkov’s fantastic Roslyn Quoter tool.

The important takeaway is that (at the syntax level) nameof expressions are no different than regular invocations.

Learn Roslyn Now: Part 8 Data Flow Analysis

Writing this blog post has been really painful. It’s been three months since I last published my introduction to the semantic model and I’ve been putting off this post for as long as I could. I started a new series called Learn Roslyn Now Quick Tips, I helped build Source Browser, and I even submitted a small pull request to clean up the analysis APIs. Basically, I’ve done everything but learn and write about these APIs.

The two reasons I’ve struggled to write about AnalyzeControlFlow and AnalyzeDataFlow are:

  1. I’ve struggled to imagine how one would use them in an analyzer or extension.
  2. They’re weird, unintuitive and they frighten me.

I put out a tweet asking how others were using them, and it appears they’re only really used within Microsoft to implement the “Extract Method” functionality. A handful of questions on Stack Overflow have mentioned these APIs, so I’m sure someone out there is putting them to good use.

Data Flow Analysis

This API can be used to inspect how variables are read and written within a given block of code. Perhaps you’d like to make a Visual Studio extension that captures and logs all assignments to a certain variable. You could use the data flow analysis API to find the statements, and a rewriter to log them.

To demonstrate the capabilities of this API, we’ll be looking at a modified piece of code posted on Stack Overflow. I’ve cleaned it up slightly, but it shows a number of interesting behaviors consumers of this API should be aware of.

We can analyze the for-loop in the following code:

At this point we’ve got access to a DataFlowAnalysis object.

Perhaps the most important property on this object is Succeeded. This tells you if the data flow analysis completed successfully. In my experience the API has been pretty good at dealing with semantically invalid code. Neither invocations to missing methods nor use of undeclared variables seemed to trip it up. The documentation notes that if the analyzed region does not span a single expression or statement then analysis is likely to fail.

The DataFlowAnalysis object exposes a pretty rich API for uses to consume. It exposes information about unsafe addresses, local variables captured by anonymous methods and much more.

In our case, we’re interested in the following properties:

To refresh, the code on which we’ve analyzed is displayed below. The region we’ve declared interest in is the for-loop.

The results from analysis are as follows:

AlwaysAssigned: index
index is always assigned to as it is contained within the initializer of the for-loop, which runs unconditionally.

WrittenInside: index, innerArray
Both index and innerArray are clearly written within the loop.

One important point is that outerArray is not. While we’re mutating the array, we’re not mutating the reference contained within the outerArray variable. Therefore it does not show up in this list.

WrittenOutside: outerArray, this
outerArray is clearly written to outside of the for-loop.

However, it surprised me that this showed up as a parameter symbol within the WrittenOutside list. It appears as though this is passed as a parameter to the class and its member, which means that it shows up here as well. This appears to be by design, although I suspect most consumers of this API will be surprised, and likely ignore this value.

ReadInside: index, outerArray
It is clear that the value of index is read within the loop.

It was surprising to me that outerArray is considered to be “read” inside the loop as we’re not reading its value directly. I suppose that technically we must first read the value of outerArray in order to calculate the offset and retrieve the correct address for the given element of the array. So we’re performing a sort of “implicit read” inside the loop here.

VariablesDeclared: index, innerArray
This is fairly straightforward. index is declared within the loop initializer and innerArray within the body of the for-loop.

Final Thoughts

The general weirdness of the data flow analysis API has long kept me from writing about it. The issues with this and what’s considered a read vs. a write is pretty offputting to me. I suspect these kinds of issues will prevent a lot of people from taking advantage of this API, but I could be wrong. It’s difficult to say this early in the game and I have not seen very much discussion about this API and the above problems.

 

LRN: Quick Tips – Fields and Symbols

One recurring problem I’ve seen people run into with Roslyn is working with fields and symbols. Consider the following:

The above program consists of a ClassDecarationSyntax with child FieldDeclarationSyntax, PropertyDeclarationSyntax and MethodDeclarationSyntax.

In previous blog posts, we discussed how we could use SemanticModel.GetDeclaredSymbol(SyntaxNode) to retrieve the symbol for pieces of declaration syntax. So it would make sense if we could get the symbols for our field, property and method with the same approach.

Typically one would try the following:

However, there’s a problem here. fieldSymbol is null! Our approach worked for methods and properties, but didn’t for fields. The reason for this is actually quite simple:

Fields can contain multiple symbols.

For example:

This is even clearer when we look at the syntax tree (I’ve omitted tokens and trivia).

fieldTree

What symbol could be returned for the above FieldDeclarationSyntax? In order to access these symbols we instead look at the individual variables within the field as shown below:

It turns out fields are not the only “special syntax” that cannot be converted into a symbol. If you’re interesting, you can see them all online on the Roslyn Reference Source. They are:

  • Global Statements – Global statements don’t declare anything, even though they inherit from MemberDeclarationSyntax.
  • IncompleteMembers – Incomplete members don’t declare any symbols.
  • Event Field Declaration – Can contain multiple variable declarators. GetDeclaredSymbol should be called on them (the declarators) directly.
  • Field Declaration – Can contain multiple variable declarators. GetDeclaredSymbol should be called on them (the declarators) directly.

This bit me in August and I submitted an Issue to the Roslyn team about this. I originally thought an exception should be thrown in these cases, but I’ve since changed my mind. Instead, I think there needs to be clearer documentation on the GetDeclaredSymbol() function. It also might be appropriate for someone to create an analyzer that detects when people do this and warn them.

LRN: Quick Tips – Working with Regions

Structured trivia are an interesting corner case within Roslyn. Take a look at the following syntax tree representing a simple program with regions.

syntaxTreeRegions

Typically the terminals of a Roslyn syntax tree are pieces of trivia. However, in the above program we can see that a RegionDirectiveTrivia has a child RegionDirectiveTriviaSyntax syntax node.

Well, why don’t we try getting access to these syntax nodes? Originally, I tried the following:

Oddly enough (to me, at least), neither of these approaches worked and both returned nothing. They also didn’t throw or indicate any error on my part. Thankfully @Pilchie pointed out that DescendantNodes has a couple of optional arguments.

  • Func<SyntaxNode, bool> descendIntoChildren -  A function that decides whether or not we should descend into the children of a given node. We could use this to avoid descending into nodes we know we don’t care about.
  • bool descendIntoTrivia – A simple boolean that signals if we’d like to descend into the children of trivia when searching for nodes.

In our case, we’d like to search for syntax nodes within trivia, so we’ll signal that.

This time, everything works as we’d expect and we get access to our RegionDirectiveTriviaSyntax node as expected.

So why does Roslyn avoid descending into trivia by default? My guess is performance. Most consumers of Roslyn will be looking for nodes such as methods, properties, and fields. None of these are contained within syntax trivia so it would be a waste of CPU cycles to consider their children.

This is something to be aware of when working with structured trivia and something that’s not immediately obvious.