Scanning incomplete C# projects for vulnerable code patterns

At Securify Inline, we are always looking for smart techniques to support our security code review process. We want to find certain insecure patterns in our clients' code using static analysis. For our C# projects, we developed a custom scanner based on the Roslyn parser. This article discusses the problems we ran into and how we use Roslyn to find potentially vulnerable patterns in code.

Hooking into the compiler

Roslyn, the nickname for the .NET Compiler Platform, already makes it possible to run custom checks at compile time. The checks in Security Code Scan use this, for example. During compilation, information about the syntax and semantics of the source code is passed to the security check, which can then generate build warnings or errors where applicable.
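To make the compile-time hook concrete, here is a minimal sketch of a Roslyn analyzer. It is not taken from Security Code Scan: the class name EvalCallAnalyzer, the rule id DEMO0001, and the idea of flagging methods named "Eval" are all assumptions made up for illustration. It only shows the general shape of the API: the compiler calls back into the analyzer for each matching syntax node and hands it a semantic model.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

// Hypothetical analyzer for illustration: reports every call to a method named "Eval".
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class EvalCallAnalyzer : DiagnosticAnalyzer
{
    // The rule id, title, and message are invented for this sketch.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "DEMO0001",
        title: "Call to Eval",
        messageFormat: "Call to '{0}' should be reviewed",
        category: "Security",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        // Ask the compiler to call us for every invocation expression it compiles.
        context.RegisterSyntaxNodeAction(AnalyzeInvocation, SyntaxKind.InvocationExpression);
    }

    private static void AnalyzeInvocation(SyntaxNodeAnalysisContext context)
    {
        var invocation = (InvocationExpressionSyntax)context.Node;
        // Symbol resolution relies on the compilation, with all its references.
        var method = context.SemanticModel.GetSymbolInfo(invocation).Symbol as IMethodSymbol;
        if (method?.Name == "Eval")
        {
            context.ReportDiagnostic(
                Diagnostic.Create(Rule, invocation.GetLocation(), method.ToDisplayString()));
        }
    }
}
```

An analyzer like this is typically packaged as an assembly and referenced by the project being built, so it only produces results when the project actually compiles.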

Performing security checks with information from the compiler works great. Creating a syntax tree and a semantic model is something the compiler does anyway, and passing them to a checker is not much extra work. The check sees the code exactly as the compiler sees it, which is not always the case when other tools are used to parse the code. The checks also have a lot of information available to determine what is happening in the code.

However, to compile a project, you need a buildable project, with all dependencies, resources, and access to NuGet repositories. For most projects, we don't have this. We get access to a certain set of source code repositories: enough to review the important business logic, but not enough to build the complete project. We tried several things to solve this.

Mock missing parts

First, we tried creating mocks for the classes we were missing. If a project called SomeWidget.Create(), we created an empty SomeWidget class with a Create method that does nothing. Doing this mainly consisted of pressing Alt-Enter in the IDE to apply its suggested fixes.
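As an illustration, a stub of this kind could look like the sketch below. The return type of Create is an assumption; in practice it would be whatever the calling code requires in order to compile.

```csharp
// Hypothetical stub standing in for a dependency we do not have.
// Only the members that the client code actually uses are declared.
public class SomeWidget
{
    // Assumed signature: a static factory that returns a widget.
    public static SomeWidget Create()
    {
        // No real behavior is needed; the stub only has to satisfy the compiler.
        return new SomeWidget();
    }
}
```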

This worked for a while, but it had a couple of disadvantages. First, we needed to modify the client's code, which caused merge conflicts. We would add a reference to SomeWidget in their project file, the developers would then edit the project file with some unrelated change, and these edits could not always be merged automatically.

Second, if the client added or changed a dependency, we would need to create new mock classes. Since we did not control how often this happened or how much work it would require, this introduced considerable risk into our planning. Because of this, we decided to abandon this method.

It could still be a promising solution, if it can be automated. Creating mocks was repetitive work, greatly assisted by the IDE, which hints that it can be automated. We did not attempt this, and went with a different solution.

Faking semantic information

The checks in Security Code Scan expect information about the syntax of the code, as well as semantic information about what each symbol means. Since we have trouble compiling the project, we don't have the semantic information. However, with Roslyn it's pretty easy to parse a C# file and get the syntax tree. Can we pass this syntax tree to the checks in Security Code Scan, and fake the semantic information well enough that the checks still give relevant results?

It turns out this is pretty hard. The Roslyn API that is used during compilation is not really designed to be extended or faked without an actual compilation. The checks in Security Code Scan also expect the API to give definite answers. The abstraction layer would basically need to do symbol resolution itself, which is a big task. So this didn't work out.

Performing checks on syntax

In the end we decided to give up on semantic information, and base our security checks on syntax alone. We get a syntax tree by calling CSharpSyntaxTree.ParseText, and then inspect that tree with a CSharpSyntaxWalker. We can search for certain patterns by writing our own syntax walkers that go over the abstract syntax tree. Checks are invoked like this:

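// Parse a single file into a syntax tree; no compilation or references are needed.
var contents = File.ReadAllText(filename);
var tree = CSharpSyntaxTree.ParseText(contents);
var root = tree.GetCompilationUnitRoot();
// Run the check (a syntax walker) over the whole tree.
check.Visit(root);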

Below is an example check that retrieves all attributes on a method and alerts if an HttpPost attribute is present without a ValidateAntiForgeryToken attribute:

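using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class HttpPostHasAntiForgeryToken : SecurityCheck
{
    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        // Flatten all attribute lists, e.g. [A, B] [C], into a single sequence.
        var attributes = node.AttributeLists.SelectMany(l => l.Attributes);
        // Compare attribute names exactly as they are written in the source text.
        var names = attributes.Select(a => a.Name.ToString()).ToImmutableList();
        if (names.Contains("HttpPost") && !names.Contains("ValidateAntiForgeryToken"))
        {
            CreateIssue(node, "method has HttpPost but not ValidateAntiForgeryToken");
        }
        // Continue walking the rest of the subtree.
        base.VisitMethodDeclaration(node);
    }
}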

As you can see, this code retrieves all attribute names of each method. If the method has an [HttpPost] attribute but no [ValidateAntiForgeryToken] attribute, it creates a new issue. Issues are written either to standard output or to a SARIF file.
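The SecurityCheck base class itself is not shown in this article. A minimal sketch of what such a class could look like is given below. Everything in it is an assumption: the Issues list, the output format, and the choice to print straight to standard output are illustrative, and SARIF output is left out.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical base class; the real SecurityCheck is not shown in the article.
public class SecurityCheck : CSharpSyntaxWalker
{
    // Collected issues, so a caller can also export them in another format.
    public List<string> Issues { get; } = new List<string>();

    protected void CreateIssue(SyntaxNode node, string message)
    {
        // GetLineSpan returns zero-based positions, so add 1 for display.
        var span = node.GetLocation().GetLineSpan();
        var line = span.StartLinePosition.Line + 1;
        var issue = $"{span.Path}:{line}: {message}";
        Issues.Add(issue);
        Console.WriteLine(issue);
    }
}
```

With this sketch, the path in the message is only filled in when the tree was parsed with a file path, for example by passing path: filename to CSharpSyntaxTree.ParseText.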

Finding vulnerable patterns using a syntax visitor works well. It is fast, and once you are familiar with the syntax tree, checks are pretty easy to write. Since we don't have semantic information, we check strings for certain names. This is a little bit ugly, but it is the best we can do with syntax information only. Another disadvantage is that we cannot search for patterns across files.
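One consequence of matching on strings is that the same attribute can be written in several ways, such as [HttpPost], [HttpPostAttribute], or [System.Web.Mvc.HttpPost]. The sketch below shows one way to reduce these spellings to a single name before comparing. The AttributeNames helper is hypothetical and not part of our scanner as described here; it also cannot tell apart two different attributes that happen to share a simple name.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var source = @"
class C
{
    [System.Web.Mvc.HttpPostAttribute]
    void Save() { }
}";

// Parsing needs no references, so this works on incomplete projects.
var root = CSharpSyntaxTree.ParseText(source).GetCompilationUnitRoot();
var names = root.DescendantNodes()
    .OfType<AttributeSyntax>()
    .Select(a => AttributeNames.Simplify(a))
    .ToList();
Console.WriteLine(string.Join(", ", names));

// Hypothetical helper: reduces an attribute name to its simple form.
static class AttributeNames
{
    private const string Suffix = "Attribute";

    public static string Simplify(AttributeSyntax attribute)
    {
        NameSyntax name = attribute.Name;
        // Drop namespace or alias qualifiers: A.B.HttpPost -> HttpPost.
        if (name is QualifiedNameSyntax qualified) name = qualified.Right;
        else if (name is AliasQualifiedNameSyntax aliased) name = aliased.Name;

        var text = name is SimpleNameSyntax simple
            ? simple.Identifier.ValueText
            : name.ToString();

        // Drop the optional "Attribute" suffix: HttpPostAttribute -> HttpPost.
        if (text.Length > Suffix.Length && text.EndsWith(Suffix, StringComparison.Ordinal))
        {
            text = text.Substring(0, text.Length - Suffix.Length);
        }
        return text;
    }
}
```

A check could then compare the simplified names against "HttpPost" and "ValidateAntiForgeryToken" instead of the raw source text.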

Conclusion

We created a solution that finds patterns in C# code without building the project. It has already found vulnerabilities that we reported to our clients.

As soon as Semgrep supports C# code, we want to switch to Semgrep. It has similar capabilities, but creating rules is easier.
