Static Function Property Analysis

fundamental

Short description: This project proposes to add static analysis for verifying annotated properties of functions within clang. The analysis of annotated properties should be general enough for verifying reentrancy of functions, among other properties. This project was motivated by the necessity of a tool to verify realtime safe code for use with RT-audio and while static analysis will not find all errors, it should identify many common ones.

Additional info: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012...

Proposal

Ensuring that a function is safe to call within a restricted environment is a task fraught with bugs and simple mistakes. In order to make this process easier, I propose the addition of a general function property analysis to the clang static analyzer. Reentrant code provides one common functional restriction. It is very easy to break reentrant code by calling a non-reentrant function in a non-obvious manner. In the below example marked_fn() is explicitly annotated with the information that it should be a reentrant function. The intermediate function, called_fn(), has no annotation, nor is one needed as the analyzer can deduce whether or not it is reentrant. Lastly an external function, marked_fn(), is explicitly non-reentrant.

void marked_fn(void) __attribute__(annotate("reentrant"));
void called_fn(void);
void bad_fn(void) __attribute__(annotate("non-reentrant"));

void marked_fn(void)
{
    called_fn();
}

void called_fn(void)
{
    bad_fn();
}

The analyzer should be able to indicate to the user that marked_fn() cannot be reentrant due to bad_fn(). While this problem is not very difficult when looking at the trivial case given, a number of complications are generated with:

  • Library code

  • Different translation units

  • Language Complexities

Libraries form the case where it is not practical to directly annotate functions, therefore an external means of providing properties for functions must be utilized. The mechanism would be an additional annotation file provided to the compiler at the time of the analysis via a "--add-annotation-file" option. As these annotation files may not cover all the possible functions within a  library an additional assumption file can be provided to allow to give unlabeled external functions a default set of properties. This information can be provided with "--add-assumption-file".

Different translation units are another difficulty that exists in working on this project. Currently, the clang static analyzer does not provide any simple means of providing information across translation units. In the previous example, if called_fn() was in a different translation unit than marked_fn(), detailed information on its contents could not be used within the static analyzer when evaluating marked_fn(). Without this functionality, a large number of functions would need to receive extraneous annotations in source.

In performing the analysis, some complications that C/C++ introduces will be overlooked in the name of having a complete project at the end of GSoC. In particular templates have a history of generating difficulty for analysis and as such they will be excluded from this proposal. Function pointers will likely have some support in this analysis, however it will be fairly minimal compared to the purely functional analysis. In total, C support should be robust within this project and only basic C++ support will be in place.

Why would this be useful?

Users of this feature

Users can make use of this analysis to verify that their code adheres to fairly abstract properties. Some of these are:

  1. Security - it can be used to audit what functions can possibly be called from some entry point (eg verifying that no network functions are accessed)

  2. Reentrant Code Checks - it could verify that no global state is accessed from functions (or variables with an extension to this proposal)

  3. Realtime Safety - it can check what system/library calls are made to verify that latency goals can be meet.

Realtime safety is the primary motivation for me, after seeing some of the issues that exist within the Linux audio community. The primary issue is that it is very easy to mistakenly create non-realtime safe code and it is possible for this mistake to go unnoticed for a large period of time. Some simple mistakes that can be found are calls to locking mutexes, calls to memory allocation, IO, and other blocking functions. With static analysis many of these issues can be found and taken care of. This can also be applied to legacy code to allow for an incremental upgrade from an unsafe system to one that complies with the new standard.

Clang Developers

In order to implement this proposed project, clang’s static analyzer must have some means of translating information across translation units. Currently this functionality is not present in clang and its addition will open up a number of avenues for static analysis in the future. As identifying the call graph is a very important element of this inspection, the rough call graph support is likely to receive improvements in the course of this project.

The Details

This code will be implemented within the standard analyzer that clang provides and it should be able to simply integrate with the existing scan-build tool that hijacks the build process for analysis. It is estimated that this project will take the full three months of the GSoC program.

Validating Functions

The function validation process would be a two stage process of gathering information on the source code and then checking the call graph for contradictions.

The call graph must be constructed across every translation unit. For each function this will require storing at least:

  • What function(s) it calls

  • Debugging information sufficient to print errors

  • The explicit or implicit annotated properties. This should be derived from the assumption file, explicit annotations and library function annotation files (this may require some additional information to be stored within the static analyzer)

To find erroneous properties, it should be sufficient to propagate information on functions that have a known property or violation of a property until no further deductions can be made. Any edge case that cannot be deduced can easily be worked around by users through extra explicit annotations using provided diagnostic information.

Contact Information

  • email: mark DOT d DOT mccurry AT gmail DOT com

  • irc: fundamental on freenode

Prior Experience

I currently am finishing up my Undergraduate degrees at Clarkson University. This May I will obtain a BS in Computer Engineering and a BS in Electrical Engineering. For the last six years I have become progressively more involved with open source software. First I started out with helping people set up systems in forums. More recently I have moved into sending patches to developers and helping with the maintenance of open source projects. My skills in C and C++ have assisted me greatly in collaborating with the open source community and patches of mine have worked their way into faust, clang, asciidoc, and libmapper. I also maintain a software synthesizer, ZynAddSubFX. At this point I have about 4 years of experience with C++ and about 5 years with C. Outside of open source work, I have spent some time developing code for embedded systems such as the atmel AVR, ARM processors, and even my own FPGA CPU.

My experience has not been restricted to working at my current university, this past summer I worked at the MIT Haystack programming GPUs to with digital signal processing code. This work resulted in a conference presentation, "GPU Accelerated Processing for VLBI Digital Backends," at the National Radio Science Meeting.

My experience with llvm is somewhat limited. I have read about the internal assembly, but I have not written code that utilizes it. I have lurked on the clang mailing list watching the development, but I have not had a reason to join the dialog before now. I have however fixed one bug which prevented clang from linking code on 32-bit Slackware machines (http://llvm.org/bugs/show_bug.cgi?id=9731).