Google Summer of Code 2011

The LLVM Compiler Infrastructure

Web Page:

Mailing List:


The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines, though it does provide helpful libraries that can be used to build them.

LLVM began as a research project at the University of Illinois, with the goal of providing a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages. Since then, LLVM has grown into an umbrella project made up of a number of subprojects, many of which are used in production by a wide variety of commercial and open source projects and are widely used in academic research. Code in the LLVM project is licensed under the "UIUC" BSD-style license.

The primary sub-projects of LLVM are:

  1. The LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs (as well as some less common ones!). These libraries are built around a well-specified code representation known as the LLVM intermediate representation ("LLVM IR"). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) and use LLVM as its optimizer and code generator.

  2. Clang is an "LLVM native" C/C++/Objective-C compiler that aims to deliver amazingly fast compiles (e.g. about 3x faster than GCC when compiling Objective-C code in a debug configuration), extremely useful error and warning messages, and a platform for building great source-level tools. The Clang Static Analyzer is a tool that automatically finds bugs in your code, and is a great example of the sort of tool that can be built using the Clang frontend as a library to parse C/C++ code.

  3. dragonegg and llvm-gcc 4.2 integrate the LLVM optimizers and code generator with the GCC 4.5 (which is GPL3) and GCC 4.2 (which is GPL2) parsers, respectively. This allows LLVM to compile Ada, Fortran, and other languages supported by the GCC compiler frontends, and provides high-fidelity drop-in compatibility with their respective versions of GCC.

  4. The LLDB project builds on libraries provided by LLVM and Clang to provide a great native debugger. It uses the Clang ASTs and expression parser, the LLVM JIT, the LLVM disassembler, etc., so that it provides an experience that "just works". It is also blazing fast and much more memory-efficient than GDB at loading symbols.

  5. The libc++ project provides a standard-conformant and high-performance implementation of the C++ Standard Library, with the aim of supporting C++0x when the standard is ratified.

  6. The compiler-rt project provides highly tuned implementations of the low-level code generator support routines like "__fixunsdfdi" and other calls generated when a target doesn't have a short sequence of native instructions to implement a core IR operation.

  7. The vmkit project is an implementation of the Java and .NET virtual machines that is built on LLVM technologies.

  8. The klee project implements a "symbolic virtual machine" that uses a theorem prover to try to evaluate all dynamic paths through a program, in an effort to find bugs and to prove properties of functions. A major feature of klee is that it can produce a test case when it detects a bug.
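To make concrete what a compiler-rt routine like "__fixunsdfdi" computes: it converts a double to an unsigned 64-bit integer, truncating toward zero, for targets that have no short native instruction sequence for that conversion. The following is a minimal Python model of that semantics, built by decoding the IEEE-754 bit pattern by hand. It is a sketch, not compiler-rt's actual C source, and the results chosen for negative and out-of-range inputs (0 and saturation to UINT64_MAX) are an assumed policy, since C leaves those conversions undefined.

```python
import struct

UINT64_MAX = (1 << 64) - 1


def fixunsdfdi(a: float) -> int:
    """Model of double -> uint64 conversion (truncation toward zero).

    Assumed out-of-range policy: negative values and magnitudes below 1.0
    give 0; values >= 2**64 (including inf) saturate to UINT64_MAX.
    """
    bits = struct.unpack("<Q", struct.pack("<d", a))[0]
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023        # remove the exponent bias
    significand = (bits & ((1 << 52) - 1)) | (1 << 52)  # restore implicit leading 1

    if sign or exponent < 0:        # negative, zero, denormal, or |a| < 1.0
        return 0
    if exponent >= 64:              # does not fit in 64 bits
        return UINT64_MAX
    if exponent < 52:               # shift out the fractional bits
        return significand >> (52 - exponent)
    return significand << (exponent - 52)  # large values: scale up exactly


print(fixunsdfdi(3.75))  # 3
```

Doing the conversion with integer shifts and masks is the point: it only needs operations that every target supports natively.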

In addition to the official subprojects of LLVM, there is a broad variety of other projects that use components of LLVM for various tasks. Through these external projects you can use LLVM to compile Ruby, Python, Haskell, Java, D, PHP, Pure, Lua, and a number of other languages. A major strength of LLVM is its versatility, flexibility, and reusability, which is why it is being used for such a wide variety of tasks: everything from lightweight JIT compilation of embedded languages like Lua to compiling Fortran code for massive supercomputers.

Just as important, LLVM has a broad and friendly community of people who are interested in building great low-level tools. If you are interested in getting involved, a good first step is to skim the LLVM Blog and sign up for the LLVM Developer mailing list.



  • Adaptive Compilation Framework for LLVM JIT Compiler: One of the current drawbacks of the LLVM JIT is the lack of an adaptive compilation system. All the non-adaptive pieces are already in LLVM: an optimizing compiler with different types of instruction selectors, register allocators, pre-RA schedulers, etc., and a full set of optimizations that can be changed at runtime. What is missing is a system that tracks the hotness of methods at runtime and recompiles them with more expensive optimizations as they are executed over and over.
  • C++0x Lambda Functions for Clang: Implement lambda functions as defined in the new C++ standard.
  • Fast JIT Code Generation for x86-64: The goal of this project is to implement a fast path through code generation for x86-64 that produces unoptimized code in a very short time. The codegen bypasses the more expensive backend passes altogether by directly emitting machine code prepared from the existing TableGen instruction descriptions. This codegen will be useful for fast first-time compilation in an adaptive compilation scheme. The LLVM JIT is expected to become faster while remaining portable.
  • PTX Back-End Code Generator: This proposal outlines a summer project to finish the initial implementation of the PTX back-end within LLVM. It gives the motivation for such a back-end along with a concrete implementation plan, and comes from a developer who is already familiar with, and has submitted contributions to, the current PTX back-end. This project would extend the reach of LLVM and let it enter the GPU compiler field with fully open-source code.
  • Segmented Stacks in LLVM: Implement segmented stacks inside LLVM. Once this is implemented, instead of allocating a worst-case (large) amount of contiguous stack space to each thread, each thread will allocate stack space in small atomic blocks as more space is required.
  • Superoptimization for LLVM IR: This project focuses on implementing superoptimization algorithms targeted at the LLVM IR. The project uses arbitrary LLVM bitcode as a training set to discover new peephole optimizations that can later be integrated into LLVM's instruction simplification phase or run as a separate peephole optimization pass.
  • Support for memory access transformations in Polly: The proposed project aims at adding support for memory access transformations in Polly (Polyhedral optimization in LLVM). In many cases it would be beneficial to change the pattern of memory accesses to obtain better data locality. Such changes can also remove dependences that would otherwise block transformations, and can allow LLVM to keep such values in registers.
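The hotness-driven recompilation described in the adaptive compilation project can be modeled in a few lines. This Python sketch is purely illustrative: `AdaptiveJIT`, `JitEntry`, `compile_fn`, and the threshold are hypothetical names, not LLVM APIs. A real adaptive JIT would typically count calls or loop iterations inside generated code and patch call targets, rather than swap a Python callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical compiler hook: compile(name, opt_level) -> executable code.
CompileFn = Callable[[str, int], Callable[[int], int]]


@dataclass
class JitEntry:
    code: Callable[[int], int]
    opt_level: int   # 0 = quick baseline compile, 2 = expensive optimizations
    calls: int = 0   # hotness counter


class AdaptiveJIT:
    """Toy tiered dispatcher: compile cheaply first, recompile once hot."""

    def __init__(self, compile_fn: CompileFn, hot_threshold: int = 1000) -> None:
        self.compile_fn = compile_fn
        self.hot_threshold = hot_threshold
        self.entries: Dict[str, JitEntry] = {}

    def call(self, name: str, arg: int) -> int:
        entry = self.entries.get(name)
        if entry is None:
            # First call: fast, unoptimized compilation.
            entry = JitEntry(self.compile_fn(name, 0), opt_level=0)
            self.entries[name] = entry
        entry.calls += 1
        if entry.opt_level == 0 and entry.calls >= self.hot_threshold:
            # The method is hot: recompile with expensive optimizations.
            entry.code = self.compile_fn(name, 2)
            entry.opt_level = 2
        return entry.code(arg)


# Usage: a fake compiler that records which (function, level) it was asked for.
compile_log: List[Tuple[str, int]] = []


def fake_compile(name: str, opt_level: int) -> Callable[[int], int]:
    compile_log.append((name, opt_level))
    return lambda x: x * x


jit = AdaptiveJIT(fake_compile, hot_threshold=3)
results = [jit.call("square", i) for i in range(5)]
```

With a threshold of 3, the log shows one baseline compile followed by exactly one optimizing recompile on the third call; later calls reuse the optimized code.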
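The segmented stacks idea can be illustrated with a small data-structure model. In this hypothetical Python sketch (`Segment`, `SegmentedStack`, and the sizes are assumptions, not LLVM code), each frame push checks whether the frame fits in the current segment and links in a new small segment if it does not, which stands in for the check a compiler would emit in each function's prologue; popping the last frame of a segment releases it.

```python
from typing import List, Optional


class Segment:
    """One small, separately allocated block of stack space."""

    def __init__(self, capacity: int, prev: Optional["Segment"]) -> None:
        self.capacity = capacity
        self.frames: List[int] = []  # sizes of the frames stored here
        self.prev = prev             # link to the previous (caller's) segment

    def used(self) -> int:
        return sum(self.frames)


class SegmentedStack:
    """Toy model of a segmented stack that grows one segment at a time."""

    def __init__(self, segment_capacity: int = 64) -> None:
        self.segment_capacity = segment_capacity
        self.top = Segment(segment_capacity, None)
        self.live_segments = 1

    def push_frame(self, size: int) -> None:
        if size > self.segment_capacity:
            raise ValueError("frame does not fit in a single segment")
        # Prologue-style check: if the frame does not fit, link a new segment.
        if self.top.used() + size > self.top.capacity:
            self.top = Segment(self.segment_capacity, self.top)
            self.live_segments += 1
        self.top.frames.append(size)

    def pop_frame(self) -> None:
        if not self.top.frames:
            raise IndexError("pop from empty stack")
        self.top.frames.pop()
        # Epilogue-style cleanup: release an empty segment, go back to prev.
        if not self.top.frames and self.top.prev is not None:
            self.top = self.top.prev
            self.live_segments -= 1

    def reserved_bytes(self) -> int:
        return self.live_segments * self.segment_capacity


# Usage: ten nested calls with 24-byte frames fit two per 64-byte segment.
stack = SegmentedStack(segment_capacity=64)
for _ in range(10):
    stack.push_frame(24)
depth_segments = stack.live_segments
depth_reserved = stack.reserved_bytes()
for _ in range(10):
    stack.pop_frame()
```

At the deepest point only five 64-byte segments (320 bytes) are reserved, instead of a worst-case contiguous region sized up front; after returning, the stack shrinks back to a single segment.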
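The core of the superoptimization project, searching a space of candidate instructions for one that is provably equivalent to a given expression, can be shown at toy scale. This Python sketch is hypothetical and far simpler than the proposal: it works on 8-bit integers so equivalence can be proven by testing all 256 inputs instead of using a solver, it considers only single operations with an assumed small constant pool, and it does not read LLVM bitcode.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

MASK = 0xFF  # model 8-bit integers so every input can be checked

# Hypothetical candidate "instructions": one operation on x and a constant.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, c: (x + c) & MASK,
    "sub": lambda x, c: (x - c) & MASK,
    "and": lambda x, c: x & c,
    "or": lambda x, c: x | c,
    "xor": lambda x, c: x ^ c,
    "shl": lambda x, c: (x << c) & MASK,
    "lshr": lambda x, c: x >> c,
}
CONSTANTS = [0, 1, 2, 3, 7, 0xFF]  # assumed small constant pool


def equivalent(f: Callable[[int], int], g: Callable[[int], int]) -> bool:
    """True if f and g agree on every 8-bit input (a proof in this model)."""
    return all(f(x) == g(x) for x in range(MASK + 1))


def superoptimize(target: Callable[[int], int]) -> List[Tuple[str, int]]:
    """Return every single-operation candidate equivalent to target."""
    matches = []
    for (name, op), c in product(OPS.items(), CONSTANTS):
        if equivalent(target, lambda x, op=op, c=c: op(x, c)):
            matches.append((name, c))
    return matches


# Usage: find a one-instruction replacement for an 8-bit multiply by 8.
print(superoptimize(lambda x: (x * 8) & MASK))  # [('shl', 3)]
```

Each match is a candidate peephole rule (here, "multiply by 8 becomes shift left by 3"). A practical superoptimizer would search multi-instruction sequences and prove equivalence at full bit widths with a solver, since exhaustive enumeration only scales to tiny domains like this one.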