An Apache Thrift Implementation for D

David Nadlinger

Short description: This project strives to provide a D implementation of the Apache Thrift framework. Thrift, originally developed for use at Facebook, is both a data serialization/RPC protocol and its reference implementation for a number of popular languages. Having such functionality readily available would be a first (and large) step towards making D, compelling language that it is, attractive to adopt for development in service-based architectures, and it would also be usable as a simple RPC/serialization scheme.

Introduction

The stability of the D programming language has been improving steadily over the last few months, and there have been promising developments on the tooling front lately. Still, there are some areas where D could improve dramatically with regard to enterprise development, mostly concerning the availability of libraries and ready-made solutions for common tasks.

This project strives to fill a specific part of this gap by providing a D implementation of the Apache Thrift framework¹. Thrift, originally developed for internal use at Facebook, is both a data serialization/RPC protocol and its reference implementation, currently for a large number of languages including C++, Java, PHP and Python. In short, it works by defining a service/message interface in a language-agnostic definition file, from which the code for each target language is generated. The format supports »soft« versioning to allow protocols to evolve naturally over time.

I believe that having such functionality readily available would be a first (and large) step towards making D, compelling language that it is, attractive to adopt for development in medium- to large-scale service-based architectures, while also being interesting for smaller projects that just need a flexible serialization or RPC scheme.

Implementation

Thrift consists of three parts: an interface definition language (IDL) for specifying an interface consisting of data types and services, a compiler for generating target language code from these definitions, and target language support libraries which contain the actual serialization protocol/RPC implementation.

As the IDL is solely used to define the common interface, there will not be any changes required to it (besides the obvious addition of D to the language namespace list). Thus, the proposed project would consist of two main components:

First, a Thrift compiler generating D glue code from the definition file, with two conceivable ways of implementing it. In theory, it would seem possible to forgo using an external generator altogether and use D’s generative capabilities along with the import expression to generate the binding code on the fly. There are several issues with this approach to be considered, though:

  • While the Thrift IDL is a fairly small language, it would still take considerable effort to get a lexer/parser to be executable at compile time, both due to the current CTFE restrictions, which are unlikely to be lifted in the near future, and due to the comparatively high number of related compiler bugs.
  • More specifically, as the import expression takes a compile-time string, it is not usable from CTFE, but the Thrift IDL has an include statement which requires loading and parsing another IDL file. This issue is solvable (essentially by switching to a two-pass model so that the list of files to be imported ends up in an array literal between them), but it adds further complexity to a possible implementation.
  • Additionally, given the current state of compile-time memory management in DMD, I think parsing larger interface files might currently be unfeasible (for example, the cassandra.thrift file from the eponymous Apache project is nearly 30 kB in size). Don Clugston’s upcoming CTFE overhaul might partially mitigate this concern, but a non-negligible impact on compile times would probably still remain.
  • Generating the glue code on the fly effectively hides it completely from the user, which could hinder adoption, bug reporting, and maintenance. This is not an insurmountable problem, but it certainly needs to be taken into consideration.
  • Changes to the interface file(s) for a given application are usually rather infrequent, meaning that the advantages of not having to run a separate code generator are not as big as for more dynamic source files.
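The two-pass model mentioned in the second point can be illustrated independently of D. The following is only a minimal sketch in Python, not a proposed implementation: the function names and the `read_source` callback are hypothetical, and the regular expression ignores comments and other IDL subtleties. The first pass walks the include statements and produces a flat, ordered list of files (the counterpart of the array literal mentioned above); the second pass then loads every source from that fixed list before any actual parsing would start. How exactly this maps onto D's import expression is left open here.

```python
import re

# Matches Thrift IDL include statements such as: include "shared.thrift"
INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"', re.MULTILINE)


def collect_includes(root, read_source):
    """Pass 1: return every file reachable from `root`, dependencies first.

    `read_source` is a hypothetical callback mapping a file name to its text.
    """
    ordered, seen = [], set()

    def visit(name):
        if name in seen:          # already handled (also guards against cycles)
            return
        seen.add(name)
        for dep in INCLUDE_RE.findall(read_source(name)):
            visit(dep)
        ordered.append(name)      # post-order: includes precede the includer

    visit(root)
    return ordered


def load_all(files, read_source):
    """Pass 2 input: all sources are loaded up front from the fixed list."""
    return {name: read_source(name) for name in files}


# Hypothetical in-memory "file system", for demonstration only.
SOURCES = {
    "service.thrift": 'include "shared.thrift"\ninclude "types.thrift"\nservice S {}\n',
    "shared.thrift": 'include "types.thrift"\nstruct Shared {}\n',
    "types.thrift": "struct T {}\n",
}

FILES = collect_includes("service.thrift", SOURCES.__getitem__)
LOADED = load_all(FILES, SOURCES.__getitem__)
```

With the sample sources, `FILES` lists `types.thrift` first and `service.thrift` last, so a parser working through the list never meets an include it has not already seen.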

Overall, a pure CTFE implementation would certainly be a nice experiment to demonstrate the expressiveness of D, but I think this might better be deferred to a later point when there is already a working Thrift implementation to be used as a base. My current proposal is to extend the default »offline« compiler distributed with Thrift instead, which is written in C++ (using lex/yacc) and used for all other language implementations I am aware of. I would be glad to discuss this decision further, though.

The second part would be the actual D Thrift library, which implements the various protocols and provides basic RPC client/server functionality. The detailed goals for this part of the project are still to be determined, also taking feedback from potential users into account. A preliminary objective might be to include support for the default binary and JSON formats, and to provide a solid foundation for RPC, along with a basic implementation.
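To give an idea of what supporting the default binary format involves, here is a minimal Python sketch of how a struct is laid out on the wire, based on the published Thrift binary protocol: each field is a one-byte type code, a 16-bit field id and the value, and a zero byte ends the struct. Only the `i32` and string types are covered, and the function names are hypothetical; this is an illustration under those assumptions, not the design of the D library. The reader drops fields whose ids it does not know, which is the mechanism behind »soft« versioning.

```python
import struct

# Wire type codes used by the Thrift binary protocol for the types handled here.
T_STOP, T_I32, T_STRING = 0, 8, 11


def write_struct(fields):
    """Encode (field_id, type_code, value) triples as one binary-protocol struct."""
    out = bytearray()
    for field_id, ttype, value in fields:
        out += struct.pack(">bh", ttype, field_id)      # field header: type, id
        if ttype == T_I32:
            out += struct.pack(">i", value)
        elif ttype == T_STRING:
            data = value.encode("utf-8")
            out += struct.pack(">i", len(data)) + data  # length-prefixed bytes
        else:
            raise ValueError("type not covered by this sketch")
    out.append(T_STOP)                                  # end-of-struct marker
    return bytes(out)


def read_struct(buf, known_ids):
    """Decode a struct, keeping only fields whose id is in `known_ids`.

    Fields with other ids are consumed and dropped, which is what lets an
    older reader cope with a message written against a newer definition.
    """
    result, pos = {}, 0
    while True:
        ttype = buf[pos]
        pos += 1
        if ttype == T_STOP:
            return result
        (field_id,) = struct.unpack_from(">h", buf, pos)
        pos += 2
        if ttype == T_I32:
            (value,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        elif ttype == T_STRING:
            (length,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("type not covered by this sketch")
        if field_id in known_ids:
            result[field_id] = value


# A "newer" writer sends three fields; an "older" reader only knows ids 1 and 2.
ENCODED = write_struct([(1, T_I32, 42), (2, T_STRING, "hi"), (3, T_I32, 7)])
DECODED = read_struct(ENCODED, known_ids={1, 2})
```

A full implementation would also need the remaining base types, containers, nested structs and the message envelope used for RPC calls.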

An important part of the library support would be a non-blocking asynchronous RPC server for D, similar to the C++ implementation that comes with Thrift. Given the current lack of native support for event-based networking I/O in D, this will take a considerable amount of work to implement, but is absolutely necessary to make D a compelling choice for writing Thrift servers.
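One building block of such a server is reassembling complete messages from reads that may return any number of bytes. The Python sketch below assumes a framed transport in which every message is preceded by a 4-byte big-endian length, as Thrift's framed transport does; the class and function names are hypothetical, and the event loop, the sockets themselves and validation of the length field are deliberately left out.

```python
import struct


class FrameAssembler:
    """Collects arbitrarily sized chunks and returns complete frames.

    Assumes each frame is a 4-byte big-endian length followed by that many
    payload bytes. The class and method names are hypothetical.
    """

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add bytes from one non-blocking read; return the frames now complete."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from(">i", self._buf, 0)
            if len(self._buf) < 4 + length:
                break                      # payload not fully received yet
            frames.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]     # keep any bytes of the next frame
        return frames


def frame(payload):
    """Prefix a payload with its length, as the sending side would."""
    return struct.pack(">i", len(payload)) + payload


# Two messages arriving split across three reads at awkward boundaries.
STREAM = frame(b"ping") + frame(b"hello")
ASSEMBLER = FrameAssembler()
RECEIVED = []
for piece in (STREAM[:2], STREAM[2:10], STREAM[10:]):
    RECEIVED.append(ASSEMBLER.feed(piece))
```

Because the assembler keeps its own buffer per connection, a single thread can interleave reads from many clients and only hand a request to the processing code once its frame is complete.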

Timeline

The following highly tentative schedule for the execution of the project over the course of this year’s Google Summer of Code has been written under the assumption that the existing Thrift generator would be used.

During the community bonding period (April 25–May 23), I would get in touch with the Apache Thrift community, most importantly to clarify the process and requirements of a possible integration with the mainline after the project has been completed. There is also a proposal to redesign parts of the existing language libraries, which I intend to discuss with the Thrift community because it could be taken into account directly in the design of the D library.

My main focus during the first half of the coding phase (May 23–July 11) would be to define the central pieces of the D library API in detail, to modify the Thrift compiler to emit D code and target said API, and to implement the basic parts of the library (e.g. the default binary protocol). During this process, I would actively use and extend the Thrift test suite to foster code quality.

The mid-term evaluations take place in mid-July.

In the second part of the summer (July 15–August 15), I would complete the remaining parts of the library, along with the accompanying documentation. This notably includes an actual RPC server and client implementation, and the related part of the test suite.

Towards the end of the coding period, I would prepare my work for integration into the official Thrift source tree, where I intend to maintain it for the foreseeable future.

If I should, in agreement with my (prospective) mentor, come to the conclusion that the Thrift compiler should be implemented via CTFE, the final goal could instead be to take the library through the Phobos review process². However, as this would be quite a substantial submission (and thus entail a review period of several weeks), the actual review/vote would have to take place after the formal end of the Summer of Code program.

About me

My name is David Nadlinger. I am a 20-year-old student from Austria, currently in the first year of the mathematics and physics Bachelor’s programs at the Vienna University of Technology. Being both adept at and passionate about analyzing and solving complex problems, I have had programming as my primary hobby since I wrote my first console games in QBasic more than ten years ago. For a long time, C++ has been my main language, although my interest in software design approaches led me to play with a number of other languages and libraries. I picked up D in 2008, soon after which it became my primary focus, and have been an avid member of the community since, making a number of smaller contributions to related projects. My latest personal project is a flexible Boost-licensed units of measurement implementation for D, which I expect to present to the community soon.

I am fluent in D, C++ and ActionScript, and have written a few non-trivial applications in Ruby and PHP. Being a contributor to KDE, I am proficient with the Qt framework, and am currently helping to maintain QtD. Other open source projects I am currently involved with include SWIG (to which I contributed D support) and the Open Asset Import Library. As part of a personal web application project, I implemented a custom ActionScript/PHP RPC system (which uses e.g. XML-RPC as a wire format), providing me with basic first-hand experience regarding serialization and RPC.

Austrian term dates do not usually fit the GSoC schedule well (with important exams being in the second half of June), but as I am not going to formally finish this term³, I expect to have large amounts of time at my disposal in the summer. Participating in the Google Summer of Code would enable me to fully concentrate on this project.

My Google Melange profile should include any necessary contact information; otherwise, feel free to consult my website (I’ll gladly send you my instant messenger details via mail). As always, you will also find me regularly on #d at irc.freenode.net, where I am known as »klickverbot«.

 

1 Another well-known project with a similar goal (but a slightly narrower scope, particularly with regard to RPC) is Google’s Protocol Buffers.

2 I am not sure whether Phobos would be the right place for a sizable, specialized library such as this one, but it was suggested to me several times on the newsgroup and IRC. I will share my opinion and ideas on possible library development and organization schemes on the newsgroup soon.

3 I am going to move to Switzerland and start my studies at the ETH Zürich later this fall. As I am not going to receive any credit there for exams taken in Austria, I would get no benefits from formally finishing the current term.