Automated benchmark suite for numerical libraries in Gentoo
Short description: The project concerns the construction of a complete automated benchmark suite for implementations of standard linear algebra interfaces such as BLAS or LAPACK, which will allow users to find the best implementation for their machine. Compilers, compiler versions and flags are taken into account as well.
Most computer-related topics rely upon numerical computation: simulations, graphics, games and video editing are just a few examples of tasks based on linear algebra computations. Gentoo provides many implementations of numerical interfaces such as BLAS through Portage, but different implementations lead to different performance results. As Gentoo developers, we want to show how well suited this operating system is for numerical purposes by providing facilities that help users decide which implementation is best suited for their work: optimized computations mean a better computing experience. The Scientific Gentoo Project has already investigated this topic. Now it is time to code!
This project started from an idea of Sébastien Fabbro. He and I began discussing the proposal via email at the beginning of March. In the meantime I did some research and started designing the proposal. I finally announced the project proposal on the mailing list. What you can see here is the result.
The main objective is to integrate into the Gentoo tools a comprehensive automated benchmark suite for numerical libraries that can decide which implementation is optimal on the current machine. Based on these benchmarks, we will construct tools that let the system administrator choose the right default implementation system-wide and let individual users optionally override the administrator's choice.
A robust, portable and modular benchmark suite for implementations of linear algebra routines. I plan to test every standard task, i.e. the three levels of BLAS with different input matrices and vectors, the LAPACK routines and the sparse BLAS ones. The benchmarks will measure not only speed, which is what most benchmark suites do, but also accuracy and memory usage. Accuracy measurements are crucial in some fields: the behaviour and results of an implementation when the given problem is ill-conditioned, for example, can lead to different decisions about which implementation to use for a particular task. The memory requirements of an implementation can be very important for large problems, but also for normal problems on embedded devices. I plan to begin my work using already well-known patterns -- see for example HPL, or BTL, which is used by Eigen for benchmarking -- and then implement other specific tests. The suite will be written in C, compiled separately and linked against the implementation under test.
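To make the intended measurements concrete, here is a minimal sketch in Python (the suite itself will be written in C) that times a dgemm call and estimates its accuracy against a higher-precision reference; the function name and parameters are illustrative only:

import time
import numpy as np
from scipy.linalg.blas import dgemm

def bench_dgemm(n, repetitions=5):
    # Time a double-precision matrix product and estimate its accuracy.
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repetitions):
        t0 = time.perf_counter()
        c = dgemm(1.0, a, b)
        best = min(best, time.perf_counter() - t0)
    gflops = 2.0 * n**3 / best / 1e9
    # Accuracy: relative residual against a higher-precision reference.
    ref = a.astype(np.longdouble) @ b.astype(np.longdouble)
    error = float(np.max(np.abs(c - ref)) / np.max(np.abs(ref)))
    return gflops, error

print(bench_dgemm(500))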
A script (simply the script in the following) that allows the user to choose the default implementation for the system, based on the benchmarks. There are already eselect modules that let the user choose the system's default implementation. My script will ask the user which implementations to test, how to test them and which aspects to take into account in order to decide on the best implementation. The script will then run the benchmarks with the desired implementations, each one compiled with different compilers, compiler flags and linker flags. It will use Portage to (re)build the implementation libraries and clean the system after the tests. The results of the benchmarks will also be interpreted by this tool, which at the end will be able to decide which library with which configuration best fits the user's requirements and, after the user's agreement, will set the system to this configuration using Portage and the existing eselect modules. The script will be written in Python. I chose Python for several reasons: it is a complete language with many useful features that will make the script easier to maintain -- and the script will probably reach a considerable size. With Python it will be more comfortable to provide optional features, such as reporting tools that are available only if the related libraries are installed. Not least, I am very fluent in Python, far more than in Bash -- I am not proposing a project because I want to learn something new: I just want to do my best in fields I already know very well.
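As a small taste of the interactive interface, a minimal Python helper for the yes/no prompts shown in the examples below (a sketch; the exact prompt wording may change):

def ask_yes_no(question, default=True):
    # Render the '[Y, n]' style prompts; an empty answer takes the default.
    suffix = "[Y, n]" if default else "[y, N]"
    answer = input("%s %s " % (question, suffix)).strip().lower()
    return default if not answer else answer.startswith("y")

# e.g. ask_yes_no("Test gcc-4.5.1?")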
An ebuild will be released so that users can install the project results using Portage. The ebuild will cover both the benchmark suite and the script.
Everything will be well documented. For the benchmark suite, much of the documentation will be generated from the sources using Doxygen: this will help with maintaining and improving the suite. A man page will also be released to explain what the suite does, how to interpret the results and how to make good use of it. The script will also have a man page. A web page will be published with instructions and further explanations.
Since the project is open source, the source code will be available.
The script will be designed to be both user-friendly and highly customizable for advanced users. The work to be performed will be decided by means of initial prompts. The following are the steps performed by the script when called by the user to test BLAS implementations:
The script detects the installed implementations and asks the user which ones to test. The user can also request additional libraries to be tested. Prompt example:
* Detected libraries: blas-reference-3.3, atlas-3.8.3
|-- Test blas-reference-3.3? [Y, n] n
|-- Test atlas-3.8.3? [Y, n]
|-- Enter another package or blank to continue: mkl-10.3
|-- Enter another package or blank to continue:
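A possible sketch of this first step in Python, reusing the existing eselect module to discover the installed implementations (the exact output format of "eselect blas list" is an assumption here):

import subprocess

def detect_blas():
    # Ask the existing eselect module which implementations are installed.
    out = subprocess.run(["eselect", "blas", "list"],
                         capture_output=True, text=True, check=True).stdout
    names = []
    for line in out.splitlines():
        parts = line.split()
        # Lines look roughly like "  [1]   reference"; the format may differ.
        if len(parts) >= 2 and parts[0].startswith("["):
            names.append(parts[1])
    return names

def ask_implementations():
    selected = [n for n in detect_blas()
                if input("|-- Test %s? [Y, n] " % n).strip().lower() != "n"]
    while True:
        extra = input("|-- Enter another package or blank to continue: ").strip()
        if not extra:
            return selected
        selected.append(extra)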
It will then detect the installed compilers and ask which versions should be tested and which flag sets should be used. Some meaningful flag sets will already be provided. The usage and standard options will be easy for the novice user, while the advanced user has plenty of room for customization. Prompt example:
* Detected the following compilers: gcc-4.4.5, gcc-4.5.1, icc-11.1.046-r2
|-- Test gcc-4.4.5? [Y, n] n
|-- Test gcc-4.5.1? [Y, n] Y
|-- |-- Use flag set “-O2 -funroll-loops”? [Y, n]
|-- |-- Use flag set “-O3 -DNDEBUG”? [Y, n] n
|-- |-- Enter a new flag set or blank to continue:
|-- Test icc-11.1.046-r2? [Y, n]
|-- |-- Use...
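This second step could be sketched as follows; the preset flag sets and function names are illustrative:

DEFAULT_FLAG_SETS = ["-O2 -funroll-loops", "-O3 -DNDEBUG"]

def ask_compiler_configs(compilers):
    # Returns a list of (compiler, flags) pairs to benchmark.
    configs = []
    for cc in compilers:
        if input("|-- Test %s? [Y, n] " % cc).strip().lower() == "n":
            continue
        flag_sets = [f for f in DEFAULT_FLAG_SETS
                     if input('|-- |-- Use flag set "%s"? [Y, n] ' % f).strip().lower() != "n"]
        while True:
            extra = input("|-- |-- Enter a new flag set or blank to continue: ").strip()
            if not extra:
                break
            flag_sets.append(extra)
        configs.extend((cc, f) for f in flag_sets)
    return configs

# e.g. ask_compiler_configs(["gcc-4.4.5", "gcc-4.5.1", "icc-11.1.046-r2"])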
The user is then asked which aspects should be monitored: speed, accuracy or memory usage. They are also asked for problem sizes and types (for example, only big Level 1 tests, or medium-sized problems of every level).
For each test configuration the tests are performed. This includes building the library with the desired compiler and flags, linking it with the benchmark suite, execution, and the collection and interpretation of results.
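A rough Python sketch of one such cycle; overriding CFLAGS through the environment is an assumption (in practice a per-package file under /etc/portage/env may be needed), and the benchmark binary path is hypothetical:

import os
import subprocess

def build_and_run(package, compiler, flags, bench_binary="./blas-bench"):
    env = dict(os.environ, CC=compiler, CFLAGS=flags)
    # Rebuild only the library under test, without touching the world set.
    subprocess.run(["emerge", "--oneshot", package], env=env, check=True)
    # Run the separately compiled benchmark suite linked against it.
    out = subprocess.run([bench_binary], capture_output=True, text=True, check=True)
    return out.stdout  # raw results, to be parsed and interpreted later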
At the end a comprehensive report is generated; graphs can be produced using matplotlib or gnuplot. The script also gives a suggestion about which implementation to use.
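For example, a speed graph could be produced with matplotlib along these lines (the numbers below are placeholders, not measurements):

import matplotlib.pyplot as plt

results = {
    "reference": [(100, 0.8), (500, 1.0), (1000, 1.1)],  # (size, GFlop/s)
    "atlas":     [(100, 2.5), (500, 4.0), (1000, 4.5)],
}
for name, points in results.items():
    sizes, gflops = zip(*points)
    plt.plot(sizes, gflops, marker="o", label=name)
plt.xlabel("matrix size")
plt.ylabel("GFlop/s")
plt.title("dgemm throughput")
plt.legend()
plt.savefig("dgemm.png")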
The script cleans up the system and, if desired, uses the eselect modules to set the default implementation to be used.
The rebuild of the libraries will be done using the advanced features of Portage. In order to avoid undesired permanent changes to the system, the whole process will be performed in a special environment -- such as a chroot.
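The idea, in a minimal sketch (the preparation of the chroot itself, e.g. unpacking a stage3, is omitted, and the path is hypothetical):

import subprocess

SANDBOX = "/var/tmp/benchmark-chroot"

def emerge_in_sandbox(package):
    # Build inside the chroot so the live system is left untouched.
    subprocess.run(["chroot", SANDBOX, "emerge", "--oneshot", package],
                   check=True)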
The script will handle not only system-wide configurations but also per-user ones, similar to the behaviour of the eselect modules that handle both system and user information. In both cases, symlinks will be used to point to the right library. In the per-user case, either a local build has to be performed -- that is, in the user's disk space -- or only system-installed libraries can be benchmarked and used.
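A sketch of the per-user case: a symlink in a user-owned library directory, picked up through LD_LIBRARY_PATH (all paths here are illustrative):

import os

def set_user_blas(library_path):
    user_libdir = os.path.expanduser("~/.local/lib")
    os.makedirs(user_libdir, exist_ok=True)
    link = os.path.join(user_libdir, "libblas.so")
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(library_path, link)
    # Programs then pick it up with:
    #   LD_LIBRARY_PATH=~/.local/lib ./my_program

# e.g. set_user_blas("/usr/lib/atlas/libblas.so")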
I expect the most relevant part to be ready for use just before the mid-term evaluation. This includes the benchmark suite, with support for at least some implementations, and the Gentoo tools. Meeting this condition is quite crucial for delivering high-quality software at the end of the program, although a useful result can still be provided otherwise.
The final part of the summer will probably be spent improving the suite and the tools, adding support for other implementations, writing good documentation and testing as much as possible. Having a full month for this should help guarantee good results.
A more precise list of milestones is the following:
After 1 week: investigation of the different implementations to take into account for the tests. Deep knowledge of two or three implementations (with code reading) is acquired as case studies. These could be the BLAS implementation of LAPACK (blas-reference in Gentoo), ATLAS and GotoBLAS2.
After 2.5 weeks: the benchmark suite -- at least an initial, meaningful and complete version for BLAS implementations -- is ready, stable and tested with those implementations.
After 4 - 4.5 weeks: the script is ready, at least in an initial working version. This does not include the final report feature, but every other part -- implementation recognition, compiler versioning, flag sets, benchmark runs, result interpretation -- is ready for use.
After 6 weeks: the benchmark suite is complete, the script can run it with meaningful results, initial work on the final report feature is complete, and the ebuild is ready for use.
Mid-term evaluation
After 1.5 weeks: the script is more modular: it can run different benchmark suites against different library interfaces -- not just BLAS; for example CBLAS or LAPACK.
After 2.5 weeks: the documentation is collected and organized.
After 4 weeks: many tests have been performed for different library interfaces, with different compilers, flags and library implementations. The test results are included in the documentation.
Final evaluation
I plan to work approximately 5 hours a day throughout the week, which amounts to 35 hours per week. Knowing my usual working pace, this should be enough to deliver a final product that fulfils the project's goals.
After the end of the Google program, this project will be continuously maintained: new modules for library interfaces and new benchmarks will be added, and tests will be performed in order to prove its stability. It will probably remain a Gentoo-only project, due to the nature of the approach used.
My name is Andrea Arteaga and I am a student of Computational Science and Engineering at the Swiss Federal Institute of Technology, Zurich. I am successfully finishing the bachelor program and beginning the master program in the same field.
I have been using Linux for 5 years and Gentoo for 3, although I love to try out new distributions. My contributions so far are, alas, just forum discussions, help for newcomers and some blog posts. But my academic experience is in numerical programming in C, C++ and Python, which are my favourite fields and in which I have good skills. Numerical computing is not just academic work for me, but a true passion. Being accepted for this project would be the best way for me to join the Gentoo development team.