GSoC/GCI Archive
Google Summer of Code 2013

mlpack: scalable C++ machine learning library

Web Page: http://www.mlpack.org/trac/wiki/SummerOfCodeIdeas

Mailing List: http://lists.cc.gatech.edu/mailman/listinfo/mlpack

mlpack is a C++ machine learning library with emphasis on scalability, speed, and ease of use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. mlpack outperforms competing machine learning libraries by large margins. mlpack is developed by the Fundamental Algorithmic and Statistical Tools Laboratory (FASTLab) at Georgia Tech. It is released free of charge, under the GNU Lesser General Public License (LGPL) version 3.

The mlpack website can be found at http://www.mlpack.org/.

mlpack's participation in Google Summer of Code 2013 involves three general classes of projects:

  • Code maintenance activities: improving and cleaning already-existing frameworks for the sake of usability.
  • Core functionality improvements: loading and saving additional types of datasets; improvements in model loading/saving and output; command-line parameter improvement.
  • Machine learning method improvements and addition: implementation of new machine learning methods; improvement of existing methods for further speedups; application of recent papers to existing machine learning method implementations.

This breadth of available activities means that students with many different levels of experience in either coding or machine learning can contribute to the project. A successful contribution to mlpack should be rewarding, as usage of mlpack is picking up in the machine learning community.

In addition, it should be emphasized that the Ideas List is not comprehensive. If you have a neat idea of your own, we will happily consider it. Also, if you are going to email mlpack@cc.gatech.edu, you must first be subscribed to the list.

 

Projects

  • Automatic benchmarking of mlpack methods
    Many machine learning libraries have been developed since the advent of the computer. Advanced and scalable machine learning libraries such as mlpack need a huge amount of work, primarily from developers but also from researchers. For that reason, it is crucial for the project to have up-to-date benchmarks that show which changesets have caused speedups or slowdowns. This project entails writing support scripts that run machine learning methods, primarily from mlpack but also from competing libraries, and produce runtime information.
  • Collaborative Filtering Framework for mlpack
    The aim of the project is to develop a rigorously tested, well-documented collaborative filtering framework for mlpack with a flexible API. The project involves defining the input/output model, implementing collaborative filtering algorithms, and deriving various recommendations from the data. The project aims to give the user maximum control and freedom in working with the algorithms. The proposed algorithms include QUIC-SVD and ALS-WR.
  • MLPACK: Bindings
    Language bindings for mlpack. It is important that powerful libraries like this be available from common scripting languages such as Python and R.
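
The benchmarking project above describes support scripts that run a method and record how long it takes. A minimal sketch of that idea in Python follows. It is not the project's actual code: the helper names `time_command` and `summarize` are hypothetical, and the command being timed is a stand-in (a trivial Python one-liner) rather than a real mlpack executable.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run the command `argv` `repeats` times; return each wall-clock time in seconds.

    Hypothetical helper for illustration, not part of mlpack.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with an error,
        # so a failed run is never recorded as a (misleadingly fast) timing.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to the minimum, median, and maximum."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with the command line of the
    # method under test, from mlpack or from another library.
    command = [sys.executable, "-c", "sum(range(100000))"]
    print(summarize(time_command(command)))
```

Storing such a summary alongside the changeset identifier for each run would be one way to see which changesets caused speedups or slowdowns. Repeating each run and reporting the median reduces the influence of one-off system noise on the comparison.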