GSoC/GCI Archive
Google Summer of Code 2014

mlpack: scalable C++ machine learning library

License: GNU Library or "Lesser" General Public License version 3.0 (LGPLv3)

Web Page: http://www.mlpack.org/trac/wiki/SummerOfCodeIdeas

Mailing List: https://lists.cc.gatech.edu/mailman/listinfo/mlpack

mlpack is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. mlpack outperforms competing machine learning libraries by large margins; links to papers and benchmarks can be found on the project's homepage.

Projects

  • Collaborative Filtering Package Improvements. Collaborative filtering (CF) is a popular technique used in recommender systems. The current CF framework in mlpack uses only NMF as a decomposition method. Many alternatives to NMF perform better in terms of efficiency and accuracy and could be included in the package.
  • Collaborative Filtering Package Improvements. Non-negative matrix factorization (NMF), though fast and accurate, rules out mean normalization because of its non-negativity constraint. This motivates implementing SVD-based matrix factorization methods, which allow normalization and regularization to be added. This project aims to build a strong CF module that supports most of the best-performing CF techniques, and to create a solid backbone for future development through a robust abstraction.
  • Implementation of Multi-Class AdaBoost algorithm in mlpack. AdaBoost, short for Adaptive Boosting, builds an ensemble of weak classifiers, tweaking each subsequent weak learner in favor of previously misclassified instances. This project aims to provide a multi-class implementation of the AdaBoost algorithm for mlpack. Implementing AdaBoost would not only extend the range of the project; the added weak learners would also create a template for other ensemble classification algorithms such as Gradient Boosting and LPBoost.
  • Improvement of automatic benchmarking system. The aim of this project is to: a) improve mlpack's existing algorithm benchmarking system by implementing new performance metrics and comparing various classifiers on those metrics; b) implement a new classifier based on the backpropagation algorithm.
  • Optimization of tree-traversal in mlpack. The goals of this project are twofold. A) Speed up dual-tree traversal algorithms in mlpack: 1) define a set of benchmarks; 2) improve the speed of the tree traversal as measured by these benchmarks; 3) document, test, and support this code. B) Implement various trees in a manner consistent with the current API: 1) X-tree; 2) R*-tree; 3) R-tree; 4) Hilbert R-tree; 5) UB-tree; 6) M-tree.
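
To make the AdaBoost project above more concrete, here is a minimal sketch of a single boosting round, showing how instance weights shift toward previously misclassified points. It assumes the SAMME multi-class formulation, which the project description does not name, and the function name `BoostRound` and its signature are hypothetical. It is not mlpack's API or the student's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One round of SAMME-style multi-class AdaBoost (an assumed variant; the
// project description does not say which formulation was implemented).
// Given the true labels and one weak learner's predictions, this returns the
// learner's vote weight (alpha) and updates the instance weights in place.
// A caller would normally stop boosting once alpha is no longer positive.
double BoostRound(const std::vector<int>& labels,
                  const std::vector<int>& predictions,
                  std::vector<double>& weights,
                  int numClasses)
{
  assert(labels.size() == predictions.size());
  assert(labels.size() == weights.size());
  assert(numClasses >= 2);

  // Weighted error of the weak learner on the training set.
  double total = 0.0;
  double wrong = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    total += weights[i];
    if (predictions[i] != labels[i])
      wrong += weights[i];
  }

  // Clamp the error away from 0 and 1 so the logarithm stays finite.
  const double eps = 1e-12;
  const double err = std::min(std::max(wrong / total, eps), 1.0 - eps);

  // SAMME vote weight: the extra log(K - 1) term keeps alpha positive for any
  // learner that does better than random guessing among K classes.
  const double alpha = std::log((1.0 - err) / err) + std::log(numClasses - 1.0);

  // Scale up the weights of misclassified instances, then renormalize so the
  // weights sum to one.
  double sum = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    if (predictions[i] != labels[i])
      weights[i] *= std::exp(alpha);
    sum += weights[i];
  }
  for (double& w : weights)
    w /= sum;

  return alpha;
}
```

For example, with four equally weighted points, three classes, and one misclassified point, the weighted error is 0.25 and alpha is log(6). After the update the misclassified point carries weight 2/3 and each correctly classified point carries 1/9, so the next weak learner is pushed to focus on the earlier mistake.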