GSoC/GCI Archive
Google Summer of Code 2012

Orange – Data Mining Fruitful & Fun

Web Page: http://orange.biolab.si/trac/wiki/GSoC/Ideas

Mailing List: http://orange.biolab.si/forum/

Orange is an open-source, component-based data mining and machine learning software suite. It features a friendly yet powerful and flexible visual programming front-end for exploratory data analysis and visualization, along with Python bindings and libraries for scripting. It includes a comprehensive set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is maintained and developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia.

Projects

  • Computer vision add-on for Orange: Development of an add-on (a set of widgets for Orange Canvas) that will introduce computer vision functionality to Orange. The core of the add-on will be the computer vision library OpenCV, used from Python. The development of this add-on and the findings gathered along the way will then be used to improve the Orange core.
  • Multi-Target Learning for Orange: Orange already has a multi-target tree learner, but it is written in Python and is therefore slow, especially when used in a random forest. Implementing the multi-target tree learner in C++ would speed up classification considerably and also reduce its memory usage. The tree learner would be based on the top-down induction of clustering trees proposed by Blockeel and De Raedt and would extend Orange's SimpleTreeLearner. Because tree learning algorithms really come to life inside random forests, integration with Orange's random forest would be another focal point. Orange is progressing towards version 3.0, so the implemented code would be integrated with the new version. Once the algorithms are implemented and integrated, an experimental study would compare the implemented multi-target tree classifier with established multi-target classifiers (e.g. PLS, Bayesian classifiers) on benchmark datasets. Finally, tests, documentation, and a scripting reference would be written.
  • Widgets for statistics: The goal of this project is to implement a set of standard statistical methods in Orange, in keeping with its own spirit and style. A lot of code for various statistical methods already exists in Python; we will not reimplement what we can reuse. The bulk of the work will be the implementation of widgets rather than of statistical computation. It is my hope that the project will encourage others to contribute more widgets of this kind.
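
As a rough illustration of the multi-target tree idea mentioned in the Multi-Target Learning project, the sketch below picks a threshold on one numeric feature so that the within-branch variance, summed over all target columns, is as small as possible. This is a simplified variance-reduction heuristic in the spirit of clustering trees, not the project's C++ code and not Orange's API; the function names `total_variance` and `best_split` and the toy data are hypothetical.

```python
# Illustrative sketch only: hypothetical helpers, not part of Orange.

def total_variance(rows):
    """Sum, over all target columns, of squared deviations from the column mean."""
    if not rows:
        return 0.0
    n = len(rows)
    n_targets = len(rows[0])
    total = 0.0
    for j in range(n_targets):
        mean = sum(r[j] for r in rows) / n
        total += sum((r[j] - mean) ** 2 for r in rows)
    return total


def best_split(feature, targets):
    """Find the split `x <= threshold` that minimizes the summed
    within-branch variance across all targets.

    Returns (threshold, score); threshold is None if no split improves
    on leaving the node unsplit.
    """
    pairs = sorted(zip(feature, targets), key=lambda p: p[0])
    best = (None, total_variance(targets))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = total_variance(left) + total_variance(right)
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Toy data (made up): one feature, two targets per example.
feature = [1.0, 2.0, 10.0, 11.0]
targets = [(0.0, 1.0), (0.0, 1.0), (5.0, 9.0), (5.0, 9.0)]
print(best_split(feature, targets))  # (6.0, 0.0)
```

A single split serves both targets at once here, which is the point of a multi-target tree: one model is grown for all outputs instead of one tree per output.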