GSoC/GCI Archive
Google Summer of Code 2011

Orange – Data Mining Fruitful & Fun

Web Page: http://orange.biolab.si/trac/wiki/GSoC/Ideas

Mailing List: http://orange.biolab.si/forum/

Orange is an open source, component-based data mining and machine learning software suite. It features a friendly yet powerful and flexible visual programming front-end for exploratory data analysis and visualization, as well as Python bindings and libraries for scripting. It includes a comprehensive set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is maintained and developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia.

Projects

  • 2D visualization using PyQt: Replace the current graph framework based on PyQwt with a pure Qt implementation. This allows greater flexibility, which would be used to make graphs prettier and more readable. It also removes a dependency on PyQwt, which is not available for Python 3.
  • Matrix Factorization Techniques for Data Mining: Our objective is to provide the Orange community with a unified and efficient interface to matrix factorization algorithms and methods. For that purpose we will develop a scripting library that includes a number of published factorization algorithms and initialization methods and makes it easy to combine them into new strategies. Extensive documentation, working examples and visualization methods will be provided to help with the interpretation of the results.
  • Multi-label classification: The main goal is to extend Orange to support multi-label classification, including dataset support, the two basic families of multi-label methods (problem transformation and algorithm adaptation), evaluation measures, GUI support, documentation, and testing.
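
To make the "problem transformation" idea from the multi-label project concrete, here is a minimal sketch of binary relevance, one common transformation that turns a multi-label dataset into one binary dataset per label, together with Hamming loss as an example of a multi-label evaluation measure. This is not Orange's API: the function names `binary_relevance` and `hamming_loss`, the data layout, and the toy data are all assumptions made for illustration.

```python
def binary_relevance(rows, labels):
    """Split a multi-label dataset into one binary dataset per label.

    rows:   list of (features, label_set) pairs (hypothetical layout).
    labels: iterable of all label names.
    Returns a dict mapping each label to a list of (features, 0 or 1).
    """
    return {
        label: [(features, int(label in label_set))
                for features, label_set in rows]
        for label in labels
    }


def hamming_loss(true_sets, predicted_sets, labels):
    """Fraction of (instance, label) pairs that are predicted incorrectly."""
    labels = list(labels)
    errors = sum(
        (label in true) != (label in predicted)
        for true, predicted in zip(true_sets, predicted_sets)
        for label in labels
    )
    return errors / (len(true_sets) * len(labels))


# Toy data, made up for illustration only.
rows = [
    ((1.0, 0.2), {"sports"}),
    ((0.1, 0.9), {"politics", "economy"}),
    ((0.5, 0.5), {"sports", "economy"}),
]
labels = ["sports", "politics", "economy"]

# Each label now has its own binary classification problem.
per_label = binary_relevance(rows, labels)
print(per_label["economy"])
```

Any ordinary binary classifier could then be trained on each per-label dataset, and the per-label predictions merged back into a label set for each instance. Algorithm adaptation methods, by contrast, modify the learner itself to handle label sets directly.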