GSoC/GCI Archive
Google Summer of Code 2011

R Project for Statistical Computing

Web Page: http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011

Mailing List: gsoc-r@googlegroups.com

The R Foundation (the legal entity behind the R Project) is a not-for-profit organization working in the public interest. It was founded by the members of the R Development Core Team in order to:

  • Provide support for the R project and other innovations in statistical computing. We believe that R has become a mature and valuable tool and we would like to ensure its continued development and the development of future innovations in software for statistical and computational research.
  • Provide a reference point for individuals, institutions or commercial enterprises that want to support or interact with the R development community.
  • Hold and administer the copyright of R software and documentation. R is an official part of the Free Software Foundation's GNU project, and the R Foundation has similar goals to other open source software foundations like the Apache Foundation or the GNOME Foundation. Among the goals of the R Foundation are the support of continued development of R, the exploration of new methodology, teaching and training in statistical computing, and the organization of meetings and conferences with a statistical computing orientation.

Projects

  • A GUI-based package to assist optimization problems in R. This project aims to build a GUI-based R package that assists in the preparation and solution of optimization problems. It is expected to improve the usability of optimization tools in R by giving users meaningful suggestions on the choice of optimizer and parameters in a visual and interactive way. The program will also provide a mechanism to auto-generate code that can be run in R to solve a specific optimization problem.
  • Convergence acceleration of the Expectation-Maximization (EM) algorithms in computational statistics: A suite of cutting-edge acceleration schemes. The Expectation-Maximization (EM) algorithm is a useful and popular optimization approach that arises in a wide range of scientific applications. Adaptations of the original EM approach have been proposed that provide faster convergence rates without compromising its global convergence property. We propose to develop an R package which will provide a unified implementation of the diverse set of acceleration schemes for the EM algorithm in an open source, user-friendly environment.
  • Cranvastime: Interactive longitudinal and temporal data plots. The project involves developing interactive time series and longitudinal data plots, in association with a new interactive graphics package for R called cranvas, which is based on Qt and can handle large amounts of data. The purpose is to improve R’s capabilities for exploring temporal data. The time series plot will enable the exploration of slightly irregular seasonality and of associations between multiple series. The longitudinal plot will enable the study of individual variation.
  • DClusterm: Model-based detection of disease clusters. Analysis of disease data is important in order to detect disease outbreaks and risk factors. Some of the methods for cluster detection have been implemented in the DCluster package. However, a model-based approach would be of interest in order to relate disease incidence to potential risk factors. Model-based clustering will be implemented using Generalized Linear Models. Hence, many possible clusters will be proposed and the most likely cluster will be selected using model selection techniques.
  • Developing a hyperSpec GUI. Currently hyperSpec provides only limited interactive functionality via the `locator()` function for basic graphics. This proposal will develop a Graphical User Interface for the hyperSpec package. This GUI will be made up of smaller widgets that can be chained, synchronised, and included in batch scripts.
  • Exploratory visualization of dynamic stochastic processes. The project will contribute functions that help explore, visualize, and analyze data from multivariate stochastic dynamic systems.
  • HUGE: High-dimensional Undirected Graph Estimation. Modern data acquisition routinely produces massive amounts of complex data. Despite the high dimensionality and complexity, many problems have hidden structure that makes efficient statistical inference possible. One important hidden structure is the sparse conditional independence graph (or undirected graphical model). Our HUGE project aims at providing a fast and scalable toolkit for nonparametric graphical models in ultrahigh-dimensional data analysis.
  • Image Analysis in R. The project aims to fully integrate ImageJ with R and to expand RImageJ into a fully functional R image analysis engine.
  • Manipulating RStudio Graphics Towards Creating Intuitive Mathematical Comprehension. The manipulate package in RStudio can be used to demonstrate mathematical ideas intuitively through interacting with sliders and watching a corresponding graph shift. I will create a variety of applets written in R to show basic calculus and statistical concepts for Professor Daniel Kaplan and J.J. Allaire's contribution to Project MOSAIC.
  • OpenMP parallel framework for R. An existing project on the ideas list, this aims to use multi-threaded programming to exploit parallelism on multicore/shared-memory architectures. OpenMP is a well-known specification for parallel programming: parallelism can be expressed cleanly, without the hassle of message passing or load balancing, and hybrid programming with MPI is supported as well. The expected results include a usable R-OpenMP package that will reside on CRAN servers, with good performance, compatibility and user experience.
  • optile: Category order optimization for graphical displays of categorical data. The project goal is to implement an interface in R which provides category order optimization for different types of input (such as tables, data frames or matrices) and for 2- as well as k-dimensional categorical data.
  • Proposal for Components in TradeAnalytics Toolchain enhancements. The existing packages include the necessary tools/functions to construct and apply trading strategies. More functions related to trading a portfolio, testing of parameters and evaluation of strategies can be added. This proposal focuses on some of the targets related to these new developments.
  • R-EM-Accelerator---Smarter Iterative Schemes Save Your Time. This project aims at developing an R package that offers several of the latest acceleration schemes under a single call and can be used to accelerate any EM algorithm. In the proposal, I will show how flexible and convenient this package will be for any R user; a reasonable timeline, which is the result of prior learning, is also included. In addition, I would like the R project as the mentoring organization and Professor Ravi Varadhan as my mentor.
  • SMART: Sparse Multivariate Adaptive Regression Toolkit. The project aims at providing the “fastest and most scalable” implementations of three modern nonparametric predictive methods (SpAM, MT-SpAM and G-SpAM). This package has the potential to become a general-purpose exploratory data analysis toolbox for a wide range of data analysis practitioners. The targeted applications include large-scale scientific data analysis (e.g. genomics/proteomics/bio-imaging), social media data analysis (e.g. image/audio/video/text modeling) and financial time-series
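
Two of the projects listed above concern acceleration schemes for the EM algorithm. As a rough illustration of what such a scheme looks like, the sketch below runs plain EM and a squared-extrapolation (SQUAREM-style) variant on a two-component normal mixture with unit variances. It is written in Python rather than R, all function names and the synthetic data are hypothetical, and it is not the implementation produced by either project; it only assumes that an accelerator can wrap an arbitrary EM update function.

```python
import math
import random


def normal_pdf(x, mu):
    """Density of a unit-variance normal distribution with mean mu."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def log_likelihood(data, theta):
    """Mixture log-likelihood; -inf for parameter values that are not usable."""
    w, mu1, mu2 = theta
    if not 0.0 < w < 1.0:
        return -math.inf
    total = 0.0
    for x in data:
        p = w * normal_pdf(x, mu1) + (1.0 - w) * normal_pdf(x, mu2)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def em_step(data, theta):
    """One EM update for theta = (weight of component 1, mean 1, mean 2)."""
    w, mu1, mu2 = theta
    resp = []  # E-step: posterior probability that each point is from component 1
    for x in data:
        a = w * normal_pdf(x, mu1)
        b = (1.0 - w) * normal_pdf(x, mu2)
        resp.append(a / (a + b))
    n1 = sum(resp)
    n2 = len(data) - n1
    # M-step: weighted proportions and weighted means
    new_mu1 = sum(r * x for r, x in zip(resp, data)) / n1
    new_mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
    return (n1 / len(data), new_mu1, new_mu2)


def norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def run_em(data, theta, tol=1e-8, max_iter=500):
    """Plain EM: iterate until the update moves theta by less than tol."""
    for iteration in range(1, max_iter + 1):
        new_theta = em_step(data, theta)
        if norm([b - a for a, b in zip(theta, new_theta)]) < tol:
            return new_theta, iteration
        theta = new_theta
    return theta, max_iter


def run_squarem(data, theta, tol=1e-8, max_iter=500):
    """Squared-extrapolation acceleration wrapped around em_step."""
    for iteration in range(1, max_iter + 1):
        t1 = em_step(data, theta)
        r = [b - a for a, b in zip(theta, t1)]
        if norm(r) < tol:
            return t1, iteration
        t2 = em_step(data, t1)
        v = [(c - b) - ri for b, c, ri in zip(t1, t2, r)]
        candidate = t2  # fallback: two ordinary EM steps
        if norm(v) > 0.0:
            alpha = min(-norm(r) / norm(v), -1.0)  # alpha = -1 reproduces t2
            extrapolated = tuple(
                a - 2.0 * alpha * ri + alpha * alpha * vi
                for a, ri, vi in zip(theta, r, v)
            )
            # Keep the extrapolation only if it does not lower the likelihood.
            if log_likelihood(data, extrapolated) >= log_likelihood(data, t2):
                candidate = extrapolated
        theta = em_step(data, candidate)  # stabilizing EM step
    return theta, max_iter


def demo_data(seed=0):
    """Synthetic sample: 120 draws from N(0, 1) and 80 draws from N(4, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(120)] + [
        rng.gauss(4.0, 1.0) for _ in range(80)
    ]


if __name__ == "__main__":
    data = demo_data()
    start = (0.5, -1.0, 5.0)
    print("plain EM:", run_em(data, start))
    print("SQUAREM-style:", run_squarem(data, start))
```

Each accelerated iteration costs three EM updates plus two likelihood evaluations, so iteration counts from the two functions are not directly comparable; the likelihood check and the fallback to two plain EM steps are one simple way to keep the extrapolated sequence from degrading the fit.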