GSoC/GCI Archive
Google Summer of Code 2013

R Project for Statistical Computing

Web Page: http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2013

Mailing List: https://groups.google.com/group/gsoc-r

R is a free software environment for statistical computing and graphics. 

Projects

  • A package extension for Exponential Random Graph Model with block structure We propose to develop a package extension that incorporates block structure into Exponential Random Graph Models (ERGMs) for the ERGM package. We have three block models, each under a different dependence assumption among edges or homogeneity assumption among network structures. The goal of the block models is to fit network data better and to reduce the degeneracy that often occurs in traditional ERGMs. The extension will have both R and C implementations. We hope to provide an efficient and convenient interface for researchers in social network analysis.
  • Addressing IID Assumptions in Finance: Autocorrelation and Drawdowns in Performance Analysis It is well known and widely accepted that financial data are not independent and identically distributed (IID) and exhibit extraordinary levels of autocorrelation. This autocorrelation diminishes the apparent risk of such asset classes, because the true returns and risk are easily camouflaged within a haze of illiquidity, stale prices, averaged price quotes and smoothed return reporting. Such discrepancies lead to misleading performance statistics such as volatility, Sharpe ratio, correlation, market beta and other investment indicators that assume the data are normal and IID. Our aim is to implement the approaches for addressing autocorrelation in financial data that have recently been discussed in research journals, and to include the functions in PerformanceAnalytics, an R package that provides a collection of econometric functions for performance and risk analysis.
  • Addressing IID Assumptions in Finance: Autocorrelation and Drawdowns in Performance Analysis The literature on drawdown is still developing and is gaining importance in the hedge fund industry. The PerformanceAnalytics package lacks proper coverage of the developments in this literature. This project aims to fill this gap and to account for non-normality and autocorrelation in returns.
  • Biodiversity data visualization in R R is increasingly used in biodiversity information analysis. Several R packages, such as rgbif and rvertnet, query, download and to some extent analyze the data within an R workflow, and packages such as dismo and SDMTools model the data. The proposed visualizations would help users understand the completeness of a biodiversity inventory, the extent of its geographical, taxonomic and temporal coverage, and the gaps and biases in the data. We propose to develop a package to fill this gap.
  • CAMEL: Calibrated Machine Learning The package "camel" implements a family of high-dimensional calibrated machine learning tools, including (1) LAD, SQRT Lasso and Calibrated Dantzig Selector for estimating sparse linear models; (2) Calibrated Multivariate Regression for estimating sparse multivariate linear models; (3) Tiger and Calibrated Clime for estimating sparse Gaussian graphical models. We adopt a combination of dual smoothing and the monotone fast iterative soft-thresholding algorithm (MFISTA). The computation is memory-optimized using sparse matrix output, and accelerated by path-following and active-set tricks.
  • Collection of functionality ported from the MATLAB code of Attilio Meucci Extend the functionality of the Meucci package with additional research by Attilio Meucci, a thought leader in risk and portfolio management.
  • Handle parallel (vectorized) objective functions in a new optimization wrapper package The idea is to allow vectorized calls of the ‘fn’ argument. Example: for now, when nmkb is building a polytope, the points are passed to ‘fn’ sequentially. If the points could be submitted to ‘fn’ at the same time (that is, in a ‘vectorized’ or ‘parallel’ way), ‘fn’ may be able to parallelize these evaluations.
  • Highfrequency: add inferential methods to highfrequency The economic value of analyzing high-frequency financial data is now obvious, both in the academic and financial world. It is not only the basis of intraday and daily risk monitoring and forecasting; statistics based on high-frequency data are also an important input to the portfolio allocation process and to high-frequency trading. The highfrequency package was created last year as a merger of the RTAQ and Realized packages and is in the process of being extended. Many functions have been created by users and by the authors of the package themselves. I wish to contribute to the success of the package and therefore apply for the project: “Highfrequency: add inferential methods to highfrequency”.
  • Implement/Port Spectral Unmixing Methods to R Implement spectral unmixing methods in R and integrate them with two existing packages (HyperSpec and ChemoSpec). In linear spectroscopy, each spectrum is a linear combination of pure spectral signals weighted with their respective concentrations. Spectral unmixing deals with the problem of recovering pure component spectra and their respective concentrations from a set of measured spectra. The project will consider the two spectral unmixing algorithms N-FINDR and Vertex Component Analysis.
  • Improve display of imported vector graphics in R grImport is currently able to import some PostScript graphics, but the implementation is not yet capable of importing more complicated PostScript images. Additionally, other types of vector graphics (particularly SVG) are not yet supported directly. This project aims to improve grImport's capacity to import vector images for use within R graphics.
  • Improve rendering of animated/interactive ggplots in d3 using animint Implement versions of ggplot2's scales, transformations, and stats for the animint package, which facilitates the creation of interactive, animated JavaScript graphs using R.
  • Improvements to data construction, subsetting, and manipulation for time series data While the matrix structure of xts objects gives great performance, it limits the flexibility of time series objects. The main goal of this project is to implement a class that can contain columns of multiple types while keeping the performance of xts objects.
  • Improving rapport and pander packages This is a proposal for improving the "rapport" and "pander" packages. During the first part of the project I will work on the rapport package and produce new templates. In the second part, the S3 method in the pander package will be extended.
  • Linear factor model for asset returns Building an R package for estimation, risk analysis and performance analysis of linear factor models for asset returns and portfolios. Factor models for asset returns are used to decompose risk and return into explainable and unexplainable components, generate estimates of abnormal return, describe the covariance structure of returns, predict returns in specified stress scenarios, and provide a framework for portfolio risk analysis.
  • Profiling Tools for Parallel Computing with R This project aims to present the details of profiler text output through visualization, and to make compiling and linking against the available profiling libraries (e.g. fpmpi, mpiP, tau, ipm) hassle-free. The key to the project is that we want to make profiling MPI codes hassle-free for R users.
  • Proposal_PortfolioAnalytics_Ross _Bennett Proposal for PortfolioAnalytics R package to extend and improve constraints, usability, and graphics.
  • RIGHT: R Interactive Graphics via HTml This project aims to create the R Interactive Graphics via HTml (RIGHT) package, which enables interactive data visualization and analysis on a variety of platforms. RIGHT will provide a seamless analysis-visualization workflow in R and allow users to easily explore data and gain valuable insight for subsequent analysis. For example, the user can highlight the relation among visual elements in multiple plots in a figure, or selectively remove outlier points and re-analyze the updated data either locally or remotely. Since HTML5 canvas and JavaScript will be used to create the visualization, it can be delivered to virtually any device or platform with a modern web browser. RIGHT will become a valuable tool for understanding large amounts of data, hence turning them into knowledge.
  • robgpu The goal of this project is to publish an R package of robust statistical methods that can handle large data sets and are highly parallelized on GPUs. The following methods for robust data analysis will be implemented: the minimum covariance determinant (MCD) estimator of location and scatter, least trimmed squares (LTS) regression, and regularized versions of MCD and LTS. All four methods are computed with a C-step algorithm.
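
The C-step (concentration step) named in the robgpu description can be illustrated with a small CPU-only sketch for the univariate MCD case. This is only an illustration under stated assumptions, not the project's GPU implementation: the function names `c_step` and `mcd_univariate`, the `max_iter` cap, and the use of a single caller-supplied starting subset are all hypothetical choices made for this example. In one dimension, ranking points by squared distance to the subset mean gives the same order as ranking by the variance-scaled distance, so only the mean is needed inside the step.

```python
from statistics import fmean


def c_step(data, subset, h):
    """One concentration step: refit on the subset, keep the h closest points."""
    center = fmean(data[i] for i in subset)
    # Rank every observation by squared distance to the current center.
    order = sorted(range(len(data)), key=lambda i: (data[i] - center) ** 2)
    return sorted(order[:h])


def mcd_univariate(data, h, start, max_iter=100):
    """Iterate C-steps from a starting subset until the subset stops changing.

    Returns the subset mean, the raw (uncorrected) subset variance, and the
    indices of the final subset. max_iter is an illustrative safety cap.
    """
    subset = sorted(start)
    for _ in range(max_iter):
        new_subset = c_step(data, subset, h)
        if new_subset == subset:
            break
        subset = new_subset
    center = fmean(data[i] for i in subset)
    scatter = sum((data[i] - center) ** 2 for i in subset) / h
    return center, scatter, subset


# Five clustered values plus two gross outliers; start from a contaminated subset.
data = [1.0, 1.2, 0.9, 1.1, 1.0, 50.0, 60.0]
center, scatter, subset = mcd_univariate(data, h=5, start=[2, 3, 4, 5, 6])
print(center, scatter, subset)
```

A practical estimator would typically run this from many starting subsets and keep the result with the smallest scatter; the sketch uses one start only to keep the iteration easy to follow.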