GSoC/GCI Archive
Google Summer of Code 2010

Apache Software Foundation

The Apache Software Foundation provides support for the Apache community of open-source software projects.

The Apache projects are characterized by a collaborative, consensus-based development process, an open and pragmatic software license, and a desire to create high-quality software that leads the way in its field.

We consider ourselves not simply a group of projects sharing a server, but rather a community of developers and users.


  • Add support for temporary repositories used for staging for Archiva Create a temporary repository where users can deploy their artifacts. Once the testing tasks are completed, the user can either deploy the artifact to the common repository, where it can be used as a dependency in POM files, or remove the artifact from the temporary repository if it fails during the testing phase.
  • Adding support for Album Subscriptions at Apache PhotArk Several popular websites now allow users to upload and share pictures, and because different people prefer different sites, related pictures (e.g. for a given event) end up spread across multiple websites. Apache PhotArk is an open source photo gallery application that currently allows you to share your pictures. Adding the ability to subscribe to external albums will allow pictures from these different websites to be aggregated in a single place.
  • An Eclipse-based Visual State Chart XML editor/debugger that generates SCXML documents State Chart XML (SCXML) provides a generic state-machine based execution environment based on Harel State Tables. It is very useful for handling complex state transition logic, but when an SCXML file becomes large and complex, it is difficult to maintain, refactor, or test for logical validity. This project aims to provide an Eclipse and GMF based visual editor and debugger for SCXML, which can also be used to generate SCXML documents and the corresponding code from a state chart.
  • Apache AXIS2 integration with FreePastry Apache Axis2 is a lightweight Web Services engine, which has been implemented in both Java and C. It provides a better SOAP processing model, with a considerable increase in performance. Pastry is a generic, scalable and efficient substrate for peer-to-peer applications. Pastry nodes form a decentralized, self-organizing and fault-tolerant overlay network within the Internet. By integrating FreePastry into Apache Axis2, we can inherit the capabilities of FreePastry's network overlay.
  • Apache Derby-4587- Add tools for improved analysis and understanding of query plans and execution statistics Apache Derby, an Apache DB sub-project, is an open source relational database implemented entirely in Java and available under the Apache License, Version 2.0. Derby is based on Java, JDBC and SQL. Users of Derby frequently have trouble understanding how a query is executed. The objective of this project is to provide visual displays that help people understand the way their query is being run.
  • Apache MyFaces - MyFaces Application Builder This project will create a MyFaces Application Builder (MAB), an application builder for MyFaces which enables the user to easily generate a MyFaces application. The MAB will use the OpenWebBeans CDI implementation to enable users to create add-ons and to make the MAB as flexible and extensible as possible.
  • Apache Vysper extension implementing BOSH for HTTP clients Apache Vysper is an Apache MINA based XMPP/Jabber server. This project aims at implementing a Vysper extension that enables HTTP clients to be XMPP clients. The solution for communication with the HTTP clients is to use the Bidirectional-streams Over Synchronous HTTP (BOSH) protocol (in the course of standardization at XMPP as the XEP-124 and XEP-206 draft standards).
  • Asynchronous Servlet integration with SCA callbacks The goal of this project is to design and implement asynchronous operation for Apache Tuscany's Web support. Tuscany/SCA already has an asynchronous programming API which makes it easy to create asynchronous services, and the latest JEE Servlet spec now also supports asynchronous programming, so the idea is to use the new Servlet APIs to allow web browser clients to asynchronously receive the results of SCA callbacks.
  • Automated webapp tests for MyFaces core and extensions MyFaces uses JUnit for testing in MyFaces Core and also provides a test-webapp. The main drawbacks of the JUnit tests are that they use too many mock classes and cannot run against the “real” classes, hence this solution cannot be used for testing extensions. Another issue is the need for complete integration testing. The drawback of the test-webapp is that, for exhaustive testing, each page would have to be checked manually. The project addresses these exact issues.
  • Create nifty components - Tapestry 5 Create the Tapestry components Drag and Drop Palette and SelectWithAutocomplete. If time permits, I intend to create other components from the issue TAP5-1071.
  • EigenCuts spectral clustering implementation on map/reduce for Apache Mahout Clustering algorithms are advantageous when the number of classes is not known a priori. However, most techniques still require an explicit K to be chosen, and most spectral algorithms' use of piecewise constant approximation of eigenvectors breaks down when the clusters are tightly coupled. EigenCuts solves both these problems by choosing an eigenvector to create a new cluster boundary and iterating until no more edges are cut.
  • Enhancing JMX descriptors in Apache Tomcat JMX is a technology which can be used to manage Java applications. Apache Tomcat is JMX enabled and this can be leveraged to provide a remote management interface. Even though the required JMX infrastructure is present in Tomcat, some gaps in that infrastructure make it hard to use for fully configuring a remote Tomcat instance. A solid JMX infrastructure would be beneficial for any endeavor aimed at providing a remote management application for Tomcat.
  • Ext-Scripting CDI Integration The Ext-Scripting module enables developers either to use a scripting language to develop JSF applications or to use Java as if it were a scripting language as well. This works thanks to dependency graphs that are built internally and reflect the current state of the application. This project aims to enable developers to use an implementation of the CDI specification with Apache MyFaces Ext-Scripting, i.e. it provides the metadata that the CDI implementation requires.
  • HTML5 Support for Apache MyFaces2 HTML5 Support for Apache MyFaces2 will extend MyFaces2 core components and also deliver a new set of JSF components to integrate browsers with JavaServer Faces server-side rendering technology.
  • Implement a Dead Letter Channel for Synapse Apache Synapse[1] is an open source ESB and mediation framework which supports some enterprise integration patterns. This document proposes to implement a feature supporting the dead letter channel enterprise integration pattern.
  • Implement read-write API and Serialization strategy for Apache Woden Component level API (Woden-20) Apache Woden implements the W3C WSDL 2.0 spec. According to the spec, Woden has two APIs: the Element level API and the Component level API. The Component level API is currently read-only, and implementing read-write features for it is the main goal. Defining a serialization strategy for the Component level API is also required, so another goal of this project is to identify the serialization alternatives, encapsulate them using the strategy pattern, and implement them.
  • Implement the missing functionalities in javax.ImageIO module The current javax.imageio package in Harmony is incomplete. My goal is to implement the functionality that is still missing from this package. In addition, I will work on the missing image plug-ins, such as JPEG, PNG, BMP, and GIF, for the Harmony JDK.
  • Implementing a parser and an evaluator for Schema Component Designators (SCD) Apache Xerces2 is a high-performance XML processor which implements a collection of standard APIs. The objective of this project is to design and implement a parser and an evaluator for schema component designators (SCD) that can be used to identify and retrieve XML schema component(s) from the XML schema data model used by Xerces.
  • Implementing a streamable subset for XPointer xpointer() scheme for XInclude Xerces2 is an XML parser written in Java which can parse, manipulate and validate XML documents. The XPointer xpointer() scheme lets users select document fragments using XPath expressions. The objective of this project is to improve Xerces' streaming XInclude processor so that it supports a streamable subset of the XPointer xpointer() scheme.
  • Implementing XML Schema 1.1 overriding component definitions (<xs:override>) The Apache Xerces2-J XML Schema processor currently provides support for the W3C XML Schema 1.1 specification. Although it fulfils more than the spec's minimal requirements, support for some vital XML Schema structures is yet to be realized. This project aims to implement one such requirement, namely xs:override support for XML Schema 1.1. The xs:override semantics are intended for unconstrained schema component replacement, mitigating some of the bottlenecks in existing structures such as xs:redefine.
  • Integrating OpenID with PhotArk Apache PhotArk is a photo gallery application. It currently contains a display piece, a Feed Reader, a Content Repository and a web based Admin Panel. For PhotArk to become a complete product, it needs authentication and authorization. Since OpenID provides an easier and faster way to log in to websites, it was selected as the authentication mechanism. An Access Control Layer will be implemented for authorization, which will protect the repository and manage user access to view and modify albums.
  • JiBX Databinding Support for Apache CXF JiBX is an extremely flexible technology for mapping Java objects to XML, which allows you to use existing Java code, generate Java classes from an XML schema, or bridge existing code to a schema that represents the same data. A JiBX databinding implementation for Apache CXF would allow its users to reuse their existing code base to a greater extent when implementing Web services, merely by bundling databinding definitions, along with its many other features.
  • JUnit test conversions and bug fixes To improve the quality of Derby, it is important to fix existing bugs and replace existing logic in the Derby test harness with JUnit code. With the new JUnit tests, Derby would gain all the benefits of JUnit, such as running tests from Ant, integration with IDEs, the ability to hook into other JUnit suites, and an easier understanding of how Derby tests are run. The current conversion focuses on LOB tests and tests in tests/derbynet/, tests/lang/, tests/jdbcapi and existing SQL scripts.
  • Linear SVM Package (LIBLINEAR) for Mahout The linear Support Vector Machine (SVM) is useful in many applications with large-scale datasets or datasets with high-dimensional features. This proposal will port one of the best-known linear SVM solvers, LIBLINEAR [1], to Mahout, with an interface unified with Pegasos [2] on Mahout, another linear SVM solver that I have almost finished (Mahout-232). There would be two distinct contributions: 1) introducing LIBLINEAR to Mahout; 2) unified interfaces for linear SVM classifiers on Mahout.
  • Maildir based Mailbox I would like to contribute a Mailbox implementation for the Apache James mail server which uses the maildir directory structure. While the current development version of James supports a variety of back-ends for storing messages, it is lacking this standard message store. Supporting maildir facilitates migration from other mail servers, integration into existing systems, and encourages using James with a message store familiar to many administrators.
  • Mini CMS guide for Apache Sling Apache Sling is a powerful Web framework that offers developers many interesting ways to build a web application. A large number of features does not always make a framework more attractive, however, because more features also mean more initial problems for newcomers. This project will try to reduce the gap between Apache Sling and its new users by showing all its features.
  • MyFaces2 State Saving Performance Improvements This project’s aim is to study the current state saving performance of MyFaces 2.0, to find where it could be improved, and to determine the means by which this can be done. The first steps for improving the state-saving performance were already made by introducing partial state saving, which reduces memory consumption. More detailed measurements should be taken first to determine what else might be improved. Then, these improvements should be implemented.
  • Object LDAP Persistence Tooling * Select an entry in the directory tree of the LDAP Browser, right click, and generate the domain model and persistence code (DAOs). * Generate LDAP schema and persistence code based on a selected Eclipse Java package (domain model). * Special challenge: support of object relationships, e.g. Departments-Users-Groups-Roles-Permissions. * Extensibility and supportability: plug-ins to create code for different languages (Scala, C#, etc.), support for different persistence engines.
  • OpenAuth support for CXF The OAuth protocol enables a service consumer to access protected resources from a web service provider through an API. The API gives service consumers access to services without requiring that users disclose their service provider credentials. Apache CXF is an open source services framework. CXF helps you build and develop services using frontend programming APIs, like JAX-WS and JAX-RS. The purpose of this project is to secure CXF services with the OAuth protocol.
  • Pig - binary comparator for secondary sort When Hadoop sorts the keys in the shuffle phase, it will use a binary (raw) comparator, if available. The binary comparator does not deserialize the key into an object; it compares the byte encodings directly for better performance. Pig uses the binary comparator when the key is of a simple type, but not for tuples. This is important when doing a secondary sort, because Pig relies on Hadoop to sort both the main and the secondary key. Using a binary comparator for tuples will produce a significant speedup.
  • Proposal for implementation of the JDB command line debugger Implement the JDB command line debugger tool in Java using the Eclipse JDI library and extend it with other features that add value to the Harmony JDB implementation.
  • Rolling in support for UTF-8 characters in Apache Derby (DERBY-728 && DERBY-4009) Apache Derby relies on the open standard Distributed Relational Database Architecture (DRDA) to implement the abstraction between SQL and a standard DRDA language. Its implementation in Derby is currently limited to ASCII characters. The community has requested support for Japanese and Chinese characters. My task will be to refactor and improve the code so that these characters are supported by the DRDA engine in Derby.
  • SCXML Code Generation Framework, JavaScript Edition (SCXML/cgf/js): An SCXML-to-JavaScript Compiler Optimized for User Interface Development on the World Wide Web This project has two goals. The first is to develop an SCXML-to-JavaScript compiler optimized for User Interface development on the World Wide Web. This would allow developers to elegantly describe and implement Web-based UIs with complex behavioural requirements. The second goal is to generate graphical depictions of statecharts, which may then be animated in response to live UI events. This would allow developers to better comprehend the dynamic behaviour described by their statecharts.
  • Search and Tagging Functionality Apache PhotArk is an open source photo gallery application. This proposal suggests implementing search functionality, exposed as an SCA component, for Apache PhotArk. The search component would be responsible for indexing text information extracted from a photo object and making it searchable. Tagging functionality is also suggested, where the user could attach any text to a photo or album, increasing the amount of searchable text information related to it.
  • Simple and lightweight Atom HTML-based browser for CXF logs This project will implement a simple and lightweight Atom HTML-based browser for browsing the CXF logs. The browser will support feed paging: it will let users see the contents of the current page and follow links to the next, previous, first and last pages. The implementation will be based on the existing CXF JAX-RS WebClient API. All code will be added to the 'rt/management-web' module.
  • Subversion: Add support for Git/Mercurial style unidiff format extensions 'svn diff' and 'svn patch' should be able to produce and apply unidiff files containing git-style extensions to the unidiff format. That would allow us to use patches for tree and mode changes.
  • ZooKeeper Failure Detector Model ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't receive a heartbeat from the other machines. This is the 'timeout' method and it works very well; however, it may be too aggressive and is not easily tuned for some more unusual ZooKeeper installations. This project's goals are to abstract the failure detector into a separate module, to implement several failure detectors and to compare their appropriateness for ZooKeeper.
  • ZooKeeper Read-Only Mode When a ZooKeeper server loses contact with over half of the other servers in an ensemble ('loses a quorum'), it stops responding to client requests. For some applications, it would be beneficial if a server still responded to read requests when the quorum is lost, but raised an error when a write request was attempted. This project would implement a 'read-only' mode for ZooKeeper servers that allows read requests to be served as long as the client can contact a server.
  • ZooKeeper Web-based Administrative Interface I would start by forking the Django Dashboard created by Patrick Hunt, then improve the UI and add more features. I want to capture much more information from ZooKeeper, adding hooks if necessary. As Patrick suggested, I will also focus on providing ready-to-use scripts for monitoring (for Ganglia, Cacti, Nagios etc.) and for historical/real-time data collection.
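
The 'timeout' method described in the ZooKeeper Failure Detector Model entry above can be illustrated with a small sketch. This is not ZooKeeper's code: the class name TickTimeoutDetector, its methods, and the max_missed_ticks parameter are hypothetical names chosen for this example, and the sketch only assumes that a peer is considered failed once a fixed number of ticks pass without a heartbeat.

```python
class TickTimeoutDetector:
    """Hypothetical tick-counting failure detector (illustration only)."""

    def __init__(self, max_missed_ticks):
        if max_missed_ticks < 1:
            raise ValueError("max_missed_ticks must be at least 1")
        self.max_missed_ticks = max_missed_ticks
        # Maps each monitored peer to the number of ticks since its last heartbeat.
        self.missed = {}

    def heartbeat(self, peer):
        """Record a heartbeat: the peer's missed-tick counter resets to zero."""
        self.missed[peer] = 0

    def tick(self):
        """Advance the clock by one tick for every monitored peer."""
        for peer in self.missed:
            self.missed[peer] += 1

    def is_failed(self, peer):
        """A peer is suspected once it has missed max_missed_ticks ticks in a row."""
        return self.missed[peer] >= self.max_missed_ticks


detector = TickTimeoutDetector(max_missed_ticks=3)
detector.heartbeat("server-2")
for _ in range(3):
    detector.tick()
print(detector.is_failed("server-2"))
```

Keeping the decision behind a single is_failed method is what would let alternative detectors be swapped in and compared, which is the kind of abstraction the project entry proposes.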