GSoC/GCI Archive
Google Summer of Code 2014

Stratosphere Project

License: Apache License, 2.0

Web Page: https://github.com/stratosphere/stratosphere/wiki/Google-Summer-of-Code-2014

Mailing List: https://groups.google.com/forum/#!forum/stratosphere-dev

Stratosphere is an Apache-licensed platform for massively parallel data analysis. It started in 2009 as a research project between three universities in the Berlin area of Germany. Stratosphere’s development is community-driven, with all decision making and planning conducted transparently on mailing lists and GitHub. Stratosphere lets users analyze huge amounts of data using a set of operators such as map, reduce, join, cross, cogroup, or iterate. These operators can be assembled into a dataflow graph that represents a data analysis job. Stratosphere currently has programming APIs for Java and Scala. Its support for iterative algorithms allows users to perform advanced graph analytics and machine learning tasks on distributed clusters. Stratosphere is an alternative to other massively parallel data processing engines such as Apache Hadoop or Apache Spark. It brings advanced database technology, such as a built-in optimizer and support for iterative algorithms, to the big data space. While being an alternative to Hadoop, Stratosphere still offers full compatibility with the Hadoop ecosystem: it can interact with the Hadoop Distributed File System, supports Hadoop's InputFormats, and works with various other systems in the field.

Projects

  • Implementation of a Hadoop Abstraction Layer for Stratosphere Project: Apache Hadoop is one of the major systems of the Big Data age, so it is important that other systems are able to interface with it. Stratosphere, a high-performance Big Data analytics platform utilising abstractions from Java and Scala, has a notable reputation in the field. This proposal is about making Stratosphere interface with Hadoop in a way that is transparent to the user, for the benefit of Stratosphere and Big Data enthusiasts around the world.
  • Proposal for building stream processing engine in Stratosphere project: In this proposal, I will introduce myself, present my idea for stream processing in Stratosphere, demonstrate my ability to collaborate with the team, and lay out a detailed schedule for participating in the project.