Google Summer of Code 2013 Apache Software Foundation

Implementation of a Semi-Clustering Algorithm In Apache Hama

by renil for Apache Software Foundation

Semi-clustering applies to cases such as social graphs, where vertices typically represent people and edges represent the connections between them. Edges may be based on explicit actions or may be inferred from people's behaviour. Edges may also carry weights that represent the frequency or strength of the interactions. A semi-cluster in a social graph is a group of people who interact frequently with each other and less frequently with others. What distinguishes it from ordinary clustering is that a vertex may belong to more than one semi-cluster. The Apache Hama Graph API is a good way of applying a semi-clustering algorithm to data stored in the Hadoop Distributed File System. A message-passing paradigm that goes beyond the MapReduce framework offers more flexibility in communication, and the Bulk Synchronous Parallel (BSP) model fits the bill.
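To make the idea of "frequent interaction inside, less frequent interaction outside" concrete, here is a minimal sketch of how a candidate semi-cluster could be scored. The proposal text does not specify a scoring function, so this is an assumption: it uses the score described for semi-clustering in the Pregel paper, (I - f_B * B) / (V(V - 1) / 2), where I is the total weight of edges inside the cluster, B is the total weight of edges crossing its boundary, V is the number of vertices, and f_B is a boundary penalty factor. The function name, the `boundary_factor` parameter, and the sample graph are hypothetical, and the code is plain Python rather than Hama Graph API code.

```python
def semi_cluster_score(members, edges, boundary_factor=0.5):
    """Score a candidate semi-cluster (assumed Pregel-style formula).

    members: iterable of vertex ids in the candidate cluster.
    edges: list of (u, v, weight) tuples for an undirected weighted graph.
    boundary_factor: hypothetical penalty f_B applied to boundary edges.
    """
    members = set(members)
    size = len(members)
    if size < 2:
        # A single-vertex semi-cluster is given score 1 by convention.
        return 1.0

    inner = 0.0     # weight of edges with both endpoints inside
    boundary = 0.0  # weight of edges with exactly one endpoint inside
    for u, v, weight in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            inner += weight
        elif inside == 1:
            boundary += weight

    # Normalise by the number of possible edges so large clusters
    # are not favoured merely for being large.
    return (inner - boundary_factor * boundary) / (size * (size - 1) / 2)


# Hypothetical social graph: weights stand for interaction frequency.
edges = [
    ("alice", "bob", 5.0),
    ("bob", "carol", 4.0),
    ("alice", "carol", 3.0),
    ("carol", "dave", 1.0),
]

tight_group = semi_cluster_score({"alice", "bob", "carol"}, edges)
loose_group = semi_cluster_score({"carol", "dave"}, edges)
```

In this sample, `carol` appears in both candidate clusters, which illustrates that a vertex may belong to more than one semi-cluster. The tightly connected group scores higher than the loosely connected pair, whose boundary edges outweigh its single inner edge.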