“GRAPH BASED CASSANDRA PARTITIONER” PROJECT PROPOSAL

Ahmet Erdem Sariyuce

Abstract

I am interested on "Graph based Cassandra Partitioner" project under "Globus Online" topic, since it fits best to my interests. Currently, I am working at High Performance Computing Lab at BMI Department as a research assistant (bmi.osu.edu/hpc) where I do research on parallel graph algorithms and graph coloring problem. I summarized existing graph partitioning algorithms in the proposal and suggested a reasonable timeline. I have contacted to Tom Howe (mentor) and got a positive feedback.

Additional Information

-        Partitioning with nodal coordinates

 

o   Rely on graphs having nodes connected (mostly) to “nearest neighbors” in space ,

 

o   Algorithm does not depend on where actual edges are ,

 

o   Common when graph arises from physical model ,

 

o   Ignores edges, but can be used as good starting guess for subsequent partitioners that do examine edges ,

 

o   Can do poorly if graph connection is not spatial ,

 

§  inertial partitioning method for 2D partitioning ,

 

§  random spheres method for 3D partitioning ,

 

-        Partitioning without nodal coordinates

 

o   Coordinate-free ,

 

§  BFS method

 

§  Kernighan-Lin

 

·       Take an initial partitioning and improve it ,

 

·       Most basic approach ,

 

·       Most expensive, O(n3) ,

 

§  Spectral Bisection

 

-        Multilevel Partitioning

 

o   Coarsen, Expand, Improve; replace graph G by a coarse approximation Gc, and partition Gcinstead. Then use partition of Gc to get a rough partitioning of G, and then iteratively improve it , 

 

o   Coarsen and Expand make a V cycle, each constituting a leg of V, and Improve repeats that cycle. V cycle has not to be complete, Coarsen and Expand may have different lengths in a single V cycle ,

 

§  Multilevel Kernighan-Lin

 

·       Coarsen and Expand with maximal matching ,

 

o   A matching of a graph G(N,E) is a subset Em of E such that no two edges in Em share an endpoint ,

 

o   Greedy approach for maximal matching (like graph coloring) ,

 

o   Coarsen the related nodes and edges ,

 

§  Improve by Kernighan-Lin ,

 

·       Multilevel Spectral Bisection

 

o   Coarsen and Expand with maximal independent set approach ,

 

o   Improve by Rayleigh Quotient Iteration ,

 

 

 

-        Implemantations

 

o   Multilevel Kernighan/Lin

 

§  METIS (www.cs.umn.edu/~metis)

 

§  ParMETIS - parallel version

 

o   Multilevel Spectral Bisection

 

§  S. Barnard and H. Simon, “A fast multilevel implementation of recursive spectral bisection …”, Proc. 6th SIAM Conf. On Parallel Processing, 1993

 

§  Chaco (www.cs.sandia.gov/CRF/papers_chaco.html)

 

o   Hybrids possible

 

§  Ex: Using Kernighan/Lin to improve a partition from spectral bisection

 

o   Recent package, collection of techniques

 

§  Zoltan (www.cs.sandia.gov/Zoltan)

 

§  See www.cs.sandia.gov/~bahendr/partitioning.html

 

o   No best method, some of them is better at speed while the others provide better cuts ,

 

o   Beyond graph partitioning: hypergraph partitioning

 

§  Gives better communication volume than graph partitioning

 

§  Pros and cons

 

·       When matrix is non-symmetric, the graph partitioning model (using A+AT ) loses information, resulting in suboptimal partitioning in terms of communication and load balance ,

 

·       Even when matrix is symmetric,  graph cut size is not an accurate measurement of communication volume ,

 

·       Hypergraph partitioning model solves both these problems ,

 

·       However, hypergraph partitioning (PaToH) can be much more expensive than graph partitioning (METIS) ,