“GRAPH BASED CASSANDRA PARTITIONER” PROJECT PROPOSAL
Ahmet Erdem Sariyuce
Abstract
I am interested on "Graph based Cassandra Partitioner" project under "Globus Online" topic, since it fits best to my interests. Currently, I am working at High Performance Computing Lab at BMI Department as a research assistant (bmi.osu.edu/hpc) where I do research on parallel graph algorithms and graph coloring problem. I summarized existing graph partitioning algorithms in the proposal and suggested a reasonable timeline. I have contacted to Tom Howe (mentor) and got a positive feedback.
Additional Information
- Partitioning with nodal coordinates
o Rely on graphs having nodes connected (mostly) to “nearest neighbors” in space ,
o Algorithm does not depend on where actual edges are ,
o Common when graph arises from physical model ,
o Ignores edges, but can be used as good starting guess for subsequent partitioners that do examine edges ,
o Can do poorly if graph connection is not spatial ,
§ inertial partitioning method for 2D partitioning ,
§ random spheres method for 3D partitioning ,
- Partitioning without nodal coordinates
o Coordinate-free ,
§ BFS method
§ Kernighan-Lin
· Take an initial partitioning and improve it ,
· Most basic approach ,
· Most expensive, O(n3) ,
§ Spectral Bisection
- Multilevel Partitioning
o Coarsen, Expand, Improve; replace graph G by a coarse approximation Gc, and partition Gcinstead. Then use partition of Gc to get a rough partitioning of G, and then iteratively improve it ,
o Coarsen and Expand make a V cycle, each constituting a leg of V, and Improve repeats that cycle. V cycle has not to be complete, Coarsen and Expand may have different lengths in a single V cycle ,
§ Multilevel Kernighan-Lin
· Coarsen and Expand with maximal matching ,
o A matching of a graph G(N,E) is a subset Em of E such that no two edges in Em share an endpoint ,
o Greedy approach for maximal matching (like graph coloring) ,
o Coarsen the related nodes and edges ,
§ Improve by Kernighan-Lin ,
· Multilevel Spectral Bisection
o Coarsen and Expand with maximal independent set approach ,
o Improve by Rayleigh Quotient Iteration ,
- Implemantations
o Multilevel Kernighan/Lin
§ METIS (www.cs.umn.edu/~metis)
§ ParMETIS - parallel version
o Multilevel Spectral Bisection
§ S. Barnard and H. Simon, “A fast multilevel implementation of recursive spectral bisection …”, Proc. 6th SIAM Conf. On Parallel Processing, 1993
§ Chaco (www.cs.sandia.gov/CRF/papers_chaco.html)
o Hybrids possible
§ Ex: Using Kernighan/Lin to improve a partition from spectral bisection
o Recent package, collection of techniques
§ Zoltan (www.cs.sandia.gov/Zoltan)
§ See www.cs.sandia.gov/~bahendr/partitioning.html
o No best method, some of them is better at speed while the others provide better cuts ,
o Beyond graph partitioning: hypergraph partitioning
§ Gives better communication volume than graph partitioning
§ Pros and cons
· When matrix is non-symmetric, the graph partitioning model (using A+AT ) loses information, resulting in suboptimal partitioning in terms of communication and load balance ,
· Even when matrix is symmetric, graph cut size is not an accurate measurement of communication volume ,
· Hypergraph partitioning model solves both these problems ,
· However, hypergraph partitioning (PaToH) can be much more expensive than graph partitioning (METIS) ,
