GSoC/GCI Archive
Google Summer of Code 2009

Association Gephi

Web Page: http://gephi.org/forum/topic.php?id=408

Mailing List: http://gephi.org/forum/

[IMAGE http://gephi.org/wp-content/themes/gephi/images/logo80.jpeg]

Gephi is open source software for graph and network analysis. It uses a 3D render engine to display large networks in real time and to speed up exploration. A flexible, multi-task architecture brings new possibilities for working with complex data sets and producing valuable visual results.

Gephi will be officially presented at the 3rd Int'l AAAI Conference on Weblogs and Social Media (ICWSM).

Tags: network, network science, infoviz, visualization, graph, graph theory, complex network, software, open source, science

About network visualization: Network Science is a new and emerging scientific discipline that examines the interconnections among diverse physical, informational, biological, cognitive, and social networks. This field of science seeks to discover common principles, algorithms and tools that govern network behavior. The National Research Council defines Network Science as "the organized knowledge of networks based on their study using the scientific method." In this context, network visualization complements statistical analysis as a way to discover, extract and classify new patterns in network structure and data.

Learn more: http://en.wikipedia.org/wiki/Network_science
Video documentary: How Kevin Bacon cured cancer

Why is Gephi cool?

  • Because you can make, see and print things like these: http://www.flickr.com/photos/gephi/
  • Because we want to put networks in the hands of as many people as possible, as a tool for understanding the world and revealing its organization.
  • Because we don't like complicated scientific software, so we are building a WYSIWYG, user-centric editor.
  • Because we are French ;-)

Projects

  • A distributed approach for graph spatialization: Spatialization plays an important role in rendering networks, especially when the networks are huge and the complexity of current spatialization techniques starts to dominate. This proposal is about developing and implementing a highly scalable distributed spatialization algorithm, so that Gephi can ideally spatialize [and render] massive networks (~1M nodes).
  • Extending Gephi with realtime dynamic graph navigation: I propose to add a new graphical component to Gephi that will use Gephi's current slice-based storage to provide intuitive temporal graph navigation. This implementation will be designed with performance and ease of use in mind.
  • Network Algorithms and Statistics: This proposal outlines and expands upon the Network Algorithms and Statistics project idea. It calls for the implementation of 9 network algorithms and metrics that will enhance Gephi's user experience by providing information about the network that is not usually apparent from visual inspection alone. The proposed network metrics are: HITS, PageRank, clustering coefficient, network diameter, mean shortest path, betweenness centrality, modularity, degree distribution and closeness centrality. Within this document I discuss the proposed network metrics and the approaches I intend to take to implement them. I propose a tentative timeline, outline appropriate milestones, and discuss anticipated development hurdles and how I intend to avoid them.
  • Vectorial Preview: Gephi is able to export its network in SVG or PDF format. Using vector drawings for graphs has many benefits, such as infinite zooming and clear shapes. The aim of this proposal is to develop a preview module in Gephi that shows exactly what the output will look like with the given parameters. Whereas the embedded 3D engine is designed for efficient network exploration, the vectorial export concentrates on clarity, readability and outstanding design. This module has many different settings, such as edge thickness, arrow size, white borders around labels, node borders and so on. In the vein of WYSIWYG editors, we would like setting changes to appear directly in a Vectorial Preview window. The module must be scalable to support large networks. To summarize, there are two requirements: previewing the network exactly as it will look in the output, and supporting huge graphs efficiently.

    An approach proposed by the mentors consists in using the output generated by the Gephi Rich SVG Export module to reconstruct and display the network. When the demo input "celegans.gexf" is imported, the generated SVG file is 1.8MB. Rebuilding the preview from an SVG file generated each time an option is (un)checked seems too heavy a process and does not appear to scale to large networks, especially on old computers. Moreover, there does not seem to be a "transitory" state of the data: data is exported directly to SVG in the ExporterSVG class.

    My approach is therefore to generalize the vectorial transformation process into a class called, for instance, ExporterVectorial, in which the final form of the rendering is abstracted: it will create a new transitory data set designed for vectorial output. Whenever an option is (un)checked, ExporterVectorial will be called, generating the vectorial data that a new class, VectorialPreview, will render on the SVG export panel using Java 2D. When the "Finish" button is pressed, ExporterVectorial will be called again and ExporterSVG will then use its output data to render the SVG format. In effect, the SVG and preview outputs will come from the same input data, designed especially for vectorial rendering. In addition, a "Preview" checkbox could be added to each vectorial export panel, allowing people using old computers to choose whether they want an automatic preview.

    Nevertheless, the main open questions are:
      * Will the graph be displayed entirely or partially?
      * If displayed partially, how can it be browsed while keeping the preview readable?

    As for the development method, a new Bazaar branch will be created from a "rather stable" Gephi development version, and the work will be done on it. Once GSoC has ended, it is hoped that the branch will be merged into the trunk.

    Here is the planned schedule:
      * April 20th - May 23rd: understanding the existing Gephi code and following current development;
      * May 23rd - June 26th: class diagram design, learning the Java 2D API, rendering a few nodes in the preview window, then thousands/millions to test performance;
      * June 27th - August 17th: development proper of the proposed approach.
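The Vectorial Preview proposal above hinges on one design idea: build a format-neutral "transitory" data set once, then feed it to both the Java 2D preview and the SVG exporter. The following is only a minimal sketch of that idea, not Gephi code. The class names VectorScene, SceneBuilder, ScenePreview and SceneSvgWriter are hypothetical and merely play the roles the proposal assigns to the transitory data, ExporterVectorial, VectorialPreview and ExporterSVG. The array-based input and the fixed colors are assumptions made to keep the example short.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Line2D;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical transitory data set: plain shapes, independent of any output format. */
final class VectorScene {
    static final class Node {
        final double x, y, radius;
        Node(double x, double y, double radius) { this.x = x; this.y = y; this.radius = radius; }
    }
    static final class Edge {
        final Node from, to;
        final double thickness;
        Edge(Node from, Node to, double thickness) { this.from = from; this.to = to; this.thickness = thickness; }
    }
    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();
}

/** Plays the ExporterVectorial role: turns positions, links and settings into a scene. */
final class SceneBuilder {
    static VectorScene build(double[][] positions, int[][] links, double nodeRadius, double edgeThickness) {
        VectorScene scene = new VectorScene();
        for (double[] p : positions) {
            scene.nodes.add(new VectorScene.Node(p[0], p[1], nodeRadius));
        }
        for (int[] link : links) {
            scene.edges.add(new VectorScene.Edge(scene.nodes.get(link[0]), scene.nodes.get(link[1]), edgeThickness));
        }
        return scene;
    }
}

/** Plays the VectorialPreview role: draws the scene with Java 2D (here into an off-screen image). */
final class ScenePreview {
    static BufferedImage render(VectorScene scene, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.GRAY);
            for (VectorScene.Edge e : scene.edges) {
                g.setStroke(new BasicStroke((float) e.thickness));
                g.draw(new Line2D.Double(e.from.x, e.from.y, e.to.x, e.to.y));
            }
            g.setColor(Color.BLACK); // nodes are painted last so they cover edge ends
            for (VectorScene.Node n : scene.nodes) {
                g.fill(new Ellipse2D.Double(n.x - n.radius, n.y - n.radius, 2 * n.radius, 2 * n.radius));
            }
        } finally {
            g.dispose();
        }
        return image;
    }
}

/** Plays the SVG export role: serializes the same scene as SVG text. */
final class SceneSvgWriter {
    static String write(VectorScene scene, int width, int height) {
        StringBuilder svg = new StringBuilder();
        svg.append(String.format(Locale.ROOT,
                "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n", width, height));
        for (VectorScene.Edge e : scene.edges) {
            svg.append(String.format(Locale.ROOT,
                    "  <line x1=\"%.2f\" y1=\"%.2f\" x2=\"%.2f\" y2=\"%.2f\" stroke=\"gray\" stroke-width=\"%.2f\"/>\n",
                    e.from.x, e.from.y, e.to.x, e.to.y, e.thickness));
        }
        for (VectorScene.Node n : scene.nodes) {
            svg.append(String.format(Locale.ROOT,
                    "  <circle cx=\"%.2f\" cy=\"%.2f\" r=\"%.2f\" fill=\"black\"/>\n", n.x, n.y, n.radius));
        }
        return svg.append("</svg>\n").toString();
    }
}

final class VectorPreviewDemo {
    public static void main(String[] args) {
        // Two nodes joined by one edge; changing a setting means rebuilding the scene.
        VectorScene scene = SceneBuilder.build(
                new double[][] {{50, 50}, {150, 50}}, new int[][] {{0, 1}}, 10.0, 2.0);
        BufferedImage preview = ScenePreview.render(scene, 200, 100);
        System.out.println("Preview image: " + preview.getWidth() + "x" + preview.getHeight());
        System.out.print(SceneSvgWriter.write(scene, 200, 100));
    }
}
```

In this sketch, (un)checking an option would amount to calling SceneBuilder.build again with the new setting and repainting the preview. No SVG text is produced until the export is confirmed, which is the cost the proposal wants to avoid paying on every change.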
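To make one of the metrics listed in the Network Algorithms and Statistics proposal concrete, here is a small sketch of the clustering coefficient on an undirected graph. It is an illustration under stated assumptions, not the proposal's implementation: ClusteringSketch is a hypothetical class, the graph is stored as plain adjacency sets rather than Gephi's own graph structures, and nodes with fewer than two neighbors are given a coefficient of 0 by convention.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Gephi: an undirected graph stored as adjacency sets. */
final class ClusteringSketch {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    /** Adds an undirected edge; self-loops are ignored because they cannot form triangles. */
    void addEdge(String a, String b) {
        if (a.equals(b)) {
            return;
        }
        adjacency.computeIfAbsent(a, key -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, key -> new HashSet<>()).add(a);
    }

    /** Local clustering coefficient: the fraction of neighbor pairs that are themselves linked. */
    double localClustering(String node) {
        Set<String> neighbors = adjacency.getOrDefault(node, Collections.emptySet());
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // assumed convention: no neighbor pairs means coefficient 0
        }
        int links = 0;
        for (String u : neighbors) {
            for (String v : neighbors) {
                // u < v visits each unordered pair exactly once
                if (u.compareTo(v) < 0 && adjacency.get(u).contains(v)) {
                    links++;
                }
            }
        }
        return 2.0 * links / (k * (k - 1));
    }

    /** Average of the local coefficients over all nodes that have at least one edge. */
    double averageClustering() {
        if (adjacency.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String node : adjacency.keySet()) {
            sum += localClustering(node);
        }
        return sum / adjacency.size();
    }

    public static void main(String[] args) {
        // A triangle a-b-c with one extra node d hanging off c.
        ClusteringSketch graph = new ClusteringSketch();
        graph.addEdge("a", "b");
        graph.addEdge("b", "c");
        graph.addEdge("a", "c");
        graph.addEdge("c", "d");
        // For c, only one of its three neighbor pairs (a, b) is linked, so the value is 1/3.
        System.out.println("local(c) = " + graph.localClustering("c"));
        System.out.println("average  = " + graph.averageClustering());
    }
}
```

The pairwise neighbor check costs time quadratic in a node's degree, which is fine for a sketch but would likely need a smarter triangle-counting strategy on the large networks Gephi targets.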