Database Size/Growth Visualization

Ian McEwen

Abstract

Using JavaScript (jQuery) tools and the statistics regularly recorded by the MusicBrainz server (including historical data), I intend to create a flexible visualization system to help users and developers understand and explore the size and growth of the MusicBrainz database.

Additional Information

Using the recorded statistics (available in the data dumps as mbdump-stats.tar.gz), I intend to create a page or set of pages for exploring and understanding the size and growth of data in MusicBrainz. Since the MusicBrainz server uses jQuery for its JavaScript, I would use either flot (http://code.google.com/p/flot/), jqPlot (http://www.jqplot.com/index.php), or RaphaelJS (http://raphaeljs.com/) with a jQuery SVG layer such as http://keith-wood.name/svg.html.

These options need closer evaluation of their strengths and weaknesses. RaphaelJS is a mature, well-developed library supported in part by Sencha Labs (known for other well-built JavaScript solutions), but it is less integrated with jQuery. flot and jqPlot are built directly on jQuery but may be less mature. Of the two, jqPlot appears to offer more features than flot, though it may be the less well-developed of the pair.

As for parts of the interface, the major goals are:

  1. View the data, as a complete picture or zoomed in (to specific date ranges, or just a generic "give me more/less"), at different granularities (day, week, month, and year come to mind)
  2. View rates of change across various time periods (per day, per week, per month, etc.), with equivalent zoom options
  3. Understand information at a glance or in closer detail: seeing trends is great, but exact numbers are often better! This applies to both data points and scales, compared with the current graphs (see the notes on graph interactivity below for one part of the solution; the rest lies in more readable scales)
  4. Include and exclude parts of the graph as appropriate
  5. Show MusicBrainz events like software releases and policy changes (potentially with many categories -- some people care about every style change, some only want to see software releases, and others fall anywhere in between)
  6. Provide copyable links (from a textbox, the URL bar, or both) to a specific view, with config options encoded either in GET parameters or in the hash part of the URL
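
As a rough sketch of how goals 1 and 2 might fit together, the following plain JavaScript groups samples by granularity and derives per-period changes. It assumes a hypothetical input of one cumulative count per day, sorted by date; the field names (date, value) and the helper names (bucketKey, resample, rateOfChange) are illustrative only, and the real statistics data may be laid out differently. Week granularity is left out because it needs date arithmetic rather than string slicing.

```javascript
// Hypothetical input shape: one cumulative count per day, sorted by date.
// The real statistics data may be laid out differently.
var samples = [
  { date: "2011-01-30", value: 100 },
  { date: "2011-01-31", value: 110 },
  { date: "2011-02-01", value: 125 },
  { date: "2011-02-28", value: 190 },
  { date: "2011-03-01", value: 200 }
];

// Map an ISO date string to the key of the bucket it falls in.
function bucketKey(isoDate, granularity) {
  if (granularity === "year") return isoDate.slice(0, 4);
  if (granularity === "month") return isoDate.slice(0, 7);
  return isoDate; // "day"
}

// Keep the last sample in each bucket: for a cumulative count, that is
// the size of the database at the end of the period.
function resample(points, granularity) {
  var result = [];
  points.forEach(function (p) {
    var key = bucketKey(p.date, granularity);
    var last = result[result.length - 1];
    if (last && last.period === key) {
      last.value = p.value;
    } else {
      result.push({ period: key, value: p.value });
    }
  });
  return result;
}

// Rate of change: difference between consecutive periods.
function rateOfChange(resampled) {
  var rates = [];
  for (var i = 1; i < resampled.length; i++) {
    rates.push({
      period: resampled[i].period,
      change: resampled[i].value - resampled[i - 1].value
    });
  }
  return rates;
}

var monthly = resample(samples, "month");
var monthlyRates = rateOfChange(monthly);
```

The same resampled series could feed both graphs: the values go to the graph of actual numbers, and the differences go to the graph of rates of change.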

I see this being implemented as two graphs and a control panel: one graph of actual numbers (similar to the current graphs), one of rates of change, and a control panel including controls for:

  • Which data lines to include (releases, tracks/recordings/works, ARs, specific kinds of ARs?)
  • Whether to show both graphs or just one or the other
  • Granularities for both graphs (time periods)
  • Zooming
  • Scale (linear/log)
  • MusicBrainz events (what to show, or not)
  • (potentially/in the future) multiple scales, for wildly different data (label-label ARs versus releases, perhaps?) -- log scales account for some of this
  • (potentially/in the future) add/remove extra graphs, if multiple scales/log scales still aren't enough
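
One possible way to make the control panel settings linkable is to serialize them into the hash part of the URL. The sketch below is only an illustration: the option names (lines, granularity, scale, from, to) and the helpers encodeView and decodeView are hypothetical, and it assumes list values never contain commas.

```javascript
// Hypothetical view state; the option names are illustrative only.
var view = {
  lines: ["releases", "recordings"],
  granularity: "month",
  scale: "log",
  from: "2009-01-01",
  to: "2011-03-01"
};

// Serialize the view into a string suitable for the hash part of a URL.
// Keys are sorted so that the same view always yields the same link.
function encodeView(state) {
  return Object.keys(state).sort().map(function (key) {
    var value = state[key];
    var text = Array.isArray(value) ? value.join(",") : String(value);
    return encodeURIComponent(key) + "=" + encodeURIComponent(text);
  }).join("&");
}

// Parse a hash string (with or without the leading "#") back into a view.
// Keys named in listKeys are split on commas into arrays.
function decodeView(hash, listKeys) {
  var state = {};
  hash.replace(/^#/, "").split("&").forEach(function (pair) {
    var idx = pair.indexOf("=");
    if (idx === -1) return; // skip empty or malformed pairs
    var key = decodeURIComponent(pair.slice(0, idx));
    var text = decodeURIComponent(pair.slice(idx + 1));
    state[key] = listKeys.indexOf(key) !== -1 ? text.split(",") : text;
  });
  return state;
}

var hash = encodeView(view);
var restored = decodeView("#" + hash, ["lines"]);
```

In a browser, the encoded string could be assigned to window.location.hash whenever a control changes, and decoded on page load to restore the view.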

The graphs themselves would also be interactive. Policy and server changes, for example, would likely be drawn as marked lines that show details on hover, and individual nodes for absolute numbers or rates of change would display exact values on hover. Some controls could be built into the graphs as well: double-clicking a node to zoom in on it, for example.
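
The logic behind two of these interactions can be sketched independently of any charting library. The example below assumes a hypothetical event list (the dates, categories, and labels are placeholders, not real MusicBrainz history) and two illustrative helpers: visibleEvents picks the marker lines to draw for the current view, and zoomAround computes the new range for a double-click zoom. Halving the visible span is an arbitrary choice made for the sketch.

```javascript
// Hypothetical event list; dates and labels are placeholders.
var events = [
  { date: "2010-05-01", category: "release", label: "Example server release" },
  { date: "2010-09-15", category: "style", label: "Example guideline change" },
  { date: "2011-02-01", category: "release", label: "Another example release" }
];

// Select the events to draw as marker lines: those inside the visible
// date range and in one of the enabled categories. ISO date strings
// compare correctly as plain strings.
function visibleEvents(allEvents, range, enabledCategories) {
  return allEvents.filter(function (e) {
    return e.date >= range.from && e.date <= range.to &&
      enabledCategories.indexOf(e.category) !== -1;
  });
}

// Double-click zoom: halve the visible span and centre it on the clicked
// point, clamped to the full extent of the data. All arguments are
// numbers, for example millisecond timestamps.
function zoomAround(center, range, extent) {
  var quarter = (range.to - range.from) / 4;
  return {
    from: Math.max(extent.from, center - quarter),
    to: Math.min(extent.to, center + quarter)
  };
}

var shown = visibleEvents(
  events,
  { from: "2010-01-01", to: "2010-12-31" },
  ["release"]
);
var zoomed = zoomAround(50, { from: 0, to: 100 }, { from: 0, to: 100 });
```

Near the edge of the data the clamped range comes out narrower than half the original span; a real implementation might shift the window instead.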