GSoC/GCI Archive
Google Code-in 2014 Wikimedia Foundation

CommonsMetadata should parse <time> tags

completed by: m4tx

mentors: Gergő Tisza

CommonsMetadata parses machine-readable data on image description pages and exposes it through the imageinfo API. Currently, dates extracted from the description are output as they are, without any processing, which often results in date strings that clients of the API cannot parse (e.g. EXIF timestamps).

Many file pages use a <time> element to make the date machine-readable; that is, the date field will contain something like

<time datetime="2012-08-26">August 26th, 2012</time>

Currently, CommonsMetadata returns such HTML unchanged; it should instead return only the value of the datetime attribute (2012-08-26 in the example above).
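
The requested behaviour can be illustrated with a short sketch. It is written in Python with the standard-library HTML parser purely for illustration and is not the extension's actual code; the names _TimeTagFinder and extract_date are hypothetical. It assumes that only the first <time> element in the field matters and that a field without a usable datetime attribute should be returned unchanged.

```python
from html.parser import HTMLParser


class _TimeTagFinder(HTMLParser):
    """Records the datetime attribute of the first <time> element seen."""

    def __init__(self):
        super().__init__()
        self.datetime_value = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "time" and self.datetime_value is None:
            value = dict(attrs).get("datetime")
            if value:
                self.datetime_value = value.strip()


def extract_date(field_html):
    """Return the datetime attribute of the first <time> tag.

    Assumption: if no <time> tag with a non-empty datetime attribute
    is present, the field is returned unchanged.
    """
    finder = _TimeTagFinder()
    finder.feed(field_html)
    finder.close()
    return finder.datetime_value or field_html


print(extract_date('<time datetime="2012-08-26">August 26th, 2012</time>'))
```

Falling back to the original string keeps fields that carry no <time> markup (such as raw EXIF timestamps) untouched, so normalising those would remain a separate step.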

This is T63701 in Wikimedia's issue tracker.

Students are required to read Wikimedia's general instructions first.