GSoC/GCI Archive
Google Code-in 2014 Wikimedia Foundation

pywikibot: New generator ItemClaimFilterPageGenerator

completed by: m4tx

mentors: John Vandenberg

Pywikibot (PWB) is a Python-based framework for writing bots for MediaWiki. See https://www.mediawiki.org/wiki/Manual:Pywikibot for more information. Patches can be submitted via Gerrit (you need a MediaWiki.org account); more documentation on Gerrit can be found at https://www.mediawiki.org/wiki/Manual:Pywikibot/Gerrit. Once you have successfully claimed this task in Google Melange, please use the corresponding Phabricator task for communication instead of Google Melange, so that more PWB developers can see and respond to your questions. General development questions can be asked on the Pywikibot mailing list at https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l and in the #pywikibot IRC channel (see https://www.mediawiki.org/wiki/MediaWiki_on_IRC).

Wikidata is a collaboratively edited wiki knowledge base that stores structured data as JSON records. Its software is a MediaWiki extension called Wikibase. See https://www.wikidata.org/wiki/Wikidata:Introduction and https://www.mediawiki.org/wiki/Wikibase for more information. Its primary use is as a central store of facts that can be used by all wiki projects. Each claim (fact) in Wikidata essentially takes the form property=value, and an item may have several values for the same property: e.g. India 'shares a border' (property P47) with several countries. Some facts need qualifiers, which add conditions on when or how the fact is true: e.g. India has been a member of the United Nations Security Council, but only for limited terms such as 1950 to 1951.
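To make the property=value and qualifier model concrete, here is a minimal Python sketch. It is a deliberately simplified in-memory model, not Wikidata's actual JSON layout and not pywikibot's own claim class. P47 comes from the text above; the other IDs (P463 "member of", P580 "start time", P582 "end time") are assumptions used for illustration and should be checked on Wikidata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Claim:
    """A simplified statement: property=value, plus optional qualifiers."""
    prop: str                                   # property ID, e.g. "P47"
    value: str                                  # target value, here just a label
    qualifiers: Dict[str, str] = field(default_factory=dict)  # qualifier prop -> value


# Hypothetical, simplified item; real Wikidata items are richer JSON records.
india_claims: List[Claim] = [
    # One property can carry several values (P47, "shares border with").
    Claim("P47", "Pakistan"),
    Claim("P47", "China"),
    Claim("P47", "Nepal"),
    # A qualified claim (assumed IDs): member of the UNSC, 1950 to 1951.
    Claim("P463", "United Nations Security Council",
          {"P580": "1950", "P582": "1951"}),
]


def values_of(claims: List[Claim], prop: str) -> List[str]:
    """Return every value recorded for one property, in order."""
    return [c.value for c in claims if c.prop == prop]


print(values_of(india_claims, "P47"))  # ['Pakistan', 'China', 'Nepal']
```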

This task is to create a new generator filter, tentatively named 'ItemClaimFilterPageGenerator', that removes items which do not contain a claim specified on the command line. (A generator here is essentially a list of pages or items, produced one at a time.) e.g. if the generator yields all countries, the filter should remove countries which have never been a member of the United Nations Security Council. Implementing support for qualifiers is optional for this task.
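The core of such a filter can be sketched in plain Python, independent of pywikibot. The names `item_claim_filter` and `has_claim`, the `negate` flag, and the simplified `Item`/`Claim` classes are hypothetical; a real implementation would receive pywikibot item pages and inspect their claims, and its signature may well differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class Claim:
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Item:
    label: str
    claims: List[Claim] = field(default_factory=list)


def has_claim(item: Item, prop: str, value: str,
              qualifiers: Optional[Dict[str, str]] = None) -> bool:
    """True if some claim matches prop=value and carries every given qualifier."""
    qualifiers = qualifiers or {}
    for claim in item.claims:
        if claim.prop != prop or claim.value != value:
            continue
        if all(claim.qualifiers.get(q) == v for q, v in qualifiers.items()):
            return True
    return False


def item_claim_filter(generator: Iterable[Item], prop: str, value: str,
                      qualifiers: Optional[Dict[str, str]] = None,
                      negate: bool = False) -> Iterator[Item]:
    """Lazily yield items that have the claim (or lack it, if negate=True)."""
    for item in generator:
        if has_claim(item, prop, value, qualifiers) != negate:
            yield item


countries = [
    Item("India", [Claim("P463", "UNSC", {"P580": "1950"})]),
    Item("Country B", []),  # hypothetical item with no matching claim
]
print([c.label for c in item_claim_filter(countries, "P463", "UNSC")])
# ['India']
```

Because the filter is itself a generator, it can wrap any other generator without building the full list in memory. Qualifier matching here is exact string equality, which is a simplifying assumption; real qualifier values (dates, items) would need type-aware comparison.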

An example of a very simple generator filter is DuplicateFilterPageGenerator; the pagegenerators.py module contains several other filters of varying complexity that can be used as a guide.
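The idea behind a simple filter such as DuplicateFilterPageGenerator can be sketched as a generator that remembers what it has already yielded. This is an illustrative standalone function (`duplicate_filter` and its `key` parameter are hypothetical), not pywikibot's source code.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def duplicate_filter(generator: Iterable[T],
                     key: Callable[[T], Hashable] = lambda x: x) -> Iterator[T]:
    """Yield each element the first time its key is seen; skip later repeats."""
    seen = set()
    for element in generator:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element


titles = ["India", "France", "India", "Japan", "France"]
print(list(duplicate_filter(titles)))  # ['India', 'France', 'Japan']
```

The same shape (loop over the wrapped generator, test each element, `yield` the ones that pass) is what a claim-based filter needs; only the test changes.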

To complete this task, a unit test must be added to "tests/pagegenerators_tests.py" that simulates calling ItemClaimFilterPageGenerator from the command line. An example unit test is EdittimeFilterPageGeneratorTestCase.
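A command-line-style unit test could look like the following sketch. It uses only the standard `unittest` module; the `-onlyif:PROP=VALUE` option name, `parse_claim_option`, and the dictionary-based items are all assumptions. A real test in tests/pagegenerators_tests.py would instead go through pywikibot's own argument handling and test base classes, modelled on EdittimeFilterPageGeneratorTestCase.

```python
import unittest


def parse_claim_option(arg):
    """Parse a hypothetical '-onlyif:PROP=VALUE' argument into (prop, value)."""
    prefix = "-onlyif:"
    if not arg.startswith(prefix):
        raise ValueError("unexpected option: " + arg)
    prop, sep, value = arg[len(prefix):].partition("=")
    if not sep or not prop or not value:
        raise ValueError("expected PROP=VALUE, got: " + arg)
    return prop, value


def item_claim_filter(items, prop, value):
    """Keep items whose claims dict maps prop to a list containing value."""
    for item in items:
        if value in item.get("claims", {}).get(prop, []):
            yield item


class ItemClaimFilterCommandLineTest(unittest.TestCase):
    def setUp(self):
        # Offline stand-ins for Wikidata items (hypothetical data).
        self.items = [
            {"id": "A", "claims": {"P463": ["UNSC"]}},
            {"id": "B", "claims": {"P463": ["Other org"]}},
            {"id": "C", "claims": {}},
        ]

    def test_filter_from_command_line(self):
        # Simulate the argument a user would type on the command line.
        prop, value = parse_claim_option("-onlyif:P463=UNSC")
        result = [i["id"] for i in item_claim_filter(self.items, prop, value)]
        self.assertEqual(result, ["A"])

    def test_malformed_argument(self):
        with self.assertRaises(ValueError):
            parse_claim_option("-onlyif:P463")
```

Saved as a module, it can be run with `python -m unittest <module>`. Keeping the data offline makes the test fast and deterministic, at the cost of not exercising real Wikidata items.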

The Phabricator task is https://phabricator.wikimedia.org/T69568.