GSoC/GCI Archive
Google Summer of Code 2014


License: GNU General Public License version 2.0 (GPLv2)

Web Page:

Mailing List:

We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. We have other content projects including Wikimedia Commons, Wikidata and the most recent one, Wikivoyage. We also maintain the MediaWiki engine and a wide collection of open source software projects around it.

But there is much more we can do: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can contribute to these goals in many ways.


  • (Automation Tool) Google Books > Internet Archive > Commons upload cycle
    Wikisources around the world rely heavily on Google Books digitizations for transcription and proofreading. Currently, users have to manually download a book from Google Books, then upload it to the Internet Archive (if they want to preserve it) or directly to Wikimedia Commons (again a manual task) with the appropriate metadata. This project focuses on automating the whole cycle: the user only has to give the URL (or identifier) of the book(s) they wish to upload, and every other step is automated.
  • A modern, scalable and attractive skin for MediaWiki
    I plan to build a modern, scalable and attractive skin for MediaWiki based on wikiHow's current default skin, which is available under GPLv2. The skin, tentatively named BlueSky, will feature elements of modern design, including a fixed personal header with descriptive icons. The target product will be fully functional on less powerful or older devices and browsers, such as mobile browsers and Internet Explorer 8.
  • Adding proper email bounce handling to MediaWiki (with VERP)
    Improperly handled email bounces can waste a lot of resources, especially for a community as large as Wikipedia, which handles a huge volume of mail daily. When an email delivery fails, a bounce is returned to the sender address, and currently it is forwarded to /dev/null. Implementing VERP (Variable Envelope Return Path) can help identify failing recipients and un-confirm their email addresses.
  • Annotation tool that extracts information and feeds it to Wikidata
    The project aims to improve user interaction with Wikidata, and to open up new ways of sharing and saving data, by creating a tool that provides a GUI when a statement is highlighted, lets the user fix the statement's structure, and then feeds it to Wikidata. Wikidata centralizes access to data and manages it structurally so that every piece of data is easily available and accessible. With the plugin, people can save their important notes and quotes directly to Wikidata, making them more accessible.
  • Automatic cross-language screenshots for user documentation
    MediaWiki's user guides for core and for extensions (like VisualEditor) each have a large number of images illustrating functions. However, these often become outdated quickly as the software changes, and are generally only available in English, which means that non-English users are not as well served. My GSoC project therefore deals with creating an automated system that captures the current look of the software in screenshots across the hundreds of languages that MediaWiki supports.
  • Book management in Wikibooks/Wikisource
    This project focuses mainly on improving the user experience of the BookManager extension. It will provide an easy-to-use tool to edit, view and manage books.
  • Catalog for MediaWiki extensions
    MediaWiki can be enhanced with additional functionality by adding extensions. There are currently about 2000 extensions available on However, it is hard to identify and assess which extension fits a particular need. The aim of this project is to address this situation by adding functionality such as a rating review system, data syndication and collection to enhance, which maintains structured data of all the extensions correctly.
  • Chemical Markup support for Commons or MediaWiki or both
    Currently, MediaWiki is unable to render chemical markup files such as MDL molfiles. The goal of this project is to allow users at Wikimedia Commons to either upload, or draw and upload, chemical markup.
  • Critical bug fixes for Translate extension
    This project aims to resolve some critical bugs and make some feature improvements in the Translate extension. These bugs limit the extension’s usability and make it inconvenient to use in many cases. The planned work:
      • Make page title translation optional
      • Core improvement: fix the inability to set the page language during page creation
      • Fix translation pages not being updated when translation units are deleted
      • Redesign the interface of the Translate extension
      • Add features to Special:AggregateGroups
  • LUv2: Generic, efficient localisation update service
    The service would keep track of translation updates in a way that allows clients to request only a delta of the changes since the last update.
  • MassMessage page input list improvements
    Currently, the MassMessage extension reads the list of pages to which a message should be delivered from a wikitext page containing a list of parser functions. This is not user-friendly, and it is also technically undesirable since there is no enforced structure for the pages. This proposal seeks to implement a ContentHandler-based backend to replace the current system, and to create a usable frontend for managing the lists stored using the new backend.
  • Parsoid-based online-detection of broken wikitext
    This GSoC project aims to use Parsoid to detect broken and deprecated wikitext on wiki pages and, in some cases, to suggest possible fixups. During parsing, Parsoid has access to information that can help wiki editors find broken wikitext and learn how to fix it. The tool could be quite useful to the community by communicating this information to wiki editors. It will also help Parsoid developers collect statistics about the use of templates in balanced and unbalanced contexts.
  • Separating skins from core MediaWiki
    I intend to separate core skins out of MediaWiki, removing cross-dependencies with MediaWiki itself, and making it possible for non-core skins to have all of the capabilities of core ones. As a prerequisite, I wish to simplify skin creation, packaging and (un)installation. This would make life a lot easier for both skin creators and site administrators who wish to use a non-default skin. If everything goes well, the core skins will be moved out of core, to separate git repositories.
  • Switching Semantic Forms Autocompletion to Select2
    Autocompletion of fields is one of the most important features of Semantic Forms: it lets users see the previously entered values for a given field, which greatly helps to avoid issues of naming ambiguity, spelling, etc. The goal of this project is to switch the autocompletion currently available in Semantic Forms from jQuery UI Autocomplete to Select2, which is a much more complete library for autocompletion.
  • Tools for mass migration of legacy translated wiki content
    The MediaWiki Translate extension provides an interface for translating wiki pages. However, it takes a lot of effort to convert the page in question into a format that the Translate extension recognizes. Preparing a page for translation has to take various kinds of markup into consideration. In addition, wikis have a lot of legacy content needing translation. With this motivation, the project aims to automate the entire process and save manual effort.
  • UniversalLanguageSelector fonts for Chinese wikis
    Chinese uses more than 80,000 characters, of which 70,217 are included in Unicode 5.0. However, only 3,500 of them are used in daily life. Most of the rarely used characters are often not installed on readers' systems, so readers are bound to run into "tofu" (missing-glyph boxes), which triggers the webfonts service. However, including all characters in the font file makes it huge. We may want to tailor the font file for each page based on the characters used on that page.
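
The per-page font tailoring idea in the last project above can be illustrated with a minimal Python sketch. It only covers the first step, collecting the distinct characters a page actually uses. The function names and the sample text are hypothetical and not part of UniversalLanguageSelector, and producing the actual reduced font file from this list would need a separate font tool.

```python
def characters_needed(page_text: str) -> list[str]:
    """Return the distinct non-whitespace characters of a page, sorted by code point."""
    return sorted({ch for ch in page_text if not ch.isspace()})


def to_codepoint_list(chars: list[str]) -> str:
    """Format characters as a comma-separated list of U+XXXX code points."""
    return ",".join(f"U+{ord(ch):04X}" for ch in chars)


# Hypothetical page content: repeated characters are only counted once.
page = "维基百科 维基"
needed = characters_needed(page)
print(len(needed), to_codepoint_list(needed))
```

A page that repeats the same few characters yields a very short list, which is why a font restricted to that list can be far smaller than one covering every character.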