GSoC/GCI Archive
Google Summer of Code 2010

Apertium

Web Page: http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code

Mailing List: apertium-stuff@lists.sourceforge.net


  • The Apertium project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalised languages, but also work with larger languages.
  • The platform, including data for a large number of language pairs, a translation engine, and auxiliary tools, is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge role.
  • There are currently 23 published language pairs within the project (including a number of "firsts", for example Spanish–Occitan, Breton–French, and Basque–Spanish), and several more in development.

Projects

  • Apertium-fin-sme: machine translation between Finnish and Northern Sámi
    Apertium-fin-sme is a project that uses existing morphological analysis tools and constraint grammars for Finnish and Northern Sámi to produce a language pair within Apertium that translates from Finnish to Northern Sámi.
  • Easy dictionary maintenance
    The idea is to develop a GUI tool to manage Apertium monolingual and bilingual XML files, with the following objectives:
      • Create an alternative way to edit dix files using GUI resources.
      • Support monolingual dictionaries initially, while keeping the particular format of each file.
      • Minimize the direct manipulation of XML files by providing features that reduce this need.
      • Make use of DixTools to maximize code reuse.
  • French-Portuguese language pair for Apertium
    Creation of a French-Portuguese language pair for Apertium, including monolingual and bilingual dictionaries, transfer rules, testing, and documentation.
  • Improving multiword support in Apertium
    Natural languages can have lexical units that consist of two or more separate words. To handle these lexical units in Apertium, the concept of multiwords is used. Because the ways in which languages use multiword constructs are so varied, only some cases can be handled with the current dictionary syntax and implementation in Apertium. This project aims to extend multiword support in Apertium so that two more major types of multiwords can be handled.
  • Java Runtime Port
    A proposal for a Java port of the Apertium runtime. The reasons for undertaking such a project include making it easier to run on non-*nix platforms, enabling it to be run in embedded spaces, and encouraging more involvement in the engine's development at a low level.
  • Morphology with HFST
    This project develops a new tool for morphological analysis and generation, integrated well into the Apertium pipeline and based on the Helsinki Finite State Toolkit (HFST). It will function as an alternative to Apertium's lt-proc tool. This will give Apertium the ability to handle languages whose morphology is too complicated for lttoolbox to deal with, lowering the barrier to creating new language pairs for languages with freely available HFST-compatible data.
  • Polish-Czech language pair machine translation for Apertium
    The project aims to create a two-way shallow-transfer machine translation system between Polish and Czech, built on the Apertium platform.
  • Web-based advanced translation environment
    Build advanced post-editing tools for checking and improving the precision of Apertium translations, and integrate them into a user-friendly web interface so that ordinary users can use them and suggest their own corrections.