Improve support for non-Latin languages in Mapnik text rendering
Short description: Mapnik has dozens of bugs related to rendering right-to-left (RTL) languages, Unicode and text rendering in general: https://github.com/mapnik/mapnik/wiki/InternationalText These bugs are critical because they threaten to hold back the adoption of OSM in various parts of the world, and none of the main Mapnik developers has the Unicode experience necessary to solve them. My job would therefore be to rewrite this part of Mapnik.
Name: Hermann Kraus
School and degree: University of Ratisbona, Physics
Mapnik is OSM's main renderer and is used by many other projects to render OSM data. Its support for rendering text in non-Latin languages is very poor because the underlying algorithms are not robust in the face of complex Unicode and RTL text. The problem lies in the placement finder, the part of Mapnik that determines where text should go. This involves selecting fonts and calculating glyph positions, rotation, etc. Essentially all the work required to render text, except actually drawing the pixels, is done there.
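To illustrate why mixed-direction labels complicate placement, here is a minimal sketch in Python (not Mapnik code). It uses only the standard `unicodedata` module to split a label into left-to-right and right-to-left runs. The function names `has_rtl` and `direction_runs` are hypothetical, and the run splitting is a deliberate simplification: the full Unicode Bidirectional Algorithm has many more rules for weak and neutral characters.

```python
import unicodedata

# Bidirectional classes of strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def has_rtl(text):
    """Return True if the text contains any strongly RTL character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def direction_runs(text):
    """Split text into (is_rtl, substring) runs.

    Simplification (an assumption of this sketch): weak and neutral
    characters such as spaces and digits simply join the current run.
    """
    runs = []
    current = []
    current_rtl = None  # None until the first strong character is seen
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in RTL_CLASSES:
            ch_rtl = True
        elif cls == "L":
            ch_rtl = False
        else:
            ch_rtl = current_rtl  # weak/neutral: keep the current direction
        # Start a new run when the strong direction changes.
        if current and None not in (ch_rtl, current_rtl) and ch_rtl != current_rtl:
            runs.append((current_rtl, "".join(current)))
            current = []
        current.append(ch)
        if ch_rtl is not None:
            current_rtl = ch_rtl
    if current:
        runs.append((bool(current_rtl), "".join(current)))
    return runs
```

A label that mixes Latin and Hebrew text yields several runs, and a renderer has to shape each run and lay out its glyphs in that run's own direction before the whole label can be placed.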
Mapnik has dozens of bugs related to rendering right-to-left (RTL) languages, Unicode and text rendering in general: https://github.com/mapnik/mapnik/wiki/InternationalText
These bugs are critical because they threaten to hold back the adoption of OSM in various parts of the world, and none of the main Mapnik developers has the Unicode experience necessary to solve them.
With great effort it might be possible to fix all these bugs individually, but the main problem is the placement finder code, which has become too large and complicated to maintain. So my job would be to develop a robust set of test cases covering the current behavior of the placement finder (to prevent regressions) and then to rewrite the placement finder code, taking into consideration all the Unicode and RTL issues. I will try to find a solution that is easily maintainable and provides a base for further enhancements in the future.
How will OSM benefit from this?
OpenStreetMap will get better support for non-Latin languages. This will motivate more users to participate, which in turn will result in better maps.
It will also make it easier to fix some long-standing bugs and help render the best possible maps from OSM's data.
What will be done?
- Build an extensive set of test cases for the Unicode/RTL bugs before fixing them.
- Develop a list of good open source fonts to use for various languages.
- Develop a list of collaborators and native speakers in the OSM community who can review my fixes.
- Create a robust visual testing framework for placement/text rendering in Mapnik (to ensure no regressions).
- Create a robust performance testing framework for placement/text rendering in Mapnik (to ensure no regressions).
- Research the usage of ICU, HarfBuzz, Pango and other supporting libraries and document on the Mapnik wiki their relevance to Mapnik.
- Research how various open source browsers support Unicode text and document on the Mapnik wiki their relevance to Mapnik.
- Rewrite the placement finder to properly handle Unicode data for all placement methods (point, line, vertex, etc.).
- Fix as many of the Unicode/RTL bugs as possible during refactoring and develop plans for addressing the remainder.
- Document the new code to help keep it maintainable.
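The core of a visual regression test could be as simple as the following sketch, written in Python for illustration. It is an assumption about how such a check might look, not the existing Mapnik test code: images are represented as plain lists of rows of RGBA tuples, and the name `count_differing_pixels` and its `threshold` parameter are hypothetical.

```python
def count_differing_pixels(actual, expected, threshold=0):
    """Count pixels whose channels differ by more than `threshold`.

    Both images are lists of rows, each row a list of (r, g, b, a)
    tuples. This representation is an assumption made for the sketch.
    """
    if len(actual) != len(expected) or any(
        len(row_a) != len(row_e) for row_a, row_e in zip(actual, expected)
    ):
        raise ValueError("image dimensions differ")
    differing = 0
    for row_a, row_e in zip(actual, expected):
        for pixel_a, pixel_e in zip(row_a, row_e):
            # Compare the largest per-channel difference to the threshold.
            if max(abs(ca - ce) for ca, ce in zip(pixel_a, pixel_e)) > threshold:
                differing += 1
    return differing
```

A test would render a map with a known style, compare the result against a stored reference image, and fail if the count is above zero (or above a small tolerance to absorb anti-aliasing differences between platforms).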
List of bugs related to this work
On the mailing list there was also a request for multi line text rendered on lines, but there is no bug report for it yet.
I can't promise to fix each and every bug from this list, but I will try to close as many as possible.
Community Bonding Period:
I already know most people involved in Mapnik development, having done a GSoC project last year as well. I plan to use this time to learn about selecting correct glyphs for various languages and to study the complex elements of Unicode rendering. I will also try to find native speakers from the OSM community to review my patches.
May 21: Students begin coding for their GSoC projects
During the first part I will rewrite the placement finder so that it has at least the same functionality as it has currently, but with more maintainable code, keeping in mind the additional requirements from the bugs mentioned above. I will document the functions I write.
July 9: Mentors and students can begin submitting mid-term evaluations.
During the second part I will add code for correct handling of non-Latin languages and I will fix as many bugs as possible (both those from the list above and new ones reported during Summer of Code). For each bug fixed I will create a test case.
August 20: Firm 'pencils down' date.
This timeline is intentionally not very detailed, as experience from previous years has shown that accurate estimates for small sub-tasks are hard to make and that the complete plan can shift when new ideas are introduced. Having completed every Summer of Code project I started should be proof that I am able to manage the available time.
I have already taken part in 6 previous Summers of Code and completed all my projects successfully.
2006: (Bazaar) Sending patches by mail. Standalone plugin
2007: (Scribus) Adding LaTeX support. This evolved to a much larger function after SoC, now supporting many different applications. Very positive feedback from users and even mentioned in some magazines, etc.
2008: (Scribus) Use GraphicsMagick and UniConvertor for import of many file formats. Included in version 1.3.6.
2009: (OpenStreetMap) Process SRTM data, to produce information about altitude differences along ways. This data can be used to improve bicycle routing, etc.
2010: (Mapnik) Metawriters. Output meta data about features. Live demo at http://r2d2.stefanm.com/mapnik/demo.html
2011: (Mapnik) Support for text formatting and alternative placements. Examples can be found at https://github.com/mapnik/mapnik/tree/master/tests/visual_tests/images
I have a deep knowledge of Mapnik's internals, as I have touched nearly all the code that has to do with text rendering and have rewritten some parts of it.
Over the past two years I have become quite familiar with Mapnik. I have seen quite a few bug reports relating to this proposal and have already fixed at least one of them. However, given the way Unicode handling is currently implemented, fixing a bug often means creating a workaround for something broken by the previous workaround. I also recently fixed bug #608. It was a long-standing issue, but the fix was trivial. The placement finder code is simply so complicated that nobody wants to change it for fear of breaking something. I want to fix the underlying problem so that Mapnik's development can proceed faster.