Django CMS replacement for the www.gentoo.org website

Theo Chatzimichos

Abstract

I propose to implement a replacement for the gentoo.org web pages, which include the documentation, project, news and other informational pages, as a Django-based CMS. Additional Django/Python plugins will be used, e.g. for LDAP integration. The Beacon editor, a former GSoC project, will also be adapted to serve as a WYSIWYG editor for the Gentoo-specific XML schemas. Various features that improve documentation handling will be implemented.

Additional Information

Abstract:
I propose to implement a replacement for the gentoo.org web pages, which include the documentation, project, news and other informational pages, as a Django-based CMS. Additional Django/Python plugins will be used, e.g. for LDAP integration. The Beacon editor, a former GSoC project, will also be adapted to serve as a WYSIWYG editor for the Gentoo-specific XML schemas. Various features that improve documentation handling will be implemented. These features will help not only the Gentoo Documentation Team to easily fix issues or enhance the content, but also translators. In addition, statistics and reports for translated documentation will be provided. Community members with user accounts and official developers will be able to report documentation updates and corrections through Bugzilla, and the system will automatically provide a diff against the current file. As our XML files are currently stored in a CVS repository that will soon be moved to Git, the application will provide support for a Git backend. Apart from documentation, which is the most important part and the original motivation of this proposal, the web application will use Django/Python libraries to provide LDAP integration, extending the current Gentoo LDAP replication to both user and developer profiles. Last but not least, a decent System Admin that serves different needs (e.g. sysadmins, documentation moderators, webmasters) is a must.

Objective: 
Currently Gentoo employs a number of websites with separate user accounts. LDAP servers are used for the developers only. Developers update their LDAP data through a Perl script. Complaints about the difficulty of recovering forgotten passwords are frequent. A web interface around LDAP will make it more user friendly, and it can easily be extended to support non-developers just by adding a new schema and OU (organizational unit) for them. Django was chosen for several reasons. It is a web framework based on Python, a programming language that is easy to learn and maintain. A Django project is equally easy to maintain and extend. In Django parlance, the term "project" refers to a collection of pluggable, modular applications that together form a more complex web application. This modularity will let the Gentoo community easily write new Django applications on top of the main instance in order to enhance its features. In addition, the Django and Python upstream communities are doing a great job, notably by releasing often, especially for security fixes.
Today, documentation is stored in XML files in a CVS repository. An XML to HTML converter (wrapped in Ruby), called Gorg, is currently used to display the actual result. People who want to contribute to documentation or translations have to install a Gorg instance locally and check out the CVS repository, which can be rather cumbersome. A WYSIWYG editor embedded in the gentoo.org website would be preferable, as it would make it really easy to contribute to documentation, or even to send improvements through Bugzilla to the appropriate documentation or translation team. By all appearances Gorg has been abandoned as a project. This has made the Gentoo website difficult to maintain, and makes updating the website's design or functionality all the more troublesome. Therefore, a complete redesign is easier than resurrecting the old system. The overall end result of the new system will be a complete rewrite of the old Gentoo website, including everything under www.gentoo.org.
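To illustrate the kind of XML to HTML conversion the new application would take over from Gorg, here is a minimal sketch using only the Python standard library. It assumes a heavily simplified subset of guideXML (guide, title, chapter, p, and the inline c and e tags); the function names and the tag mapping are hypothetical, section titles are ignored, and the real schema has many more elements.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified mapping from guideXML inline tags to HTML tags.
INLINE_TAGS = {"c": "code", "e": "em"}


def render_inline(node):
    """Render the text and inline children of a node as escaped HTML."""
    parts = [escape(node.text or "")]
    for child in node:
        html_tag = INLINE_TAGS.get(child.tag)
        inner = render_inline(child)
        # Unknown inline tags are dropped, but their text is kept.
        parts.append(f"<{html_tag}>{inner}</{html_tag}>" if html_tag else inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def guide_to_html(xml_text):
    """Convert a simplified guide document into an HTML fragment."""
    root = ET.fromstring(xml_text)
    out = [f"<h1>{escape(root.findtext('title', default=''))}</h1>"]
    for chapter in root.iter("chapter"):
        out.append(f"<h2>{escape(chapter.findtext('title', default=''))}</h2>")
        for para in chapter.iter("p"):
            out.append(f"<p>{render_inline(para)}</p>")
    return "\n".join(out)


SAMPLE = """<guide>
  <title>Sample Guide</title>
  <chapter>
    <title>Installing</title>
    <section><body>
      <p>Run <c>emerge --sync</c> first.</p>
    </body></section>
  </chapter>
</guide>"""

if __name__ == "__main__":
    print(guide_to_html(SAMPLE))
```

In a Django view the returned fragment would be passed to a template; that wiring is left out here.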

Problems: 
As stated before, Gorg is an abandoned project and suffers from bugs that keep piling up, not to mention the security risks that may arise. Apart from Gorg itself, editing XML files directly is hard work, as XML files are not meant to be read by humans. The lack of a WYSIWYG editor is felt throughout the Gentoo developer community. Developers have very often asked Documentation Team members to edit their project space or various docs for them, saying that they have no idea about guideXML.
Other alternatives that have been brought to the table were to replace Gorg with a well-known CMS like Drupal or with a wiki service. Since our website is quite complex, we would have to write a lot of modules for that CMS. Maintenance would be too hard, though, since those applications and their plugins are widely known for frequent security updates. Apart from that, we would have to update our own plugins quite often as well, to make sure that they stay compatible with the latest version. PHP itself is also not the best choice nowadays, given its many security issues, while Python is the preferred language amongst Gentoo contributors (it is used in Portage and other tools, and it is the main language used by the Gentoo Infrastructure Team for its custom scripts).

Reports: 
During the GSoC period it is essential to keep a journal of the work, in order to track progress closely. I'll provide weekly blog posts for that reason. Apart from that, I will take daily notes, in the form of a journal, that will help me personally to see the progress of the application. Those notes will not be code related, as Git is perfect for that purpose. I will mostly keep performance and benchmarking notes, which will allow me to see which of my improvements were most effective and improve the code accordingly.

Deliverables: 
The project will consist of the following in general:
  • A complete web application using Django/Python, HTML, CSS, JavaScript/jQuery, including a System Admin, plus tests. It will replace all the web pages under www.gentoo.org.
  • Instead of a build system, as part of this project I will create several ebuilds and distutils/setup.py scripts. The ebuilds will take care of all the dependencies, install the files in the appropriate places, run the tests that I am going to deliver with the source code, and print a post-install message with sufficient final installation instructions. The main ebuild will use webapp-config for the static media, for easy development and production deployment. It will be used and updated at every step, since it makes it possible to run the tests. The test output will be stored in MongoDB with about three lines of Python (as part of the test runner). The setup.py scripts will allow installation through Python's easy_install tool.
  • A custom LDAP authentication backend for Django, that will ensure that data between the LDAP and the Django DB are in sync. More specifically, the Django DB will be used to cache a number of LDAP data to make the web application faster.
  • Support will be added for all our XML languages that Beacon lacks (like projXML).
  • The current XML documentation files will be migrated from CVS to Git, and displayed through the new Django application.
  • Documentation. I'll provide developer docs, generated with Sphinx, plus end-user docs and installation instructions.
  • The current Gentoo LDAP configuration will be extended by adding support for regular users, since it now includes only developer accounts.
  • The project will be released under the AGPLv3 license.
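The LDAP authentication backend listed above could look roughly like the following sketch. Django authentication backends are plain classes that provide authenticate() and get_user(), so the sketch follows that shape without importing Django. Everything else is an assumption made for illustration: the ldap_lookup callable stands in for a real bind against the LDAP server, the dict stands in for the Django DB, and the names CachedUser, LDAPCachingBackend, mail and is_developer are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedUser:
    """Stand-in for a Django user row that caches a few LDAP attributes."""
    username: str
    email: str = ""
    is_developer: bool = False


class LDAPCachingBackend:
    """Follows the shape of a Django authentication backend."""

    def __init__(self, ldap_lookup, cache=None):
        # ldap_lookup(username, password) returns a dict of attributes,
        # or None when the credentials are rejected (hypothetical hook).
        self.ldap_lookup = ldap_lookup
        # Stand-in for the Django DB that caches LDAP data.
        self.cache = {} if cache is None else cache

    def authenticate(self, request, username=None, password=None):
        # Reject empty credentials up front: an LDAP simple bind with an
        # empty password can be treated as an anonymous bind.
        if not username or not password:
            return None
        attrs = self.ldap_lookup(username, password)
        if attrs is None:
            return None
        # Refresh the cached copy on every successful login so that the
        # local data stays in sync with LDAP.
        user = self.cache.get(username) or CachedUser(username=username)
        user.email = attrs.get("mail", user.email)
        user.is_developer = attrs.get("is_developer", False)
        self.cache[username] = user
        return user

    def get_user(self, user_id):
        # Served from the cache, without contacting the LDAP server.
        return self.cache.get(user_id)
```

The lookup is injected here only to keep the sketch testable without an LDAP server; in a real Django project the backend would be instantiated by Django without arguments and would read its connection settings from the project configuration.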
Major Features:
  • A new user registration system, that will import the data into the LDAP server.
  • A System Admin, with support for different group permissions, based on the current LDAP ACL, e.g. recruiters, sysadmins, documentation moderators, Gentoo developers, users. For stricter LDAP-related operations the Django Admin will be used, as it is better suited to following the LDAP ACL. The Django Admin will be used mainly by privileged users, like Infra and Recruiters. Everyone else will be able to view and edit their data through their account profile page and a custom-made System Admin. No data will be required from users apart from a primary email address and a nickname/password.
  • A syndicator-like front page, that will display recent GLSAs, planet posts and the PR team's news.
  • A continuous integration testing model. There will be automated regression testing, by using Portage to build and test the package on a regular basis.
  • Support for viewing XML documents.
  • Beacon, a WYSIWYG editor, for easily editing those XML documents.
  • Git backend support for storing the XML files, using a dummy Git account.
  • An easy to translate interface. Translators will be able to select a document and translate it using Beacon.
  • Statistics for translators.
  • A "send for review" button, so that end users will be able to file a bug for the Gentoo Documentation team with the XML diff attached. This needs a custom pybugz script to file the bug. The same system will be used for translations.
  • A new improved look, using JavaScript/jQuery. In this area there will be strict adherence to web standards and accessibility best practices, since the Gentoo website and Documentation pages are viewed by a great many people every day.
  • An improved devmap, where developers will be able to click their position on the map.
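The XML diff behind the "send for review" feature could be produced with Python's standard difflib module, as in the minimal sketch below. The function name and the a/ and b/ path prefixes are assumptions; attaching the result to a bug through pybugz is not shown.

```python
import difflib


def make_review_diff(original_xml, edited_xml, filename):
    """Return a unified diff between the stored and the edited document."""
    diff = difflib.unified_diff(
        original_xml.splitlines(keepends=True),
        edited_xml.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    # unified_diff yields nothing when the inputs are identical,
    # so an empty string means there is nothing to review.
    return "".join(diff)


if __name__ == "__main__":
    print(make_review_diff("<p>Hello wrld</p>\n", "<p>Hello world</p>\n", "doc.xml"))
```

An empty return value lets the application skip filing a bug when the user saved without changing anything.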
Expected outcome:
The project's primary goal is to create a replacement for the current www.gentoo.org web application that will be user friendly, extensible and easy to maintain. As stated before, Django's modularity and extensibility are a perfect motivation to bring all the custom Gentoo webspaces under a common umbrella, and to create new ones directly there (e.g. a calendar web app as a replacement for the Google calendar we currently use, packages.gentoo.org or the newly created qa-reports.g.o website). This way we will have a consistent web interface across our various web applications. Apart from that, this project will finally make LDAP user friendly and take full advantage of its features. Given my access as a member of the Gentoo Infra Team, I will import the web application into Cfengine in order to deploy it (Cfengine is a centralized configuration management tool used on Gentoo servers). All our web applications will be integrated with the LDAP replication, so that everyone (users and developers) will have a unified account across the *.gentoo.org websites.

Long-Term Maintenance:
A custom CMS will always need someone from Gentoo to handle updates and bug fixes. This is a good thing, as the Django web framework (and Python itself) pushes the developer to write clean code, which will make maintenance an easy task. In the immediate future following its deployment, maintenance will fall to the project's originator. To ensure long-term maintenance and extension, the goal is to make it an accessible project and to encourage contributors. The use of Python and Django will facilitate this endeavor, given their popularity, ease of development and general use within Gentoo. A dedicated team would be the perfect way to ensure its ongoing progress. The installation documentation, plus Git, will help external contributors get a working installation locally without having to mess around with e.g. LDAP configuration, which can get very tricky. Django's embedded server is also a plus, allowing people to contribute without a web server, using just the default SQLite3 database on their system.

Timeline:
  • April 25 ~ May 23 (before the coding begins):
    • Mentor / mentee bonding period.
    • Get familiar with the Beacon API.
    • Create a preliminary design based on the www-redesign work.
    • Take a look at LDAP + SSL and various security related docs.
    • Study.
  • Target 1 (2 Weeks):
    • Develop the registration / login systems and account profiles with the LDAP server.
    • Work on the System Admin and account profile page.
    • Create the ebuilds and distutils packages. They will be used and updated for every step in order to run tests.
  • Target 2 (1 Week):
    • Develop the automated testing model.
    • Work on Sphinx and generated documentation.
  • Target 3 (3 Weeks):
    • Develop the XML parser to view the XML files as HTML pages.
    • Create the syndicator-like front page.
    • Make sure all web pages under www.gentoo.org are imported into the new system (like GLSAs, news, Documentation, project pages).
  • Midterm: a basic website is ready. At this point, only the following actions will be available:
    • View the documents (not edit them yet).
    • Register (using a dummy LDAP server).
    • Log in, view and edit the data of the dummy LDAP server.
  • Target 4 (1 Week):
    • Integrate Beacon with the application, add support for any additional XML needed.
  • Target 5 (1 Week):
    • Develop the Git backend support to commit the changes in the XML files.
    • Write a pybugz script to provide the ability for users to report documentation patches.
  • Target 6 (1 Week):
    • Develop the translation interface, where translators will be able to select a new or existing file and edit it through Beacon.
    • Create a script to generate a statistics page for translators.
  • Target 7 (3 Weeks):
    • Site becomes available for beta testing in a test domain.
    • Finish the documentation.
    • Various cleanup based on benchmarking and beta testing:
      • Possible additional web design work.
      • Code cleanup / improvement based on that benchmarking.
Total = 12 Weeks
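As an illustration of the Git backend planned for Target 5, the following sketch stages and commits a single edited XML file under a fixed dummy identity by calling the git command line through subprocess. The function names, the repository layout and the identity values are hypothetical; only standard git options (-C, -c, add, commit -m) are used.

```python
import subprocess


def build_commit_commands(repo_dir, rel_path, message, name, email):
    """Build the git invocations that stage and commit one file."""
    git = ["git", "-C", repo_dir]
    # Pass the dummy account per invocation instead of changing the
    # repository configuration.
    identity = ["-c", f"user.name={name}", "-c", f"user.email={email}"]
    return [
        git + ["add", "--", rel_path],
        git + identity + ["commit", "-m", message, "--", rel_path],
    ]


def commit_file(repo_dir, rel_path, message, name, email):
    """Run the commands; raises CalledProcessError if git fails."""
    for argv in build_commit_commands(repo_dir, rel_path, message, name, email):
        subprocess.run(argv, check=True)
```

Building the argument lists separately from running them keeps user-supplied text out of any shell and makes the command construction easy to test without a repository.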
Biography:
My name is Theo (Theodoros) Chatzimichos. I am an undergraduate student at the department of Computer Science and Telecommunications, Technical Educational Institute of Larissa, Greece.

I have been an official and active Gentoo Developer since February 2008. I am currently the KDE Team leader and a member of the Qt Team and of the Infrastructure project, where I mainly administer Planet/Blogs, Git/SVN overlays and mirror requests.
Gentoo was one of my first distros, as it was the only one providing KDE4 packages back in the day. I quickly became affiliated with an unofficial overlay (called kdesvn-portage), and after some time I became a member of the official KDE Team. My contributions to KDE involve upstream as well: I have developer access to the upstream KDE repositories, which makes it easier for us to push our patches upstream, especially those regarding the build system.

While KDE/Qt is my main interest inside Gentoo, I recently joined the Infrastructure project. One of my biggest contributions there was to migrate the developer blogs from b2evolution to WordPress, and also to migrate the planet and blogs services to Cfengine-controlled boxes, making them easier to administer. My Infra position gives me the ability to install and further maintain the web application, and to coordinate its extension together with the current web, docs and other related teams.

Apart from my Gentoo and KDE contributions, I am also a core member of my college's LinuxTeam. I am the lead sysadmin of our servers (services include Gitolite, Drupal, various other webapps, backups); our main goal is to promote Linux in our school by organizing install fests and conferences. Apart from the promotion work, we always try to organize coding sessions on various projects. Several applications that were part of a member's thesis were later expanded as LinuxTeam projects, just like mine. My thesis was a Django project, with LDAP authentication, that was able to store all the accounts a student has across the many web spaces used in our school. Additionally, it provided user accounts and could display information in one consistent web space (selected announcements, projects, emails, a list of teachers and their email addresses, and personal information such as the student's grades). My thesis is still online at http://cronos.teilar.gr, which is now under LinuxTeam administration, and new features have already been added. I spent a year on that project as part of my thesis, plus almost another year on it as my pet project while recruiting others to take over various parts, and I still spend time on it for various bug fixes every now and then.

Code samples