GSoC/GCI Archive
Google Summer of Code 2014

Xapian Search Engine Library

License: GNU General Public License (GPL)

Web Page: http://trac.xapian.org/wiki/GSoCProjectIdeas

Mailing List: http://trac.xapian.org/wiki/GSoC_Mailing_List

Xapian is a Search Engine Library which aims to be fast, scalable, and flexible. It's used by many organizations around the world, including Debian, Gmane, One Laptop per Child, and Ubuntu. It supports ranking by TF-IDF, probabilistic schemes, and Divergence from Randomness, plus a rich set of boolean query operators. The core library is written in C++, with bindings to allow use from many other languages.

Projects

  • GSoC 2014 Proposal for Xapian's Learning to Rank project, by Hanxiao Sun: In my GSoC 2014 proposal, I am applying for Xapian's Learning to Rank project. During this summer, I plan to complete three tasks: (1) optimize the three existing algorithms by refactoring the code, (2) complete a test framework for the learning-to-rank algorithms, and (3) complete a feature selection algorithm for ranking.
  • Learning to Rank: The goal of this project is to deliver a stable letor module for Xapian, which will include a C++ implementation with good code style, detailed documentation, and a simple but extensible sample program.
  • Posting list encoding improvements: In Xapian, storing the list for a specific term (the posting list) is important. The current approach is not ideal, so I have come up with some ways to improve the posting list encoding. Linear search is currently used in some parts of Xapian; I am going to replace it with a skip list or hashing. Storing the list of positions at which a term appears in a document (the position list) is also of great importance. I would like to use dynamic encoding in place of interpolative encoding.
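
The idea in the last project, replacing a linear scan of a posting list with skip information, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Xapian's actual on-disk format or API: the class name `BlockedPostingList`, the block size, and the variable-length delta encoding are all hypothetical choices made for the example. Document IDs are stored as byte-encoded gaps, and a small skip table records where each block starts so that a lookup can jump to the right block instead of decoding the whole list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration; not Xapian's real posting list format.
class BlockedPostingList {
  public:
    // `docids` must be strictly increasing.
    explicit BlockedPostingList(const std::vector<std::uint32_t>& docids,
                                std::size_t block_size = 4)
    {
        assert(block_size > 0);
        std::uint32_t prev = 0;
        for (std::size_t i = 0; i < docids.size(); ++i) {
            assert(i == 0 || docids[i] > prev);
            if (i % block_size == 0) {
                // New block: remember its byte offset and the docid before it.
                skips_.push_back({0, bytes_.size(), prev});
            }
            append_varint(docids[i] - prev);  // store the gap, not the docid
            prev = docids[i];
            skips_.back().last_docid = prev;
        }
    }

    // Find the first docid >= target. Returns false if there is none.
    bool skip_to(std::uint32_t target, std::uint32_t& found) const
    {
        // Binary search the skip table for the first block that can
        // contain the target, then decode only that block.
        auto it = std::lower_bound(
            skips_.begin(), skips_.end(), target,
            [](const Skip& s, std::uint32_t t) { return s.last_docid < t; });
        if (it == skips_.end()) return false;

        std::size_t pos = it->offset;
        std::size_t end =
            (it + 1 == skips_.end()) ? bytes_.size() : (it + 1)->offset;
        std::uint32_t docid = it->base;
        while (pos < end) {
            docid += read_varint(pos);
            if (docid >= target) {
                found = docid;
                return true;
            }
        }
        return false;  // not reached for well-formed data
    }

    std::size_t encoded_size() const { return bytes_.size(); }

  private:
    struct Skip {
        std::uint32_t last_docid;  // largest docid in the block
        std::size_t offset;        // byte offset where the block starts
        std::uint32_t base;        // docid preceding the block
    };

    // 7 bits per byte, high bit set on every byte except the last.
    void append_varint(std::uint32_t v)
    {
        while (v >= 0x80) {
            bytes_.push_back(static_cast<std::uint8_t>(v | 0x80));
            v >>= 7;
        }
        bytes_.push_back(static_cast<std::uint8_t>(v));
    }

    std::uint32_t read_varint(std::size_t& pos) const
    {
        std::uint32_t v = 0;
        int shift = 0;
        for (;;) {
            std::uint8_t b = bytes_[pos++];
            v |= static_cast<std::uint32_t>(b & 0x7F) << shift;
            if (!(b & 0x80)) return v;
            shift += 7;
        }
    }

    std::vector<Skip> skips_;
    std::vector<std::uint8_t> bytes_;
};
```

With a list built from, say, `{3, 7, 8, 15, 200, 201, 1000}`, a call such as `skip_to(16, d)` decodes only the second block rather than every entry. A flat skip table searched with binary search is used here in place of a true multi-level skip list purely to keep the sketch short; the trade-off between block size, skip-table overhead, and decode cost is the kind of thing such a project would need to measure.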