Google Summer of Code 2012 PostgreSQL Project

Document Collection Foreign Data Wrapper

by Zheng Yang for PostgreSQL Project

The Document Collection FDW lets users map an entire directory of documents (e.g. Reuters Corpora RCV1) to a single foreign table in a PostgreSQL database. The FDW supports building an inverted index and a postings file in a user-specified location. It then uses the index and postings file to support various information retrieval tasks, such as boolean retrieval and vector space model (VSM) ranking with tf-idf (term frequency-inverse document frequency) weighting schemes.
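To make the retrieval terms above concrete, here is a minimal Python sketch of an in-memory inverted index with boolean AND retrieval and a simplified tf-idf score. It is not the FDW's code: the function names (`tokenize`, `build_index`, `boolean_and`, `tfidf_scores`), the whitespace tokenizer, and the unnormalized `tf * log(N / df)` weighting are all assumptions chosen for illustration, and the actual project may store postings on disk and weight terms differently.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_index(docs):
    """Map each term to its postings: {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf
    return postings


def boolean_and(postings, terms):
    """Return the ids of documents that contain every query term."""
    doc_sets = [set(postings.get(term, {})) for term in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


def tfidf_scores(postings, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms.

    Simplified scoring: no document-length normalization is applied.
    """
    scores = defaultdict(float)
    for term in tokenize(query):
        plist = postings.get(term, {})
        if not plist:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(plist))
        for doc_id, tf in plist.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical three-document collection used only for illustration.
docs = {
    1: "oil prices rise",
    2: "oil exports fall",
    3: "wheat prices fall",
}
index = build_index(docs)
matches = boolean_and(index, ["oil", "prices"])      # documents with both terms
ranking = tfidf_scores(index, len(docs), "oil prices")
```

In this toy collection only document 1 contains both "oil" and "prices", so it is the sole boolean match and also ranks first under the tf-idf score.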