Google Summer of Code 2015

Red Hen Lab

License: GNU General Public License (GPL)

Red Hen is an international consortium for research on multimodal communication.

Participants include faculty, staff, and students at universities around the world, among them UCLA, FAU Erlangen, Case Western Reserve University, Centro Federal de Educação Tecnológica in Rio, Oxford University, Universidad de Navarra, and University of Southern Denmark.

We develop open-source tools across a range of tasks, including automated data acquisition, distributed data storage, data enhancement, joint parsing of text, audio/speech, and video, statistical analysis, multimodal search engines, user interfaces, presentation tools, publishing platforms, and pedagogical applications. 

We collaboratively build and do research on a large international dataset of television news.


  • A web-based front-end for the mwetoolkit multiword expression tagger: This project aims to facilitate a specific corpus annotation task, namely the tagging of multiword expressions. The goal is to develop an integrated, language-agnostic pipeline that goes from user-supplied multiword expressions to a fully annotated corpus. The pipeline consists of utility scripts that perform input and output conversion, a backend that communicates with the mwetoolkit (the tagger), and a frontend that lets the user customize their tagging task.
  • Audio Analysis by Conceptors: Automatically identifying voice features is generally a complex task, as computers have no concept of features such as a speaker's emotion, tone, or gender. Here I propose a generic method to characterize and identify these features using Conceptors.
  • Commercial detection: This project detects the location of known commercials in a TV stream, regardless of transmission noise. In the tests performed, the system achieved an accuracy of 100%.
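
To give a feel for the commercial detection task, here is a minimal sketch of locating a known clip inside a noisy stream. It is an illustration only and not the project's actual method: it assumes the stream and the clip are plain one-dimensional lists of samples and uses normalized cross-correlation as a stand-in matching technique. The names `find_clip` and `demo` and the synthetic signal are hypothetical.

```python
import math
import random


def find_clip(stream, clip):
    """Return (offset, score) of the window in `stream` that best matches `clip`.

    The score is the normalized cross-correlation, a value in [-1, 1].
    This is an assumed stand-in technique, not the project's algorithm.
    """
    n = len(clip)
    if n == 0 or n > len(stream):
        raise ValueError("clip must be non-empty and no longer than stream")

    # Center the clip once; only the windows change inside the loop.
    clip_mean = sum(clip) / n
    clip_centered = [c - clip_mean for c in clip]
    clip_norm = math.sqrt(sum(c * c for c in clip_centered))

    best_offset, best_score = 0, -1.0
    for offset in range(len(stream) - n + 1):
        window = stream[offset:offset + n]
        win_mean = sum(window) / n
        win_centered = [w - win_mean for w in window]
        win_norm = math.sqrt(sum(w * w for w in win_centered))
        if clip_norm == 0 or win_norm == 0:
            continue  # flat segment: correlation is undefined, skip it
        dot = sum(a * b for a, b in zip(clip_centered, win_centered))
        score = dot / (clip_norm * win_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def demo():
    """Embed a synthetic 'commercial' in random noise and try to find it."""
    rng = random.Random(0)
    clip = [math.sin(0.3 * i) + 0.5 * math.sin(1.1 * i) for i in range(50)]
    stream = [rng.uniform(-1, 1) for _ in range(300)]
    true_offset = 120
    for i, value in enumerate(clip):
        # Additive Gaussian noise stands in for transmission noise.
        stream[true_offset + i] = value + rng.gauss(0, 0.1)
    return true_offset, find_clip(stream, clip)


if __name__ == "__main__":
    true_offset, (offset, score) = demo()
    print(f"true offset: {true_offset}, found: {offset}, score: {score:.3f}")
```

Because the score is normalized, it is insensitive to overall gain and offset changes in the signal, which is one reason this kind of matching tolerates moderate noise. A real system working on broadcast audio or video would more likely compare compact fingerprints than raw samples.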