GSoC/GCI Archive
Google Summer of Code 2010

NESCent - National Evolutionary Synthesis Center

Web Page: http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2010

NESCent facilitates synthetic research on grand challenge questions in evolutionary biology and works to address critical needs in software infrastructure and education by promoting open, collaborative development of interoperable, standards-supporting open-source software. The Center is located in Durham, North Carolina, is jointly operated by Duke University, the University of North Carolina at Chapel Hill, and North Carolina State University, and receives its core funding from the National Science Foundation (NSF). NESCent has so far run four collaborative sprints for software and vocabulary development, aimed at improving interoperability in phyloinformatics, engaging developers of scientific software tools, integrating online data resources, and sustaining the development of shared vocabularies. These events, and our past Summer of Code participations, continue to have significant and lasting impacts on the landscape of collaborative software development in our field. The Center is committed to FLOSS and to the sharing of scientific data (see, for example, the NESCent Data and Software Policy at http://www.nescent.org/informatics/data_software_policy.php); all software products of the Center are released as open source and established as collaborative projects on sites such as SourceForge, Google Code, and GitHub. Members of the Center's Informatics team are lead developers in several open-source projects, and one of our organization administrators has been active for seven years on the Board of the Open Bioinformatics Foundation (http://www.open-bio.org/), the umbrella organization for the Bio* projects.

Projects

  • Ancestral State Reconstruction in R: The goal of this project is to integrate important phylogenetic tools for ancestral state reconstruction and rate testing into a commonly used scripting environment (R). R is an ideal place to port the functions contained in Brownie because of its robust linkage to C/C++ (innately and through external packages) and because of its potential to act as a staging area for all steps in phylogenetic data analysis, from reading in genetic data to running analyses to plotting the results.
  • Develop an API for NeXML I/O and RDF triples for BioRuby: NeXML is a data exchange format for phylogenetics, developed to overcome the problems posed by the existing formats. RDF is a data model used to represent metadata. NeXML by design supports the concept of RDF triples. The goals of the project are to: 1. Implement a NeXML parser/serializer in BioRuby. 2. Develop an API to express RDF triples in BioRuby.
  • Extending Jalview Capabilities to Support RNA Sequence Alignment Annotation and Secondary Structure Visualization: The overall goal of this project is to extend many of the useful features Jalview has for protein sequence alignments to support RNA sequence analysis. By adding more parsing and analysis of the information in Stockholm files, I can add support for RNA secondary structure alignment annotation, such as coloring schemes. Other features, such as an embedded RNA secondary structure viewer and the ability to import existing RNA sequences and alignments from the Rfam database, will also be added.
  • Galaxy phylogenetics pipeline development: Galaxy is a popular web-based interface for integrating biological tools and analysis pipelines. HyPhy is a popular package for molecular evolution and statistical sequence analysis. Integrating these tools will make Galaxy more powerful and useful, and will probably increase its popularity. The goal of this project is to integrate the relevant tools into Galaxy. Functional tests will be developed for tools and workflows, along with high-level documentation for end users.
  • Georeferencing Library Implemented in Java: The goal of this project is to create a georeferencing library for parsing and displaying geographical and phylogenetic information. The main aim is to bring together geographical and phylogenetic information in a way that is usable and useful to the user. Given a tree file and geographical information as input, the library will return a tree with phylogenetic and geographical information in a format such as KML or shapefile, or in a georeferenced phylogenetic format such as NeXML.