Google Summer of Code 2014 Google Open Source Programs Office

Topic modeling LaTeX equations on the arXiv

by Jaan Altosaar for Google Open Source Programs Office

Exposing scientists to alternative mathematical descriptions of the problems they are working on has the potential to accelerate research. Doing so requires incorporating mathematics into current topic modeling approaches such as Latent Dirichlet Allocation. By applying such a model to the arXiv's corpus of LaTeX equations, we aim to develop tools that analyze and predict trends in the use of mathematical formulas in science and that enhance scientific recommendation systems.
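
To make the idea concrete: topic models such as Latent Dirichlet Allocation operate on token counts per document, so LaTeX equations first have to be turned into tokens. The following is a minimal sketch of that preprocessing step only, not the project's actual pipeline. The tokenization rule and the names TOKEN_RE, tokenize_equation and bag_of_tokens are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenization rule (hypothetical, not taken from the project):
# a LaTeX command such as \alpha, a single letter, a run of digits,
# or any other single non-space symbol.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S")


def tokenize_equation(latex):
    """Split one LaTeX equation string into a list of tokens."""
    return TOKEN_RE.findall(latex)


def bag_of_tokens(equations):
    """Count tokens over all equations of one document."""
    counts = Counter()
    for equation in equations:
        counts.update(tokenize_equation(equation))
    return counts


# Two example equations standing in for the equations of one paper.
equations = [
    r"E = mc^2",
    r"\frac{\partial u}{\partial t} = \alpha \nabla^2 u",
]
doc_counts = bag_of_tokens(equations)
```

The resulting per-document counts could then be combined with ordinary word counts and passed to any standard Latent Dirichlet Allocation implementation. Treating each LaTeX command as a single token is one possible design choice; coarser units, such as whole sub-expressions, would be another.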