Quantifying language change
15:10 Fri 1 Jun, 2018 :: Horace Lamb 1022 :: A/Prof Eduardo Altmann :: University of Sydney
Mathematical methods for studying natural language are increasingly important because of the ubiquity of textual data on the Internet. In this talk I will discuss mathematical models and statistical methods to quantify the variability of language, with a focus on two problems: (i) How has the vocabulary of languages changed over the last centuries? (ii) How do the languages of scientific disciplines relate to each other, and how have they evolved in the last decades? One of the main challenges of these analyses stems from universal properties of word frequencies, which show high temporal variability and follow fat-tailed distributions. The latter feature dramatically affects the statistical properties of entropy-based estimators, which motivates us to compare vocabularies using a generalized Jensen-Shannon divergence (obtained from entropies of order alpha).
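As a rough illustration of the kind of quantity mentioned above, the following Python sketch compares two vocabularies with a Jensen-Shannon-type divergence built from an entropy of order alpha. The abstract does not say which generalized entropy is meant; the sketch assumes the Havrda-Charvat/Tsallis form H_alpha(p) = (sum_i p_i^alpha - 1) / (1 - alpha), which reduces to the Shannon entropy as alpha approaches 1. The function names `entropy_alpha` and `jsd_alpha` and the toy texts are hypothetical, and the naive plug-in frequencies used here ignore the estimator bias that the talk is concerned with.

```python
import math
from collections import Counter


def entropy_alpha(p, alpha):
    """Entropy of order alpha (assumed Havrda-Charvat/Tsallis form).

    For alpha == 1 this is the Shannon entropy in nats.
    """
    if alpha == 1.0:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (sum(x ** alpha for x in p if x > 0) - 1.0) / (1.0 - alpha)


def jsd_alpha(counts_p, counts_q, alpha=1.0):
    """Generalized Jensen-Shannon divergence between two word-count tables.

    D_alpha(p, q) = H_alpha((p + q) / 2) - H_alpha(p) / 2 - H_alpha(q) / 2,
    with p and q the plug-in (relative frequency) estimates.
    """
    vocab = sorted(set(counts_p) | set(counts_q))
    n_p = sum(counts_p.values())
    n_q = sum(counts_q.values())
    p = [counts_p.get(w, 0) / n_p for w in vocab]
    q = [counts_q.get(w, 0) / n_q for w in vocab]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return (entropy_alpha(m, alpha)
            - 0.5 * entropy_alpha(p, alpha)
            - 0.5 * entropy_alpha(q, alpha))


if __name__ == "__main__":
    # Toy word counts standing in for two corpora (made-up text).
    old = Counter("the cat sat on the mat".split())
    new = Counter("the dog sat on the log".split())
    for a in (0.5, 1.0, 2.0):
        print(a, jsd_alpha(old, new, alpha=a))
```

Larger values of alpha give more weight to frequent words and smaller values to rare ones, so varying alpha changes which part of the fat-tailed frequency distribution dominates the comparison.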