Quantitative Analysis of Literary Styles

Roger Peng, Nicolas Hengartner
Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying features of literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles offers an opportunity to introduce and motivate many standard multivariate statistical techniques. Today, the statistical analysis of literary styles is made much simpler by the wealth of real data readily available on the Internet. This paper presents an overview and brief history of the analysis of literary styles. In addition, we use canonical discriminant analysis and principal component analysis to identify structure in the data and to distinguish authorship.