What can we learn about the author of an anonymous text?
In the standard authorship attribution problem, we are told that the author of an anonymous document is one of a given set of suspects and are asked to choose the likeliest candidate among them based on their respective known writings. Posed this way, the problem is a reasonably straightforward text categorization problem, and most of the usual tricks (which I will briefly review) apply. In the real world, though, we usually face one of two harder variations.
In the first variation, we have an anonymous text but no suspect authors at all, and we are asked to profile the author. That is, we wish to determine, for example, the author's age, gender, or linguistic background. I will show how this can be done with a focus on the issue of gender (taking the opportunity to wash my hands clean of some of the dumb hype surrounding this work).
In the second variation, the problem of authorship verification, we are given the known writing of a single author and are asked to determine if this author is also the author of a given anonymous text. I will introduce a new meta-learning technique for solving this problem, demonstrate its effectiveness on various classic books, and use it to solve at least one real-life literary mystery.
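To make the standard attribution setting concrete, here is a minimal sketch of it as text categorization. It is not the method presented in the talk: the tiny function-word list, the cosine-similarity comparison, and the names `FUNCTION_WORDS`, `profile`, `cosine`, and `attribute` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set; practical systems use
# hundreds of function words and other stylistic features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "was", "for", "with", "but", "not"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(anonymous_text, known_writings):
    """Pick the suspect whose known writing is stylistically closest.

    known_writings maps each suspect's name to a sample of their text.
    """
    target = profile(anonymous_text)
    return max(
        known_writings,
        key=lambda author: cosine(target, profile(known_writings[author])),
    )
```

Calling `attribute(anonymous_text, {"suspect one": sample_one, "suspect two": sample_two})` returns the name of the closest suspect. Note that this closed-set sketch always returns some suspect, which is exactly why it does not carry over to the two harder variations: profiling has no suspects to compare against, and verification must be able to answer "none of the above".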