Robust Authorship Verification with Transfer Learning
Published in CICLing 2019
Recommended citation: Dainis Boumber, Yifan Zhang, Marjan Hosseinia, Arjun Mukherjee, and Ricardo Vilalta. "Robust Authorship Verification with Transfer Learning", Proceedings of the 20th International Computational Linguistics and Intelligent Text Processing Conference, CICLing 2019, La Rochelle, France, April 7-13, 2019.
Excerpt
We address the problem of open-set authorship verification, a classification task that consists of attributing texts of unknown authorship to a given author when the unknown documents in the test set are excluded from the training set. We present an end-to-end model-building process that is universally applicable to a wide variety of corpora and requires little to no modification or fine-tuning. It relies on transfer learning of a deep language model, using a generative adversarial network and a number of text augmentation techniques to improve the model’s generalization ability. The language model encodes documents into a domain-invariant space, aligning document pairs as input to the classifier while keeping them separate. The resulting embeddings are used as input to an ensemble of recurrent and quasi-recurrent neural network classifiers. The entire pipeline is bidirectional; forward and backward pass results are averaged. We perform experiments on four traditional authorship verification datasets, a collection of machine learning papers gathered from the web, and a large Amazon-Reviews dataset. Experimental results surpass baseline and current state-of-the-art techniques, validating our proposed approach.
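The averaging step mentioned in the excerpt can be illustrated with a minimal sketch. It assumes each direction of the pipeline has already produced one same-author probability per document pair. The function names, the 0.5 threshold, and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_bidirectional(forward_probs: Sequence[float],
                          backward_probs: Sequence[float]) -> List[float]:
    """Average per-pair same-author probabilities from two directions.

    forward_probs and backward_probs are assumed to be aligned: index i
    in both sequences refers to the same document pair.
    """
    if len(forward_probs) != len(backward_probs):
        raise ValueError("forward and backward scores must have equal length")
    return [(f + b) / 2.0 for f, b in zip(forward_probs, backward_probs)]


def same_author(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Turn averaged probabilities into yes/no verification decisions.

    The 0.5 threshold is an illustrative default, not a value from the paper.
    """
    return [p >= threshold for p in probs]


# Hypothetical scores for two document pairs.
forward = [0.75, 0.25]
backward = [0.25, 0.25]
averaged = average_bidirectional(forward, backward)
decisions = same_author(averaged)
```

Here the first pair would be accepted as same-author and the second rejected; a real system would tune the threshold on held-out data.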