Training Deep Semantic Models: an Adversarial Transfer Learning Approach

Date:

Abstract

The NLP research community has been facing a major challenge as of late; however, it appears that it has not often discussed much in public until very recently, perhaps due to it’s somewhat embarrassing nature. The problem we are talking about is the fact that while being very much intertwined with Machine Learning, NLP research has not seen as much progress in terms of Deep Learning as ML, Computer Vision, AI, Robotics, or other fields. In fact until very recently what has been referred to as “deep models” really meant Neural Networks of various kinds; however, a CNN with 2 layers is not deep just by virtue of being a CNN. Throughout the past year, we worked on a number of such problems and developed a straightforward approach which utilizes transfer learning, generative adversarial models and adversarial discriminators, as well as a number of other tricks to make deep learning viable even when data is highly dimensional, unstructured, and lacking in volume. Simultaneously with our work, a number of publications on deep pre-trained language models appeared independent from one another, which also make deep learning in NLP a much less daunting task, albeit through different means. In this presentation, we introduce a method that relies on a language model to construct a classifier for an authorship verification problem where number of data samples is equal to number of classes, the samples belong to different domains, and are not structured in any way we can rely on. We demonstrate a robust solution to these problem types through the use of language models coupled with transfer learning, domain adaptation by way of mapping of samples into a common decision space while pushing distinct classes further apart, generation of synthetic training samples via a generative adversary, and provide a bag full of regularization tricks that worked for us. Finally, we briefly discuss experimental results used to validate the premises of our approach and compare them with what we know to be current state of the art.