How well do neural NLP systems generalize?

Event Description

Neural networks have rapidly become central to natural language processing systems. While such systems perform well on typical test set examples, their behavior often diverges from human behavior in unexpected ways. In this talk, I will argue that a focus on interpretable axes of generalization from the training set, rather than on average test set performance, can help us better characterize and address the gaps between the language understanding abilities of neural systems and those of humans.

I will show that recurrent neural network (RNN) language models are able to process syntactic dependencies in typical sentences with considerable success, but when evaluated on syntactically controlled materials, their error rate increases sharply. Likewise, a standard natural language inference model (BERT fine-tuned on the MNLI corpus) achieves high accuracy on the test set, but shows little sensitivity to syntactic structure. Instead, the model relies on word overlap between the two sentences of the inference problem, and concludes, for example, that “the doctor visited the lawyer” entails “the lawyer visited the doctor”. Our focus on generalization exposes substantial variability in the linguistic knowledge acquired by the model as a function of its initial weights, and suggests targeted data augmentation methods for increasing the robustness of language understanding systems.
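
As a rough illustration of the word-overlap shortcut described above, the sketch below implements a toy classifier that looks only at shared words. It is not the BERT model or any code from the talk; the function name `overlap_heuristic` and the two-way label set are assumptions made for this example.

```python
def overlap_heuristic(premise: str, hypothesis: str) -> str:
    """Toy stand-in (hypothetical) for a model that relies only on word overlap."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    # Predict entailment whenever every hypothesis word also occurs in the
    # premise, ignoring word order and syntactic roles entirely.
    if hypothesis_words <= premise_words:
        return "entailment"
    return "non-entailment"


premise = "the doctor visited the lawyer"
hypothesis = "the lawyer visited the doctor"
print(overlap_heuristic(premise, hypothesis))  # prints "entailment"
```

Because the two sentences contain exactly the same words, the toy heuristic labels the pair as entailment even though swapping subject and object changes the meaning. This is the kind of error that syntactically controlled test items are designed to expose.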


Tal Linzen is an Assistant Professor of Cognitive Science at Johns Hopkins University (with a joint appointment in Computer Science). Before moving to Johns Hopkins, he was a PhD student at New York University and a postdoctoral researcher at the École Normale Supérieure in Paris. At Johns Hopkins, Tal directs the Computation and Psycholinguistics Lab, which develops computational models of human language comprehension and acquisition, as well as psycholinguistically informed methods for interpreting, evaluating and extending neural network models for natural language processing. The lab’s work has appeared at conferences such as ACL, EMNLP and ICLR, as well as in journals such as Cognitive Science and the Journal of Neuroscience.

Location TBD