This work introduces an approach to extractive question answering (extractive QA) that generates its own training data instead of requiring existing annotated question answering examples. Extractive QA is a popular task in natural language processing (NLP) research, in which a model must extract a short snippet from a document to answer a natural language question. Though supervised models perform well at extractive QA, they require thousands, and sometimes hundreds of thousands, of annotated training examples, and their performance suffers when they are tested outside the textual domains and languages they were trained on. By approaching extractive QA as a self-supervised task, our technique outperformed early supervised models on the widely used SQuAD benchmark while requiring no annotated question answering training data. The code for our method is now available to download.
Our two-step method starts by training a model to create fill-in-the-blank (also called cloze) questions from sample documents. This generation pipeline first identifies potential answers in the text, then formulates a cloze question around each answer, and finally rewrites that cloze question as a natural language question. For example, the model could be presented with this text:
The Broncos took an early lead in Super Bowl 50 and never trailed. [...] Denver linebacker Von Miller was named Super Bowl MVP, recording five solo tackles, two and a half sacks, and two forced fumbles.
The system might first identify “Broncos,” “Denver” or various numbers (such as “five” and “two”) as probable answers. For the answer “Broncos,” the model would create the cloze question “The _____ took an early lead in Super Bowl 50 and never trailed,” followed by a final, non-cloze version of the question: “Who took an early lead in Super Bowl 50?”
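The three generation steps can be sketched in Python to make the data flow concrete. This is a toy illustration under stated assumptions, not the released implementation: the answer heuristic (capitalized tokens and number words), the wh-word rules, and every name below (`find_answer_candidates`, `make_cloze`, `cloze_to_question`, `generate`) are hypothetical stand-ins for the model-based steps described above, and the sketch handles one sentence at a time.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical heuristics for illustration; the real pipeline's choices may differ.
NUMBER_WORDS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}
DETERMINERS = ("The", "A", "An")
BLANK = "_____"


@dataclass
class GeneratedExample:
    answer: str
    answer_start: int  # character offset of the answer within the sentence
    cloze: str
    question: str


def is_number(token: str) -> bool:
    return token.isdigit() or token.lower() in NUMBER_WORDS


def find_answer_candidates(sentence: str) -> list[re.Match]:
    """Step 1 (toy heuristic): number tokens and capitalized non-determiner tokens."""
    candidates = []
    for match in re.finditer(r"[A-Za-z0-9]+", sentence):
        token = match.group()
        if is_number(token) or (token[0].isupper() and token not in DETERMINERS):
            candidates.append(match)
    return candidates


def make_cloze(sentence: str, match: re.Match) -> str:
    """Step 2: replace the answer span with a blank."""
    return sentence[: match.start()] + BLANK + sentence[match.end():]


def cloze_to_question(cloze: str, answer: str) -> str:
    """Step 3 (toy rules): rewrite the cloze as a wh-question."""
    wh_word = "How many" if is_number(answer) else "Who"
    body = cloze.rstrip(".")
    for prefix in [BLANK] + [f"{det} {BLANK}" for det in DETERMINERS]:
        if body.startswith(prefix):
            # Blank is the sentence subject: "The _____ took ..." -> "Who took ...".
            return wh_word + body[len(prefix):] + "?"
    # Fallback: put a lowercase wh-phrase where the blank was.
    return body.replace(BLANK, wh_word.lower(), 1) + "?"


def generate(sentence: str) -> list[GeneratedExample]:
    """Run all three steps for every answer candidate in one sentence."""
    examples = []
    for match in find_answer_candidates(sentence):
        cloze = make_cloze(sentence, match)
        question = cloze_to_question(cloze, match.group())
        examples.append(GeneratedExample(match.group(), match.start(), cloze, question))
    return examples


if __name__ == "__main__":
    sentence = "The Broncos took an early lead in Super Bowl 50 and never trailed."
    for ex in generate(sentence):
        print(f"{ex.answer!r}: {ex.question}")
```

Applied to the first sentence of the example, the sketch yields the cloze “The _____ took an early lead in Super Bowl 50 and never trailed.” and the question “Who took an early lead in Super Bowl 50 and never trailed?” for the answer “Broncos.” Fixed rules like these produce awkward questions when the answer is not the sentence subject (the fallback just swaps the blank for a wh-phrase in place), which illustrates why the rewriting step benefits from something more flexible than hand-written rules.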
In our method’s second step, we take a standard extractive QA model architecture, which usually requires human-annotated QA data to train on, and instead train it with data from our question-generating model. To evaluate our approach, we measured the resulting model’s performance on test data from the SQuAD benchmark and found that it scored 56.4 F1, beating an early supervised model.
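The 56.4 figure is SQuAD’s token-level F1, which gives partial credit when a predicted span overlaps the reference answer. The sketch below is modeled on the normalization and scoring of the official SQuAD v1.1 evaluation script as we understand it; the function names are our own, and it assumes predictions and reference answers are keyed by question ID.

```python
from __future__ import annotations

import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_f1(predictions: dict[str, str], gold_answers: dict[str, list[str]]) -> float:
    """Average best-match F1 over questions, scaled to 0-100 as SQuAD reports it."""
    scores = [
        max(token_f1(predictions.get(qid, ""), gold) for gold in golds)
        for qid, golds in gold_answers.items()
    ]
    return 100.0 * sum(scores) / len(scores)
```

For instance, predicting “the Denver Broncos” against the reference “Broncos” scores 2/3: after normalization, one of the two predicted tokens matches (precision 0.5) and the single reference token is recovered (recall 1.0). Taking the max over references matters because SQuAD evaluation questions can have several human-written reference answers.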
Our results demonstrate that self-supervised extractive QA is not only achievable but already competitive with some supervised systems. Because our two-step method generates its own training examples without requiring existing annotated training data in a specific domain or language, this work could bring us closer to extractive QA models that generalize to more kinds of tasks and work in more languages, potentially increasing the accessibility of virtual assistant systems. We believe that releasing the code for our technique will contribute to Facebook AI’s broader efforts to advance self-supervised learning and help the wider AI community explore methods that are less reliant on resource-intensive annotated datasets.