DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

November 08, 2021

Abstract

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-training objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 12.2% in unsupervised code translation, and 5.3% in natural language code search. Incidentally, we found that our pre-trained model is able to deobfuscate fully obfuscated source files, and to suggest descriptive variable names.
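The deobfuscation objective described above can be illustrated with a minimal sketch. The placeholder scheme (`VAR_0`, `VAR_1`, …) and the regex-based renaming below are illustrative assumptions, not the paper's implementation: identifiers in a code snippet are replaced with uninformative placeholders, and the (obfuscated code, name mapping) pair forms a training example in which the model must recover the original names.

```python
import re

def obfuscate(code, names):
    """Replace each listed identifier with a canonical placeholder,
    producing the input side of a DOBF-style training pair.
    The returned mapping is the target the model must recover."""
    # Assumed placeholder scheme: VAR_0, VAR_1, ... (illustrative only)
    mapping = {name: f"VAR_{i}" for i, name in enumerate(names)}
    # Whole-word match so e.g. "first" does not match inside "first_name"
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    obfuscated = pattern.sub(lambda m: mapping[m.group(0)], code)
    return obfuscated, mapping

src = "def add_numbers(first, second):\n    return first + second"
obf, mapping = obfuscate(src, ["add_numbers", "first", "second"])
print(obf)
# def VAR_0(VAR_1, VAR_2):
#     return VAR_1 + VAR_2
print(mapping)
# {'add_numbers': 'VAR_0', 'first': 'VAR_1', 'second': 'VAR_2'}
```

A model pre-trained this way sees only the obfuscated code and learns to predict the mapping back to descriptive names, which is why it can also suggest variable names for fully obfuscated files.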

AUTHORS

Written by

Baptiste Rozière

Marie-Anne Lachaux

Marc Szafraniec

Guillaume Lample

Publisher

NeurIPS

Research Topics

Natural Language Processing (NLP)

Core Machine Learning

Related Publications

December 06, 2021

COMPUTER VISION

CORE MACHINE LEARNING

Debugging the Internals of Convolutional Networks

Bilal Alsallakh, Narine Kokhlikyan, Vivek Miglani, Shubham Muttepawar, Edward Wang (AI Infra), Sara Zhang, David Adkins, Orion Reblitz-Richardson

December 06, 2021

CORE MACHINE LEARNING

Revisiting Graph Neural Networks for Link Prediction

Yinglong Xia

December 06, 2021

NLP

Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling

Hongyu Gong, Yun Tang, Juan Miguel Pino, Xian Li

December 06, 2021

INTEGRITY

CORE MACHINE LEARNING

BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining

Weizhe Hua, Yichi Zhang, Chuan Guo, Zhiru Zhang, Edward Suh
