OctConv: A flexible, efficient alternative to standard convolution

October 16, 2019

Written by Marcus Rohrbach, Zhicheng Yan, Haoqi Fan, Bing Xu

What it is:

Octave convolution (OctConv) is an easy-to-implement, efficient alternative to standard 2D or 3D convolution. OctConv can simply replace a standard convolution in neural networks without requiring any other network architecture adjustments. It can boost accuracy for image- and video-recognition tasks, while reducing the memory and computational footprint during both training and inference.

What it does:

Low-frequency information varies slowly across space, so it can be stored at a lower spatial resolution, which takes less memory and computation. OctConv exploits this by storing and processing feature maps at frequencies one octave apart, meaning the low-frequency maps have half the spatial resolution of the high-frequency ones. While multifrequency convolutional architectures have been studied previously, OctConv communicates more efficiently between different frequencies without losing performance. Specifically, OctConv uses a separate path to process low-frequency feature maps, and it uses minimalistic operations to exchange information with the original high-frequency path.
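
The two-path structure described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the released implementation: it assumes 1x1 kernels to keep the code short, and it assumes 2x2 average pooling and nearest-neighbor upsampling as the "minimalistic" exchange operations. All function and variable names here (`oct_conv`, `w_hh`, `w_hl`, and so on) are hypothetical and chosen for illustration.

```python
# Feature maps are nested lists indexed [channel][row][col].
# All names below are illustrative assumptions, not the repository's API.

def conv1x1(x, w):
    """Mix channels with a 1x1 kernel; w is indexed [out_channel][in_channel]."""
    rows, cols = len(x[0]), len(x[0][0])
    return [[[sum(w_out[c] * x[c][i][j] for c in range(len(x)))
              for j in range(cols)]
             for i in range(rows)]
            for w_out in w]

def avg_pool2(x):
    """Halve the resolution by averaging each 2x2 block (one octave down)."""
    return [[[(ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
               + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4
              for j in range(len(ch[0]) // 2)]
             for i in range(len(ch) // 2)]
            for ch in x]

def upsample2(x):
    """Double the resolution by repeating each value (nearest neighbor)."""
    return [[[ch[i // 2][j // 2] for j in range(2 * len(ch[0]))]
             for i in range(2 * len(ch))]
            for ch in x]

def add(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[p + q for p, q in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]

def oct_conv(x_high, x_low, w_hh, w_hl, w_lh, w_ll):
    """One OctConv step: each output path sums a same-frequency term
    and a cross-frequency term brought to the matching resolution."""
    # High-frequency output: high->high, plus low->high upsampled to full size.
    y_high = add(conv1x1(x_high, w_hh), upsample2(conv1x1(x_low, w_lh)))
    # Low-frequency output: low->low, plus high->low computed after pooling.
    y_low = add(conv1x1(x_low, w_ll), conv1x1(avg_pool2(x_high), w_hl))
    return y_high, y_low

def ones(channels, rows, cols):
    return [[[1.0] * cols for _ in range(rows)] for _ in range(channels)]

x_high = ones(2, 4, 4)          # high-frequency maps at full resolution
x_low = ones(2, 2, 2)           # low-frequency maps at half resolution
w = [[1.0, 1.0], [1.0, 1.0]]    # toy weights shared by all four paths
y_high, y_low = oct_conv(x_high, x_low, w, w, w, w)
print(len(y_high[0]), len(y_high[0][0]))  # 4 4
print(len(y_low[0]), len(y_low[0][0]))    # 2 2
```

Note that both cross-frequency convolutions run at the lower resolution: the high-frequency input is pooled before it is convolved, and the low-frequency result is upsampled only after it is convolved.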

In our evaluations, an OctConv-equipped ResNet-50 reduces GFLOPs by 40 percent and runs 1.5x faster in practice, while maintaining the same level of accuracy. An OctConv-equipped ResNet-152 model can achieve 82.9 percent top-1 accuracy on ImageNet while using only 22.2 GFLOPs. With video classification, we documented similar accuracy and efficiency results when benchmarking on Kinetics-400/600.
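
To see where savings of this kind can come from, the per-layer cost can be worked out with simple arithmetic. The sketch below assumes a hypothetical parameter `alpha`, the fraction of channels assigned to the low-frequency path (the same for inputs and outputs), and counts only the convolution multiply-adds, ignoring pooling, upsampling, and additions. It illustrates the mechanism under those assumptions and is not a derivation of the figures reported above.

```python
def octconv_relative_cost(alpha):
    """Return (flops_ratio, memory_ratio) of one OctConv layer relative to a
    standard convolution with the same total channel count.

    alpha is an assumed name for the fraction of channels on the
    low-frequency path, which runs at half width and half height.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    high = 1.0 - alpha   # fraction of channels on the high-frequency path
    low_area = 0.25      # half width x half height = one quarter of the positions
    flops = (high * high                  # high -> high, full resolution
             + high * alpha * low_area    # high -> low, convolved after pooling
             + alpha * high * low_area    # low -> high, convolved before upsampling
             + alpha * alpha * low_area)  # low -> low, low resolution
    memory = high + alpha * low_area      # feature-map storage
    return flops, memory

for alpha in (0.0, 0.25, 0.5):
    flops, memory = octconv_relative_cost(alpha)
    print(f"alpha={alpha}: FLOPs x{flops:.4f}, memory x{memory:.4f}")
# With alpha=0.5 the layer needs 0.4375 of the FLOPs and 0.625 of the memory.
```

Setting `alpha` to 0 recovers a standard convolution, and larger values move more of the work to the cheaper low-resolution path.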

Why it matters:

OctConv provides a simple yet effective way to reduce computation and memory, allowing the use of larger, more powerful models under the same computational budget. This can potentially improve performance in image- and video-recognition systems as well as in other tasks, such as object detection and image and video segmentation. Furthermore, more efficient image and video recognition is especially important for on-device computation on mobile and other chipsets with limited processing power.

We’ve evaluated OctConv for image and video classification at Facebook, and achieved gains in classification accuracy and model latency that are consistent with those on public benchmarks. Hardware library optimization for GPUs may further reduce computation and effective memory consumption and deliver additional performance improvements.

Get it on GitHub:

https://github.com/facebookresearch/OctConv

More details on OctConv are available in this paper. Those attending ICCV 2019 can also learn more at a poster session (poster #52) on Wednesday, October 30, at 10:30 a.m. local time.
