Computer Vision

ML Applications

Improving Optical Flow on a Pyramid Level

August 22, 2020

Abstract

In this work we review the coarse-to-fine spatial feature pyramid concept, which is used in state-of-the-art optical flow estimation networks to make exploration of the pixel flow search space computationally tractable and efficient. Within an individual pyramid level, we improve the cost volume construction process by departing from a warping- to a sampling-based strategy, which avoids ghosting and hence enables us to better preserve fine flow details. We further amplify the positive effects through a level-specific, loss max-pooling strategy that adaptively shifts the focus of the learning process on under-performing predictions. Our second contribution revises the gradient flow across pyramid levels. The typical operations performed at each pyramid level can lead to noisy, or even contradicting gradients across levels. We show and discuss how properly blocking some of these gradient components leads to improved convergence and ultimately better performance. Finally, we introduce a distillation concept to counteract the issue of catastrophic forgetting during finetuning and thus preserving knowledge over models sequentially trained on multiple datasets. Our findings are conceptually simple and easy to implement, yet result in compelling improvements on relevant error measures that we demonstrate via exhaustive ablations on datasets like Flying Chairs2, Flying Things, Sintel and KITTI. We establish new state-of-the-art results on the challenging Sintel and KITTI 2012 test datasets, and even show the portability of our findings to different optical flow and depth from stereo approaches.

Download the Paper

AUTHORS

Written by

Markus Hofinger

Samuel Rota Bulò

Lorenzo Porzi

Arno Knapitsch

Thomas Pock

Peter Kontschieder

Publisher

European Conference on Computer Vision (ECCV)

Related Publications

November 10, 2022

Computer Vision

Learning State-Aware Visual Representations from Audible Interactions

Unnat Jain, Abhinav Gupta, Himangi Mittal, Pedro Morgado

November 10, 2022

November 06, 2022

Computer Vision

Neural Basis Models for Interpretability

Filip Radenovic, Abhimanyu Dubey, Dhruv Mahajan

November 06, 2022

October 25, 2022

Theseus: A Library for Differentiable Nonlinear Optimization

Mustafa Mukadam, Austin Wang, Brandon Amos, Daniel DeTone, Jing Dong, Joe Ortiz, Luis Pineda, Maurizio Monge, Ricky Chen, Shobha Venkataraman, Stuart Anderson, Taosha Fan, Paloma Sodhi

October 25, 2022

October 22, 2022

Computer Vision

Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection

Naila Murray, Lei Wang, Piotr Koniusz, Shan Zhang

October 22, 2022

April 30, 2018

Computer Vision

NAM – Unsupervised Cross-Domain Image Mapping without Cycles or GANs | Facebook AI Research

Yedid Hoshen, Lior Wolf

April 30, 2018

December 11, 2019

Speech & Audio

Computer Vision

Hyper-Graph-Network Decoders for Block Codes | Facebook AI Research

Eliya Nachmani, Lior Wolf

December 11, 2019

April 30, 2018

NLP

Speech & Audio

Identifying Analogies Across Domains | Facebook AI Research

Yedid Hoshen, Lior Wolf

April 30, 2018

November 01, 2018

NLP

Computer Vision

Non-Adversarial Unsupervised Word Translation | Facebook AI Research

Yedid Hoshen, Lior Wolf

November 01, 2018

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.